#archiveteam-bs 2016-04-20,Wed


Time Nickname Message
00:03 🔗 tomwsmf-a has joined #archiveteam-bs
00:09 🔗 Start has joined #archiveteam-bs
00:53 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
01:29 🔗 tomwsmf-a has joined #archiveteam-bs
02:09 🔗 vitzli has joined #archiveteam-bs
02:10 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
02:16 🔗 godane http://www.wsj.com/articles/the-librarian-who-saved-timbuktus-cultural-treasures-from-al-qaeda-1460729998
02:29 🔗 Yoshimura terrorists are governments.
02:30 🔗 Yoshimura Those are dumbfcks, destroying what they can.
02:32 🔗 godane Yoshimura: btw i'm always on the roll: https://archive.org/details/@chris85
02:36 🔗 godane SketchCow: some more info on SNET in cuba: https://cachivachemedia.com/de-la-comunidad-del-anillo-a-snet-las-redes-en-la-tierra-media-fe44b4a62319#.hff3ls3la
02:36 🔗 godane that website is being saved in archivebot
02:36 🔗 godane i also saved the 4 podcasts they had so far to archivebot
02:42 🔗 Yoshimura godane: While English sounds good, this makes interesting chimes as the day changes.
02:42 🔗 Yoshimura http://listen.hatnote.com/#no,uk,fr,sv,he,as,pa,ml,or,pl,sr,fi,de,ja,nl,ar,id,hi,te,mr,sa,mk,bg,ru,es,it,fa,zh,bn,ta,kn,gu,be,hu
02:48 🔗 bwn has joined #archiveteam-bs
02:58 🔗 bwn_ has quit IRC (Read error: Operation timed out)
03:55 🔗 JesseW thanks for the reminder of listen.hatnote
03:58 🔗 * Yoshimura cheers.
03:59 🔗 Yoshimura Btw, I did some re-evaluation on the Danish social site, and it seems there is much less data. A lot of deleted photos and stuff.
04:00 🔗 Yoshimura But there are still a lot of links to check or go through, or to just crawl regularly with a smart extractor, and there is about one month remaining. I am currently running a fetch on the newest 300k (minus holes) images, just to get their links. They have throttling for abusive bots, but have very, very fast servers.
04:01 🔗 Yoshimura So warrior should work really great in this case, but someone maybe said it would not be warrior.
04:02 🔗 Yoshimura The regular throttling of archivebot would work, but not enough time to get millions of links and stuff.
04:03 🔗 JesseW thank you for looking into it
04:04 🔗 Yoshimura Aww. You so kind.
04:05 🔗 JesseW eh, it's work that needs doing.
04:09 🔗 Zebranky has quit IRC (Read error: Operation timed out)
04:10 🔗 Kaz has quit IRC (Read error: Operation timed out)
04:10 🔗 sigkell has quit IRC (Ping timeout: 260 seconds)
04:10 🔗 Yoshimura JesseW: Yeah, work too, I have no idea how I can help more, though. Related on main channel. ... Btw, running an !ao-only pipeline seems like the best start for me and for the pipes. Would like to. Who can provide me with credentials for ssh to the tracker?
04:10 🔗 sigkell has joined #archiveteam-bs
04:11 🔗 bauruine has quit IRC (Ping timeout: 260 seconds)
04:11 🔗 Yoshimura (The readme does not say what it is or what it's for, but it's apparent)
04:13 🔗 Kazzy has quit IRC (Ping timeout: 260 seconds)
04:13 🔗 Yoshimura Got machine next to 3x10Gbit backbone, the machine itself 100Mbit, so it is sad to watch it do 1Mbit/s xD
04:17 🔗 JesseW heh; one thing you can do independently is run wpull jobs yourself, and upload them to IA. They won't (currently) go into the Wayback Machine, but it's a good way to grab stuff that needs saving.
04:17 🔗 Yoshimura Yeah, but sounds dumb.
04:17 🔗 JesseW clarify?
04:17 🔗 Yoshimura Well, I can run a node, I do not fear anything running AO only pipeline.
04:18 🔗 Yoshimura With regular I would fear running out of space.
04:18 🔗 Kazzy has joined #archiveteam-bs
04:19 🔗 JesseW ah, I was suggesting running *smaller*, manually selected wpull jobs
04:19 🔗 Yoshimura So having an ao-only node to learn plus contribute seems like a fine start. And yes, I can run crawls, I would maybe modify wpull though. I for example do not understand why wget saves a crap temp file just to delete it.
04:19 🔗 Yoshimura -O ... .tmp, instead -O - > /dev/null
04:19 🔗 Yoshimura Did not figure out if it's ever used for anything. And not sure if wpull does the same stupid thing or not.
04:20 🔗 * JesseW is not familiar with that detail
04:20 🔗 JesseW has left
04:20 🔗 Yoshimura There is --delete-after, but sounds like that removes it after whole crawl, not each file.
04:21 🔗 yipdw_ wget's main use case is saving responses in files; the --output-document option isn't that relevant in WARC mode, but data is still cached there for recording
04:21 🔗 yipdw_ wpull also generates temporary request and response buffers and you'll have to ask chfoo about that one
04:22 🔗 yipdw_ for WARC, you need an entire response before you can write the record, anyway
04:22 🔗 yipdw_ so the buffering strategy is not that stupid
04:22 🔗 Yoshimura yipdw_: Yeah, I know it's not relevant in wget, that is why I wondered why it still saves it.
04:23 🔗 yipdw_ it's an implementation detail of the WARC writer
04:23 🔗 Yoshimura Buffering the single response yeah, that is fine, but whole crawl (several GB) does not sound right.
04:23 🔗 yipdw_ --truncate-output may be useful
04:23 🔗 Yoshimura Have to look more in depth how large files the warrior saves.
04:24 🔗 yipdw_ depends on the project
04:24 🔗 Yoshimura But it is tiring..
04:24 🔗 Yoshimura I meant the temporary files, not the warcs.
04:24 🔗 Yoshimura If it's a single request or all of them.
04:25 🔗 Yoshimura yipdw_: You can grant me user account for archive bot, btw?
04:25 🔗 Yoshimura Or someone else is it?
04:25 🔗 yipdw_ pipelines connect via SSH keys
04:26 🔗 Yoshimura Yeah, that is what I meant.
04:26 🔗 yipdw_ they're really just reading/writing a Redis database
04:26 🔗 yipdw_ and yes I have access to add those
04:26 🔗 Yoshimura I realize it's a tunnel to redis :
04:26 🔗 bauruine has joined #archiveteam-bs
04:26 🔗 Yoshimura Would you do that please? Would like to start soon, but small, running an !ao-only pipeline.
04:27 🔗 yipdw_ I can do it once I have your public key
04:27 🔗 Yoshimura Or if there is other way to first test, I can, but followed the install.md. Wonderful, how should I share it?
04:27 🔗 Zebranky has joined #archiveteam-bs
04:27 🔗 yipdw_ anything is fine
04:28 🔗 yipdw_ you can test a pipeline without registering with the #archivebot instance
04:28 🔗 yipdw_ unfortunately that requires backend setup, which I never simplified or automated
04:29 🔗 yipdw_ move this to #archivebot
04:34 🔗 Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
04:35 🔗 Kaz has joined #archiveteam-bs
04:43 🔗 vitzli has quit IRC (Quit: Leaving)
04:48 🔗 Yoshimura has joined #archiveteam-bs
04:52 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:59 🔗 Sk1d has joined #archiveteam-bs
05:27 🔗 JesseW has joined #archiveteam-bs
05:37 🔗 JesseW Arghghgh -- finished practicing "er-" in Dutch...
05:42 🔗 godane we are finally at 20k videos with funny or die archive
05:45 🔗 vitzli has joined #archiveteam-bs
05:49 🔗 bwn_ has joined #archiveteam-bs
05:49 🔗 bwn has quit IRC (Read error: Connection reset by peer)
05:49 🔗 DFJustin the clown is everywhere https://i.imgur.com/GTP7Z9G.jpg
06:17 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
06:34 🔗 bwn_ has quit IRC (Read error: Connection reset by peer)
06:58 🔗 Medowar has joined #archiveteam-bs
07:01 🔗 bwn_ has joined #archiveteam-bs
07:04 🔗 schbirid has joined #archiveteam-bs
07:18 🔗 metalcamp has joined #archiveteam-bs
07:24 🔗 logchfoo2 starts logging #archiveteam-bs at Wed Apr 20 07:24:11 2016
07:24 🔗 logchfoo2 has joined #archiveteam-bs
08:56 🔗 vitzli has quit IRC (Quit: Leaving)
09:42 🔗 SketchCow Grabbing every single thing out of http://www.mixtapetorrent.com/
09:42 🔗 SketchCow What could go wrong.
09:44 🔗 schbirid yo bytes might go down da drizzle
09:48 🔗 SketchCow My torrent ripper is definitely going to have a field day.
09:48 🔗 SketchCow There's 1,123 pages, each with 3-5 torrents on them
10:39 🔗 brayden has joined #archiveteam-bs
10:39 🔗 swebb sets mode: +o brayden
10:43 🔗 brayden_ has quit IRC (Read error: Operation timed out)
10:57 🔗 VADemon has joined #archiveteam-bs
11:08 🔗 RichardG has quit IRC (Read error: Operation timed out)
11:23 🔗 RichardG has joined #archiveteam-bs
11:23 🔗 godane SketchCow: you may want to move this one to one of the gaming collection instead of having it in Ephemeral VHS : https://archive.org/details/Bethesda_2015_E3_Showcase
11:27 🔗 godane i'm starting to upload DTIC Archive stuff again: https://archive.org/details/DTIC_ADA036301
12:39 🔗 RichardG has quit IRC (Read error: Operation timed out)
12:39 🔗 RichardG has joined #archiveteam-bs
13:17 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:29 🔗 ErkDog has quit IRC (Remote host closed the connection)
13:30 🔗 ErkDog has joined #archiveteam-bs
13:37 🔗 beardicus has quit IRC (Read error: Operation timed out)
13:38 🔗 beardicus has joined #archiveteam-bs
14:02 🔗 pwnsrv has joined #archiveteam-bs
14:04 🔗 pwnsrv_ has quit IRC (Ping timeout: 250 seconds)
14:12 🔗 ErkDog has quit IRC (Remote host closed the connection)
14:13 🔗 ErkDog has joined #archiveteam-bs
14:43 🔗 Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
14:44 🔗 Yoshimura has joined #archiveteam-bs
14:45 🔗 Start has quit IRC (Quit: Disconnected.)
14:46 🔗 Yoshimura has quit IRC (Client Quit)
14:48 🔗 Yoshimura has joined #archiveteam-bs
14:50 🔗 JesseW has joined #archiveteam-bs
15:15 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
15:38 🔗 Honno has joined #archiveteam-bs
15:45 🔗 Start has joined #archiveteam-bs
16:06 🔗 Start has quit IRC (Quit: Disconnected.)
16:51 🔗 atrocity has quit IRC ()
17:03 🔗 Start has joined #archiveteam-bs
17:27 🔗 Start has quit IRC (Quit: Disconnected.)
18:20 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
18:21 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
18:24 🔗 BnA-Rob1n has joined #archiveteam-bs
18:34 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
18:36 🔗 BnA-Rob1n has joined #archiveteam-bs
18:44 🔗 useretail has quit IRC (Ping timeout: 244 seconds)
18:45 🔗 ring has quit IRC (Read error: Operation timed out)
18:45 🔗 ring has joined #archiveteam-bs
18:52 🔗 useretail has joined #archiveteam-bs
18:56 🔗 bwn_ has quit IRC (Read error: Operation timed out)
19:02 🔗 metalcamp has joined #archiveteam-bs
19:13 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
19:15 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
19:17 🔗 bwn_ has joined #archiveteam-bs
19:34 🔗 BnA-Rob1n has joined #archiveteam-bs
20:08 🔗 Medowar has quit IRC (Quit: Connection closed for inactivity)
20:30 🔗 tomwsmf-a has joined #archiveteam-bs
20:31 🔗 Stiletto has quit IRC (Read error: Operation timed out)
20:34 🔗 VADemon has quit IRC (left4dead)
20:36 🔗 chfoo has quit IRC (Read error: Operation timed out)
20:43 🔗 chfoo has joined #archiveteam-bs
20:46 🔗 Medowar has joined #archiveteam-bs
20:53 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
20:55 🔗 ErkDog has quit IRC (Quit: ECAN Solutions)
20:58 🔗 ErkDog has joined #archiveteam-bs
21:18 🔗 Stiletto has joined #archiveteam-bs
21:24 🔗 Start has joined #archiveteam-bs
21:24 🔗 schbirid has quit IRC (Quit: Leaving)
21:36 🔗 atrocity has joined #archiveteam-bs
21:51 🔗 tomwsmf-a has joined #archiveteam-bs
22:08 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
22:15 🔗 Honno has quit IRC (Read error: Operation timed out)
22:26 🔗 tomwsmf-a has joined #archiveteam-bs
22:53 🔗 RichardG has quit IRC (Ping timeout: 260 seconds)
23:08 🔗 Medowar has quit IRC (Quit: Connection closed for inactivity)
23:11 🔗 tomwsmf-a has quit IRC (Ping timeout: 261 seconds)
23:30 🔗 atrocity wednesday tom...what???
23:30 🔗 atrocity today is wednesday? wtf!
23:30 🔗 atrocity i was working on my weekly wednesday youtube video (lootcrate!) and just realized that's today
23:32 🔗 xmc oooops
23:33 🔗 atrocity yeah, i'm seriously missing a day in my head from the last week. at least i skipped a day of work, lol
23:43 🔗 SketchCow 17,000 torrents
23:45 🔗 atrocity do trackers even allow you to connect to that many at once? lol
23:45 🔗 Frogging yes
23:47 🔗 atrocity would a consumer router allow you to have that many routes to track? lol
23:49 🔗 Yoshimura atrocity: Tracker connection is not a problem, and the number of torrents does not affect that. Unless you run all at the same time, of course. Good consumer-grade ones have done 4k since 2005/8, today likely much more with or without custom firmware and with enough power.
23:53 🔗 atrocity i had to kill yuku. it was in an endless loop for the past few hours and hit over 100k files on 2 sessions
23:53 🔗 Yoshimura few?
23:53 🔗 Yoshimura Mine did loop two days.
23:54 🔗 atrocity yeah, i've been waiting to reboot, and just checking in over and over hoping it would get it
23:54 🔗 atrocity didn't, so now i'm waiting on this 1GB video to upload
23:55 🔗 Yoshimura it will take up space until you remove it manually or reinit the disk, btw.
23:56 🔗 Yoshimura Python syntax is breaking my mind. *puke*
23:57 🔗 atrocity eh, that's not a problem, but i have to reboot, lol!
23:57 🔗 atrocity brb
23:57 🔗 atrocity has quit IRC ()
