#archiveteam 2015-05-04,Mon

Time Nickname Message
00:08 🔗 mistym has joined #archiveteam
00:09 🔗 nertzy has quit IRC (This computer has gone to sleep)
00:45 🔗 primus104 has quit IRC (Leaving.)
01:02 🔗 godane has joined #archiveteam
01:43 🔗 Ymgve has quit IRC ()
01:44 🔗 brayden has quit IRC (Read error: Connection reset by peer)
01:45 🔗 mistym has quit IRC (Remote host closed the connection)
01:46 🔗 mistym has joined #archiveteam
01:47 🔗 brayden has joined #archiveteam
02:04 🔗 garyrh has quit IRC (hub.se irc.ac.za)
02:11 🔗 [BNC]gary has joined #archiveteam
02:50 🔗 [BNC]gary is now known as garyrh
03:02 🔗 kyan has quit IRC (Quit: Leaving)
03:34 🔗 mistym has quit IRC (Remote host closed the connection)
04:29 🔗 aaaaaaaaa has quit IRC (Leaving)
04:35 🔗 mistym has joined #archiveteam
04:49 🔗 mistym has quit IRC (Remote host closed the connection)
04:51 🔗 mistym has joined #archiveteam
05:11 🔗 mistym has quit IRC (Remote host closed the connection)
05:11 🔗 mistym has joined #archiveteam
05:13 🔗 mistym has quit IRC (Remote host closed the connection)
05:14 🔗 nertzy has joined #archiveteam
05:42 🔗 mistym has joined #archiveteam
06:47 🔗 SimpBrain has joined #archiveteam
07:00 🔗 mistym has quit IRC (Remote host closed the connection)
07:06 🔗 dashcloud has quit IRC (Read error: Operation timed out)
07:07 🔗 primus104 has joined #archiveteam
07:09 🔗 dashcloud has joined #archiveteam
07:38 🔗 rolfb has joined #archiveteam
07:38 🔗 rolfb has quit IRC (Client Quit)
07:43 🔗 primus104 has quit IRC (Leaving.)
07:44 🔗 nertzy has quit IRC (This computer has gone to sleep)
07:51 🔗 atomotic has joined #archiveteam
08:00 🔗 mistym has joined #archiveteam
08:09 🔗 mistym has quit IRC (Read error: Operation timed out)
08:10 🔗 schbirid has joined #archiveteam
08:15 🔗 BlueMaxim has quit IRC (Ping timeout: 512 seconds)
08:26 🔗 RichardG_ has joined #archiveteam
08:28 🔗 john1 has quit IRC (Read error: Operation timed out)
08:28 🔗 swebb has quit IRC (Ping timeout: 255 seconds)
08:29 🔗 warthurto has quit IRC (Ping timeout: 255 seconds)
08:29 🔗 okeuday has quit IRC (Read error: Operation timed out)
08:32 🔗 RichardG has quit IRC (Ping timeout: 483 seconds)
08:33 🔗 okeuday has joined #archiveteam
08:35 🔗 warthurto has joined #archiveteam
08:37 🔗 john1 has joined #archiveteam
08:40 🔗 primus104 has joined #archiveteam
08:58 🔗 berndj has quit IRC (Excess Flood)
08:58 🔗 berndj has joined #archiveteam
09:31 🔗 berndj has quit IRC (Remote host closed the connection)
10:16 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
10:31 🔗 insane_al has joined #archiveteam
10:55 🔗 Ymgve has joined #archiveteam
11:28 🔗 Ctrl-S has quit IRC (Read error: Operation timed out)
11:29 🔗 Ctrl-S has joined #archiveteam
11:33 🔗 primus104 has quit IRC (Leaving.)
12:04 🔗 atomotic has joined #archiveteam
12:07 🔗 Control-S has joined #archiveteam
12:12 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:15 🔗 Ctrl-S has quit IRC (Ping timeout: 600 seconds)
12:15 🔗 Control-S is now known as Ctrl-S
12:53 🔗 sankin has joined #archiveteam
13:29 🔗 philpem has joined #archiveteam
13:43 🔗 primus104 has joined #archiveteam
13:52 🔗 Start has quit IRC (Disconnected.)
13:55 🔗 primus104 has quit IRC (Leaving.)
14:04 🔗 lrkj_ has joined #archiveteam
14:06 🔗 mistym has joined #archiveteam
14:06 🔗 lrkj has quit IRC (Ping timeout: 600 seconds)
14:13 🔗 mistym has quit IRC (Read error: Operation timed out)
14:19 🔗 RichardG_ has quit IRC (Remote host closed the connection)
14:22 🔗 scyther has joined #archiveteam
14:24 🔗 RichardG has joined #archiveteam
14:27 🔗 SketchCow Badoom
14:28 🔗 SketchCow Greetings from the Internet Archive. You all have my fullest attention.
15:06 🔗 Start has joined #archiveteam
15:16 🔗 insane_al has quit IRC (Ping timeout: 306 seconds)
15:17 🔗 signius has quit IRC (Ping timeout: 265 seconds)
15:22 🔗 mistym has joined #archiveteam
15:27 🔗 Start has quit IRC (Disconnected.)
15:30 🔗 signius has joined #archiveteam
15:37 🔗 nertzy has joined #archiveteam
15:41 🔗 Start has joined #archiveteam
15:41 🔗 K4k has joined #archiveteam
15:47 🔗 SadDM has quit IRC (Remote host closed the connection)
15:47 🔗 SadDM has joined #archiveteam
15:51 🔗 Start has quit IRC (Disconnected.)
15:56 🔗 aaaaaaaaa has joined #archiveteam
16:03 🔗 helothere has joined #archiveteam
16:03 🔗 helothere hello there
16:04 🔗 helothere just a question, i'm just wondering, if you guys have the bandwidth and space to receive all payloads from volunteer warriors, why not do the scraping yourself?
16:05 🔗 primus104 has joined #archiveteam
16:06 🔗 SimpBrain more warriors = less problems
16:06 🔗 augusztin helothere: banning :)
16:06 🔗 DFJustin scraping takes more than just bandwidth and space - cpu and ram for example
16:06 🔗 augusztin helothere: if 200 IPs scrape, it is less obvious than when 5 do the same
16:07 🔗 SimpBrain easier for each warrior to scrape 1% of a site. some sites get ban happy if you browse too many urls
16:08 🔗 augusztin meanwhile Kenshin does all the Baraza LOL
16:08 🔗 Nemo_bis filippo__: I had an amnesia about the name of your employer (missing a letter) and I found it only thanks to your github page. :P
16:08 🔗 augusztin 1372GB for Kenshin, 28GB for the 2nd guy :D
16:08 🔗 SimpBrain he has a poll of ips
16:08 🔗 SimpBrain pool
16:08 🔗 Nemo_bis filippo__: They're lucky that they hire famous people saving their online reputation! :D
16:09 🔗 augusztin SimpBrain: it still looks like a single man project mostly :P
16:09 🔗 SimpBrain true, warrior can run multiple times, you can just tell it that node runs a certain ip
16:10 🔗 SimpBrain well the pipeline script
16:10 🔗 augusztin and actually scraping takes pretty much minimum bandwidth
16:11 🔗 helothere I see. thanks for clarification
16:11 🔗 augusztin even downloading furaffinity like right now only takes around 1-1.5MB/s downstream, which is nothing to me (150Mbps internet)
16:11 🔗 augusztin so only ~10% of my downstream bandwidth is used
16:12 🔗 augusztin (and i allowed the warrior to run up to 30Mbps)
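
The bandwidth figures augusztin quotes can be checked with a little arithmetic. The sketch below assumes the download rate is in megabytes per second and the link speed and warrior cap are in megabits per second; the function name is made up for illustration.

```python
def link_utilization(rate_mbytes_per_s, link_mbits_per_s):
    """Fraction of a link used by a transfer.

    The rate is given in megabytes per second and the link capacity in
    megabits per second, so the rate is multiplied by 8 first.
    """
    return rate_mbytes_per_s * 8 / link_mbits_per_s


# Figures from the log: 1-1.5 MB/s of traffic, a 150 Mbit/s line,
# and a 30 Mbit/s cap configured for the warrior.
for rate in (1.0, 1.5):
    print(f"{rate} MB/s: {link_utilization(rate, 150):.1%} of the line, "
          f"{link_utilization(rate, 30):.1%} of the warrior cap")
```

Under those assumptions the scrape uses somewhat less than the quoted ~10% of the line, and well under half of the configured cap.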
16:18 🔗 mistym has quit IRC (Remote host closed the connection)
16:31 🔗 helothere has quit IRC (Quit: Page closed)
16:38 🔗 mistym has joined #archiveteam
16:49 🔗 scyther has quit IRC (Read error: Connection reset by peer)
16:53 🔗 nertzy has quit IRC (This computer has gone to sleep)
17:00 🔗 atomotic has joined #archiveteam
17:01 🔗 augusztin http://www.theverge.com/2015/5/4/8543605/google-plus-collections-announced how long till we are going to try to save this ? :D
17:01 🔗 augusztin i give it 2 years, tops :D
17:03 🔗 yipdw I bet Pinterest will be bought first and then Google will kill both
17:03 🔗 xmc oh dear
17:03 🔗 Infreq fair assessment
17:04 🔗 Infreq lest yahoo get their hands on it first... hue
17:04 🔗 augusztin well, at least google is killing their own projects :)
17:05 🔗 augusztin yahoo is buying others, then killing them :D
17:05 🔗 Infreq tumblr is still strong but all the policy changes have really changed it
17:06 🔗 rejon has quit IRC (Read error: Operation timed out)
17:10 🔗 Nemo_bis augusztin: that's because Google doesn't *allow* competing projects to exist in the first place, nothing to buy
17:10 🔗 yipdw eh they allow it insofar as they can't keep track of their own shit
17:11 🔗 yipdw or some district wins the Google Hunger Games that year
17:12 🔗 yipdw wait a minute that'd be fucking awesome
17:12 🔗 yipdw #-bs
17:13 🔗 augusztin http://www.slate.com/articles/technology/map_of_the_week/2013/03/google_reader_joins_graveyard_of_dead_google_products.html glass graveyard hole was a bit premature it seems
18:18 🔗 atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
18:18 🔗 Jonimus has quit IRC (Ping timeout: 370 seconds)
18:21 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
18:22 🔗 khaoohs has joined #archiveteam
18:23 🔗 atomotic has joined #archiveteam
18:30 🔗 Jonimus has joined #archiveteam
18:58 🔗 scyther has joined #archiveteam
18:58 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
18:59 🔗 khaoohs has joined #archiveteam
19:07 🔗 primus104 has quit IRC (Read error: Operation timed out)
19:08 🔗 primus104 has joined #archiveteam
19:16 🔗 mistym has quit IRC (Remote host closed the connection)
19:20 🔗 primus105 has joined #archiveteam
19:27 🔗 primus104 has quit IRC (Read error: Operation timed out)
19:30 🔗 mistym has joined #archiveteam
19:39 🔗 Start has joined #archiveteam
19:40 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
19:53 🔗 SN4T14_ has joined #archiveteam
19:58 🔗 mistym has quit IRC (Remote host closed the connection)
19:59 🔗 SN4T14__ has quit IRC (Ping timeout: 369 seconds)
20:13 🔗 mistym has joined #archiveteam
20:22 🔗 Start has quit IRC (Disconnected.)
20:23 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
20:24 🔗 khaoohs has joined #archiveteam
20:27 🔗 schbirid has quit IRC (Quit: Leaving)
20:29 🔗 Start has joined #archiveteam
20:29 🔗 twrist has quit IRC (Ping timeout: 240 seconds)
20:29 🔗 twrist has joined #archiveteam
20:32 🔗 habi has joined #archiveteam
20:33 🔗 Start has quit IRC (Client Quit)
20:33 🔗 habi has left
20:40 🔗 RedType has quit IRC (Quit: leaving)
20:44 🔗 habi has joined #archiveteam
20:45 🔗 scyther has quit IRC (Leaving)
20:46 🔗 RedType has joined #archiveteam
20:49 🔗 habi has left
20:57 🔗 nertzy has joined #archiveteam
20:59 🔗 K4k has quit IRC (Quit: WeeChat 1.0.1)
21:02 🔗 sankin has quit IRC (Leaving.)
21:03 🔗 BlueMaxim has joined #archiveteam
21:11 🔗 Start has joined #archiveteam
21:11 🔗 Start has quit IRC (Client Quit)
21:18 🔗 SimpBrain has quit IRC (Quit: Leaving)
21:24 🔗 skiy has joined #archiveteam
21:40 🔗 joepie91_ is now known as joepie91c
21:40 🔗 joepie91c is now known as joepie91_
22:25 🔗 skiy_ has joined #archiveteam
22:32 🔗 skiy has quit IRC (Read error: Operation timed out)
22:52 🔗 skiy_ has quit IRC (Read error: Connection reset by peer)
22:55 🔗 za3k has joined #archiveteam
22:56 🔗 za3k hbrowse.com gives the go-ahead to be archived (http://www.hbrowse.com/forum/index.php?topic=3885.0). I turn out not to have space for it, but if someone else wants to archive it, plain wget should work, and I think it's not so huge.
22:58 🔗 za3k I have a copy of arXiv which appears to be complete but has consistent, mismatching checksums from the official manifest. If anyone wants to help me diagnose why I'd appreciate it; if anyone wants a copy of what I have, contact me.
22:59 🔗 za3k Someone said I should post this here: https://github.com/mispy/twitter_ebooks. It has an 'archive' subcommand which grabs a twitter feed, archiving raw JSON as far back as the API allows. It handles updates in a bandwidth-efficient way.
23:02 🔗 za3k I didn't write twitter_ebooks. I have a short how-to for transforming the JSON to plaintext here: https://blog.za3k.com/archiving-twitter/
23:03 🔗 za3k has quit IRC (Quit: Page closed)
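
One way to start diagnosing a checksum mismatch like the one za3k describes is to recompute each file's digest locally and sort the files into matching, mismatching, and missing. This is only a rough sketch: the manifest is assumed to be already parsed into a plain filename-to-digest mapping, MD5 is an assumed default (the log does not say which algorithm the official manifest uses), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_manifest(root, manifest, algorithm="md5"):
    """Compare local files under `root` with expected digests.

    `manifest` is a hypothetical dict of relative filename -> hex digest;
    parsing the real manifest format is out of scope here.
    """
    report = {"ok": [], "mismatch": [], "missing": []}
    for name, expected in sorted(manifest.items()):
        path = Path(root) / name
        if not path.is_file():
            report["missing"].append(name)
        elif file_digest(path, algorithm) == expected.lower():
            report["ok"].append(name)
        else:
            report["mismatch"].append(name)
    return report
```

If every file lands in "mismatch", a different algorithm or a systematic transformation of the files is a more likely cause than corruption; if only a few do, re-downloading those files is the cheaper first step.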
23:16 🔗 philpem has quit IRC (Ping timeout: 252 seconds)
23:23 🔗 nertzy has quit IRC (This computer has gone to sleep)
23:27 🔗 slash` has joined #archiveteam
23:30 🔗 robv has joined #archiveteam
23:35 🔗 robv has quit IRC (Quit: [LINE TERMINATED])
23:56 🔗 Start has joined #archiveteam
23:56 🔗 Start has quit IRC (Remote host closed the connection)
23:57 🔗 Start has joined #archiveteam
