[00:08] *** mistym has joined #archiveteam
[00:09] *** nertzy has quit IRC (This computer has gone to sleep)
[00:45] *** primus104 has quit IRC (Leaving.)
[01:02] *** godane has joined #archiveteam
[01:43] *** Ymgve has quit IRC ()
[01:44] *** brayden has quit IRC (Read error: Connection reset by peer)
[01:45] *** mistym has quit IRC (Remote host closed the connection)
[01:46] *** mistym has joined #archiveteam
[01:47] *** brayden has joined #archiveteam
[02:04] *** garyrh has quit IRC (hub.se irc.ac.za)
[02:11] *** [BNC]gary has joined #archiveteam
[02:50] *** [BNC]gary is now known as garyrh
[03:02] *** kyan has quit IRC (Quit: Leaving)
[03:34] *** mistym has quit IRC (Remote host closed the connection)
[04:29] *** aaaaaaaaa has quit IRC (Leaving)
[04:35] *** mistym has joined #archiveteam
[04:49] *** mistym has quit IRC (Remote host closed the connection)
[04:51] *** mistym has joined #archiveteam
[05:11] *** mistym has quit IRC (Remote host closed the connection)
[05:11] *** mistym has joined #archiveteam
[05:13] *** mistym has quit IRC (Remote host closed the connection)
[05:14] *** nertzy has joined #archiveteam
[05:42] *** mistym has joined #archiveteam
[06:47] *** SimpBrain has joined #archiveteam
[07:00] *** mistym has quit IRC (Remote host closed the connection)
[07:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:07] *** primus104 has joined #archiveteam
[07:09] *** dashcloud has joined #archiveteam
[07:38] *** rolfb has joined #archiveteam
[07:38] *** rolfb has quit IRC (Client Quit)
[07:43] *** primus104 has quit IRC (Leaving.)
[07:44] *** nertzy has quit IRC (This computer has gone to sleep)
[07:51] *** atomotic has joined #archiveteam
[08:00] *** mistym has joined #archiveteam
[08:09] *** mistym has quit IRC (Read error: Operation timed out)
[08:10] *** schbirid has joined #archiveteam
[08:15] *** BlueMaxim has quit IRC (Ping timeout: 512 seconds)
[08:26] *** RichardG_ has joined #archiveteam
[08:28] *** john1 has quit IRC (Read error: Operation timed out)
[08:28] *** swebb has quit IRC (Ping timeout: 255 seconds)
[08:29] *** warthurto has quit IRC (Ping timeout: 255 seconds)
[08:29] *** okeuday has quit IRC (Read error: Operation timed out)
[08:32] *** RichardG has quit IRC (Ping timeout: 483 seconds)
[08:33] *** okeuday has joined #archiveteam
[08:35] *** warthurto has joined #archiveteam
[08:37] *** john1 has joined #archiveteam
[08:40] *** primus104 has joined #archiveteam
[08:58] *** berndj has quit IRC (Excess Flood)
[08:58] *** berndj has joined #archiveteam
[09:31] *** berndj has quit IRC (Remote host closed the connection)
[10:16] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[10:31] *** insane_al has joined #archiveteam
[10:55] *** Ymgve has joined #archiveteam
[11:28] *** Ctrl-S has quit IRC (Read error: Operation timed out)
[11:29] *** Ctrl-S has joined #archiveteam
[11:33] *** primus104 has quit IRC (Leaving.)
[12:04] *** atomotic has joined #archiveteam
[12:07] *** Control-S has joined #archiveteam
[12:12] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:15] *** Ctrl-S has quit IRC (Ping timeout: 600 seconds)
[12:15] *** Control-S is now known as Ctrl-S
[12:53] *** sankin has joined #archiveteam
[13:29] *** philpem has joined #archiveteam
[13:43] *** primus104 has joined #archiveteam
[13:52] *** Start has quit IRC (Disconnected.)
[13:55] *** primus104 has quit IRC (Leaving.)
[14:04] *** lrkj_ has joined #archiveteam
[14:06] *** mistym has joined #archiveteam
[14:06] *** lrkj has quit IRC (Ping timeout: 600 seconds)
[14:13] *** mistym has quit IRC (Read error: Operation timed out)
[14:19] *** RichardG_ has quit IRC (Remote host closed the connection)
[14:22] *** scyther has joined #archiveteam
[14:24] *** RichardG has joined #archiveteam
[14:27] Badoom
[14:28] Greetings from the Internet Archive. You all have my fullest attention.
[15:06] *** Start has joined #archiveteam
[15:16] *** insane_al has quit IRC (Ping timeout: 306 seconds)
[15:17] *** signius has quit IRC (Ping timeout: 265 seconds)
[15:22] *** mistym has joined #archiveteam
[15:27] *** Start has quit IRC (Disconnected.)
[15:30] *** signius has joined #archiveteam
[15:37] *** nertzy has joined #archiveteam
[15:41] *** Start has joined #archiveteam
[15:41] *** K4k has joined #archiveteam
[15:47] *** SadDM has quit IRC (Remote host closed the connection)
[15:47] *** SadDM has joined #archiveteam
[15:51] *** Start has quit IRC (Disconnected.)
[15:56] *** aaaaaaaaa has joined #archiveteam
[16:03] *** helothere has joined #archiveteam
[16:03] hello there
[16:04] just a question, i'm just wondering, if you guys have the bandwidth and space to receive all payloads from volunteer warriors, why not do the scraping yourself?
[16:05] *** primus104 has joined #archiveteam
[16:06] more warriors = less problems
[16:06] helothere: banning :)
[16:06] scraping takes more than just bandwidth and space - cpu and ram for example
[16:06] helothere: if 200 IPs scrape, it is less obvious than when 5 do the same
[16:07] easier for a warrior to scrape 1% each person. some sites get ban happy if you browse too many urls
[16:08] meanwhile Kenshin does all the Baraza LOL
[16:08] filippo__: I had an amnesia about the name of your employer (missing a letter) and I found it only thanks to your github page. :P
[16:08] 1372GB for Kenshin, 28GB for the 2nd guy :D
[16:08] he has a poll of ips
[16:08] pool
[16:08] filippo__: They're lucky that they hire famous people saving their online reputation! :D
[16:09] SimpBrain: it still looks like a single man project mostly :P
[16:09] true, warrior can run multiple times, you can just tell it that node runs a certain ip
[16:10] well the pipeline script
[16:10] and actually scraping takes pretty much minimum bandwidth
[16:11] I see. thanks for clarification
[16:11] even downloading FurAffinity like right now only takes around 1-1.5MB/s downstream, which is nothing to me (150Mbps internet)
[16:11] so only ~10% of my downstream bandwidth is used
[16:12] (and i allowed the warrior to run up to 30Mbps)
[16:18] *** mistym has quit IRC (Remote host closed the connection)
[16:31] *** helothere has quit IRC (Quit: Page closed)
[16:38] *** mistym has joined #archiveteam
[16:49] *** scyther has quit IRC (Read error: Connection reset by peer)
[16:53] *** nertzy has quit IRC (This computer has gone to sleep)
[17:00] *** atomotic has joined #archiveteam
[17:01] http://www.theverge.com/2015/5/4/8543605/google-plus-collections-announced how long till we are going to try to save this ? :D
[17:01] i give it 2 years, tops :D
[17:03] I bet Pinterest will be bought first and then Google will kill both
[17:03] oh dear
[17:03] fair assessment
[17:04] lest yahoo get their hands on it first... hue
[17:04] well, at least google is killing their own projects :)
[17:05] yahoo is buying others, then killing them :D
[17:05] tumblr is still strong but all the policy changes have really changed it
[17:06] *** rejon has quit IRC (Read error: Operation timed out)
[17:10] augusztin: that's because Google doesn't *allow* competing projects to exist in the first place, nothing to buy
[17:10] eh they allow it insofar as they can't keep track of their own shit
[17:11] or some district wins the Google Hunger Games that year
[17:12] wait a minute that'd be fucking awesome
[17:12] #-bs
[17:13] http://www.slate.com/articles/technology/map_of_the_week/2013/03/google_reader_joins_graveyard_of_dead_google_products.html glass graveyard hole was a bit premature it seems
[18:18] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:18] *** Jonimus has quit IRC (Ping timeout: 370 seconds)
[18:21] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[18:22] *** khaoohs has joined #archiveteam
[18:23] *** atomotic has joined #archiveteam
[18:30] *** Jonimus has joined #archiveteam
[18:58] *** scyther has joined #archiveteam
[18:58] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[18:59] *** khaoohs has joined #archiveteam
[19:07] *** primus104 has quit IRC (Read error: Operation timed out)
[19:08] *** primus104 has joined #archiveteam
[19:16] *** mistym has quit IRC (Remote host closed the connection)
[19:20] *** primus105 has joined #archiveteam
[19:27] *** primus104 has quit IRC (Read error: Operation timed out)
[19:30] *** mistym has joined #archiveteam
[19:39] *** Start has joined #archiveteam
[19:40] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[19:53] *** SN4T14_ has joined #archiveteam
[19:58] *** mistym has quit IRC (Remote host closed the connection)
[19:59] *** SN4T14__ has quit IRC (Ping timeout: 369 seconds)
[20:13] *** mistym has joined #archiveteam
[20:22] *** Start has quit IRC (Disconnected.)
[20:23] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[20:24] *** khaoohs has joined #archiveteam
[20:27] *** schbirid has quit IRC (Quit: Leaving)
[20:29] *** Start has joined #archiveteam
[20:29] *** twrist has quit IRC (Ping timeout: 240 seconds)
[20:29] *** twrist has joined #archiveteam
[20:32] *** habi has joined #archiveteam
[20:33] *** Start has quit IRC (Client Quit)
[20:33] *** habi has left
[20:40] *** RedType has quit IRC (Quit: leaving)
[20:44] *** habi has joined #archiveteam
[20:45] *** scyther has quit IRC (Leaving)
[20:46] *** RedType has joined #archiveteam
[20:49] *** habi has left
[20:57] *** nertzy has joined #archiveteam
[20:59] *** K4k has quit IRC (Quit: WeeChat 1.0.1)
[21:02] *** sankin has quit IRC (Leaving.)
[21:03] *** BlueMaxim has joined #archiveteam
[21:11] *** Start has joined #archiveteam
[21:11] *** Start has quit IRC (Client Quit)
[21:18] *** SimpBrain has quit IRC (Quit: Leaving)
[21:24] *** skiy has joined #archiveteam
[21:40] *** joepie91_ is now known as joepie91c
[21:40] *** joepie91c is now known as joepie91_
[22:25] *** skiy_ has joined #archiveteam
[22:32] *** skiy has quit IRC (Read error: Operation timed out)
[22:52] *** skiy_ has quit IRC (Read error: Connection reset by peer)
[22:55] *** za3k has joined #archiveteam
[22:56] hbrowse.com gives the go-ahead to be archived (http://www.hbrowse.com/forum/index.php?topic=3885.0). I turn out not to have space for it, but if someone else wants to archive it, plain wget should work, and I think it's not so huge.
[22:58] I have a copy of arXiv which appears to be complete but has consistent, mismatching checksums from the official manifest. If anyone wants to help me diagnose why I'd appreciate it; if anyone wants a copy of what I have, contact me.
[22:59] Someone said I should post this here: https://github.com/mispy/twitter_ebooks. It has an 'archive' subcommand which grabs a twitter feed, archiving raw JSON as far back as the API allows. It handles updates in a bandwidth-efficient way.
[23:02] I didn't write twitter_ebooks. I have a short how-to for transforming the JSON to plaintext here: https://blog.za3k.com/archiving-twitter/
[23:03] *** za3k has quit IRC (Quit: Page closed)
[23:16] *** philpem has quit IRC (Ping timeout: 252 seconds)
[23:23] *** nertzy has quit IRC (This computer has gone to sleep)
[23:27] *** slash` has joined #archiveteam
[23:30] *** robv has joined #archiveteam
[23:35] *** robv has quit IRC (Quit: [LINE TERMINATED])
[23:56] *** Start has joined #archiveteam
[23:56] *** Start has quit IRC (Remote host closed the connection)
[23:57] *** Start has joined #archiveteam