[00:01] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[00:44] *** SN4T14 has quit IRC (Quit: Leaving)
[00:47] ohhdemgir: I missed the news that the Interview had leaked onto the Internet
[00:52] *** Lord_Nigh has quit IRC (Read error: Connection reset by peer)
[00:53] *** Lord_Nigh has joined #archiveteam-bs
[01:17] *** duoi has joined #archiveteam-bs
[01:41] *** SN4T14 has joined #archiveteam-bs
[02:00] *** mistym has joined #archiveteam-bs
[02:10] *** primus104 has quit IRC (Leaving.)
[02:12] *** schbirid has quit IRC (Read error: Operation timed out)
[02:18] *** schbirid has joined #archiveteam-bs
[02:24] *** DopefishJ has joined #archiveteam-bs
[02:24] *** swebb sets mode: +o DopefishJ
[02:32] *** Smiley has quit IRC (Remote host closed the connection)
[02:34] *** snuffy has quit IRC (Excess Flood)
[02:35] *** DFJustin has quit IRC (Ping timeout: 740 seconds)
[02:38] *** underscor has quit IRC (Read error: Connection reset by peer)
[02:38] *** underscor has joined #archiveteam-bs
[02:38] *** swebb sets mode: +o underscor
[02:44] *** snuffy has joined #archiveteam-bs
[02:44] *** snuffy has quit IRC (Excess Flood)
[02:44] *** ionpulse has quit IRC (Ping timeout: 512 seconds)
[02:45] *** ionpulse has joined #archiveteam-bs
[02:45] *** snuffy has joined #archiveteam-bs
[02:48] *** Smiley has joined #archiveteam-bs
[02:51] *** danneh_ has quit IRC (hub.se efnet.port80.se)
[02:51] *** deathy has quit IRC (hub.se efnet.port80.se)
[02:51] *** GLaDOS has quit IRC (hub.se efnet.port80.se)
[02:51] *** garyrh has quit IRC (Write error: Broken pipe)
[02:52] *** useretail has quit IRC (Read error: Operation timed out)
[02:54] *** Void_ has quit IRC (Read error: Operation timed out)
[02:54] *** Void_ has joined #archiveteam-bs
[02:55] *** useretail has joined #archiveteam-bs
[02:55] *** garyrh has joined #archiveteam-bs
[03:00] *** wm_ has quit IRC (Ping timeout: 265 seconds)
[03:00] *** Kirk has quit IRC (Ping timeout: 265 seconds)
[03:01] *** schbirid has quit IRC (Read error: Operation timed out)
[03:02] *** Zebranky has quit IRC (Ping timeout: 265 seconds)
[03:02] *** Zebranky has joined #archiveteam-bs
[03:06] *** wm_ has joined #archiveteam-bs
[03:11] *** Kirk has joined #archiveteam-bs
[03:15] *** Ctrl-S has quit IRC (Read error: Connection reset by peer)
[03:15] *** schbirid has joined #archiveteam-bs
[03:16] *** Kirk has quit IRC (Ping timeout: 265 seconds)
[03:18] *** wm_ has quit IRC (Ping timeout: 265 seconds)
[03:25] *** wm_ has joined #archiveteam-bs
[03:26] *** Kirk has joined #archiveteam-bs
[03:29] *** Ctrl-S has joined #archiveteam-bs
[04:21] *** deathy has joined #archiveteam-bs
[04:21] *** danneh_ has joined #archiveteam-bs
[04:21] *** GLaDOS has joined #archiveteam-bs
[04:21] *** swebb sets mode: +o GLaDOS
[04:21] *** Kirk has quit IRC (hub.dk irc.underworld.no)
[04:21] *** wm_ has quit IRC (hub.dk irc.underworld.no)
[04:21] *** duoi has quit IRC (hub.dk irc.underworld.no)
[04:21] *** ersi has quit IRC (hub.dk irc.underworld.no)
[04:21] *** Atluxity has quit IRC (hub.dk irc.underworld.no)
[04:22] *** ersi_ has joined #archiveteam-bs
[04:23] *** duoi_ghos has joined #archiveteam-bs
[05:31] *** duoi_3 has joined #archiveteam-bs
[05:31] *** Sellyme_ has quit IRC (Quit: No Ping reply in 180 seconds.)
[05:32] *** Sellyme has joined #archiveteam-bs
[05:33] *** duoi_ghos has quit IRC (Ping timeout: 246 seconds)
[05:39] *** BlueMaxim has joined #archiveteam-bs
[05:57] *** dx has quit IRC (Ping timeout: 265 seconds)
[05:58] *** mutoso has quit IRC (Ping timeout: 265 seconds)
[05:59] *** mutoso has joined #archiveteam-bs
[06:08] *** dx has joined #archiveteam-bs
[06:26] *** mutoso has quit IRC (Ping timeout: 272 seconds)
[06:26] *** Nertsy has joined #archiveteam-bs
[06:27] *** mutoso has joined #archiveteam-bs
[06:28] *** mistym has quit IRC (Remote host closed the connection)
[06:29] *** Nertsy` has quit IRC (Ping timeout: 480 seconds)
[07:26] *** mistym has joined #archiveteam-bs
[07:26] *** DopefishJ is now known as DFJustin
[07:49] *** primus104 has joined #archiveteam-bs
[08:25] *** APerti has quit IRC ()
[09:13] *** wm_ has joined #archiveteam-bs
[09:13] *** Kirk has joined #archiveteam-bs
[09:34] *** Atluxity has joined #archiveteam-bs
[09:41] *** duoi_3 has quit IRC (Ping timeout: 265 seconds)
[10:22] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[10:59] *** primus104 has quit IRC (Leaving.)
[11:02] *** Boppen has quit IRC (Ping timeout: 198 seconds)
[11:03] *** Boppen has joined #archiveteam-bs
[11:27] *** mistym has quit IRC (Remote host closed the connection)
[12:59] *** Ravenloft has quit IRC (Ping timeout: 492 seconds)
[13:24] *** primus104 has joined #archiveteam-bs
[13:53] *** primus104 has quit IRC (Leaving.)
[14:05] *** brayden has quit IRC (Ping timeout: 606 seconds)
[14:07] dashcloud, still a bunch of people streaming it right now, guess they wanted to see it on the intended release day
[14:19] *** brayden has joined #archiveteam-bs
[15:46] *** schbirid has quit IRC (Read error: Operation timed out)
[15:53] *** schbirid has joined #archiveteam-bs
[16:11] *** primus104 has joined #archiveteam-bs
[17:30] *** primus105 has joined #archiveteam-bs
[17:33] *** primus104 has quit IRC (Read error: Operation timed out)
[17:43] *** Nertsy has quit IRC (Read error: Connection reset by peer)
[17:43] *** Nertsy` has joined #archiveteam-bs
[17:50] *** primus105 has quit IRC (Read error: Operation timed out)
[18:04] *** primus104 has joined #archiveteam-bs
[18:11] *** Nertsy` has quit IRC (Quit: Nertsy)
[18:14] *** Nertsy has joined #archiveteam-bs
[18:36] *** primus104 has quit IRC (Read error: Operation timed out)
[18:56] *** primus104 has joined #archiveteam-bs
[19:08] *** mistym has joined #archiveteam-bs
[19:15] *** wp494 has quit IRC ()
[19:21] *** wp494 has joined #archiveteam-bs
[19:45] *** primus105 has joined #archiveteam-bs
[19:48] *** primus104 has quit IRC (Read error: Operation timed out)
[19:57] *** primus has quit IRC (Ping timeout: 335 seconds)
[20:04] *** primus104 has joined #archiveteam-bs
[20:10] *** primus105 has quit IRC (Read error: Operation timed out)
[20:11] *** primus105 has joined #archiveteam-bs
[20:17] *** primus104 has quit IRC (Read error: Operation timed out)
[20:38] *** primus has joined #archiveteam-bs
[20:44] *** primus105 has quit IRC (Read error: Operation timed out)
[21:10] *** RichardG has joined #archiveteam-bs
[21:11] throw your questions, joepie91
[21:12] merry christmas everyone
[21:15] merry christmas to you, godane. thanks for all your data gifts over the year
[21:15] RichardG: hai :P
[21:15] RichardG: what are your experiences with their rate limiting? how it works, when it triggers, etc.
[21:16] because I've been scraping from a single box for months, but it eventually got hit by a ban
[21:16] well... my rate limiting is based on a script I found on a blog back in 2011
[21:16] I scraped for a few months, but I had the dumb idea of storing pastes in individual files, which NTFS absolutely hates, one day my data drive decided to pull an ACL corruption and I had to nuke the pastes
[21:17] but I keep the same formula: check pastebin.com for new pastes every 12 seconds, and wait 1.1 seconds between getting raw pastes
[21:17] when I get banned, it's because I restart the script faster than the delays
[21:17] RichardG: https://github.com/joepie91/pastebin-scrape/tree/develop
[21:17] scrape.py does indexing, retrieve.py does the fetching
[21:18] my indexing delay was 60 secs
[21:18] retrieval delay 1.3
[21:18] so, not too far off I guess :P
[21:18] and I'm also storing in separate files, but it runs on a Linux system so that's okay
[21:18] well the 2 servers I use to help get temporary bans every now and then
[21:19] RichardG: i always make sure my data dumps need the less amount of duck tape
[21:19] the problem here is obviously winblows
[21:19] which can't handle 665k files in a single folder
[21:19] (I still have an index with the IDs of all the pastes from the 2011 run)
[21:21] using mysql for paste storage was sort of a good idea, possibly, but only because there is no better FS
[21:26] RichardG: hold, bit overloaded with people messaging me right now, back in a few mins
[21:29] RichardG: split into dirs by the first 1-2 characters works well too
[21:30] RichardG: any chance you can a) release the scraper code and b) share the list of pastes? so that somebody could at least archive those that are still around
[21:31] working on making the 2011 list of pastes
[21:31] :)
[21:36] https://mega.co.nz/#!Ow4C3ADC!hAyN7Nxh4KrjIAz5Gu_9uHkTbTAB3eUXBOAr0w_TnPM
[21:36] inconvenient host choice, I know, but my Dropbox was permanently suspended over autohotkey code in it...
[21:40] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:41] RichardG: might want to bookmark https://transfer.sh/
[21:41] it's useful for temporary stuff
[21:41] (although mega isn't bad)
[21:41] and, thanks :P
[21:42] that's strange, it barely compressed
[21:42] transfer.sh rocks
[21:43] *** dashcloud has joined #archiveteam-bs
[21:44] *** BlueMaxim has joined #archiveteam-bs
[21:45] when scraping pastebin you have to get used to the tools that automatically post to it. according to some stats I made with my current database, the most popular kind of automated paste is a mod tool for Phantasy Star Online 2
[21:46] followed by crash reports of old versions of Minecraft mod tools (they moved to their own pastebins a while ago), then JOdin (a ROM flash tool for some Android devices)
[21:49] RichardG: computercraft is also a popular one
[21:49] :)
[21:53] heh, I used to make an addon for it!
[22:03] https://www.facebook.com/ghazayel/posts/10205536170422795?pnref=story :(
[22:05] i'd like to see the mtrg of that ia box right now
[22:06] also, fuck sony
[22:07] so psn/live is down, the joys
[22:07] i uploaded this yesterday: https://archive.org/details/www.asiatorrents.me-subtitle-1-to-38406-20141205
[22:07] over 2gb of translate subtitles
[22:08] in web archive and in a zip file for people to be able to download it
[22:10] https://ia601509.us.archive.org/mrtg/ theoretically
[22:11] https://ia601509.us.archive.org/mrtg/nginx_rps.html
[22:11] ouch https://ia601509.us.archive.org/mrtg/nginx_con.html
[22:11] uh oh
[22:11] conn limit?
[22:13] direct mp4 link is on reddit frontpage
[22:13] but the comments are great, pointing out the license issue and suggesting IA donations
[22:14] https://pay.reddit.com/r/videos/comments/2qds9z/the_interview_full_movie_in_hd_free/
[22:14] yeah, was alerted to it by a friend
[22:14] heh
[22:16] any recommendations for jabber clients on android? it's beena year since i used one, yaxim iirc
[22:17] ah, chatsecure of course
[22:17] 11 mb, ffff
[22:19] xabber and yaxim both seem unmaintained since feb 13
[22:20] *** duoi_3 has joined #archiveteam-bs
[22:22] :/
[22:24] trying yaxim, as it is the smallest
[22:24] but cs has otr :)
[22:32] lol, the IA box hosting the interview is getting hit hard
[22:33] yep
[22:33] anyway
[22:33] RichardG: did you have the scraping code on github or something?
[22:34] I don't know if I should, the code is kinda bad, has some hacks, although I'll see if I can do something
[22:35] *** mistym has quit IRC (Remote host closed the connection)
[22:35] RichardG: bad code is better than no code :)
[22:35] everybody's code has some hacks
[22:36] hell, probably half of the code behind the software you and I use on a daily basis has horrible hacks that somebody feels really ashamed for
[22:36] that's no reason not to publish code! :P
[22:39] I'm commenting the thing at least a little bit.
[22:41] okay, this is a new one... got an abusemail, responded that I wasn't going to follow up on it because no legal grounds, only to be met with a bounce
[22:41] what
[22:41] from a gmail address, too
[22:58] *** raylee has joined #archiveteam-bs
[23:23] *** mistym has joined #archiveteam-bs
[23:27] *** aaaaaaaaa has joined #archiveteam-bs
[23:40] *** RichardG_ has joined #archiveteam-bs
[23:40] joepie91: I kinda rushed this since I have to go mobile... https://github.com/richardg867/pastescraper
[23:41] RichardG: will have a look at it soon
[23:41] RichardG: as a loosely related aside; http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/
[23:41] :P
[23:43] I just unlicense quick things like this, but I will license this, don't ya worry... I was just in doubt
[23:44] :)
[23:56] *** Ravenloft has joined #archiveteam-bs
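
Editor's sketch, not code from either scraper linked above: two points from the 21:16 to 21:29 exchange lend themselves to a short illustration. One is the remark that bans follow when the script is restarted faster than its delays, the other is the suggestion to split paste files into directories by the first 1-2 characters of the ID. The Python below is a minimal sketch under those assumptions. The names `shard_path` and `Throttle`, the state file, and the `.txt` suffix are all hypothetical. Persisting the time of the last request to disk is one possible way to keep the delay in force across restarts. Nobody in the log says they do this.

```python
import time
from pathlib import Path


def shard_path(root: Path, paste_id: str, depth: int = 2) -> Path:
    """Place a paste under a subdirectory named after the first `depth`
    characters of its ID, so no single folder holds every file."""
    return root / paste_id[:depth] / f"{paste_id}.txt"


class Throttle:
    """Enforce a minimum interval between requests, remembering the time
    of the last request in a small state file so that restarting the
    script does not reset the delay (hypothetical design)."""

    def __init__(self, min_interval, state_file, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self.state_file = Path(state_file)
        self.clock = clock
        self.sleep = sleep

    def _last_request(self):
        # A missing or unreadable state file means no request is on record.
        try:
            return float(self.state_file.read_text())
        except (FileNotFoundError, ValueError):
            return None

    def wait(self) -> float:
        """Sleep until the interval has elapsed; return the seconds slept."""
        last = self._last_request()
        now = self.clock()
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self.sleep(delay)
        # Record the moment the caller is allowed to issue its request.
        self.state_file.write_text(repr(now + delay))
        return delay
```

With the figures quoted at 21:17, a fetch loop would call `Throttle(1.1, "fetch.state").wait()` before each raw paste request and `Throttle(12, "index.state").wait()` before each index check. Both state file names are placeholders.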