#archiveteam-bs 2014-12-25,Thu

Time Nickname Message
00:01 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
00:44 🔗 SN4T14 has quit IRC (Quit: Leaving)
00:47 🔗 dashcloud ohhdemgir: I missed the news that the Interview had leaked onto the Internet
00:52 🔗 Lord_Nigh has quit IRC (Read error: Connection reset by peer)
00:53 🔗 Lord_Nigh has joined #archiveteam-bs
01:17 🔗 duoi has joined #archiveteam-bs
01:41 🔗 SN4T14 has joined #archiveteam-bs
02:00 🔗 mistym has joined #archiveteam-bs
02:10 🔗 primus104 has quit IRC (Leaving.)
02:12 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:18 🔗 schbirid has joined #archiveteam-bs
02:24 🔗 DopefishJ has joined #archiveteam-bs
02:24 🔗 swebb sets mode: +o DopefishJ
02:32 🔗 Smiley has quit IRC (Remote host closed the connection)
02:34 🔗 snuffy has quit IRC (Excess Flood)
02:35 🔗 DFJustin has quit IRC (Ping timeout: 740 seconds)
02:38 🔗 underscor has quit IRC (Read error: Connection reset by peer)
02:38 🔗 underscor has joined #archiveteam-bs
02:38 🔗 swebb sets mode: +o underscor
02:44 🔗 snuffy has joined #archiveteam-bs
02:44 🔗 snuffy has quit IRC (Excess Flood)
02:44 🔗 ionpulse has quit IRC (Ping timeout: 512 seconds)
02:45 🔗 ionpulse has joined #archiveteam-bs
02:45 🔗 snuffy has joined #archiveteam-bs
02:48 🔗 Smiley has joined #archiveteam-bs
02:51 🔗 danneh_ has quit IRC (hub.se efnet.port80.se)
02:51 🔗 deathy has quit IRC (hub.se efnet.port80.se)
02:51 🔗 GLaDOS has quit IRC (hub.se efnet.port80.se)
02:51 🔗 garyrh has quit IRC (Write error: Broken pipe)
02:52 🔗 useretail has quit IRC (Read error: Operation timed out)
02:54 🔗 Void_ has quit IRC (Read error: Operation timed out)
02:54 🔗 Void_ has joined #archiveteam-bs
02:55 🔗 useretail has joined #archiveteam-bs
02:55 🔗 garyrh has joined #archiveteam-bs
03:00 🔗 wm_ has quit IRC (Ping timeout: 265 seconds)
03:00 🔗 Kirk has quit IRC (Ping timeout: 265 seconds)
03:01 🔗 schbirid has quit IRC (Read error: Operation timed out)
03:02 🔗 Zebranky has quit IRC (Ping timeout: 265 seconds)
03:02 🔗 Zebranky has joined #archiveteam-bs
03:06 🔗 wm_ has joined #archiveteam-bs
03:11 🔗 Kirk has joined #archiveteam-bs
03:15 🔗 Ctrl-S has quit IRC (Read error: Connection reset by peer)
03:15 🔗 schbirid has joined #archiveteam-bs
03:16 🔗 Kirk has quit IRC (Ping timeout: 265 seconds)
03:18 🔗 wm_ has quit IRC (Ping timeout: 265 seconds)
03:25 🔗 wm_ has joined #archiveteam-bs
03:26 🔗 Kirk has joined #archiveteam-bs
03:29 🔗 Ctrl-S has joined #archiveteam-bs
04:21 🔗 deathy has joined #archiveteam-bs
04:21 🔗 danneh_ has joined #archiveteam-bs
04:21 🔗 GLaDOS has joined #archiveteam-bs
04:21 🔗 swebb sets mode: +o GLaDOS
04:21 🔗 Kirk has quit IRC (hub.dk irc.underworld.no)
04:21 🔗 wm_ has quit IRC (hub.dk irc.underworld.no)
04:21 🔗 duoi has quit IRC (hub.dk irc.underworld.no)
04:21 🔗 ersi has quit IRC (hub.dk irc.underworld.no)
04:21 🔗 Atluxity has quit IRC (hub.dk irc.underworld.no)
04:22 🔗 ersi_ has joined #archiveteam-bs
04:23 🔗 duoi_ghos has joined #archiveteam-bs
05:31 🔗 duoi_3 has joined #archiveteam-bs
05:31 🔗 Sellyme_ has quit IRC (Quit: No Ping reply in 180 seconds.)
05:32 🔗 Sellyme has joined #archiveteam-bs
05:33 🔗 duoi_ghos has quit IRC (Ping timeout: 246 seconds)
05:39 🔗 BlueMaxim has joined #archiveteam-bs
05:57 🔗 dx has quit IRC (Ping timeout: 265 seconds)
05:58 🔗 mutoso has quit IRC (Ping timeout: 265 seconds)
05:59 🔗 mutoso has joined #archiveteam-bs
06:08 🔗 dx has joined #archiveteam-bs
06:26 🔗 mutoso has quit IRC (Ping timeout: 272 seconds)
06:26 🔗 Nertsy has joined #archiveteam-bs
06:27 🔗 mutoso has joined #archiveteam-bs
06:28 🔗 mistym has quit IRC (Remote host closed the connection)
06:29 🔗 Nertsy` has quit IRC (Ping timeout: 480 seconds)
07:26 🔗 mistym has joined #archiveteam-bs
07:26 🔗 DopefishJ is now known as DFJustin
07:49 🔗 primus104 has joined #archiveteam-bs
08:25 🔗 APerti has quit IRC ()
09:13 🔗 wm_ has joined #archiveteam-bs
09:13 🔗 Kirk has joined #archiveteam-bs
09:34 🔗 Atluxity has joined #archiveteam-bs
09:41 🔗 duoi_3 has quit IRC (Ping timeout: 265 seconds)
10:22 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
10:59 🔗 primus104 has quit IRC (Leaving.)
11:02 🔗 Boppen has quit IRC (Ping timeout: 198 seconds)
11:03 🔗 Boppen has joined #archiveteam-bs
11:27 🔗 mistym has quit IRC (Remote host closed the connection)
12:59 🔗 Ravenloft has quit IRC (Ping timeout: 492 seconds)
13:24 🔗 primus104 has joined #archiveteam-bs
13:53 🔗 primus104 has quit IRC (Leaving.)
14:05 🔗 brayden has quit IRC (Ping timeout: 606 seconds)
14:07 🔗 ohhdemgir dashcloud, still a bunch of people streaming it right now, guess they wanted to see it on the intended release day
14:19 🔗 brayden has joined #archiveteam-bs
15:46 🔗 schbirid has quit IRC (Read error: Operation timed out)
15:53 🔗 schbirid has joined #archiveteam-bs
16:11 🔗 primus104 has joined #archiveteam-bs
17:30 🔗 primus105 has joined #archiveteam-bs
17:33 🔗 primus104 has quit IRC (Read error: Operation timed out)
17:43 🔗 Nertsy has quit IRC (Read error: Connection reset by peer)
17:43 🔗 Nertsy` has joined #archiveteam-bs
17:50 🔗 primus105 has quit IRC (Read error: Operation timed out)
18:04 🔗 primus104 has joined #archiveteam-bs
18:11 🔗 Nertsy` has quit IRC (Quit: Nertsy)
18:14 🔗 Nertsy has joined #archiveteam-bs
18:36 🔗 primus104 has quit IRC (Read error: Operation timed out)
18:56 🔗 primus104 has joined #archiveteam-bs
19:08 🔗 mistym has joined #archiveteam-bs
19:15 🔗 wp494 has quit IRC ()
19:21 🔗 wp494 has joined #archiveteam-bs
19:45 🔗 primus105 has joined #archiveteam-bs
19:48 🔗 primus104 has quit IRC (Read error: Operation timed out)
19:57 🔗 primus has quit IRC (Ping timeout: 335 seconds)
20:04 🔗 primus104 has joined #archiveteam-bs
20:10 🔗 primus105 has quit IRC (Read error: Operation timed out)
20:11 🔗 primus105 has joined #archiveteam-bs
20:17 🔗 primus104 has quit IRC (Read error: Operation timed out)
20:38 🔗 primus has joined #archiveteam-bs
20:44 🔗 primus105 has quit IRC (Read error: Operation timed out)
21:10 🔗 RichardG has joined #archiveteam-bs
21:11 🔗 RichardG throw your questions, joepie91
21:12 🔗 godane merry christmas everyone
21:15 🔗 schbirid merry christmas to you, godane. thanks for all your data gifts over the year
21:15 🔗 joepie91 RichardG: hai :P
21:15 🔗 joepie91 RichardG: what are your experiences with their rate limiting? how it works, when it triggers, etc.
21:16 🔗 joepie91 because I've been scraping from a single box for months, but it eventually got hit by a ban
21:16 🔗 RichardG well... my rate limiting is based on a script I found on a blog back in 2011
21:16 🔗 RichardG I scraped for a few months, but I had the dumb idea of storing pastes in individual files, which NTFS absolutely hates, one day my data drive decided to pull an ACL corruption and I had to nuke the pastes
21:17 🔗 RichardG but I keep the same formula: check pastebin.com for new pastes every 12 seconds, and wait 1.1 seconds between getting raw pastes
21:17 🔗 RichardG when I get banned, it's because I restart the script faster than the delays
21:17 🔗 joepie91 RichardG: https://github.com/joepie91/pastebin-scrape/tree/develop
21:17 🔗 joepie91 scrape.py does indexing, retrieve.py does the fetching
21:18 🔗 joepie91 my indexing delay was 60 secs
21:18 🔗 joepie91 retrieval delay 1.3
21:18 🔗 joepie91 so, not too far off I guess :P
21:18 🔗 joepie91 and I'm also storing in separate files, but it runs on a Linux system so that's okay
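The two delays discussed above (one interval for polling the index of new pastes, a shorter one between raw fetches) could be sketched roughly as follows. This is not the code from either scraper: `Throttle`, `scrape_once`, `list_new_ids` and `fetch_raw` are hypothetical names, and the actual HTTP requests are left to the caller. The constants use RichardG's figures; joepie91's were 60 and 1.3 seconds.

```python
import time

# Delays quoted in the conversation (RichardG's values).
INDEX_DELAY = 12.0  # seconds between checks for new pastes
FETCH_DELAY = 1.1   # seconds between raw paste fetches


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def scrape_once(list_new_ids, fetch_raw, seen, throttle):
    """Fetch every paste ID not already in `seen`, pausing between fetches.

    `list_new_ids` and `fetch_raw` are caller-supplied callables
    (hypothetical here); `seen` is a set that is updated in place.
    """
    fetched = {}
    for paste_id in list_new_ids():
        if paste_id in seen:
            continue
        throttle.wait()
        fetched[paste_id] = fetch_raw(paste_id)
        seen.add(paste_id)
    return fetched
```

A driver loop would call `scrape_once` and then sleep for `INDEX_DELAY` before polling again. Since the bans reportedly came from restarting the script faster than the delays, it may also help to write the last request time to disk and honour it on startup; that part is not shown here.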
21:18 🔗 RichardG well the 2 servers I use to help get temporary bans every now and then
21:19 🔗 godane RichardG: i always make sure my data dumps need the least amount of duct tape
21:19 🔗 RichardG the problem here is obviously winblows
21:19 🔗 RichardG which can't handle 665k files in a single folder
21:19 🔗 RichardG (I still have an index with the IDs of all the pastes from the 2011 run)
21:21 🔗 RichardG using mysql for paste storage was sort of a good idea, possibly, but only because there is no better FS
21:26 🔗 joepie91 RichardG: hold, bit overloaded with people messaging me right now, back in a few mins
21:29 🔗 schbirid RichardG: split into dirs by the first 1-2 characters works well too
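schbirid's suggestion of splitting into directories by the first 1-2 characters might look something like this. It is a minimal sketch, not code from either scraper: `shard_path` and `store_paste` are hypothetical helpers, and the `.txt` suffix and two-character prefix are assumptions.

```python
from pathlib import Path


def shard_path(root, paste_id, depth=2):
    """Return root/<first `depth` chars of the ID>/<ID>.txt."""
    prefix = paste_id[:depth]
    return Path(root) / prefix / f"{paste_id}.txt"


def store_paste(root, paste_id, text):
    """Write one paste into its prefix directory, creating it if needed."""
    path = shard_path(root, paste_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

With two alphanumeric characters as the prefix, a few hundred thousand files spread over up to a few thousand directories instead of sitting in one folder. One caveat: if paste IDs are case-sensitive, two IDs that differ only in case would collide on a case-insensitive filesystem, and this sketch does not handle that.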
21:30 🔗 joepie91 RichardG: any chance you can a) release the scraper code and b) share the list of pastes? so that somebody could at least archive those that are still around
21:31 🔗 RichardG working on making the 2011 list of pastes
21:31 🔗 joepie91 :)
21:36 🔗 RichardG https://mega.co.nz/#!Ow4C3ADC!hAyN7Nxh4KrjIAz5Gu_9uHkTbTAB3eUXBOAr0w_TnPM
21:36 🔗 RichardG inconvenient host choice, I know, but my Dropbox was permanently suspended over autohotkey code in it...
21:40 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:41 🔗 joepie91 RichardG: might want to bookmark https://transfer.sh/
21:41 🔗 joepie91 it's useful for temporary stuff
21:41 🔗 joepie91 (although mega isn't bad)
21:41 🔗 joepie91 and, thanks :P
21:42 🔗 joepie91 that's strange, it barely compressed
21:42 🔗 schbirid transfer.sh rocks
21:43 🔗 dashcloud has joined #archiveteam-bs
21:44 🔗 BlueMaxim has joined #archiveteam-bs
21:45 🔗 RichardG when scraping pastebin you have to get used to the tools that automatically post to it. according to some stats I made with my current database, the most popular kind of automated paste is a mod tool for Phantasy Star Online 2
21:46 🔗 RichardG followed by crash reports of old versions of Minecraft mod tools (they moved to their own pastebins a while ago), then JOdin (a ROM flash tool for some Android devices)
21:49 🔗 joepie91 RichardG: computercraft is also a popular one
21:49 🔗 joepie91 :)
21:53 🔗 RichardG heh, I used to make an addon for it!
22:03 🔗 schbirid https://www.facebook.com/ghazayel/posts/10205536170422795?pnref=story :(
22:05 🔗 schbirid i'd like to see the mrtg of that ia box right now
22:06 🔗 schbirid also, fuck sony
22:07 🔗 Smiley so psn/live is down, the joys
22:07 🔗 godane i uploaded this yesterday: https://archive.org/details/www.asiatorrents.me-subtitle-1-to-38406-20141205
22:07 🔗 godane over 2gb of translated subtitles
22:08 🔗 godane in web archive and in a zip file for people to be able to download it
22:10 🔗 schbirid https://ia601509.us.archive.org/mrtg/ theoretically
22:11 🔗 schbirid https://ia601509.us.archive.org/mrtg/nginx_rps.html
22:11 🔗 schbirid ouch https://ia601509.us.archive.org/mrtg/nginx_con.html
22:11 🔗 joepie91 uh oh
22:11 🔗 joepie91 conn limit?
22:13 🔗 schbirid direct mp4 link is on reddit frontpage
22:13 🔗 schbirid but the comments are great, pointing out the license issue and suggesting IA donations
22:14 🔗 schbirid https://pay.reddit.com/r/videos/comments/2qds9z/the_interview_full_movie_in_hd_free/
22:14 🔗 joepie91 yeah, was alerted to it by a friend
22:14 🔗 joepie91 heh
22:16 🔗 schbirid any recommendations for jabber clients on android? it's been a year since i used one, yaxim iirc
22:17 🔗 schbirid ah, chatsecure of course
22:17 🔗 schbirid 11 mb, ffff
22:19 🔗 schbirid xabber and yaxim both seem unmaintained since feb 13
22:20 🔗 duoi_3 has joined #archiveteam-bs
22:22 🔗 joepie91 :/
22:24 🔗 schbirid trying yaxim, as it is the smallest
22:24 🔗 schbirid but cs has otr :)
22:32 🔗 RichardG lol, the IA box hosting the interview is getting hit hard
22:33 🔗 joepie91 yep
22:33 🔗 joepie91 anyway
22:33 🔗 joepie91 RichardG: did you have the scraping code on github or something?
22:34 🔗 RichardG I don't know if I should, the code is kinda bad, has some hacks, although I'll see if I can do something
22:35 🔗 mistym has quit IRC (Remote host closed the connection)
22:35 🔗 joepie91 RichardG: bad code is better than no code :)
22:35 🔗 joepie91 everybody's code has some hacks
22:36 🔗 joepie91 hell, probably half of the code behind the software you and I use on a daily basis has horrible hacks that somebody feels really ashamed for
22:36 🔗 joepie91 that's no reason not to publish code! :P
22:39 🔗 RichardG I'm commenting the thing at least a little bit.
22:41 🔗 joepie91 okay, this is a new one... got an abusemail, responded that I wasn't going to follow up on it because no legal grounds, only to be met with a bounce
22:41 🔗 joepie91 what
22:41 🔗 joepie91 from a gmail address, too
22:58 🔗 raylee has joined #archiveteam-bs
23:23 🔗 mistym has joined #archiveteam-bs
23:27 🔗 aaaaaaaaa has joined #archiveteam-bs
23:40 🔗 RichardG_ has joined #archiveteam-bs
23:40 🔗 RichardG joepie91: I kinda rushed this since I have to go mobile... https://github.com/richardg867/pastescraper
23:41 🔗 joepie91 RichardG: will have a look at it soon
23:41 🔗 joepie91 RichardG: as a loosely related aside; http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/
23:41 🔗 joepie91 :P
23:43 🔗 RichardG_ I just unlicense quick things like this, but I will license this, don't ya worry... I was just in doubt
23:44 🔗 joepie91 :)
23:56 🔗 Ravenloft has joined #archiveteam-bs
