Time |
Nickname |
Message |
00:01
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
00:44
🔗
|
|
SN4T14 has quit IRC (Quit: Leaving) |
00:47
🔗
|
dashcloud |
ohhdemgir: I missed the news that the Interview had leaked onto the Internet |
00:52
🔗
|
|
Lord_Nigh has quit IRC (Read error: Connection reset by peer) |
00:53
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
01:17
🔗
|
|
duoi has joined #archiveteam-bs |
01:41
🔗
|
|
SN4T14 has joined #archiveteam-bs |
02:00
🔗
|
|
mistym has joined #archiveteam-bs |
02:10
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:12
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
02:18
🔗
|
|
schbirid has joined #archiveteam-bs |
02:24
🔗
|
|
DopefishJ has joined #archiveteam-bs |
02:24
🔗
|
|
swebb sets mode: +o DopefishJ |
02:32
🔗
|
|
Smiley has quit IRC (Remote host closed the connection) |
02:34
🔗
|
|
snuffy has quit IRC (Excess Flood) |
02:35
🔗
|
|
DFJustin has quit IRC (Ping timeout: 740 seconds) |
02:38
🔗
|
|
underscor has quit IRC (Read error: Connection reset by peer) |
02:38
🔗
|
|
underscor has joined #archiveteam-bs |
02:38
🔗
|
|
swebb sets mode: +o underscor |
02:44
🔗
|
|
snuffy has joined #archiveteam-bs |
02:44
🔗
|
|
snuffy has quit IRC (Excess Flood) |
02:44
🔗
|
|
ionpulse has quit IRC (Ping timeout: 512 seconds) |
02:45
🔗
|
|
ionpulse has joined #archiveteam-bs |
02:45
🔗
|
|
snuffy has joined #archiveteam-bs |
02:48
🔗
|
|
Smiley has joined #archiveteam-bs |
02:51
🔗
|
|
danneh_ has quit IRC (hub.se efnet.port80.se) |
02:51
🔗
|
|
deathy has quit IRC (hub.se efnet.port80.se) |
02:51
🔗
|
|
GLaDOS has quit IRC (hub.se efnet.port80.se) |
02:51
🔗
|
|
garyrh has quit IRC (Write error: Broken pipe) |
02:52
🔗
|
|
useretail has quit IRC (Read error: Operation timed out) |
02:54
🔗
|
|
Void_ has quit IRC (Read error: Operation timed out) |
02:54
🔗
|
|
Void_ has joined #archiveteam-bs |
02:55
🔗
|
|
useretail has joined #archiveteam-bs |
02:55
🔗
|
|
garyrh has joined #archiveteam-bs |
03:00
🔗
|
|
wm_ has quit IRC (Ping timeout: 265 seconds) |
03:00
🔗
|
|
Kirk has quit IRC (Ping timeout: 265 seconds) |
03:01
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
03:02
🔗
|
|
Zebranky has quit IRC (Ping timeout: 265 seconds) |
03:02
🔗
|
|
Zebranky has joined #archiveteam-bs |
03:06
🔗
|
|
wm_ has joined #archiveteam-bs |
03:11
🔗
|
|
Kirk has joined #archiveteam-bs |
03:15
🔗
|
|
Ctrl-S has quit IRC (Read error: Connection reset by peer) |
03:15
🔗
|
|
schbirid has joined #archiveteam-bs |
03:16
🔗
|
|
Kirk has quit IRC (Ping timeout: 265 seconds) |
03:18
🔗
|
|
wm_ has quit IRC (Ping timeout: 265 seconds) |
03:25
🔗
|
|
wm_ has joined #archiveteam-bs |
03:26
🔗
|
|
Kirk has joined #archiveteam-bs |
03:29
🔗
|
|
Ctrl-S has joined #archiveteam-bs |
04:21
🔗
|
|
deathy has joined #archiveteam-bs |
04:21
🔗
|
|
danneh_ has joined #archiveteam-bs |
04:21
🔗
|
|
GLaDOS has joined #archiveteam-bs |
04:21
🔗
|
|
swebb sets mode: +o GLaDOS |
04:21
🔗
|
|
Kirk has quit IRC (hub.dk irc.underworld.no) |
04:21
🔗
|
|
wm_ has quit IRC (hub.dk irc.underworld.no) |
04:21
🔗
|
|
duoi has quit IRC (hub.dk irc.underworld.no) |
04:21
🔗
|
|
ersi has quit IRC (hub.dk irc.underworld.no) |
04:21
🔗
|
|
Atluxity has quit IRC (hub.dk irc.underworld.no) |
04:22
🔗
|
|
ersi_ has joined #archiveteam-bs |
04:23
🔗
|
|
duoi_ghos has joined #archiveteam-bs |
05:31
🔗
|
|
duoi_3 has joined #archiveteam-bs |
05:31
🔗
|
|
Sellyme_ has quit IRC (Quit: No Ping reply in 180 seconds.) |
05:32
🔗
|
|
Sellyme has joined #archiveteam-bs |
05:33
🔗
|
|
duoi_ghos has quit IRC (Ping timeout: 246 seconds) |
05:39
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
05:57
🔗
|
|
dx has quit IRC (Ping timeout: 265 seconds) |
05:58
🔗
|
|
mutoso has quit IRC (Ping timeout: 265 seconds) |
05:59
🔗
|
|
mutoso has joined #archiveteam-bs |
06:08
🔗
|
|
dx has joined #archiveteam-bs |
06:26
🔗
|
|
mutoso has quit IRC (Ping timeout: 272 seconds) |
06:26
🔗
|
|
Nertsy has joined #archiveteam-bs |
06:27
🔗
|
|
mutoso has joined #archiveteam-bs |
06:28
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
06:29
🔗
|
|
Nertsy` has quit IRC (Ping timeout: 480 seconds) |
07:26
🔗
|
|
mistym has joined #archiveteam-bs |
07:26
🔗
|
|
DopefishJ is now known as DFJustin |
07:49
🔗
|
|
primus104 has joined #archiveteam-bs |
08:25
🔗
|
|
APerti has quit IRC () |
09:13
🔗
|
|
wm_ has joined #archiveteam-bs |
09:13
🔗
|
|
Kirk has joined #archiveteam-bs |
09:34
🔗
|
|
Atluxity has joined #archiveteam-bs |
09:41
🔗
|
|
duoi_3 has quit IRC (Ping timeout: 265 seconds) |
10:22
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
10:59
🔗
|
|
primus104 has quit IRC (Leaving.) |
11:02
🔗
|
|
Boppen has quit IRC (Ping timeout: 198 seconds) |
11:03
🔗
|
|
Boppen has joined #archiveteam-bs |
11:27
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
12:59
🔗
|
|
Ravenloft has quit IRC (Ping timeout: 492 seconds) |
13:24
🔗
|
|
primus104 has joined #archiveteam-bs |
13:53
🔗
|
|
primus104 has quit IRC (Leaving.) |
14:05
🔗
|
|
brayden has quit IRC (Ping timeout: 606 seconds) |
14:07
🔗
|
ohhdemgir |
dashcloud, still a bunch of people streaming it right now, guess they wanted to see it on the intended release day |
14:19
🔗
|
|
brayden has joined #archiveteam-bs |
15:46
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
15:53
🔗
|
|
schbirid has joined #archiveteam-bs |
16:11
🔗
|
|
primus104 has joined #archiveteam-bs |
17:30
🔗
|
|
primus105 has joined #archiveteam-bs |
17:33
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
17:43
🔗
|
|
Nertsy has quit IRC (Read error: Connection reset by peer) |
17:43
🔗
|
|
Nertsy` has joined #archiveteam-bs |
17:50
🔗
|
|
primus105 has quit IRC (Read error: Operation timed out) |
18:04
🔗
|
|
primus104 has joined #archiveteam-bs |
18:11
🔗
|
|
Nertsy` has quit IRC (Quit: Nertsy) |
18:14
🔗
|
|
Nertsy has joined #archiveteam-bs |
18:36
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
18:56
🔗
|
|
primus104 has joined #archiveteam-bs |
19:08
🔗
|
|
mistym has joined #archiveteam-bs |
19:15
🔗
|
|
wp494 has quit IRC () |
19:21
🔗
|
|
wp494 has joined #archiveteam-bs |
19:45
🔗
|
|
primus105 has joined #archiveteam-bs |
19:48
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
19:57
🔗
|
|
primus has quit IRC (Ping timeout: 335 seconds) |
20:04
🔗
|
|
primus104 has joined #archiveteam-bs |
20:10
🔗
|
|
primus105 has quit IRC (Read error: Operation timed out) |
20:11
🔗
|
|
primus105 has joined #archiveteam-bs |
20:17
🔗
|
|
primus104 has quit IRC (Read error: Operation timed out) |
20:38
🔗
|
|
primus has joined #archiveteam-bs |
20:44
🔗
|
|
primus105 has quit IRC (Read error: Operation timed out) |
21:10
🔗
|
|
RichardG has joined #archiveteam-bs |
21:11
🔗
|
RichardG |
throw your questions, joepie91 |
21:12
🔗
|
godane |
merry christmas everyone |
21:15
🔗
|
schbirid |
merry christmas to you, godane. thanks for all your data gifts over the year |
21:15
🔗
|
joepie91 |
RichardG: hai :P |
21:15
🔗
|
joepie91 |
RichardG: what are your experiences with their rate limiting? how it works, when it triggers, etc. |
21:16
🔗
|
joepie91 |
because I've been scraping from a single box for months, but it eventually got hit by a ban |
21:16
🔗
|
RichardG |
well... my rate limiting is based on a script I found on a blog back in 2011 |
21:16
🔗
|
RichardG |
I scraped for a few months, but I had the dumb idea of storing pastes in individual files, which NTFS absolutely hates, one day my data drive decided to pull an ACL corruption and I had to nuke the pastes |
21:17
🔗
|
RichardG |
but I keep the same formula: check pastebin.com for new pastes every 12 seconds, and wait 1.1 seconds between getting raw pastes |
21:17
🔗
|
RichardG |
when I get banned, it's because I restart the script faster than the delays |
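A minimal sketch of the delay scheme described above, assuming the 12-second index interval and 1.1-second fetch gap RichardG quotes. The names `Throttle`, `scrape_once`, `list_new_ids` and `fetch_raw` are hypothetical and are not taken from either scraper mentioned in this log.

```python
import time

INDEX_INTERVAL = 12.0  # seconds between checks for new pastes (figure quoted in the log)
FETCH_INTERVAL = 1.1   # seconds between raw paste downloads (figure quoted in the log)


class Throttle:
    """Enforce a minimum gap between successive calls to wait()."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock    # injectable so the logic can be tested without real delays
        self.sleep = sleep
        self.last = None      # time of the previous call; None before the first call

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def scrape_once(list_new_ids, fetch_raw, fetch_throttle):
    """Fetch every newly listed paste, pausing between raw downloads.

    list_new_ids and fetch_raw are hypothetical callables supplied by the caller.
    """
    pastes = {}
    for paste_id in list_new_ids():
        fetch_throttle.wait()
        pastes[paste_id] = fetch_raw(paste_id)
    return pastes
```

Note that this throttle only remembers its last request in memory, so a quick restart would skip the delay, which is the situation RichardG says gets him banned. Persisting the last request time across runs would be one way around that; it is not shown here.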
21:17
🔗
|
joepie91 |
RichardG: https://github.com/joepie91/pastebin-scrape/tree/develop |
21:17
🔗
|
joepie91 |
scrape.py does indexing, retrieve.py does the fetching |
21:18
🔗
|
joepie91 |
my indexing delay was 60 secs |
21:18
🔗
|
joepie91 |
retrieval delay 1.3 |
21:18
🔗
|
joepie91 |
so, not too far off I guess :P |
21:18
🔗
|
joepie91 |
and I'm also storing in separate files, but it runs on a Linux system so that's okay |
21:18
🔗
|
RichardG |
well the 2 servers I use to help get temporary bans every now and then |
21:19
🔗
|
godane |
RichardG: i always make sure my data dumps need the least amount of duct tape |
21:19
🔗
|
RichardG |
the problem here is obviously winblows |
21:19
🔗
|
RichardG |
which can't handle 665k files in a single folder |
21:19
🔗
|
RichardG |
(I still have an index with the IDs of all the pastes from the 2011 run) |
21:21
🔗
|
RichardG |
using mysql for paste storage was sort of a good idea, possibly, but only because there is no better FS |
21:26
🔗
|
joepie91 |
RichardG: hold, bit overloaded with people messaging me right now, back in a few mins |
21:29
🔗
|
schbirid |
RichardG: split into dirs by the first 1-2 characters works well too |
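A rough sketch of the directory split schbirid suggests, assuming each paste is stored as one text file named after its ID. The helper names `sharded_path` and `save_paste`, the `.txt` extension and the `depth` parameter are illustrative assumptions, not part of anyone's actual scraper.

```python
from pathlib import Path


def sharded_path(root, paste_id, depth=2):
    """Return root/<first `depth` characters of the ID>/<ID>.txt."""
    prefix = paste_id[:depth]
    return Path(root) / prefix / f"{paste_id}.txt"


def save_paste(root, paste_id, text):
    """Write one paste into its prefix directory, creating the directory if needed."""
    path = sharded_path(root, paste_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

With this layout no single folder has to hold all 665k files. One caveat: if IDs are case-sensitive, two IDs differing only in letter case could still collide on a case-insensitive file system.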
21:30
🔗
|
joepie91 |
RichardG: any chance you can a) release the scraper code and b) share the list of pastes? so that somebody could at least archive those that are still around |
21:31
🔗
|
RichardG |
working on making the 2011 list of pastes |
21:31
🔗
|
joepie91 |
:) |
21:36
🔗
|
RichardG |
https://mega.co.nz/#!Ow4C3ADC!hAyN7Nxh4KrjIAz5Gu_9uHkTbTAB3eUXBOAr0w_TnPM |
21:36
🔗
|
RichardG |
inconvenient host choice, I know, but my Dropbox was permanently suspended over autohotkey code in it... |
21:40
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
21:41
🔗
|
joepie91 |
RichardG: might want to bookmark https://transfer.sh/ |
21:41
🔗
|
joepie91 |
it's useful for temporary stuff |
21:41
🔗
|
joepie91 |
(although mega isn't bad) |
21:41
🔗
|
joepie91 |
and, thanks :P |
21:42
🔗
|
joepie91 |
that's strange, it barely compressed |
21:42
🔗
|
schbirid |
transfer.sh rocks |
21:43
🔗
|
|
dashcloud has joined #archiveteam-bs |
21:44
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
21:45
🔗
|
RichardG |
when scraping pastebin you have to get used to the tools that automatically post to it. according to some stats I made with my current database, the most popular kind of automated paste is a mod tool for Phantasy Star Online 2 |
21:46
🔗
|
RichardG |
followed by crash reports of old versions of Minecraft mod tools (they moved to their own pastebins a while ago), then JOdin (a ROM flash tool for some Android devices) |
21:49
🔗
|
joepie91 |
RichardG: computercraft is also a popular one |
21:49
🔗
|
joepie91 |
:) |
21:53
🔗
|
RichardG |
heh, I used to make an addon for it! |
22:03
🔗
|
schbirid |
https://www.facebook.com/ghazayel/posts/10205536170422795?pnref=story :( |
22:05
🔗
|
schbirid |
i'd like to see the mrtg of that ia box right now |
22:06
🔗
|
schbirid |
also, fuck sony |
22:07
🔗
|
Smiley |
so psn/live is down, the joys |
22:07
🔗
|
godane |
i uploaded this yesterday: https://archive.org/details/www.asiatorrents.me-subtitle-1-to-38406-20141205 |
22:07
🔗
|
godane |
over 2gb of translated subtitles |
22:08
🔗
|
godane |
in web archive and in a zip file for people to be able to download it |
22:10
🔗
|
schbirid |
https://ia601509.us.archive.org/mrtg/ theoretically |
22:11
🔗
|
schbirid |
https://ia601509.us.archive.org/mrtg/nginx_rps.html |
22:11
🔗
|
schbirid |
ouch https://ia601509.us.archive.org/mrtg/nginx_con.html |
22:11
🔗
|
joepie91 |
uh oh |
22:11
🔗
|
joepie91 |
conn limit? |
22:13
🔗
|
schbirid |
direct mp4 link is on reddit frontpage |
22:13
🔗
|
schbirid |
but the comments are great, pointing out the license issue and suggesting IA donations |
22:14
🔗
|
schbirid |
https://pay.reddit.com/r/videos/comments/2qds9z/the_interview_full_movie_in_hd_free/ |
22:14
🔗
|
joepie91 |
yeah, was alerted to it by a friend |
22:14
🔗
|
joepie91 |
heh |
22:16
🔗
|
schbirid |
any recommendations for jabber clients on android? it's been a year since i used one, yaxim iirc |
22:17
🔗
|
schbirid |
ah, chatsecure of course |
22:17
🔗
|
schbirid |
11 mb, ffff |
22:19
🔗
|
schbirid |
xabber and yaxim both seem unmaintained since feb 13 |
22:20
🔗
|
|
duoi_3 has joined #archiveteam-bs |
22:22
🔗
|
joepie91 |
:/ |
22:24
🔗
|
schbirid |
trying yaxim, as it is the smallest |
22:24
🔗
|
schbirid |
but cs has otr :) |
22:32
🔗
|
RichardG |
lol, the IA box hosting the interview is getting hit hard |
22:33
🔗
|
joepie91 |
yep |
22:33
🔗
|
joepie91 |
anyway |
22:33
🔗
|
joepie91 |
RichardG: did you have the scraping code on github or something? |
22:34
🔗
|
RichardG |
I don't know if I should, the code is kinda bad, has some hacks, although I'll see if I can do something |
22:35
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
22:35
🔗
|
joepie91 |
RichardG: bad code is better than no code :) |
22:35
🔗
|
joepie91 |
everybody's code has some hacks |
22:36
🔗
|
joepie91 |
hell, probably half of the code behind the software you and I use on a daily basis has horrible hacks that somebody feels really ashamed of |
22:36
🔗
|
joepie91 |
that's no reason not to publish code! :P |
22:39
🔗
|
RichardG |
I'm commenting the thing at least a little bit. |
22:41
🔗
|
joepie91 |
okay, this is a new one... got an abusemail, responded that I wasn't going to follow up on it because no legal grounds, only to be met with a bounce |
22:41
🔗
|
joepie91 |
what |
22:41
🔗
|
joepie91 |
from a gmail address, too |
22:58
🔗
|
|
raylee has joined #archiveteam-bs |
23:23
🔗
|
|
mistym has joined #archiveteam-bs |
23:27
🔗
|
|
aaaaaaaaa has joined #archiveteam-bs |
23:40
🔗
|
|
RichardG_ has joined #archiveteam-bs |
23:40
🔗
|
RichardG |
joepie91: I kinda rushed this since I have to go mobile... https://github.com/richardg867/pastescraper |
23:41
🔗
|
joepie91 |
RichardG: will have a look at it soon |
23:41
🔗
|
joepie91 |
RichardG: as a loosely related aside; http://cryto.net/~joepie91/blog/2013/03/21/licensing-for-beginners/ |
23:41
🔗
|
joepie91 |
:P |
23:43
🔗
|
RichardG_ |
I just unlicense quick things like this, but I will license this, don't ya worry... I was just in doubt |
23:44
🔗
|
joepie91 |
:) |
23:56
🔗
|
|
Ravenloft has joined #archiveteam-bs |