[00:15] ah ok
[00:57] Good day. Can I crawl a warc archive served via warc-proxy with httrack? I have troubles with forming a scan url.
[07:53] I'm trying to sign up to the wiki. What's the secret word?
[07:56] spiralofh: yahoosucks
[07:58] Nemo_bis: Indeed it does. Signup complete, thanks.
[07:59] yw
[09:49] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[09:51] yahoosucks
[09:54] hi how i can receive a secret word to complete reg?
[09:55] REiN^: you just did, use it
[09:59] thx
[18:33] still on IRC break
[18:33] but briefly popping in here
[18:33] Justin.tv is baleeting all archived broadcasts in a week (!)
[18:34] http://techcrunch.com/2014/06/01/justin-tv-to-kill-off-its-built-in-video-archiving-system/
[18:34] according to friend, youtube-dl has code for downloading said broadcasts
[18:34] but it's quite a lot
[18:34] joepie91: yep, we're on it
[18:34] :)
[18:37] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[18:37] oink: yahoosucks
[18:37] ;-)
[18:37] ty
[18:38] exmic: awesome
[18:38] :)
[18:39] Gents... permission to tantrum?
[18:40] Thanks. Let me get right to it..
[18:40] There are so many people turning up wanting to help, and finding that there's nothing to do.
[18:40] Even before Justin there was a steady stream of people saying "why does nothing in the warrior work"
[18:41] oh, yes, about that, I have some good news
[18:41] Even 6 days away from the Justin deletions, all the work is pretty much happening behind closed doors.
[18:41] People want to help and we're giving them nothing to do.
[18:41] * antomatic listens.
[18:41] I've quit my job, so assuming my fundraiser turns out well, I'd have a year long where I can help on archiveteam stuff where necessary
[18:41] without stupid time constraints
[18:41] dev-wise
[18:41] :)
[18:41] antomatic: sure, join #justouttv and help write the warrior job
[18:42] but you're already there, ok
[18:42] I wish I could write warrior code but alas, my kungfu is not quite that good.
[18:42] yes, archiveteam doesn't feel quite as gangbusters as it did last year
[18:42] I appreciate it's rich to bitch about something that I can't actually execute myself,
[18:42] heh, that's for sure
[18:42] but I think the observation is hopefully at least partially valid
[18:42] antomatic: I do this on a frequent basis, it still helps a lot
[18:43] anyway, I'll probably have the time soon to really learn the specifics of warrior things
[18:43] and write stuff for archiveteam
[18:43] :D
[18:43] * joepie91 skirts the -bs line
[18:43] Hail Joepie, our saviour. :)
[18:44] relatedly on-topic, my pastebin scraper has magically started working again
[18:44] ???
[18:44] it was blocked for like 2 weeks, but apparently unblocked now, or something
[18:44] idk the specifics, but paste dumps are showing up again
[18:44] weird, but ok
[18:44] where do you put them?
[18:44] archive.org/details/pastebinpastes
[18:45] it's a daily cron
[18:45] good stuff
[18:45] also relatedly, I've set up a PDF host that auto-mirrors all public docs to archive.org :D
[18:45] I saw that, good stuff
[18:45] http://pdf.yt/ / https://archive.org/details/pdfymirrors
[18:45] it appears to be a pretty efficient sinkhole so far
[18:46] any particular reason you're not also pastebin-scraping to warc?
[18:46] because I'm running a custom scraping script that doesn't do warc
[18:46] far be it from me to fault you for doing it
[18:46] aye
[18:46] it just grabs the paste contents and metadata
[18:47] there's not much point to saving a WARC
[18:47] https://github.com/odie5533/WarcProxy
[18:47] it literally saves the /raw/ paste
[18:47] not the paste page
[18:47] sure
[18:47] ah
[18:47] I don't like relying on local proxies :)
[18:47] fair enough
[18:47] I still need to look into the Python WARC ecosystem
[18:48] as well as the Node.js WARC ecosystem, since I've been dabbling in Node.js lately
[18:48] hm, I haven't
[18:48] currently busy porting the internetarchive module to Node.js
[18:48] holy moly, what a #firehose
[18:48] ersi: firehose?
[18:48] THINGS & STUFF GOING ON IN #ARCHIVETEAM OMG
[18:49] cool that your pastebinscraper started working again
[18:49] lol
[18:49] You're alive yeeeeeeeeeeee
[18:49] yeah, idk what's up with that, either there's just a 1 in a million chance that it gets hit with a temp ban, or the pastebin guy is okay with it existing
[18:49] monod: I am, just taking a prolonged break from IRC :)
[18:49] I feel much better now :D
[18:49] You're doing great!
[18:49] Good luck and cya soon! ;)
[18:50] monod: PM :P
[18:50] you look good, joepie91!
[20:12] http://pcasts.in/xT7Z has many great things to say about Archive Team after minute 20.