#archiveteam 2014-06-02,Mon

Time Nickname Message
00:15 🔗 lemonkey ah ok
00:57 🔗 Warchell Good day. Can I crawl a warc archive served via warc-proxy with httrack? I'm having trouble forming a scan URL.
07:53 🔗 spiralofh I'm trying to sign up to the wiki. What's the secret word?
07:56 🔗 Nemo_bis spiralofh: yahoosucks
07:58 🔗 spiralofh Nemo_bis: Indeed it does. Signup complete, thanks.
07:59 🔗 Nemo_bis yw
09:51 🔗 Nemo_bis yahoosucks
09:54 🔗 REiN^ hi, how can I receive the secret word to complete registration?
09:55 🔗 Nemo_bis REiN^: you just did, use it
09:59 🔗 REiN^ thx
18:33 🔗 joepie91 still on IRC break
18:33 🔗 joepie91 but briefly popping in here
18:33 🔗 joepie91 Justin.tv is baleeting all archived broadcasts in a week (!)
18:34 🔗 joepie91 http://techcrunch.com/2014/06/01/justin-tv-to-kill-off-its-built-in-video-archiving-system/
18:34 🔗 joepie91 according to friend, youtube-dl has code for downloading said broadcasts
18:34 🔗 joepie91 but it's quite a lot
18:34 🔗 exmic joepie91: yep, we're on it
18:34 🔗 exmic :)
18:37 🔗 antomatic oink: yahoosucks
18:37 🔗 ersi ;-)
18:37 🔗 oink ty
18:38 🔗 joepie91 exmic: awesome
18:38 🔗 joepie91 :)
18:39 🔗 antomatic Gents... permission to tantrum?
18:40 🔗 antomatic Thanks. Let me get right to it..
18:40 🔗 antomatic There are so many people turning up wanting to help, and finding that there's nothing to do.
18:40 🔗 antomatic Even before Justin there was a steady stream of people saying "why does nothing in the warrior work"
18:41 🔗 joepie91 oh, yes, about that, I have some good news
18:41 🔗 antomatic Even 6 days away from the Justin deletions, all the work is pretty much happening behind closed doors.
18:41 🔗 antomatic People want to help and we're giving them nothing to do.
18:41 🔗 * antomatic listens.
18:41 🔗 joepie91 I've quit my job, so assuming my fundraiser turns out well, I'd have a whole year in which I can help on archiveteam stuff where necessary
18:41 🔗 joepie91 without stupid time constraints
18:41 🔗 joepie91 dev-wise
18:41 🔗 joepie91 :)
18:41 🔗 exmic antomatic: sure, join #justouttv and help write the warrior job
18:42 🔗 exmic but you're already there, ok
18:42 🔗 antomatic I wish I could write warrior code but alas, my kungfu is not quite that good.
18:42 🔗 exmic yes, archiveteam doesn't feel quite as gangbusters as it did last year
18:42 🔗 antomatic I appreciate it's rich to bitch about something that I can't actually execute myself,
18:42 🔗 exmic heh, that's for sure
18:42 🔗 antomatic but I think the observation is hopefully at least partially valid
18:42 🔗 joepie91 antomatic: I do this on a frequent basis, it still helps a lot
18:43 🔗 joepie91 anyway, I'll probably have the time soon to really learn the specifics of warrior things
18:43 🔗 joepie91 and write stuff for archiveteam
18:43 🔗 joepie91 :D
18:43 🔗 * joepie91 skirts the -bs line
18:43 🔗 antomatic Hail Joepie, our saviour. :)
18:44 🔗 joepie91 relatedly on-topic, my pastebin scraper has magically started working again
18:44 🔗 joepie91 ???
18:44 🔗 joepie91 it was blocked for like 2 weeks, but apparently unblocked now, or something
18:44 🔗 joepie91 idk the specifics, but paste dumps are showing up again
18:44 🔗 exmic weird, but ok
18:44 🔗 exmic where do you put them?
18:44 🔗 joepie91 archive.org/details/pastebinpastes
18:45 🔗 joepie91 it's a daily cron
18:45 🔗 exmic good stuff
18:45 🔗 joepie91 also relatedly, I've set up a PDF host that auto-mirrors all public docs to archive.org :D
18:45 🔗 exmic I saw that, good stuff
18:45 🔗 joepie91 http://pdf.yt/ / https://archive.org/details/pdfymirrors
18:45 🔗 joepie91 it appears to be a pretty efficient sinkhole so far
18:46 🔗 exmic any particular reason you're not also pastebin-scraping to warc?
18:46 🔗 joepie91 because I'm running a custom scraping script that doesn't do warc
18:46 🔗 exmic far be it from me to fault you for doing it
18:46 🔗 exmic aye
18:46 🔗 joepie91 it just grabs the paste contents and metadata
18:47 🔗 joepie91 there's not much point to saving a WARC
18:47 🔗 exmic https://github.com/odie5533/WarcProxy
18:47 🔗 joepie91 it literally saves the /raw/ paste
18:47 🔗 joepie91 not the paste page
18:47 🔗 exmic sure
18:47 🔗 exmic ah
18:47 🔗 joepie91 I don't like relying on local proxies :)
18:47 🔗 exmic fair enough
18:47 🔗 joepie91 I still need to look into the Python WARC ecosystem
18:48 🔗 joepie91 as well as the Node.js WARC ecosystem, since I've been dabbling in Node.js lately
18:48 🔗 exmic hm, I haven't
18:48 🔗 joepie91 currently busy porting the internetarchive module to Node.js
18:48 🔗 ersi holy moly, what a #firehose
18:48 🔗 joepie91 ersi: firehose?
18:49 🔗 ersi cool that your pastebinscraper started working again
18:49 🔗 joepie91 lol
18:49 🔗 monod You're alive yeeeeeeeeeeee
18:49 🔗 joepie91 yeah, idk what's up with that, either there's just a 1 in a million chance that it gets hit with a temp ban, or the pastebin guy is okay with it existing
18:49 🔗 joepie91 monod: I am, just taking a prolonged break from IRC :)
18:49 🔗 monod I feel much better now :D
18:49 🔗 monod You're doing great!
18:49 🔗 monod Good luck and cya soon! ;)
18:50 🔗 joepie91 monod: PM :P
18:50 🔗 exmic you look good, joepie91!
20:12 🔗 SketchCow http://pcasts.in/xT7Z has many great things to say about Archive Team after minute 20.
