[07:40] irc://irc.fallen-irc.net/MOOXDCC
[10:30] can you download data from wikia in some good way?
[14:50] I suppose
[14:50] although it will be much slower
[14:50] because here you're getting lan speed
[14:50] so like 97Mbps
[14:51] Er, oops
[14:51] That was in the wrong box
[14:51] My bad
[16:00] right, my robots downloader seems to work well. unattended for 5 days and my server is still alive
[16:00] do i have any remotely exploitable issues in https://github.com/ArchiveTeam/robots-relapse ? eg someone being able to forge a malicious robots.txt file
[17:30] http://www.dotnetdotcom.org/
[17:30] I wonder what's in that 14 gigabyte torrent
[17:31] ah, there's a sample index
[18:03] Spirit_: it looks safe, unless there is a way to make aria create arbitrarily named files.. you don't quote any filenames to guard against malicious ones
[18:04] but, I have to wonder why you're storing robots.txt files undiffed in sql. I would just check them into git.. that's the kind of thing git excels at
[18:05] that is one sexy idea
[18:06] then you can publish it to github, and people who use it can just pull whenever they want an update
[18:06] and you can git log y/yahoo.com
[18:07] it's gonna be a while until i have time to dive into this
[18:07] would it be easy to automate?
[18:07] i rarely use git myself
[18:08] Warning: Git does not store file diffs! And it will be very ineffective for your use case!
[18:08] the main problem that you will run into is that git will be a little bit slow committing a tree of a million files
[18:09] it delta compresses files efficiently in packs, you might need to turn up the gc.auto interval
[18:09] yeah, the gazillion of files are why i use sqlite
[18:10] i searched for a compressed deduplicating growing filesystem-in-a-single-file for a while
[18:11] ZFS in a loopback-device?
[18:26] soultcer: any hint how i could do that? i would need hours to read up and learn, maybe you know right away
[18:27] No idea, it was only half serious when I suggested it.
[18:29] :)
[18:46] Here's a working script that converts YouTube annotations to SRT: https://github.com/ndurner/AT-tools/tree/master/ann2srt
[19:02] Hi, gang.
[19:02] I am finally among the living again.
[19:02] Welcome from NY
[20:10] SketchCow: Hi, I hope you had a good trip.
[20:11] There are two or three questions I'd like to ask about the WARC format. Who can I mail them to?
[20:22] Kenji@archive.org
[20:22] He's The Man when it comes to ingesting through WARC at archive.org.
[20:22] Tell him I sent you, of course.
[20:24] Okay, thanks!
[20:29] You're both geniuses, it'll work out.
[20:29] It's a singularly important project.
[20:29] It's also forced a bunch of issues for them.
[20:29] Previously, they could sort of assume all items in the wayback were from them
[20:29] Now they can't.
[20:29] But they get more stuff
[23:14] SketchCow: thanks for linking to this from your twitter account: http://bob-way.com
[23:15] Yeah, great guy
[23:17] so is it just the nostalgia or was it actually more exciting in that time frame?
[23:17] Every time frame is exciting.