#archiveteam 2011-07-10,Sun

Time Nickname Message
07:40 🔗 jd1001 irc://irc.fallen-irc.net/MOOXDCC
10:30 🔗 Spirit_ can you download data from wikia in some good way?
14:50 🔗 underscor I suppose
14:50 🔗 underscor although it will be much slower
14:50 🔗 underscor because here you're getting lan speed
14:50 🔗 underscor so like 97Mbps
14:51 🔗 underscor Er, oops
14:51 🔗 underscor That was in the wrong box
14:51 🔗 underscor My bad
16:00 🔗 Spirit_ right, my robots downloader seems to work well. unattended for 5 days and my server is still alive
16:00 🔗 Spirit_ do i have any remotely exploitable issues in https://github.com/ArchiveTeam/robots-relapse ? eg someone being able to forge a malicious robots.txt file
17:30 🔗 bbot_ http://www.dotnetdotcom.org/
17:30 🔗 bbot_ I wonder what's in that 14 gigabyte torrent
17:31 🔗 bbot_ ah, there's a sample index
18:03 🔗 closure Spirit_: it looks safe, unless there is a way to make aria create arbitrarily named files.. you don't quote any filenames to guard against malicious ones
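[Editor's sketch] One way to guard against maliciously named files, as closure suggests, is to accept only strings that look like plain hostnames before using them as file names. The helper below is a minimal illustration under that assumption; `safe_filename` is a hypothetical name and is not taken from the robots-relapse repository.

```python
import re

# Hypothetical allow-list: dot-separated labels of letters, digits and hyphens,
# where no label starts or ends with a hyphen. Slashes, spaces, shell
# metacharacters and leading dots never match.
_LABEL = r"(?!-)[a-z0-9-]{1,63}(?<!-)"
_HOSTNAME_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def safe_filename(host: str) -> str:
    """Return a lower-cased hostname usable as a file name, or raise ValueError."""
    candidate = host.strip().lower()
    # fullmatch ensures the entire string is a hostname, not just a prefix.
    if len(candidate) > 253 or not _HOSTNAME_RE.fullmatch(candidate):
        raise ValueError(f"refusing unsafe name: {host!r}")
    return candidate
```

Rejecting anything outside the allow-list is simpler to reason about than trying to escape every dangerous character after the fact.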
18:04 🔗 closure but, I have to wonder why you're storing robots.txt files undiffed in sql. I would just check them into git.. that's the kind of thing git excels at
18:05 🔗 Spirit_ that is one sexy idea
18:06 🔗 closure then you can publish it to github, and people who use it can just pull whenever they want an update
18:06 🔗 closure and you can git log y/yahoo.com
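[Editor's sketch] The path `y/yahoo.com` suggests a tree sharded by the first character of the domain, which keeps any single directory from holding every file. The function below shows how such a path could be derived. The layout is inferred from that one example, and `shard_path` is a hypothetical name.

```python
from pathlib import PurePosixPath


def shard_path(domain: str) -> PurePosixPath:
    """Map a domain to '<first character>/<domain>' inside the repository."""
    name = domain.strip().lower()
    # Refuse names that could escape the repository or create hidden files.
    if not name or "/" in name or "\\" in name or name.startswith("."):
        raise ValueError(f"unusable domain name: {domain!r}")
    return PurePosixPath(name[0]) / name
```

With files written to these paths and committed after each crawl, `git log` on a single path would list only the commits in which that site's robots.txt changed.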
18:07 🔗 Spirit_ it's gonna be a while until i have time to dive into this
18:07 🔗 Spirit_ would it be easy to automate?
18:07 🔗 Spirit_ i rarely use git myself
18:08 🔗 soultcer Warning: Git does not store file diffs! And it will be very inefficient for your use case!
18:08 🔗 closure the main problem that you will run into is that git will be a little bit slow committing a tree of a million files
18:09 🔗 closure it delta compresses files efficiently in packs, you might need to turn up the gc.auto threshold
18:09 🔗 Spirit_ yeah, the gazillion of files are why i use sqlite
18:10 🔗 Spirit_ i searched for a compressed deduplicating growing filesystem-in-a-single-file for a while
18:11 🔗 soultcer ZFS in a loopback-device?
18:26 🔗 Spirit_ soultcer: any hint how i could do that? i would need hours to read up and learn, maybe you know right away
18:27 🔗 soultcer No idea, it was only half serious when I suggested it.
18:29 🔗 Spirit_ :)
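[Editor's sketch] A compressed, deduplicating store that grows inside a single file can be approximated in SQLite itself by keying each compressed body on its content hash, so identical robots.txt files are stored once. The code below is a minimal illustration. The table and function names are assumptions and do not reflect the actual schema of robots-relapse.

```python
import hashlib
import sqlite3
import zlib
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the single-file store with two hypothetical tables."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "sha256 TEXT PRIMARY KEY, body BLOB NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "host TEXT NOT NULL, fetched_at TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )
    return db


def put(db: sqlite3.Connection, host: str, fetched_at: str, content: bytes) -> str:
    """Record one fetch; the compressed body is stored only if it is new."""
    digest = hashlib.sha256(content).hexdigest()
    # INSERT OR IGNORE skips the write when this hash is already present.
    db.execute(
        "INSERT OR IGNORE INTO blobs (sha256, body) VALUES (?, ?)",
        (digest, zlib.compress(content)),
    )
    db.execute(
        "INSERT INTO snapshots (host, fetched_at, sha256) VALUES (?, ?, ?)",
        (host, fetched_at, digest),
    )
    db.commit()
    return digest


def get_latest(db: sqlite3.Connection, host: str) -> Optional[bytes]:
    """Return the most recently fetched body for a host, or None."""
    row = db.execute(
        "SELECT b.body FROM snapshots s JOIN blobs b ON b.sha256 = s.sha256 "
        "WHERE s.host = ? ORDER BY s.fetched_at DESC LIMIT 1",
        (host,),
    ).fetchone()
    return zlib.decompress(row[0]) if row else None
```

This deduplicates whole files only. Near-identical files still cost full space each, which is the gap that delta compression in git packs would close.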
18:46 🔗 ndurner Here's a working script that converts Youtube annotations to SRT: https://github.com/ndurner/AT-tools/tree/master/ann2srt
19:02 🔗 SketchCow Hi, gang.
19:02 🔗 SketchCow I am finally among the living again.
19:02 🔗 SketchCow Welcome from NY
20:10 🔗 alard SketchCow: Hi, I hope you had a good trip.
20:11 🔗 alard There are two or three questions I'd like to ask about the WARC format. Who can I mail them to?
20:22 🔗 SketchCow Kenji@archive.org
20:22 🔗 SketchCow He's The Man when it comes to ingesting through WARC at archive.org.
20:22 🔗 SketchCow Tell him I sent you, of course.
20:24 🔗 alard Okay, thanks!
20:29 🔗 SketchCow You're both geniuses, it'll work out.
20:29 🔗 SketchCow It's a singularly important project.
20:29 🔗 SketchCow It's also forced a bunch of issues for them.
20:29 🔗 SketchCow Previously, they could sort of assume all items in the wayback were from them
20:29 🔗 SketchCow Now they can't.
20:29 🔗 SketchCow But they get more stuff
23:14 🔗 dashcloud SketchCow: thanks for linking to this from your twitter account: http://bob-way.com
23:15 🔗 SketchCow Yeah, great guy
23:17 🔗 dashcloud so is it just the nostalgia or was it actually more exciting in that time frame?
23:17 🔗 SketchCow Every time frame is exciting.
