Time |
Nickname |
Message |
07:40
|
jd1001 |
irc://irc.fallen-irc.net/MOOXDCC |
10:30
|
Spirit_ |
can you download data from wikia in some good way? |
14:50
|
underscor |
I suppose |
14:50
|
underscor |
although it will be much slower |
14:50
|
underscor |
because here you're getting lan speed |
14:50
|
underscor |
so like 97Mbps |
14:51
|
underscor |
Er, oops |
14:51
|
underscor |
That was in the wrong box |
14:51
|
underscor |
My bad |
16:00
|
Spirit_ |
right, my robots downloader seems to work well. unattended for 5 days and my server is still alive |
16:00
|
Spirit_ |
do i have any remotely exploitable issues in https://github.com/ArchiveTeam/robots-relapse ? eg someone being able to forge a malicious robots.txt file |
17:30
|
bbot_ |
http://www.dotnetdotcom.org/ |
17:30
|
bbot_ |
I wonder what's in that 14 gigabyte torrent |
17:31
|
bbot_ |
ah, there's a sample index |
18:03
|
closure |
Spirit_: it looks safe, unless there is a way to make aria create arbitrarily named files.. you don't quote any filenames to guard against malicious ones |
18:04
|
closure |
but, I have to wonder why you're storing robots.txt files undiffed in sql. I would just check them into git.. that's the kind of thing git excels at |
18:05
|
Spirit_ |
that is one sexy idea |
18:06
|
closure |
then you can publish it to github, and people who use it can just pull whenever they want an update |
18:06
|
closure |
and you can git log y/yahoo.com |
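The `y/yahoo.com` layout closure mentions, together with his earlier point about unquoted filenames, can be sketched as a small helper that maps a hostname to a sharded path and refuses anything that does not look like a plain hostname. This is a minimal sketch under assumptions: the function name `shard_path`, the validation regex, and the first-character sharding rule are hypothetical and are not taken from robots-relapse.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: accept only lowercase DNS-style hostnames
# (letters, digits, hyphens, dots), so a name can never contain
# "/" or ".." or start with "-" and be mistaken for an option.
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)"
    r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?"
    r"(\.[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?)*$"
)


def shard_path(hostname: str) -> str:
    """Return a repository-relative path such as 'y/yahoo.com'."""
    host = hostname.strip().lower().rstrip(".")
    if not _HOSTNAME.match(host):
        raise ValueError(f"refusing unsafe hostname: {hostname!r}")
    # Shard by first character to keep any one directory small.
    return str(PurePosixPath(host[0]) / host)
```

With paths built this way, each site's robots.txt history would live at one predictable location in the tree, which is what makes a per-site `git log` possible.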
18:07
|
Spirit_ |
it's gonna be a while until i have time to dive into this |
18:07
|
Spirit_ |
would it be easy to automate? |
18:07
|
Spirit_ |
i rarely use git myself |
18:08
|
soultcer |
Warning: Git does not store file diffs! And it will be very inefficient for your use case! |
18:08
|
closure |
the main problem that you will run into is that git will be a little bit slow committing a tree of a million files |
18:09
|
closure |
it delta compresses files efficiently in packs, you might need to turn up the gc.auto threshold |
18:09
|
Spirit_ |
yeah, the gazillion files are why i use sqlite |
18:10
|
Spirit_ |
i searched for a compressed deduplicating growing filesystem-in-a-single-file for a while |
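One way to get part of what Spirit_ describes (compressed, deduplicating, growing, single file) while staying in SQLite is to store each distinct file body once, keyed by its hash, and have every fetch point at that hash. The sketch below is an illustration only: the table names, column names, and functions are hypothetical and are not the robots-relapse schema, and it deduplicates identical files only; it does not delta-compress near-identical ones the way git packs do.

```python
import hashlib
import sqlite3
import zlib


def open_store(path=":memory:"):
    """Open (or create) a single-file store; schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "sha256 TEXT PRIMARY KEY, body BLOB NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "host TEXT NOT NULL, fetched_at TEXT NOT NULL, "
        "sha256 TEXT NOT NULL REFERENCES blobs(sha256))"
    )
    return db


def store(db, host, fetched_at, content):
    """Record one fetch; identical bodies are stored only once."""
    digest = hashlib.sha256(content).hexdigest()
    db.execute(
        "INSERT OR IGNORE INTO blobs VALUES (?, ?)",
        (digest, zlib.compress(content)),
    )
    db.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?)",
        (host, fetched_at, digest),
    )
    db.commit()
    return digest


def load(db, digest):
    """Return the original bytes for a stored digest."""
    row = db.execute(
        "SELECT body FROM blobs WHERE sha256 = ?", (digest,)
    ).fetchone()
    if row is None:
        raise KeyError(digest)
    return zlib.decompress(row[0])
```

Because many sites serve byte-identical robots.txt files, the `blobs` table would grow much more slowly than the `snapshots` table under this design.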
18:11
|
soultcer |
ZFS in a loopback-device? |
18:26
|
Spirit_ |
soultcer: any hint how i could do that? i would need hours to read up and learn, maybe you know right away |
18:27
|
soultcer |
No idea, I was only half serious when I suggested it. |
18:29
|
Spirit_ |
:) |
18:46
|
ndurner |
Here's a working script that converts YouTube annotations to SRT: https://github.com/ndurner/AT-tools/tree/master/ann2srt |
19:02
|
SketchCow |
Hi, gang. |
19:02
|
SketchCow |
I am finally among the living again. |
19:02
|
SketchCow |
Welcome from NY |
20:10
|
alard |
SketchCow: Hi, I hope you had a good trip. |
20:11
|
alard |
There are two or three questions I'd like to ask about the WARC format. Who can I mail them to? |
20:22
|
SketchCow |
Kenji@archive.org |
20:22
|
SketchCow |
He's The Man when it comes to ingesting through WARC at archive.org. |
20:22
|
SketchCow |
Tell him I sent you, of course. |
20:24
|
alard |
Okay, thanks! |
20:29
|
SketchCow |
You're both geniuses, it'll work out. |
20:29
|
SketchCow |
It's a singularly important project. |
20:29
|
SketchCow |
It's also forced a bunch of issues for them. |
20:29
|
SketchCow |
Previously, they could sort of assume all items in the wayback were from them |
20:29
|
SketchCow |
Now they can't. |
20:29
|
SketchCow |
But they get more stuff |
23:14
|
dashcloud |
SketchCow: thanks for linking to this from your twitter account: http://bob-way.com |
23:15
|
SketchCow |
Yeah, great guy |
23:17
|
dashcloud |
so is it just the nostalgia or was it actually more exciting in that time frame? |
23:17
|
SketchCow |
Every time frame is exciting. |