| Time |
Nickname |
Message |
|
07:40
🔗
|
jd1001 |
irc://irc.fallen-irc.net/MOOXDCC |
|
10:30
🔗
|
Spirit_ |
can you download data from wikia in some good way? |
|
14:50
🔗
|
underscor |
I suppose |
|
14:50
🔗
|
underscor |
although it will be much slower |
|
14:50
🔗
|
underscor |
because here you're getting lan speed |
|
14:50
🔗
|
underscor |
so like 97Mbps |
|
14:51
🔗
|
underscor |
Er, oops |
|
14:51
🔗
|
underscor |
That was in the wrong box |
|
14:51
🔗
|
underscor |
My bad |
|
16:00
🔗
|
Spirit_ |
right, my robots downloader seems to work well. unattended for 5 days and my server is still alive |
|
16:00
🔗
|
Spirit_ |
do i have any remotely exploitable issues in https://github.com/ArchiveTeam/robots-relapse ? eg someone being able to forge a malicious robots.txt file |
|
17:30
🔗
|
bbot_ |
http://www.dotnetdotcom.org/ |
|
17:30
🔗
|
bbot_ |
I wonder what's in that 14 gigabyte torrent |
|
17:31
🔗
|
bbot_ |
ah, there's a sample index |
|
18:03
🔗
|
closure |
Spirit_: it looks safe, unless there is a way to make aria create arbitrarily named files.. you don't quote any filenames to guard against malicious ones |
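One way to guard against the malicious filenames closure mentions is to validate each hostname before it is ever used as a file name. The following is a minimal sketch, not the code from robots-relapse: the function name `safe_filename` and the strict hostname pattern are assumptions made for illustration.

```python
import re

# Assumed policy: accept only plain DNS hostnames (letters, digits, hyphens,
# dot-separated labels). Anything else is rejected rather than escaped.
_HOST_RE = re.compile(
    r"^(?!-)[a-z0-9-]{1,63}(?<!-)(\.(?!-)[a-z0-9-]{1,63}(?<!-))*$"
)


def safe_filename(host: str) -> str:
    """Return a normalized hostname that is safe as a file name, or raise ValueError."""
    candidate = host.strip().lower().rstrip(".")
    if len(candidate) > 253 or not _HOST_RE.match(candidate):
        raise ValueError(f"refusing unsafe name: {host!r}")
    return candidate


if __name__ == "__main__":
    print(safe_filename("Yahoo.com"))
    for bad in ("../../etc/passwd", "-rf", "a..b", "foo;rm -rf ~"):
        try:
            safe_filename(bad)
        except ValueError as err:
            print(err)
```

Rejecting unexpected names outright is simpler to reason about than trying to quote them correctly at every later use.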
|
18:04
🔗
|
closure |
but, I have to wonder why you're storing robots.txt files undiffed in sql. I would just check them into git.. that's the kind of thing git excels at |
|
18:05
🔗
|
Spirit_ |
that is one sexy idea |
|
18:06
🔗
|
closure |
then you can publish it to github, and people who use it can just pull whenever they want an update |
|
18:06
🔗
|
closure |
and you can git log y/yahoo.com |
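The layout implied by `git log y/yahoo.com` (one file per host, sharded into a directory named after the first character) could be automated roughly as below. This is a sketch under assumptions: `shard_path` and `store_and_commit` are hypothetical names, the repository is assumed to exist already with `user.name` and `user.email` configured, and the host is assumed to have been validated beforehand.

```python
import subprocess
from pathlib import Path


def shard_path(host: str) -> Path:
    """Map a hostname to its sharded path, e.g. yahoo.com -> y/yahoo.com."""
    host = host.lower()
    if not host:
        raise ValueError("empty host")
    return Path(host[0]) / host


def store_and_commit(repo: Path, host: str, robots_txt: bytes) -> None:
    """Write one robots.txt into the repository and commit it if it changed."""
    rel = shard_path(host)
    target = repo / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(robots_txt)
    subprocess.run(["git", "-C", str(repo), "add", "--", str(rel)], check=True)
    # `git diff --cached --quiet` exits non-zero when something is staged.
    staged = subprocess.run(
        ["git", "-C", str(repo), "diff", "--cached", "--quiet"]
    ).returncode != 0
    if staged:
        subprocess.run(
            ["git", "-C", str(repo), "commit", "-q", "-m", f"update {host}"],
            check=True,
        )


if __name__ == "__main__":
    print(shard_path("Yahoo.com").as_posix())
```

Committing once per file is slow at scale. A real crawler would more likely write many files and commit them in one batch.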
|
18:07
🔗
|
Spirit_ |
it's gonna be a while until i have time to dive into this |
|
18:07
🔗
|
Spirit_ |
would it be easy to automate? |
|
18:07
🔗
|
Spirit_ |
i rarely use git myself |
|
18:08
🔗
|
soultcer |
Warning: Git does not store file diffs! And it will be very inefficient for your use case! |
|
18:08
🔗
|
closure |
the main problem that you will run into is that git will be a little bit slow committing a tree of a million files |
|
18:09
🔗
|
closure |
it delta compresses files efficiently in packs, you might need to turn up the gc.auto threshold |
|
18:09
🔗
|
Spirit_ |
yeah, the gazillion files are why i use sqlite |
|
18:10
🔗
|
Spirit_ |
i searched for a compressed deduplicating growing filesystem-in-a-single-file for a while |
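Something close to a compressed, deduplicating, growing store in a single file can be approximated inside SQLite itself by keying compressed blobs on their content hash. The sketch below is an illustration only, not the schema robots-relapse uses: the table names, column names, and helper functions are all assumptions.

```python
import hashlib
import sqlite3
import zlib


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the single-file store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "sha256 TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "host TEXT NOT NULL, fetched_at TEXT NOT NULL, "
        "sha256 TEXT NOT NULL REFERENCES blobs(sha256))"
    )
    return db


def put(db: sqlite3.Connection, host: str, fetched_at: str, content: bytes) -> str:
    """Record one fetch; identical content is stored only once."""
    digest = hashlib.sha256(content).hexdigest()
    db.execute(
        "INSERT OR IGNORE INTO blobs VALUES (?, ?)",
        (digest, zlib.compress(content, 9)),
    )
    db.execute("INSERT INTO snapshots VALUES (?, ?, ?)", (host, fetched_at, digest))
    db.commit()
    return digest


def get(db: sqlite3.Connection, host: str, fetched_at: str) -> bytes:
    """Return the robots.txt content recorded for a host at a given time."""
    row = db.execute(
        "SELECT b.data FROM snapshots s JOIN blobs b ON b.sha256 = s.sha256 "
        "WHERE s.host = ? AND s.fetched_at = ?",
        (host, fetched_at),
    ).fetchone()
    if row is None:
        raise KeyError((host, fetched_at))
    return zlib.decompress(row[0])


if __name__ == "__main__":
    db = open_store()
    body = b"User-agent: *\nDisallow: /private/\n"
    put(db, "example.com", "2011-06-01", body)
    put(db, "example.org", "2011-06-01", body)
    print(db.execute("SELECT COUNT(*) FROM blobs").fetchone()[0])
```

This deduplicates whole files only. Unlike git's pack files, it does not delta-compress two robots.txt files that differ by a single line.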
|
18:11
🔗
|
soultcer |
ZFS in a loopback-device? |
|
18:26
🔗
|
Spirit_ |
soultcer: any hint how i could do that? i would need hours to read up and learn, maybe you know right away |
|
18:27
🔗
|
soultcer |
No idea, it was only half serious when I suggested it. |
|
18:29
🔗
|
Spirit_ |
:) |
|
18:46
🔗
|
ndurner |
Here's a working script that converts YouTube annotations to SRT: https://github.com/ndurner/AT-tools/tree/master/ann2srt |
|
19:02
🔗
|
SketchCow |
Hi, gang. |
|
19:02
🔗
|
SketchCow |
I am finally among the living again. |
|
19:02
🔗
|
SketchCow |
Welcome from NY |
|
20:10
🔗
|
alard |
SketchCow: Hi, I hope you had a good trip. |
|
20:11
🔗
|
alard |
There are two or three questions I'd like to ask about the WARC format. Who can I mail them to? |
|
20:22
🔗
|
SketchCow |
Kenji@archive.org |
|
20:22
🔗
|
SketchCow |
He's The Man when it comes to ingesting through WARC at archive.org. |
|
20:22
🔗
|
SketchCow |
Tell him I sent you, of course. |
|
20:24
🔗
|
alard |
Okay, thanks! |
|
20:29
🔗
|
SketchCow |
You're both geniuses, it'll work out. |
|
20:29
🔗
|
SketchCow |
It's a singularly important project. |
|
20:29
🔗
|
SketchCow |
It's also forced a bunch of issues for them. |
|
20:29
🔗
|
SketchCow |
Previously, they could sort of assume all items in the wayback were from them |
|
20:29
🔗
|
SketchCow |
Now they can't. |
|
20:29
🔗
|
SketchCow |
But they get more stuff |
|
23:14
🔗
|
dashcloud |
SketchCow: thanks for linking to this from your twitter account: http://bob-way.com |
|
23:15
🔗
|
SketchCow |
Yeah, great guy |
|
23:17
🔗
|
dashcloud |
so is it just the nostalgia or was it actually more exciting in that time frame? |
|
23:17
🔗
|
SketchCow |
Every time frame is exciting. |