Time | Nickname | Message
07:21 | yipdw | so I've put up a first cut of code to grab *.patch.com sites at https://github.com/ArchiveTeam/patch-grab (warning: doesn't upload yet)
07:22 | yipdw | I'm not convinced that one-site-per-work-item is a sane granularity, but on the other hand, I can't think of anything better
07:22 | yipdw | feel free to hack on it, etc.
07:23 | yipdw | http://quilt.at.ninjawedding.org/patchy has all patch.com sites identified by antomatic in http://archiveteam.org/index.php?title=List_of_Patch.com_sites
07:25 | yipdw | I'm testing the downloader on a few work items right now -- it's a very basic wget --mirror, more or less -- so we'll see how complete that is
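[For context, a "very basic wget --mirror" of a single Patch site, written to a WARC as ArchiveTeam grabs usually are, might look roughly like the command fragment below. The hostname is a placeholder and the actual flags used by patch-grab may differ; this is only a sketch of the shape of such an invocation.]

```shell
# Rough sketch of a polite mirror grab with WARC output -- NOT the
# actual patch-grab invocation; example.patch.com is a placeholder.
wget --mirror \
     --page-requisites \
     --wait 1 --random-wait \
     --warc-file="example.patch.com" \
     "http://example.patch.com/"
```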
09:50 | antomatic | yipdw: possibly each site could be further divided up into similar subcategories - e.g. /directory, /news, /jobs, /blogs, /boards, etc - but going at it from the top level first of all seems like a good start to ensure nothing gets missed.
10:11 | antomatic | good luck! :)
23:16 | yipdw | antomatic: that sounds like a good idea; we can combine the individual sections later using megawarc etc
23:27 | ATZ0 | a highfive to you two. awesome that i can come in here and scream fire and you folks are running with it
23:32 | yipdw | ATZ0: btw, I did get HTTP 420'd
23:32 | yipdw | with --random-wait --wait 1
23:32 | yipdw | evidently more waiting is required
23:33 | yipdw | I'm not sure *when*, though, because I went to bed while the grab was running and I forgot to turn retry off on the pipeline