#archiveteam 2013-08-10,Sat


Time Nickname Message
07:21 🔗 yipdw so I've put up a first cut of code to grab *.patch.com sites at https://github.com/ArchiveTeam/patch-grab (warning: doesn't upload yet)
07:22 🔗 yipdw I'm not convinced that one-site-per-work-item is a sane granularity, but on the other hand, I can't think of anything better
07:22 🔗 yipdw feel free to hack on it, etc.
07:23 🔗 yipdw http://quilt.at.ninjawedding.org/patchy has all patch.com sites identified by antomatic in http://archiveteam.org/index.php?title=List_of_Patch.com_sites
07:25 🔗 yipdw I'm testing the downloader on a few work items right now -- it's a very basic wget --mirror, more or less -- so we'll see how complete that is
09:50 🔗 antomatic yipdw: possibly each site could be further divided up into the similar subcategories - e.g. /directory, /news, /jobs, /blogs, /boards, etc - but going at it from the top level first of all seems like a good start to ensure nothing gets missed.
10:11 🔗 antomatic good luck! :)
23:16 🔗 yipdw antomatic: that sounds like a good idea; we can combine the individual sections later using megawarc etc
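
[Editor's note] The per-section split antomatic suggests, with a top-level item kept so nothing gets missed, could be sketched roughly as below. This is only an illustration, not code from patch-grab: the function name `work_items`, the `(site, start_url)` tuple shape, and the hostname `example.patch.com` are all hypothetical, and the section list contains only the paths named in the log (the original "etc." means it is incomplete).

```python
# Sections named in the discussion; the real list is longer ("etc." in the log).
SECTIONS = ["/directory", "/news", "/jobs", "/blogs", "/boards"]


def work_items(sites, sections=SECTIONS):
    """Return (site, start_url) work items: the site root plus one per section.

    Hypothetical helper for illustration; not part of patch-grab.
    """
    items = []
    for site in sites:
        # Root item first, so pages outside the known sections are still reachable.
        items.append((site, "http://%s/" % site))
        for section in sections:
            items.append((site, "http://%s%s" % (site, section)))
    return items


if __name__ == "__main__":
    # example.patch.com is a placeholder hostname, not a real work item.
    for site, url in work_items(["example.patch.com"]):
        print(site, url)
```

Keeping the site name in every item would make it straightforward to regroup the per-section results by site afterwards, which is the step yipdw proposes handling with megawarc.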
23:27 🔗 ATZ0 a highfive to you two. awesome that i can come in here and scream fire and you folks are running with it
23:32 🔗 yipdw ATZ0: btw, I did get HTTP 420'd
23:32 🔗 yipdw with --random-wait --wait 1
23:32 🔗 yipdw evidently more waiting is required
23:33 🔗 yipdw I'm not sure *when*, though, because I went to bed while the grab was running and I forgot to turn retry off on the pipeline
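
[Editor's note] One way to apply "more waiting is required" is to raise the wget wait each time the server answers HTTP 420 and fall back to the base wait otherwise. The sketch below is an assumption-laden illustration, not what the pipeline does: `next_wait` and `wget_args` are hypothetical helpers, the doubling factor and 300-second cap are arbitrary, and the real grab's full wget option list is not shown in the log. Only `--mirror`, `--random-wait`, and `--wait` come from the conversation.

```python
def next_wait(current_wait, status, base=1.0, factor=2.0, cap=300.0):
    """Return the wait in seconds to use for the next attempt.

    Hypothetical policy: multiply the wait on HTTP 420, up to a cap,
    and reset to the base wait on any other status.
    """
    if status == 420:
        return min(current_wait * factor, cap)
    return base


def wget_args(url, wait):
    """Build a wget command line using only the flags mentioned in the log."""
    return ["wget", "--mirror", "--random-wait", "--wait", str(wait), url]


if __name__ == "__main__":
    wait = 1.0  # the starting value from the log: --wait 1
    # Simulated sequence of response codes, for illustration only.
    for status in [200, 420, 420, 200]:
        wait = next_wait(wait, status)
        print(status, wait)
    # Placeholder hostname; the argument list is printed, not executed.
    print(wget_args("http://example.patch.com/", wait))
```

Resetting straight back to the base wait after one success is the simplest choice and may trip the limit again right away; decaying the wait gradually would be a gentler alternative.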
