[07:21] so I've put up a first cut of code to grab *.patch.com sites at https://github.com/ArchiveTeam/patch-grab (warning: doesn't upload yet)
[07:22] I'm not convinced that one-site-per-work-item is a sane granularity, but on the other hand, I can't think of anything better
[07:22] feel free to hack on it, etc.
[07:23] http://quilt.at.ninjawedding.org/patchy has all patch.com sites identified by antomatic in http://archiveteam.org/index.php?title=List_of_Patch.com_sites
[07:25] I'm testing the downloader on a few work items right now -- it's a very basic wget --mirror, more or less -- so we'll see how complete that is
[09:50] yipdw: possibly each site could be further divided into its common subcategories - e.g. /directory, /news, /jobs, /blogs, /boards, etc. - but going at it from the top level first seems like a good start to ensure nothing gets missed
[10:11] good luck! :)
[23:16] antomatic: that sounds like a good idea; we can combine the individual sections later using megawarc etc.
[23:27] a high-five to you two. awesome that I can come in here and scream fire and you folks are running with it
[23:32] ATZ0: btw, I did get HTTP 420'd
[23:32] with --random-wait --wait 1
[23:32] evidently more waiting is required
[23:33] I'm not sure *when*, though, because I went to bed while the grab was running and I forgot to turn retry off on the pipeline
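
For reference, a minimal sketch of what the "very basic wget --mirror" mentioned at [07:25] might look like, writing WARC output the way ArchiveTeam grabs usually do. The exact flags live in the patch-grab repo; the hostname below is a placeholder, not a real work item:

    # Hedged approximation of the downloader's wget call; see
    # https://github.com/ArchiveTeam/patch-grab for the real thing.
    SITE="example-town.patch.com"   # placeholder work item
    wget --mirror \
         --page-requisites \
         --no-parent \
         -e robots=off \
         --wait 1 --random-wait \
         --warc-file="$SITE" \
         "http://$SITE/"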
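
A toy illustration of the [09:50] suggestion: one way to cut a single-site work item into per-section items. The section list comes straight from the chat; the item format itself is an assumption, not the tracker's actual scheme:

    # Emit one work item per site section (hypothetical item format).
    SITE="example-town.patch.com"   # placeholder hostname
    for section in directory news jobs blogs boards; do
        echo "${SITE}/${section}"
    done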
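
On the [23:16] point about recombining: megawarc is ArchiveTeam's tool for bundling many small WARCs into one large one, so the per-section grabs could be packed back together per site. A sketch, assuming megawarc's pack mode and these hypothetical filenames; check the megawarc README for the actual invocation:

    # Pack several per-section WARCs into one megawarc (filenames hypothetical).
    ./megawarc --verbose pack example-town.patch.com \
        example-town.patch.com-news.warc.gz \
        example-town.patch.com-blogs.warc.gz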
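
And on the [23:32] HTTP 420 (rate limiting): since --wait 1 with --random-wait wasn't enough, backing off harder is the obvious next step. The values below are untested guesses, not numbers from the chat:

    # All of these values are guesses; tune against the actual 420 behavior.
    # --wait 5 --random-wait : longer, randomized base delay between requests
    # --waitretry 60         : linear backoff between retries of a failed fetch
    # --tries 3              : cap retries so a throttled item fails fast
    SITE="example-town.patch.com"   # placeholder work item
    wget --mirror \
         --wait 5 --random-wait \
         --waitretry 60 \
         --tries 3 \
         --warc-file="$SITE" \
         "http://$SITE/"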