Time | Nickname | Message
07:21 | yipdw | so I've put up a first cut of code to grab *.patch.com sites at https://github.com/ArchiveTeam/patch-grab (warning: doesn't upload yet)
07:22 | yipdw | I'm not convinced that one-site-per-work-item is a sane granularity, but on the other hand, I can't think of anything better
07:22 | yipdw | feel free to hack on it, etc.
07:23 | yipdw | http://quilt.at.ninjawedding.org/patchy has all patch.com sites identified by antomatic in http://archiveteam.org/index.php?title=List_of_Patch.com_sites
07:25 | yipdw | I'm testing the downloader on a few work items right now -- it's a very basic wget --mirror, more or less -- so we'll see how complete that is
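[For context, a "very basic wget --mirror" of a single Patch site, written to a WARC as ArchiveTeam grabs usually are, might look roughly like the command fragment below. The hostname is a placeholder and the actual flags used by patch-grab may differ; this is only a sketch of the shape of such an invocation.]

```shell
# Rough sketch of a polite mirror grab with WARC output -- NOT the
# actual patch-grab invocation; example.patch.com is a placeholder.
wget --mirror \
     --page-requisites \
     --wait 1 --random-wait \
     --warc-file="example.patch.com" \
     "http://example.patch.com/"
```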
09:50 | antomatic | yipdw: possibly each site could be further divided up into similar subcategories - e.g. /directory, /news, /jobs, /blogs, /boards, etc - but going at it from the top level first of all seems like a good start to ensure nothing gets missed.
10:11 | antomatic | good luck! :)
23:16 | yipdw | antomatic: that sounds like a good idea; we can combine the individual sections later using megawarc etc
23:27 | ATZ0 | a highfive to you two. awesome that i can come in here and scream fire and you folks are running with it
23:32 | yipdw | ATZ0: btw, I did get HTTP 420'd
23:32 | yipdw | with --random-wait --wait 1
23:32 | yipdw | evidently more waiting is required
23:33 | yipdw | I'm not sure *when*, though, because I went to bed while the grab was running and I forgot to turn retry off on the pipeline