[03:25] How can I browse a warc file, or unmodified wget output to make sure I get everything?
[10:56] tsp_: 1. gunzip -c your.warc.gz | less
[10:56] tsp_: 2. gunzip -c your.warc.gz | grep Target-URI
[10:57] tsp_: 3. Or try this: https://github.com/alard/warc-proxy (although there seems to be a problem on Mac OS X)
[11:45] i'm starting to upload my ce lifestyles magazines
[21:11] http://archive.org/details/ybbot.com-20121106-mirror now in fire
[22:35] * 405 #catgirls-in-cellophane You have joined too many channels
[22:36] dammit
[22:36] wat
[22:36] woop woop woop off-topic siren
[23:08] hmm. Someone should update past project pages on the wiki to include information on where to find the data. for example, where is the javascript MobileMe search thing that was hosted somewhere in an IA item?
[23:08] oh
[23:09] I found it. the only link was a small one in the infobox
[23:24] I need to re-open the wiki.
[23:25] Even closed, those spam accounts were amazing.
[23:25] They were doing 4-5 month sleepers!
[23:25] 4-5 months!
[23:25] crazy
[23:28] Something I've found works well for stopping bots is a simple ad hoc Javascript challenge
[23:28] Give the client an expression to calculate
[23:28] Could be trivially broken by someone who wanted to target your site, but stops most of the "passer-by" bots
[23:29] until the bots start executing js
[23:29] That day has yet to come in my experience
[23:32] adafruit has an interesting captcha http://www.adafruit.com/blog/2012/11/06/the-worlds-smallest-nes-controller-2/
[23:34] wow
[23:35] lots of room for innovation
[23:35] ultra legit
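
Note on the [10:56] suggestion: the `gunzip -c your.warc.gz | grep Target-URI` approach can also be sketched in Python, which may be handier when the URL list needs further checking against what was expected. This is a minimal sketch, not a full WARC parser: `list_target_uris` is a hypothetical helper name, it scans line by line just like grep (so a payload line that happens to start with the same header text would also match), and stripping angle brackets around the URI is an assumption about how some writers format that header.

```python
import gzip


def list_target_uris(path):
    """Yield the value of each WARC-Target-URI header line in a gzipped WARC."""
    # Header names are compared case-insensitively.
    prefix = b"warc-target-uri:"
    # gzip.open reads through concatenated gzip members, which is how
    # per-record-compressed WARC files are usually laid out.
    with gzip.open(path, "rb") as warc:
        for line in warc:
            if line[:len(prefix)].lower() == prefix:
                value = line[len(prefix):].strip().decode("utf-8", "replace")
                # Assumption: some tools wrap the URI in <...>; drop the brackets.
                yield value.strip("<>")
```

Looping over `list_target_uris("your.warc.gz")` and printing each item gives roughly the same list as the grep one-liner, without the header name in front.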