#archiveteam 2012-01-18,Wed

Time Nickname Message
00:02 ๐Ÿ”— yipdw so
00:02 ๐Ÿ”— yipdw I'm gonna start archiving all news stories re: Wikipedia blackout
00:06 ๐Ÿ”— NovaKing http://www.cbsnews.com/8301-205_162-57360291/google-plans-to-use-home-page-to-protest-sopa/
01:16 ๐Ÿ”— godane some guy from luxembourg is uploading old glenn beck episodes at full speed
01:16 ๐Ÿ”— godane :-D
01:16 ๐Ÿ”— godane i want to hug the guy
01:17 ๐Ÿ”— balrog LOL
01:17 ๐Ÿ”— balrog why, you like glenn beck, or you like the archival value of them?
01:17 ๐Ÿ”— balrog xD
01:19 ๐Ÿ”— godane just what to archive thme
01:19 ๐Ÿ”— godane *them
01:19 ๐Ÿ”— nitro2k01 Glenn Beck is my favorite comedian
01:19 ๐Ÿ”— godane there is like 6 months of it
01:20 ๐Ÿ”— dashcloud if you're going to allow spam videos to get uploaded, it's kind of hard to discriminate against any particular variety
01:21 ๐Ÿ”— godane this luxembourg guy is now seeding 4 episodes
01:22 ๐Ÿ”— godane this is just crazy
01:22 ๐Ÿ”— godane it used to be like dialup speeds before
01:23 ๐Ÿ”— godane also look up george soros
02:41 ๐Ÿ”— tef yipdw: fwiw, warc2warc doesn't truncate the records now at least. going to put the wget workaround in now as an option
03:05 ๐Ÿ”— tef yipdw: python warc2warc.py -D -Z --wget-chunk-fix foo.warc.gz > bar.warc.gz
03:05 ๐Ÿ”— tef should decompress (if needed), decode each http message (removing content-encoding/transfer-encoding), deal with the corrupt wget output, and recompress it record by record
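
For context on what "removing content-encoding/transfer-encoding" and recompressing "record by record" involve, here is a minimal Python sketch. It is not warc2warc's actual code: the function names are hypothetical, only the standard library is assumed, and the wget-specific workaround is not modelled.

```python
import gzip
import io


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 'Transfer-Encoding: chunked' body."""
    stream = io.BytesIO(body)
    out = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("truncated chunked body")
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; trailers, if any, are ignored in this sketch
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        out += chunk
        stream.readline()  # consume the CRLF that follows the chunk data
    return bytes(out)


def decode_body(body, transfer_encoding=None, content_encoding=None):
    """Undo chunking first, then gzip, mirroring how they were applied."""
    if transfer_encoding == "chunked":
        body = dechunk(body)
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body


def recompress_records(records):
    """Compress each record as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(record) for record in records)
```

Writing one gzip member per record keeps each record independently seekable, which is the usual reason for compressing a .warc.gz this way.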
03:13 ๐Ÿ”— tef hoooray!
03:14 ๐Ÿ”— Ymgve tef: are you the tef that's on somethingawful?
03:14 ๐Ÿ”— tef yes
03:14 ๐Ÿ”— Ymgve oh!
03:14 ๐Ÿ”— Ymgve *wave*
03:14 ๐Ÿ”— tef I was going to ask if you were the Ymgve on sa
03:14 ๐Ÿ”— tef I like your av :3 if it's the cos(...) one iirc
03:15 ๐Ÿ”— Ymgve I'm basically the only Ymgve on the internet, actually picked the name mainly for uniqueness
03:15 ๐Ÿ”— Ymgve what about my new mandelbrot avatar?
03:16 ๐Ÿ”— Ymgve surprised it didn't exceed the size limits
03:16 ๐Ÿ”— tef ah yes that owns
03:16 ๐Ÿ”— tef I have a thing for fractals
03:17 ๐Ÿ”— tef I wrote a js1k which uses a hilbert curve to draw a mandelbrot
03:17 ๐Ÿ”— tef http://secretvolcanobase.org/~tef/js1k.html
03:17 ๐Ÿ”— tef this makes me so happy
03:18 ๐Ÿ”— Ymgve hah, didn't know you could do that, cool!
03:28 ๐Ÿ”— tef well I use it for enqueuing progressive rendering of the fractal at higher resolutions
03:29 ๐Ÿ”— tef so I draw larger and larger hilbert curves and render the points as blocks until it's ~ 1px small
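
The idea tef describes can be sketched roughly as follows. This is a Python approximation under stated assumptions, not the js1k entry itself: the index-to-coordinate mapping is the standard Hilbert curve conversion, and `escape_time`, `progressive_blocks` and the viewport bounds are hypothetical choices made for illustration.

```python
def d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def escape_time(c, max_iter=50):
    """Iterations before z -> z*z + c leaves the radius-2 disc."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def progressive_blocks(size_px=256):
    """Yield (px, py, block_size, value), coarse curves first, down to 1px."""
    order = 1
    while (1 << order) <= size_px:
        cells = 1 << order
        block = size_px // cells
        for d in range(cells * cells):
            cx, cy = d2xy(order, d)
            # Sample the centre of each block in an assumed viewport.
            re = -2.0 + 3.0 * (cx + 0.5) / cells
            im = -1.5 + 3.0 * (cy + 0.5) / cells
            yield cx * block, cy * block, block, escape_time(complex(re, im))
        order += 1
```

A renderer would fill each yielded block with a colour derived from the value, so the image sharpens as the curve order grows.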
04:30 ๐Ÿ”— yipdw tef: awesome, thanks
04:30 ๐Ÿ”— underscor ^ÂÊÎÔÛâêîôûĈĉĜĝĤĥĴĵŜŝŴŵŶŷˆ̭̂᷍ḒḓḘḙḼḽṊṋṰṱṶṷẐẑẤấẦầẨẩẪẫẬậẾếỀềỂểỄễỆệỐốỒồỔổỖỗỘộ⨣⨶⩯ꞈ＾󠁞
04:32 ๐Ÿ”— yipdw hmm
04:32 ๐Ÿ”— yipdw about 1200 SOPA/Wikipedia stories coming in
04:42 ๐Ÿ”— closure yipdw: are you going to be warcing any sites like wikipedia? Already has a pretty historic banner up
04:43 ๐Ÿ”— closure I read some google person encouraging sites to make their blackout be a 403, or some such code, so it's *not* archived.
04:43 ๐Ÿ”— yipdw closure: already did
04:43 ๐Ÿ”— yipdw well
04:43 ๐Ÿ”— yipdw I got their announcement at least
04:44 ๐Ÿ”— yipdw I guess I could grab en.wikipedia.org/wiki/Main_Page or something
04:44 ๐Ÿ”— closure they've deployed their face of jimbo banner technology for good, at last :P
04:44 ๐Ÿ”— yipdw heh
04:45 ๐Ÿ”— yipdw ok, I got Main Page
04:45 ๐Ÿ”— closure https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB
04:45 ๐Ÿ”— yipdw I guess I can grab the blackout too
04:46 ๐Ÿ”— closure so he suggested a 503 code
04:49 ๐Ÿ”— yipdw heh
04:49 ๐Ÿ”— yipdw seems reasonable
04:49 ๐Ÿ”— yipdw I just went looking for other status codes...on Wikipedia
04:49 ๐Ÿ”— yipdw CAN'T DO THAT IN TEN MINUTES
04:49 ๐Ÿ”— closure aside from not being archived
04:49 ๐Ÿ”— closure http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
04:50 ๐Ÿ”— closure remember, in 10 minutes, es.wikipedia.org will still be there :)
04:50 ๐Ÿ”— yipdw http://es.wikipedia.org/wiki/Anexo:C%C3%B3digos_de_estado_HTTP
04:50 ๐Ÿ”— yipdw sweet
04:55 ๐Ÿ”— tef hee
04:55 ๐Ÿ”— tef just memorize the http status cats
04:57 ๐Ÿ”— tef http://www.flickr.com/photos/girliemac/6509400997/
04:59 ๐Ÿ”— closure http://tricities.craigslist.org/ is already black
04:59 ๐Ÿ”— closure hmm, they leave a link to the real site
05:00 ๐Ÿ”— closure woot, wikipedia is black
05:00 ๐Ÿ”— closure really nice design too
05:01 ๐Ÿ”— underscor Oh, it's just an overlay trick
05:01 ๐Ÿ”— closure heh, their Learn More link is buggy.. it links to a wiki page that's blacked out :)
05:01 ๐Ÿ”— bsmith093 wikipedia just went dark check it out
05:02 ๐Ÿ”— Coderjoe closure: hahaha
05:02 ๐Ÿ”— closure yeah, overlay is a good way to do it, although you do see the real page flash
05:03 ๐Ÿ”— Coderjoe I think they need to tweak their overlay JS (or whatever) to not show on that page
05:03 ๐Ÿ”— closure yep
05:03 ๐Ÿ”— closure or they could link to the eff's page
05:03 ๐Ÿ”— tef lol the learn more link
05:03 ๐Ÿ”— tef classy
05:03 ๐Ÿ”— closure http://www.fsf.org/ black
05:04 ๐Ÿ”— tef hmmm
05:04 ๐Ÿ”— tef I can make warcs with the crawler from work.
05:04 ๐Ÿ”— Coderjoe fixed
05:04 ๐Ÿ”— closure and has a title that's binary for some reason, heh
05:04 ๐Ÿ”— closure 2012-01-18 01:04:52 ERROR 503: Service Temporarily Unavailable.
05:04 ๐Ÿ”— closure that's wget..
05:05 ๐Ÿ”— closure wonder what's the switch to archive despite failing status code
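
Independently of wget, keeping the body of a failing response is straightforward in a script. The sketch below is only an illustration of that idea: `fetch_keep_error_body` and its `opener` parameter are hypothetical names, and it relies on the fact that Python's `urllib.error.HTTPError` is itself a readable response object.

```python
import io
import urllib.error
import urllib.request


def fetch_keep_error_body(url, opener=urllib.request.urlopen):
    """Return (status, body) even when the server answers with 4xx/5xx."""
    try:
        with opener(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # A 503 blackout page still carries the HTML we want to keep.
        return err.code, err.read()
```

The `opener` argument exists only so the function can be exercised without network access.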
05:05 ๐Ÿ”— closure heh, wikipedia fixed their link :)
05:08 ๐Ÿ”— Coderjoe woot
05:08 ๐Ÿ”— Coderjoe I blocked the blackout JS. I can now use the english wikipedia just fine
05:09 ๐Ÿ”— Coderjoe (since I already know about sopa and have written my critters about it, I don't think I need to see the blackout overlay)
05:09 ๐Ÿ”— closure already lead story on CNN
05:10 ๐Ÿ”— tef well I have a warc of wikipedia.org's front page now
05:10 ๐Ÿ”— tef should have all the css/js too
05:10 ๐Ÿ”— Coderjoe as a plus, I probably can modify this to block the jimboface banners, too
05:11 ๐Ÿ”— yipdw tef: hey me too! :)
05:11 ๐Ÿ”— Coderjoe (I added this to my adblock filter list, with "at start" anchoring: http://meta.wikimedia.org/w/index.php*title=Special:BannerLoader*banner=blackout )
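
To make the matching behaviour of that filter concrete, here is a small Python sketch that treats `*` as a wildcard and anchors the pattern at the start of the address. It illustrates the matching idea only and is not the ad blocker's implementation; `filter_to_regex` and the sample URLs in the comment are hypothetical.

```python
import re


def filter_to_regex(pattern, anchor_start=True):
    """Translate a '*'-wildcard filter into a compiled regular expression."""
    parts = [re.escape(part) for part in pattern.split("*")]
    body = ".*".join(parts)
    return re.compile(("^" if anchor_start else "") + body)


BLACKOUT_FILTER = filter_to_regex(
    "http://meta.wikimedia.org/w/index.php*title=Special:BannerLoader*banner=blackout"
)

# Example (made-up query string):
# BLACKOUT_FILTER.search("http://meta.wikimedia.org/w/index.php?title=Special:BannerLoader&banner=blackout")
```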
05:11 ๐Ÿ”— yipdw I do wish that they really just turned off the wiki software, or set up a redirect
05:12 ๐Ÿ”— yipdw but maybe they just couldn't do that across all their app servers
05:12 ๐Ÿ”— closure you can also just browse it in w3m :P
05:13 ๐Ÿ”— yipdw I am hardcore and browse it via gunzip -c *.warc.gz
05:13 ๐Ÿ”— closure oooh, nice on google
05:13 ๐Ÿ”— yipdw haha
05:13 ๐Ÿ”— closure enormous censorship bar of dooom
05:13 ๐Ÿ”— closure best google logo evar
05:13 ๐Ÿ”— yipdw time to grab that too
05:14 ๐Ÿ”— tef yipdw: I caught http://pastebin.com/NWiV4yEV
05:15 ๐Ÿ”— Coderjoe I wonder if they are blocking the english site on the secure site
05:16 ๐Ÿ”— yipdw tef: https://gist.github.com/7d787f4487a80b354d8f
05:16 ๐Ÿ”— yipdw come to think of it
05:16 ๐Ÿ”— yipdw I'm not sure I got what I actually went for
05:16 ๐Ÿ”— yipdw one moment
05:17 ๐Ÿ”— yipdw oh
05:17 ๐Ÿ”— yipdw I forgot to turn off robots.txt obey stuff
05:18 ๐Ÿ”— tef weird
05:18 ๐Ÿ”— tef you didn't get http://upload.wikimedia.org/wikipedia/commons/9/98/WP_SOPA_Splash_Full.jpg
05:18 ๐Ÿ”— yipdw yeah
05:18 ๐Ÿ”— tef hooray javascript?
05:18 ๐Ÿ”— closure http://thepiratebay.org/ heh, I didn't see that blackout coming
05:18 ๐Ÿ”— yipdw I don't think wget's seeing it
05:19 ๐Ÿ”— yipdw but whatever, you got a copy
05:19 ๐Ÿ”— tef grabbed
05:19 ๐Ÿ”— yipdw guess I'll grab www.google.com and the landing page
05:19 ๐Ÿ”— yipdw might have better luck there
05:21 ๐Ÿ”— tef i'm sorta glad the thing I made for work is capturing more :3 </smug>
05:22 ๐Ÿ”— PatC_ yay archiving
05:23 ๐Ÿ”— yipdw tef: haha
05:24 ๐Ÿ”— closure http://thedailywtf.com/Articles/Support-The-Daily-WTF-in-Supporting-the-Support-SOPA-Movement.aspx
05:24 ๐Ÿ”— closure wins IMHO by dragging BBSes into this
05:26 ๐Ÿ”— yipdw tef: can your software include embedded videos?
05:26 ๐Ÿ”— tef sorta
05:26 ๐Ÿ”— yipdw there's a few on https://www.google.com/landing/takeaction/sopa-pipa/ that I'm trying to get wget to somehow get
05:26 ๐Ÿ”— yipdw but without knowledge of YouTube's system I don't think it can really do it
05:27 ๐Ÿ”— closure http://www.metafilter.com/ black
05:27 ๐Ÿ”— yipdw maybe we should make a list
05:28 ๐Ÿ”— tef pls!
05:28 ๐Ÿ”— arrith yipdw: piratepad or the wiki?
05:28 ๐Ÿ”— closure http://boingboing.net/ black
05:29 ๐Ÿ”— tef recommend a pad over wiki while it's still being updated
05:29 ๐Ÿ”— yipdw http://archiveteam.org/index.php?title=SOPA_blackout_pages
05:29 ๐Ÿ”— yipdw works here
05:30 ๐Ÿ”— closure if you like edit conflicts and captchas
05:30 ๐Ÿ”— yipdw FINE
05:31 ๐Ÿ”— closure wow, check view source on boing boing
05:31 ๐Ÿ”— yipdw http://piratepad.net/I42hnyc0zk
05:33 ๐Ÿ”— yipdw blah, I just got disconnected
05:33 ๐Ÿ”— closure constantly
05:33 ๐Ÿ”— arrith i think there are other sites that run the etherpad sw
05:33 ๐Ÿ”— arrith just i don't know of any of them off the top of my head
05:34 ๐Ÿ”— yipdw http://beta.etherpad.org/p/KDVzcBCKTj
05:35 ๐Ÿ”— PatC_ 4chan is censored too
05:35 ๐Ÿ”— closure go congress
05:36 ๐Ÿ”— closure no it's not
05:36 ๐Ÿ”— PatC_ go to any section
05:36 ๐Ÿ”— PatC_ ./g/ is
05:36 ๐Ÿ”— PatC_ all the text is black until you mouse over it
05:36 ๐Ÿ”— closure oh, i see.. I actually have to eeek
05:36 ๐Ÿ”— closure censoring the text on 4chan is like.. like.. words fail me
05:39 ๐Ÿ”— NovaKing http://wordpress.com/
05:40 ๐Ÿ”— tef can we drop the * thing
05:40 ๐Ÿ”— tef it is hard to copy paste the urls out
05:40 ๐Ÿ”— tef and dump them into a script :v
05:40 ๐Ÿ”— closure lol
05:40 ๐Ÿ”— yipdw we can
05:41 ๐Ÿ”— tef awesome
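
Pulling URLs out of a pad regardless of bullets or trailing notes can be sketched like this. It is a minimal example, assuming the pad text is available as a string; `extract_urls` is a hypothetical helper and the pattern is deliberately loose.

```python
import re

# Loose pattern: a scheme followed by anything that is not whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        if url not in urls:
            urls.append(url)
    return urls
```

The result can be written one URL per line and fed straight to a crawler's seed list.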
05:42 ๐Ÿ”— yipdw oh sadface
05:42 ๐Ÿ”— yipdw wget doesn't retrieve page requisites if retrieval of a URL returns 503
05:42 ๐Ÿ”— yipdw I wonder if I can force it to follow
05:42 ๐Ÿ”— NovaKing http://sopastrike.com/on-strike/
05:42 ๐Ÿ”— yipdw don't really want to write custom tools to do this if tef already has them :P
05:44 ๐Ÿ”— closure the sopastrike page is crashing chromium
05:44 ๐Ÿ”— closure ok.. that's a list
05:44 ๐Ÿ”— NovaKing it's not all though
05:44 ๐Ÿ”— tef yeah but my tools are well, flakey
05:44 ๐Ÿ”— closure indeed not
05:45 ๐Ÿ”— yipdw haha
05:45 ๐Ÿ”— yipdw "
05:45 ๐Ÿ”— yipdw "Chromium's connection attempt to www.twitter.com was rejected. The website may be down, or your network may not be properly configured.
05:45 ๐Ÿ”— yipdw FUCK BLACKOUT, WE'RE GOING OFFLINE
05:46 ๐Ÿ”— yipdw (probably just my link though)
05:47 ๐Ÿ”— yipdw oh wait there it goes
05:47 ๐Ÿ”— arrith yeah twitter's ceo guy was being a dbag about the sopa stuff
05:47 ๐Ÿ”— arrith calling it idiotic or something
05:48 ๐Ÿ”— yipdw nah, he was just referring to Twitter specifically
05:48 ๐Ÿ”— arrith twitter isn't going down for anything but mismanagement
05:48 ๐Ÿ”— arrith well, twitter not going down for a day? wikipedia is effectively
05:48 ๐Ÿ”— yipdw I meant his tweet referred to Twitter going offline as foolish, not Wikipedia or anyone else
05:49 ๐Ÿ”— arrith ah yeah
05:50 ๐Ÿ”— yipdw <meta name="googlebot" content="noarchive"/>
05:50 ๐Ÿ”— yipdw <title>Why Wikipedia went down at midnight - CNN.com</title>
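
A crawler can check for that kind of directive with the standard library's HTML parser. The sketch below is illustrative only: the class and function names are hypothetical, and it looks at `robots` and `googlebot` meta tags without handling the equivalent HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"|"googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def asks_not_to_be_archived(html):
    """True if any robots-style meta tag carries 'noarchive'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noarchive" in tokens for tokens in parser.directives.values())
```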
05:51 ๐Ÿ”— arrith yipdw: Imgur Tor Project Miro iSchool at Syracuse University Oreilly.com Wikipedia Reddit Mozilla WordPress.org icanhazCheezburger Network MoveOn.org Good Old Games TwitPic Minecraft Free Press Mojang XDA Developers Destructoid Good.is
05:52 ๐Ÿ”— closure lol
05:52 ๐Ÿ”— closure we'll report on it.. but it didn't happen
05:52 ๐Ÿ”— arrith at least twitpic is i guess
05:56 ๐Ÿ”— yipdw arrith: ?
05:56 ๐Ÿ”— arrith yipdw: those sites were said to be doing a blackout thing
05:56 ๐Ÿ”— yipdw oh
05:56 ๐Ÿ”— yipdw add them to the list
06:00 ๐Ÿ”— tef ahahah I had a stupid bug
06:02 ๐Ÿ”— NovaKing twitpic only doing logo by looks of it
06:02 ๐Ÿ”— yipdw aww sonuvabitch
06:02 ๐Ÿ”— yipdw http://savannah.gnu.org/bugs/index.php?20417
06:02 ๐Ÿ”— NovaKing i'm only doing logo on my site
06:02 ๐Ÿ”— yipdw that bug deals *directly* with wget's 503 behavior
06:02 ๐Ÿ”— yipdw and I can't access it
06:02 ๐Ÿ”— yipdw ...
06:02 ๐Ÿ”— yipdw fuck.
06:04 ๐Ÿ”— dan_ omg omg i hope you guys archived wikipedia because they took it down *ducks*
06:04 ๐Ÿ”— closure nice
06:04 ๐Ÿ”— NovaKing ps: ?banner=0
06:04 ๐Ÿ”— NovaKing will take off the banner
06:07 ๐Ÿ”— SDr__ hey guys!
06:07 ๐Ÿ”— SDr__ you... are... awesome!
06:09 ๐Ÿ”— Coderjoe NovaKing: please provide a full url that works for that bug page?
06:10 ๐Ÿ”— yipdw yeah, I can't get to it
06:10 ๐Ÿ”— NovaKing eh? sorry, that was in relation to wikipedia
06:11 ๐Ÿ”— NovaKing as for 503
06:11 ๐Ÿ”— yipdw oh
06:11 ๐Ÿ”— NovaKing that is due to google crawlers
06:11 ๐Ÿ”— NovaKing to not index the blackout page
06:11 ๐Ÿ”— NovaKing you might have to get wget source, and remove the 503 section
06:12 ๐Ÿ”— yipdw there isn't one
06:12 ๐Ÿ”— closure it's probably the generic failure code handling code
06:13 ๐Ÿ”— NovaKing there isn't one?
06:13 ๐Ÿ”— NovaKing ...?
06:13 ๐Ÿ”— yipdw I cannot find a failure handler that deals with HTTP 503, no
06:14 ๐Ÿ”— yipdw there may be a generic one
06:14 ๐Ÿ”— yipdw I'm trying to find it
06:15 ๐Ÿ”— yipdw in fact, here's something weird in wget trunk:
06:15 ๐Ÿ”— yipdw $ grep -nR HTTP_STATUS_UNAVAILABLE *
06:15 ๐Ÿ”— yipdw src/http.c:131:#define HTTP_STATUS_UNAVAILABLE 503
06:15 ๐Ÿ”— yipdw that's it
06:15 ๐Ÿ”— yipdw it's not used anywhere else
06:15 ๐Ÿ”— yipdw well, at least not straight off
06:16 ๐Ÿ”— yipdw could be some preprocessor tricks
06:17 ๐Ÿ”— closure I think it's only there for completeness
06:17 ๐Ÿ”— closure I was in the same place, didn't get any further
06:21 ๐Ÿ”— underscor anyone having trouble connecting to twitter?
06:21 ๐Ÿ”— underscor I thought they weren't participating
06:21 ๐Ÿ”— closure they participate at random throughout the year
06:21 ๐Ÿ”— underscor hahah
06:21 ๐Ÿ”— closure in total it comes to a day anyway
06:22 ๐Ÿ”— closure https://twitter.com/herpderpedia lolz
06:24 ๐Ÿ”— Coderjoe worksforme
06:24 ๐Ÿ”— Coderjoe but they might be buckling under the "OMG SHIT IS DOWN" posts
06:25 ๐Ÿ”— yipdw twitter can't go offline
06:25 ๐Ÿ”— yipdw my mom told me they were a cloud and clouds never go offline
06:25 ๐Ÿ”— underscor hahahah
06:26 ๐Ÿ”— Coderjoe yeah
06:26 ๐Ÿ”— Coderjoe it's the load of #wikipedia
06:26 ๐Ÿ”— yipdw damn
06:27 ๐Ÿ”— yipdw well
06:27 ๐Ÿ”— underscor Those comments give me no hope for humanity
06:27 ๐Ÿ”— yipdw it's probably a good thing that Twitter dumps its tweets in the LoC
06:27 ๐Ÿ”— yipdw because we have no hope of actually archiving any of that, short of a direct feed to Twitter's message brokers
06:27 ๐Ÿ”— underscor haha
06:28 ๐Ÿ”— yipdw mostly because the Javascript on New Twitter makes it such a fucking bear to wrestle
06:28 ๐Ÿ”— Coderjoe 19 new tweets in the last minute
06:28 ๐Ÿ”— Coderjoe 36 in 2m
06:29 ๐Ÿ”— Coderjoe kinda surprised by all the spanish tweets
06:29 ๐Ÿ”— SDr__ google://time in spain => 7:29am; consider the impact of waking up with your coffee in your hand, and seeing half the internetz blacked out
06:31 ๐Ÿ”— yipdw ok
06:31 ๐Ÿ”— yipdw so
06:31 ๐Ÿ”— Coderjoe they're mentioning the #wikipedia hash tag, though
06:31 ๐Ÿ”— yipdw how to actually archive these
06:31 ๐Ÿ”— yipdw short of "let tef do it"
06:31 ๐Ÿ”— Coderjoe and only en was supposed to go black
06:35 ๐Ÿ”— yipdw https://twitter.com/#!/_ItsONLY1_KEN/status/159510766695878656
06:39 ๐Ÿ”— tef hmm now wondering how to fix the 503 nonsense
06:39 ๐Ÿ”— tef well it gets captured sort of
06:42 ๐Ÿ”— yipdw I don't think http://sopastrike.com/on-strike/ is accurate at all
06:42 ๐Ÿ”— closure I think it's full of crap and few sites on strike, and a few that will be later
06:43 ๐Ÿ”— Coderjoe 4chan's got an odd blackout
06:43 ๐Ÿ”— Coderjoe the site still works, just the text is black on black
06:44 ๐Ÿ”— yipdw oh fuck
06:44 ๐Ÿ”— yipdw etherpad.org died
06:44 ๐Ÿ”— yipdw did someone archive it :P
06:46 ๐Ÿ”— yipdw ah ok it's back
06:48 ๐Ÿ”— MCV Sopastrike seems to have a lot of cybersquatting websites
06:48 ๐Ÿ”— MCV and facebook profilez
06:48 ๐Ÿ”— MCV basically they let anyone put their link up there I guess
06:50 ๐Ÿ”— MCV added http://www.qwantz.com/index.php
06:53 ๐Ÿ”— yipdw oh, cool
06:53 ๐Ÿ”— yipdw http://blog.nearlyfreespeech.net/2012/01/18/sopa-blackout-option/
07:01 ๐Ÿ”— tef uuughghu
07:05 ๐Ÿ”— tef right and I have patched the 503 errors
07:05 ๐Ÿ”— yipdw in wget or another tool?
07:05 ๐Ÿ”— tef in another tool
07:05 ๐Ÿ”— yipdw I think I found where it's falling through
07:05 ๐Ÿ”— yipdw oh ok
07:05 ๐Ÿ”— tef work stuffs :/
07:06 ๐Ÿ”— yipdw that's fine
07:06 ๐Ÿ”— yipdw so long as SOMEONE has a tool to grab this
07:06 ๐Ÿ”— tef well i'm running it now
07:06 ๐Ÿ”— tef seems to be surviving
07:12 ๐Ÿ”— yipdw ah ha
07:12 ๐Ÿ”— yipdw patched wget
07:14 ๐Ÿ”— yipdw https://gist.github.com/1631756
07:14 ๐Ÿ”— yipdw that can be applied against wget bzr 2574
07:15 ๐Ÿ”— yipdw I think it works
07:16 ๐Ÿ”— tef hurrah
07:16 ๐Ÿ”— tef well this crawl should end soonish I think
07:22 ๐Ÿ”— tef heh
07:22 ๐Ÿ”— tef it's picking up links under the blackout js and the blackout js
07:24 ๐Ÿ”— yipdw yeah
07:25 ๐Ÿ”— yipdw I'm not sure if the patched wget is handling wikipedia right
07:25 ๐Ÿ”— yipdw but as you've got a grab of that
07:25 ๐Ÿ”— yipdw it's fine
07:28 ๐Ÿ”— Coderjoe what the
07:28 ๐Ÿ”— Coderjoe http://twitter.com/JOIN__US/status/159537070338093056
07:29 ๐Ÿ”— tef got about 200 resources now
07:31 ๐Ÿ”— Coderjoe http://twitter.com/Eastern_Star_/status/159537964865699840
07:31 ๐Ÿ”— Coderjoe stupid
07:31 ๐Ÿ”— tef 2000 even
07:32 ๐Ÿ”— yipdw gonna try archiving the wtf wikipedia tweets
07:32 ๐Ÿ”— tef good luck with that
07:33 ๐Ÿ”— tef twitter is awful
07:33 ๐Ÿ”— yipdw I'm probably just going to have to bang their REST API
07:33 ๐Ÿ”— Coderjoe have fun
07:33 ๐Ÿ”— Coderjoe 20-40 tweets per minute on the two searches I have open
07:36 ๐Ÿ”— Coderjoe about the same on #stopsopa #sopa and #wikipedia
07:38 ๐Ÿ”— Coderjoe about the same on #wikipediablackout
07:38 ๐Ÿ”— Coderjoe #wikistrike hasn't moved in awhile
07:38 ๐Ÿ”— yipdw ugh
07:38 ๐Ÿ”— yipdw actually fuck that
07:39 ๐Ÿ”— yipdw I'll just continue running my news crawlers
07:40 ๐Ÿ”— Coderjoe people are discovering that the mobile site still works
07:52 ๐Ÿ”— Coderjoe http://twitter.com/HWGVictor/status/159543226984955905
07:54 ๐Ÿ”— Coderjoe another blackout: http://cinematictitanic.com/sopa.html
07:55 ๐Ÿ”— Coderjoe well, the front page redirects to the sopa page
07:55 ๐Ÿ”— ersi http://flowingdata.com/2012/01/17/watching-wtf-wikipedia-as-sopapipa-blackout-begins/
07:55 ๐Ÿ”— ersi heh
07:55 ๐Ÿ”— tef google isn't giving me the blackout
07:55 ๐Ÿ”— tef cos i'm on aws
07:57 ๐Ÿ”— yipdw https://github.com/zachstronaut/stop-sopa
07:57 ๐Ÿ”— yipdw oh, huh
08:05 ๐Ÿ”— yipdw heh
08:05 ๐Ÿ”— yipdw http://online.wsj.com/article/SB10001424052970203471004577142893718069820.html
08:06 ๐Ÿ”— Coderjoe "Rather, ..."
08:06 ๐Ÿ”— Coderjoe darn preview
08:10 ๐Ÿ”— Coderjoe lots of non-retweet repetition on #wikipediablackout: I support #wikipediablackout! Show your support here (tinyurl link)
08:11 ๐Ÿ”— Coderjoe new trending: #NoALaPincheSOPA
08:11 ๐Ÿ”— closure http://reedmorse.com/tmp/sopa-adwords.png can anyone who doesn't block ads confirm google is making anti-sopa adwords?
08:12 ๐Ÿ”— Coderjoe #NOALAPINCHESOPA hey hey hey you never say no to soup okaay?
08:12 ๐Ÿ”— yipdw closure: yes, there are ads for www.google.com/takeaction
08:12 ๐Ÿ”— yipdw if you search for "sopa" anyway
08:12 ๐Ÿ”— yipdw er, on Google
08:13 ๐Ÿ”— Coderjoe I turned my adblock off and went to techcrunch and got the same google sopa banner
08:13 ๐Ÿ”— closure only there or other sites tho
08:13 ๐Ÿ”— yipdw they show up on Google search results too
08:13 ๐Ÿ”— closure techcrunch could have changed it.. if it's everywhere, that'd be huge
08:14 ๐Ÿ”— Coderjoe well, when I turn adblock back on, it goes away
08:14 ๐Ÿ”— Coderjoe give me another site with adwords
08:15 ๐Ÿ”— closure no idea, that's why I asked :)
08:16 ๐Ÿ”— Coderjoe it's also on ytmnd
08:16 ๐Ÿ”— Coderjoe (in an ad slot, not a site)
08:17 ๐Ÿ”— Coderjoe both ad slots, actually
08:18 ๐Ÿ”— Coderjoe geh
08:18 ๐Ÿ”— Coderjoe #stopsopa has 339 new tweets in the last 27 minutes or so
08:18 ๐Ÿ”— Coderjoe 335 new #sopa in 25 minutes
08:19 ๐Ÿ”— Coderjoe 176 new #wikipedia in 14 minutes
08:19 ๐Ÿ”— Coderjoe 159 new #wikipediablackout in 10 minutes
08:22 ๐Ÿ”— yipdw must be lots of school papers
08:28 ๐Ÿ”— Coderjoe well, I am really pissing off firefox
08:29 ๐Ÿ”— Coderjoe in addition to all those search pages I had open, I just opened those two trackers. told revisit to grab 1000 tweets
08:30 ๐Ÿ”— yipdw oops
08:30 ๐Ÿ”— yipdw closure: what was the etherpad link?
08:30 ๐Ÿ”— yipdw I just closed it and can't find it in Chromium's history
08:31 ๐Ÿ”— yipdw oh wait, I can undo
08:31 ๐Ÿ”— Coderjoe spot says 373 "wtf wikipedia" tweets per hour
08:31 ๐Ÿ”— arrith yipdw: http://beta.etherpad.org/p/KDVzcBCKTj
08:31 ๐Ÿ”— yipdw thanks
08:32 ๐Ÿ”— Coderjoe a number of the ones spot is highlighting are "do this twitter search and laugh at dumb people freaking out"
08:32 ๐Ÿ”— closure http://theoatmeal.com lol
08:34 ๐Ÿ”— arrith yipdw: where does a site go if its just changed its banner? but isn't really down
08:34 ๐Ÿ”— Coderjoe "raging vagina tractors"
08:34 ๐Ÿ”— closure woot to whoever noticed xmonad.org
08:35 ๐Ÿ”— * closure strokes his 300 line .xmonadrc
08:35 ๐Ÿ”— Coderjoe that is one long gif
08:36 ๐Ÿ”— Coderjoe courtesy of spot: http://twitter.com/WEIQINGZ/status/159551680587898880
08:37 ๐Ÿ”— yipdw who keeps claiming I have WARCs of all of those :P
08:39 ๐Ÿ”— yipdw ok now I kinda do
09:09 ๐Ÿ”— Coderjoe new stats at about 50 minutes: #wikipedia 650, #wikipediablackout 655, #sopa 673, #stopsopa 676
09:10 ๐Ÿ”— Coderjoe oh man
09:10 ๐Ÿ”— Coderjoe this looks to be silly
09:11 ๐Ÿ”— Coderjoe #FactsWithoutWikipedia
09:11 ๐Ÿ”— Coderjoe Weaves are made from abandoned foetuses. And you wondered why they rubbed you the wrong way, huh? #FactswithoutWikipedia
09:12 ๐Ÿ”— Coderjoe During the selection process for a new pope in the event of a tie, it is settled with a game of conkers #factswithoutwikipedia
09:15 ๐Ÿ”— Coderjoe Tiger woods owns 4 brothels #FactsWithoutWikipedia
09:15 ๐Ÿ”— Coderjoe The Earth is not spherical, it is actually a rectangular prism. #FactsWithoutWikipedia
09:24 ๐Ÿ”— Coderjoe i can imagine the mess when the people in the states wake up in a few hours
09:28 ๐Ÿ”— NovaKing http://theoatmeal.com/sopa
09:28 ๐Ÿ”— perfinion that is excellent
09:31 ๐Ÿ”— ersi "P.S. Please pirate the shit out of this animated GIF. "
09:34 ๐Ÿ”— Coderjoe yeah... #FactsWithoutWikipedia is going fast... 20 tweets in the last minute
09:36 ๐Ÿ”— Coderjoe Dubstep is the cure to diabetes #factswithoutwikipedia
09:36 ๐Ÿ”— Coderjoe well shit
09:38 ๐Ÿ”— ersi 1 of them is from a colleague
09:39 ๐Ÿ”— yipdw oh
09:39 ๐Ÿ”— yipdw heh
09:39 ๐Ÿ”— yipdw my buds followed through
09:39 ๐Ÿ”— yipdw http://chicagoparkour.com/
09:39 ๐Ÿ”— yipdw how weird
09:39 ๐Ÿ”— Coderjoe 78% of pregnancies occur due to the high incidence of couples sharing toothbrushes and bath towels. #FactsWithoutWikipedia
09:41 ๐Ÿ”— yipdw I've also been watching my news scraper and there is a suspicious dearth of pro-SOPA/pro-PIPA articles
09:41 ๐Ÿ”— yipdw but I'm just using Google News feeds
10:14 ๐Ÿ”— NovaKing http://xkcd.com/
10:54 ๐Ÿ”— Coderjoe hahah
10:56 ๐Ÿ”— Coderjoe bahahaha
10:58 ๐Ÿ”— Coderjoe did you catch the hidden message?
10:58 ๐Ÿ”— Coderjoe or hidden comic
10:58 ๐Ÿ”— NovaKing ya
15:23 ๐Ÿ”— closure I've added a few dozen more sopa blackout pages to http://beta.etherpad.org/p/KDVzcBCKTj
15:26 ๐Ÿ”— closure oddly, archive.org is still up
16:32 ๐Ÿ”— don closure: not for me
16:36 ๐Ÿ”— tef mornin
18:08 ๐Ÿ”— tef right, restarting the crawl with the current list in the pirate pad
18:09 ๐Ÿ”— tef i've been capturing pages all day from about 5am, then again at 8,9 am and then at 1pm, and again this afternoon. (gmt)
18:09 ๐Ÿ”— tef i'll clean up the warcs tomorrow and find out where to shove them
18:12 ๐Ÿ”— yipdw heheheh
18:12 ๐Ÿ”— yipdw http://support.godaddy.com/godaddy/go-daddy-many-other-internet-leaders-oppose-sopa-pipa/?ci=56582
18:13 ๐Ÿ”— tef heh
18:13 ๐Ÿ”— yipdw they are so full of shit
18:13 ๐Ÿ”— tef yipdw: did the warc options play nicely with the wget warc? and has the bug been filed upstream?
18:13 ๐Ÿ”— yipdw tef: I haven't tried cleaning up my existing WARCs, I'll get to that sometime tonight
18:13 ๐Ÿ”— tef cool
18:14 ๐Ÿ”— tef honest I am archiving xvideos.com for work
18:14 ๐Ÿ”— yipdw I haven't yet been able to file bugs because GNU took down savannah.gnu.org
18:14 ๐Ÿ”— tef heee
18:14 ๐Ÿ”— tef worst day to file a bug
18:15 ๐Ÿ”— yipdw no kidding
18:16 ๐Ÿ”— tef my boss is happy i'm running sopa crawls cos I keep finding bugs
18:16 ๐Ÿ”— tef hooray running stuff for archive team now counts as testing. heh heh heh
18:16 ๐Ÿ”— yipdw hah
18:16 ๐Ÿ”— yipdw these are some really weird edge cases
18:17 ๐Ÿ”— tef well not the bugs I found in my stuff (unrelated to warcs)
18:17 ๐Ÿ”— tef found a page hang in the crawler on <link> tags
18:18 ๐Ÿ”— yipdw ahh
18:18 ๐Ÿ”— tef because I am a terrible coder
18:18 ๐Ÿ”— tef and i've moved from cpickle to pickle for some ipc because of weird newline issues
18:18 ๐Ÿ”— tef thanks python
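
tef does not say what the newline problem was. One common cause with pickled IPC is treating the stream as newline-delimited while the payload itself contains newline bytes. Assuming that is the situation, a length-prefixed framing avoids it; `send_msg` and `recv_msg` below are hypothetical helpers, not code from the crawler.

```python
import io
import pickle
import struct


def send_msg(stream, obj):
    """Write one pickled object, prefixed with its length as 4 bytes."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)
    stream.flush()


def recv_msg(stream):
    """Read exactly one framed object; the payload may contain any bytes."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a full header arrived")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream closed mid-message")
    return pickle.loads(payload)
```

As with any pickle-based protocol, this is only appropriate between processes that trust each other.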
18:27 ๐Ÿ”— tef and flash is crashing my crawler again. i hate flash
18:33 ๐Ÿ”— tef might do a +1 crawl on news.ycombinator
18:56 ๐Ÿ”— yipdw so, evidently, if you get this level of ruckus going on, politicians back off
18:56 ๐Ÿ”— yipdw I just saw six news articles scroll by that said something to the effect of "cosponsors back off"
18:56 ๐Ÿ”— ersi for a little while, yes
18:56 ๐Ÿ”— yipdw yeah
18:56 ๐Ÿ”— ersi I bet they'll fuck it all up in a matter of time
18:56 ๐Ÿ”— yipdw I'm sure of it
18:57 ๐Ÿ”— tef hmm newzbin is also blacked out
19:01 ๐Ÿ”— yipdw what the hell
19:01 ๐Ÿ”— yipdw http://www.theweedblog.com/what-would-the-marijuana-movement-be-without-the-internet/
19:01 ๐Ÿ”— yipdw that showed up in the news feeds for "SOPA"
19:01 ๐Ÿ”— yipdw oh, I see why
19:33 ๐Ÿ”— Nemo_bis sigh, uploading to IA at 50 kB/s
19:33 ๐Ÿ”— Nemo_bis and a 30.5 GiB file is reported as 28.3 GiB by curl
19:39 ๐Ÿ”— nitro2k01 Sure both units are meant to be GiB?
19:43 ๐Ÿ”— Nemo_bis nitro2k01, yes, why not?
19:44 ๐Ÿ”— nitro2k01 30.5/28.3 is close enough to 1024^3
19:44 ๐Ÿ”— nitro2k01 Or 1024^3/1000^3 rather but you get my point
19:45 ๐Ÿ”— Nemo_bis yes, but curl uses GiB
19:45 ๐Ÿ”— Nemo_bis and nautilus too I hope
19:45 ๐Ÿ”— nitro2k01 It's the bigger one that would be using GB
19:45 ๐Ÿ”— Nemo_bis although it seems you're right, because the file is 30486180948 B
19:46 ๐Ÿ”— Nemo_bis bah, stupid nautilus
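
The arithmetic behind that exchange, worked through in Python using the byte count Nemo_bis quotes. The helper names are made up for the example.

```python
SIZE_BYTES = 30486180948  # file size quoted in the channel


def to_gb(n_bytes):
    """Decimal gigabytes: 1 GB = 1000**3 bytes."""
    return n_bytes / 1000**3


def to_gib(n_bytes):
    """Binary gibibytes: 1 GiB = 1024**3 bytes."""
    return n_bytes / 1024**3


# The ratio between the two units, the factor nitro2k01 spotted.
GIB_PER_GB_RATIO = 1024**3 / 1000**3  # 1.073741824

size_gb = to_gb(SIZE_BYTES)    # about 30.5, the figure nautilus labelled "GiB"
size_gib = to_gib(SIZE_BYTES)  # about 28.4, in line with what curl showed
```

So the larger figure is the decimal one, and both tools agree on the file once the units are read correctly.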
19:47 ๐Ÿ”— tef zombo.com is down too
19:47 ๐Ÿ”— yipdw SHIT
19:47 ๐Ÿ”— yipdw WHAT
19:47 ๐Ÿ”— tef Nemo_bis: btw I pushed my crap to SketchCow
19:47 ๐Ÿ”— tef thanks for reminding me :#
19:47 ๐Ÿ”— tef yipdw: I KNOW? NOTHING IS POSSIBLE!
19:48 ๐Ÿ”— yipdw YOU ARE NOT WELCOME
19:48 ๐Ÿ”— yipdw TO ZOMBOCOM
19:48 ๐Ÿ”— Nemo_bis tef, good :)
19:50 ๐Ÿ”— Nemo_bis but now who's Angra
19:50 ๐Ÿ”— tef i've got news.yc to crawl so I am capturing the front page and all the articles
19:50 ๐Ÿ”— tef hopefully
19:50 ๐Ÿ”— Nemo_bis or crawl333-9
19:51 ๐Ÿ”— tef got 141 seeds in the sopa crawl i've just restarted
19:51 ๐Ÿ”— tef I probably have about 7-8 copies of some pages by now but eh i'll dedupe when i'm dead
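
When the deduplication does happen, one simple approach is to key each capture on its URL plus a digest of the payload. This is a sketch under that assumption, with captures represented as hypothetical `(url, body)` pairs instead of real WARC records.

```python
import hashlib


def dedupe_captures(captures):
    """Keep the first capture of each (url, payload digest) pair."""
    seen = set()
    unique = []
    for url, body in captures:
        key = (url, hashlib.sha1(body).hexdigest())
        if key not in seen:
            seen.add(key)
            unique.append((url, body))
    return unique
```

Captures of the same URL whose content changed between crawls are kept, which matters for pages that were edited during the day.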
20:20 ๐Ÿ”— db48x just had some teeth pulled
20:20 ๐Ÿ”— db48x what is everyone else up to?
20:29 ๐Ÿ”— chronomex it's snowing out.
20:30 ๐Ÿ”— chronomex paaaanic
20:30 ๐Ÿ”— db48x :)
20:36 ๐Ÿ”— DFJustin yeah got a pretty good dump here on vancouver island
20:36 ๐Ÿ”— chronomex didnt know you were closeby
20:41 ๐Ÿ”— closure http://blumenauer.house.gov/ lol
20:42 ๐Ÿ”— db48x who is archiving blackout messages?
20:42 ๐Ÿ”— nitro2k01 https://twitter.com/#!/herpderpedia
20:43 ๐Ÿ”— db48x nitro2k01: heh, awesome
20:45 ๐Ÿ”— yipdw db48x: me
20:45 ๐Ÿ”— yipdw db48x: got a few thousand SOPA-related news stories, some blackout pages
20:46 ๐Ÿ”— yipdw I also just got a Zeiss CP.2 85mm T/2.1 for a rental
20:46 ๐Ÿ”— yipdw and am kind of freaking out because the lens is fucking awesome
20:46 ๐Ÿ”— db48x heh
20:47 ๐Ÿ”— db48x hrm
20:47 ๐Ÿ”— db48x anesthesia is wearing off already
20:49 ๐Ÿ”— nitro2k01 How many teeth?
20:50 ๐Ÿ”— db48x 4
20:50 ๐Ÿ”— nitro2k01 Wisdom?
20:50 ๐Ÿ”— db48x yea
20:50 ๐Ÿ”— nitro2k01 Ah. Have fun!
20:50 ๐Ÿ”— db48x three impacted, one that is useless without the one underneath it
20:51 ๐Ÿ”— tef db48x: me too.
20:52 ๐Ÿ”— db48x good
20:52 ๐Ÿ”— tef i've got ~200 M or so but mostly dupes :/ I'm doing a fresh crawl now and a +1 from hacker news
20:53 ๐Ÿ”— tef from the ones in progress I have 78M and 87M
20:53 ๐Ÿ”— tef of stuff
20:53 ๐Ÿ”— tef approx
20:54 ๐Ÿ”— db48x not bad
20:54 ๐Ÿ”— tef hmm that's including the db uncompressed
20:54 ๐Ÿ”— yipdw so far, 4.1 gigs of SOPANews
20:54 ๐Ÿ”— yipdw I'm going to guess that like tef there's a shitload of dupes in there
20:55 ๐Ÿ”— tef I have dupes across crawls but not within them I think
20:55 ๐Ÿ”— tef db48x: yipdw is using wget and i'm using work resources. I'm capturing some of the js wget is missing, etc.
20:57 ๐Ÿ”— tef I could probably fire up more machines next time but that's a sort of a work in progress
20:57 ๐Ÿ”— yipdw heh, next time
20:57 ๐Ÿ”— yipdw we need a catchy acronym for the next bil
20:57 ๐Ÿ”— yipdw l
20:57 ๐Ÿ”— tef I think my boss is looking to expose api access to make this sort of thing easier
20:57 ๐Ÿ”— yipdw INNOVATE
20:57 ๐Ÿ”— yipdw International, uh
20:58 ๐Ÿ”— tef if I had s3 write support for archive.org it would make things a lot easier
20:58 ๐Ÿ”— tef I could just make my crawler upload to them instead of amazon
20:59 ๐Ÿ”— yipdw heh
20:59 ๐Ÿ”— yipdw https://twitter.com/#!/jimmy_wales/status/159737306419433472
21:00 ๐Ÿ”— yipdw oh, and http://uncyclopedia.wikia.com/wiki/Main_Page
21:02 ๐Ÿ”— closure s3 write for archive.org is not too hard
21:03 ๐Ÿ”— tef yeah I just signed up for an api key
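
A rough sketch of what such an upload request could look like, built with the standard library and not sent anywhere. The endpoint, the `LOW key:secret` authorization format and the auto-make-bucket header are stated here from memory of archive.org's S3-like interface and should be treated as assumptions to verify against its documentation; the function name and item identifier are hypothetical.

```python
import urllib.request

# Assumption: archive.org's S3-like endpoint; check the current docs.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_ia_upload_request(identifier, filename, data, access_key, secret_key):
    """Build (but do not send) a PUT request for one file in one item."""
    url = f"{IA_S3_ENDPOINT}/{identifier}/{filename}"
    headers = {
        # Assumed header formats, as described in the lead-in.
        "Authorization": f"LOW {access_key}:{secret_key}",
        "x-amz-auto-make-bucket": "1",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, ideally with retries around it.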
21:04 ๐Ÿ”— tef hmmm
21:05 ๐Ÿ”— db48x this anesthesia didn't last as long as I was told
21:05 ๐Ÿ”— tef I could make an irc bot that does a page capture and uploads the warc to archive.org :V
21:05 ๐Ÿ”— yipdw that'd actually be pretty useful
21:05 ๐Ÿ”— yipdw since that's the main way these are being found
21:05 ๐Ÿ”— db48x I like the one at thedailywtf.com
21:06 ๐Ÿ”— closure huh.. that would be a neat service for here
21:06 ๐Ÿ”— tef well irc does make a great command and control structure
21:07 ๐Ÿ”— tef can always make an archiveteam-bots
21:07 ๐Ÿ”— tef channel with more than one bot to service requests
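
The request-handling half of such a bot could start from something like this: parse a raw IRC `PRIVMSG` line and accept only a single well-formed command. The `!archive` command name and the function are hypothetical, and the capture and upload steps are left out.

```python
import re

# Raw IRC line shape: ":nick!user@host PRIVMSG #channel :message text"
PRIVMSG_RE = re.compile(
    r"^:(?P<nick>[^!\s]+)!\S+ PRIVMSG (?P<target>\S+) :(?P<text>.*)$"
)


def parse_archive_request(line):
    """Return (nick, target, url) for '!archive <url>', otherwise None."""
    match = PRIVMSG_RE.match(line.rstrip("\r\n"))
    if not match:
        return None
    parts = match.group("text").split()
    if (
        len(parts) == 2
        and parts[0] == "!archive"
        and parts[1].startswith(("http://", "https://"))
    ):
        return match.group("nick"), match.group("target"), parts[1]
    return None
```

Rejecting anything except one command and one URL keeps the bot from being driven by ordinary chatter that happens to contain links.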
21:12 ๐Ÿ”— tef heh
21:12 ๐Ÿ”— tef I did seriously consider using irc as a messagebus at one point
21:16 ๐Ÿ”— tef I think i'll need the right credentials first / bucket details but I could likely get an irc bot up later this weekend
21:21 ๐Ÿ”— db48x odd
21:21 ๐Ÿ”— db48x my private xmpp chat room is bouncing up and down
21:29 ๐Ÿ”— yipdw http://www.flickr.com/photos/jcn/6721179703/sizes/l/in/photostream/
21:30 ๐Ÿ”— db48x :)
21:45 ๐Ÿ”— db48x does anyone here know how the openlibrary.org source code is organized?
22:01 ๐Ÿ”— yipdw http://hq.deviantart.com/journal/Join-a-SOPA-and-PIPA-debate-280023798
22:01 ๐Ÿ”— yipdw er
22:01 ๐Ÿ”— yipdw http://sakimichan.deviantart.com/art/STOP-SOPA-Bill-276510440
22:38 ๐Ÿ”— db48x hrm
22:38 ๐Ÿ”— db48x I'm having trouble concentrating
22:40 ๐Ÿ”— SketchCow Been there
22:41 ๐Ÿ”— SketchCow yipdw: I'd love those pages when you're done.
22:41 ๐Ÿ”— SketchCow How are you doing it?
22:41 ๐Ÿ”— yipdw I'm periodically scraping Google News for links
22:41 ๐Ÿ”— SketchCow Are you using WGET or Heritrix?
22:41 ๐Ÿ”— yipdw and then spawning an assload of wget-warc processes
22:42 ๐Ÿ”— SketchCow Oh, excellent.
22:42 ๐Ÿ”— SketchCow Everyone here is happy.
22:42 ๐Ÿ”— yipdw I can substitute Heritrix
22:42 ๐Ÿ”— SketchCow No no! Use wget-warc
22:42 ๐Ÿ”— yipdw heh, ok
22:42 ๐Ÿ”— yipdw I haven't yet verified the integrity of all the WARCs; it's entirely possible that wget-warc is tripping up on some of them
22:42 ๐Ÿ”— yipdw I have spot-checked a few and those do look okay though
22:43 ๐Ÿ”— underscor 2.2656 TiB RAM
22:43 ๐Ÿ”— underscor 54.0039 TiB Disk
22:43 ๐Ÿ”— underscor 826 Virtual CPUs
22:43 ๐Ÿ”— underscor [5:42:22 PM EDT] Andy Bezella: fyi worker farm is:
22:43 ๐Ÿ”— underscor Interesting statistics on the IA workers
22:44 ๐Ÿ”— yipdw actually, one problem I have periodically hit is wget-warc just...freezing up
22:44 ๐Ÿ”— yipdw there's no indication in the log that anything's wrong
22:45 ๐Ÿ”— yipdw oh wait, I just noticed that I have a shitload of TCP connections open, I bet that's it
22:45 ๐Ÿ”— yipdw weird
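
One way to keep the number of simultaneous connections in check is to cap how many wget processes run at once and give each a timeout. The sketch below is not yipdw's script: the function names, the output naming scheme and the timeout values are assumptions, and it presumes a wget build that supports `--warc-file`.

```python
import hashlib
import subprocess
from concurrent.futures import ThreadPoolExecutor


def wget_warc_command(url):
    """Build a wget command line that writes one WARC per seed URL."""
    name = "sopa-" + hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return [
        "wget",
        "--page-requisites",   # also fetch the CSS, JS and images the page needs
        "-e", "robots=off",    # do not let robots.txt block the capture
        "--timeout=30",        # avoid hanging forever on a dead connection
        "--tries=2",
        "--warc-file=" + name,
        url,
    ]


def crawl(urls, max_workers=4, runner=subprocess.run):
    """Run at most max_workers wget processes at any one time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: runner(wget_warc_command(u)), urls))
```

The `runner` parameter is there so the scheduling can be tried out without actually launching wget.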
23:07 ๐Ÿ”— SketchCow Regardless, you're a hero, yipdw
23:15 ๐Ÿ”— NovaKing https://static.thepiratebay.org/legal/sopa.txt
23:58 ๐Ÿ”— Coderjoe http://www.wired.com/threatlevel/2012/01/scotus-re-copyright-decision/
