Why Wikipedia went down at midnight

[00:02] so [00:02] I'm gonna start archiving all news stories re: Wikipedia blackout [00:06] http://www.cbsnews.com/8301-205_162-57360291/google-plans-to-use-home-page-to-protest-sopa/ [01:16] some guy from luxembourg is uploading old glenn beck episodes at full speed [01:16] :-D [01:16] i want to hug the guy [01:17] LOL [01:17] why, you like glenn beck, or you like the archival value of them? [01:17] xD [01:19] just what to archive thme [01:19] *them [01:19] Glenn Beck is my favorite comedian [01:19] there is like 6 months of it [01:20] if you're going to allow spam videos to get uploaded, it's kind of hard to discriminate against any particular variety [01:21] this luxembourg guy is now seeding 4 episodes [01:22] this is just crasy [01:22] it use to be like dialup speeds before [01:23] also look up george soros [02:41] yipdw: fwiw, warc2warc doesn't truncate the records now at least. going to put the wget workaround in now as an option [03:05] yipdw: python warc2warc.py -D -Z --wget-chunk-fix foo.warc.gz > bar.warc.gz [03:05] should decompress (if needed), decode each http message (removing content-encoding/transfer-encoding), deal with the corrupt wget output, and recompress it record by record [03:13] hoooray! [03:14] tef: are you the tef that's on somethingawful? [03:14] yes [03:14] oh! [03:14] *wave* [03:14] I was going to ask if you were the Ymgve on sa [03:14] I like your av :3 if it's the cos(...) one iirc [03:15] I'm basically the only Ymgve on the internet, actually picked the name mainly for uniqueness [03:15] what about my new mandelbrot avatar? [03:16] surprised it didn't exceed the size limits [03:16] ah yes that owns [03:16] I have a thing for fractals [03:17] I wrote a js1k which uses a hilbert curve to draw a mandelbrot [03:17] http://secretvolcanobase.org/~tef/js1k.html [03:17] this makes me so happy [03:18] hah, didn't know you could do that, cool! 
[03:28] well I use it for enqueing progressive rendering of the fractal at higher resolutions [03:29] so I draw larger and larger hilbert curves and render the points as blocks until it's ~ 1px small [04:30] tef: awesome, thanks [04:30] ^ÂÊÎÔÛâêîôûĈĉĜĝĤĥĴĵŜŝŴŵŶŷˆḒḓḘḙḼḽṊṋṰṱṶṷẐẑẤấẦầẨẩẪẫẬậẾếỀềỂểỄễỆệỐốỒồỔổỖỗỘộ⨣⨶⩯ꞈ＾ [04:32] hmm [04:32] about 1200 SOPA/Wikipedia stories coming in [04:42] yipdw: are you going to be warcing any sites like wikipedia? Already has a pretty historic banner up [04:43] I read some google person encouraging sites to make their blackout be a 403, or some such code, so it's *not* archived. [04:43] closure: already did [04:43] well [04:43] I got their announcement at least [04:44] I guess I could grab en.wikipedia.org/wiki/Main_Page or something [04:44] they've depolyed their face of jimbo banner technology for good, at last :P [04:44] heh [04:45] ok, I got Main Page [04:45] https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB [04:45] I guess I can grab the blackout too [04:46] so he suggested a 503 code [04:49] heh [04:49] seems reasonable [04:49] I just went looking for other status codes...on Wikipedia [04:49] CAN'T DO THAT IN TEN MINUTES [04:49] aside from not being archived [04:49] http://en.wikipedia.org/wiki/List_of_HTTP_status_codes [04:50] remember, in 10 minutes, es.wikipedia.org will still be there :) [04:50] http://es.wikipedia.org/wiki/Anexo:C%C3%B3digos_de_estado_HTTP [04:50] sweet [04:55] hee [04:55] just memorize the http status cats [04:57] http://www.flickr.com/photos/girliemac/6509400997/ [04:59] http://tricities.craigslist.org/ is already black [04:59] hmm, they leave a link to the real site [05:00] woot, wikipedia is black [05:00] really nice design too [05:01] Oh, it's just an overlay trick [05:01] heh, their Learn More link is buggy.. 
it links to a wiki page that's blacked out :) [05:01] wikipedia just went dark check it out [05:02] closure: hahaha [05:02] yeah, overlay is a good way to do it, although you do see the real page flash [05:03] I think they need to tweak their overlay JS (or whatever) to not show on that page [05:03] yep [05:03] or they could link to the eff's page [05:03] lol the learn more link [05:03] classy [05:03] http://www.fsf.org/ black [05:04] hmmm [05:04] I can make warcs with the crawler from work. [05:04] fixed [05:04] and has a title that's binary for some reason, heh [05:04] 2012-01-18 01:04:52 ERROR 503: Service Temporarily Unavailable. [05:04] that's wget.. [05:05] wonder what's the switch to archive despite failing status code [05:05] heh, wikipedia fixed their link :) [05:08] woot [05:08] I blocked the blackout JS. I can now use the english wikipedia just fine [05:09] (since I already know about sopa and have written my critters about it, I don't think I need to see the blackout overlay) [05:09] already lead story on CNN [05:10] well I have a warc of wikipedia.org's front page now [05:10] should have all the css/js too [05:10] as a plus, I probably can modify this to block the jimboface banners, too [05:11] tef: hey me too! 
:) [05:11] (I added this to my adblock filter list, with "at start" anchoring: http://meta.wikimedia.org/w/index.php*title=Special:BannerLoader*banner=blackout ) [05:11] I do wish that they really just turned off the wiki software, or set up a redirect [05:12] but maybe they just couldn't do that across all their app servers [05:12] you can also just browse it in w3m :P [05:13] I am hardcore and browse it via gunzip -c *.warc.gz [05:13] oooh, nice on google [05:13] haha [05:13] enormous censorship bar of dooom [05:13] best google logo evar [05:13] time to grab that too [05:14] yipdw: I caught http://pastebin.com/NWiV4yEV [05:15] I wonder if they are blocking the english site on the secure site [05:16] tef: https://gist.github.com/7d787f4487a80b354d8f [05:16] come to think of it [05:16] I'm not sure I got what I actually went for [05:16] one moment [05:17] oh [05:17] I forgot to turn off robots.txt obey stuff [05:18] weird [05:18] you didn't get http://upload.wikimedia.org/wikipedia/commons/9/98/WP_SOPA_Splash_Full.jpg [05:18] yeah [05:18] hooray javascript? [05:18] http://thepiratebay.org/ heh, I didn't see that blackout coming [05:18] I don't think wget's seeing it [05:19] but whatever, you got a copy [05:19] grabbed [05:19] guess I'll grab www.google.com and the landing page [05:19] might have better luck there [05:21] i'm sorta glad the thing I made for work is capturing more :3 [05:22] yay archiving [05:23] tef: haha [05:24] http://thedailywtf.com/Articles/Support-The-Daily-WTF-in-Supporting-the-Support-SOPA-Movement.aspx [05:24] wins IMHO by dragging BBSes into this [05:26] tef: can your software include embedded videos? [05:26] sorta [05:26] there's a few on https://www.google.com/landing/takeaction/sopa-pipa/ that I'm trying to get wget to somehow get [05:26] but without knowledge of YouTube's system I don't think it can really do it [05:27] http://www.metafilter.com/ black [05:27] maybe we should make a list [05:28] pls! 
[05:28] yipdw: piratepad or the wiki? [05:28] http://boingboing.net/ black [05:29] recommend a pad over wiki while it's still being updated [05:29] http://archiveteam.org/index.php?title=SOPA_blackout_pages [05:29] works here [05:30] if you like edit conflicts and captchas [05:30] FINE [05:31] wow, check view source on boing boing [05:31] http://piratepad.net/I42hnyc0zk [05:33] blah, I just got disconnected [05:33] constantly [05:33] i think there are other sites that run the etherpad sw [05:33] just i don't know of any of them off the top of my head [05:34] http://beta.etherpad.org/p/KDVzcBCKTj [05:35] 4chan is sensored too [05:35] go congress [05:36] no it's not [05:36] go to any section [05:36] ./g/ is [05:36] all the text is black until you mouse over it [05:36] oh, i see.. I actually have to eeek [05:36] censoring the text on 4chan is like.. like.. words fail me [05:39] http://wordpress.com/ [05:40] can we drop the * thing [05:40] it is hard to copy paste the urls out [05:40] and dump them into a script :v [05:40] lol [05:40] we can [05:41] awesome [05:42] oh sadface [05:42] wget doesn't retrieve page requisites if retrieval of a URL returns 503 [05:42] I wonder if I can force it to follow [05:42] http://sopastrike.com/on-strike/ [05:42] don't really want to write custom tools to do this if tef already has them :P [05:44] the sopastrike page is crashing chromium [05:44] ok.. that's a list [05:44] it's not all though [05:44] yeah but my tools are well, flakey [05:44] indeed not [05:45] haha [05:45] " [05:45] "Chromium's connection attempt to www.twitter.com was rejected. The website may be down, or your network may not be properly configured. 
[05:45] FUCK BLACKOUT, WE'RE GOING OFFLINE [05:46] (probably just my link though) [05:47] oh wait there it goes [05:47] yeah twitter's ceo guy was being a dbag about the sopa stuff [05:47] calling it idiotic or something [05:48] nah, he was just referring to Twitter specifically [05:48] twitter isn't going down for anything but mismanagement [05:48] well, twitter not going down for a day? wikipedia is effectively [05:48] I meant his tweet referred to Twitter going offline as foolish, not Wikipedia or anyone else [05:49] ah yeah [05:50] [05:50] Why Wikipedia went down at midnight - CNN.com [05:51] yipdw: Imgur Tor Project Miro iSchool at Syracuse University Oreilly.com Wikipedia Reddit Mozilla WordPress.org icanhazCheezburger Network MoveOn.org Good Old Games TwitPic Minecraft Free Press Mojang XDA Developers Destructoid Good.is [05:52] lol [05:52] we'll report on it.. but it didn't happen [05:52] at least twitpic is i gues [05:56] arrith: ? [05:56] yipdw: those sites were said to be doing a blackout thing [05:56] oh [05:56] add them to the list [06:00] ahahah I had a stupid bug [06:02] twitpic only doing logo by looks of it [06:02] aww sonuvabitch [06:02] http://savannah.gnu.org/bugs/index.php?20417 [06:02] i'm only doing logo on my site [06:02] that bug deals *directly* with wget's 503 behavior [06:02] and I can't access it [06:02] ... [06:02] fuck. [06:04] omg omg i hope you guys archived wikipedia because they took it down *ducks* [06:04] nice [06:04] ps: ?banner=0 [06:04] will take off the banner [06:07] hey guys! [06:07] you... are... awesome! [06:09] NovaKing: please provide a full url that works for that bug page? [06:10] yeah, I can't get to it [06:10] eh? 
sorry, that was in relation to wikipedia [06:11] as for 503 [06:11] oh [06:11] that is due to google crawlers [06:11] to not index the blackout page [06:11] you might have to get wget source, and remove the 503 section [06:12] there isn't one [06:12] it's probably the generic failure code handling code [06:13] there isn't one? [06:13] ...? [06:13] I cannot find a failure handler that deals with HTTP 503, no [06:14] there may be a generic one [06:14] I'm trying to find it [06:15] in fact, here's something weird in wget trunk: [06:15] $ grep -nR HTTP_STATUS_UNAVAILABLE * [06:15] src/http.c:131:#define HTTP_STATUS_UNAVAILABLE 503 [06:15] that's it [06:15] it's not used anywhere else [06:15] well, at least not straight off [06:16] could be some preprocessor tricks [06:17] I think it's only there for completeness [06:17] I was in the same place, didn't get any further [06:21] anyone having trouble connecting to twitter? [06:21] I thought they weren't participating [06:21] they participate at random thruought the year [06:21] hahah [06:21] in total it comes to a day anyway [06:22] https://twitter.com/herpderpedia lolz [06:24] worksforme [06:24] but they might be buckling under the "OMG SHIT IS DOWN" posts [06:25] twitter can't go offline [06:25] my mom told me they were a cloud and clouds never go offline [06:25] hahahah [06:26] yeah [06:26] it's the load of #wikipedia [06:26] damn [06:27] well [06:27] Those comments give me no hope for humanity [06:27] it's probably a good thing that Twitter dumps its tweets in the LoC [06:27] because we have no hope of actually archiving any of that, short of a direct feed to Twitter's message brokers [06:27] haha [06:28] mostly because the Javascript on New Twitter makes it such a fucking bear to wrestle [06:28] 19 new tweets in the last minute [06:28] 36 in 2m [06:29] kinda surprised by all the spanish tweets [06:29] google://time in spain => 7:29am; consider the impact of waking up with your coffee in your hand, and seeing half the 
internetz blacked out [06:31] ok [06:31] so [06:31] they're mentioning the #wikipedia hash tag, though [06:31] how to actually archive these [06:31] short of "let tef do it" [06:31] and only en was supposed to go black [06:35] https://twitter.com/#!/_ItsONLY1_KEN/status/159510766695878656 [06:39] hmm now wondering how to fix the 503 nonsense [06:39] well it gets captured sort of [06:42] I don't think http://sopastrike.com/on-strike/ is accurate at all [06:42] I think it's full of crap and few sites on strike, and a few that will be later [06:43] 4chan's got an odd blackout [06:43] the site still works, just the text is black on black [06:44] oh fuck [06:44] etherpad.org died [06:44] did someone archive it :P [06:46] ah ok it's back [06:48] Sopastrike seems to have a lot of cybersquatting websites [06:48] and facebook profilez [06:48] basically they let anyone put their link up there I guess [06:50] added http://www.qwantz.com/index.php [06:53] oh, cool [06:53] http://blog.nearlyfreespeech.net/2012/01/18/sopa-blackout-option/ [07:01] uuughghu [07:05] right and I have patched the 503 errors [07:05] in wget or another tool? 
[07:05] in another tools [07:05] I think I found where it's falling through [07:05] oh ok [07:05] work stuffs :/ [07:06] that's fine [07:06] so long as SOMEONE has a tool to grab this [07:06] well i'm running it now [07:06] seems to be surviving [07:12] ah ha [07:12] patched wget [07:14] https://gist.github.com/1631756 [07:14] that can be applied against wget bzr 2574 [07:15] I think it works [07:16] hurrah [07:16] well this crawl should end soonish I think [07:22] heh [07:22] it's picking up links under the blackout js and the blackout js [07:24] yeah [07:25] I'm not sure if the patched wget is handling wikipedia right [07:25] but as you've got a grab of that [07:25] its' fine [07:28] what the [07:28] http://twitter.com/JOIN__US/status/159537070338093056 [07:29] got about 200 resources now [07:31] http://twitter.com/Eastern_Star_/status/159537964865699840 [07:31] stupid [07:31] 2000 even [07:32] gonna try archiving the wtf wikipedia tweets [07:32] good luck with that [07:33] twitter is awful [07:33] I'm probably just going to have to bang their REST API [07:33] have fun [07:33] 20-40 tweets per minute on the two searches I have open [07:36] about the same on #stopsopa #sopa and #wikipedia [07:38] about the same on #wikipediablackout [07:38] #wikistrike hasn't moved in awhile [07:38] ugh [07:38] actually fuck that [07:39] I'll just continue running my news crawlers [07:40] people are discovering that the mobile site still works [07:52] http://twitter.com/HWGVictor/status/159543226984955905 [07:54] another blackout: http://cinematictitanic.com/sopa.html [07:55] well, the front page redirects to the sopa page [07:55] http://flowingdata.com/2012/01/17/watching-wtf-wikipedia-as-sopapipa-blackout-begins/ [07:55] heh [07:55] google isn't giving me the blackout [07:55] cos i'm on aws [07:57] https://github.com/zachstronaut/stop-sopa [07:57] oh, huh [08:05] heh [08:05] http://online.wsj.com/article/SB10001424052970203471004577142893718069820.html [08:06] "Rather, ..." 
[08:06] darn preview [08:10] lots of non-retweet repetition on #wikipediablackout: I support #wikipediablackout! Show your support here (tinyurl link) [08:11] new tending: #NoALaPincheSOPA [08:11] http://reedmorse.com/tmp/sopa-adwords.png can anyone who doesn't block ads confirm google is making anti-sopa adwords? [08:12] #NOALAPINCHESOPA hey hey hey you never say no to soup okaay? [08:12] closure: yes, there are ads for www.google.com/takeaction [08:12] if you search for "sopa" anyway [08:12] er, on Google [08:13] I tuned my adblock off and went to techcrunch and got the same google sopa banner [08:13] only there or other sites tho [08:13] they show up on Google search results too [08:13] techcrunch could have changed it.. if it's everywhere, that'd be huge [08:14] well, when I turn adblock back on, it goes away [08:14] give me another site with adwords [08:15] no idea, that's why I asked :) [08:16] it's also on ytmnd [08:16] (in an ad slot, not a site) [08:17] both ad slots, actually [08:18] geh [08:18] #stopsopa has 339 new tweets in the last 27 minutes or so [08:18] 335 new #sopa in 25 minutes [08:19] 176 new #wikipedia in 14 minutes [08:19] 159 new #wikipediablackout in 10 minutes [08:22] must be lots of school papers [08:28] well, I am really pissing off firefox [08:29] in addition to all those search pages I had open, I just opened those two trackers. told revisit to grab 1000 tweets [08:30] oops [08:30] closure: what was the etherpad link? [08:30] I just closed it and can't find it in Chromium's history [08:31] oh wait, I can undo [08:31] spot says 373 "wtf wikipedia" tweets per hour [08:31] yipdw: http://beta.etherpad.org/p/KDVzcBCKTj [08:31] thanks [08:32] a number of the ones spot is highlighting are "do this twitter search and laugh at dumb people freaking out" [08:32] http://theoatmeal.com lol [08:34] yipdw: where does a site go if its just changed its banner? 
but isn't really down [08:34] "raging vagina tractors" [08:34] woot to whoever noticed xmonad.org [08:35] * closure strokes his 300 line .xmonadrc [08:35] that is one long gif [08:36] courtesy of spot: http://twitter.com/WEIQINGZ/status/159551680587898880 [08:37] who keeps claiming I have WARCs of all of those :P [08:39] ok now I kinda do [09:09] new stats at about 50 minutes: #wikipedia 650, #wikipediablackout 655, #sopa 673, #stopsopa 676 [09:10] oh man [09:10] this looks to be silly [09:11] #FactsWithoutWikipedia [09:11] Weaves are made from abandoned foetuses. And you wondered why they rubbed you the wrong way, huh? #FactswithoutWikipedia [09:12] During the selection process for a new pope in the event of a tie, it is settled with a game of conkers #factswithoutwikipedia [09:15] Tiger woods owns 4 brothels #FactsWithoutWikipedia [09:15] The Earth is not spherical, it is actually a rectangular prism. #FactsWithoutWikipedia [09:24] i can imagine the mess when the people in the states wake up in a few hours [09:28] http://theoatmeal.com/sopa [09:28] that is excellent [09:31] "P.S. Please pirate the shit out of this animated GIF. " [09:34] yeah... #FactsWithoutWikipedia is going fast... 20 tweets in the last minute [09:36] Dubstep is the cure to diabetes #factswithoutwikipedia [09:36] well shit [09:38] 1 of them is from a collueage [09:39] oh [09:39]
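
Editor's note, not part of the log: the 503 problem discussed in the channel (blackout pages served with "503 Service Temporarily Unavailable", which wget reports as an error and then skips the page requisites) can be illustrated with a minimal sketch. It assumes Python's standard urllib rather than wget or any of the patched tools mentioned above. The names fetch_even_if_error and fake_opener are hypothetical and exist only for this illustration. The point is simply that an error response still carries a body, so an archiver can choose to keep it instead of discarding it.

```python
import io
import urllib.error
import urllib.request


def fetch_even_if_error(url, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes), keeping the body of error responses.

    `opener` is a hypothetical hook so the function can be exercised offline;
    by default it is the real urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the body of a
        # 503 page (for example a blackout notice) is still readable.
        return err.code, err.read()


if __name__ == "__main__":
    # Offline demonstration: simulate a server answering 503 with a page body.
    def fake_opener(url):
        raise urllib.error.HTTPError(
            url, 503, "Service Temporarily Unavailable", {},
            io.BytesIO(b"<html>blackout notice</html>"),
        )

    status, body = fetch_even_if_error("http://example.invalid/", opener=fake_opener)
    print(status, len(body))
```

This only saves the single response. Fetching the CSS, JS and images that the page refers to, which is what the wget patch in the log is about, would still need separate handling.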