Time |
Nickname |
Message |
00:02
๐
|
yipdw |
so |
00:02
๐
|
yipdw |
I'm gonna start archiving all news stories re: Wikipedia blackout |
00:06
๐
|
NovaKing |
http://www.cbsnews.com/8301-205_162-57360291/google-plans-to-use-home-page-to-protest-sopa/ |
01:16
๐
|
godane |
some guy from luxembourg is uploading old glenn beck episodes at full speed |
01:16
๐
|
godane |
:-D |
01:16
๐
|
godane |
i want to hug the guy |
01:17
๐
|
balrog |
LOL |
01:17
๐
|
balrog |
why, you like glenn beck, or you like the archival value of them? |
01:17
๐
|
balrog |
xD |
01:19
๐
|
godane |
just what to archive thme |
01:19
๐
|
godane |
*them |
01:19
๐
|
nitro2k01 |
Glenn Beck is my favorite comedian |
01:19
๐
|
godane |
there is like 6 months of it |
01:20
๐
|
dashcloud |
if you're going to allow spam videos to get uploaded, it's kind of hard to discriminate against any particular variety |
01:21
๐
|
godane |
this luxembourg guy is now seeding 4 episodes |
01:22
๐
|
godane |
this is just crazy |
01:22
๐
|
godane |
it used to be like dialup speeds before |
01:23
๐
|
godane |
also look up george soros |
02:41
๐
|
tef |
yipdw: fwiw, warc2warc doesn't truncate the records now at least. going to put the wget workaround in now as an option |
03:05
๐
|
tef |
yipdw: python warc2warc.py -D -Z --wget-chunk-fix foo.warc.gz > bar.warc.gz |
03:05
๐
|
tef |
should decompress (if needed), decode each http message (removing content-encoding/transfer-encoding), deal with the corrupt wget output, and recompress it record by record |
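A rough sketch of two of the steps described above: undoing `Transfer-Encoding: chunked` on an HTTP body, and recompressing record by record so each record becomes its own gzip member. This is not warc2warc's actual code; `dechunk` and `gzip_per_record` are hypothetical names, and a real tool would also have to rewrite the HTTP and WARC headers (lengths, digests) to match.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Undo HTTP chunked transfer-encoding (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hex; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # the zero-length chunk ends the body
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk
    return bytes(out)


def gzip_per_record(records):
    """Compress each record as a separate gzip member and concatenate them."""
    return b"".join(gzip.compress(record) for record in records)
```

Because each record is a complete gzip member, a reader can seek to a record's offset and decompress only that record, while `gunzip -c` on the whole file still yields every record in order.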
03:13
๐
|
tef |
hoooray! |
03:14
๐
|
Ymgve |
tef: are you the tef that's on somethingawful? |
03:14
๐
|
tef |
yes |
03:14
๐
|
Ymgve |
oh! |
03:14
๐
|
Ymgve |
*wave* |
03:14
๐
|
tef |
I was going to ask if you were the Ymgve on sa |
03:14
๐
|
tef |
I like your av :3 if it's the cos(...) one iirc |
03:15
๐
|
Ymgve |
I'm basically the only Ymgve on the internet, actually picked the name mainly for uniqueness |
03:15
๐
|
Ymgve |
what about my new mandelbrot avatar? |
03:16
๐
|
Ymgve |
surprised it didn't exceed the size limits |
03:16
๐
|
tef |
ah yes that owns |
03:16
๐
|
tef |
I have a thing for fractals |
03:17
๐
|
tef |
I wrote a js1k which uses a hilbert curve to draw a mandelbrot |
03:17
๐
|
tef |
http://secretvolcanobase.org/~tef/js1k.html |
03:17
๐
|
tef |
this makes me so happy |
03:18
๐
|
Ymgve |
hah, didn't know you could do that, cool! |
03:28
๐
|
tef |
well I use it for enqueuing progressive rendering of the fractal at higher resolutions |
03:29
๐
|
tef |
so I draw larger and larger hilbert curves and render the points as blocks until it's ~ 1px small |
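The idea can be sketched in Python roughly as follows. This is not the js1k code: `hilbert_d2xy` is the standard distance-to-coordinate mapping for a Hilbert curve, while `progressive_blocks`, the viewport bounds, and the iteration limit are assumptions chosen for illustration.

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:  # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


def escape_time(c: complex, max_iter: int = 50) -> int:
    """Iterations before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return i
    return max_iter


def progressive_blocks(size: int):
    """Yield (px, py, block, iterations), coarse passes first, down to 1px.

    size is assumed to be a power of two.
    """
    order = 1
    while (1 << order) <= size:
        n = 1 << order
        block = size // n
        for d in range(n * n):
            gx, gy = hilbert_d2xy(order, d)
            # Sample the block centre in the region [-2, 1] x [-1.5, 1.5].
            cx = -2 + 3 * (gx + 0.5) / n
            cy = -1.5 + 3 * (gy + 0.5) / n
            yield gx * block, gy * block, block, escape_time(complex(cx, cy))
        order += 1
```

A renderer would fill a `block` x `block` square at `(px, py)` for each tuple; later passes overwrite earlier ones with finer detail, and the curve keeps consecutive blocks next to each other on screen.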
04:30
๐
|
yipdw |
tef: awesome, thanks |
04:30
๐
|
underscor |
^รยรยรยรยรยรยขรยชรยฎรยดรยปรยรยรยรยรยครยฅรยดรยตร
ยร
ยร
ยดร
ยตร
ยถร
ยทรยรยญรยรกยทยรกยธยรกยธยรกยธยรกยธยรกยธยผรกยธยฝรกยนยรกยนยรกยนยฐรกยนยฑรกยนยถรกยนยทรกยบยรกยบยรกยบยครกยบยฅรกยบยฆรกยบยงรกยบยจรกยบยฉรกยบยชรกยบยซรกยบยฌรกยบยญรกยบยพรกยบยฟรกยปยรกยปยรกยปยรกยปยรกยปยรกยปย
รกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรกยปยรขยจยฃรขยจยถรขยฉยฏรชยยรฏยผยพรณย ยย |
04:32
๐
|
yipdw |
hmm |
04:32
๐
|
yipdw |
about 1200 SOPA/Wikipedia stories coming in |
04:42
๐
|
closure |
yipdw: are you going to be warcing any sites like wikipedia? Already has a pretty historic banner up |
04:43
๐
|
closure |
I read some google person encouraging sites to make their blackout be a 403, or some such code, so it's *not* archived. |
04:43
๐
|
yipdw |
closure: already did |
04:43
๐
|
yipdw |
well |
04:43
๐
|
yipdw |
I got their announcement at least |
04:44
๐
|
yipdw |
I guess I could grab en.wikipedia.org/wiki/Main_Page or something |
04:44
๐
|
closure |
they've deployed their face of jimbo banner technology for good, at last :P |
04:44
๐
|
yipdw |
heh |
04:45
๐
|
yipdw |
ok, I got Main Page |
04:45
๐
|
closure |
https://plus.google.com/115984868678744352358/posts/Gas8vjZ5fmB |
04:45
๐
|
yipdw |
I guess I can grab the blackout too |
04:46
๐
|
closure |
so he suggested a 503 code |
04:49
๐
|
yipdw |
heh |
04:49
๐
|
yipdw |
seems reasonable |
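For illustration, a minimal sketch of a blackout page served with a 503 status, using only Python's standard library. The `Retry-After` header, the one-day value, and the names `BlackoutHandler`, `make_server`, and `probe` are assumptions added here; the log itself only says that a 503 code was suggested.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class BlackoutHandler(BaseHTTPRequestHandler):
    """Answer every GET with a 503 blackout page."""

    PAGE = b"<h1>This site is blacked out today.</h1>"

    def do_GET(self):
        self.send_response(503)
        # Assumption: hint that the outage is temporary (one day, in seconds).
        self.send_header("Retry-After", "86400")
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.PAGE)))
        self.end_headers()
        self.wfile.write(self.PAGE)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), BlackoutHandler)


def probe(url: str):
    """Return (status, Retry-After) for url, even when the status is an error."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Retry-After")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Retry-After")
```

A client that treats 503 as a failure, as wget does later in this log, will report the error instead of saving the page, which is the side effect the discussion is about.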
04:49
๐
|
yipdw |
I just went looking for other status codes...on Wikipedia |
04:49
๐
|
yipdw |
CAN'T DO THAT IN TEN MINUTES |
04:49
๐
|
closure |
aside from not being archived |
04:49
๐
|
closure |
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes |
04:50
๐
|
closure |
remember, in 10 minutes, es.wikipedia.org will still be there :) |
04:50
๐
|
yipdw |
http://es.wikipedia.org/wiki/Anexo:C%C3%B3digos_de_estado_HTTP |
04:50
๐
|
yipdw |
sweet |
04:55
๐
|
tef |
hee |
04:55
๐
|
tef |
just memorize the http status cats |
04:57
๐
|
tef |
http://www.flickr.com/photos/girliemac/6509400997/ |
04:59
๐
|
closure |
http://tricities.craigslist.org/ is already black |
04:59
๐
|
closure |
hmm, they leave a link to the real site |
05:00
๐
|
closure |
woot, wikipedia is black |
05:00
๐
|
closure |
really nice design too |
05:01
๐
|
underscor |
Oh, it's just an overlay trick |
05:01
๐
|
closure |
heh, their Learn More link is buggy.. it links to a wiki page that's blacked out :) |
05:01
๐
|
bsmith093 |
wikipedia just went dark check it out |
05:02
๐
|
Coderjoe |
closure: hahaha |
05:02
๐
|
closure |
yeah, overlay is a good way to do it, although you do see the real page flash |
05:03
๐
|
Coderjoe |
I think they need to tweak their overlay JS (or whatever) to not show on that page |
05:03
๐
|
closure |
yep |
05:03
๐
|
closure |
or they could link to the eff's page |
05:03
๐
|
tef |
lol the learn more link |
05:03
๐
|
tef |
classy |
05:03
๐
|
closure |
http://www.fsf.org/ black |
05:04
๐
|
tef |
hmmm |
05:04
๐
|
tef |
I can make warcs with the crawler from work. |
05:04
๐
|
Coderjoe |
fixed |
05:04
๐
|
closure |
and has a title that's binary for some reason, heh |
05:04
๐
|
closure |
2012-01-18 01:04:52 ERROR 503: Service Temporarily Unavailable. |
05:04
๐
|
closure |
that's wget.. |
05:05
๐
|
closure |
wonder what's the switch to archive despite failing status code |
05:05
๐
|
closure |
heh, wikipedia fixed their link :) |
05:08
๐
|
Coderjoe |
woot |
05:08
๐
|
Coderjoe |
I blocked the blackout JS. I can now use the english wikipedia just fine |
05:09
๐
|
Coderjoe |
(since I already know about sopa and have written my critters about it, I don't think I need to see the blackout overlay) |
05:09
๐
|
closure |
already lead story on CNN |
05:10
๐
|
tef |
well I have a warc of wikipedia.org's front page now |
05:10
๐
|
tef |
should have all the css/js too |
05:10
๐
|
Coderjoe |
as a plus, I probably can modify this to block the jimboface banners, too |
05:11
๐
|
yipdw |
tef: hey me too! :) |
05:11
๐
|
Coderjoe |
(I added this to my adblock filter list, with "at start" anchoring: http://meta.wikimedia.org/w/index.php*title=Special:BannerLoader*banner=blackout ) |
05:11
๐
|
yipdw |
I do wish that they really just turned off the wiki software, or set up a redirect |
05:12
๐
|
yipdw |
but maybe they just couldn't do that across all their app servers |
05:12
๐
|
closure |
you can also just browse it in w3m :P |
05:13
๐
|
yipdw |
I am hardcore and browse it via gunzip -c *.warc.gz |
05:13
๐
|
closure |
oooh, nice on google |
05:13
๐
|
yipdw |
haha |
05:13
๐
|
closure |
enormous censorship bar of dooom |
05:13
๐
|
closure |
best google logo evar |
05:13
๐
|
yipdw |
time to grab that too |
05:14
๐
|
tef |
yipdw: I caught http://pastebin.com/NWiV4yEV |
05:15
๐
|
Coderjoe |
I wonder if they are blocking the english site on the secure site |
05:16
๐
|
yipdw |
tef: https://gist.github.com/7d787f4487a80b354d8f |
05:16
๐
|
yipdw |
come to think of it |
05:16
๐
|
yipdw |
I'm not sure I got what I actually went for |
05:16
๐
|
yipdw |
one moment |
05:17
๐
|
yipdw |
oh |
05:17
๐
|
yipdw |
I forgot to turn off robots.txt obey stuff |
05:18
๐
|
tef |
weird |
05:18
๐
|
tef |
you didn't get http://upload.wikimedia.org/wikipedia/commons/9/98/WP_SOPA_Splash_Full.jpg |
05:18
๐
|
yipdw |
yeah |
05:18
๐
|
tef |
hooray javascript? |
05:18
๐
|
closure |
http://thepiratebay.org/ heh, I didn't see that blackout coming |
05:18
๐
|
yipdw |
I don't think wget's seeing it |
05:19
๐
|
yipdw |
but whatever, you got a copy |
05:19
๐
|
tef |
grabbed |
05:19
๐
|
yipdw |
guess I'll grab www.google.com and the landing page |
05:19
๐
|
yipdw |
might have better luck there |
05:21
๐
|
tef |
i'm sorta glad the thing I made for work is capturing more :3 </smug> |
05:22
๐
|
PatC_ |
yay archiving |
05:23
๐
|
yipdw |
tef: haha |
05:24
๐
|
closure |
http://thedailywtf.com/Articles/Support-The-Daily-WTF-in-Supporting-the-Support-SOPA-Movement.aspx |
05:24
๐
|
closure |
wins IMHO by dragging BBSes into this |
05:26
๐
|
yipdw |
tef: can your software include embedded videos? |
05:26
๐
|
tef |
sorta |
05:26
๐
|
yipdw |
there's a few on https://www.google.com/landing/takeaction/sopa-pipa/ that I'm trying to get wget to somehow get |
05:26
๐
|
yipdw |
but without knowledge of YouTube's system I don't think it can really do it |
05:27
๐
|
closure |
http://www.metafilter.com/ black |
05:27
๐
|
yipdw |
maybe we should make a list |
05:28
๐
|
tef |
pls! |
05:28
๐
|
arrith |
yipdw: piratepad or the wiki? |
05:28
๐
|
closure |
http://boingboing.net/ black |
05:29
๐
|
tef |
recommend a pad over wiki while it's still being updated |
05:29
๐
|
yipdw |
http://archiveteam.org/index.php?title=SOPA_blackout_pages |
05:29
๐
|
yipdw |
works here |
05:30
๐
|
closure |
if you like edit conflicts and captchas |
05:30
๐
|
yipdw |
FINE |
05:31
๐
|
closure |
wow, check view source on boing boing |
05:31
๐
|
yipdw |
http://piratepad.net/I42hnyc0zk |
05:33
๐
|
yipdw |
blah, I just got disconnected |
05:33
๐
|
closure |
constantly |
05:33
๐
|
arrith |
i think there are other sites that run the etherpad sw |
05:33
๐
|
arrith |
just i don't know of any of them off the top of my head |
05:34
๐
|
yipdw |
http://beta.etherpad.org/p/KDVzcBCKTj |
05:35
๐
|
PatC_ |
4chan is censored too |
05:35
๐
|
closure |
go congress |
05:36
๐
|
closure |
no it's not |
05:36
๐
|
PatC_ |
go to any section |
05:36
๐
|
PatC_ |
./g/ is |
05:36
๐
|
PatC_ |
all the text is black until you mouse over it |
05:36
๐
|
closure |
oh, i see.. I actually have to eeek |
05:36
๐
|
closure |
censoring the text on 4chan is like.. like.. words fail me |
05:39
๐
|
NovaKing |
http://wordpress.com/ |
05:40
๐
|
tef |
can we drop the * thing |
05:40
๐
|
tef |
it is hard to copy paste the urls out |
05:40
๐
|
tef |
and dump them into a script :v |
05:40
๐
|
closure |
lol |
05:40
๐
|
yipdw |
we can |
05:41
๐
|
tef |
awesome |
05:42
๐
|
yipdw |
oh sadface |
05:42
๐
|
yipdw |
wget doesn't retrieve page requisites if retrieval of a URL returns 503 |
05:42
๐
|
yipdw |
I wonder if I can force it to follow |
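One way around this outside wget, sketched with Python's standard library: keep the body even when the status is 503, then parse it for page requisites. This is neither wget's behavior nor tef's crawler; `fetch_body`, `RequisiteParser`, and `page_requisites` are hypothetical helpers.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteParser(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href") and "stylesheet" in rel:
            self.urls.append(urljoin(self.base_url, attrs["href"]))


def fetch_body(url: str) -> tuple[int, bytes]:
    """Return (status, body); an error status such as 503 still yields its body."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def page_requisites(html: str, base_url: str) -> list[str]:
    """List absolute URLs of the resources the page needs to render."""
    parser = RequisiteParser(base_url)
    parser.feed(html)
    return parser.urls
```

A static parse like this only sees resources named in the markup; anything injected by JavaScript, such as the splash image mentioned earlier in the log, would still be missed.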
05:42
๐
|
NovaKing |
http://sopastrike.com/on-strike/ |
05:42
๐
|
yipdw |
don't really want to write custom tools to do this if tef already has them :P |
05:44
๐
|
closure |
the sopastrike page is crashing chromium |
05:44
๐
|
closure |
ok.. that's a list |
05:44
๐
|
NovaKing |
it's not all though |
05:44
๐
|
tef |
yeah but my tools are well, flakey |
05:44
๐
|
closure |
indeed not |
05:45
๐
|
yipdw |
haha |
05:45
๐
|
yipdw |
" |
05:45
๐
|
yipdw |
"Chromium's connection attempt to www.twitter.com was rejected. The website may be down, or your network may not be properly configured. |
05:45
๐
|
yipdw |
FUCK BLACKOUT, WE'RE GOING OFFLINE |
05:46
๐
|
yipdw |
(probably just my link though) |
05:47
๐
|
yipdw |
oh wait there it goes |
05:47
๐
|
arrith |
yeah twitter's ceo guy was being a dbag about the sopa stuff |
05:47
๐
|
arrith |
calling it idiotic or something |
05:48
๐
|
yipdw |
nah, he was just referring to Twitter specifically |
05:48
๐
|
arrith |
twitter isn't going down for anything but mismanagement |
05:48
๐
|
arrith |
well, twitter not going down for a day? wikipedia is effectively |
05:48
๐
|
yipdw |
I meant his tweet referred to Twitter going offline as foolish, not Wikipedia or anyone else |
05:49
๐
|
arrith |
ah yeah |
05:50
๐
|
yipdw |
<meta name="googlebot" content="noarchive"/> |
05:50
๐
|
yipdw |
<title>Why Wikipedia went down at midnight - CNN.com</title> |
05:51
๐
|
arrith |
yipdw: Imgur Tor Project Miro iSchool at Syracuse University Oreilly.com Wikipedia Reddit Mozilla WordPress.org icanhazCheezburger Network MoveOn.org Good Old Games TwitPic Minecraft Free Press Mojang XDA Developers Destructoid Good.is |
05:52
๐
|
closure |
lol |
05:52
๐
|
closure |
we'll report on it.. but it didn't happen |
05:52
๐
|
arrith |
at least twitpic is i guess |
05:56
๐
|
yipdw |
arrith: ? |
05:56
๐
|
arrith |
yipdw: those sites were said to be doing a blackout thing |
05:56
๐
|
yipdw |
oh |
05:56
๐
|
yipdw |
add them to the list |
06:00
๐
|
tef |
ahahah I had a stupid bug |
06:02
๐
|
NovaKing |
twitpic only doing logo by looks of it |
06:02
๐
|
yipdw |
aww sonuvabitch |
06:02
๐
|
yipdw |
http://savannah.gnu.org/bugs/index.php?20417 |
06:02
๐
|
NovaKing |
i'm only doing logo on my site |
06:02
๐
|
yipdw |
that bug deals *directly* with wget's 503 behavior |
06:02
๐
|
yipdw |
and I can't access it |
06:02
๐
|
yipdw |
... |
06:02
๐
|
yipdw |
fuck. |
06:04
๐
|
dan_ |
omg omg i hope you guys archived wikipedia because they took it down *ducks* |
06:04
๐
|
closure |
nice |
06:04
๐
|
NovaKing |
ps: ?banner=0 |
06:04
๐
|
NovaKing |
will take off the banner |
06:07
๐
|
SDr__ |
hey guys! |
06:07
๐
|
SDr__ |
you... are... awesome! |
06:09
๐
|
Coderjoe |
NovaKing: please provide a full url that works for that bug page? |
06:10
๐
|
yipdw |
yeah, I can't get to it |
06:10
๐
|
NovaKing |
eh? sorry, that was in relation to wikipedia |
06:11
๐
|
NovaKing |
as for 503 |
06:11
๐
|
yipdw |
oh |
06:11
๐
|
NovaKing |
that is due to google crawlers |
06:11
๐
|
NovaKing |
to not index the blackout page |
06:11
๐
|
NovaKing |
you might have to get wget source, and remove the 503 section |
06:12
๐
|
yipdw |
there isn't one |
06:12
๐
|
closure |
it's probably the generic failure code handling code |
06:13
๐
|
NovaKing |
there isn't one? |
06:13
๐
|
NovaKing |
...? |
06:13
๐
|
yipdw |
I cannot find a failure handler that deals with HTTP 503, no |
06:14
๐
|
yipdw |
there may be a generic one |
06:14
๐
|
yipdw |
I'm trying to find it |
06:15
๐
|
yipdw |
in fact, here's something weird in wget trunk: |
06:15
๐
|
yipdw |
$ grep -nR HTTP_STATUS_UNAVAILABLE * |
06:15
๐
|
yipdw |
src/http.c:131:#define HTTP_STATUS_UNAVAILABLE 503 |
06:15
๐
|
yipdw |
that's it |
06:15
๐
|
yipdw |
it's not used anywhere else |
06:15
๐
|
yipdw |
well, at least not straight off |
06:16
๐
|
yipdw |
could be some preprocessor tricks |
06:17
๐
|
closure |
I think it's only there for completeness |
06:17
๐
|
closure |
I was in the same place, didn't get any further |
06:21
๐
|
underscor |
anyone having trouble connecting to twitter? |
06:21
๐
|
underscor |
I thought they weren't participating |
06:21
๐
|
closure |
they participate at random throughout the year |
06:21
๐
|
underscor |
hahah |
06:21
๐
|
closure |
in total it comes to a day anyway |
06:22
๐
|
closure |
https://twitter.com/herpderpedia lolz |
06:24
๐
|
Coderjoe |
worksforme |
06:24
๐
|
Coderjoe |
but they might be buckling under the "OMG SHIT IS DOWN" posts |
06:25
๐
|
yipdw |
twitter can't go offline |
06:25
๐
|
yipdw |
my mom told me they were a cloud and clouds never go offline |
06:25
๐
|
underscor |
hahahah |
06:26
๐
|
Coderjoe |
yeah |
06:26
๐
|
Coderjoe |
it's the load of #wikipedia |
06:26
๐
|
yipdw |
damn |
06:27
๐
|
yipdw |
well |
06:27
๐
|
underscor |
Those comments give me no hope for humanity |
06:27
๐
|
yipdw |
it's probably a good thing that Twitter dumps its tweets in the LoC |
06:27
๐
|
yipdw |
because we have no hope of actually archiving any of that, short of a direct feed to Twitter's message brokers |
06:27
๐
|
underscor |
haha |
06:28
๐
|
yipdw |
mostly because the Javascript on New Twitter makes it such a fucking bear to wrestle |
06:28
๐
|
Coderjoe |
19 new tweets in the last minute |
06:28
๐
|
Coderjoe |
36 in 2m |
06:29
๐
|
Coderjoe |
kinda surprised by all the spanish tweets |
06:29
๐
|
SDr__ |
google://time in spain => 7:29am; consider the impact of waking up with your coffee in your hand, and seeing half the internetz blacked out |
06:31
๐
|
yipdw |
ok |
06:31
๐
|
yipdw |
so |
06:31
๐
|
Coderjoe |
they're mentioning the #wikipedia hash tag, though |
06:31
๐
|
yipdw |
how to actually archive these |
06:31
๐
|
yipdw |
short of "let tef do it" |
06:31
๐
|
Coderjoe |
and only en was supposed to go black |
06:35
๐
|
yipdw |
https://twitter.com/#!/_ItsONLY1_KEN/status/159510766695878656 |
06:39
๐
|
tef |
hmm now wondering how to fix the 503 nonsense |
06:39
๐
|
tef |
well it gets captured sort of |
06:42
๐
|
yipdw |
I don't think http://sopastrike.com/on-strike/ is accurate at all |
06:42
๐
|
closure |
I think it's full of crap and few sites on strike, and a few that will be later |
06:43
๐
|
Coderjoe |
4chan's got an odd blackout |
06:43
๐
|
Coderjoe |
the site still works, just the text is black on black |
06:44
๐
|
yipdw |
oh fuck |
06:44
๐
|
yipdw |
etherpad.org died |
06:44
๐
|
yipdw |
did someone archive it :P |
06:46
๐
|
yipdw |
ah ok it's back |
06:48
๐
|
MCV |
Sopastrike seems to have a lot of cybersquatting websites |
06:48
๐
|
MCV |
and facebook profilez |
06:48
๐
|
MCV |
basically they let anyone put their link up there I guess |
06:50
๐
|
MCV |
added http://www.qwantz.com/index.php |
06:53
๐
|
yipdw |
oh, cool |
06:53
๐
|
yipdw |
http://blog.nearlyfreespeech.net/2012/01/18/sopa-blackout-option/ |
07:01
๐
|
tef |
uuughghu |
07:05
๐
|
tef |
right and I have patched the 503 errors |
07:05
๐
|
yipdw |
in wget or another tool? |
07:05
๐
|
tef |
in another tool |
07:05
๐
|
yipdw |
I think I found where it's falling through |
07:05
๐
|
yipdw |
oh ok |
07:05
๐
|
tef |
work stuffs :/ |
07:06
๐
|
yipdw |
that's fine |
07:06
๐
|
yipdw |
so long as SOMEONE has a tool to grab this |
07:06
๐
|
tef |
well i'm running it now |
07:06
๐
|
tef |
seems to be surviving |
07:12
๐
|
yipdw |
ah ha |
07:12
๐
|
yipdw |
patched wget |
07:14
๐
|
yipdw |
https://gist.github.com/1631756 |
07:14
๐
|
yipdw |
that can be applied against wget bzr 2574 |
07:15
๐
|
yipdw |
I think it works |
07:16
๐
|
tef |
hurrah |
07:16
๐
|
tef |
well this crawl should end soonish I think |
07:22
๐
|
tef |
heh |
07:22
๐
|
tef |
it's picking up links under the blackout js and the blackout js |
07:24
๐
|
yipdw |
yeah |
07:25
๐
|
yipdw |
I'm not sure if the patched wget is handling wikipedia right |
07:25
๐
|
yipdw |
but as you've got a grab of that |
07:25
๐
|
yipdw |
it's fine |
07:28
๐
|
Coderjoe |
what the |
07:28
๐
|
Coderjoe |
http://twitter.com/JOIN__US/status/159537070338093056 |
07:29
๐
|
tef |
got about 200 resources now |
07:31
๐
|
Coderjoe |
http://twitter.com/Eastern_Star_/status/159537964865699840 |
07:31
๐
|
Coderjoe |
stupid |
07:31
๐
|
tef |
2000 even |
07:32
๐
|
yipdw |
gonna try archiving the wtf wikipedia tweets |
07:32
๐
|
tef |
good luck with that |
07:33
๐
|
tef |
twitter is awful |
07:33
๐
|
yipdw |
I'm probably just going to have to bang their REST API |
07:33
๐
|
Coderjoe |
have fun |
07:33
๐
|
Coderjoe |
20-40 tweets per minute on the two searches I have open |
07:36
๐
|
Coderjoe |
about the same on #stopsopa #sopa and #wikipedia |
07:38
๐
|
Coderjoe |
about the same on #wikipediablackout |
07:38
๐
|
Coderjoe |
#wikistrike hasn't moved in awhile |
07:38
๐
|
yipdw |
ugh |
07:38
๐
|
yipdw |
actually fuck that |
07:39
๐
|
yipdw |
I'll just continue running my news crawlers |
07:40
๐
|
Coderjoe |
people are discovering that the mobile site still works |
07:52
๐
|
Coderjoe |
http://twitter.com/HWGVictor/status/159543226984955905 |
07:54
๐
|
Coderjoe |
another blackout: http://cinematictitanic.com/sopa.html |
07:55
๐
|
Coderjoe |
well, the front page redirects to the sopa page |
07:55
๐
|
ersi |
http://flowingdata.com/2012/01/17/watching-wtf-wikipedia-as-sopapipa-blackout-begins/ |
07:55
๐
|
ersi |
heh |
07:55
๐
|
tef |
google isn't giving me the blackout |
07:55
๐
|
tef |
cos i'm on aws |
07:57
๐
|
yipdw |
https://github.com/zachstronaut/stop-sopa |
07:57
๐
|
yipdw |
oh, huh |
08:05
๐
|
yipdw |
heh |
08:05
๐
|
yipdw |
http://online.wsj.com/article/SB10001424052970203471004577142893718069820.html |
08:06
๐
|
Coderjoe |
"Rather, ..." |
08:06
๐
|
Coderjoe |
darn preview |
08:10
๐
|
Coderjoe |
lots of non-retweet repetition on #wikipediablackout: I support #wikipediablackout! Show your support here (tinyurl link) |
08:11
๐
|
Coderjoe |
new trending: #NoALaPincheSOPA |
08:11
๐
|
closure |
http://reedmorse.com/tmp/sopa-adwords.png can anyone who doesn't block ads confirm google is making anti-sopa adwords? |
08:12
๐
|
Coderjoe |
#NOALAPINCHESOPA hey hey hey you never say no to soup okaay? |
08:12
๐
|
yipdw |
closure: yes, there are ads for www.google.com/takeaction |
08:12
๐
|
yipdw |
if you search for "sopa" anyway |
08:12
๐
|
yipdw |
er, on Google |
08:13
๐
|
Coderjoe |
I turned my adblock off and went to techcrunch and got the same google sopa banner |
08:13
๐
|
closure |
only there or other sites tho |
08:13
๐
|
yipdw |
they show up on Google search results too |
08:13
๐
|
closure |
techcrunch could have changed it.. if it's everywhere, that'd be huge |
08:14
๐
|
Coderjoe |
well, when I turn adblock back on, it goes away |
08:14
๐
|
Coderjoe |
give me another site with adwords |
08:15
๐
|
closure |
no idea, that's why I asked :) |
08:16
๐
|
Coderjoe |
it's also on ytmnd |
08:16
๐
|
Coderjoe |
(in an ad slot, not a site) |
08:17
๐
|
Coderjoe |
both ad slots, actually |
08:18
๐
|
Coderjoe |
geh |
08:18
๐
|
Coderjoe |
#stopsopa has 339 new tweets in the last 27 minutes or so |
08:18
๐
|
Coderjoe |
335 new #sopa in 25 minutes |
08:19
๐
|
Coderjoe |
176 new #wikipedia in 14 minutes |
08:19
๐
|
Coderjoe |
159 new #wikipediablackout in 10 minutes |
08:22
๐
|
yipdw |
must be lots of school papers |
08:28
๐
|
Coderjoe |
well, I am really pissing off firefox |
08:29
๐
|
Coderjoe |
in addition to all those search pages I had open, I just opened those two trackers. told revisit to grab 1000 tweets |
08:30
๐
|
yipdw |
oops |
08:30
๐
|
yipdw |
closure: what was the etherpad link? |
08:30
๐
|
yipdw |
I just closed it and can't find it in Chromium's history |
08:31
๐
|
yipdw |
oh wait, I can undo |
08:31
๐
|
Coderjoe |
spot says 373 "wtf wikipedia" tweets per hour |
08:31
๐
|
arrith |
yipdw: http://beta.etherpad.org/p/KDVzcBCKTj |
08:31
๐
|
yipdw |
thanks |
08:32
๐
|
Coderjoe |
a number of the ones spot is highlighting are "do this twitter search and laugh at dumb people freaking out" |
08:32
๐
|
closure |
http://theoatmeal.com lol |
08:34
๐
|
arrith |
yipdw: where does a site go if it's just changed its banner? but isn't really down |
08:34
๐
|
Coderjoe |
"raging vagina tractors" |
08:34
๐
|
closure |
woot to whoever noticed xmonad.org |
08:35
๐
|
* |
closure strokes his 300 line .xmonadrc |
08:35
๐
|
Coderjoe |
that is one long gif |
08:36
๐
|
Coderjoe |
courtesy of spot: http://twitter.com/WEIQINGZ/status/159551680587898880 |
08:37
๐
|
yipdw |
who keeps claiming I have WARCs of all of those :P |
08:39
๐
|
yipdw |
ok now I kinda do |
09:09
๐
|
Coderjoe |
new stats at about 50 minutes: #wikipedia 650, #wikipediablackout 655, #sopa 673, #stopsopa 676 |
09:10
๐
|
Coderjoe |
oh man |
09:10
๐
|
Coderjoe |
this looks to be silly |
09:11
๐
|
Coderjoe |
#FactsWithoutWikipedia |
09:11
๐
|
Coderjoe |
Weaves are made from abandoned foetuses. And you wondered why they rubbed you the wrong way, huh? #FactswithoutWikipedia |
09:12
๐
|
Coderjoe |
During the selection process for a new pope in the event of a tie, it is settled with a game of conkers #factswithoutwikipedia |
09:15
๐
|
Coderjoe |
Tiger woods owns 4 brothels #FactsWithoutWikipedia |
09:15
๐
|
Coderjoe |
The Earth is not spherical, it is actually a rectangular prism. #FactsWithoutWikipedia |
09:24
๐
|
Coderjoe |
i can imagine the mess when the people in the states wake up in a few hours |
09:28
๐
|
NovaKing |
http://theoatmeal.com/sopa |
09:28
๐
|
perfinion |
that is excellent |
09:31
๐
|
ersi |
"P.S. Please pirate the shit out of this animated GIF. " |
09:34
๐
|
Coderjoe |
yeah... #FactsWithoutWikipedia is going fast... 20 tweets in the last minute |
09:36
๐
|
Coderjoe |
Dubstep is the cure to diabetes #factswithoutwikipedia |
09:36
๐
|
Coderjoe |
well shit |
09:38
๐
|
ersi |
1 of them is from a colleague |
09:39
๐
|
yipdw |
oh |
09:39
๐
|
yipdw |
heh |
09:39
๐
|
yipdw |
my buds followed through |
09:39
๐
|
yipdw |
http://chicagoparkour.com/ |
09:39
๐
|
yipdw |
how weird |
09:39
๐
|
Coderjoe |
78% of pregnancies occur due to the high incidence of couples sharing toothbrushes and bath towels. #FactsWithoutWikipedia |
09:41
๐
|
yipdw |
I've also been watching my news scraper and there is a suspicious dearth of pro-SOPA/pro-PIPA articles |
09:41
๐
|
yipdw |
but I'm just using Google News feeds |
10:14
๐
|
NovaKing |
http://xkcd.com/ |
10:54
๐
|
Coderjoe |
hahah |
10:56
๐
|
Coderjoe |
bahahaha |
10:58
๐
|
Coderjoe |
did you catch the hidden message? |
10:58
๐
|
Coderjoe |
or hidden comic |
10:58
๐
|
NovaKing |
ya |
15:23
๐
|
closure |
I've added a few dozen more sopa blackout pages to http://beta.etherpad.org/p/KDVzcBCKTj |
15:26
๐
|
closure |
oddly, archive.org is still up |
16:32
๐
|
don |
closure: not for me |
16:36
๐
|
tef |
mornin |
18:08
๐
|
tef |
right, restarting the crawl with the current list in the pirate pad |
18:09
๐
|
tef |
i've been capturing pages all day from about 5am, then again at 8,9 am and then at 1pm, and again this afternoon. (gmt) |
18:09
๐
|
tef |
i'll clean up the warcs tomorrow and find out where to shove them |
18:12
๐
|
yipdw |
heheheh |
18:12
๐
|
yipdw |
http://support.godaddy.com/godaddy/go-daddy-many-other-internet-leaders-oppose-sopa-pipa/?ci=56582 |
18:13
๐
|
tef |
heh |
18:13
๐
|
yipdw |
they are so full of shit |
18:13
๐
|
tef |
yipdw: did the warc options play nicely with the wget warc? and has the bug been filed upstream? |
18:13
๐
|
yipdw |
tef: I haven't tried cleaning up my existing WARCs, I'll get to that sometime tonight |
18:13
๐
|
tef |
cool |
18:14
๐
|
tef |
honest I am archiving xvideos.com for work |
18:14
๐
|
yipdw |
I haven't yet been able to file bugs because GNU took down savannah.gnu.org |
18:14
๐
|
tef |
heee |
18:14
๐
|
tef |
worst day to file a bug |
18:15
๐
|
yipdw |
no kidding |
18:16
๐
|
tef |
my boss is happy i'm running sopa crawls cos I keep finding bugs |
18:16
๐
|
tef |
hooray running stuff for archive team now counts as testing. heh heh heh |
18:16
๐
|
yipdw |
hah |
18:16
๐
|
yipdw |
these are some really weird edge cases |
18:17
๐
|
tef |
well not the bugs I found in my stuff (unrelated to warcs) |
18:17
๐
|
tef |
found a page hang in the crawler on <link> tags |
18:18
๐
|
yipdw |
ahh |
18:18
๐
|
tef |
because I am a terrible coder |
18:18
๐
|
tef |
and i've moved from cpickle to pickle for some ipc because of weird newline issues |
18:18
๐
|
tef |
thanks python |
18:27
๐
|
tef |
and flash is crashing my crawler again. i hate flash |
18:33
๐
|
tef |
might do a +1 crawl on news.ycombinator |
18:56
๐
|
yipdw |
so, evidently, if you get this level of ruckus going on, politicians back off |
18:56
๐
|
yipdw |
I just saw six news articles scroll by that said something to the effect of "cosponsors back off" |
18:56
๐
|
ersi |
for a little while, yes |
18:56
๐
|
yipdw |
yeah |
18:56
๐
|
ersi |
I bet they'll fuck it all up in a matter of time |
18:56
๐
|
yipdw |
I'm sure of it |
18:57
๐
|
tef |
hmm newzbin is also blacked out |
19:01
๐
|
yipdw |
what the hell |
19:01
๐
|
yipdw |
http://www.theweedblog.com/what-would-the-marijuana-movement-be-without-the-internet/ |
19:01
๐
|
yipdw |
that showed up in the news feeds for "SOPA" |
19:01
๐
|
yipdw |
oh, I see why |
19:33
๐
|
Nemo_bis |
sigh, uploading to IA at 50 kB/s |
19:33
๐
|
Nemo_bis |
and a 30.5 GiB file is reported as 28.3 GiB by curl |
19:39
๐
|
nitro2k01 |
Sure both units are meant to be GiB? |
19:43
๐
|
Nemo_bis |
nitro2k01, yes, why not? |
19:44
๐
|
nitro2k01 |
30.5/28.3 is close enough to 1024^3 |
19:44
๐
|
nitro2k01 |
Or 1024^3/1000^3 rather but you get my point |
19:45
๐
|
Nemo_bis |
yes, but curl uses GiB |
19:45
๐
|
Nemo_bis |
and nautilus too I hope |
19:45
๐
|
nitro2k01 |
It's the bigger one that would be using GB |
19:45
๐
|
Nemo_bis |
although it seems you're right, because the file is 30486180948 B |
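The arithmetic behind this can be checked directly from the byte count quoted above. The variable names are only illustrative.

```python
size_bytes = 30_486_180_948  # file size quoted in the log

gb = size_bytes / 1000**3    # decimal gigabytes (GB)
gib = size_bytes / 1024**3   # binary gibibytes (GiB)

print(f"{gb:.2f} GB, {gib:.2f} GiB, ratio {gb / gib:.4f}")
# prints: 30.49 GB, 28.39 GiB, ratio 1.0737
```

So the larger figure is the decimal one, and the two differ by the factor 1024^3/1000^3.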
19:46
๐
|
Nemo_bis |
bah, stupid nautilus |
19:47
๐
|
tef |
zombo.com is down too |
19:47
๐
|
yipdw |
SHIT |
19:47
๐
|
yipdw |
WHAT |
19:47
๐
|
tef |
Nemo_bis: btw I pushed my crap to SketchCow |
19:47
๐
|
tef |
thanks for reminding me :# |
19:47
๐
|
tef |
yipdw: I KNOW? NOTHING IS POSSIBLE! |
19:48
๐
|
yipdw |
YOU ARE NOT WELCOME |
19:48
๐
|
yipdw |
TO ZOMBOCOM |
19:48
๐
|
Nemo_bis |
tef, good :) |
19:50
๐
|
Nemo_bis |
but now who's Angra |
19:50
๐
|
tef |
i've got news.yc to crawl so I am capturing the front page and all the articles |
19:50
๐
|
tef |
hopefully |
19:50
๐
|
Nemo_bis |
or crawl333-9 |
19:51
๐
|
tef |
got 141 seeds in the sopa crawl i've just restarted |
19:51
๐
|
tef |
I probably have about 7-8 copies of some pages by now but eh i'll dedupe when i'm dead |
20:20
๐
|
db48x |
just had some teeth pulled |
20:20
๐
|
db48x |
what is everyone else up to? |
20:29
๐
|
chronomex |
it's snowing out. |
20:30
๐
|
chronomex |
paaaanic |
20:30
๐
|
db48x |
:) |
20:36
๐
|
DFJustin |
yeah got a pretty good dump here on vancouver island |
20:36
๐
|
chronomex |
didnt know you were closeby |
20:41
๐
|
closure |
http://blumenauer.house.gov/ lol |
20:42
๐
|
db48x |
who is archiving blackout messages? |
20:42
๐
|
nitro2k01 |
https://twitter.com/#!/herpderpedia |
20:43
๐
|
db48x |
nitro2k01: heh, awesome |
20:45
๐
|
yipdw |
db48x: me |
20:45
๐
|
yipdw |
db48x: got a few thousand SOPA-related news stories, some blackout pages |
20:46
๐
|
yipdw |
I also just got a Zeiss CP.2 85mm T/2.1 for a rental |
20:46
๐
|
yipdw |
and am kind of freaking out because the lens is fucking awesome |
20:46
๐
|
db48x |
heh |
20:47
๐
|
db48x |
hrm |
20:47
๐
|
db48x |
anesthesia is wearing off already |
20:49
๐
|
nitro2k01 |
How many teeth? |
20:50
๐
|
db48x |
4 |
20:50
๐
|
nitro2k01 |
Wisdom? |
20:50
๐
|
db48x |
yea |
20:50
๐
|
nitro2k01 |
Ah. Have fun! |
20:50
๐
|
db48x |
three impacted, one that is useless without the one underneath it |
20:51
๐
|
tef |
db48x: me too. |
20:52
๐
|
db48x |
good |
20:52
๐
|
tef |
i've got ~200 M or so but mostly dupes :/ I'm doing a fresh crawl now and a +1 from hacker news |
20:53
๐
|
tef |
from the ones in progress I have 78M and 87M |
20:53
๐
|
tef |
of stuff |
20:53
๐
|
tef |
approx |
20:54
๐
|
db48x |
not bad |
20:54
๐
|
tef |
hmm that's including the db uncompressed |
20:54
๐
|
yipdw |
so far, 4.1 gigs of SOPANews |
20:54
๐
|
yipdw |
I'm going to guess that like tef there's a shitload of dupes in there |
20:55
๐
|
tef |
I have dupes across crawls but not within them I think |
20:55
๐
|
tef |
db48x: yipdw is using wget and i'm using work resources. I'm capturing some of the js wget is missing, etc. |
20:57
๐
|
tef |
I could probably fire up more machines next time but that's a sort of a work in progress |
20:57
๐
|
yipdw |
heh, next time |
20:57
๐
|
yipdw |
we need a catchy acronym for the next bil |
20:57
๐
|
yipdw |
l |
20:57
๐
|
tef |
I think my boss is looking to expose api access to make this sort of thing easier |
20:57
๐
|
yipdw |
INNOVATE |
20:57
๐
|
yipdw |
International, uh |
20:58
๐
|
tef |
if I had s3 write support for archive.org it would make things a lot easier |
20:58
๐
|
tef |
I could just make my crawler upload to them instead of amazon |
20:59
๐
|
yipdw |
heh |
20:59
๐
|
yipdw |
https://twitter.com/#!/jimmy_wales/status/159737306419433472 |
21:00
๐
|
yipdw |
oh, and http://uncyclopedia.wikia.com/wiki/Main_Page |
21:02
๐
|
closure |
s3 write for archive.org is not too hard |
21:03
๐
|
tef |
yeah I just signed up for an api key |
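A hedged sketch of what such an upload could look like from Python, based on archive.org's S3-like interface: a PUT to `s3.us.archive.org` with a `LOW accesskey:secret` authorization header. The function name, item name, and file name are hypothetical, the request is only built and not sent, and the details should be checked against the current IAS3 documentation before use.

```python
import urllib.request

IAS3_ENDPOINT = "http://s3.us.archive.org"  # archive.org's S3-like endpoint


def build_upload_request(item: str, filename: str, data: bytes,
                         access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores data as item/filename."""
    req = urllib.request.Request(
        f"{IAS3_ENDPOINT}/{item}/{filename}", data=data, method="PUT")
    req.add_header("Authorization", f"LOW {access_key}:{secret_key}")
    # Ask the service to create the item if it does not exist yet.
    req.add_header("x-amz-auto-make-bucket", "1")
    return req
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`; a crawler could call this once per finished WARC instead of writing to Amazon.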
21:04
๐
|
tef |
hmmm |
21:05
๐
|
db48x |
this anesthesia didn't last as long as I was told |
21:05
๐
|
tef |
I could make an irc bot that does a page capture and uploads the warc to archive.org :V |
21:05
๐
|
yipdw |
that'd actually be pretty useful |
21:05
๐
|
yipdw |
since that's the main way these are being found |
21:05
๐
|
db48x |
I like the one at thedailywtf.com |
21:06
๐
|
closure |
huh.. that would be a neat service for here |
21:06
๐
|
tef |
well irc does make a great command and control structure |
21:07
๐
|
tef |
can always make an archiveteam-bots |
21:07
๐
|
tef |
channel with more than one bot to service requests |
21:12
๐
|
tef |
heh |
21:12
๐
|
tef |
I did seriously consider using irc as a messagebus at one point |
21:16
๐
|
tef |
I think i'll need the right credentials first / bucket details but I could likely get an irc bot up later this weekend |
21:21
๐
|
db48x |
odd |
21:21
๐
|
db48x |
my private xmpp chat room is bouncing up and down |
21:29
๐
|
yipdw |
http://www.flickr.com/photos/jcn/6721179703/sizes/l/in/photostream/ |
21:30
๐
|
db48x |
:) |
21:45
๐
|
db48x |
does anyone here know how the openlibrary.org source code is organized? |
22:01
๐
|
yipdw |
http://hq.deviantart.com/journal/Join-a-SOPA-and-PIPA-debate-280023798 |
22:01
๐
|
yipdw |
er |
22:01
๐
|
yipdw |
http://sakimichan.deviantart.com/art/STOP-SOPA-Bill-276510440 |
22:38
๐
|
db48x |
hrm |
22:38
๐
|
db48x |
I'm having trouble concentrating |
22:40
๐
|
SketchCow |
Been there |
22:41
๐
|
SketchCow |
yipdw: I'd love those pages when you're done. |
22:41
๐
|
SketchCow |
How are you doing it? |
22:41
๐
|
yipdw |
I'm periodically scraping Google News for links |
22:41
๐
|
SketchCow |
Are you using WGET or Heritrix? |
22:41
๐
|
yipdw |
and then spawning an assload of wget-warc processes |
22:42
๐
|
SketchCow |
Oh, excellent. |
22:42
๐
|
SketchCow |
Everyone here is happy. |
22:42
๐
|
yipdw |
I can substitute Heritrix |
22:42
๐
|
SketchCow |
No no! Use wget-warc |
22:42
๐
|
yipdw |
heh, ok |
22:42
๐
|
yipdw |
I haven't yet verified the integrity of all the WARCs; it's entirely possible that wget-warc is tripping up on some of them |
22:42
๐
|
yipdw |
I have spot-checked a few and those do look okay though |
22:43
๐
|
underscor |
2.2656 TiB RAM |
22:43
๐
|
underscor |
54.0039 TiB Disk |
22:43
๐
|
underscor |
826 Virtual CPUs |
22:43
๐
|
underscor |
[5:42:22 PM EDT] Andy Bezella: fyi worker farm is: |
22:43
๐
|
underscor |
Interesting statistics on the IA workers |
22:44
๐
|
yipdw |
actually, one problem I have periodically hit is wget-warc just...freezing up |
22:44
๐
|
yipdw |
there's no indication in the log that anything's wrong |
22:45
๐
|
yipdw |
oh wait, I just noticed that I have a shitload of TCP connections open, I bet that's it |
22:45
๐
|
yipdw |
weird |
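If the freezes really do come from too many simultaneous connections (only a guess in the log), one option is to cap how many wget processes run at once. The sketch below is not yipdw's scraper: `wget_warc_command` and `capture_all` are hypothetical, the WARC file naming is invented, and the flags are those of GNU wget with WARC support.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def wget_warc_command(url: str, warc_name: str) -> list[str]:
    """Command line for one capture written to warc_name.warc.gz."""
    return [
        "wget",
        "--page-requisites",          # also fetch images, CSS and scripts
        "-e", "robots=off",           # do not obey robots.txt
        "--timeout=60", "--tries=2",  # avoid waiting forever on dead hosts
        f"--warc-file={warc_name}",
        url,
    ]


def capture_all(urls, max_workers: int = 8, run=subprocess.run):
    """Capture each distinct URL once, with at most max_workers wgets running."""
    unique = list(dict.fromkeys(urls))  # drop duplicate links, keep order
    jobs = [wget_warc_command(u, f"capture-{i:05d}") for i, u in enumerate(unique)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))
```

Passing a different `run` callable makes it possible to dry-run the job list without launching wget; deduplicating the links first also cuts down on the duplicate captures mentioned earlier.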
23:07
๐
|
SketchCow |
Regardless, you're a hero, yipdw |
23:15
๐
|
NovaKing |
https://static.thepiratebay.org/legal/sopa.txt |
23:58
๐
|
Coderjoe |
http://www.wired.com/threatlevel/2012/01/scotus-re-copyright-decision/ |