[00:00] winr4r: awesome
[00:00] give me a couple of minutes
[00:00] deal
[00:04] the Miserere Allegro has colonized my brain stem
[00:04] even after listening to other music all day, it's running through my mind continuously
[00:05] Remember, we got shut down constantly, consistently, and threatened, db48x
[00:05] yea
[00:05] So all things considered, it was pretty aggressive suicide on their part.
[00:05] also that only includes what you uploaded the other day
[00:05] But add the archiveteam.org stuff, that will hopefully increase it.
[00:06] lots more in the, yea
[00:06] Yes. I suspect that's the majority of what we got.
[00:06] This was just stuff I found in uploads people dumped to batcave.
[00:11] http://pastebin.com/NsbryDvm
[00:11] that should work
[00:12] oops, that goes the wrong way
[00:12] I want to convert the entity reference into the character it references
[00:12] oh.
[00:13] dude i just misread you
[00:13] * winr4r headdesk.
[00:13] heh, it happens
[00:13] unhtml is the right name for the function though :)
[00:13] in my defense, i haven't been awake long. :/
[00:16] one sec
[00:17] Setting up this 2 tb of tarring
[00:17] Then will go the transfers. But first... the tarring!
[00:22] that'll take a while
[00:23] winr4r: take your time
[00:23] it's a tricky problem
[00:23] i meant what cow's doing
[00:23] oh
[00:23] as for me, i need caffeine, so brb
[00:23] :)
[00:29] k all better
[00:29] does it need to do ones like &#1234; ?
[00:30] unknown
[00:30] I guess it should
[00:30] let's go with utf-8 output
[00:40] not even the Moonlight sonatta can displace the Miserere
[00:41] sonata
[00:50] http://pastebin.com/EeUpf4SM
[00:50] that should work
[00:51] newfile.txt
[00:51] shiny
[00:51] doesn't account for some joker forgetting the trailing semicolon but fuck 'em
[00:51] indeed
[00:53] replace the last line with sys.stdout.write(d) if you don't want a forced line break at the end
[00:53] nope, it's perfect
[00:53] Oh thank goodness, another script to help me.
[00:53] Now it takes a directory of yahoo videos and names it YV-FIRST-LAST as needed.
[00:54] winr4r: superb
[00:54] SketchCow: neat
[00:55] winr4r: no, you're right. I already had a newline, so printing an extra one is extra
[00:57] There it goes! 1.6tb of yahoo video being turned into tars.
[00:57] That'll be a nice add tonight.
[00:58] db48x: different version if you want to force a newline only if it doesn't have one at the end: http://pastebin.com/HcxJDr6a
[00:58] i don't know *why*, but seeing "warning: no newline at end of file" enough times makes you that way
[00:59] SketchCow: sweet
[00:59] winr4r: :)
[00:59] yea, I got tired of that warning a long time ago
[01:07] doh
[01:07] it's outputting 0xa9 for &copy;
[01:08] but wait, that's right
[01:08] so why doesn't it survive when pasted?
[01:10] I get a replacement character when I paste :(
[01:10] oh well, the output is perfect
[01:11] interesting
[01:11] as in the literal string "0xa9"?
[01:12] no, the byte 0xa9
[01:12] I was expecting a multi-byte sequence
[01:12] ah
[01:12] dunno
[01:14] oh, that is wrong
[01:14] it should be a multibyte sequence
[01:15] ahh
[01:15] http://docs.python.org/release/2.3/lib/module-htmlentitydefs.html
[01:15] it maps from entities to ISO-8859-1
[01:15] oh
[01:15] silly me
[01:15] no, silly python
[01:16] no, that was me using the wrong thing, entitydefs rather than name2codepoint
[01:16] entitydefs is such a braindead thing to include
[01:17] it shouldn't even be possible to make that error
[01:17] heh
[01:17] File "/home/db48x/archives/lulupoetry/unified/unhtml.py", line 10, in unhtml
[01:18] TypeError: expected a character buffer object
[01:18] s = s.replace("&%s;" % x, name2codepoint[x])
[01:18] that should be unichr(name2codepoint[x])
[01:18] one second
[01:19] ah, heh
[01:19] (you'll get an error if you do that, you need to set the default encoding first)
[01:19] indeed I do
[01:20] http://pastebin.com/yr2rnzZV
[01:20] try that, sorry
[01:21] perfect
[01:21] reload(sys) because site.py actually deletes the "setdefaultencoding" function from sys, for reasons that probably make sense to python developers
[01:21] http://pastebin.com/nihmctj4
[01:22] heh
[01:22] are those backslashes meant to be there?
[01:22] they're in the html
[01:22] k fine, that's not something i screwed up then ;D
[01:22] yea :)
[01:23] so you're making a text dump of lulu poetry?
[01:23] yep
[01:23] nice :)
[01:23] the html has so much gorp
[01:25] mhm
[02:11]
[02:13] took the words out of my mouth
[02:23] Yeah, suck those words
[02:40] OK, plane is landing.
[02:40] Or crashing
[02:40] They never really tell you.
[02:40] Either way, on the ground in 20.
[02:49] i didn't even know they had internet on planes, so i figured you had already landed
[02:49] welcome to the future
[02:52] SketchCow: dude, welcome to SF
[02:52] * closure is hanging at noisebridge
[02:52] sf, home of internet
[02:53] sf, home of cisco
[02:53] * BlueMax is the state the obvious guy
[02:53] sf, home of many hobos
[02:56] and a temporary home to SketchCow
[02:56] (hobo?)
[02:56] :P
[03:34] Hi does any one have any suggestion for backing up the google buzz / google reader data people have shared with me?
[03:50] sketchcow is sleeping at noisebridge?
[10:37] I own this network, #hackers 4 skillz & #trustnetwork 4 shellz
[10:38] I own this network, #hackers 4 skillz & #trustnetwork 4 shellz
[10:39] uhhuh.
[10:41] I own this network, #hackers 4 skillz & #trustnetwork 4 shellz
[10:42] I own this network, #hackers 4 skillz & #trustnetwork 4 shellz
[10:43] let's see if that stops it
[16:23] SketchCow: are you still thinking of doing a hangout later?
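
Note on the entity-decoding thread (00:11 to 01:21): the pastebin scripts are not part of this log, so what follows is only a minimal sketch of the approach described there, not the script that was actually used. It assumes Python 3, where `htmlentitydefs` is now `html.entities` and `unichr` is now `chr`. Only the function name `unhtml` and the use of `name2codepoint` come from the chat; the regular expression and everything else are illustrative. As in the chat, an entity missing its trailing semicolon is left alone.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#1234; and &#x4d2; (trailing semicolon required).
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def unhtml(text):
    """Replace HTML entity references in text with the characters they name."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                return chr(int(body[2:], 16))      # hexadecimal reference
            if body.startswith("#"):
                return chr(int(body[1:]))          # decimal reference
            # name2codepoint maps names to Unicode code points, unlike
            # entitydefs, which is the mapping that caused the 0xa9 surprise.
            return chr(name2codepoint[body])
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                  # unknown entity: keep as-is

    return _ENTITY.sub(replace, text)


if __name__ == "__main__":
    decoded = unhtml("&copy; 2011")
    # Encoding to UTF-8 yields the multi-byte sequence expected at 01:12.
    print(decoded.encode("utf-8"))  # b'\xc2\xa9 2011'
```

In Python 3 no default-encoding workaround is needed, since strings are already Unicode. The standard library's `html.unescape` covers the same job, including entities without a trailing semicolon.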