[00:03] looks like we are no longer on the way to destruction [00:06] SketchCow: \o/ [00:06] root@teamarchive-0:/3/MOBILEME-SETS# rm -rf full-1328865740 [00:06] always a good sign [00:06] DFJustin: ohhh? [00:07] where's this info from? [00:22] http://i.imgur.com/YvqhK.jpg [00:36] You won a 1KB hard drive. [00:37] :D [01:25] balrog: http://tracker.archive.org/df.html [01:27] DFJustin: for me, it never changes from "loading" now [01:27] Oh [01:27] Sorry [01:27] My bad [01:27] Try now [01:29] http://www.metafilter.com/112641/Listening-to-the-past-recorded-on-tin-foil-and-glass-for-the-first-time-in-over-a-century [01:29] i'm kinda curious about the internals of pubnub [01:29] Coderjoe: y u no pusher? [01:29] eh? [01:30] pusherapp [01:30] http://pusher.com/ [01:30] I have a private web app I'm working on that could use message push capabilities [01:30] much better than pubnub :) [01:30] but pubnub has that fantastic song [01:30] kennethre: i mentioned pubnub because that is what the df tracker was using [01:31] I love pubnub's song [01:31] Coderjoe: I assumed you wrote it [01:31] the song is truly the best thing about the product [01:31] ^ [01:31] http://www.youtube.com/watch?v=jZgcEj_qKLU [01:31] It works well though [01:31] but my problem is that I don't want to use someone else's servers for pushing the messages [01:31] check out pusher though [01:31] http://rdd-glimpse.heroku.com [01:31] damnit they turned it off [01:31] nvm [01:32] and no, I have no special access to batcave or wherever tracker is [01:32] I met the evangelist from pusher this week [01:32] Coderjoe: why not? [01:32] What is rdd glimpse? [01:32] it does seem like a hosted irc server somewhat, but that's nice anyway [01:32] I did call him on the lack of rest in his rest api and he was like 'fair cop' [01:32] underscor: it was a realtime feed of every article going through readability — I worked there up until a few months ago [01:32] underscor: looks like they turned it off though [01:33] awww [01:33] kennethre: why not use someone else's hosted service? lack of trust? [01:33] That's a bummer :( [01:34] Coderjoe: the internet is a hosted service [01:34] Coderjoe: paranoia ;) [01:34] granted, I'm not doing anything that absolutely requires confidentiality [01:34] but that's a bit different [01:34] and it isn't paranoia if they really are out to get you [01:35] (silly ISP logging legislative attempts, SOPA/PIPA/ACTA/etc, yaddayadda) [01:36] (google's tracking, etc) [01:40] meh [01:41] lol [01:41] my time is more valuable to me [01:41] particularly with realtime stuff [01:41] it's a queue [01:41] if it's data storage, that's a bit different [01:41] but that's just me [01:42] we use aws at work because of the storage guarantees from s3 [01:44] http://fortressofsolitude.textfiles.com/ [01:44] * SketchCow bows [01:45] tef: we use s3 for EVERYTHING at heroku. You can't beat nine 9s of retention. [01:45] yup [01:45] tef: when you put data into a heroku postgres server, it's also instantly streamed to s3 [01:45] <3 s3 [01:45] hmmmm [01:45] the WAL logs [01:45] we should look at heroku more seriously [01:46] but not this release cycle :3 [01:46] SketchCow: hahah [01:46] so superman is a hoarder? [01:46] tef: for what?
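The PubNub-versus-Pusher back-and-forth above is about hosted message push. For readers who haven't used either, publishing through Pusher's Python client looks roughly like the sketch below; the credentials, channel, and event names are placeholders, not anything from the log, and Coderjoe's actual app is never shown.

```python
# Minimal sketch of pushing a message through a hosted service like Pusher.
# Credentials, channel, and event names here are placeholders.
# Requires the `pusher` package (pip install pusher).
import pusher

client = pusher.Pusher(
    app_id="PLACEHOLDER_APP_ID",
    key="PLACEHOLDER_KEY",
    secret="PLACEHOLDER_SECRET",
)

# Every subscriber of "downloads" receives the "progress" event in realtime.
# The service's servers do the fan-out, which is exactly the trust trade-off
# Coderjoe is debating above.
client.trigger("downloads", "progress", {"user": "example", "percent": 42})
```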
[01:47] I work on web archiving and web archiving accessories [01:47] i knew that [01:47] :) [01:48] well we do access servers from postgres for replaying [01:48] SketchCow: hahahahaha [01:48] I love it [01:48] we've been looking for hosted postgres because we're lazy people [01:48] tef: we're by far the best one http://postgres.heroku.com [01:48] We all know which way it's going for Superman [01:49] SketchCow: Now I just need a user account >:) [01:49] tef: all the plans get 2TB of storage [01:50] You get your own machine [01:50] You are not going on this one [01:50] Just ask for a machine from AB [01:50] :( [01:50] I have one, they just won't give me a lot of space to play with [01:50] See? [01:50] Hoarding machines [01:50] ha [01:50] I knew it [01:51] I'm just addicted to big storage [01:51] I want to get a real disk array [01:51] I have a measly 10TB at home :( [01:52] pre-redundancy [01:52] SketchCow: But how am I going to write cool things like my utility to watch disk storage without a user account! [01:52] :D [01:53] Oh, you mean that script that's constantly slamming the Disk I/O so a webpage can update? [01:53] ouch :) [01:53] It's not constantly slamming the disk IO [01:53] df -m is not resource intensive [01:54] Also, it only runs every 2 seconds [01:57] --no-sync do not invoke sync before getting usage info (default) [01:58] the process fork is prob more expensive than the call itself [02:02] > Content-Length: 204820131840 [02:02] > Expect: 100-continue [02:02] nice [02:02] SketchCow: What's the identifier? [02:02] Let it go for a while, it's STILL uploading. [02:03] oh joy 100 continue [02:03] I am worried about this - I think it might be possible kennethre can upload faster than I can output to archive.org. [02:03] I know, I just want to watch it in the s3 console [02:03] SketchCow: can't kennethre upload directly to archive.org with an s3 token ? [02:03] (now that sam explained it to me) [02:03] Yes and no. [02:03] tef: Not with the way the scripts currently work [02:03] if he finds the right magic incantations to put in the headers [02:03] ah [02:04] He COULD, but a script needs to be run a certain way to track how the sets are generated. [02:04] It would have to be rather kludgy [02:04] I can do kludgy [02:04] Also, yeah [02:04] the tracker requires things are done a certain way [02:04] Already, we're doing 200gb sets into archive.org, which it kind of hates. [02:04] We're going to produce many sets at that rate. [02:04] http://i.imgur.com/Lus4Y.png [02:08] SketchCow: oh i can scale much much more if you'd like me to prove it :) [02:08] SketchCow: I'm completely off though. I have been all day [02:09] If straight-to-s3 would be easy, that'd be *much* preferred, since i'm already on ec2 [02:09] archive runs an s3-like api [02:09] Yeah, it's not actually to s3 [02:10] It's to IA's s3-compatible API [02:10] kennethre: But you'd just clog up the API [02:10] It's slower than even batcave is [02:10] Yeah, I might experiment with FTP. [02:10] SketchCow: I know you personally like using the regular public things, but you will get a LOT faster performance if you send your own contrib_submit [02:10] Just to see if that machine is faster. [02:10] It will do direct rsync from batcave to the destination petabox [02:11] Not interested.
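underscor's disk-watch utility is only described in passing above: shell out to `df -m` every two seconds so a web page (the df.html tracker) can update. A minimal sketch of such a poller, assuming it writes JSON somewhere for the page to fetch; only the df call and the two-second interval come from the log.

```python
# Rough sketch of a disk-usage watcher like the one discussed above:
# run `df -m` every 2 seconds and dump the result where a web page can
# poll it. The JSON file name is an assumption; only the df call and
# the 2-second interval appear in the log.
import json
import subprocess
import time

def snapshot():
    # --no-sync is the df default, so this is as cheap as the log claims;
    # the fork/exec of df itself costs more than the statfs call.
    out = subprocess.check_output(["df", "-m"]).decode()
    rows = [line.split() for line in out.splitlines()[1:]]
    return [{"fs": r[0], "used_mb": int(r[2]), "avail_mb": int(r[3])}
            for r in rows if len(r) >= 6]

while True:
    with open("df.json", "w") as f:
        json.dump({"ts": time.time(), "disks": snapshot()}, f)
    time.sleep(2)
```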
[02:11] Okay [02:11] Improve S3, don't support internal-only hacks [02:11] Because then they own you [02:11] underscor: i thought you meant actual s3 [02:11] I use it for the 400GB daily stuff I've been stuffing in [02:11] Well, use a client that supports chunked transfers then [02:11] We added that to the API [02:12] Then you'll get amazing performance [02:12] What clients. [02:12] Basically axel in reverse [02:12] Curl? [02:12] underscor: why would chunked transfers make a difference though ? [02:12] Sorry [02:13] Not chunked, multipart [02:13] SketchCow: No, curl doesn't [02:13] I'm looking to see if s3cmd does [02:13] hmm [02:13] one moment [02:13] I mean if you can do appending (unlike amazon s3 iirc) that would be nice I guess [02:13] fuck me this computer is slow [02:13] Yeah, you can do resuming [02:13] But the biggest boon is that it's parallel [02:14] So you get wonderful performance [02:14] aaah I see [02:14] what's wrong with the rsync situation now? [02:15] The main problem right now is that I have a LOT of things going on batcave right now. [02:15] SketchCow: https://gist.github.com/977597 [02:15] so we need more batcaves :) [02:15] Needs slight modification to point to ias3 though [02:15] huh [02:15] our automated migrations off of batcave [02:15] contrib_submit doesn't look like internal-only O_O [02:15] *or [02:16] Coderjoe: If you submit a job it will say "nope" [02:17] external caller [02:17] http://www.archive.org/contrib_submit.php [02:18] it's happily showing me the help at least [02:18] underscor: iirc don't you need additional headers for ia s3 ? [02:18] Yeah, help is public [02:18] tef: Yes [02:19] That's what I was saying [02:19] Needs slight modification [02:19] content_type = guess_type(local_file, False)[0] or "application/octet-stream" [02:19] # Metadata that we need to pass in before attempting an upload. [02:19] basic_headers = { [02:19] "Content-Type" : content_type, [02:19] } [02:19] aaah [02:19] oh btw I made an example crawler that makes warcs and uses requests https://github.com/tef/crawler. the pipelining is noice. [02:20] tef: damn you're faster than me :) [02:20] https://github.com/tef/crawler [02:20] But it supports chunking natively, so you'll likely get >100mbps [02:20] tef: quite excellent — notice a big speedup w/ the auto keep-alive? [02:20] yeah I kinda stayed up till 9am that night [02:20] kennethre: I had a code sample of a crawler and a library for warcs, oh and requests helped :-) [02:21] hehe [02:21] <3 [02:21] most of the changes were going 'fuck it it doesn't need to be in a bunch of different files' [02:21] that said [02:21] I think I pasted you where I recreate the http messages from accessors [02:22] yeah [02:22] makes me feel wrong and dirty inside [02:22] i need to add an iteritems() method on CaseInsensitiveDict [02:22] oh those [02:22] gotcha [02:22] different person :) [02:23] https://github.com/tef/crawler/blob/master/crawler.py#L107 [02:23] literally I want to make that bit pretty i.e. a sort of raw-ish http output [02:24] one day *dreams* [02:24] one thing at a time :) [02:25] one day there will be a http library with parsers that aren't intertwined with sockets [02:25] hah, that's hilarious [02:25] (and there is one, I wrote it for warc processing) [02:25] hooray! code reuse! [02:25] yeah, the standard lib can be a bit cancerous at times [02:25] urilib3000 [02:26] which is the main thing that's keeping me from putting requests into the standard lib [02:26] oh god don't do that [02:26] i know, right?
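The `basic_headers` paste above is a fragment of an uploader for IA's S3-compatible API. Below, a sketch that reassembles those fragments into one plain PUT with `requests`; the `LOW access:secret` authorization header and `x-archive-*` metadata headers follow IA's ias3 conventions, but the keys, bucket, and file names are placeholders, and the parallel multipart mode underscor recommends for large files is not shown.

```python
# Sketch of a plain (non-multipart) upload to archive.org's S3-compatible
# API, assembled from the header fragments pasted above. Access keys,
# bucket, and file names are placeholders. The parallel multipart mode
# underscor recommends for big files is a separate, more involved dance.
from mimetypes import guess_type
import requests

local_file = "googlegroups-xx.zip"        # placeholder
bucket = "archiveteam-googlegroups-xx"    # placeholder item identifier

content_type = guess_type(local_file, False)[0] or "application/octet-stream"

# Metadata that we need to pass in before attempting an upload.
basic_headers = {
    "Content-Type": content_type,
    "authorization": "LOW ACCESSKEY:SECRET",  # IA S3 keys, placeholders
    "x-archive-auto-make-bucket": "1",        # create the item if missing
    "x-archive-meta-collection": "archiveteam",  # placeholder metadata
}

with open(local_file, "rb") as f:
    # Passing a file object streams the body instead of slurping it into
    # memory (a requests feature that, per the log, was still on the
    # roadmap in early 2012).
    r = requests.put(
        "http://s3.us.archive.org/%s/%s" % (bucket, local_file),
        data=f,
        headers=basic_headers,
    )
r.raise_for_status()
```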
[02:27] found a great quote today [02:27] "Bundling into core Python requires a package to be essentially stable, i.e., dead." [02:27] might as well put your code on sourceforge [02:27] nah, it'd get tons of use [02:27] and it would do a huge service to the community [02:28] is it finished though ? [02:28] nah, i have a grad student working on oauth [02:28] I mean, can you build a s3 library on top of requests ? [02:28] that's the major thing i want to get before 1.0 [02:28] of course [02:28] boto is working actively to move to requests :) [02:28] so you do 100-expect on post messages with a specific size ? [02:29] oh so that's the other huge thing [02:29] when i dropped urllib2, i lost streaming uploads [02:29] so that needs to happen. [02:29] those are the two major pieces [02:29] won't be too hard [02:30] then beautiful soup needs to be included :v [02:30] NO [02:30] * kennethre stabs tef [02:30] haha [02:30] so many people want me to add content-type-decoding [02:31] "why do i have to deserialize my json?" [02:31] I threatened to burn down a coworker's house for pushing something into a sprint midway [02:31] stfu [02:31] haha [02:31] kennethre: because actually a generalized mechanize like thing would be an obvious thing to build atop [02:31] yeah i'd love to replace mechanize [02:31] though multi-mechanize is pretty nice now [02:31] my boss said to him 'this is reality. not fantasy. things won't get done. ' [02:31] lol [02:32] in a different convo [02:32] the other boss also said 'we need to have consensus and not fuck with the sprint' essentially [02:32] I like my job \o/ [02:32] sounds like you like scrum :) [02:33] although it's been a year-long argument to get this sort of fortnight-driven releases [02:33] I dunno if it is scrum [02:33] it's more like 'version numbers are now a measure of time, not features' [02:33] we work out the priorities every two weeks and work out how much time we're willing to *spend* on features, not estimates [02:33] thing is as a result we actually know what is getting fixed and done [02:34] i hate anything that's not autonomous :) [02:34] haha, well we are somewhat autonomous [02:34] this is where we get together with business and ensure we know what the fuck is happening [02:34] with clients, contracts, sales, etc [02:35] for-profit archive company? [02:36] yeah [02:36] interesting [02:36] compliance archiving mostly [02:36] how does that work? [02:36] but we do research grants too for our academic bit on the side [02:36] it's either brand heritage [02:36] or things like sarbanes oxley or ftc rules about archiving the fuck out of everything [02:37] heh [02:37] interesting [02:38] my boss officially supports me trying to do archive team things [02:38] I was planning to get a snapshot bot that takes page warcs from things pasted in here [02:38] in my dubious free time [02:39] haha [02:39] do it [02:39] that's what i was going to build for requests [02:39] warcify([response, response, response]) [02:39] i might be able to ship it in requests itself :) [02:40] you had to name it something that allowed for confusion over subject [02:40] well first I have to finish making this release deployable [02:40] Coderjoe: it's a great name :) [02:49] kennethre: next up replace mime handling please [02:49] tef: elaborate.. [02:49] ah my co-worker complains about the mime library in python [02:51] for headers?
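kennethre's `warcify([response, response, response])` exists only as a one-liner in the log. One plausible shape for it: wrap each `requests` response in a WARC/1.0 response record, per the WARC spec's framing. Everything beyond the quoted name is an editor's sketch, and note that it recreates the HTTP message from accessors, exactly the step tef calls "wrong and dirty" above; a faithful crawler records the raw bytes off the wire instead.

```python
# A guess at what warcify() could look like: wrap each requests.Response
# in a WARC/1.0 "response" record. Only the name comes from the log; the
# implementation is an editor's sketch of the WARC record layout.
import uuid
from datetime import datetime, timezone

def warcify(responses, path="out.warc"):
    with open(path, "wb") as w:
        for r in responses:
            # Rebuild an approximate HTTP message from what requests keeps.
            # Headers may no longer match the decoded body (chunking and
            # Content-Encoding are already undone), which is one reason
            # real crawlers capture raw socket bytes instead.
            status = "HTTP/1.1 %d %s\r\n" % (r.status_code, r.reason)
            hdrs = "".join("%s: %s\r\n" % kv for kv in r.headers.items())
            block = (status.encode("latin-1") + hdrs.encode("latin-1")
                     + b"\r\n" + r.content)

            warc_headers = [
                "WARC/1.0",
                "WARC-Type: response",
                "WARC-Target-URI: " + r.url,
                "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
                "Content-Type: application/http; msgtype=response",
                "Content-Length: %d" % len(block),
            ]
            w.write("\r\n".join(warc_headers).encode("latin-1"))
            w.write(b"\r\n\r\n")
            w.write(block)
            w.write(b"\r\n\r\n")  # record separator required by the spec
```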
[02:51] i think [02:51] oh well it's not even there in python3 anymore [02:51] pretty sure they killed it [02:51] \o/ [02:54] no it's bad [02:54] they removed critical functionality [02:54] like a way to detect upload boundaries [02:54] no big deal [02:54] heh [02:54] (i fucking hate python 3) [02:54] well breaking things in py3 is ok [02:54] why do you hate it ? [02:55] porting requests to it was one of the most difficult things i've ever had to do [02:55] as a python dev [02:55] ah [02:55] was it bytes/strings? [02:55] the forced bytes/str separation [02:55] yes [02:55] it's not so bad on its own [02:55] I mean, aren't you just doing stuff with bytestrings all the time [02:55] but supporting 2.x and 3.x at the same time makes it more difficult [02:56] no [02:56] in 2, you can use bytes or unicode everywhere [02:56] in 3.x i just made it so all the non-binary entry points expect unicode [02:56] once i did that and separated response text from content (which was auto unicode-decoded before) [02:56] it wasn't *so* bad [02:56] yeah [02:57] still, i don't think the way it was was so bad [02:57] i never had any problems. [02:57] it sounds more like it shook out a design bug [02:57] the real problem [02:57] is half the standard lib is broken now [02:57] the real problem is unicode/bytes is awful [02:57] the bytes/unicode thing is all over the place [02:57] lol yes [02:57] http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ [02:58] ^^ I agree with every word of this. [02:58] ah yes [02:59] thing is to me python 3 is a fork of python [02:59] supported by the core devs [02:59] but unlike within 2.x I can't from __future__ import the bits I need [02:59] I can pretty much agree with that [02:59] I have to rewrite for python 3 [03:00] we all do [03:00] it's a new language [03:00] and at that point, why not ruby or other things ? [03:00] and it wasn't needed imo [03:00] thing is, they also broke the abi at the same time [03:00] I mean, backwards incompatible changes are good [03:00] but you need to be able to opt-in early [03:01] at this rate someone could fork 2.8 and get away with it [03:01] fork /a/ 2.8 [03:02] yes [03:02] and that's exactly what I, as a developer, want. [03:02] a 2.8 [03:02] if I bring this up online I am pretty sure I will get told off for trolling [03:02] because core devs seem quite sensitive about 3 [03:04] i'll just get the PSF to fund 2.8 [03:04] lol [03:04] no big deal [03:04] haha [03:04] pypy can do it [03:04] well if there was a 2.8 which came with python 3 tacked on [03:04] I mean, heh i'll just write a rpc layer with ctypes :v [03:05] "If 2to3 is our upgrade path to Python 3, then py2js is the upgrade path to JavaScript." [03:05] lawl [03:05] nice [03:05] tef: you can't, because they use the same namespace identifier [03:07] kennethre: run it in a separate process [03:07] tef: or just don't run it at all ;) [03:08] heheh [03:11] yeah I am sure talk of 2.8 will get you lynched at pycon [03:14] nah, everyone loves me :) [03:14] Going to pycon should get you lynched at pycon [03:14] SketchCow should definitely go to pycon [03:14] Ha ha, next century [03:14] * SketchCow packs [03:15] not a fan of the pythons? [03:15] Not a fan of the guido [03:15] And guido is a big influence on the pythons [03:15] interesting [03:15] any particular reason? [03:15] He's a jerk? [03:15] lol, what'd he do? [03:16] What, I need to point to my stolen lolly and my popped balloon? [03:16] most programmers are jerks though [03:16] Most are, yeah.
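For anyone who hasn't done a 2.x-plus-3.x port: the pain kennethre describes comes down to Python 2's `str` being bytes while Python 3's `str` is text, so dual-support code carried shims like the sketch below (the pattern later canonicalized by `six`). Names here are illustrative.

```python
# The kind of shim 2.x+3.x codebases carried: in Python 2, str is bytes
# and unicode is text; in Python 3, str is text and bytes is bytes.
# Every entry point has to pick a side, which is the porting pain above.
import sys

if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (Python 2 only)
    binary_type = str

def ensure_text(value, encoding="utf-8"):
    """Accept either flavor at an entry point, hand back text."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value

# requests' own split: response.content is always binary_type (raw bytes),
# response.text is always text_type (decoded), mirroring the content/text
# separation kennethre describes above.
```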
[03:16] might be observational bias on my part [03:16] i've met him, seemed pretty awkward but not confrontational or anything [03:16] he's no linus torvalds :) [03:16] *that* is a jerk [03:17] Shhh, I'm in his home country [03:17] agents will come [03:17] lmao [03:17] perkele [03:17] the gits [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin6.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin9.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/MSJ!PIN.NFO (deflated 74%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin1.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin4.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin2.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin7.zip (deflated 0%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/ (stored 0%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/FILE_ID.DIZ (deflated 59%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/POLICE.NFO (deflated 77%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/plc_amap.zip (deflated 0%) [03:17] but yeah if I didn't like software because I didn't like the authors, I wouldn't be left with a lot of software to use [03:17] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/ (stored 0%) [03:18] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/ua-sky5.zip (deflated 0%) [03:18] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/FILE_ID.DIZ (deflated 47%) [03:18] I'd say today is a good day [03:18] Not true [03:18] You'd be left with archiveteam tools [03:18] That's all you need [03:18] hehehe [03:18] but the linuxes it runs on! [03:18] steve jobs was a jerk too [03:18] made damn good systems though [03:19] * Aranje doesn't like them [03:19] No, he pounded human beings into the ground to make them make good enough systems [03:19] Also, Archive Team is the Yelling Bird of Programming Guilds [03:19] haha [03:19] http://questionablecontent.wikia.com/wiki/Yelling_Bird [03:20] SketchCow: I can agree with that :) [03:20] I made this t-shirt about 7 years ago http://printf.net/~tef/photos/tshirt/tef_tourette.jpg [03:21] but in scotland, cunt is a more affectionate term than offensive [03:21] http://www.indietits.com/comics/blogs.png [03:21] sometimes :v [03:22] tef: hahaha [03:22] oh noes, down to 800GB [03:23] SketchCow: the reason I asked about guido is because I've never heard anyone say that about him before :) [03:23] http://www.questionablecontent.net/indietits/comics/wowagain.png [03:24] We're kind of fucked, I'm trying to get files off of batcave and I am up at 6am and the bus leaves at 9:30am and then I'm on a plane for 10 hours. [03:24] No real solution. [03:24] hopefully a wifi plane then :) [03:25] I use too many emoticons [03:25] o(O.o)o [03:25] Nah, we're fucked, we're bringing it all in too fast. [03:26] Solution: Give me sudo, and I can help load into IA [03:26] :D [03:26] Yeah, wait, let me check my... NO [03:26] Especially since it's friday night so I have no bedtime! [03:27] SketchCow: Aww :( [03:27] Or just chmod them then :P [03:27] well I think trying to work out a way just to push it to ia directly will be faster than batcave to ia [03:27] No such solution. [03:27] ^ [03:27] I'm now seeing if I can get google groups loaded over. [03:28] lol underscor you authwhore [03:28] Authwhore? [03:28] :P [03:28] authwhore.
[03:28] You heard me :P [03:28] no sane person asks for root [03:28] hahaha [03:28] * underscor is not sane [03:28] root is a burden [03:28] haha [03:28] true [03:28] I share root on all my servers [03:28] tef: very true [03:28] it's going 'yes blame me for everything and I have to fix shit' [03:29] he didn't ask for root, only sudo :P [03:29] Also, yes [03:29] You guys are confusing "Is it programmatically possible to transfer bits into the internet archive servers" and "what are the procedures to politically fit into archive.org's storage paradigm" [03:29] I know all of you can do the first. [03:29] kennethre: crack instead of meth [03:29] Right now I am working on the second, only I can bridge that. [03:29] ah [03:29] oic [03:29] root@teamarchive-0:/3# du -sh MOBILEME [03:29] 398G MOBILEME [03:29] du -sh *SETS [03:29] root@teamarchive-0:/3# du -sh *SETS [03:29] 3.5T MOBILEME-SETS [03:29] well I figure the technical barriers are easy to solve and I can help with that [03:30] Anyway, as we see there, it's not necessarily mobileme filling that drive (11tb) [03:30] Let me see if I can get something going. [03:30] SketchCow: Not me anymore either [03:30] but yeah social problems are always the annoying ones in programming, I have no idea :v [03:30] I have ~2GB on that drive now [03:30] I'm clearing out everything else I can too [03:31] I am sure it's googlegroups. [03:31] btw, mailing drives next monday or tuesday [03:31] We should just start a large storage-as-a-service company and abuse it [03:31] so they should be there in time for you [03:32] Already am, dude [03:32] it's called archive.org [03:32] heheheh [03:32] hahaha [03:32] without the social problem :) [03:32] kennethre: no if it has people it has social problems [03:32] if it has any of you assholes it has social problems [03:32] ^ [03:32] haha [03:32] It's like firing a frathouse into a nunnery [03:33] :D [03:33] :D [03:33] that's an awesome analogy [03:33] there are some pisshead archivists i've met [03:33] there was this awesome chap who does the digital stuff for nz library [03:33] but most of them were quite straight-laced [03:37] although it seems how you start a fight with an archivist is by arguing over naming things [03:37] haha [03:38] I can't put any standard bits into warctools because every archive is *special* and *unique* [03:38] every single institution. file names, id numbers, serial numbers, compression settings [03:38] oh: and I want to find who made the ARC format and kill them [03:39] you have to parse the body of the first record to parse the headers of the first and subsequent records *head explodes* [03:40] http itself is already a pretty good format :) [03:40] almost [03:40] just need some metadata around it [03:41] iso-8859-1 [03:41] there were fights about using mime [03:41] mime can die in a fire [03:41] warc needs a transfer-chunked style thing [03:41] parsing http headers properly is something that never happens [03:41] so I don't need to keep them in memory [03:42] (header values) [03:42] kennethre: sort of [03:43] most people don't produce them correctly [03:44] gah, typing with lag ruins my ability to type [03:45] irssi + slow ssh? [03:45] yeah [03:45] joy [03:45] hmm [03:45] you can do local buffering [03:45] well, not with curses nvm [03:46] now i'm webscale [03:46] nepotism in action [03:47] but yeah warc is 'not bad' not great by any means [03:50] underscor: What's that countdown page again?
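tef's ARC complaint is concrete: in ARC, the description of the record format lives in the body of the file's first record, so you can't even frame records without interpreting content. WARC fixed this by making every record self-describing: named headers up to a blank line, then exactly `Content-Length` bytes. A minimal reader showing that framing (real files are usually gzipped per record, which this sketch ignores):

```python
# Minimal WARC record walker, to show tef's point by contrast with ARC:
# each WARC record carries its own named headers and a Content-Length,
# so you can skip records without interpreting their bodies. Assumes an
# uncompressed .warc; real files are typically gzipped per-record.
def read_warc_records(path):
    with open(path, "rb") as f:
        while True:
            version = f.readline()
            if not version:
                return  # clean EOF
            assert version.strip().startswith(b"WARC/")
            headers = {}
            for line in iter(f.readline, b"\r\n"):  # blank line ends headers
                name, _, value = line.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            body = f.read(int(headers["content-length"]))
            f.read(4)  # trailing \r\n\r\n record separator
            yield headers, body

# ARC has no such luxury: the first record's *body* holds the field layout
# for every record that follows, hence the exploding head above.
```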
[03:50] http://tracker.archive.org/df.html [03:50] Except it's a count-up [03:50] :D [03:50] haha [03:50] it's a countup :) [03:51] Yeah, take that. [03:52] * kennethre spins back up [03:52] I still don't want the kennethre horn turned back on, but I found what I was doing wrong somewhere. [03:52] I can get a better grip on things shortly. [03:52] awesome [03:52] awesome :) [03:53] The buffering issue is still there, but at least the space for the slowpokes can be there while I get the uploading more automated. [03:56] This machine is being so hammered. It's doing a straight rm and it's STILL going slow. [03:56] Also, there's a ton of files in that directory, I guess. [03:56] Granted, getting a gigabyte back every second isn't that bad. [03:56] Especially with all the assholes uploading. [03:59] haha [03:59] * We are completely uploaded and fine [03:59] < HTTP/1.1 200 Ok [03:59] iotop is cool [03:59] That's the first of the full sets to upload. [03:59] Let's see if s3 shits the bed and then shits a bigger, more intense bed full of shitted beds. [03:59] What identifier? [04:00] shitted-bedception? [04:00] [item_size] => 200019661 [04:00] hahahahaha [04:00] We have to go DEEPER [04:01] * underscor waits for disk io to shoot up [04:01] http://ia600802.us.archive.org/mrtg/diskv3.html [04:01] aww, how cute. it's only 3 weeks old [04:02] w00t 1TiB [04:06] haha [04:09] hahahaha [04:11] OK, so I'm still in deletion city with those files and I am currently concentrating on writing the proposal/information for a documentary I'm being hired to film this summer. [04:11] (A commercial job, straight-fee which I can use to pay off some debts/taxes) [04:12] http://www.us.archive.org/log_show.php?task_id=96126299 is just loving copying over a 198gb file, let me tell you [04:19] * Aranje wonders aloud where urlte.am went [04:20] A hot item on my list to fix. [04:20] After I get batcave under control, it's the next thing up. [04:20] Keep on me about it. [04:21] * Aranje nods [04:21] It's my fault, for not raping dot.fm in the eye [04:25] SketchCow: hahahahahahahaha [04:25] That [04:25] is [04:25] the [04:25] best [04:58] http://ia600807.us.archive.org/zipview.php?zip=/29/items/archiveteam-googlegroups-yw/googlegroups-yw.zip&file= [05:33] http://www.archive.org/details/archiveteam-googlegroups-zb&reCache=1 [05:36] 100% done by script. [05:39] very neat [05:40] automation is the way to go. congrats on more automation [05:50] Gets better, give me a moment. [06:40] so, apparently using edit.php to add files to your item nukes files that came from s3 because it doesn't know about them [06:50] did you wait until after derive had run on your item? [06:51] ah that could be it [06:58] OK, so. [06:58] I now have a script running those scripts. [06:58] It's in a screen session. [06:58] hah, classy [06:58] So while I'm in the air, it'll upload 909gb of google groups. [06:58] scriptception [06:59] http://www.archive.org/details/archiveteam-googlegroups [06:59] You will see entries with more than just googlegroups-XX.zip [06:59] That was the old paradigm [07:02] 4.9gb uploaded so far. [07:03] root@teamarchive-0:/3/googlegroups# ls | wc -l [07:03] 1305 [07:33] Off it goes. [12:04] Hm, http://memac.heroku.com isn't showing downloaded users anymore. Even though I'm fetching and reporting users downloaded to the tracker [14:40] ersi: WORKSFORME WONTFIX [14:53] Yeah, it does.
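SketchCow's "script running those scripts" is never shown in the log. A guess at its shape, for the curious: walk the staging directory, hand each zip to a per-item upload script, and log progress so it can churn unattended in screen. Only the /3/googlegroups path and the 1305-zip count appear above; the uploader's name and the delete-on-success behavior are assumptions.

```python
# A guess at the shape of the "script running those scripts": walk the
# googlegroups staging directory and run a per-item upload script on each
# zip, logging as it goes so it can run unattended in a screen session.
# UPLOADER is invented; only the /3/googlegroups path comes from the log.
import os
import subprocess
import sys

STAGING = "/3/googlegroups"
UPLOADER = "./upload-one-item.sh"  # hypothetical per-item script

zips = sorted(f for f in os.listdir(STAGING) if f.endswith(".zip"))
print("%d items to upload" % len(zips))

for i, name in enumerate(zips, 1):
    path = os.path.join(STAGING, name)
    print("[%d/%d] %s" % (i, len(zips), name))
    sys.stdout.flush()  # keep the screen session's log current
    rc = subprocess.call([UPLOADER, path])
    if rc != 0:
        print("FAILED (%d): %s -- leaving file in place" % (rc, name))
    else:
        os.remove(path)  # assumed: reclaim batcave space as we go
```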
I commented in #memac ;) [15:57] damn, ran into a 20 pages per day (per paid account) limit on a site [15:57] i want to get ~8000 pages [15:57] gonna need to script that :) [22:47] kennethre, is this you? O_o http://ia700000.us.archive.org:8088/mrtg/networkv2.html [22:48] space enough for ~4 h of that [23:29] ndurner: i'm not running [23:29] er, Nemo_bis ^ [23:30] ok [23:30] :) [23:31] i wish it was me :) [23:31] i want to be at the top of that dashboard :)
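The closing problem, ~8000 pages behind a 20-pages-per-day-per-account limit, scripts the way you'd expect: rotate accounts, stop each at its cap, and keep a cursor so tomorrow's run resumes. The site, auth scheme, and URL pattern below are entirely invented, since the log never names them.

```python
# Sketch of the "gonna need to script that" plan: fetch up to 20 pages
# per paid account per day, keeping a cursor so the run resumes tomorrow.
# The site, credentials, and URL scheme are placeholders; the log never
# names them.
import time
import requests

ACCOUNTS = [("user1", "pw1"), ("user2", "pw2")]  # hypothetical accounts
PAGES = range(1, 8001)
DAILY_CAP = 20

def fetch_day(start_index):
    i = start_index
    for user, pw in ACCOUNTS:
        session = requests.Session()
        session.auth = (user, pw)  # assumes HTTP basic auth
        for _ in range(DAILY_CAP):
            if i >= len(PAGES):
                return i
            page = PAGES[i]
            r = session.get("https://example.invalid/page/%d" % page)
            r.raise_for_status()
            with open("page-%04d.html" % page, "wb") as f:
                f.write(r.content)
            i += 1
            time.sleep(5)  # be polite between fetches
    return i

# With 2 accounts that's 40 pages/day: ~200 days for 8000 pages, so in
# practice you'd want more accounts, or a word with the site's operators.
cursor = fetch_day(0)
```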