[00:03] looks like we are no longer on the way to destruction [00:06] SketchCow: \o/ [00:06] root@teamarchive-0:/3/MOBILEME-SETS# rm -rf full-1328865740 [00:06] always a good sign [00:06] DFJustin: ohhh? [00:07] where's this info from? [00:22] http://i.imgur.com/YvqhK.jpg [00:36] You won a 1KB hard drive. [00:37] :D [01:25] balrog: http://tracker.archive.org/df.html [01:27] DFJustin: for me, it never changes from "loading" now [01:27] Oh [01:27] Sorry [01:27] My bad [01:27] Try now [01:29] http://www.metafilter.com/112641/Listening-to-the-past-recorded-on-tin-foil-and-glass-for-the-first-time-in-over-a-century [01:29] i'm kinda curious about the internals of pubnub [01:29] Coderjoe: y u no pusher? [01:29] eh? [01:30] pusherapp [01:30] http://pusher.com/ [01:30] I have a private web app I'm working on that could use message push capabilities [01:30] much better than pubnub :) [01:30] but pubnub has that fantastic song [01:30] kennethre: i mentioned pubnub because that is what the df tracker was using [01:31] I love pubnub's song [01:31] Coderjoe: I assumed you wrote it [01:31] the song is truly the best thing about the product [01:31] ^ [01:31] http://www.youtube.com/watch?v=jZgcEj_qKLU [01:31] It works well though [01:31] but my problem is that I don't want to use someone else's servers for pushing the messages [01:31] check out pusher though [01:31] http://rdd-glimpse.heroku.com [01:31] damnit they turned it off [01:31] nvm [01:32] and no, I have no special access to batcave or wherever tracker is [01:32] I met the evangelist from pusher this week [01:32] Coderjoe: why not? [01:32] What is rdd glimpse? [01:32] it does seem like a hosted irc server somewhat, but that's nice anyway [01:32] I did call him on the lack of rest in his rest api and he was like 'fair cop' [01:32] underscor: it was a realtime feed of every article going through readability — I worked there up until a few months ago [01:32] underscor: looks like they turned it off though [01:33] awww [01:33] kennethre: why not use someone else's hosted service? lack of trust? [01:33] That's a bummer :( [01:34] Coderjoe: the internet is a hosted service [01:34] Coderjoe: paranoia ;) [01:34] granted, I'm not doing anything that absolutely requires confidentiality [01:34] but that's a bit different [01:34] and it isn't paranoia if they really are out to get you [01:35] (silly ISP logging legislative attempts, SOPA/PIPA/ACTA/etc, yaddayadda) [01:36] (google's tracking, etc) [01:40] meh [01:41] lol [01:41] my time is more valuable to me [01:41] particularly with realtime stuff [01:41] it's a queue [01:41] if it's data storage, that's a bit different [01:41] but that's just me [01:42] we use aws at work because of the storage guarantees from s3 [01:44] http://fortressofsolitude.textfiles.com/ [01:44] * SketchCow bows [01:45] tef: we use s3 for EVERYTHING at heroku. You can't beat nine 9s of retention. [01:45] yup [01:45] tef: when you put data into a heroku postgres server, it's also instantly streamed to s3 [01:45] <3 s3 [01:45] hmmmm [01:45] the WAL logs [01:45] we should look at heroku more seriously [01:46] but not this release cycle :3 [01:46] SketchCow: hahah [01:46] so superman is a hoarder? [01:46] tef: for what?
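The PubNub-versus-Pusher back-and-forth above is about hosted message push. For readers who haven't used either, publishing through Pusher's Python client looks roughly like the sketch below; the credentials, channel, and event names are placeholders, not anything from the log, and Coderjoe's actual app is never shown.

```python
# Minimal sketch of pushing a message through a hosted service like Pusher.
# Credentials, channel, and event names here are placeholders.
# Requires the `pusher` package (pip install pusher).
import pusher

client = pusher.Pusher(
    app_id="PLACEHOLDER_APP_ID",
    key="PLACEHOLDER_KEY",
    secret="PLACEHOLDER_SECRET",
)

# Every subscriber of "downloads" receives the "progress" event in realtime.
# The service's servers do the fan-out, which is exactly the trust trade-off
# Coderjoe is debating above.
client.trigger("downloads", "progress", {"user": "example", "percent": 42})
```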
[01:47] I work on web archiving and web archiving accessories [01:47] i knew that [01:47] :) [01:48] well we do access servers from postgres for replaying [01:48] SketchCow: hahahahaha [01:48] I love it [01:48] we've been looking for hosted postgres because we're lazy people [01:48] tef: we're by far the best one http://postgres.heroku.com [01:48] We all know which way it's going for Superman [01:49] SketchCow: Now I just need a user account >:) [01:49] tef: all the plans get 2TB of storage [01:50] You get your own machine [01:50] You are not going on this one [01:50] Just ask for a machine from AB [01:50] :( [01:50] I have one, they just won't give me a lot of space to play with [01:50] See? [01:50] Hoarding machines [01:50] ha [01:50] I knew it [01:51] I'm just addicted to big storage [01:51] I want to get a real disk array [01:51] I have a measly 10TB at home :( [01:52] pre-redundancy [01:52] SketchCow: But how am I going to write cool things like my utility to watch disk storage without a user account! [01:52] :D [01:53] Oh, you mean that script that's constantly slamming the Disk I/O so a webpage can update? [01:53] ouch :) [01:53] It's not constantly slamming the disk IO [01:53] df -m is not resource intensive [01:54] Also, it only runs every 2 seconds [01:57] --no-sync do not invoke sync before getting usage info (default) [01:58] the process fork is prob more expensive than the call itself [02:02] > Content-Length: 204820131840 [02:02] > Expect: 100-continue [02:02] nice [02:02] SketchCow: What's the identifier? [02:02] Let it go for a while, it's STILL uploading. [02:03] oh joy 100 continue [02:03] I am worried about this - I think it might be possible kennethre can upload faster than I can output to archive.org. [02:03] I know, I just want to watch it in the s3 console [02:03] SketchCow: can't kennethre upload directly to archive.org with an s3 token ? [02:03] (now that sam explained it to me) [02:03] Yes and no. [02:03] tef: Not with the way the scripts currently work [02:03] if he finds the right magic incantations to put in the headers [02:03] ah [02:04] He COULD, but a script needs to be run a certain way to track how the sets are generated. [02:04] It would have to be rather kludgy [02:04] I can do kludgy [02:04] Also, yeah [02:04] the tracker requires things are done a certain way [02:04] Already, we're doing 200gb sets into archive.org, which it kind of hates. [02:04] We're going to produce many sets at that rate. [02:04] http://i.imgur.com/Lus4Y.png [02:08] SketchCow: oh i can scale much much more if you'd like me to prove it :) [02:08] SketchCow: I'm completely off though. I have been all day [02:09] If straight-to-s3 would be easy, that'd be *much* preferred, since i'm already on ec2 [02:09] archive runs an s3-like api [02:09] Yeah, it's not actually to s3 [02:10] It's to IA's s3-compatible API [02:10] kennethre: But you'd just clog up the API [02:10] It's slower than even batcave is [02:10] Yeah, I might experiment with FTP. [02:10] SketchCow: I know you personally like using the regular public things, but you will get a LOT faster performance if you send your own contrib_submit [02:10] Just to see if that machine is faster. [02:10] It will do direct rsync from batcave to the destination petabox [02:11] Not interested.
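underscor's disk-watch utility is only described in passing above: shell out to `df -m` every two seconds so a web page (the df.html tracker) can update. A minimal sketch of such a poller, assuming it writes JSON somewhere for the page to fetch; only the df call and the two-second interval come from the log.

```python
# Rough sketch of a disk-usage watcher like the one discussed above:
# run `df -m` every 2 seconds and dump the result where a web page can
# poll it. The JSON file name is an assumption; only the df call and
# the 2-second interval appear in the log.
import json
import subprocess
import time

def snapshot():
    # --no-sync is the df default, so this is as cheap as the log claims;
    # the fork/exec of df itself costs more than the statfs call.
    out = subprocess.check_output(["df", "-m"]).decode()
    rows = [line.split() for line in out.splitlines()[1:]]
    return [{"fs": r[0], "used_mb": int(r[2]), "avail_mb": int(r[3])}
            for r in rows if len(r) >= 6]

while True:
    with open("df.json", "w") as f:
        json.dump({"ts": time.time(), "disks": snapshot()}, f)
    time.sleep(2)
```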
[02:11] Okay [02:11] Improve S3, don't support internal-only hacks [02:11] Because then they own you [02:11] underscor: i thought you meant actual s3 [02:11] I use it for the 400GB daily stuff I've been stuffing in [02:11] Well, use a client that supports chunked transfers then [02:11] We added that to the API [02:12] Then you'll get amazing performance [02:12] What clients. [02:12] Basically axel in reverse [02:12] Curl? [02:12] underscor: why would chunked transfers make a difference though ? [02:12] Sorry [02:13] Not chunked, multipart [02:13] SketchCow: No, curl doesn't [02:13] I'm looking to see if s3cmd does [02:13] hmm [02:13] one moment [02:13] I mean if you can do appending (unlike amazon s3 iirc) that would be nice I guess [02:13] fuck me this computer is slow [02:13] Yeah, you can do resuming [02:13] But the biggest boon is that it's parallel [02:14] So you get wonderful performance [02:14] aaah I see [02:14] what's wrong with the rsync situation now? [02:15] The main problem right now is that I have a LOT of things going on batcave right now. [02:15] SketchCow: https://gist.github.com/977597 [02:15] so we need more batcaves :) [02:15] Needs slight modification to point to ias3 though [02:15] huh [02:15] our automated migrations off of batcave [02:15] contrib_submit doesn't look like internal-only O_O [02:15] *or [02:16] Coderjoe: If you submit a job it will say "nope" [02:17] external caller [02:17] http://www.archive.org/contrib_submit.php [02:18] it's happily showing me the help at least [02:18] underscor: iirc don't you need additional headers for ia s3 ? [02:18] Yeah, help is public [02:18] tef: Yes [02:19] That's what I was saying [02:19] Needs slight modification [02:19] content_type = guess_type(local_file, False)[0] or "application/octet-stream" [02:19] # Metadata that we need to pass in before attempting an upload. [02:19] basic_headers = { [02:19] "Content-Type" : content_type, [02:19] } [02:19] aaah [02:19] oh btw I made an example crawler that makes warcs and uses requests https://github.com/tef/crawler. the pipelining is noice. [02:20] tef: damn you're faster than me :) [02:20] https://github.com/tef/crawler [02:20] But it supports chunking natively, so you'll likely get >100mbps [02:20] tef: quite excellent — notice a big speedup w/ the auto keep-alive? [02:20] yeah I kinda stayed up till 9am that night [02:20] kennethre: I had a code sample of a crawler and a library for warcs, oh and requests helped :-) [02:21] hehe [02:21] <3 [02:21] most of the changes were going 'fuck it it doesn't need to be in a bunch of different files' [02:21] that said [02:21] I think I pasted you where I recreate the http messages from accessors [02:22] yeah [02:22] makes me feel wrong and dirty inside [02:22] i need to add an iteritems() method on CaseInsensitiveDict [02:22] oh those [02:22] gotcha [02:22] different person :) [02:23] https://github.com/tef/crawler/blob/master/crawler.py#L107 [02:23] literally I want to make that bit pretty i.e. a sort of raw-ish http output [02:24] one day *dreams* [02:24] one thing at a time :) [02:25] one day there will be a http library with parsers that aren't intertwined with sockets [02:25] hah, that's hilarious [02:25] (and there is one, I wrote it for warc processing) [02:25] hooray! code reuse! [02:25] yeah, the standard lib can be a bit cancerous at times [02:25] urilib3000 [02:26] which is the main thing that's keeping me from putting requests into the standard lib [02:26] oh god don't do that [02:26] i know, right?
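The `basic_headers` paste above is a fragment of an uploader for IA's S3-compatible API. Below, a sketch that reassembles those fragments into one plain PUT with `requests`; the `LOW access:secret` authorization header and `x-archive-*` metadata headers follow IA's ias3 conventions, but the keys, bucket, and file names are placeholders, and the parallel multipart mode underscor recommends for large files is not shown.

```python
# Sketch of a plain (non-multipart) upload to archive.org's S3-compatible
# API, assembled from the header fragments pasted above. Access keys,
# bucket, and file names are placeholders. The parallel multipart mode
# underscor recommends for big files is a separate, more involved dance.
from mimetypes import guess_type
import requests

local_file = "googlegroups-xx.zip"        # placeholder
bucket = "archiveteam-googlegroups-xx"    # placeholder item identifier

content_type = guess_type(local_file, False)[0] or "application/octet-stream"

# Metadata that we need to pass in before attempting an upload.
basic_headers = {
    "Content-Type": content_type,
    "authorization": "LOW ACCESSKEY:SECRET",  # IA S3 keys, placeholders
    "x-archive-auto-make-bucket": "1",        # create the item if missing
    "x-archive-meta-collection": "archiveteam",  # placeholder metadata
}

with open(local_file, "rb") as f:
    # Passing a file object streams the body instead of slurping it into
    # memory (a requests feature that, per the log, was still on the
    # roadmap in early 2012).
    r = requests.put(
        "http://s3.us.archive.org/%s/%s" % (bucket, local_file),
        data=f,
        headers=basic_headers,
    )
r.raise_for_status()
```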
[02:27] found a great quote today [02:27] "Bundling into core Python requires a package to be essentially stable, i.e., dead." [02:27] might as well put your code on sourceforge [02:27] nah, it'd get tons of use [02:27] and it would do a huge service to the community [02:28] is it finished though ? [02:28] nah, i have a grad student working on oauth [02:28] I mean, can you build a s3 library on top of requests ? [02:28] that's the major thing i want to get before 1.0 [02:28] of course [02:28] boto is working actively to move to requests :) [02:28] so you do 100-expect on post messages with a specific size ? [02:29] oh so that's the other huge thing [02:29] when i dropped urllib2, i lost streaming uploads [02:29] so that needs to happen. [02:29] those are the two major pieces [02:29] won't be too hard [02:30] then beautiful soup needs to be included :v [02:30] NO [02:30] * kennethre stabs tef [02:30] haha [02:30] so many people want me to add content-type-decoding [02:31] "why do i have to deserialize my json?" [02:31] I threatened to burn down a coworker's house for pushing something into a sprint midway [02:31] stfu [02:31] haha [02:31] kennethre: because actually a generalized mechanize like thing would be an obvious thing to build atop [02:31] yeah i'd love to replace mechanize [02:31] though multi-mechanize is pretty nice now [02:31] my boss said to him 'this is reality. not fantasy. things won't get done. ' [02:31] lol [02:32] in a different convo [02:32] the other boss also said 'we need to have consensus and not fuck with the sprint' essentially [02:32] I like my job \o/ [02:32] sounds like you like scrum :) [02:33] although it's been a year-long argument to get this sort of fortnight-driven releases [02:33] I dunno if it is scrum [02:33] it's more like 'version numbers are now a measure of time, not features' [02:33] we work out the priorities every two weeks and work out how much time we're willing to *spend* on features, not estimates [02:33] thing is as a result we actually know what is getting fixed and done [02:34] i hate anything that's not autonomous :) [02:34] haha, well we are somewhat autonomous [02:34] this is where we get together with business and ensure we know what the fuck is happening [02:34] with clients, contracts, sales, etc [02:35] for-profit archive company? [02:36] yeah [02:36] interesting [02:36] compliance archiving mostly [02:36] how does that work? [02:36] but we do research grants too for our academic bit on the side [02:36] it's either brand heritage [02:36] or things like sarbanes oxley or ftc rules about archiving the fuck out of everything [02:37] heh [02:37] interesting [02:38] my boss officially supports me trying to do archive team things [02:38] I was planning to get a snapshot bot that takes page warcs from things pasted in here [02:38] in my dubious free time [02:39] haha [02:39] do it [02:39] that's what i was going to build for requests [02:39] warcify([response, response, response]) [02:39] i might be able to ship it in requests itself :) [02:40] you had to name it something that allowed for confusion over subject [02:40] well first I have to finish making this release deployable [02:40] Coderjoe: it's a great name :) [02:49] kennethre: next up replace mime handling please [02:49] tef: elaborate.. [02:49] ah my co-worker complains about the mime library in python [02:51] for headers?
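kennethre's `warcify([response, response, response])` exists only as a one-liner in the log. One plausible shape for it: wrap each `requests` response in a WARC/1.0 response record, per the WARC spec's framing. Everything beyond the quoted name is an editor's sketch, and note that it recreates the HTTP message from accessors, exactly the step tef calls "wrong and dirty" above; a faithful crawler records the raw bytes off the wire instead.

```python
# A guess at what warcify() could look like: wrap each requests.Response
# in a WARC/1.0 "response" record. Only the name comes from the log; the
# implementation is an editor's sketch of the WARC record layout.
import uuid
from datetime import datetime, timezone

def warcify(responses, path="out.warc"):
    with open(path, "wb") as w:
        for r in responses:
            # Rebuild an approximate HTTP message from what requests keeps.
            # Headers may no longer match the decoded body (chunking and
            # Content-Encoding are already undone), which is one reason
            # real crawlers capture raw socket bytes instead.
            status = "HTTP/1.1 %d %s\r\n" % (r.status_code, r.reason)
            hdrs = "".join("%s: %s\r\n" % kv for kv in r.headers.items())
            block = (status.encode("latin-1") + hdrs.encode("latin-1")
                     + b"\r\n" + r.content)

            warc_headers = [
                "WARC/1.0",
                "WARC-Type: response",
                "WARC-Target-URI: " + r.url,
                "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
                "Content-Type: application/http; msgtype=response",
                "Content-Length: %d" % len(block),
            ]
            w.write("\r\n".join(warc_headers).encode("latin-1"))
            w.write(b"\r\n\r\n")
            w.write(block)
            w.write(b"\r\n\r\n")  # record separator required by the spec
```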
[02:51] i think [02:51] oh well it's not even there in python3 anymore [02:51] pretty sure they killed it [02:51] \o/ [02:54] no it's bad [02:54] they removed critical functionality [02:54] like a way to detect upload boundaries [02:54] no big deal [02:54] heh [02:54] (i fucking hate python 3) [02:54] well breaking things in py3 is ok [02:54] why do you hate it ? [02:55] porting requests to it was one of the most difficult things i've ever had to do [02:55] as a python dev [02:55] ah [02:55] was it bytes/strings? [02:55] the forced bytes/str separation [02:55] yes [02:55] it's not so bad on its own [02:55] I mean, aren't you just doing stuff with bytestrings all the time [02:55] but supporting 2.x and 3.x at the same time makes it more difficult [02:56] no [02:56] in 2, you can use bytes or unicode everywhere [02:56] in 3.x i just made it so all the non-binary entry points expect unicode [02:56] once i did that and separated response text from content (which was auto unicode-decoded before) [02:56] it wasn't *so* bad [02:56] yeah [02:57] still, i don't think the way it was was so bad [02:57] i never had any problems. [02:57] it sounds more like it shook out a design bug [02:57] the real problem [02:57] is half the standard lib is broken now [02:57] the real problem is unicode/bytes is awful [02:57] the bytes/unicode thing is all over the place [02:57] lol yes [02:57] http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ [02:58] ^^ I agree with every word of this. [02:58] ah yes [02:59] thing is to me python 3 is a fork of python [02:59] supported by the core devs [02:59] but unlike within 2.x I can't from __future__ import the bits I need [02:59] I can pretty much agree with that [02:59] I have to rewrite for python 3 [03:00] we all do [03:00] it's a new language [03:00] and at that point, why not ruby or other things ? [03:00] and it wasn't needed imo [03:00] thing is, they also broke the abi at the same time [03:00] I mean, backwards incompatible changes are good [03:00] but you need to be able to opt-in early [03:01] at this rate someone could fork 2.8 and get away with it [03:01] fork /a/ 2.8 [03:02] yes [03:02] and that's exactly what I, as a developer, want. [03:02] a 2.8 [03:02] if I bring this up online I am pretty sure I will get told off for trolling [03:02] because core devs seem quite sensitive about 3 [03:04] i'll just get the PSF to fund 2.8 [03:04] lol [03:04] no big deal [03:04] haha [03:04] pypy can do it [03:04] well if there was a 2.8 which came with python 3 tacked on [03:04] I mean, heh i'll just write a rpc layer with ctypes :v [03:05] "If 2to3 is our upgrade path to Python 3, then py2js is the upgrade path to JavaScript." [03:05] lawl [03:05] nice [03:05] tef: you can't, because they use the same namespace identifier [03:07] kennethre: run it in a separate process [03:07] tef: or just don't run it at all ;) [03:08] heheh [03:11] yeah I am sure talk of 2.8 will get you lynched at pycon [03:14] nah, everyone loves me :) [03:14] Going to pycon should get you lynched at pycon [03:14] SketchCow should definitely go to pycon [03:14] Ha ha, next century [03:14] * SketchCow packs [03:15] not a fan of the pythons? [03:15] Not a fan of the guido [03:15] And guido is a big influence on the pythons [03:15] interesting [03:15] any particular reason? [03:15] He's a jerk? [03:15] lol, what'd he do? [03:16] What, I need to point to my stolen lolly and my popped balloon? [03:16] most programmers are jerks though [03:16] Most are, yeah.
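For anyone who hasn't done a 2.x-plus-3.x port: the pain kennethre describes comes down to Python 2's `str` being bytes while Python 3's `str` is text, so dual-support code carried shims like the sketch below (the pattern later canonicalized by `six`). Names here are illustrative.

```python
# The kind of shim 2.x+3.x codebases carried: in Python 2, str is bytes
# and unicode is text; in Python 3, str is text and bytes is bytes.
# Every entry point has to pick a side, which is the porting pain above.
import sys

if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (Python 2 only)
    binary_type = str

def ensure_text(value, encoding="utf-8"):
    """Accept either flavor at an entry point, hand back text."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value

# requests' own split: response.content is always binary_type (raw bytes),
# response.text is always text_type (decoded), mirroring the content/text
# separation kennethre describes above.
```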
[03:16] might be observational bias on my part [03:16] i've met him, seemed pretty awkward but not confrontational or anything [03:16] he's no linus torvalds :) [03:16] *that* is a jerk [03:17] Shhh, I'm in his home country [03:17] agents will come [03:17] lmao [03:17] perkele [03:17] the gits [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin6.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin9.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/MSJ!PIN.NFO (deflated 74%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin1.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin4.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin2.zip (deflated 0%) [03:17] adding: 1994/Pinball_Arcade_CD-MSJ/msj_pin7.zip (deflated 0%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/ (stored 0%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/FILE_ID.DIZ (deflated 59%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/POLICE.NFO (deflated 77%) [03:17] adding: 1994/Arena_All_Continential_Maps-POLICE/plc_amap.zip (deflated 0%) [03:17] but yeah if I didn't like software because I didn't like the authors, I wouldn't be left with a lot of software to use [03:17] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/ (stored 0%) [03:18] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/ua-sky5.zip (deflated 0%) [03:18] adding: 1994/Beneath_A_Steel_Sky_GERMAN-UA/FILE_ID.DIZ (deflated 47%) [03:18] I'd say today is a good day [03:18] Not true [03:18] You'd be left with archiveteam tools [03:18] That's all you need [03:18] hehehe [03:18] but the linuxes it runs on! [03:18] steve jobs was a jerk too [03:18] made damn good systems though [03:19] * Aranje doesn't like them [03:19] No, he pounded human beings into the ground to make them make good enough systems [03:19] Also, Archive Team is the Yelling Bird of Programming Guilds [03:19] haha [03:19] http://questionablecontent.wikia.com/wiki/Yelling_Bird [03:20] SketchCow: I can agree with that :) [03:20] I made this t-shirt about 7 years ago http://printf.net/~tef/photos/tshirt/tef_tourette.jpg [03:21] but in scotland, cunt is a more affectionate term than offensive [03:21] http://www.indietits.com/comics/blogs.png [03:21] sometimes :v [03:22] tef: hahaha [03:22] oh noes, down to 800GB [03:23] SketchCow: the reason I asked about guido is because I've never heard anyone say that about him before :) [03:23] http://www.questionablecontent.net/indietits/comics/wowagain.png [03:24] We're kind of fucked, I'm trying to get files off of batcave and I am up at 6am and the bus leaves at 9:30am and then I'm on a plane for 10 hours. [03:24] No real solution. [03:24] hopefully a wifi plane then :) [03:25] I use too many emoticons [03:25] o(O.o)o [03:25] Nah, we're fucked, we're bringing it all in too fast. [03:26] Solution: Give me sudo, and I can help load into IA [03:26] :D [03:26] Yeah, wait, let me check my... NO [03:26] Especially since it's friday night so I have no bedtime! [03:27] SketchCow: Aww :( [03:27] Or just chmod them then :P [03:27] well I think trying to work out a way just to push it to ia directly will be faster than batcave to ia [03:27] No such solution. [03:27] ^ [03:27] I'm now seeing if I can get google groups loaded over. [03:28] lol underscor you authwhore [03:28] Authwhore? [03:28] :P [03:28] authwhore.
[03:28] You heard me :P [03:28] no sane person asks for root [03:28] hahaha [03:28] * underscor is not sane [03:28] root is a burden [03:28] haha [03:28] true [03:28] I share root on all my servers [03:28] tef: very true [03:28] it's going 'yes blame me for everything and I have to fix shit' [03:29] he didn't ask for root, only sudo :P [03:29] Also, yes [03:29] You guys are confusing "Is it programmatically possible to transfer bits into the internet archive servers" and "what are the procedures to politically fit into archive.org's storage paradigm" [03:29] I know all of you can do the first. [03:29] kennethre: crack instead of meth [03:29] Right now I am working on the second, only I can bridge that. [03:29] ah [03:29] oic [03:29] root@teamarchive-0:/3# du -sh MOBILEME [03:29] 398G MOBILEME [03:29] du -sh *SETS [03:29] root@teamarchive-0:/3# du -sh *SETS [03:29] 3.5T MOBILEME-SETS [03:29] well I figure the technical barriers are easy to solve and I can help with that [03:30] Anyway, as we see there, it's not necessarily mobileme filling that drive (11tb) [03:30] Let me see if I can get something going. [03:30] SketchCow: Not me anymore either [03:30] but yeah social problems are always the annoying ones in programming, I have no idea :v [03:30] I have ~2GB on that drive now [03:30] I'm clearing out everything else I can too [03:31] I am sure it's googlegroups. [03:31] btw, mailing drives next monday or tuesday [03:31] We should just start a large storage-as-a-service company and abuse it [03:31] so they should be there in time for you [03:32] Already am, dude [03:32] it's called archive.org [03:32] heheheh [03:32] hahaha [03:32] without the social problem :) [03:32] kennethre: no if it has people it has social problems [03:32] if it has any of you assholes it has social problems [03:32] ^ [03:32] haha [03:32] It's like firing a frathouse into a nunnery [03:33] :D [03:33] :D [03:33] that's an awesome analogy [03:33] there are some pisshead archivists i've met [03:33] there was this awesome chap who does the digital stuff for nz library [03:33] but most of them were quite straight-laced [03:37] although it seems how you start a fight with an archivist is by arguing over naming things [03:37] haha [03:38] I can't put any standard bits into warctools because every archive is *special* and *unique* [03:38] every single institution. file names, id numbers, serial numbers, compression settings [03:38] oh: and I want to find who made the ARC format and kill them [03:39] you have to parse the body of the first record to parse the headers of the first and subsequent records *head explodes* [03:40] http itself is already a pretty good format :) [03:40] almost [03:40] just need some metadata around it [03:41] iso-8859-1 [03:41] there were fights about using mime [03:41] mime can die in a fire [03:41] warc needs a transfer-chunked style thing [03:41] parsing http headers properly is something that never happens [03:41] so I don't need to keep them in memory [03:42] (header values) [03:42] kennethre: sort of [03:43] most people don't produce them correctly [03:44] gah, typing with lag ruins my ability to type [03:45] irssi + slow ssh? [03:45] yeah [03:45] joy [03:45] hmm [03:45] you can do local buffering [03:45] well, not with curses nvm [03:46] now i'm webscale [03:46] nepotism in action [03:47] but yeah warc is 'not bad' not great by any means [03:50] underscor: What's that countdown page again?
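tef's ARC complaint is concrete: in ARC, the description of the record format lives in the body of the file's first record, so you can't even frame records without interpreting content. WARC fixed this by making every record self-describing: named headers up to a blank line, then exactly `Content-Length` bytes. A minimal reader showing that framing (real files are usually gzipped per record, which this sketch ignores):

```python
# Minimal WARC record walker, to show tef's point by contrast with ARC:
# each WARC record carries its own named headers and a Content-Length,
# so you can skip records without interpreting their bodies. Assumes an
# uncompressed .warc; real files are typically gzipped per-record.
def read_warc_records(path):
    with open(path, "rb") as f:
        while True:
            version = f.readline()
            if not version:
                return  # clean EOF
            assert version.strip().startswith(b"WARC/")
            headers = {}
            for line in iter(f.readline, b"\r\n"):  # blank line ends headers
                name, _, value = line.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            body = f.read(int(headers["content-length"]))
            f.read(4)  # trailing \r\n\r\n record separator
            yield headers, body

# ARC has no such luxury: the first record's *body* holds the field layout
# for every record that follows, hence the exploding head above.
```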
[03:50] http://tracker.archive.org/df.html [03:50] Except it's a count-up [03:50] :D [03:50] haha [03:50] it's a countup :) [03:51] Yeah, take that. [03:52] * kennethre spins back up [03:52] I still don't want the kennethre horn turned back on, but I found what I was doing wrong somewhere. [03:52] I can get a better grip on things shortly. [03:52] awesome [03:52] awesome :) [03:53] The buffering issue is still there, but at least the space for the slowpokes can be there while I get the uploading more automated. [03:56] This machine is being so hammered. It's doing a straight rm and it's STILL going slow. [03:56] Also, there's a ton of files in that directory, I guess. [03:56] Granted, getting a gigabyte back every second isn't that bad. [03:56] Especially with all the assholes uploading. [03:59] haha [03:59] * We are completely uploaded and fine [03:59] < HTTP/1.1 200 Ok [03:59] iotop is cool [03:59] That's the first of the full sets to upload. [03:59] Let's see if s3 shits the bed and then shits a bigger, more intense bed full of shitted beds. [03:59] What identifier? [04:00] shitted-bedception? [04:00] [item_size] => 200019661 [04:00] hahahahaha [04:00] We have to go DEEPER [04:01] * underscor waits for disk io to shoot up [04:01] http://ia600802.us.archive.org/mrtg/diskv3.html [04:01] aww, how cute. it's only 3 weeks old [04:02] w00t 1TiB [04:06] haha [04:09] hahahaha [04:11] OK, so I'm still in deletion city with those files and I am currently concentrating on writing the proposal/information for a documentary I'm being hired to film this summer. [04:11] (A commercial job, straight-fee which I can use to pay off some debts/taxes) [04:12] http://www.us.archive.org/log_show.php?task_id=96126299 is just loving copying over a 198gb file, let me tell you [04:19] * Aranje wonders aloud where urlte.am went [04:20] A hot item on my list to fix. [04:20] After I get batcave under control, it's the next thing up. [04:20] Keep on me about it. [04:21] * Aranje nods [04:21] It's my fault, for not raping dot.fm in the eye [04:25] SketchCow: hahahahahahahaha [04:25] That [04:25] is [04:25] the [04:25] best [04:58] http://ia600807.us.archive.org/zipview.php?zip=/29/items/archiveteam-googlegroups-yw/googlegroups-yw.zip&file= [05:33] http://www.archive.org/details/archiveteam-googlegroups-zb&reCache=1 [05:36] 100% done by script. [05:39] very neat [05:40] automation is the way to go. congrats on more automation [05:50] Gets better, give me a moment. [06:40] so, apparently using edit.php to add files to your item nukes files that came from s3 because it doesn't know about them [06:50] did you wait until after derive had run on your item? [06:51] ah that could be it [06:58] OK, so. [06:58] I now have a script running those scripts. [06:58] It's in a screen session. [06:58] hah, classy [06:58] So while I'm in the air, it'll upload 909gb of google groups. [06:58] scriptception [06:59] http://www.archive.org/details/archiveteam-googlegroups [06:59] You will see entries with more than just googlegroups-XX.zip [06:59] That was the old paradigm [07:02] 4.9gb uploaded so far. [07:03] root@teamarchive-0:/3/googlegroups# ls | wc -l [07:03] 1305 [07:33] Off it goes. [12:04] Hm, http://memac.heroku.com isn't showing downloaded users anymore. Even though I'm fetching and reporting users downloaded to the tracker [14:40] ersi: WORKSFORME WONTFIX [14:53] Yeah, it does.
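SketchCow's "script running those scripts" is never shown in the log. A guess at its shape, for the curious: walk the staging directory, hand each zip to a per-item upload script, and log progress so it can churn unattended in screen. Only the /3/googlegroups path and the 1305-zip count appear above; the uploader's name and the delete-on-success behavior are assumptions.

```python
# A guess at the shape of the "script running those scripts": walk the
# googlegroups staging directory and run a per-item upload script on each
# zip, logging as it goes so it can run unattended in a screen session.
# UPLOADER is invented; only the /3/googlegroups path comes from the log.
import os
import subprocess
import sys

STAGING = "/3/googlegroups"
UPLOADER = "./upload-one-item.sh"  # hypothetical per-item script

zips = sorted(f for f in os.listdir(STAGING) if f.endswith(".zip"))
print("%d items to upload" % len(zips))

for i, name in enumerate(zips, 1):
    path = os.path.join(STAGING, name)
    print("[%d/%d] %s" % (i, len(zips), name))
    sys.stdout.flush()  # keep the screen session's log current
    rc = subprocess.call([UPLOADER, path])
    if rc != 0:
        print("FAILED (%d): %s -- leaving file in place" % (rc, name))
    else:
        os.remove(path)  # assumed: reclaim batcave space as we go
```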
I commented in #memac ;) [15:57] damn, ran into a 20 pages per day (per paid account) limit on a site [15:57] i want to get ~8000 pages [15:57] gonna need to script that :) [22:47] kennethre, is this you? O_o http://ia700000.us.archive.org:8088/mrtg/networkv2.html [22:48] space enough for ~4 h of that [23:29] ndurner: i'm not running [23:29] er, Nemo_bis ^ [23:30] ok [23:30] :) [23:31] i wish it was me :) [23:31] i want to be at the top of that dashboard :)
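The closing problem, ~8000 pages behind a 20-pages-per-day-per-account limit, scripts the way you'd expect: rotate accounts, stop each at its cap, and keep a cursor so tomorrow's run resumes. The site, auth scheme, and URL pattern below are entirely invented, since the log never names them.

```python
# Sketch of the "gonna need to script that" plan: fetch up to 20 pages
# per paid account per day, keeping a cursor so the run resumes tomorrow.
# The site, credentials, and URL scheme are placeholders; the log never
# names them.
import time
import requests

ACCOUNTS = [("user1", "pw1"), ("user2", "pw2")]  # hypothetical accounts
PAGES = range(1, 8001)
DAILY_CAP = 20

def fetch_day(start_index):
    i = start_index
    for user, pw in ACCOUNTS:
        session = requests.Session()
        session.auth = (user, pw)  # assumes HTTP basic auth
        for _ in range(DAILY_CAP):
            if i >= len(PAGES):
                return i
            page = PAGES[i]
            r = session.get("https://example.invalid/page/%d" % page)
            r.raise_for_status()
            with open("page-%04d.html" % page, "wb") as f:
                f.write(r.content)
            i += 1
            time.sleep(5)  # be polite between fetches
    return i

# With 2 accounts that's 40 pages/day: ~200 days for 8000 pages, so in
# practice you'd want more accounts, or a word with the site's operators.
cursor = fetch_day(0)
```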