[00:07] o_o
[00:07] indeed
[00:11] Wow. Yeah, awesome. Mind you, so is the script that originally made things possible, but we expect that sort of awesome out of alard already.
[00:21] http://www.archive.org/details/conquer-videogame-craze-album
[00:45] alard: shiny
[01:00] anything else by this guy?
[01:00] seriously though, IA guys, good job on the metadata
[01:02] No
[01:02] I did that
[01:02] (Nothing else by the guys, I did the metadata just now)
[01:18] alard: that page is awesome indeed
[01:21] SketchCow: sorry for the misplaced credit, you deserve all sue to you.
[01:21] due to you
[01:21] alard: is there a way to enqueue usernames?
[01:22] No misplaced
[01:22] I was just indicating where this came from in this case.
[01:23] k then. anyway good job where'd you find this? i grabbed without even looking
[01:24] It's been floating around forever and ever.
[01:24] I just happened to do a big disk transfer this time - I've put a couple hundred gigs of data I had at home and am putting them on archive.org.
[01:24] Like 110 FreeBSD CD-ROMs.
[01:25] chronomex: Put them up somewhere, then I'll add them.
[01:25] ive been meaning to ask about that, what isp do u have and are they also in rochester ny? how do u transfer multi hundred gig collections in less than a month
[01:25] (Though not today, it's way past bedtime.) Latest tracker page addition: a colored graph.
[01:26] alard: okay. I ran across a few the other day but didn't want to bug you. it'd be great to have a form field online to submit with.
[01:26] Who knows. :)
[01:34] uhoh
[01:34] underscor had a huge bmp there in the last 12 hours
[01:35] bump
[01:48] oh no. he is about to pass me
[01:48] curse you, stupidly small instance store
[01:50] argh christ why does the arc format put the names of the headers inside the record body
[01:57] seriously? a 12GB user?
[02:59] Coderjoe: Beating you!
[02:59] :D
[03:01] yeah... damn my pause to upload 340GB to the batcave
[03:02] I also seem to be maxing out the cpu on my high memory xlarge instance
[03:04] hmm
[03:04] ok... now not so much
[03:05] and the loadavg dropped considerably
[03:05] How expensive is that??
[03:05] less than a buck an hour, but not much less
[03:05] iirc
[03:05] i'm using a spot instance, so it is lower than the normal one most of the time
[03:06] but it seems to be about $6/day :-\
[03:06] that's not very cheap, have you researched alternatives much?
[03:06] last month was only $40. this month will kinda suck, since I'll also have a lot of data out and more s3 usage
[03:06] score! I was wondering how I was going to demodulate some data, and then I remembered I have a textbook on (de)modulation!
[03:10] i originally spun the instance up when I was trying to rescue a wget-warc that was consuming more and more memory and heavily swapping my free micro instance
[03:10] and then i re-tried the command on the m2.xlarge when it failed on the micro.
[03:10] and I keep throwing shit at it :-\
[03:12] (and that wget-warc that was swapping the micro to shreds wound up getting everything on the m2.xlarge oom_killed)
[03:13] hmmm.
[03:14] guys, why don't we set our files to readonly after we're done downloading them? it's a decent safeguard against accidentally fucking things up.
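On the read-only suggestion that closes the block above: this is easy to do from a shell with chmod -R a-w, or from a small script run after a user finishes. A minimal sketch follows; the data/<a>/<ab>/<abc>/<user> layout and the use of a .incomplete marker as "still downloading" are assumptions for illustration, not taken from the actual grab scripts.

import os
import stat

def lock_tree(user_dir):
    """Make a finished user's files read-only (roughly `chmod -R a-w`, files only).
    Skips the whole tree if a .incomplete marker is still present (assumed layout)."""
    for dirpath, dirnames, filenames in os.walk(user_dir):
        if ".incomplete" in filenames or ".incomplete" in dirnames:
            return  # something under this user is still being downloaded
    for dirpath, _dirnames, filenames in os.walk(user_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

# Example, using a path layout like the one that appears later in the log:
lock_tree("data/o/of/ofe/ofellestat")

Leaving directories writable (only stripping write bits from files) keeps the tree easy to move or upload later while still guarding against accidental overwrites.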
[03:16] hmm
[03:17] I really should be shutting this down
[03:18] Coderjoe: That's gonna be pricey :/
[03:18] so far between the instance and the data (pushing stuff to the batcave) I'm already looking at $86.14 estimated for Nov (just for the 8 days so far this month)
[03:18] Ouch
[03:18] last month was only $31.16
[03:18] That's... quite pricey
[03:19] the S3 so far is only $.99
[03:19] the m2.xlarge is 41.04
[03:19] and the data transfer out of ec2 is 44.11
[03:20] That's not bad for s3
[03:20] But you pay to get it back out, don't you?
[03:20] http://www.youtube.com/watch?v=tnCk0uGwFZI
[03:21] yeah. I stashed the stuff I already downloaded on my ec2 instance into s3 in addition to rsyncing to the batcave
[03:21] I'll likely end up deleting the stuff I stashed in s3 later
[03:21] ah
[03:21] putting it in essentially cost nothing on the s3 side.
[03:28] underscor: you and your infinite bandwidth ...
[03:28] :D
[03:28] yeah... you're not even paying for that...
[03:29] ha
[03:29] Well, I mean, my parents are
[03:29] not for the IA traffic
[03:34] True
[03:34] Most of the mobileme is from home though
[03:35] anyone here have access to ansi standards?
[03:36] depends on which one, i think. and what state. one of my co-workers dug up a copy of ansi sql92 somewhere. it might be a draft, but I don't know for certain
[03:37] I need X3.55 or X3.56, the 1600-bpi phase encoding for magtape
[03:53] IA might, try poking SketchCow
[03:54] hm, it was cross-published as ISO 4057
[03:54] iso standards are much easier to steal than ansi, I've found
[04:05] also ecma 46
[04:08] ah, really? sweet :)
[04:08] yeah. you can usually find copies of the drafts, since those were free
[04:09] http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-46,%201st%20Edition,%20March%201976.pdf
[04:09] it's kind of from the 1970s, I doubt the drafts are online
[04:09] I stand corrected
[04:09] fuckyeah, you are my hero
[04:09] I have been looking around but uh it is hard to tell if they are the same
[04:10] 100% the same isn't necessary, I just need to know details of the 1600bpi phase encoding
[04:10] am recovering tapes for a 1973-era computer
[04:11] do you know the manufacturer?
[04:11] there seemed to be a bunch of pdfs on qic.org
[04:11] tapes were made by 3M under contract to Western Electric / Bell Labs / AT&T
[04:12] see also http://www.classiccmp.org/pipermail/cctalk/2003-August/248957.html
[04:13] hm.
[04:13] the documents I have say ...
[04:13] I like the temp file rsync created
[04:13] .friendster.004200001-004300000.tar.SExybF
[04:13] chronomex: as in one of these? http://en.wikipedia.org/wiki/Quarter-inch_cartridge#3M_Data_Cartridge_.28DC.29
[04:14] " ... in the ANSI-proposed 1600 bit-per-inch (BPI) phase-encoded format on a ... 1/4 inch magnetic tape cartridge."
[04:14] http://www.google.co.uk/search?q=site:qic.org+filetype:pdf
[04:14] it's a DC-300, physically similar to the DC-600A shown there
[04:15] heh
[04:15] http://printf.net/~tef/p679-kerpelman.pdf
[04:15] does this help?
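For context on the encoding being chased down above: phase encoding (as in the 1600 bpi formats of ANSI X3.55/X3.56 and ECMA-46) puts a flux transition in the middle of every bit cell, with the direction of that transition carrying the bit value, and an extra transition on the cell boundary whenever two equal bits follow each other. The sketch below is a deliberately simplified illustration under idealized assumptions (a clean ±1 waveform, a known fixed number of samples per cell, alignment to a cell boundary, and an assumed low-to-high mid-cell transition meaning 1; the opposite polarity convention is equally possible). A real recovery job would have to derive the bit clock from the transitions themselves and deal with whatever preamble/sync the format defines.

# Simplified phase-encoding (PE) decode of an idealized, already-sampled signal.
SAMPLES_PER_CELL = 8  # assumed oversampling factor, not from the standard

def decode_pe(samples):
    bits = []
    for cell in range(0, len(samples) - SAMPLES_PER_CELL + 1, SAMPLES_PER_CELL):
        first_half = samples[cell + SAMPLES_PER_CELL // 4]
        second_half = samples[cell + 3 * SAMPLES_PER_CELL // 4]
        if first_half < second_half:
            bits.append(1)   # low -> high at mid-cell (assumed convention)
        elif first_half > second_half:
            bits.append(0)   # high -> low at mid-cell
        else:
            raise ValueError("no mid-cell transition; clock is off or signal damaged")
    return bits

# Hand-built example encoding bits 1, 0, 1 under the convention above:
example = [-1]*4 + [1]*4 + [1]*4 + [-1]*4 + [-1]*4 + [1]*4
print(decode_pe(example))  # -> [1, 0, 1]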
[04:16] hold on, my internet connection is only so fast :O
[04:16] but yes, that helps
[04:17] http://downloadmoreram.com/index.html
[04:17] underscor: https://plus.google.com/u/1/118060174030033503719/posts/GUgZYgBAqrY
[04:18] Damn
[04:19] yuuup, I think it's pretty hilarious actually
[04:20] more details turned up since that post, someone looked up the original part numbers -- they're most likely rejects from Micron, spec'd for no faster than pc2100
[04:20] hahaha
[04:20] probably overvolting too, because they run really hot
[04:21] I really want to get some now
[04:22] Just for shits and giggles
[04:22] if you want to throw away $40, be my guest
[04:22] Yeah
[04:22] If it was cheaper, I would
[04:46] wow... this one user has been going for 3 hours now
[04:54] such is life on the internet
[04:56] ok tmux, just stop updating the screen then, that's fine whatever
[04:57] amerrykan: Do you have a dead client somewhere?
[04:57] not anymore
[04:57] oic
[05:25] closure: not sure if you've seen this -> http://developer.berlios.de/devlog/blog/2011/10/31/berlios-continues-%e2%80%93-non-profit-association-is-founded/
[05:27] closure: never mind, I think you did -- Darkstar beat me to the punch re: updating the wiki
[05:28] yeah. who knows what will happen
[05:34] is there a channel just for the mobileme project?
[05:36] hmm
[05:37] no updates from underscor in a bit... which means either clients errored out or he's about to come out swinging with some huge uers
[05:37] users
[05:37] or he fell asleep on the Ctrl+C key
[05:37] s
[05:38] I hate that dashboard, btw, because I am incapable of closing t
[05:38] it
[05:38] Wed Nov 9 01:49:48 UTC 2011
[05:38] something about live updates that are addictive
[05:38] that's what, 3h50m ago?
[05:40] hrmph. I had a download just sort of stop.
[05:40] * chronomex kicks it
[06:10] yipdw: :)
[06:32] Time elapsed: 268m 49s
[06:32] 8.7GB
[06:53] hrm
[06:53] this version of wget can't create a warc file
[06:53]
[06:53] Error opening GZIP stream to WARC file.
[06:53] Opening WARC file `test.warc.gz'.
[06:53] [db48x@celebdil mobileme-grab]$ ./wget-warc --warc-file test http://db48x.net/
[06:54] Error writing warcinfo record to WARC file.
[06:54] Could not open WARC file.
[07:05] re: BerliOS: I'm wondering if anyone has a full list of BerliOS developer users that I can use to derive a list of weblogs
[07:05] I'm running a script to build a user list now
[07:05] but I guess it'd be nicer to BerliOS to not run thousands of queries against them :P
[07:07] especially since my script is just taking all trigrams from [a-z0-9] and hitting the BerliOS people search with it
[07:07] yeah that's kind of rude
[07:07] not sure how else to get a list of weblogs, though -- BerliOS developer weblogs site doesn't have an index
[08:33] bleh
[08:34] going through my collection of music videos, i can't help but get more and more annoyed by the watermarks and bumpers and shit added by the people who captured and encoded them
[08:35] hey, that's history too! :)
[08:41] kinda ...
[10:33] chronomex: http://memac.heroku.com/rescue-me
[10:33] <3
[10:34] You can't post really long lists, since that would block the tracker.
[10:34] sure, I just run into things from time to time :)
[10:58] alard: did you see my error message above?
[10:59] db48x22: I did now. Do you have any more specific info? It's a bit general. :)
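The brute-force user-list discovery mentioned a bit earlier (hitting the people search with every three-character string) is easy to picture in code. A small sketch, only generating the query terms and candidate URLs: the search endpoint and its query parameter are hypothetical, since the real BerliOS people-search URL isn't quoted in the log.

import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits   # [a-z0-9]

# Every three-character combination of the alphabet above.
trigrams = ["".join(chars) for chars in itertools.product(ALPHABET, repeat=3)]
print(len(trigrams))   # 36**3 = 46656 search queries -- hence "kind of rude"

# Hypothetical endpoint and parameter name, for illustration only.
SEARCH_URL = "https://developer.berlios.de/people/?words={}"
urls = [SEARCH_URL.format(t) for t in trigrams]

Anything that actually fetched these would want a long delay between requests (and ideally permission), given the query count.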
[11:01] well
[11:01]
[11:01] XXXXXXXXXXXX
[11:01] [db48x@celebdil mobileme-grab]$ cat test.warc.gz
[11:02] it compiled wget with no errors
[11:03] Can you apply a patch to get a more informative error message?
[11:03] sure
[11:03] (one moment)
[11:07] https://raw.github.com/gist/67ee9c20475cc8251ffa/fcb70d40cd963d88f5ab5858215e3305cd3e1aa3/gistfile1.diff
[11:07] db48x22: Hopefully that should tell why gzdopen fails.
[11:07] now, if my hard drive will catch up and let me open the file
[12:34] err
[12:34] alard: errno is undefined
[12:37] undeclared, I mean
[12:39] * db48x22 throws an errno.h in there
[12:40] Error opening GZIP stream to WARC file (zlib error 2).
[12:43] ENOENT
[12:44] which is odd
[12:44] since the reserved space got written
[12:47] Hmm.
[12:49] Could it have anything to do with the filesystem? Are you using something non-normal?
[12:50] There is quite a bit of flushing, moving around and reopening going on.
[12:52] hmm
[12:53] nope
[12:53] same thing happens whether the warc file is on ZFS or ext4
[12:55] That is strange.
[12:57] Hmm: http://www.zlib.net/manual.html#Gzip
[12:57] gzdopen returns NULL if there was insufficient memory to allocate the gzFile state, if an invalid mode was specified (an 'r', 'w', or 'a' was not provided, or '+' was provided)
[12:58] It's wb+9
[12:58] warc_current_gzfile = gzdopen (dup (fileno (warc_current_file)), "wb+9");
[13:00] that's what I was just reading
[13:00] why do you have a plus in there if it will make it fail?
[13:00] and why does it work for anyone else if that is the case?
[13:01] 1. Ignorance. 2. It works for me. 3. I probably thought I needed it. :)
[13:02] I'm now compiling with wb9, see how that works.
[13:03] now it works
[13:03] http://sourceforge.net/tracker/?func=detail&aid=2976146&group_id=86976&atid=581579
[13:03] Yes, the previous versions simply ignored the "+", while version 1.2.4
[13:03] checks for it (and fails).
[13:03] ah
[13:03] You're too modern.
[13:05] I've got zlib version 1.2.3.4
[13:08] Time for a new patch.
[13:11] yay
[13:14] Sent. Thanks.
[13:15] yw
[14:21] Coderjoe: the poems in your grab from poetry.com are funky
[14:24] ./grabs/abq/www.poetry.com/poems/4476949/5042058/index.html
[14:25] the second number is the poem id
[14:25]
[14:25] but what is the first number?
[14:26] it's not the user number
[16:49] Putting so much stuff into archive.org today
[16:50] First, a massive pile of FreeBSD CD-ROMs
[17:19] db48x22: i don't know. that's from their urls.
[18:03] off-topic query: has anyone here written SELinux policies before?
[18:04] er, never mind -- found out my problem
[18:04] for those who are curious, you can't have the "portcon" statement in a policy module
[18:23] Nobody is better for this information.
[18:57] ah, damnit
[18:57] - Running wget --mirror (at least 25189 files)... ERROR (3).
[18:57] Error downloading from web.me.com.
[18:57] Error downloading 'ofellestat'.
[19:01] wait a sec, the wget.log for web.me.com/ofellestat says it completed
[19:02] what the hell?
[20:24] yipdw: That means there is at least one error in the wget.log (wget exited with 3, so it's a File I/O error).
[20:30] alard: there's a bunch of 402s and 404s in the wget.log, but nothing that looks like a file I/O error
[20:30] Have you searched for 'Cannot' ?
[20:30] ahh
[20:30] grep 'Cannot' wget.log
[20:30] Cannot write to `data/o/of/ofe/ofellestat/web.me.com/files/web.me.com/ofellestat/christmas/bzAnimation.swf?swfId=BZFC4B873AE6254149BA85&xmlPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Fbz.xml&imgPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Fimg&soundPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Faudio.mp3&urlType=_blank&showInfo=0&themeMode=2' (File name too long).
[20:31] Hmm, yes, I've seen those before. The best option is probably to rm -rf data/o/of/ofe/ofellestat/web.me.com/.incomplete data/o/of/ofe/ofellestat/web.me.com/files and run dld-single.sh again.
[20:32] that looks like something that's not transient, though
[20:34] in this case, it looks like someone embedded a Flash movie in a webpage with a ton of parameters in the URL
[20:35] Yes, that's why you should only remove the .incomplete file and the files/ directory. Then when you run dld-single.sh on that user, it will see that web.me.com is already done.
[20:38] oh, that SWF is already stored in the WARC, then
[20:38] I'm not sure.
[20:38] Probably not, but there's no good way to get it.
[20:38] oh, it is
[20:39] I'm not sure how, since there don't appear to be any references to it that don't involve that huge URL
[20:39] but, whatever
[20:39] Ah, yes, of course. If it's listed in the webdav-feed.xml it's also included.
[20:40] that is, of course, assuming that a GET on bzAnimation.swf always returns the same thing independent of query string
[20:40] Yes, but you have to stop somewhere?
[20:40] indeed
[20:41] ok, download for ofellestat resumde
[20:41] resumed, too
[20:41] thanks
[20:41] In general, I'm not sure what is the best way for the script to handle these errors.
[20:41] Just quitting may be a bit drastic, since it keeps you from downloading other useful stuff.
[20:42] if there's a fast way to find and report those errors, I think that's the best that can be done
[20:42] unless there's some way of truncating URLs in downloaded content, and remembering the full -> truncated mapping
[20:42] which sounds like a lot more work than it's worth
[20:43] Or maybe check if wget completed successfully (with a 'completed' at the end of the wget.log). Then mark the user in some way, but still continue the download.
[20:43] So that at least when you start a client it will keep running for a while, unless there is a really really urgent problem.
[20:44] sure -- perhaps log the problematic user? one file per user where the file contains all such errors
[20:44] ?
[20:45] actually, nix the errors inclusion
[20:45] unless wget's fatal error reporting is standard in some way, e.g. all such error messages start with 'Cannot'
[20:56] http://orgasms.xxx/
[20:56] Neat, xxx is live
[21:45] alard: Is that graph new?
[21:46] Reasonably, yes. It was there yesterday (not when I first sent you the link).
[21:56] Ah
[21:56] It's beautiful!
[21:56] Coderjoe: Stop taking all the easy users!
[21:56] ;)
[22:15] underscor: You're still way ahead. (Highcharts is really nice, by the way. You can even zoom in by dragging.)
[22:15] I've added a new batch of usernames today, so it could be that those are a bit different (smaller, or larger).
[22:16] Cool
[22:16] Is it possible to get a "x remaining/x saved"?
[22:17] Yeah, well, the number remaining is far from complete, so it's just a number that doesn't really mean anything. A bit of googling and you find a few more usernames.
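One possible shape for the "report the error, mark the user, but keep downloading" idea discussed above: after wget exits, scan the per-user wget.log for fatal lines (which, as noted, seem to start with 'Cannot') and for a completion line near the end, then append the user to a flag file instead of aborting the whole client. A minimal sketch; the paths, file names, and the end-of-log marker are assumptions for illustration, not what dld-single.sh actually does.

import os

def check_wget_log(username, log_path, errors_path="flagged-users.log"):
    """Return True if the log looks clean; otherwise record the fatal lines
    in a per-run errors file and return False so the caller can keep going."""
    with open(log_path, errors="replace") as f:
        lines = f.readlines()

    fatal = [l for l in lines if l.lstrip().startswith("Cannot")]
    # Assumed completion marker near the end of the log; the exact wording
    # may differ between wget versions and locales.
    finished = any("FINISHED" in l or "completed" in l for l in lines[-10:])

    if fatal or not finished:
        with open(errors_path, "a") as out:
            out.write("{}: {} fatal line(s), finished={}\n".format(
                username, len(fatal), finished))
            out.writelines(fatal)
        return False
    return True

# Example (hypothetical path layout from the discussion above):
# check_wget_log("ofellestat", "data/o/of/ofe/ofellestat/web.me.com/wget.log")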
[22:25] underscor: Almost got 1%, see http://memac.heroku.com/
[22:26] Awesome
[22:26] :D
[22:27] so the full thing is 200 TB? O_O
[22:29] At the current rate, looks like it
[22:29] 193 Tebibytes
[22:30] yow
[22:30] That's awesome!
[22:30] SketchCow's gonna need some more space
[22:30] I've been downloading this stupid user for days now
[22:31] he's not very big, just slow
[22:31] well, 8g so far.
[22:31] big -and- slow.
[22:31] full of mp3 files.
[22:31] So, that's 187 days at full 100mbps
[22:32] A month, if 5 people are downloading at 100mbps
[22:32] hey alard, how about a rolling mbps display :P
[22:32] haha
[22:33] yow
[22:33] lots of users left
[22:33] Downloading public.me.com/caislas
[22:33] - Discovering urls (XML)... ERROR (23).
[22:34] gmm
[22:34] hmm
[22:34] out of disk space again
[22:34] underscor: shit, you're eating 500g/day at this rate.
[22:34] you too, Coderjoe
[22:34] judging from the graph
[22:34] yeah, but I'm stupidly bleeding money out the ass while doing it
[22:34] That good or bad?
[22:35] (chronomex's thing, not your assbleed ;))
[22:35] it's impressive.
[22:36] AWW YEAH!!
[22:37] - Running wget --mirror (at least 2519 files)...*** glibc detected *** ./wget-warc: corrupted double-linked list: 0x0000000001ead9d0 ***
[22:37] lol
[22:37] I love it
[22:37] Coderjoe: uh oh
[22:37] O_o
[22:37] and a second one segfaulted
[22:37] oops
[22:38] What's wget error 4?
[22:39] Oh, network failure
[22:39] lovely
[22:42] http://tracker.archive.org/tracker.png
[22:42] SAVIN' THEM DATAS
[23:23] That dashboard looks a bit skewed towards me
[23:23] it underscores your importance
[23:23] badum-tish
[23:43] oh dear.
[23:45] :D
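A rough check of the back-of-the-envelope transfer math quoted above, assuming the ~200 TB estimate and decimal units: at a sustained 100 Mbit/s, one downloader needs a bit over six months, and five downloaders at that rate need roughly a month, which is consistent with the "187 days" and "a month, if 5 people" figures in the log.

# Sanity-check of the quoted transfer-time estimate (decimal units assumed).
total_bytes = 200e12                 # ~200 TB estimated collection size
link_bps = 100e6                     # 100 Mbit/s sustained
seconds = total_bytes * 8 / link_bps
print(seconds / 86400)               # ~185 days for a single downloader
print(seconds / 86400 / 5)           # ~37 days with five downloaders at full rate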