#archiveteam 2011-11-09,Wed


Time Nickname Message
00:07 🔗 chronomex o_o
00:07 🔗 chronomex indeed
00:11 🔗 Paradoks Wow. Yeah, awesome. Mind you, so is the script that originally made things possible, but we expect that sort of awesome out of alard already.
00:21 🔗 SketchCow http://www.archive.org/details/conquer-videogame-craze-album
00:45 🔗 db48x22 alard: shiny
01:00 🔗 bsmith093 anything else by this guy?
01:00 🔗 bsmith093 seriously though, IA guys, good job on the metadata
01:02 🔗 SketchCow No
01:02 🔗 SketchCow I did that
01:02 🔗 SketchCow (Nothing else by the guys, I did the metadata just now)
01:18 🔗 dashcloud alard: that page is awesome indeed
01:21 🔗 bsmith093 SketchCow: sorry for the misplaced credit, you deserve all sue to you.
01:21 🔗 bsmith093 due to you
01:21 🔗 chronomex alard: is there a way to enqueue usernames?
01:22 🔗 SketchCow No misplaced
01:22 🔗 SketchCow I was just indicating where this came from in this case.
01:23 🔗 bsmith093 k then. anyway good job where'd you find this? i grabbed without even looking
01:24 🔗 SketchCow It's been floating around forever and ever.
01:24 🔗 SketchCow I just happened to do a big disk transfer this time - I've put a couple hundred gigs of data I had at home and am putting them on archive.org.
01:24 🔗 SketchCow Like 110 FreeBSD CD-ROMs.
01:25 🔗 alard chronomex: Put them up somewhere, then I'll add them.
01:25 🔗 bsmith093 ive been meaning to ask about that, what isp do u have and are they also in rochester ny? how do u transfer multi hundred gig collections in less than a month
01:25 🔗 alard (Though not today, it's way past bedtime.) Latest tracker page addition: a colored graph.
01:26 🔗 chronomex alard: okay. I ran across a few the other day but didn't want to bug you. it'd be great to have a form field online to submit with.
01:26 🔗 alard Who knows. :)
01:34 🔗 Coderjoe uhoh
01:34 🔗 Coderjoe underscor had a huge bmp there in the last 12 hours
01:35 🔗 Coderjoe bump
01:48 🔗 Coderjoe oh no. he is about to pass me
01:48 🔗 Coderjoe curse you, stupidly small instance store
01:50 🔗 tef argh christ why does the arc format put the names of the headers inside the record body
01:57 🔗 Coderjoe seriously? a 12GB user?
02:59 🔗 underscor Coderjoe: Beating you!
02:59 🔗 underscor :D
03:01 🔗 Coderjoe yeah... damn my pause to upload 340GB to the batcave
03:02 🔗 Coderjoe I also seem to be maxing out the cpu on my high memory xlarge instance
03:04 🔗 Coderjoe hmm
03:04 🔗 Coderjoe ok... now not so much
03:05 🔗 Coderjoe and the loadavg dropped considerably
03:05 🔗 underscor How expensive is that??
03:05 🔗 chronomex less than a buck an hour, but not much less
03:05 🔗 chronomex iirc
03:05 🔗 Coderjoe i'm using a spot instance, so it is lower than the normal one most of the time
03:06 🔗 Coderjoe but it seems to be about $6/day :-\
03:06 🔗 chronomex that's not very cheap, have you researched alternatives much?
03:06 🔗 Coderjoe last month was only $40. this month will kinda suck, since I'll also have a lot of data out and more s3 usage
03:06 🔗 chronomex score! I was wondering how I was going to demodulate some data, and then I remembered I have a textbook on (de)modulation!
03:10 🔗 Coderjoe i originally spun the instance up when I was trying to rescue a wget-warc that was consuming more and more memory and heavily swapping my free micro instance
03:10 🔗 Coderjoe and then i re-tried the command on the m2.xlarge when it failed on the micro.
03:10 🔗 Coderjoe and I keep throwing shit at it :-\
03:12 🔗 Coderjoe (and that wget-warc that was swapping the micro to shreds wound up getting everything on the m2.xlarge oom_killed)
03:13 🔗 chronomex hmmm.
03:14 🔗 chronomex guys, why don't we set our files to readonly after we're done downloading them? it's a decent safeguard against accidentally fucking things up.
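Editorial sketch of the read-only safeguard chronomex suggests, assuming finished downloads sit under a single directory. The function name `make_read_only` and the directory layout are hypothetical and are not part of any script mentioned in the channel.

```python
import os
import stat

def make_read_only(root):
    """Clear the write bits on every regular file under root.

    Returns the number of files whose mode was changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone; chmod would follow them
            mode = stat.S_IMODE(os.stat(path).st_mode)
            new_mode = mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed += 1
    return changed
```

Directories are left writable on purpose, so a finished grab can still be moved or deleted deliberately; only the file contents are protected against accidental overwrites.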
03:16 🔗 Coderjoe hmm
03:17 🔗 Coderjoe I really should be shutting this down
03:18 🔗 underscor Coderjoe: That's gonna be pricey :/
03:18 🔗 Coderjoe so far between the instance and the data (pushing stuff to the batcave) I'm already looking at $86.14 estimated for Nov (just for the 8 days so far this month)
03:18 🔗 underscor Ouch
03:18 🔗 Coderjoe last month was only $31.16
03:18 🔗 underscor That's... quite pricey
03:19 🔗 Coderjoe the S3 so far is only $.99
03:19 🔗 Coderjoe the m2.xlarge is 41.04
03:19 🔗 Coderjoe and the data transfer out of ec2 is 44.11
03:20 🔗 underscor That's not bad for s3
03:20 🔗 underscor But you pay to get it back out, don't you?
03:20 🔗 underscor http://www.youtube.com/watch?v=tnCk0uGwFZI
03:21 🔗 Coderjoe yeah. I stashed the stuff I already downloaded on my ec2 instance into s3 in addition to rsyncing to the batcave
03:21 🔗 Coderjoe I'll likely end up deleting the stuff I stashed in s3 later
03:21 🔗 underscor ah
03:21 🔗 Coderjoe putting it in essentially cost nothing on the s3 side.
03:28 🔗 chronomex underscor: you and your infinite bandwidth ...
03:28 🔗 underscor :D
03:28 🔗 Coderjoe yeah... you're not even paying for that...
03:29 🔗 underscor ha
03:29 🔗 underscor Well, I mean, my parents are
03:29 🔗 Coderjoe not for the IA traffic
03:34 🔗 underscor True
03:34 🔗 underscor Most of the mobileme is from home though
03:35 🔗 chronomex anyone here have access to ansi standards?
03:36 🔗 Coderjoe depends on which one, i think. and what state. one of my co-workers dug up a copy of ansi sql92 somewhere. it might be a draft, but I don't know for certain
03:37 🔗 chronomex I need X3.55 or X3.56, the 1600-bpi phase encoding for magtape
03:53 🔗 underscor IA might, try poking SketchCow
03:54 🔗 chronomex hm, it was cross-published as ISO 4057
03:54 🔗 chronomex iso standards are much easier to steal than ansi, I've found
04:05 🔗 tef also ecma 46
04:08 🔗 chronomex ah, really? sweet :)
04:08 🔗 Coderjoe yeah. you can usually find copies of the drafts, since those were free
04:09 🔗 tef http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-46,%201st%20Edition,%20March%201976.pdf
04:09 🔗 chronomex it's kind of from the 1970s, I doubt the drafts are online
04:09 🔗 chronomex I stand corrected
04:09 🔗 chronomex fuckyeah, you are my hero
04:09 🔗 tef I have been looking around but uh it is hard to tell if they are the same
04:10 🔗 chronomex 100% the same isn't necessary, I just need to know details of the 1600bpi phase encoding
04:10 🔗 chronomex am recovering tapes for a 1973-era computer
04:11 🔗 tef do you know the manufacturer?
04:11 🔗 tef there seemed to be a bunch of pdfs on qic.org
04:11 🔗 chronomex tapes were made by 3M under contract to Western Electric / Bell Labs / AT&T
04:12 🔗 tef see also http://www.classiccmp.org/pipermail/cctalk/2003-August/248957.html
04:13 🔗 chronomex hm.
04:13 🔗 chronomex the documents I have say ...
04:13 🔗 underscor I like the temp file rsync created
04:13 🔗 underscor .friendster.004200001-004300000.tar.SExybF
04:13 🔗 tef chronomex: as in one of these ?http://en.wikipedia.org/wiki/Quarter-inch_cartridge#3M_Data_Cartridge_.28DC.29
04:14 🔗 chronomex " ... in the ANSI-proposed 1600 bit-per-inch (BPI) phase-encoded format on a ... 1/4 inch magnetic tape cartridge."
04:14 🔗 tef http://www.google.co.uk/search?q=site:qic.org+filetype:pdf
04:14 🔗 chronomex it's a DC-300, physically similar to the DC-600A shown there
04:15 🔗 tef heh
04:15 🔗 tef http://printf.net/~tef/p679-kerpelman.pdf
04:15 🔗 tef does this help?
04:16 🔗 chronomex hold on, my internet connection is only so fast :O
04:16 🔗 chronomex but yes, that helps
04:17 🔗 underscor http://downloadmoreram.com/index.html
04:17 🔗 chronomex underscor: https://plus.google.com/u/1/118060174030033503719/posts/GUgZYgBAqrY
04:18 🔗 underscor Damn
04:19 🔗 chronomex yuuup, I think it's pretty hilarious actually
04:20 🔗 chronomex more details turned up since that post, someone looked up the original part numbers -- they're most likely rejects from Micron, spec'd for no faster than pc2100
04:20 🔗 underscor hahaha
04:20 🔗 chronomex probably overvolting too, because they run really hot
04:21 🔗 underscor I really want to get some now
04:22 🔗 underscor Just for shits and giggles
04:22 🔗 chronomex if you want to throw away $40, be my guest
04:22 🔗 underscor Yeah
04:22 🔗 underscor If it was cheaper, I would
04:46 🔗 Coderjoe wow... this one user has been going for 3 hours now
04:54 🔗 chronomex such is life on the internet
04:56 🔗 amerrykan ok tmux, just stop updating the screen then, that's fine whatever
04:57 🔗 underscor amerrykan: Do you have a dead client somewhere?
04:57 🔗 amerrykan not anymore
04:57 🔗 underscor oic
05:25 🔗 yipdw closure: not sure if you've seen this -> http://developer.berlios.de/devlog/blog/2011/10/31/berlios-continues-%e2%80%93-non-profit-association-is-founded/
05:27 🔗 yipdw closure: never mind, I think you did -- Darkstar beat me to the punch re: updating the wiki
05:28 🔗 closure yeah. who knows what will happen
05:34 🔗 Coderjoe is there a channel just for the mobileme project?
05:36 🔗 Coderjoe hmm
05:37 🔗 Coderjoe no updates from underscor in a bit... which means either clients errored out or he's about to come out swinging with some huge uers
05:37 🔗 Coderjoe users
05:37 🔗 yipdw or he fell asleep on the Ctrl+C key
05:37 🔗 yipdw s
05:38 🔗 yipdw I hate that dashboard, btw, because I am incapable of closing t
05:38 🔗 yipdw it
05:38 🔗 Coderjoe Wed Nov 9 01:49:48 UTC 2011
05:38 🔗 yipdw something about live updates that are addictive
05:38 🔗 Coderjoe that's what, 3h50m ago?
05:40 🔗 chronomex hrmph. I had a download just sort of stop.
05:40 🔗 * chronomex kicks it
06:10 🔗 db48x22 yipdw: :)
06:32 🔗 Coderjoe Time elapsed: 268m 49s
06:32 🔗 Coderjoe 8.7GB
06:53 🔗 db48x22 hrm
06:53 🔗 db48x22 this version of wget can't create a warc file
06:53 🔗 db48x22 [db48x@celebdil mobileme-grab]$ ./wget-warc --warc-file test http://db48x.net/
06:53 🔗 db48x22 Opening WARC file `test.warc.gz'.
06:53 🔗 db48x22 Error opening GZIP stream to WARC file.
06:54 🔗 db48x22 Error writing warcinfo record to WARC file.
06:54 🔗 db48x22 Could not open WARC file.
07:05 🔗 yipdw re: BerliOS: I'm wondering if anyone has a full list of BerliOS developer users that I can use to derive a list of weblogs
07:05 🔗 yipdw I'm running a script to build a user list now
07:05 🔗 yipdw but I guess it'd be nicer to BerliOS to not run thousands of queries against them :P
07:07 🔗 yipdw especially since my script is just taking all trigrams from [a-z0-9] and hitting the BerliOS people search with it
07:07 🔗 chronomex yeah that's kind of rude
07:07 🔗 yipdw not sure how else to get a list of weblogs, though -- BerliOS developer weblogs site doesn't have an index
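Editorial sketch of the trigram enumeration yipdw describes, mainly to show why it means "thousands of queries". Only the search terms are generated here; the actual BerliOS search request is not shown, and the names `ALPHABET` and `trigrams` are hypothetical.

```python
import itertools
import string

# [a-z0-9]: 26 letters plus 10 digits, 36 symbols in total.
ALPHABET = string.ascii_lowercase + string.digits

def trigrams(alphabet=ALPHABET):
    """Yield every three-character string over the alphabet, in order."""
    for combo in itertools.product(alphabet, repeat=3):
        yield "".join(combo)

# 36 ** 3 == 46656 search terms, i.e. that many requests if each is sent once.
query_count = sum(1 for _ in trigrams())
```

At one request per term that is 46,656 requests, which is the load chronomex calls "kind of rude"; a delay between requests would be the polite minimum.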
08:33 🔗 Coderjoe bleh
08:34 🔗 Coderjoe going through my collection of music videos, i can't help but get more and more annoyed by the watermarks and bumpers and shit added by the people who captured and encoded them
08:35 🔗 Ymgve hey, that's history too! :)
08:41 🔗 chronomex kinda ...
10:33 🔗 alard chronomex: http://memac.heroku.com/rescue-me
10:33 🔗 chronomex <3
10:34 🔗 alard You can't post really long lists, since that would block the tracker.
10:34 🔗 chronomex sure, I just run into things from time to time :)
10:58 🔗 db48x22 alard: did you see my error message above?
10:59 🔗 alard db48x22: I did now. Do you have any more specific info? It's a bit general. :)
11:01 🔗 db48x22 well
11:01 🔗 db48x22 [db48x@celebdil mobileme-grab]$ cat test.warc.gz
11:01 🔗 db48x22 XXXXXXXXXXXX
11:02 🔗 db48x22 it compiled wget with no errors
11:03 🔗 alard Can you apply a patch to get a more informative error message?
11:03 🔗 db48x22 sure
11:03 🔗 alard (one moment)
11:07 🔗 alard https://raw.github.com/gist/67ee9c20475cc8251ffa/fcb70d40cd963d88f5ab5858215e3305cd3e1aa3/gistfile1.diff
11:07 🔗 alard db48x22: Hopefully that should tell why gzdopen fails.
11:07 🔗 db48x22 now, if my hard drive will catch up and let me open the file
12:34 🔗 db48x22 err
12:34 🔗 db48x22 alard: errno is undefined
12:37 🔗 db48x22 undeclared, I mean
12:39 🔗 * db48x22 throws an errno.h in there
12:40 🔗 db48x22 Error opening GZIP stream to WARC file (zlib error 2).
12:43 🔗 db48x22 ENOENT
12:44 🔗 db48x22 which is odd
12:44 🔗 db48x22 since the reserved space got written
12:47 🔗 alard Hmm.
12:49 🔗 alard Could it have anything to do with the filesystem? Are you using something non-normal?
12:50 🔗 alard There is quite a bit of flushing, moving around and reopening going on.
12:52 🔗 db48x22 hmm
12:53 🔗 db48x22 nope
12:53 🔗 db48x22 same thing happens whether the warc file is on ZFS or ext4
12:55 🔗 alard That is strange.
12:57 🔗 alard Hmm: http://www.zlib.net/manual.html#Gzip
12:57 🔗 alard gzdopen returns NULL if there was insufficient memory to allocate the gzFile state, if an invalid mode was specified (an 'r', 'w', or 'a' was not provided, or '+' was provided)
12:58 🔗 alard It's wb+9
12:58 🔗 alard warc_current_gzfile = gzdopen (dup (fileno (warc_current_file)), "wb+9");
13:00 🔗 db48x22 that's what I was just reading
13:00 🔗 db48x22 why do you have a plus in there if it will make it fail?
13:00 🔗 db48x22 and why does it work for anyone else if that is the case?
13:01 🔗 alard 1. Ignorance. 2. It works for me. 3. I probably thought I needed it. :)
13:02 🔗 alard I'm now compiling with wb9, see how that works.
13:03 🔗 db48x22 now it works
13:03 🔗 alard http://sourceforge.net/tracker/?func=detail&aid=2976146&group_id=86976&atid=581579
13:03 🔗 alard Yes, the previous versions simply ignored the "+", while version 1.2.4
13:03 🔗 alard checks for it (and fails).
13:03 🔗 db48x22 ah
13:03 🔗 alard You're too modern.
13:05 🔗 alard I've got zlib version 1.2.3.4
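Editorial sketch of the mode rule alard quotes from the zlib manual: a mode needs one of 'r', 'w' or 'a', and must not contain '+'. This is a Python illustration of that documented rule only, not zlib's own code; the helper name `gz_mode_is_valid` is hypothetical.

```python
def gz_mode_is_valid(mode):
    """Apply the rule quoted from the zlib manual to a gzopen/gzdopen mode.

    Valid modes contain at least one of 'r', 'w', 'a' and no '+'.
    Other characters (such as 'b' or a compression level digit) are
    not examined by this sketch.
    """
    has_direction = any(c in mode for c in "rwa")
    return has_direction and "+" not in mode

old_mode_ok = gz_mode_is_valid("wb+9")  # mode in the original wget-warc call
new_mode_ok = gz_mode_is_valid("wb9")   # mode alard recompiled with
```

Per the bug report alard links, older zlib releases ignored the '+' while 1.2.4 rejects it, which is consistent with the same binary working on zlib 1.2.3.4 and failing for db48x22.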
13:08 🔗 alard Time for a new patch.
13:11 🔗 db48x22 yay
13:14 🔗 alard Sent. Thanks.
13:15 🔗 db48x22 yw
14:21 🔗 db48x22 Coderjoe: the poems in your grab from poetry.com are funky
14:24 🔗 db48x22 ./grabs/abq/www.poetry.com/poems/4476949/5042058/index.html
14:25 🔗 db48x22 the second number is the poem id
14:25 🔗 db48x22 <input type="text" name="poem_id" value="5042058" style="display:none;"/>
14:25 🔗 db48x22 but what is the first number?
14:26 🔗 db48x22 it's not the user number
16:49 🔗 SketchCow Putting so much stuff into archive.org today
16:50 🔗 SketchCow First, a massive pile of FreeBSD CD-ROMs
17:19 🔗 Coderjoe db48x22: i don't know. that's from their urls.
18:03 🔗 yipdw off-topic query: has anyone here written SELinux policies before?
18:04 🔗 yipdw er, never mind -- found out my problem
18:04 🔗 yipdw for those who are curious, you can't have the "portcon" statement in a policy module
18:23 🔗 SketchCow Nobody is better for this information.
18:57 🔗 yipdw ah, damnit
18:57 🔗 yipdw - Running wget --mirror (at least 25189 files)... ERROR (3).
18:57 🔗 yipdw Error downloading from web.me.com.
18:57 🔗 yipdw Error downloading 'ofellestat'.
19:01 🔗 yipdw wait a sec, the wget.log for web.me.com/ofellestat says it completed
19:02 🔗 yipdw what the hell?
20:24 🔗 alard yipdw: That means there is at least one error in the wget.log (wget exited with 3, so it's a File I/O error).
20:30 🔗 yipdw alard: there's a bunch of 402s and 404s in the wget.log, but nothing that looks like a file I/O error
20:30 🔗 alard Have you searched for 'Cannot' ?
20:30 🔗 yipdw ahh
20:30 🔗 yipdw grep 'Cannot' wget.log
20:30 🔗 yipdw Cannot write to `data/o/of/ofe/ofellestat/web.me.com/files/web.me.com/ofellestat/christmas/bzAnimation.swf?swfId=BZFC4B873AE6254149BA85&xmlPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Fbz.xml&imgPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Fimg&soundPath=http:%2F%2Fweb.me.com%2Fofellestat%2Fchristmas%2Faudio.mp3&urlType=_blank&showInfo=0&themeMode=2' (File name too long).
20:31 🔗 alard Hmm, yes, I've seen those before. The best option is probably to rm -rf data/o/of/ofe/ofellestat/web.me.com/.incomplete data/o/of/ofe/ofellestat/web.me.com/files and run dld-single.sh again.
20:32 🔗 yipdw that looks like something that's not transient, though
20:34 🔗 yipdw in this case, it looks like someone embedded a Flash movie in a webpage with a ton of parameters in the URL
20:35 🔗 alard Yes, that's why you should only remove the .incomplete file and the files/ directory. Then when you run dld-single.sh on that user, it will see that web.me.com is already done.
20:38 🔗 yipdw oh, that SWF is already stored in the WARC, then
20:38 🔗 alard I'm not sure.
20:38 🔗 alard Probably not, but there's no good way to get it.
20:38 🔗 yipdw oh, it is
20:39 🔗 yipdw I'm not sure how, since there don't appear to be any references to it that don't involve that huge URL
20:39 🔗 yipdw but, whatever
20:39 🔗 alard Ah, yes, of course. If it's listed in the webdav-feed.xml it's also included.
20:40 🔗 yipdw that is, of course, assuming that a GET on bzAnimation.swf always returns the same thing independent of query string
20:40 🔗 alard Yes, but you have to stop somewhere?
20:40 🔗 yipdw indeed
20:41 🔗 yipdw ok, download for ofellestat resumde
20:41 🔗 yipdw resumed, too
20:41 🔗 yipdw thanks
20:41 🔗 alard In general, I'm not sure what is the best way for the script to handle these errors.
20:41 🔗 alard Just quitting may be a bit drastic, since it keeps you from downloading other useful stuff.
20:42 🔗 yipdw if there's a fast way to find and report those errors, I think that's the best that can be done
20:42 🔗 yipdw unless there's some way of truncating URLs in downloaded content, and remembering the full -> truncated mapping
20:42 🔗 yipdw which sounds like a lot more work than it's worth
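Editorial sketch of the idea yipdw floats and then sets aside: shorten over-long file names and remember the full-to-truncated mapping. The 255-byte limit is an assumption (a common per-component limit on Linux filesystems), and `shorten_name` and `build_mapping` are hypothetical helpers, not part of wget-warc or the mobileme-grab scripts.

```python
import hashlib

MAX_NAME_BYTES = 255  # assumed per-component file name limit

def shorten_name(name, limit=MAX_NAME_BYTES):
    """Return name unchanged if it fits, else a prefix plus a SHA-1 digest."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    digest = hashlib.sha1(raw).hexdigest()  # 40 hex characters
    keep = limit - len(digest) - 1          # leave room for "-" and the digest
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return prefix + "-" + digest

def build_mapping(names):
    """Map each over-long name to its shortened form."""
    mapping = {}
    for name in names:
        short = shorten_name(name)
        if short != name:
            mapping[name] = short
    return mapping
```

The digest keeps two long names with the same prefix from colliding, and the mapping would have to be stored next to the data to be of any use later, which is part of the extra work yipdw is weighing.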
20:43 🔗 alard Or maybe check if wget completed successfully (with a 'completed' at the end of the wget.log). Then mark the user in some way, but still continue the download.
20:43 🔗 alard So that at least when you start a client it will keep running for a while, unless there is a really really urgent problem.
20:44 🔗 yipdw sure -- perhaps log the problematic user? one file per user where the file contains all such errors
20:44 🔗 yipdw ?
20:45 🔗 yipdw actually, nix the errors inclusion
20:45 🔗 yipdw unless wget's fatal error reporting is standard in some way, e.g. all such error messages start with 'Cannot'
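Editorial sketch of the log check discussed above: collect the lines of a wget.log that start with 'Cannot', so a user can be flagged while the download continues. Whether every fatal wget message starts with 'Cannot' is exactly the open question yipdw raises, so that prefix is an assumption here, and `find_cannot_lines` and `flag_user` are hypothetical names.

```python
def find_cannot_lines(log_lines, prefix="Cannot"):
    """Return the log lines that start with the given prefix."""
    return [line.rstrip("\n") for line in log_lines if line.startswith(prefix)]

def flag_user(username, log_lines):
    """Summarise one user's log as a small report dictionary."""
    errors = find_cannot_lines(log_lines)
    return {"user": username, "flagged": bool(errors), "errors": errors}

# Usage with lines already read from a log file:
sample = [
    "Saving to: `index.html'\n",
    "Cannot write to `some/very/long/name' (File name too long).\n",
]
report = flag_user("ofellestat", sample)
```

Writing each report to a per-user file, as yipdw suggests, would be a small addition on top of this.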
20:56 🔗 underscor http://orgasms.xxx/
20:56 🔗 underscor Neat, xxx is live
21:45 🔗 underscor alard: Is that graph new?
21:46 🔗 alard Reasonably, yes. It was there yesterday (not when I first sent you the link).
21:56 🔗 underscor Ah
21:56 🔗 underscor It's beautiful!
21:56 🔗 underscor Coderjoe: Stop taking all the easy users!
21:56 🔗 underscor ;)
22:15 🔗 alard underscor: You're still way ahead. (Highcharts is really nice, by the way. You can even zoom in by dragging.)
22:15 🔗 alard I've added a new batch of usernames today, so it could be that those are a bit different (smaller, or larger).
22:16 🔗 underscor Cool
22:16 🔗 underscor Is it possible to get a "x remaining/x saved"?
22:17 🔗 alard Yeah, well, the number remaining is far from complete, so it's just a number that doesn't really mean anything. A bit of googling and you find a few more usernames.
22:25 🔗 alard underscor: Almost got 1%, see http://memac.heroku.com/
22:26 🔗 underscor Awesome
22:26 🔗 underscor :D
22:27 🔗 DFJustin so the full thing is 200 TB? O_O
22:29 🔗 underscor At the current rate, looks like it
22:29 🔗 underscor 193 Tebibytes
22:30 🔗 chronomex yow
22:30 🔗 underscor That's awesome!
22:30 🔗 underscor SketchCow's gonna need some more space
22:30 🔗 chronomex I've been downloading this stupid user for days now
22:31 🔗 chronomex he's not very big, just slow
22:31 🔗 chronomex well, 8g so far.
22:31 🔗 chronomex big -and- slow.
22:31 🔗 chronomex full of mp3 files.
22:31 🔗 underscor So, that's 187 days at full 100mbps
22:32 🔗 underscor A month, if 5 people are downloading at 100mbps
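Editorial sketch checking underscor's arithmetic. The 187-day figure is reproduced if 193 TiB is moved at 100 Mibit/s (binary units on both sides); that reading is an inference, and with a decimal 100 Mbit/s link the same data would take roughly 196 days. The function name `transfer_days` is hypothetical.

```python
TEBIBYTE = 2 ** 40   # bytes
MEBIBIT = 2 ** 20    # bits
MEGABIT = 10 ** 6    # bits

def transfer_days(size_bytes, rate_bits_per_s, clients=1):
    """Days needed to move size_bytes with `clients` links of the given rate."""
    seconds = size_bytes * 8 / (rate_bits_per_s * clients)
    return seconds / 86400

total = 193 * TEBIBYTE
one_client = transfer_days(total, 100 * MEBIBIT)               # about 187.4 days
five_clients = transfer_days(total, 100 * MEBIBIT, clients=5)  # about 37.5 days
decimal_link = transfer_days(total, 100 * MEGABIT)             # about 196.5 days
```

So "a month" with five downloaders is a slight underestimate: it comes to a little over 37 days under the same assumptions.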
22:32 🔗 chronomex hey alard, how about a rolling mbps display :P
22:32 🔗 underscor haha
22:33 🔗 Coderjoe yow
22:33 🔗 Coderjoe lots of users left
22:33 🔗 Coderjoe Downloading public.me.com/caislas
22:33 🔗 Coderjoe - Discovering urls (XML)... ERROR (23).
22:34 🔗 Coderjoe gmm
22:34 🔗 Coderjoe hmm
22:34 🔗 Coderjoe out of disk space again
22:34 🔗 chronomex underscor: shit, you're eating 500g/day at this rate.
22:34 🔗 chronomex you too, Coderjoe
22:34 🔗 chronomex judging from the graph
22:34 🔗 Coderjoe yeah, but I'm stupidly bleeding money out the ass while doing it
22:34 🔗 underscor That good or bad?
22:35 🔗 underscor (chronomex's thing, not your assbleed ;))
22:35 🔗 chronomex it's impressive.
22:36 🔗 Coderjoe AWW YEAH!!
22:37 🔗 Coderjoe - Running wget --mirror (at least 2519 files)...*** glibc detected *** ./wget-warc: corrupted double-linked list: 0x0000000001ead9d0 ***
22:37 🔗 underscor lol
22:37 🔗 underscor I love it
22:37 🔗 db48x22 Coderjoe: uh oh
22:37 🔗 Coderjoe O_o
22:37 🔗 Coderjoe and a second one segfaulted
22:37 🔗 underscor oops
22:38 🔗 underscor What's wget error 4?
22:39 🔗 underscor Oh, network failure
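Editorial sketch: a lookup table for the exit statuses documented in the GNU wget manual, covering the 3 (file I/O error) and 4 (network failure) mentioned in this log. It assumes the patched wget-warc binary keeps stock wget's statuses; `describe_exit` is a hypothetical helper.

```python
# Exit statuses as listed in the GNU wget manual.
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}

def describe_exit(code):
    """Return the documented meaning of a wget exit status."""
    return WGET_EXIT_STATUS.get(code, "Unknown exit status")
```

Values outside 0 to 8, such as the 23 reported during URL discovery earlier in the log, are not wget statuses and are left undecoded here.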
22:39 🔗 underscor lovely
22:42 🔗 underscor http://tracker.archive.org/tracker.png
22:42 🔗 underscor SAVIN' THEM DATAS
23:23 🔗 underscor That dashboard looks a bit skewed towards me
23:23 🔗 yipdw it underscores your importance
23:23 🔗 underscor badum-tish
23:43 🔗 chronomex oh dear.
23:45 🔗 underscor :D
