#archiveteam 2011-08-17,Wed


Time Nickname Message
01:02 🔗 SketchCow http://blog.archive.org/2011/08/17/scanning-a-braille-playboy/
01:08 🔗 chronomex heh :|
01:15 🔗 Jofo I am sad that that doesn't include photos like the "braille" playboy from robin hood: men in tights
01:16 🔗 Coderjoe Jofo: here you go: http://i.imgur.com/TkwQ9.gif
01:16 🔗 SketchCow This shit sounds like the anthem the fairies in Ferngully would use to go to war against evil humans to or some shit b. This shit is like Shia LeBeouf in song form yo. Lissenin to this shit is like havin ya ears penetrated by a million microscopic dicks namsayin. Shit sounds like niggas doin aerobics on a magical cloud of daisies. How many meadows did Kanye cartwheel across before he decided to make this beat? Seriously yo....
01:17 🔗 Jofo Coderjoe: I tried to google for a picture of that and only found images from either playboy or RH:MIT
01:18 🔗 Jofo Most excellent work. Also bonus: animated!
01:18 🔗 Coderjoe jofo: I googled for it yesterday and managed to track down this gif, which I then uploaded to imgur so I could find it again
01:18 🔗 Coderjoe (since I didn't have my DVD handy at the time)
01:30 🔗 SketchCow Thing that downloads Jamendo is downloading Jamendo
01:43 🔗 dashcloud hi SketchCow, any luck with getting something for uploads ready?
01:50 🔗 SketchCow Nope.
01:50 🔗 SketchCow But I did finish a weblog entry
01:50 🔗 SketchCow A pretty important one, I need archive.org to get attention.
01:50 🔗 SketchCow I want lots and lots of attention because they're not great at getting it
01:50 🔗 SketchCow Allow me a moment.
01:56 🔗 SketchCow Ahhh, fuck, security in place.
01:58 🔗 SketchCow Ok, well, I have barely the energy to do a massive chroot at the moment.
01:58 🔗 SketchCow Give me some time, then I'll do it.
01:58 🔗 SketchCow How could they block ftp!???
01:58 🔗 SketchCow It's only the single most insecure fucking protocol since walking down to a schoolyard and shouting your password at teenagers
01:59 🔗 Coderjoe telnet? rsh?
01:59 🔗 SketchCow No, I'll probably set up rsync
01:59 🔗 Coderjoe and many ftp servers have encryption capability now
02:00 🔗 SketchCow a hack on a hack on a hack on a hack.
02:00 🔗 Coderjoe i was more referring to your "most insecure protocol" assertion
02:01 🔗 SketchCow ftp is insecure
02:01 🔗 SketchCow If they're adding some sidecar crazyshit, that's a different protocol as far as I would think of it.
02:01 🔗 SketchCow It's like saying that sure, my car is unlocked, but inside I welded a safe to the frame
02:02 🔗 SketchCow Car's still unlocked.
02:02 🔗 SketchCow FTP is like a hippie
02:02 🔗 SketchCow P.S. Jesus Friendster
02:02 🔗 SketchCow Going to start making items.
02:02 🔗 SketchCow I have to split stuff up into millions of tiny things
02:03 🔗 Coderjoe i still have a TB to get to you
02:03 🔗 SketchCow balls
02:04 🔗 SketchCow When we do myspace, we're going to have to keep this in mind.
02:05 🔗 db48xOthe what do you want to change?
02:06 🔗 SketchCow Better finality to saved items. More consistency for archives.
02:07 🔗 Smackt I have a great deal of free time for the foreseeable future. What's been goin' on?
02:07 🔗 SketchCow divorce? fired?
02:07 🔗 Smackt New job.
02:07 🔗 Smackt Full time for a small company, lots of free time.
02:08 🔗 SketchCow damn
02:08 🔗 SketchCow Hoped it was a fire divorce
02:08 🔗 SketchCow fire divorces are quick
02:08 🔗 SketchCow I DIVORCE THEE FWOOOOOOOOOOOOOOOOOOOOOOOOOOSH
02:09 🔗 Smackt Unfortunately, I wouldn't be much of a help to ArchiveTeam if all I had was the 386 and an hour of dialup in the clink.
02:10 🔗 Smackt But I guess if it was for 20 years, it would add up. Start scraping G+ now, by the time I was out, it would be on its way to being taken down
02:10 🔗 db48xOthe heh, that's optimistic
02:13 🔗 SketchCow Still working on adding rsync to new machine
02:14 🔗 Smackt Anyone write a scraping version of LOIC yet?
02:15 🔗 SketchCow No
02:15 🔗 dashcloud take your time SketchCow
02:24 🔗 db48xOthe we will have to remember about the punier file systems next time
02:25 🔗 db48xOthe and it would save a lot of time and effort if we could save directly to an archive
02:25 🔗 dashcloud this kind of massive file archiving seems like it would be good research/journal paper material
02:27 🔗 SketchCow Well, a distributed thingy would be interesting, yes.
02:29 🔗 db48xOthe would you prefer tarballs that are a consistent size, or have a consistent number of items in them?
02:29 🔗 db48xOthe and what about places that use names instead of numbers?
02:31 🔗 db48xOthe I think consistent number of entries is better
02:31 🔗 db48xOthe easier to keep track of, while the size is basically irrelevant
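One concrete reason a fixed number of entries per tarball is "easier to keep track of": the archive holding any numeric ID can be computed directly, with no index. The sketch below assumes purely numeric IDs and 100,000 IDs per tarball; the helper names and the zero-padded naming pattern are hypothetical, loosely modelled on file names such as friendster.000100000-000199999 that appear later in this log. Sites that use names instead of numbers would need a different scheme.

```python
def chunk_for(item_id: int, per_chunk: int = 100_000) -> tuple[int, int]:
    """Return the inclusive (first, last) ID range that contains item_id."""
    first = (item_id // per_chunk) * per_chunk
    return first, first + per_chunk - 1


def chunk_name(prefix: str, first: int, last: int, width: int = 9) -> str:
    """Build a hypothetical archive name covering IDs first..last, zero-padded."""
    return f"{prefix}.{first:0{width}d}-{last:0{width}d}"


if __name__ == "__main__":
    first, last = chunk_for(124_138_261)
    print(chunk_name("friendster", first, last))  # friendster.124100000-124199999
```

Size per tarball then varies with how much each profile holds, which is the trade-off accepted by counting entries.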
02:32 🔗 dashcloud what kinds of filesystems do studios use to handle the huge number of resources for any given film? if any of those are publicly available, it might be worth looking into
02:32 🔗 db48xOthe it's probably depressingly ad-hoc
02:33 🔗 db48xOthe they probably just have some off-the-shelf SAN system that they dump files on to
02:35 🔗 db48xOthe I think it's time for me to start building the 100TB array I've been thinking of
02:38 🔗 dashcloud just look up the backblaze blog post
02:39 🔗 db48xOthe yes, that'll likely be the basis for it
02:40 🔗 dashcloud if you look at their previous version, they link to a guy who did actually build the first version as a media center, and he has a nice blog series on the build
02:42 🔗 db48xOthe grr
02:42 🔗 db48xOthe wolfram alpha isn't working for me any more
02:43 🔗 Smackt http://archiveteam.org/index.php?title=Cheap_storage
02:44 🔗 db48xOthe they used to return the set of exact integer solutions, but now they're not
02:47 🔗 Smackt Damn. I made that page in Sept. 09. It's been a long time
02:47 🔗 db48xOthe that style pod isn't _really_ what I want, though
02:48 🔗 db48xOthe they increase capacity past the size of the pod by adding more pods, and distributing the load at the application layer above them
02:48 🔗 db48xOthe but I'll be using ZFS, which can do that in the filesystem layer
02:48 🔗 db48xOthe I want something where I can extend the number of attached disks essentially forever
02:49 🔗 db48xOthe I _think_ fiberchannel would let me do what I want
02:49 🔗 Coderjoe add more nodes. you will eventually hit power and performance problems otherwise
02:49 🔗 Coderjoe fiberchannel... $$$$$$$$$
02:49 🔗 db48xOthe I've never used it so I don't really know, but with much longer allowed interconnects, switches that you can plug more drives in to later, etc
02:50 🔗 db48xOthe Coderjoe: the cpu cost of ZFS is proportional to the amount of data that you read or write
02:50 🔗 Coderjoe i wasn't referring to cpu cost
02:51 🔗 db48xOthe not to the number of drives that you have or the amount of data stored on those drives
02:51 🔗 Coderjoe I was referring to bus throughput
02:51 🔗 db48xOthe same thing
02:51 🔗 Coderjoe and bus limitations on interrupts and the like
02:51 🔗 db48xOthe proportional to what you read and write, not to how many drives there are
02:52 🔗 db48xOthe I don't need more cpus to just store stuff. I would need them to get gazillions of IOPS for a database system using all of those drives
02:53 🔗 db48xOthe what's wrong here? http://www.wolframalpha.com/input/?_=1313549493044&i=+xn%2bs+%3d+45%2c+p%3e0%2c+p%3c%3d3%2c+y%3d3*x%28n-p%29%2c+n%3e0%2c+x%3e0%2c+y%3e0%2c+s%3cn%2c+n-p+mod+2%3d0&fp=1&incTime=true
02:53 🔗 db48xOthe x vdevs of n drives with s spares
02:54 🔗 db48xOthe and p parity drives per vdev
02:54 🔗 undersco2 http://articles.latimes.com/2011/jul/04/news/la-heb-finger-ratio-penis-length-20110704
02:55 🔗 Coderjoe and every time you write, to one vdev (ignoring anything to keep z copies), you wind up having to touch x+p drives
02:56 🔗 Coderjoe er, n+p
02:56 🔗 undersco2 https://twitter.com/#!/NegroTastic/status/103575885457797120
02:56 🔗 db48xOthe just n drives
02:56 🔗 Coderjoe (unless n includes s and p)
02:56 🔗 db48xOthe n includes the parity
02:56 🔗 db48xOthe spares don't have any data on them, they're just there for hot-swap when a drive dies
02:56 🔗 Coderjoe you still have to touch a large number of drives. your bus performance comes into play
02:56 🔗 db48xOthe n is probably going to be 5
02:57 🔗 db48xOthe I can fit 8 vdevs of 5 drives each and 5 spares into 45 bays
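The layout question db48xOthe was feeding to Wolfram Alpha can also be brute-forced. This sketch follows the constraints of the 04:37 query further down (x vdevs of n drives with p parity drives each, z spares, x·n + z = 45, 1 ≤ p ≤ 3, at most 10 spares, usable capacity above 80, more than half of each vdev holding data, and an even number of data drives per vdev). Reading the factor of 3 in y = 3·x·(n − p) as 3 TB drives is an assumption, as are the function and field names.

```python
from typing import NamedTuple

BAYS = 45
DRIVE_TB = 3  # assumption: the "3*" in the queries read as 3 TB drives


class Layout(NamedTuple):
    vdevs: int      # x
    width: int      # n, drives per vdev, parity included
    parity: int     # p, raidz level 1..3
    spares: int     # z, hot spares holding no data
    usable_tb: int  # y = DRIVE_TB * x * (n - p)


def layouts(bays=BAYS, max_spares=10, min_usable_tb=80):
    """Enumerate pool layouts that satisfy the constraints described above."""
    found = []
    for n in range(2, bays + 1):
        for p in range(1, 4):
            data = n - p
            # Even number of data drives, and more than half the vdev is data.
            if data <= 0 or data % 2 or data / n <= 0.5:
                continue
            for x in range(1, bays // n + 1):
                spares = bays - x * n
                usable = DRIVE_TB * x * data
                if spares <= max_spares and usable > min_usable_tb:
                    found.append(Layout(x, n, p, spares, usable))
    # Most usable space first; fewer spares breaks ties.
    return sorted(found, key=lambda l: (-l.usable_tb, l.spares))


if __name__ == "__main__":
    for layout in layouts()[:5]:
        print(layout)
```

The 8 × 5-drive raidz1 layout with 5 spares mentioned above shows up as Layout(vdevs=8, width=5, parity=1, spares=5, usable_tb=96).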
04:37 🔗 db48xOthe http://www.wolframalpha.com/input/?_=1313555672752&i=xn%2bz%3d45%2cy%3d3*x%28n-p%29%2cp%3e%3d1%2cp%3c%3d3%2cn%3e0%2cx%3e0%2cy%3e0%2cz%3e%3d0%2cz%3c%3d10%2cy%3e80%2c%28n-p%29%2fn%3e.5%2c%28n+mod+2%29%3d%28p+mod+2%29&fp=1&incTime=true
04:41 🔗 db48xOthe actually
04:41 🔗 db48xOthe http://www.wolframalpha.com/input/?i=xn%2Bz%3D45%2Cy%3D3*x%28n-p%29%2Cp%3E%3D1%2Cp%3C%3D3%2Cn%3E0%2Cx%3E0%2Cy%3E0%2Cz%3E%3D0%2Cz%3C%3D10%2Cy%3E80%2C%28n-p%29%2Fn%3E.5%2Cw%3D128%2F%28n-p%29
04:41 🔗 db48xOthe is better
05:58 🔗 SketchCow P.S. If anyone wants to see a small show on personal style and dress for men. http://putthison.com/
05:58 🔗 SketchCow There's 7 episodes now, all pretty good.
05:58 🔗 SketchCow Shoe one is my favorite
05:58 🔗 SketchCow (Episode 2)
07:43 🔗 undersco2 Hey, can someone try downloading this for me?
07:43 🔗 undersco2 http://ia700400.us.archive.org/5/items/AE2011-01-27.MATRIX/AE2011-01-27.MATRIX-ogg.torrent
07:43 🔗 ersi Just the torrent? Or the content?
07:43 🔗 undersco2 The content
07:44 🔗 undersco2 It's ~60MB
07:44 🔗 ersi ah, it's just 59MB~
07:44 🔗 ersi sure, I'll give it a try
07:44 🔗 undersco2 It's just a test file; the archive is planning on offering torrents of all the content
07:44 🔗 undersco2 With a webseed
07:44 🔗 ersi :o webseeds
07:44 🔗 ersi yeah, saw that just now
07:44 🔗 ersi It's going at "100KB/s - 560KB/s"
07:45 🔗 undersco2 Sweet!
07:45 🔗 undersco2 Excellent <3
07:45 🔗 ersi and I'm using "Transmission 2.33".. but most torrent clients should have webseed support
07:45 🔗 undersco2 Thanks, that means the webseed is working as intended
07:45 🔗 undersco2 Yeah, everything except uTorrent Mac
07:45 🔗 undersco2 hahaha
07:45 🔗 chronomex grand
07:46 🔗 SketchCow http://www.youtube.com/watch?v=_wrvKRNn0rU (Javascript)
07:46 🔗 * undersco2 excited
07:47 🔗 undersco2 Now I just need to show it off to the job requester, and see about getting it incorporated as a deriver
07:53 🔗 Coderjoe interesting that only one of the nodes is listed as a webseed
07:54 🔗 Coderjoe and that batcave.textfiles.com is the tracker
07:54 🔗 undersco2 They only want files webseeded from the 6x servers right now
07:55 🔗 undersco2 and batcave.textfiles.com==teamarchive-0.us.archive.org
07:55 🔗 Coderjoe ah
07:55 🔗 undersco2 It just sounds cooler ;)
07:55 🔗 undersco2 hha
07:55 🔗 undersco2 haha*
07:56 🔗 Coderjoe i'd still think they would want the tracker under the archive.org domain, though
07:56 🔗 undersco2 Probably
07:56 🔗 undersco2 This is just a POC
07:56 🔗 Coderjoe what tracking software?
07:57 🔗 undersco2 opentracker
07:57 🔗 Coderjoe but webseed should work as long as the httpd supports range requests
07:58 🔗 undersco2 Same as denis.stalker.h3q.org
07:58 🔗 undersco2 Yeah
07:58 🔗 Coderjoe and lighttpd does
07:58 🔗 undersco2 Well, it's plain nginx, so
07:58 🔗 Coderjoe when did they switch from lighty to nginx?
07:58 🔗 undersco2 No idea
07:59 🔗 undersco2 It's been nginx since I started using the archive like a year or so ago
07:59 🔗 undersco2 nginx > lighttpd, imo
07:59 🔗 undersco2 but that's just my opinion ;)
08:00 🔗 Coderjoe probably around the time they started using the massive hotswap-capable petabox v4 boxes
08:01 🔗 Coderjoe i suspect older nodes, like on the red boxes, use lighttpd still (unless someone actually upgraded them)
08:01 🔗 undersco2 possibly
08:01 🔗 Coderjoe i remember when anonymous FTP was still available for downloading on the nodes
08:01 🔗 undersco2 I think they try and keep everything consistent
08:02 🔗 undersco2 (p-i and cfengine are used for EVERYTHING)
08:09 🔗 Coderjoe bleh
08:09 🔗 Coderjoe stupid 10^3 vs 2^10
08:10 🔗 undersco2 huh?
08:10 🔗 undersco2 oh
08:11 🔗 Coderjoe that 24 bytes multiplies up to some significant differences
08:12 🔗 Coderjoe when you get out to 1TB vs 1TiB, that is a 99e9 difference
08:12 🔗 Coderjoe 99 billion bytes
08:13 🔗 Coderjoe 99.5 billion
08:14 🔗 Coderjoe "oh, we'll just right-shift by 10. that's close enough."
08:15 🔗 Coderjoe right up there with that "640K" line
08:17 🔗 undersco2 haha
08:18 🔗 Coderjoe i'm curious (as I no longer really remember) how much of a speed difference there is between right-shift by 10, integer divide by 1000, and convert to BCD and omit three values
08:19 🔗 Coderjoe the 8088 had BCD instructions
08:19 🔗 Coderjoe (granted, other architectures would have been involved)
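The "24 bytes" Coderjoe mentions is the per-kilobyte gap (1024 − 1000); it compounds with every prefix, which is where the roughly 99.5 billion byte difference between 1 TB and 1 TiB comes from. A small worked check, plus the "close enough" right-shift next to a true decimal division (this shows the results differ, it does not answer the speed question):

```python
def gap(power: int) -> int:
    """Bytes by which 1024**power exceeds 1000**power."""
    return 1024 ** power - 1000 ** power


if __name__ == "__main__":
    pairs = [("KB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")]
    for power, (si, iec) in enumerate(pairs, start=1):
        print(f"1 {iec} - 1 {si} = {gap(power):,} bytes")
    # 1 TiB - 1 TB = 99,511,627,776 bytes

    n = 5_000_000_000           # 5 GB as a drive vendor counts it
    print(n >> 10, n // 1000)   # 4882812 5000000
```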
08:21 🔗 undersco2 Coderjoe: Can you try this one too? (It's 10MB) http://ia600401.us.archive.org/14/items/devisesetembleme00lafeu/devisesetembleme00lafeu-pdf.torrent
08:24 🔗 Coderjoe the previous one (the AE one) seems to have stopped at 96.6% with a hole in two oggs, and a bunch of metadata files not downloaded
08:26 🔗 undersco2 Which files?
08:26 🔗 undersco2 Well, which oggs
08:27 🔗 Coderjoe d1t1 is missing the first piece (which overlaps with the xml/md5 files) and d1t6 is missing the second piece
08:29 🔗 Coderjoe i wonder if this version of utorrent has problems with webseeds when it gets to endgame
08:30 🔗 Coderjoe devise stopped really downloading anything with 4 pieces missing, 3 of which are in the PDF files, and the 4th being all xml files
08:30 🔗 undersco2 That's interesting that that keeps occurring
08:30 🔗 Coderjoe it is uploading to peers still, though
08:30 🔗 undersco2 Are you seeing any hashfails?
08:30 🔗 Coderjoe not that I am aware of
08:31 🔗 Coderjoe ok. one on AE
08:31 🔗 Coderjoe and one on devise
08:31 🔗 Coderjoe are all the metadata files listed in the torrent actually available via the http?
08:32 🔗 undersco2 Should be
08:32 🔗 undersco2 That's what the torrent is created from
08:32 🔗 Coderjoe well that is interesting
08:32 🔗 undersco2 I'm at 97.3 on AE
08:32 🔗 undersco2 and climbing
08:32 🔗 Coderjoe you have a bunch of torrent files along with torrent_meta.txt files
08:33 🔗 undersco2 You mean in the item?
08:33 🔗 undersco2 Yeah
08:33 🔗 undersco2 One torrent file per mediatype
08:35 🔗 Coderjoe i wonder... I notice that the djvu.xml file was sent in chunked transfer-encoding and gzip content encoding
08:35 🔗 Coderjoe I wonder if utorrent is puking on one of those (probably the gzip, if either of them)
08:36 🔗 undersco2 Hmm, interesting
08:36 🔗 undersco2 Most web servers send gzip responses nowadays though, don't they?
08:36 🔗 Coderjoe well, it was sent that way when I grabbed it with ff. I haven't started wireshark yet
08:36 🔗 undersco2 You'd think they'd account for it
08:36 🔗 undersco2 (they being uTorrent devs)
08:37 🔗 Ymgve they probably just use a library
08:37 🔗 Coderjoe usually only certain types would be gzipped... text/html, text/xml, javascript, css
08:38 🔗 Coderjoe (kinda pointless to gzip an image, zipfile, or tarball)
08:38 🔗 undersco2 Good point
08:39 🔗 Coderjoe though on devise, the first piece into the djvu.xml file is marked as completed
08:41 🔗 undersco2 Yeah, I have the same
08:42 🔗 Coderjoe bleh. I haven't reinstalled wireshark on this machine yet
08:42 🔗 undersco2 that's fine, I'm on my way to bed anyway
08:42 🔗 undersco2 I'll look more at it tomorrow
08:43 🔗 undersco2 I wouldn't be surprised if it had something to do with the gzip'd response
08:45 🔗 Coderjoe not sure what it uses for webseed http connections, but for rss http connections it uses this UA: "BTWebClient/2210(25130)"
08:45 🔗 undersco2 I see
08:45 🔗 Coderjoe unsure about library. it might just be using the windows web API calls.
08:47 🔗 Coderjoe oh wow
08:47 🔗 undersco2 Hm?
08:47 🔗 Coderjoe devise now has only 3 pieces missing
08:47 🔗 undersco2 All text?
08:47 🔗 Coderjoe lafeu.pdf finished. bw.pdf has 2 missing, and the XML files are marked as queued
08:48 🔗 Coderjoe and AE had d1t6 finish
08:49 🔗 Coderjoe still missing piece 1 on that, though (all the metadata files and the first bit of d1t01)
08:50 🔗 Coderjoe and on further examination, it looks like utorrent is not using the webseed urls because it sees >1.0 availability
08:51 🔗 undersco2 Aha
08:51 🔗 undersco2 Is there?
08:51 🔗 undersco2 Er, well, obviously not
08:51 🔗 undersco2 But why does it think there is?
08:52 🔗 Coderjoe because between all the peers it is connected to, it sees each piece at least once?
08:53 🔗 Coderjoe removing the webseed had no effect, so it is not counting those towards availability numbers
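Coderjoe's explanation is that the client computes availability only from the bitfields of connected peers, so it can exceed 1.0 with the webseed contributing nothing. The sketch below uses a common "distributed copies" definition (copies of the rarest piece, plus the fraction of pieces that have more copies than that); whether uTorrent uses exactly this formula is an assumption, and the function name is hypothetical.

```python
def availability(peer_bitfields: list[list[bool]]) -> float:
    """Estimate availability from the piece bitfields of connected peers.

    Assumed definition: copy count of the rarest piece, plus the fraction
    of pieces held more often than that. Webseeds are not counted.
    """
    if not peer_bitfields:
        return 0.0
    num_pieces = len(peer_bitfields[0])
    counts = [sum(bf[i] for bf in peer_bitfields) for i in range(num_pieces)]
    rarest = min(counts)
    above = sum(1 for c in counts if c > rarest)
    return rarest + above / num_pieces


if __name__ == "__main__":
    # Two partial peers that together cover every piece of a 4-piece torrent.
    print(availability([[True, True, False, False],
                        [False, True, True, True]]))  # 1.25
```

Neither peer is a seed here, yet every piece is held at least once, so the figure is above 1.0.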
08:53 🔗 Nemo_bis transmission not using the webseed either (but I can't remember whether it's its fault or what)
08:53 🔗 undersco2 Hmm, interesting
08:53 🔗 undersco2 [2011-08-17 01:53:31] B0rked reason: invalid http response code (416): Requested Range Not Satisfiable (http://ia600401.us.archive.org/14/items/)
08:54 🔗 Coderjoe weird... by deleting the webseed and re-adding it, I briefly use the webseed (and show 1 seed on the torrent). it then tries to download the last piece and promptly hashfails
08:54 🔗 undersco2 So it must be something weird gzip-related or something
08:55 🔗 Coderjoe AHA
08:55 🔗 undersco2 You got it?
08:55 🔗 Coderjoe the hash check failed, and utorrent promptly banned that webseed
08:56 🔗 Coderjoe which is why it stopped using the webseed when there was more than one piece
08:56 🔗 undersco2 aha
08:56 🔗 undersco2 [2011-08-17 01:56:15] B0rked reason: content-length (28805) does not correspond to the requested length (33841)
08:56 🔗 undersco2 [2011-08-17 01:56:15] Banned 207.241.227.210:80 until Wed Aug 17 01:56:25 2011
08:57 🔗 undersco2 I wonder...
08:57 🔗 undersco2 if the content-length is shorter because it's gzip encoded
08:58 🔗 Coderjoe pretty sure it would be
08:58 🔗 Coderjoe if it even does gzip encoding on range requests
08:58 🔗 Coderjoe (and provided the client is even saying it allows gzip)
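The two errors in the log (416 Requested Range Not Satisfiable, and a Content-Length of 28805 against 33841 requested bytes) both come down to the byte-range response not being the raw bytes the piece hash expects. Below is a minimal sketch of what a webseed client could send and check, assuming a GetRight-style (BEP 19) webseed served over plain HTTP byte ranges; the helper names are hypothetical. Sending `Accept-Encoding: identity` is the standard way to ask a server not to compress the body.

```python
from typing import Dict, Optional


def range_request_headers(start: int, length: int) -> Dict[str, str]:
    """Headers for bytes start..start+length-1 (HTTP ranges are inclusive)."""
    return {
        "Range": f"bytes={start}-{start + length - 1}",
        # Ask for raw bytes; a gzip-encoded body would never match the piece hash.
        "Accept-Encoding": "identity",
    }


def check_range_response(status: int, headers: Dict[str, str],
                         length: int) -> Optional[str]:
    """Return None if the response is usable as-is, else a reason string."""
    h = {k.lower(): v for k, v in headers.items()}
    if status == 416:
        return "416 Requested Range Not Satisfiable"
    if status != 206:
        return f"expected 206 Partial Content, got {status}"
    encoding = h.get("content-encoding", "identity").lower()
    if encoding != "identity":
        return f"body is {encoding}-encoded, not raw bytes"
    got = int(h.get("content-length", "-1"))
    if got != length:
        return (f"content-length ({got}) does not correspond "
                f"to the requested length ({length})")
    return None
```

This check is deliberately strict: a chunked response that carries no Content-Length is rejected rather than buffered and measured.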
09:04 🔗 Coderjoe oh, there is a txt file in the AE directory that might be a good idea to have included in the torrent
09:06 🔗 undersco2 okay
09:06 🔗 Coderjoe hmm
09:07 🔗 Coderjoe I understand this is currently POC, but... once it becomes part of the derive process, I hope the torrent files are listed in the files.xml as derivative
09:07 🔗 Coderjoe :)
09:08 🔗 Coderjoe and I am going to remove the torrents now
09:10 🔗 Coderjoe oh man
09:11 🔗 Coderjoe what a shame. that devices and emblems book got wet at some point, and some of the ink started washing away
09:12 🔗 ersi Nemo_bis: Transmission 2.33 used the webseed when I tested an hour ago
09:12 🔗 Nemo_bis hm, 2.13 here for some reason
09:12 🔗 ersi that's oold
09:13 🔗 ersi doesn't even support UDP trackers
09:13 🔗 Nemo_bis bah, I don't like torrent much
09:13 🔗 Nemo_bis but let's see
09:16 🔗 Nemo_bis hm, perhaps it came from their ppa which was then disabled during upgrade and then didn't update from the standard repository?
09:16 🔗 ersi maybe
12:04 🔗 ersi 14:00 <@Ziphoid> heh, LastFM has funny developers... https://www.last.fm/robots.txt
12:05 🔗 ersi not nice that they disallow spidering of /music though
12:05 🔗 ersi but following the last disallows... I bet the spider should.. ignore human orders? ;D
12:10 🔗 Cagada Tired of niggers? Sick of their monkeyshines? We are too! Join Chimpout Forum! http://www.chimpout.com/forum At Chimpout WE ARE NOT WHITE SUPREMACISTS! I myself am Mexican! If you are not a NIGGER and you HATE NIGGERS, we welcome you with open arms! http://www.chimpout.com/forum
12:13 🔗 ersi </spam>
12:31 🔗 db48xOthe ersi: that's an awesome robots.txt
12:45 🔗 Nemo_bis ah, yes, webseed works now with transmission 2.33 from their ppa
12:47 🔗 Nemo_bis but yes, stopped at 96 %
15:22 🔗 * SketchCow wonders if Chimpout Forum is a 501(c)(3) or if it's a political action committee.
15:22 🔗 SketchCow P.S. It would have PROBABLY been good to make another channel for discussing torrents for archive.org.
15:22 🔗 SketchCow I should have said something.
15:26 🔗 jch Swedish politicians just passed a law making hard drives ~twice as expensive, added price goes to copyright mafia. Effective as of September 1st.
15:26 🔗 jch It might only be external hard drives.
15:27 🔗 db48x2 ouch
15:27 🔗 jch yup. Only external disks
15:27 🔗 jch ersi: Kinda silly we didn't get around to meeting up at CCC
15:28 🔗 db48x2 I guess people can just buy enclosures and put the cheaper internal disks in them
15:29 🔗 jch Yeah
15:29 🔗 jch That's what the news site says too
15:31 🔗 ersi jch: yeah, but i was in tox i cated
15:31 🔗 ersi aka fuxxord :D
15:32 🔗 jch :P
15:32 🔗 jch some of the gbg people did crystal meth.
15:32 🔗 jch fucking crystal meth.
15:32 🔗 jch at a hacker con
15:32 🔗 jch unbelievable
15:33 🔗 jch and some idiot from #hack.se thought it was a brilliant idea to start tagging in our bus
15:47 🔗 ersi jch: Ahaha, I know who tagged the bus
15:47 🔗 ersi totally comical
15:47 🔗 ersi he asked almost everyone if it was OK, everyone said 'yes', then when he did it, everyone got sadface'd
15:47 🔗 ersi <- lol'd
16:02 🔗 SketchCow Crystal Meth destroys things.
16:03 🔗 SketchCow As a side note.
16:03 🔗 SketchCow Which is all it is right now.
16:03 🔗 SketchCow Man, we need projects!
16:11 🔗 ersi SketchCow: Don't jinx it, heh
16:11 🔗 SketchCow Downloaded: 17 files, 168G in 5h 29m 41s (8.71 MB/s)
16:11 🔗 SketchCow FINISHED --2011-08-17 04:13:43--
16:11 🔗 SketchCow Total wall clock time: 5h 29m 48s
16:11 🔗 db48x2 what should we do about astronautix.com?
16:12 🔗 db48x2 I grabbed a copy of it in April (straight wget)
16:13 🔗 db48x2 oh, it looks like it's back up, with extra ads
16:13 🔗 db48x2 hmm
16:17 🔗 SketchCow Good.
16:17 🔗 SketchCow Is the data different?
16:17 🔗 SketchCow Was anything removed?
16:17 🔗 db48x2 looking
16:17 🔗 SketchCow Might be good if someone could look down our fire drill page to see if there's been changes.
16:21 🔗 underscor SketchCow: It seemed too quiet in here, wanted to spice things up a bit
16:21 🔗 underscor Figured torrent talk was a good way to
16:23 🔗 db48x2 it looks like the only substantial change is to the index page, which features a selection of images linked to interesting articles
16:23 🔗 db48x2 rather than a block of text links to articles
16:23 🔗 db48x2 and the ads
16:25 🔗 underscor jch: Link?
16:25 🔗 underscor (to the HD article)
16:41 🔗 SketchCow Torrent talk does nothing but nerd out.
16:41 🔗 SketchCow 4 people care.
16:41 🔗 SketchCow You're adding functionality to archive.org, not doing archiveteam stuff
16:41 🔗 SketchCow People come in, go "oh, nerds" and leave
16:54 🔗 Smackt__ Oooh, Nerd!
16:54 🔗 Smackt__ Nerds*
16:55 🔗 underscor SketchCow: :D
16:55 🔗 underscor Well, we *are* nerds
16:55 🔗 underscor Just saying
17:13 🔗 perfinion a nerd?!
17:13 🔗 perfinion where?!
17:21 🔗 SketchCow OK, who wants to go for Reddit with the Braille Playboy post.
18:18 🔗 SketchCow 9.7T .
18:18 🔗 SketchCow root@teamarchive-0:/3/FRIENDSTER# du -sh .
18:52 🔗 yipdw is that compressed or uncompressed?
19:20 🔗 SketchCow Compressed.
19:28 🔗 Nemo_bis hm, if this is not the right channel, should we really claim that #archive channel on FreeNode (or here)?
19:29 🔗 chronomex sounds like a good plan
19:33 🔗 SketchCow ?
19:33 🔗 chronomex for archive.org stuff
19:51 🔗 closure SketchCow: still time for me to upload my 200000 friendsters?
20:00 🔗 SketchCow #archivecommandos for archive.org stuff
20:00 🔗 SketchCow Be right back.
20:00 🔗 SketchCow closure: Certainly.
20:00 🔗 SketchCow Even after archive.org entries show up this week, I can add more material.
20:06 🔗 closure ok, will pull those out of my offline drives and get them somewhere with upstream bw
21:20 🔗 SketchCow -rwxrwxrwx 1 jscott jscott 3244910185 2011-07-31 23:57 friendster.500000-600000.part1.tar.gz
21:21 🔗 SketchCow -rwxrwxrwx 1 jscott jscott 3274828475 2011-07-31 23:58 friendster.500000-600000.part2.tar.gz
21:21 🔗 SketchCow -rwxrwxrwx 1 jscott jscott 500405108 2011-07-31 23:59 friendster.500000-600000.part3.tar.gz
21:21 🔗 SketchCow -rwxrwxrwx 1 jscott jscott 596482199 2011-07-31 23:59 friendster.500000-600000.part4.tar.gz
21:21 🔗 SketchCow THANKS FOR THAT
21:21 🔗 chronomex YOU'RE WELCOME
21:24 🔗 SketchCow root@teamarchive-0:/3/FRIENDSTER/5# gunzip *
21:25 🔗 db48x2 heh
21:25 🔗 Cowering she canna take it captain
21:27 🔗 SketchCow root@teamarchive-0:/3/FRIENDSTER# bzip2 --best *.tar
21:27 🔗 SketchCow Something's doing that down the line
21:27 🔗 SketchCow Friendster 0-1,000,000 is probably going to be 200gb
21:27 🔗 SketchCow "A Million Friends" will get attention
21:28 🔗 db48x2 that is catchy
21:30 🔗 db48x2 is anything missing from the first million?
21:30 🔗 SketchCow Well, at the moment? A lot. I'm still sorting stuff out.
21:30 🔗 db48x2 ah
21:30 🔗 SketchCow Hence that big fat gunzip
21:30 🔗 SketchCow and then a zip
21:31 🔗 dashcloud SketchCow: you got both of my friendster archives successfully?
21:31 🔗 SketchCow I believe so
21:32 🔗 db48x2 I have a copy of 1-99999 if you need it
21:32 🔗 dashcloud I believe I mentioned it before, but one of them contains slightly more than what's listed, because I had briefly grabbed 1 million ids instead of 100k ids
21:32 🔗 dashcloud I just saw this: http://opendedup.org/whatisdedup and it sounds pretty interesting, especially for something like friendster
21:33 🔗 dashcloud the code for it is here: http://code.google.com/p/opendedup/
21:34 🔗 yipdw i don't think deduplication would buy you much in this case -- compression is going to do some deduplication anyway, and you can optimize it by massaging the order of files in the archive
21:35 🔗 yipdw the big parts -- the images -- aren't going to benefit much from deduplication
21:35 🔗 SketchCow I am fine with too much.
21:37 🔗 db48x2 makes for a better story
21:56 🔗 SketchCow http://www.archive.org/details/twaudio-collection-2011&reCache=1
21:56 🔗 SketchCow WHAT COULD POSSIBLY GO WRONG
21:56 🔗 SketchCow 192gb .tar.gz
21:56 🔗 SketchCow Yeah baby
21:56 🔗 db48x2 good practice for Friendster
21:57 🔗 db48x2 hmm
21:59 🔗 db48x2 119G /backups/bff/friendster.009400000-009499999.tar.bz2
21:59 🔗 db48x2 24G /backups/bff/friendster.013800000-013899999.tar.bz2
21:59 🔗 db48x2 70G /backups/bff/friendster.007250000-007299999.tar.bz2
21:59 🔗 db48x2 70G /backups/bff/friendster.009036000-009099999.tar.bz2
22:00 🔗 db48x2 13G /home/db48x/archives/Friendster/friendster.001300000-001399999.tar.bz2
22:00 🔗 db48x2 15G /home/db48x/archives/Friendster/friendster.001200001-001300001.tar.bz2
22:00 🔗 db48x2 5.9G /home/db48x/archives/Friendster/friendster.000000001-000099999.tar.bz2
22:00 🔗 db48x2 8.9G /home/db48x/archives/Friendster/friendster.000100000-000199999.tar.bz2
22:00 🔗 db48x2
22:00 🔗 db48x2 349G /home/db48x/archives/Friendster/friendster.004300001-004399999.tar.bz2
22:00 🔗 db48x2 120G /home/db48x/archives/Friendster/friendster.009300000-009399999.tar.bz2
22:00 🔗 db48x2 107G /home/db48x/archives/Friendster/friendster.040000000-040099999.tar.bz2
22:00 🔗 db48x2 18K /home/db48x/archives/Friendster/friendster.124138261.tar.bz2
22:00 🔗 db48x2 897G total
22:00 🔗 db48x2 with two more 100k blocks left to compress
22:02 🔗 SketchCow Yeah, this is a lot of fuckin' data.
22:02 🔗 SketchCow It's just a lot.
22:08 🔗 chronomex om nom nom
22:09 🔗 SketchCow jscott@ia700601:/14/incoming/gv/twaudio$ cat * | gunzip - | tar vxf - >twaudio.2011.06.txt
22:09 🔗 SketchCow See that? That's been going for 30 minutes.
22:10 🔗 chronomex you might want to use pv instead of cat, it draws progress bars and makes time estimates.
22:10 🔗 chronomex quite handy
22:11 🔗 SketchCow Will investigate.
22:11 🔗 SketchCow Got a lot it's all "doing"
22:12 🔗 db48x2 pv is fun
22:23 🔗 db48x2 alas, there's no way to sort archive.org items by size
22:39 🔗 Coderjoe oh god dammit
22:40 🔗 * Coderjoe pounds his head on his desk a few times
22:41 🔗 Coderjoe underscor: did you also reclaim and download 3,985,001-4,000,000 ?
22:50 🔗 Coderjoe I think tar will stop at the first set of 2 or more null header blocks, which normally signal end-of-archive.
22:51 🔗 Coderjoe there is a command line option to have it ignore null headers
22:52 🔗 Coderjoe -i, --ignore-zeros ignore zeroed blocks in archive (means EOF)
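Coderjoe's point applies when several complete tarballs are concatenated (as in `cat * | gunzip - | tar vxf -`): each one ends with zero blocks, so a reader stops after the first archive unless told to skip them. Whether SketchCow's parts were separate tarballs or one split archive isn't stated in the log. Python's `tarfile` has an analogous `ignore_zeros` option; this in-memory demonstration uses hypothetical helper names.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Build a complete tar archive, zero-block trailer included, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(blob: bytes, ignore_zeros: bool) -> list[str]:
    """List members, optionally reading past end-of-archive zero blocks."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


if __name__ == "__main__":
    # Two whole archives glued together, as `cat part1 part2` would produce.
    blob = make_tar({"a.txt": b"first"}) + make_tar({"b.txt": b"second"})
    print(member_names(blob, ignore_zeros=False))  # ['a.txt']
    print(member_names(blob, ignore_zeros=True))   # ['a.txt', 'b.txt']
```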
22:57 🔗 SketchCow http://www.youtube.com/watch?v=BwJVGBueYDE
22:57 🔗 SketchCow Worth
22:59 🔗 Coderjoe any relation to Rick?
23:09 🔗 underscor Coderjoe: I don't *think* so
23:09 🔗 underscor I might have
23:09 🔗 underscor All my data is at home
23:14 🔗 SketchCow This is Rick's wife
23:20 🔗 Coderjoe I done fucked up. I lost three range tarballs. (the errant command I should have done a dry run on wiped out 5, but two are still available on the system they were originally downloaded on)
23:21 🔗 chronomex D:
23:21 🔗 Coderjoe 0 friendster.000117000-000119999.tar.xz
23:21 🔗 Coderjoe 0 friendster.000310001-000320000.tar.xz
23:21 🔗 Coderjoe 0 friendster.003985001-004000000.tar.xz
23:21 🔗 chronomex that's okay, they're only bits.
23:22 🔗 Coderjoe the last one was the largest at 32GB (uncompressed size)
23:23 🔗 Coderjoe the 117k was 815MB uncompressed, and the 310k was 5.1GB uncompressed
23:24 🔗 SketchCow I'll avoid telling you how much history you just destroyed
23:24 🔗 Coderjoe this is what the "god dammit" and headdesk earlier was about
23:25 🔗 yipdw how'd that happen
23:27 🔗 Coderjoe forgetting "ls ../blah*" would have the ../ on it when running this:
23:27 🔗 Coderjoe for a in `ls -1S ../friendster.003* ../friendster.000*`; do [ -f $a ] || echo copying $a;echo; pv ../$a > $a; done
23:27 🔗 Coderjoe pv said ../../friend(blah) didn't exist, but ../friend(blah) had already been zeroed
23:28 🔗 Coderjoe trying to fit as much of the data on the portable drive as possible
23:30 🔗 Coderjoe doing a dry run with echo "pv ../$a > $a" would have saved me.
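What went wrong in the loop above: `ls -1S ../friendster.*` already returns names starting with `../`, so `pv ../$a > $a` asked pv to read `../../friendster...` while the shell's `> $a` redirect opened the real `../friendster...` for writing and truncated it before pv ever ran. The `[ -f $a ] ||` test also only guarded the `echo`, not the copy. Below is a sketch of the same job (largest files first, skip what already exists, stop filling when the drive is full) with a dry run as the default; the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_largest_first(src_dir: Path, dest_dir: Path, patterns: list[str],
                       dry_run: bool = True) -> list[Path]:
    """Copy files matching any glob pattern from src_dir to dest_dir, largest first.

    Nothing is written unless dry_run=False. Destination paths are built from
    the source file's name only, so a "../" prefix cannot leak into the target.
    Returns the destination paths that were (or would be) written.
    """
    sources = sorted(
        {p for pattern in patterns for p in src_dir.glob(pattern) if p.is_file()},
        key=lambda p: p.stat().st_size,
        reverse=True,
    )
    free = shutil.disk_usage(dest_dir).free
    planned = []
    for src in sources:
        dest = dest_dir / src.name
        if dest.exists():
            continue  # never overwrite; this also skips src_dir == dest_dir
        size = src.stat().st_size
        if size > free:
            print(f"skipping {src.name}: {size} bytes, only {free} free")
            continue
        print(f"{'would copy' if dry_run else 'copying'} {src} -> {dest}")
        if not dry_run:
            shutil.copy2(src, dest)
        free -= size
        planned.append(dest)
    return planned
```

Run it once as-is to see the plan, then again with `dry_run=False`. The source files are only ever opened for reading.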
23:30 🔗 yipdw not to be an ass but it's times like that when I'd really just do cp
23:31 🔗 chronomex 15:11:45 db48x2 pv is fun
23:31 🔗 yipdw but uh anyway
23:31 🔗 yipdw bbl
23:31 🔗 chronomex chronomex | cp is not fun
23:34 🔗 yipdw why would you pipe yourself to cp
23:46 🔗 dashcloud is the archive.org shareware page extending to include floppies?
23:51 🔗 SketchCow Not yet.
23:51 🔗 SketchCow Eventually.
23:51 🔗 SketchCow Let me deal with the 1,000 CD-ROMs first
23:52 🔗 dashcloud thanks for the heads up about cryptozookeeper being available as a package- looked really cool, so I got it
23:58 🔗 SketchCow No problem, I like making my friends' stuff known.
23:58 🔗 SketchCow Hence the pushing of the archive's book scanning
23:58 🔗 SketchCow I am going to set up my scanner and CD-ROM now
23:58 🔗 SketchCow Because it's time for that, too.
