[00:01] umm this seems to be getting low again http://tracker.archive.org/df.html
[00:04] we're fucked
[00:05] well, sketchow lands in an hour
[00:06] because all our favorite things to do after transatlantic shitflights is sysadmin
[00:09] hahaha
[02:08] mmm
[02:08] router gods are angry again
[02:53] haha
[02:53] "Try doing a Google search on "umop dooms" for a book by Noam Chomsky scanned upside down. "
[02:53] (on the subject of "accidental art" from google book scanning screwups)
[03:04] it was probably a misbound book
[03:04] where one leaflet was upside down
[03:04] i've seen it before
[03:05] google's scanner just turns the pages, it doesn't care
[03:07] leaflet probably fell on the floor in book printing place and someone stuffed it back on the rack but upside down
[03:07] before it was glued to spine and cut
[03:33] page before it is a rightside-up 350. first upside down page is 343. then 344...
[03:33] (344 is also upside down)
[03:34] dcmorton and underscor need to slow down a bit, I think
[04:13] hmm
[04:13] any way to see if a specific memac user has been archived yet?
[04:13] (or check if it is in the list, even)
[05:31] holy shit i hate taxes
[05:31] taxes are taxing
[05:31] eh? see what i did there?
[05:31] don't overtax yourself
[05:32] haha
[05:32] I get to pay state income tax twice!
[05:32] how lovely
[05:55] You think thats bad? Us prostitutes have to pay about four!
[08:33] Yo.
[08:38] eyy. safely landed?
[08:38] Yeah, back home.
[08:38] Wow, we're in trouble with disk space.
[08:40] However, the bright spot is this got Google Groups shoved in. http://www.archive.org/details/archiveteam-googlegroups is growing very quickly now, and will get back a terabyte of disk space.
[08:43] that'll be good
[08:47] Meet the first one: http://www.archive.org/details/archiveteam-mobileme-1328865740
[08:47] The first of 500.
[08:50] wow
[08:51] we need some 50TB hdds already
[08:56] Coderjoe: Use http://memac.heroku.com/rescue-me to add usernames to the list. There's no way to check if a user has been archived, but the upload form will at least tell you if the tracker knows about the user.
[09:03] http://www.archive.org/details/thepiratebay.org
[09:05] SketchCow, can you explain me if the LOW in header authorization: LOW etc. has some meaning and which?
[09:05] or, are there priorities for s3 which allow you to get more bandwidth or whatever?
[09:07] 19:17:53 <@tef> but yeah if I didn't like software because I didn't like the authors, I wouldn't be left with a lot of software to use
[09:07] tef: you can't possibly hate Kernighan, Aho, Ritchie, and Thompson.
[09:30] Up against it, I'm going to upload the Star Wars Forums collection, even though I think it's essentially fucked up as is.
[09:31] Nemo_bis: I think that's from the S3 api, I suggest you read Amazon's documentation on it
[09:31] LOW is just a designation of the security quality.
[09:31] It's a straight-sent password set.
[09:34] > x-archive-meta-date:2011-09
[09:34] > x-archive-meta-language:eng
[09:34] > x-archive-meta-mediatype:web
[09:34] > x-archive-meta-title:Archive Team Panic Download: Star Wars Forums
[09:34] > x-archive-meta01-collection:archiveteam
[09:34] There it goes.
[09:35] I happen to think it will not be a very good set.
[09:35] I.e. someone may want to take it down and play with it in the future.
[09:35] Not our best work. Too scattered, too weird, came in from too many sources.
[09:35] There's a lot of data IN there but it's not good.
[09:37] But we're against the wall, so here we go... Terabytes of data draining into archive.org.
[09:37] Then we need to clean the archiveteam collection's item settings.
[09:37] Or, I do.
[09:37] oh crap, it's a race?
[09:38] time for me to cram shit into batcave
[09:38] http://www.youtube.com/watch?v=GaQxn1Ke8AY
[09:38] Just blast that and take me on.
[09:38] no I think I'll watch star trek and eat cookies instead
[09:41] 122G friendster.002600001-002700000.tar
[09:41] 137G friendster.003100001-003189201.tar
[09:41] 152G friendster.002700001-002800000.tar
[09:41] 243G friendster.004200001-004300000.tar
[09:41] 32G friendster.003900001-003921700.tar
[09:41] 39G friendster.005000000-005014303.tar
[09:41] Oh, look what I found.
[09:41] ls -l
[09:42] niiice
[09:43] I'll get us back in shape, but to be honest, we should really be moving to fortress of solitude
[09:44] This would require a full stop
[10:03] I've got 5 separate streams dumping googlegroups into archive.org
[10:35] SketchCow: Would you like me to update the mobileme upload scripts to upload to the new fortress? We could make it a gradual switch: people who haven't updated the script still upload to batcave and are handled there, updated users upload to the new location.
[10:36] I think so.
[10:36] Let's do it tomorrow or monday when I haven't been up for 28 hours and counting
[10:40] That sounds like a good idea. :)
[11:01] Move things in your rsync directory so I know I can put them into archive.org, if possible.
[11:07] Goin' well?
[11:14] One disk is down to 19gb free
[11:15] SketchCow, which disk? Should I stop my rsync to my slot?
[11:28] SketchCow: You can't move anything with rsync.
[11:28] greeeeat
[11:28] And can't rename anything. But I think that is because of your rsyncd setup.
[11:29] Last time I checked you're only allowed to upload things, not rename, move or delete.
[11:29] Yeah, /2/ is going to fill.
[11:29] They're all going to fill.
[11:30] I'd love to get enough space back on /2/ to last the night.
[11:30] Then I can work harder tomorrow.
[11:30] In my rsync directory, everything is done except umich.edu.
[11:30] Anyhub is quite large (hundreds of GB), as is mobileme.
[11:33] Yeah, I think tomorrow is Concoct upload strategies day.
[11:34] I am SURE we're going to fill it.
[11:36] All I can do is wait for complaints.
[11:36] If people are rsyncing, then they just need to restart the rsyncing.
[11:36] So there's that.
[11:36] OK, bed.
[11:37] Dream of ideas for us to start moving in splinder accounts.
[11:38] Remember that I have a 40 GB directory with broken splinder accounts.
[11:38] Should be nemo_bis/splinder-broken/
[11:39] They were put back into queue and supposedly redownloaded, but worth checking
[14:18] can one specify the network interface to use in wget?
[14:18] i have multiple 3g modems and i need to distribute requests across them
[14:20] sundown: --bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host.
[14:23] how did i miss? :)
[14:23] thx
[16:26] 6630047744 56% 255.89kB/s 5:30:12
[16:26] rsync: write failed on "/mobileme/e/ei/eig/eightball3/public.me.com/public.me.com-eightball3.warc.gz" (in nemo_bis): No space left on device (28)
[16:26] rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Connection reset by peer (104)
[16:30] Yeah, no suprise really.
[16:30] indeed
[16:30] yeah, sucks when you've been uploading for 1 week and the file doesn't resume tho :/
[16:37] I think it does resume
[16:38] depends on your rsync parameters
[16:41] Mine or batcave's?
[16:41] I was able to resume an upload on batcave, to be clear.
[16:42] I was talking about the sender/client
[16:42] hmm, I wasn't
[16:42] but I guess you can restrict that in rsyncd
[16:42] perhaps I should not be using --partial to batcave
[16:43] might be that
[16:43] I mean a single file upload resuming in the middle..normally I use --partial to allow that
[16:43] but I think it needs to rename the file which is prohibited on batcave
[16:45] Unless the progress status was true and I uploaded at 50 MiB/s over the ocean, with the standard script resume works
[16:46] what's the standard script in this case?
[16:47] I mean upload-finished
[16:50] oh, for mobileme then
[16:51] uses --partial :)
[16:51] mobileme is not multi-gb large files is it?
[17:06] Yes, batcave is write only = true for closure's module, at least
[17:06] So partial doesn't work
[17:09] There's often multi-gb large Mobileme warcs
[17:10] but the connection to the dump has been stable really
[17:12] Hmm
[17:12] None of the drives are full
[17:13] Full-full, or Linux-full?
[17:14] 'Cause I've been bitten by Linux wanting to have a wad of free space for no good reason before.
[17:18] Linux-full
[17:18] Hmm, let me check inodes
[17:18] Although that really shouldn't be an issue
[17:32] I think SketchCow already dealt with the drive being full
[17:34] oh kay
[18:24] I've never had problems resuming uploads to batcave, so I think --partial is not affected by write only = true.
[19:30] Today, I am doing one thing. I am basically cleaning out the hard drives.
[19:31] screwing the top off and blowing the dust bits away? :o)
[19:36] It's not that hard to see what's going on.
[19:36] We weren't done with uploading splinder, we have no approach to knocking off sets and uploading splinder sets, and then we started packing in the mobileme.
[19:36] Case closed on why. Now I need to figure out an approach that will work.
[19:52] SketchCow: There is a ton of friendster stuff I found in the underscor ftp directory
[19:52] Just so you know
[19:52] That would make two of us, with you as #2
[19:52] something like 600GB worth
[19:52] archiveteam: stuffing it everywhere it'll fit
[19:52] 1.5tb worth
[19:52] Oh
[19:52] wow, that's a lot
[19:53] I couldn't du all of it, but jeez
[19:54] There is another like 2TB in /1/UNDERTHESTAIRS/send_to_jason
[19:56] ...
[20:00] Always good to know....eventually.
[20:01] What I want is a lunch, I want to get a nice big lunch, come back, and push friendster into archive.org all day, getting back terabytes.
[20:01] Then work with alard to move the mobileme transferring to new box
[20:01] Then track down whoever thought cars 2 was a good idea
[20:01] hahah
[20:02] chronomex: Just a bit of data, eh?
[20:02] several bits, it sounds
[20:02] Speaking of, ugh, I have like 12TB of data to push into the archive
[20:02] EVERY DAY IM ARCHIVING
[20:04] I would say we're singing "hi ho hi ho" but it's more like http://www.youtube.com/watch?v=CpEOErRV-u4&t=1m32s
[20:10] lol
[20:11] also, as an aside
[20:11] itap us a fantastic android rdp client
[20:12] is*
[20:18] I have a friend that calls dubstep like that `delicious robot sex`
[20:57] > PUT /archiveteam-googlegroups-um/googlegroups-um.zip HTTP/1.1
[20:57] Googlegroups....um
[20:59] Are you using the http put method for anything?
[21:00] Or was this just something random from the log?
[21:00] I am indeed.
[21:01] > PUT /archiveteam-googlegroups-um/googlegroups-um.zip HTTP/1.1
[21:01] http://www.archive.org/details/archiveteam-starwarsforums needs some love and analysis.
[21:01] At the least, I or someone else needs to write some docs as to what the hell this is.
[21:01] Meanwhile, though, it's 200gb off the drives.
[21:02] :o STARWARSFORUMS.2011.Dataset.tar ??B
[21:03] When items are first added, they go in and a later process comes along, analyzes them, and adds the filesize.
[21:03] Nothing to be afraid of. This is the cutting edge side of uploading data.
[21:03] 152,775,772,160 bytes
[21:03] By the time today is over I suspect I'll have shoved 3-5 terabytes off the machine
[21:04] Ah, so with such items derive.php makes a hours long rsync just to calculate a filesize and add it to metadata? :)
[21:05] Yes.
[21:05] The system was never really designed or intended for 10gb+ items.
[21:05] My doing so is the new guy basically refusing to snap into line and force the system to improve.
[21:05] I've made items as large as 400gb
[21:05] :D
[21:06] And if you don't think that's caused friction, you're kidding yourself.
[21:06] What's the point of not splitting a two-three digit GB archive
[21:06] Some of my uploads broke some servers too
[21:06] (so to say)
[21:06] Like group files by the million or so
[21:06] You say "what's the point" like this is an intended action.
[21:07] I could explain how it got to this situation, but I would instead point out that as we go along, a certain family of items are essentially dataset space nightmares.
[21:07] And we improve those as we go, making them work better for paradigm and tracking, but the fact is that one decision at one point expands into a large pain at the end.
[21:07] some of these space nightmares also require fractally complex curation
[21:07] Right.
[21:07] My hope, and I think I'm right, is that we're basically guys in 1998 throwing ISOs around
[21:08] it's huge and not really useful now, but soon it will be small and utility will not have decreased?
[21:08] Yes.
[21:08] This could be probably be solved at the script level
[21:08] ok
[21:08] At some point the archive is generated
[21:08] YOU DON'T SAY
[21:08] REALLY.
[21:08] :O
[21:09] Cpt. Obvious signing in
[21:09] nitro2k01: starwars is a few years old, we were younger then and less deliberate
[21:09] also it was a panic grab
[21:09] I was discussing starwars being a mess.
[21:10] I had it on this drive, going "Man, I really should look further into this thing."
[21:10] And that time isn't going to come.
[21:10] So instead, tar it up into a big set of .tar files, the easiest to handle, the openest format, and pop them into archive.org for people to download or to sit there.
[21:11] Wow! A download from archiove.org that doesn't go at 30 KB/s!
[21:12] More bandwidth is being added.
[21:12] People sure do have lots of QA feedback on their free servies.
[21:13] haters gotta hate, else they wouldn'te be haters
[21:13] Today you are going to get cheery jason and I hope you have lots of extra heads because I'm going to keep biting them off.
[21:14] "Wow! What are all those heads?"
[21:14] "These are the faces of evil!"
[21:15] http://www.youtube.com/watch?v=FSkd3V-dRY0&feature=related
[21:17] I'm about to shove up all the Scientific American issues from 1800s to 1909. Thanks, Coderjoe
[21:17] We'll see how they handle it.
[21:18] bzip2 --best friendster.002700001-002800000.tar &
[21:18] Lot of that, too.
[21:18] I need to bzip2 the friendster sets, to save space.
[21:18] You can typically pass the j parameter directly to tar to compress with bzip2 already at that step
[21:19] REALLY.
[21:19] IF
[21:19] YOU DON'T SAY
[21:19] WHERE'S THE --TIMEMACHINE OPTION TO GO BACK TO JUNE 2011
[21:19] --timemachine 2011-06-23
[21:19] tar: error: unknown date format
[21:20] Get the beta, dude
[21:20] "I've only done this once before"
[21:20] tar: error: --timemachine works forwards only
[21:20] Hmm, that shouldn't happen, chronomex. It's ISO8601 after all...
[21:21] http://dailyderp.com/wp-content/uploads/2011/01/back-in-time.jpg
[21:22] "*after* we get back"
[21:22] Or before
[21:22] or
[21:22] hmm
[21:29] I'll think about. Thanx for the inquiry. Without your input on a price, then I guess the way I feel is it takes two to tango (horse trade if you will) and it's you turn to answer my question. If I were to sell the disks for $10 bucks each, then multiply that by 75 and hope for the best. Not sure I would ask that much for each one and I'm not sure I wouldn't.
[21:29] Tnx's again,
[21:29] Barb
[21:29] This is me talking with someone who put up some old CD-ROMs from her BBS days.
[21:29] She seems to think of them as an investment.
[21:29] She's kind of wrong.
[21:29] I am not going to spend $750 for 75 CD-ROMs from the 1990s.
[21:30] I can just get people to contribute to me and spend the $750 on additional CD-ROM drives to ingest at a faster rate.
[21:30] $750 will get you a shitpile of cd drives
[21:31] I've just ordered 150 CD-ROMs for 75 € shipping included
[21:33] Yeah, that's typical.
[21:34] Part of my job is dealing with people who reveal, at this late stage, that it wasn't the uniqueness of their hobby or the special relation of what they were dealing with to the world that led to their isolation and rejection. It was them.
[21:34] History thinks you're an asshole
[22:10] scientific-american-v01-n01-1845-08-28.pdf scientific-american-v01-n14-1859-10-01.pdf scientific-american-v01-n27-1846-03-19.pdf scientific-american-v01-n40-1846-06-25.pdf
[22:10] scientific-american-v01-n02-1859-07-09.pdf scientific-american-v01-n15-1859-10-08.pdf scientific-american-v01-n28-1846-03-26.pdf scientific-american-v01-n41-1846-07-02.pdf
[22:10] scientific-american-v01-n03-1859-07-16.pdf scientific-american-v01-n16-1859-10-15.pdf scientific-american-v01-n29-1846-04-02.pdf scientific-american-v01-n42-1846-07-09.pdf
[22:10] scientific-american-v01-n04-1859-07-23.pdf scientific-american-v01-n17-1859-10-22.pdf scientific-american-v01-n30-1846-04-09.pdf scientific-american-v01-n43-1846-07-16.pdf
[22:10] scientific-american-v01-n05-1859-07-30.pdf
[22:10] Will Scientific American flip out when I upload issues from 1845? Let's find out!
[22:10] :D
[22:10] project gutenberg has been doing those for a while iirc
[22:36] http://www.archive.org/details/scientific-american-1845-08-28
[22:38] Looks good - here goes the rest.
[22:38] harrrrrrrrrp
[22:51] I am demolishing stuff and the disk space is barely going down.
[22:52] But the work continues.
[23:01] I adjusted the destruction page
[23:01] http://tracker.archive.org/df.html
[23:04] http://www.archive.org/search.php?query=collection%3Ascientific-american-1845-1909&sort=-publicdate looking good
[23:11] Damn that took a while
[23:18] SketchCow: Awesome!
[23:45] underscor: If you're seriously going to have that thing running, then at least add a (df -h) output after each one.
[23:47] BlueMax: Don't even BEGIN to complain about timetables
[23:47] why doesn't mrtg report on disk space?
[23:47] SketchCow: it was mainly me trying to find a working EFnet server :P
[23:47] OK, carry on whining then
[23:47] Do we have any archive team members in spain?
[23:48] I'll be in Spain in May
[23:48] emijrp I guess
[23:48] London/Brighton in September
[23:48] Emijrp's one of my favorites
[23:48] My special favorite
[23:48] :O
[23:48] Even though he's a Wikimedian!
[23:48] * SketchCow keeps you all separated thinking the others are his favorite and you have to work harder
[23:48] lol
[23:48] Shouldn't you tell him in private then
[23:49] ah, no, ok, it's the reverse
[23:50] Blasting up volume 2 of scientific american.
[23:51] Not sure how many volumes there are, after the new blast is finished, I'll say.
[23:51] They don't help space, just peace of mind
[23:51] and OCR graphs!
[23:52] It helps that I am BLASTING dubstep, which always cheers me up
[23:52] That and the 5 redbulls
[23:53] http://www.flickr.com/photos/mirka23/6849414941/in/set-72157629221506551
[23:55] http://www.flickr.com/photos/mirka23/6849418907/in/set-72157629221506551/ <-- Internet Archiving - serious business