[00:25] ha.
[00:25] You know what that says?
[00:25] "Turn Yahoo! into Facebook's culture."
[00:25] Period.
[01:07] s w
[01:07] oops
[01:26] I know what I'll be doing in vegas on my free day
[01:26] http://www.youtube.com/watch?NR=1&feature=fvwrel&v=kS6B1jwLvjc
[01:29] yeah, right
[04:41] chronomex: no, really
[04:41] I'm planning on it
[04:41] k
[05:36] maybe I'll get the video package
[05:36] so you guys can laugh at me freefalling
[05:37] :D
[08:46] underscor: fuck yea!
[13:45] underscor: you have time for a quick boto question?
[13:46] underscor: i'm able to get a bucket, and get a key file ok
[13:46] underscor: but when i call k.size on the key file it always seems to say 1
[13:47] underscor: when I can see from the HTML presentation of the bucket that there is more data than that
[13:47] underscor: is that not implemented in the ia s3 api?
[14:37] quick psa on fileplanet: we have (slowish) read ftp access now. mirroring is in progress. life is good.
[15:02] which bit of fileplanet Schbirid ?
[15:02] all of them
[15:13] When's fileplanet dying again?
[15:14] not announced, they will just make it more static first
[15:32] :D nice.
[19:11] edsu: yeah, we don't provide correct size info that way
[19:11] unfortunately
[19:13] underscor: ok, thnx
[19:13] i guess i can do a head request and get it that way
[19:14] nope, that's what k.getsize does
[19:15] I don't know how you get it from our system, actually
[19:15] hmm
[19:22] edsu: ok, just talked to the s3 guy
[19:23] edsu: you can either getkeys and manually sum the sizes
[19:24] i was calling k.size
[19:24] underscor: where k was a key
[19:24] underscor: it returned 1
[19:24] oh, a key and not a bucket?
[19:24] can gist it if you want
[19:24] yeah, a key
[19:24] kk, one sec
[19:28] edsu: what is the name of your bucket?
[19:29] kasabi
[19:31] http://ia700804.s3dns.us.archive.org/kasabi
[19:31] The size is being returned in the key list
[19:31] hmmmmm
[19:31] yeah, hmm it is working now
[19:32] edsu--
[19:32] thanks :)
[19:34] sorry for making you jump through hoops there ...
[19:34] np!
[19:34] we're here to help
[19:34] seems to be a small bug in my script i'm writing
[19:34] rock n' roll :)
[19:34] also, if you have an outstanding s3-put in our system, the data will be incorrect
[19:35] we have up to a "12 hour eventual consistency" period because of the way our system works
[19:35] like:
[19:35] if you s3-put foo.txt, it actually launches a bunch of tasks in catalogd, which is the cluster management software
[19:35] s3 will return "finished!" immediately
[19:36] but that file still has to be copied out to the cluster, duplicated, sunk, and all sorts of stuff
[19:36] so if you do a s3-head or s3-get, it won't work
[19:36] even though in theory your put was "successful", there are more steps on our end it has to go through
[19:37] ok, good to know -- maybe that's what i'm seeing
[19:37] will know more in a sec
[19:38] http://www.us.archive.org/catalog.php?history=1&identifier=kasabi you can look at outstanding tasks here
[19:38] oh so it's consistent now eh?
[19:38] yep
[19:43] underscor: check this out https://gist.github.com/3138383
[19:44] and then look at the size of latc-metadata.gz in the listing at http://ia700804.us.archive.org/4/items/kasabi/
[19:50] actually renewable-energy-generators.gz is a more interesting key because the html page says it is 792823 bytes, but boto is saying its size 1
[19:51] i definitely get 792823 bytes when downloading it too
[19:52] maybe i messed things up by deleting keys and then going and uploading again
[19:52] deleted through that flash ui widget thing
[20:04] bbiab
[20:07] OK, who codes in bash?
[20:07] Is GOOD at coding in bash?
[20:13] SketchCow, I know a good amount of bash and I think I code well.
[20:13] What's up?
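On the k.size question above: the log says the size is returned in the bucket's key list, and that one workaround is to get the keys and sum the sizes manually. A minimal Python sketch of that idea follows. It assumes each listed key exposes `.name` and `.size`, as keys from a boto bucket listing do. `ListedKey`, `sizes_from_listing`, and `total_size` are hypothetical names used only for illustration, and the 1000-byte entry is a made-up placeholder, not a real key in the kasabi bucket.

```python
from collections import namedtuple

# Hypothetical stand-in for a key as it appears in a bucket listing.
# Assumption: real listed keys (e.g. from boto's bucket.list()) expose
# .name and .size in the same way.
ListedKey = namedtuple("ListedKey", ["name", "size"])


def sizes_from_listing(keys):
    """Map key name -> size in bytes, using sizes from the key list."""
    return {k.name: int(k.size) for k in keys}


def total_size(keys):
    """Manually sum the sizes reported in the key list."""
    return sum(int(k.size) for k in keys)


# Example listing. 792823 is the size quoted in the log for
# renewable-energy-generators.gz; the second entry is a placeholder.
listing = [
    ListedKey("renewable-energy-generators.gz", 792823),
    ListedKey("placeholder-example.gz", 1000),
]

if __name__ == "__main__":
    for name, size in sorted(sizes_from_listing(listing).items()):
        print(name, size)
    print("total bytes:", total_size(listing))
```

Per the log, sizes can still be wrong while an s3-put has outstanding tasks, so a listing taken right after an upload may not reflect the final state.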
[20:14] Are you new? Don't remember you.
[20:14] I am indeed.
[20:14] What brings you to #archiveteam?
[20:14] please take off your shirt for the initiation rites
[20:15] (welcome!)
[20:15] lol
[20:16] SketchCow, my typical archiving/hoarding tendencies, love of programming, your loud personality (https://www.youtube.com/watch?v=-2ZTmuX3cog), and a hatred of Yahoo! for the loss of my old GeoCities site.
[20:17] we're loud fuckers
[20:18] GOOD!
[20:18] :D
[20:18] I would hope so with someone like SketchCow at the head of it.
[20:21] OK, well, great, you're drafted.
[20:24] http://fos.textfiles.com/ia-injector
[20:24] OK, grab that.
[20:26] This is my sketched, sort-of working, beginning of an Internet Archive installer.
[20:26] It lets you inject an item into archive.org.
[20:28] Alright, grabbed and reading over the code.
[20:31] SketchCow, alright, read over.
[20:32] Right now, it's very oriented towards me.
[20:33] I'd like it that if a person put in some sort of config section, they could get rid of the repetitive things.
[20:33] Give me a moment, I'll give you another one.
[20:34] I vote for a "did you fuck up (y/n)" before creating a permanent thing
[20:34] A little more like it
[20:35] http://fos.textfiles.com/ia-ingestor
[20:35] My secret weapon
[20:39] Alright, what is ingestor supposed to do?
[20:39] om nom nom
[20:39] Ingestor is a more directed version of injector
[20:39] It basically lets you take a bunch of like-structured filenames and make them into individual items.
[20:39] So if you have something like:
[20:40] schoolgirl_porn_monthly_1989_04.pdf
[20:40] schoolgirl_porn_monthly_1989_03.pdf
[20:40] schoolgirl_porn_monthly_1989_01.pdf
[20:40] You can say "Call them Schoolgirl Porn Monthly, use the _ as a separator between columns, use the 4th column for the year, the 5th for the month."
[20:40] aces
[20:40] And you can then jam it right down the line, calling injector against every item.
[20:41] This is how I add 400 issues at once.
[20:41] I've been using this for a year.
[20:41] And I was going to pretty these up for further consumption, but fuck it. Someone shove it in github and everyone make it better.
[20:41] because I'm busy
[20:41] The Power of Open Source
[20:42] I am going to make that better. Calling sed more than twice in a command means you need a bigger tool
[20:42] That's an interesting maxim you just made up.
[20:42] does IA have a test area
[20:42] You can declare something a part of collection test and it'll be mouth-loved within a short time
[20:44] cool
[20:44] Before you make radical changes, I can explain almost everything I did in there.
[20:44] do you need any specific file sizes or just the script working
[20:45] The script works now.
[20:45] I can understand what you did. You beat up some filenames into normalization and then upload them via curl
[20:45] For this particular item, it uses the current filenames as hints to do the uploads into the s3 interface.
[20:46] It also understands Jan=01, etc.
[20:47] SketchCow, so you just need it cleaned up/less ugly?
[20:48] config file loading
[20:48] I am worried that right now it has very poor error checking.
[20:48] you want logging
[20:48] I am fine with the config file being on the front of injector
[20:48] ingestor, I think, does the vast majority of what I want.
[20:48] Note how TEST= is at the front
[20:48] So it comes, default, in test mode, so you can see what it THINKS it'll try and do.
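The ingestor idea described above (split like-structured filenames on a separator, take the year and month from numbered columns, understand Jan=01, and default to a test mode that only reports what it would do) can be sketched roughly as below. This is a Python illustration of the concept, not the ia-ingestor bash script itself. `parse_issue`, `ingest`, the `upload` callback, and the example filename are all hypothetical names invented for the sketch.

```python
import os

# Month-name hints, so that "Jan" (or "January") becomes "01", etc.
MONTHS = {
    "jan": "01", "feb": "02", "mar": "03", "apr": "04",
    "may": "05", "jun": "06", "jul": "07", "aug": "08",
    "sep": "09", "oct": "10", "nov": "11", "dec": "12",
}


def parse_issue(filename, title, sep="_", year_col=4, month_col=5):
    """Pull year and month out of a like-structured filename.

    Columns are 1-based, matching "use the 4th column for the year".
    """
    stem, _ext = os.path.splitext(filename)
    cols = stem.split(sep)
    year = cols[year_col - 1]
    raw_month = cols[month_col - 1]
    # Accept either a month name or a number; pad numbers to two digits.
    month = MONTHS.get(raw_month[:3].lower(), raw_month.zfill(2))
    return {
        "file": filename,
        "title": "%s %s-%s" % (title, year, month),
        "year": year,
        "month": month,
    }


def ingest(filenames, title, upload, test=True, **columns):
    """Build one item per file; in test mode only report the plan."""
    plan = [parse_issue(name, title, **columns) for name in filenames]
    for item in plan:
        if test:
            print("TEST: would upload %(file)s as %(title)s" % item)
        else:
            upload(item)  # hypothetical callback that does the real upload
    return plan


if __name__ == "__main__":
    # Hypothetical filename, following the column layout described above.
    ingest(["some_magazine_monthly_1989_04.pdf"], "Some Magazine Monthly",
           upload=lambda item: None)
```

Keeping `test=True` as the default mirrors the TEST= setting mentioned in the log: nothing permanent is created until the caller opts in.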
[20:50] Example, I have creepy magazine:
[20:50] root@teamarchive-1:/2/MAGAZINES/WARREN/Creepy (Warren)# ls
[20:50] Creepy 001.cbr Creepy 031.cbr Creepy 061.cbr Creepy 091.cbr Creepy 121.cbz
[20:50] Creepy 002.cbr Creepy 032 (1970).cbz Creepy 062.cbr Creepy 092.cbr Creepy 122.cbr
[20:50] do you run the filenames over anything before feeding it to ingestor
[20:50] Creepy 003.cbr Creepy 033.cbr Creepy 063.cbr Creepy 093.cbr Creepy 123.cbr
[20:50] Creepy 004.cbr Creepy 034.cbz Creepy 064.cbr Creepy 094.cbr Creepy 124.cbr
[20:50] Creepy 005.cbz Creepy 035.cbr Creepy 065.cbz Creepy 095.cbr Creepy 125.cbr
[20:50] See, they're all there with the proper number
[20:50] I have scripts to rename them to help normalize, yes.
[20:50] Ingestor shouldn't try and normalize.
[20:53] what kind of errors do you want checked
[20:53] I think it would be nice to generate a completed log, yes.
[20:53] It would be good for it to see if the file exists, if it completed the upload
[20:54] do you also want to make sure the file is not zero bytes
[20:58] yes, that would be nice.
[20:59] both injector and ingestor
[20:59] injector looks like it could use some input checking on the user input
[21:05] injector should be sleeping with ingestor.
[21:37] that'd produce injector^2
[23:17] http://194.71.107.80/torrent/7380688/BYTE_magazine_full-res_scans_PDF_JC1.0_20120622 - should be added to IA
[23:42] so I've put up the steps & commands I used for 3.5'' floppy imaging (MS-DOS & Windows floppies only) on the discussion page for Rescuing Floppy Disks
[23:42] comments would be appreciated
[23:48] and I suggest using ddrescue in all cases because of its status display, and reading each disk twice
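The error checks requested for injector and ingestor earlier in this log (does the file exist, is it more than zero bytes, and keep a log of what happened) might look something like the following. This is a minimal Python sketch under assumptions, not code from either script. `check_upload_candidate` and `preflight` are hypothetical names, and checking that an upload actually completed is left out because the log does not say how that would be confirmed.

```python
import os
import tempfile


def check_upload_candidate(path):
    """Return None if the file looks uploadable, else a short reason."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "zero bytes"
    return None


def preflight(paths):
    """Split paths into (ok, rejected); rejected maps path -> reason."""
    ok, rejected = [], {}
    for path in paths:
        reason = check_upload_candidate(path)
        if reason is None:
            ok.append(path)
        else:
            rejected[path] = reason
    return ok, rejected


if __name__ == "__main__":
    # Demo with throwaway files: one good, one empty, one missing.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.pdf")
        empty = os.path.join(tmp, "empty.pdf")
        missing = os.path.join(tmp, "missing.pdf")
        with open(good, "wb") as handle:
            handle.write(b"data")
        open(empty, "wb").close()
        ok, rejected = preflight([good, empty, missing])
        for path in ok:
            print("OK      ", path)
        for path, reason in rejected.items():
            print("REJECTED", path, "(%s)" % reason)
```

Running the checks before any upload starts means a bad file is reported up front instead of partway through a batch of several hundred items.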