#archiveteam 2012-07-18,Wed


Time Nickname Message
00:25 SketchCow ha.
00:25 SketchCow You know what that says?
00:25 SketchCow "Turn Yahoo! into Facebook's culture."
00:25 SketchCow Period.
01:07 underscor s w
01:07 underscor oops
01:26 underscor I know what I'll be doing in vegas on my free day
01:26 underscor http://www.youtube.com/watch?NR=1&feature=fvwrel&v=kS6B1jwLvjc
01:29 chronomex yeah, right
04:41 underscor chronomex: no, really
04:41 underscor I'm planning on it
04:41 chronomex k
05:36 underscor maybe I'll get the video package
05:36 underscor so you guys can laugh at me freefalling
05:37 chronomex :D
08:46 ersi underscor: fuck yea!
13:45 edsu underscor: you have time for a quick boto question?
13:46 edsu underscor: i'm able to get a bucket, and get a key file ok
13:46 edsu underscor: but when i call k.size on the key file it always seems to say 1
13:47 edsu underscor: when I can see from the HTML presentation of the bucket that there is more data than that
13:47 edsu underscor: is that not implemented in the ia s3 api?
14:37 Schbirid quick psa on fileplanet: we have (slowish) read ftp access now. mirroring is in progress. life is good.
15:02 SmileyG which bit of fileplanet Schbirid ?
15:02 Schbirid all of them
15:13 SketchCow When's fileplanet dying again?
15:14 Schbirid not announced, they will just make it more static first
15:32 SmileyG :D nice.
19:11 underscor edsu: yeah, we don't provide correct size info that way
19:11 underscor unfortunately
19:13 edsu underscor: ok, thnx
19:13 edsu i guess i can do a head request and get it that way
19:14 underscor nope, that's what k.getsize does
19:15 underscor I don't know how you get it from our system, actually
19:15 underscor hmm
19:22 underscor edsu: ok, just talked to the s3 guy
19:23 underscor edsu: you can either getkeys and manually sum the sizes
19:24 edsu i was calling k.size
19:24 edsu underscor: where k was a key
19:24 edsu underscor: it returned 1
19:24 underscor oh, a key and not a bucket?
19:24 edsu can gist it if you want
19:24 edsu yeah, a key
19:24 underscor kk, one sec
19:28 underscor edsu: what is the name of your bucket?
19:29 edsu kasabi
19:31 underscor http://ia700804.s3dns.us.archive.org/kasabi
19:31 underscor The size is being returned in the key list
19:31 underscor hmmmmm
19:31 edsu yeah, hmm it is working now
19:32 edsu edsu--
19:32 edsu thanks :)
19:34 edsu sorry for making you jump through hoops there ...
19:34 underscor np!
19:34 underscor we're here to help
19:34 edsu seems to be a small bug in my script i'm writing
19:34 edsu rock n' roll :)
19:34 underscor also, if you have an outstanding s3-put in our system, the data will be incorrect
19:35 underscor we have up to a "12 hour eventual consistency" period because of the way our system works
19:35 underscor like:
19:35 underscor if you s3-put foo.txt, it actually launches a bunch of tasks in catalogd, which is the cluster management software
19:35 underscor s3 will return "finished!" immediately
19:36 underscor but that file still has to be copied out to the cluster, duplicated, sunk, and all sorts of stuff
19:36 underscor so if you do a s3-head or s3-get, it won't work
19:36 underscor even though in theory your put was "successful", there are more steps on our end it has to go through
19:37 edsu ok, good to know -- maybe that's what i'm seeing
19:37 edsu will know more in a sec
19:38 underscor http://www.us.archive.org/catalog.php?history=1&identifier=kasabi you can look at outstanding tasks here
19:38 edsu oh so it's consistent now eh?
19:38 underscor yep
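
A minimal sketch of how a client might cope with the delay underscor describes: after a put, keep polling until the object is actually visible rather than trusting the immediate "finished!". The names `wait_until_visible` and `is_visible` and the default timing values are hypothetical and not part of any archive.org or boto API; `is_visible` stands in for whatever s3-head style check the caller already has.

```python
import time


def wait_until_visible(is_visible, timeout=12 * 60 * 60, interval=300,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_visible() until it returns True or `timeout` seconds pass.

    is_visible is a caller-supplied check (hypothetical), for example a
    HEAD request against the key that was just uploaded.
    Returns True if the object became visible, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        # Wait before asking again so the service is not hammered.
        sleep(interval)
```

The 12-hour default simply mirrors the upper bound quoted above; `sleep` and `clock` are parameters only so the loop can be exercised without real waiting.
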
19:43 edsu underscor: check this out https://gist.github.com/3138383
19:44 edsu and then look at the size of latc-metadata.gz in the listing at http://ia700804.us.archive.org/4/items/kasabi/
19:50 edsu actually renewable-energy-generators.gz is a more interesting key because the html page says it is 792823 bytes, but boto is saying its size 1
19:51 edsu i definitely get 792823 bytes when downloading it too
19:52 edsu maybe i messed things up by deleting keys and then going and uploading again
19:52 edsu deleted through that flash ui widget thing
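
underscor's earlier suggestion (list the keys and sum their sizes, since the size is returned in the key list) could look roughly like the sketch below. It uses a stand-in `ListedKey` type so it runs without network access; with boto 2 the iterable would presumably be something like `bucket.list()`, whose entries carry `name` and `size` attributes. `ListedKey`, `total_size`, and `size_of` are illustrative names, and the second sample entry is made up.

```python
from collections import namedtuple

# Stand-in for an entry in a bucket listing (assumption: the real listing
# entries expose at least .name and .size).
ListedKey = namedtuple("ListedKey", ["name", "size"])


def total_size(keys):
    """Sum the sizes reported by the bucket listing."""
    return sum(k.size for k in keys)


def size_of(keys, name):
    """Return the listed size for one key name, or None if it is absent."""
    for k in keys:
        if k.name == name:
            return k.size
    return None


listing = [
    ListedKey("renewable-energy-generators.gz", 792823),  # size quoted in the log
    ListedKey("example-other-file.gz", 1024),             # made-up entry
]
print(size_of(listing, "renewable-energy-generators.gz"))
print(total_size(listing))
```

Reading the size from the listing sidesteps the per-key lookup that was reporting 1 here.
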
20:04 edsu bbiab
20:07 SketchCow OK, who codes in bash?
20:07 SketchCow Is GOOD at coding in bash?
20:13 virato SketchCow, I know a good amount of bash and I think I code well.
20:13 virato What's up?
20:14 SketchCow Are you new? Don't remember you.
20:14 virato I am indeed.
20:14 SketchCow What brings you to #archiveteam?
20:14 Schbirid please take off your shirt for the initiation rites
20:15 Schbirid (welcome!)
20:15 virato lol
20:16 virato SketchCow, my typical archiving/hoarding tendencies, love of programming, your loud personality (https://www.youtube.com/watch?v=-2ZTmuX3cog), and a hatred of Yahoo! for the loss of my old GeoCities site.
20:17 chronomex we're loud fuckers
20:18 virato GOOD!
20:18 virato :D
20:18 virato I would hope so with someone like SketchCow at the head of it.
20:21 SketchCow OK, well, great, you're drafted.
20:24 SketchCow http://fos.textfiles.com/ia-injector
20:24 SketchCow OK, grab that.
20:26 SketchCow This is my sketched, sort-of working, beginning of an Internet Archive installer.
20:26 SketchCow It lets you inject an item into archive.org.
20:28 virato Alright, grabbed and reading over the code.
20:31 virato SketchCow, alright, read over.
20:32 SketchCow Right now, it's very oriented towards me.
20:33 SketchCow I'd like it that if a person put in some sort of config section, they could get rid of the repetitive things.
20:33 SketchCow Give me a moment, I'll give you another one.
20:34 DFJustin I vote for a "did you fuck up (y/n)" before creating a permanent thing
20:34 SketchCow A little more like it
20:35 SketchCow http://fos.textfiles.com/ia-ingestor
20:35 SketchCow My secret weapon
20:39 virato Alright, what is ingestor supposed to do?
20:39 chronomex om nom nom
20:39 SketchCow Ingestor is a more directed version of injector
20:39 SketchCow It basically lets you take a bunch of like-structured filenames and make them into individual items.
20:39 SketchCow So if you have something like:
20:40 ๐Ÿ”— SketchCow schoolgirl_porn_monthly_1989_04.pdf
20:40 ๐Ÿ”— SketchCow schoolgirl_porn_monthly_1989_03.pdf
20:40 ๐Ÿ”— SketchCow schoolgirl_porn_monthly_1989_01.pdf
20:40 ๐Ÿ”— SketchCow You can say "Call them Schoolgirl Porn Monthly, use the _ as a separator between columns, use the 4th column for the year, the 5th for the month."
20:40 DFJustin aces
20:40 SketchCow And you can then jam it right down the line, calling injector against every item.
20:41 SketchCow This is how I add 400 issues at once.
20:41 SketchCow I've been using this for a year.
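
The column idea SketchCow describes (split on a separator, take one column as the year and another as the month) can be sketched as follows. This is not the ia-ingestor code, which is a bash script; `parse_issue`, `describe`, the sample filename, and the "Title (YYYY-MM)" output format are all hypothetical. Columns are counted from 1, as in the description above.

```python
import os


def parse_issue(filename, sep="_", year_col=4, month_col=5):
    """Split a filename into columns and pull out the year and month.

    Column numbers are 1-based. The file extension is dropped first so
    the last column does not end in ".pdf".
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    cols = stem.split(sep)
    if len(cols) < max(year_col, month_col):
        raise ValueError("not enough columns in %r" % filename)
    return {"year": cols[year_col - 1], "month": cols[month_col - 1]}


def describe(filename, title, **columns):
    """Build a per-item label such as 'Title (1989-04)' (format is assumed)."""
    parts = parse_issue(filename, **columns)
    return "%s (%s-%s)" % (title, parts["year"], parts["month"])


# Hypothetical filename with the same five-column shape as the example above.
print(describe("example_magazine_monthly_1989_04.pdf", "Example Magazine Monthly"))
```

Each label produced this way would then be handed to the per-item upload step, one call per file.
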
20:41 SketchCow And I was going to pretty these up for further consumption, but fuck it. Someone shove it in github and everyone make it better.
20:41 SketchCow because I'm busy
20:41 virato The Power of Open Source
20:42 omf_ I am going to make that better. Calling sed more than twice in a command means you need a bigger tool
20:42 SketchCow That's an interesting maxim you just made up.
20:42 omf_ does IA have a test area
20:42 SketchCow You can declare something a part of collection test and it'll be mouth-loved within a short time
20:44 omf_ cool
20:44 SketchCow Before you make radical changes, I can explain almost everything I did in there.
20:44 omf_ do you need any specific file sizes or just the script working
20:45 SketchCow The script works now.
20:45 omf_ I can understand what you did. You beat up some filenames into normalization and then upload them via curl
20:45 SketchCow For this particular item, it uses the current filenames as hints to do the uploads into the s3 interface.
20:46 SketchCow It also understands Jan=01, etc.
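
A small sketch of the "Jan=01" mapping mentioned above, assuming English month names matched on their first three letters. How ia-ingestor actually does this is not shown in the log; `MONTHS` and `month_number` are illustrative names.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_number(text):
    """Return a two-digit month string for a month name or number."""
    key = text.strip().lower()
    if key.isdigit():
        n = int(key)
    else:
        try:
            # Match on the first three letters so "Jan" and "January" both work.
            n = MONTHS.index(key[:3]) + 1
        except ValueError:
            raise ValueError("unrecognised month: %r" % text) from None
    if not 1 <= n <= 12:
        raise ValueError("month out of range: %r" % text)
    return "%02d" % n


print(month_number("Jan"))
```
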
20:47 virato SketchCow, so you just need it cleaned up/less ugly?
20:48 omf_ config file loading
20:48 SketchCow I am worried that right now it has very poor error checking.
20:48 omf_ you want logging
20:48 SketchCow I am fine with the config file being on the front of injector
20:48 SketchCow ingestor, I think, does the vast majority of what I want.
20:48 SketchCow Note how TEST= is at the front
20:48 SketchCow So it comes, default, in test mode, so you can see what it THINKS it'll try and do.
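
The "test mode by default" behaviour can be sketched like this: unless the caller explicitly turns test mode off, the function only reports what it would upload. `upload_items` and its parameters are hypothetical, and `upload` stands in for whatever really pushes a file.

```python
def upload_items(filenames, upload, test=True, log=print):
    """Call upload(name) for each file unless test mode is on.

    test defaults to True, so a bare call only reports what it would do.
    Returns the list of names that were actually uploaded.
    """
    done = []
    for name in filenames:
        if test:
            log("TEST: would upload %s" % name)
            continue
        upload(name)
        done.append(name)
    return done
```

Making the safe path the default means a forgotten flag costs a wasted run, not a permanent item.
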
20:50 SketchCow Example, I have creepy magazine:
20:50 SketchCow root@teamarchive-1:/2/MAGAZINES/WARREN/Creepy (Warren)# ls
20:50 SketchCow Creepy 001.cbr Creepy 031.cbr Creepy 061.cbr Creepy 091.cbr Creepy 121.cbz
20:50 SketchCow Creepy 002.cbr Creepy 032 (1970).cbz Creepy 062.cbr Creepy 092.cbr Creepy 122.cbr
20:50 omf_ do you run the filenames over anything before feeding it to ingestor
20:50 SketchCow Creepy 003.cbr Creepy 033.cbr Creepy 063.cbr Creepy 093.cbr Creepy 123.cbr
20:50 SketchCow Creepy 004.cbr Creepy 034.cbz Creepy 064.cbr Creepy 094.cbr Creepy 124.cbr
20:50 SketchCow Creepy 005.cbz Creepy 035.cbr Creepy 065.cbz Creepy 095.cbr Creepy 125.cbr
20:50 SketchCow See, they're all there with the proper number
20:50 SketchCow I have scripts to rename them to help normalize, yes.
20:50 SketchCow Ingestor shouldn't try and normalize.
20:53 omf_ what kind of errors do you want checked
20:53 SketchCow I think it would be nice to generate a completed log, yes.
20:53 SketchCow It would be good for it to see if the file exists, if it completed the upload
20:54 omf_ do you also want to make sure the file is not zero bytes
20:58 SketchCow yes, that would be nice.
20:59 omf_ both injector and ingestor
20:59 omf_ injector looks like it could use some input checking on the user input
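
The checks discussed above (does the file exist, is it more than zero bytes, and keep a log of the outcome) could be sketched as a pre-upload step. `check_file` and `preflight` are hypothetical names, and the message wording is made up; confirming that an upload completed is not covered here.

```python
import os


def check_file(path):
    """Return a list of problems with path; an empty list means it looks uploadable."""
    if not os.path.isfile(path):
        return ["missing: %s" % path]
    if os.path.getsize(path) == 0:
        return ["zero bytes: %s" % path]
    return []


def preflight(paths, log=print):
    """Check every path, log each result, and return only the good paths."""
    good = []
    for path in paths:
        problems = check_file(path)
        if problems:
            for problem in problems:
                log("SKIP " + problem)
        else:
            log("OK " + path)
            good.append(path)
    return good
```
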
21:05 SketchCow injector should be sleeping with ingestor.
21:37 ersi that'd produce injector^2
23:17 balrog http://194.71.107.80/torrent/7380688/BYTE_magazine_full-res_scans_PDF_JC1.0_20120622 - should be added to IA
23:42 dashcloud so I've put up the steps & commands I used for 3.5'' floppy imaging (MS-DOS & Windows floppies only) on the discussion page for Rescuing Floppy Disks
23:42 dashcloud comments would be appreciated
23:48 balrog and I suggest using ddrescue in all cases because of its status display, and reading each disk twice
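
balrog does not say what the second read is for. Assuming the point is to compare the two images, one way to do that is to hash both files and check that the digests agree. `file_digest` and `reads_match` are hypothetical helpers, and the choice of SHA-256 is arbitrary.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reads_match(first_image, second_image):
    """True if two disk images have identical contents."""
    return file_digest(first_image) == file_digest(second_image)
```

A mismatch would only say that the two reads differ, not which one is right.
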