Time |
Nickname |
Message |
00:25
|
SketchCow |
ha. |
00:25
|
SketchCow |
You know what that says? |
00:25
|
SketchCow |
"Turn Yahoo! into Facebook's culture." |
00:25
|
SketchCow |
Period. |
01:07
|
underscor |
s w |
01:07
|
underscor |
oops |
01:26
|
underscor |
I know what I'll be doing in vegas on my free day |
01:26
|
underscor |
http://www.youtube.com/watch?NR=1&feature=fvwrel&v=kS6B1jwLvjc |
01:29
|
chronomex |
yeah, right |
04:41
|
underscor |
chronomex: no, really |
04:41
|
underscor |
I'm planning on it |
04:41
|
chronomex |
k |
05:36
|
underscor |
maybe I'll get the video package |
05:36
|
underscor |
so you guys can laugh at me freefalling |
05:37
|
chronomex |
:D |
08:46
|
ersi |
underscor: fuck yea! |
13:45
|
edsu |
underscor: you have time for a quick boto question? |
13:46
|
edsu |
underscor: i'm able to get a bucket, and get a key file ok |
13:46
|
edsu |
underscor: but when i call k.size on the key file it always seems to say 1 |
13:47
|
edsu |
underscor: when I can see from the HTML presentation of the bucket that there is more data than that |
13:47
|
edsu |
underscor: is that not implemented in the ia s3 api? |
14:37
|
Schbirid |
quick psa on fileplanet: we have (slowish) read ftp access now. mirroring is in progress. life is good. |
15:02
|
SmileyG |
which bit of fileplanet Schbirid ? |
15:02
|
Schbirid |
all of them |
15:13
|
SketchCow |
When's fileplanet dying again? |
15:14
|
Schbirid |
not announced, they will just make it more static first |
15:32
|
SmileyG |
:D nice. |
19:11
|
underscor |
edsu: yeah, we don't provide correct size info that way |
19:11
|
underscor |
unfortunately |
19:13
|
edsu |
underscor: ok, thnx |
19:13
|
edsu |
i guess i can do a head request and get it that way |
19:14
|
underscor |
nope, that's what k.getsize does |
19:15
|
underscor |
I don't know how you get it from our system, actually |
19:15
|
underscor |
hmm |
19:22
|
underscor |
edsu: ok, just talked to the s3 guy |
19:23
|
underscor |
edsu: you can either getkeys and manually sum the sizes |
19:24
|
edsu |
i was calling k.size |
19:24
|
edsu |
underscor: where k was a key |
19:24
|
edsu |
underscor: it returned 1 |
19:24
|
underscor |
oh, a key and not a bucket? |
19:24
|
edsu |
can gist it if you want |
19:24
|
edsu |
yeah, a key |
19:24
|
underscor |
kk, one sec |
19:28
|
underscor |
edsu: what is the name of your bucket? |
19:29
|
edsu |
kasabi |
19:31
|
underscor |
http://ia700804.s3dns.us.archive.org/kasabi |
19:31
|
underscor |
The size is being returned in the key list |
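Editor's sketch: since the size shows up in the bucket's key list, one way to follow the "list the keys and manually sum the sizes" suggestion is to parse that listing directly. This is a minimal sketch that assumes the listing is S3-style XML with `Contents`, `Key` and `Size` elements; the sample document, bucket name and key names below are made up for illustration and are not taken from the kasabi bucket.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def key_sizes(listing_xml):
    """Return {key name: size in bytes} from an S3-style bucket listing."""
    sizes = {}
    root = ET.fromstring(listing_xml)
    for element in root.iter():
        if local_name(element.tag) != "Contents":
            continue
        # Collect the child fields of this <Contents> entry by local name.
        fields = {local_name(child.tag): (child.text or "") for child in element}
        if "Key" in fields and "Size" in fields:
            sizes[fields["Key"]] = int(fields["Size"])
    return sizes


def total_size(listing_xml):
    """Sum the sizes of every key in the listing."""
    return sum(key_sizes(listing_xml).values())


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """<ListBucketResult>
  <Name>example-bucket</Name>
  <Contents><Key>a.gz</Key><Size>100</Size></Contents>
  <Contents><Key>b.gz</Key><Size>250</Size></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(key_sizes(SAMPLE_LISTING))
    print(total_size(SAMPLE_LISTING))
```

Matching on local element names keeps the sketch working whether or not the listing declares an XML namespace.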
19:31
|
underscor |
hmmmmm |
19:31
|
edsu |
yeah, hmm it is working now |
19:32
|
edsu |
edsu-- |
19:32
|
edsu |
thanks :) |
19:34
|
edsu |
sorry for making you jump through hoops there ... |
19:34
|
underscor |
np! |
19:34
|
underscor |
we're here to help |
19:34
|
edsu |
seems to be a small bug in my script i'm writing |
19:34
|
edsu |
rock n' roll :) |
19:34
|
underscor |
also, if you have an outstanding s3-put in our system, the data will be incorrect |
19:35
|
underscor |
we have up to a "12 hour eventual consistency" period because of the way our system works |
19:35
|
underscor |
like: |
19:35
|
underscor |
if you s3-put foo.txt, it actually launches a bunch of tasks in catalogd, which is the cluster management software |
19:35
|
underscor |
s3 will return "finished!" immediately |
19:36
|
underscor |
but that file still has to be copied out to the cluster, duplicated, sunk, and all sorts of stuff |
19:36
|
underscor |
so if you do a s3-head or s3-get, it won't work |
19:36
|
underscor |
even though in theory your put was "successful", there are more steps on our end it has to go through |
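Editor's sketch: a client that needs to read back what it just put could poll until the object becomes visible instead of trusting the immediate "finished!" reply. This is only a sketch under assumptions: `is_visible` is a hypothetical callable supplied by the caller (for example, one that issues a HEAD request and returns True on success), and the default timeout simply mirrors the "up to 12 hours" figure mentioned above.

```python
import time


def wait_until_visible(is_visible, timeout_s=12 * 60 * 60, interval_s=60,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_visible() until it returns True or timeout_s elapses.

    Returns True if the object became visible, False on timeout.
    sleep and clock are injectable so the loop can be exercised quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated check: the object "appears" on the third poll.
    answers = iter([False, False, True])
    print(wait_until_visible(lambda: next(answers), sleep=lambda s: None))
```

The check always runs at least once, so an object that is already visible returns immediately even with a zero timeout.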
19:37
|
edsu |
ok, good to know -- maybe that's what i'm seeing |
19:37
|
edsu |
will know more in a sec |
19:38
|
underscor |
http://www.us.archive.org/catalog.php?history=1&identifier=kasabi you can look at outstanding tasks here |
19:38
|
edsu |
oh so it's consistent now eh? |
19:38
|
underscor |
yep |
19:43
|
edsu |
underscor: check this out https://gist.github.com/3138383 |
19:44
|
edsu |
and then look at the size of latc-metadata.gz in the listing at http://ia700804.us.archive.org/4/items/kasabi/ |
19:50
|
edsu |
actually renewable-energy-generators.gz is a more interesting key because the html page says it is 792823 bytes, but boto is saying its size 1 |
19:51
|
edsu |
i definitely get 792823 bytes when downloading it too |
19:52
|
edsu |
maybe i messed things up by deleting keys and then going and uploading again |
19:52
|
edsu |
deleted through that flash ui widget thing |
20:04
|
edsu |
bbiab |
20:07
|
SketchCow |
OK, who codes in bash? |
20:07
|
SketchCow |
Is GOOD at coding in bash? |
20:13
|
virato |
SketchCow, I know a good amount of bash and I think I code well. |
20:13
|
virato |
What's up? |
20:14
|
SketchCow |
Are you new? Don't remember you. |
20:14
|
virato |
I am indeed. |
20:14
|
SketchCow |
What brings you to #archiveteam? |
20:14
|
Schbirid |
please take off your shirt for the initiation rites |
20:15
|
Schbirid |
(welcome!) |
20:15
|
virato |
lol |
20:16
|
virato |
SketchCow, my typical archiving/hoarding tendencies, love of programming, your loud personality (https://www.youtube.com/watch?v=-2ZTmuX3cog), and a hatred of Yahoo! for the loss of my old GeoCities site. |
20:17
|
chronomex |
we're loud fuckers |
20:18
|
virato |
GOOD! |
20:18
|
virato |
:D |
20:18
|
virato |
I would hope so with someone like SketchCow at the head of it. |
20:21
|
SketchCow |
OK, well, great, you're drafted. |
20:24
|
SketchCow |
http://fos.textfiles.com/ia-injector |
20:24
|
SketchCow |
OK, grab that. |
20:26
|
SketchCow |
This is my sketched, sort-of working, beginning of an Internet Archive installer. |
20:26
|
SketchCow |
It lets you inject an item into archive.org. |
20:28
|
virato |
Alright, grabbed and reading over the code. |
20:31
|
virato |
SketchCow, alright, read over. |
20:32
|
SketchCow |
Right now, it's very oriented towards me. |
20:33
|
SketchCow |
I'd like it that if a person put in some sort of config section, they could get rid of the repetitive things. |
20:33
|
SketchCow |
Give me a moment, I'll give you another one. |
20:34
|
DFJustin |
I vote for a "did you fuck up (y/n)" before creating a permanent thing |
20:34
|
SketchCow |
A little more like it |
20:35
|
SketchCow |
http://fos.textfiles.com/ia-ingestor |
20:35
|
SketchCow |
My secret weapon |
20:39
|
virato |
Alright, what is ingestor supposed to do? |
20:39
|
chronomex |
om nom nom |
20:39
|
SketchCow |
Ingestor is a more directed version of injector |
20:39
|
SketchCow |
It basically lets you take a bunch of like-structured filenames and make them into individual items. |
20:39
|
SketchCow |
So if you have something like: |
20:40
|
SketchCow |
schoolgirl_porn_monthly_1989_04.pdf |
20:40
|
SketchCow |
schoolgirl_porn_monthly_1989_03.pdf |
20:40
|
SketchCow |
schoolgirl_porn_monthly_1989_01.pdf |
20:40
|
SketchCow |
You can say "Call them Schoolgirl Porn Monthly, use the _ as a separator between columns, use the 4th column for the year, the 5th for the month." |
20:40
|
DFJustin |
aces |
20:40
|
SketchCow |
And you can then jam it right down the line, calling injector against every item. |
20:41
|
SketchCow |
This is how I add 400 issues at once. |
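Editor's sketch: the column-based naming scheme described above (a separator character, one column for the year, one for the month) can be illustrated in a few lines. This is not the ingestor script itself, which is a shell script; `parse_issue`, its parameters, the title format and the example filename are all hypothetical and chosen only to show the idea.

```python
import os


def parse_issue(filename, title, separator="_", year_column=4, month_column=5):
    """Split a like-structured filename into columns and pick out year and month.

    Columns are counted from 1, matching the "4th column" / "5th column"
    wording used in the conversation.
    """
    stem, _extension = os.path.splitext(os.path.basename(filename))
    columns = stem.split(separator)
    if max(year_column, month_column) > len(columns):
        raise ValueError(f"{filename!r} has only {len(columns)} columns")
    year = columns[year_column - 1]
    month = columns[month_column - 1]
    if not (year.isdigit() and len(year) == 4):
        raise ValueError(f"{filename!r}: {year!r} does not look like a year")
    if not (month.isdigit() and 1 <= int(month) <= 12):
        raise ValueError(f"{filename!r}: {month!r} does not look like a month")
    month = f"{int(month):02d}"
    return {
        "file": filename,
        "title": f"{title} ({year}-{month})",  # assumed title format
        "year": year,
        "month": month,
    }


if __name__ == "__main__":
    print(parse_issue("example_magazine_monthly_1989_04.pdf",
                      "Example Magazine Monthly"))
```

Rejecting filenames whose columns do not look like a year and a month is one way to catch a misnumbered column before anything is uploaded.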
20:41
|
SketchCow |
I've been using this for a year. |
20:41
|
SketchCow |
And I was going to pretty these up for further consumption, but fuck it. Someone shove it in github and everyone make it better. |
20:41
|
SketchCow |
because I'm busy |
20:41
|
virato |
The Power of Open Source |
20:42
|
omf_ |
I am going to make that better. Calling sed more than twice in a command means you need a bigger tool |
20:42
|
SketchCow |
That's an interesting maxim you just made up. |
20:42
|
omf_ |
does IA have a test area |
20:42
|
SketchCow |
You can declare something a part of collection test and it'll be mouth-loved within a short time |
20:44
|
omf_ |
cool |
20:44
|
SketchCow |
Before you make radical changes, I can explain almost everything I did in there. |
20:44
|
omf_ |
do you need any specific file sizes or just the script working |
20:45
|
SketchCow |
The script works now. |
20:45
|
omf_ |
I can understand what you did. You beat up some filenames into normalization and then upload them via curl |
20:45
|
SketchCow |
For this particular item, it uses the current filenames as hints to do the uploads into the s3 interface. |
20:46
|
SketchCow |
It also understands Jan=01, etc. |
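Editor's sketch: the "Jan=01" behaviour amounts to normalizing a month token to a two-digit number. The helper below is a guess at what that could look like; `month_number` is a hypothetical name, and accepting full English month names as well as three-letter abbreviations is an assumption, not something the conversation states.

```python
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Map both full names and three-letter abbreviations to "01".."12".
MONTH_LOOKUP = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTH_LOOKUP[name] = f"{number:02d}"
    MONTH_LOOKUP[name[:3]] = f"{number:02d}"


def month_number(token):
    """Return a two-digit month for 'Jan', 'january', '1' or '01'."""
    cleaned = token.strip().lower()
    if cleaned.isdigit() and 1 <= int(cleaned) <= 12:
        return f"{int(cleaned):02d}"
    if cleaned in MONTH_LOOKUP:
        return MONTH_LOOKUP[cleaned]
    raise ValueError(f"unrecognized month: {token!r}")


if __name__ == "__main__":
    for example in ("Jan", "december", "7"):
        print(example, "->", month_number(example))
```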
20:47
|
virato |
SketchCow, so you just need it cleaned up/less ugly? |
20:48
|
omf_ |
config file loading |
20:48
|
SketchCow |
I am worried that right now it has very poor error checking. |
20:48
|
omf_ |
you want logging |
20:48
|
SketchCow |
I am fine with the config file being on the front of injector |
20:48
|
SketchCow |
ingestor, I think, does the vast majority of what I want. |
20:48
|
SketchCow |
Note how TEST= is at the front |
20:48
|
SketchCow |
So it comes, default, in test mode, so you can see what it THINKS it'll try and do. |
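Editor's sketch: the "test mode by default" idea can be shown as a flag that makes the tool describe each upload instead of performing it. This is an illustration only; `run_uploads` and its `upload` callback are hypothetical names and are not taken from the ingestor script.

```python
def run_uploads(paths, upload, test=True, log=print):
    """Upload each path, or only report what would happen when test is True.

    upload is a caller-supplied function that performs one real upload;
    it is never called in test mode. Returns the paths actually uploaded.
    """
    uploaded = []
    for path in paths:
        if test:
            log(f"TEST: would upload {path}")
            continue
        upload(path)
        uploaded.append(path)
    return uploaded


if __name__ == "__main__":
    # Default is test mode, so nothing is uploaded here.
    run_uploads(["Creepy 001.cbr", "Creepy 002.cbr"], upload=lambda path: None)
```

Making the safe behaviour the default means a real upload needs an explicit `test=False`, so a forgotten flag costs nothing.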
20:50
|
SketchCow |
Example, I have creepy magazine: |
20:50
|
SketchCow |
root@teamarchive-1:/2/MAGAZINES/WARREN/Creepy (Warren)# ls |
20:50
|
SketchCow |
Creepy 001.cbr Creepy 031.cbr Creepy 061.cbr Creepy 091.cbr Creepy 121.cbz |
20:50
|
SketchCow |
Creepy 002.cbr Creepy 032 (1970).cbz Creepy 062.cbr Creepy 092.cbr Creepy 122.cbr |
20:50
|
omf_ |
do you run the filenames over anything before feeding it to ingestor |
20:50
|
SketchCow |
Creepy 003.cbr Creepy 033.cbr Creepy 063.cbr Creepy 093.cbr Creepy 123.cbr |
20:50
|
SketchCow |
Creepy 004.cbr Creepy 034.cbz Creepy 064.cbr Creepy 094.cbr Creepy 124.cbr |
20:50
|
SketchCow |
Creepy 005.cbz Creepy 035.cbr Creepy 065.cbz Creepy 095.cbr Creepy 125.cbr |
20:50
|
SketchCow |
See, they're all there with the proper number |
20:50
|
SketchCow |
I have scripts to rename them to help normalize, yes. |
20:50
|
SketchCow |
Ingestor shouldn't try and normalize. |
20:53
|
omf_ |
what kind of errors do you want checked |
20:53
|
SketchCow |
I think it would be nice to generate a completed log, yes. |
20:53
|
SketchCow |
It would be good for it to see if the file exists, if it completed the upload |
20:54
|
omf_ |
do you also want to make sure the file is not zero bytes |
20:58
|
SketchCow |
yes, that would be nice. |
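Editor's sketch: the checks discussed here (the file exists, it is not zero bytes, the upload completed) might look like the following. Both function names are hypothetical. Comparing the local byte count with a size reported by the remote side is one assumed way to judge completion, and a remote size may not be reliable while tasks are still outstanding for the item.

```python
import os


def preflight_problem(path):
    """Return a short reason why path should not be uploaded, or None if it looks fine."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return None


def upload_looks_complete(path, remote_size):
    """Compare the local byte count with the size reported by the remote side."""
    return os.path.getsize(path) == remote_size


if __name__ == "__main__":
    # Hypothetical path; prints "missing" unless such a file exists locally.
    print(preflight_problem("no-such-file.cbr"))
```

Returning a reason string rather than a bare boolean lets the same result feed the completed-run log mentioned above.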
20:59
|
omf_ |
both injector and ingestor |
20:59
|
omf_ |
injector looks like it could use some input checking on the user input |
21:05
|
SketchCow |
injector should be sleeping with ingestor. |
21:37
|
ersi |
that'd produce injector^2 |
23:17
|
balrog |
http://194.71.107.80/torrent/7380688/BYTE_magazine_full-res_scans_PDF_JC1.0_20120622 should be added to IA |
23:42
|
dashcloud |
so I've put up the steps & commands I used for 3.5'' floppy imaging (MS-DOS & Windows floppies only) on the discussion page for Rescuing Floppy Disks |
23:42
|
dashcloud |
comments would be appreciated |
23:48
|
balrog |
and I suggest using ddrescue in all cases because of its status display, and reading each disk twice |