Time |
Nickname |
Message |
00:57
🔗
|
dashcloud |
SketchCow: I'm sure someone else has mentioned this to you, but the big area where assembly is still huge is video encoding |
01:04
🔗
|
SketchCo1 |
21:05 < dashcloud> SketchCow: I'm sure someone else has mentioned this to you, but the big area where assembly is still huge is video encoding |
01:04
🔗
|
SketchCo1 |
What? |
01:04
🔗
|
dashcloud |
ffmpeg/libav and x264 utilize assembly heavily |
01:06
🔗
|
dashcloud |
if there's anyone on the cutting edge of assembly & processors, that would be those folks |
01:07
🔗
|
SketchCow |
oh. |
01:40
🔗
|
underscor |
For anyone who missed Jason's QA, here's an ad-free version ripped from ustream |
01:40
🔗
|
underscor |
http://tracker.archive.org/jscott_kickstarter_qa.flv |
01:40
🔗
|
primus104 |
awesome, thanks |
01:40
🔗
|
perfinion |
archiving the archiver. good job :D |
01:42
🔗
|
SketchCow |
Yeah, that add bullshit is, in fact, bullshit |
01:42
🔗
|
SketchCow |
ad |
01:58
🔗
|
Wyatts |
Is there an adaptor or something for the smaller Betacam tapes? |
01:59
🔗
|
SketchCow |
The machine just takes them. |
02:00
🔗
|
Wyatts |
Oh, well that's spiffy |
02:02
🔗
|
SketchCow |
The big issue is I have a few freakjob Digital Betacams, and nothing to play them on. |
02:02
🔗
|
SketchCow |
Not even a big issue yet, I have tons of tapes to go. |
02:05
🔗
|
Wyatts |
Ahh, that's right! What proportion were Beta formats again? |
03:07
🔗
|
underscor |
alard: chronomex Coderjoe Spread some op goodness? |
03:07
🔗
|
dnova |
yeah :D |
03:08
🔗
|
underscor |
vmbrasseu: Hey, I know you! :D |
03:08
🔗
|
vmbrasseu |
O RLY? |
03:08
🔗
|
underscor |
:D |
03:08
🔗
|
underscor |
(Alex, from IA) |
03:08
🔗
|
chronomex |
underscor: you're alex?!? |
03:08
🔗
|
vmbrasseu |
Oh, hey! |
03:08
🔗
|
underscor |
chronomex: Yes? |
03:08
🔗
|
vmbrasseu |
*hugs Alex* |
03:08
🔗
|
underscor |
(Was that sarcasm, chronomex?) |
03:08
🔗
|
chronomex |
underscor: put some pressure on those guys to document scandata.xml, I'm tired of not being able to number pages properly |
03:09
🔗
|
* |
chronomex shrug |
03:09
🔗
|
chronomex |
dunno |
03:09
🔗
|
underscor |
chronomex: I'll go yell at people now |
03:09
🔗
|
chronomex |
thanks |
03:09
🔗
|
vmbrasseu |
chronomex: I am one of those "guys" |
03:09
🔗
|
underscor |
Yeah, vm's from the archive too |
03:10
🔗
|
vmbrasseu |
I'll add it to the queue but please don't hold your breath. There's rather a backlog of documentation (read: NONE). |
03:10
🔗
|
underscor |
I didn't want to say anything, in case she wanted to go 'incognito' |
03:10
🔗
|
chronomex |
mhm |
03:10
🔗
|
vmbrasseu |
Meh. I am who I am. One quick web search will out me as someone at IA. ;-) |
03:10
🔗
|
chronomex |
underscor: well, is there internal documentation that exists, or code to read it? I'd take -anything- |
03:11
🔗
|
underscor |
Possibly |
03:11
🔗
|
chronomex |
I |
03:11
🔗
|
vmbrasseu |
Not as such. |
03:11
🔗
|
chronomex |
I've got dozens of things with page numbers like G4AD |
03:11
🔗
|
chronomex |
(technical drawings) |
03:12
🔗
|
underscor |
chronomex: fyi |
03:12
🔗
|
underscor |
[11:11:39 PM] rajamaphone: we will automatically create scandata for you if you upload a pdf |
03:12
🔗
|
underscor |
But I suppose that doesn't help :P |
03:14
🔗
|
chronomex |
right, I don't have a way to number pdfs either |
03:15
🔗
|
chronomex |
also I'm scanning to uncompressed tiffs and uploading those; the software I have doesn't do lossless pdf |
03:16
🔗
|
underscor |
oic |
03:16
🔗
|
chronomex |
but regardless, I don't have pdf numbering capabilities |
03:17
🔗
|
underscor |
That blog post I linked may be of use, idk |
03:17
🔗
|
chronomex |
doesn't look like much in that direction |
03:18
🔗
|
* |
chronomex shrug |
03:35
🔗
|
underscor |
alard: How do you plan to get around SOP? |
03:38
🔗
|
underscor |
Oh, I see how you inject it |
03:42
🔗
|
underscor |
Man, this is *really* well done |
03:48
🔗
|
chronomex |
SOP? |
03:58
🔗
|
underscor |
Same origin policy |
03:59
🔗
|
chronomex |
oh |
04:17
🔗
|
chronomex |
oh dear. |
04:17
🔗
|
chronomex |
vmbrasseu: are you still here? I've discovered an unpleasant bug in the S3 infrastructure. |
04:17
🔗
|
* |
vmbrasseu gasps. |
04:17
🔗
|
vmbrasseu |
Lay it on me. |
04:17
🔗
|
vmbrasseu |
But no promises. |
04:18
🔗
|
chronomex |
I uploaded a file using a PUT to http://s3.us.archive.org/CD-1A210-01/CD-1A210-01/bellsystem_CD-1A210-01_images.zip |
04:18
🔗
|
chronomex |
note the extra slash, it's an error in my script |
04:18
🔗
|
chronomex |
that last / got turned into %2F |
04:18
🔗
|
chronomex |
which prevents derive from running to completion; I also cannot delete it with S3 interface (500 error) nor with the web interface |
04:19
🔗
|
chronomex |
actually not quite |
04:19
🔗
|
chronomex |
I actually uploaded it to http://s3.us.archive.org/bellsystem_CD-1A210-01/CD-1A210-01%2Fbellsystem_CD-1A210-01_images.zip |
04:20
🔗
|
vmbrasseu |
Are you trying to delete the file or the item? |
04:20
🔗
|
chronomex |
er, s,%2F,/, |
04:20
🔗
|
chronomex |
just the file |
04:20
🔗
|
chronomex |
I was able to delete it with that url |
04:20
🔗
|
chronomex |
I got the item id wrong when I was trying to fix it right now |
04:20
🔗
|
chronomex |
but the undeletable-from-web-interface thing sounds like a bug |
04:20
🔗
|
vmbrasseu |
Well, deleting in general is a bit of a delicate issue at IA. |
04:21
🔗
|
chronomex |
understood |
04:21
🔗
|
chronomex |
the % prevents derive from working properly too |
04:21
🔗
|
vmbrasseu |
But the encoding seems bug-like. |
04:21
🔗
|
chronomex |
if I'm not mistaken |
04:21
🔗
|
vmbrasseu |
Deriving is special voodoo. I'm still working on getting the full lowdown on that one so I can't answer whether the % will bork it here. |
04:22
🔗
|
chronomex |
aye |
04:22
🔗
|
chronomex |
tossing that in, I hope it'll get handled properly :) |
04:22
🔗
|
chronomex |
it seems to parse the url into /{item}/{filename}, then encodes filename to be unix-safe |
04:22
🔗
|
vmbrasseu |
As soon as I can get someone to define "handled properly" I assure you it'll enter the correct channels. ;-) |
04:23
🔗
|
chronomex |
hehe okay |
04:23
🔗
|
* |
chronomex goes to undo the havoc he's wreaked so far today |
04:23
🔗
|
vmbrasseu |
Yes, that seems like a correct assumption (encoding filename). I'd have to do some code spelunking to confirm. |
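The behaviour chronomex describes (split the URL path into /{item}/{filename}, then encode the filename) can be sketched as follows. This is only an illustration of the guess made in the channel, not the archive's actual code: the function names `split_s3_path` and `encode_filename` are hypothetical, and percent-encoding via `urllib.parse.quote` is an assumption about what "unix-safe" means here.

```python
from urllib.parse import quote, urlsplit


def split_s3_path(url):
    """Split an S3-style URL path into (item, filename) at the first slash.

    Hypothetical helper: everything after the item name, including any
    further slashes, is treated as the filename.
    """
    path = urlsplit(url).path.lstrip("/")
    item, _, filename = path.partition("/")
    return item, filename


def encode_filename(filename):
    """Percent-encode the filename, including '/', so it is a single path component."""
    return quote(filename, safe="")


url = (
    "http://s3.us.archive.org/bellsystem_CD-1A210-01/"
    "CD-1A210-01/bellsystem_CD-1A210-01_images.zip"
)
item, filename = split_s3_path(url)
print(item)                       # bellsystem_CD-1A210-01
print(encode_filename(filename))  # CD-1A210-01%2Fbellsystem_CD-1A210-01_images.zip
```

Under this reading, the stray directory component in the upload script ends up inside the filename, and its slash becomes the %2F seen in the stored name.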
04:24
🔗
|
DFJustin |
do one of you archive.org guys know how to tell the system that you've uploaded a two-page-per-image pdf so the online reader doesn't look retarded http://www.archive.org/stream/DieKoptischenZaubertexteDerSammlungPapyrusErzherzogRainerInWien/stegemann_koptischen_zaubertexte#page/n1/mode/2up |
04:24
🔗
|
vmbrasseu |
As far as I can tell SO FAR there is no way to declare such a thing. |
04:25
🔗
|
vmbrasseu |
However that would likely be rolled up in the aforementioned deriving voodoo. |
04:25
🔗
|
vmbrasseu |
Wait... |
04:25
🔗
|
vmbrasseu |
You're uploading papyri? |
04:25
🔗
|
DFJustin |
I guess |
04:25
🔗
|
vmbrasseu |
Ah, texts about papyri. |
04:26
🔗
|
vmbrasseu |
Still |
04:26
🔗
|
vmbrasseu |
This is relevant to my interests! |
04:26
🔗
|
chronomex |
! |
04:26
🔗
|
chronomex |
what are you interested in ? |
04:27
🔗
|
vmbrasseu |
I have a degree in Classical Philology (Latin but mostly Greek) and was headed to grad school for papyrology when The Big Job Offer came through from California. |
04:27
🔗
|
DFJustin |
heh I guess archiving attracts papyrology geeks |
04:27
🔗
|
chronomex |
neato |
04:28
🔗
|
vmbrasseu |
DFJustin: you just got my attention. I'll poke the appropriate personage(s) to see whether there's an answer to your question. |
04:28
🔗
|
DFJustin |
I'm a computer programmer but I have an amateur interest in philology |
04:28
🔗
|
DFJustin |
the pdf is from the oriental institute site, they have various stuff that I was going to try to feed in |
04:28
🔗
|
vmbrasseu |
Computer programming is so much easier than Ancient Greek. |
04:28
🔗
|
SketchCow |
http://www.archive.org/search.php?query=collection%3Aenter-magazine&sort=-publicdate |
04:28
🔗
|
SketchCow |
awwww yeah |
04:29
🔗
|
BlueMax |
lol |
04:30
🔗
|
DFJustin |
I can crop the pdf manually using briss but it would be nice not to alter it |
04:31
🔗
|
vmbrasseu |
DFJustin: I've sent your question on to likely suspects. |
04:32
🔗
|
DFJustin |
thx |
04:32
🔗
|
vmbrasseu |
Glad to oblige. Stay tuned (probably in a few days). |
04:44
🔗
|
DFJustin |
I need to get back to greek, it was going so well until the aorists :( |
04:44
🔗
|
vmbrasseu |
There's method to that madness. |
04:45
🔗
|
vmbrasseu |
Headed offline here, so we can discuss it off channel sometime. |
04:45
🔗
|
SketchCow |
I'm up to 4tb of Friendster uploaded. |
04:48
🔗
|
Coderjoe |
you madman |
04:48
🔗
|
chronomex |
SketchCow: this is an odd name. http://www.archive.org/details/FRIENDSTER-FRIENDSTER-014200000 |
04:50
🔗
|
SketchCow |
Yes. |
04:51
🔗
|
SketchCow |
That was me dealing with a big |
04:51
🔗
|
SketchCow |
bug |
04:52
🔗
|
SketchCow |
In the code |
04:53
🔗
|
SketchCow |
And the thing is, until it finishes the deriving and the rest, I can't rename the item. |
04:53
🔗
|
* |
chronomex nods |
04:53
🔗
|
SketchCow |
And when you're deriving/dealing with that many gigs, it takes a while. |
04:53
🔗
|
chronomex |
but you can rename items, that's good. I've got a misnamed item too |
04:53
🔗
|
chronomex |
uploader bugs-- |
04:54
🔗
|
SketchCow |
I can. |
04:54
🔗
|
SketchCow |
I am using a script that does the uploading, called FRIENDSMASH |
04:54
🔗
|
SketchCow |
And I didn't have error checking |
04:55
🔗
|
SketchCow |
Then stepped away and phrased the argument wrong |
04:55
🔗
|
chronomex |
FRIENDSMASH |
04:55
🔗
|
chronomex |
I like it |
04:55
🔗
|
chronomex |
mine are rather more buttoned down |
04:55
🔗
|
chronomex |
but then ... this is The Phone Company |
04:56
🔗
|
SketchCow |
Next is the Yahoo Video stuff. |
04:57
🔗
|
SketchCow |
In both these cases, I'd like to write scripts that will suck down the final items, analyze them, and upload info files. |
04:57
🔗
|
SketchCow |
You saw what I do with CD-ROM images, right. |
04:58
🔗
|
Coderjoe |
whee... only 15 hours left on this file |
04:58
🔗
|
SketchCow |
What file are you uploading |
04:58
🔗
|
Coderjoe |
friendster.002800001-002900000.tar.xz |
04:58
🔗
|
SketchCow |
Uh oh |
04:59
🔗
|
SketchCow |
I'm sorry, stop and reupload. |
04:59
🔗
|
Coderjoe |
uh... |
04:59
🔗
|
Coderjoe |
okay? |
04:59
🔗
|
SketchCow |
I was sure you were done. |
05:00
🔗
|
SketchCow |
Sorry. |
05:00
🔗
|
Coderjoe |
we'll see how it goes... I did use --partial, so it might have kept the dotfile it uploads to |
05:01
🔗
|
Coderjoe |
(and renamed it) |
05:01
🔗
|
Coderjoe |
still waiting for it to tell me anything |
05:01
🔗
|
SketchCow |
Sorry for this. Let's compare the files you have and lengths before you delete them, when you're done |
05:02
🔗
|
SketchCow |
I'm getting a lot of pressure to get this data into the system and make room for more stuff. |
05:02
🔗
|
SketchCow |
The Rsync.net guys want their machine back, etc. |
05:02
🔗
|
Coderjoe |
i suspect it is doing a checksum check on the 95% of the file up there |
05:05
🔗
|
Coderjoe |
stupid massively-asymmetric internet connections |
05:13
🔗
|
db48x |
oh, good |
05:13
🔗
|
db48x |
IO errors on my /dev/sda |
05:18
🔗
|
Coderjoe |
looks like --partial saved it |
05:19
🔗
|
Coderjoe |
it's currently listing a speed of 58MB/s, which is in no way going over my internet connection |
05:20
🔗
|
chronomex |
yeah --partial is awesome |
05:21
🔗
|
Coderjoe |
chronomex: well, in this case, a combination of --partial and the fact that rsync writes to a dotfile |
05:22
🔗
|
chronomex |
rsync only writes to a dotfile if you don't say --partial |
05:22
🔗
|
Coderjoe |
no, it still writes to a dotfile, but then moves the partially-completed dotfile to the final name |
05:23
🔗
|
Coderjoe |
(it uses the non-dotfile as the source for blocks that match the remote file) |
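For reference, a resumable upload of the kind Coderjoe describes might be launched like this. It is a minimal sketch: `--partial` and `--progress` are standard rsync options, but the remote target `user@example.org:uploads/` and the helper name `rsync_resume_command` are placeholders, not the setup actually used in the channel. Without `--partial`, rsync normally discards the partially transferred file when a transfer is interrupted.

```python
import shlex


def rsync_resume_command(local_path, remote_target):
    """Build an rsync command line that keeps partial files for later resumption."""
    return [
        "rsync",
        "--partial",   # keep a partially transferred file if the transfer is interrupted
        "--progress",  # show per-file transfer progress
        local_path,
        remote_target,
    ]


cmd = rsync_resume_command(
    "friendster.002800001-002900000.tar.xz",
    "user@example.org:uploads/",  # hypothetical destination
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Re-running the same command after an interruption lets rsync reuse the blocks already on the remote side, which is consistent with the high "speed" reported while it checksums the existing data.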
05:24
🔗
|
db48x |
hmm |
05:24
🔗
|
db48x |
rebooting seems to have "fixed" it |
06:08
🔗
|
SketchCow |
Rebooting fixes everything |
06:28
🔗
|
vmbrasseu |
DFJustin: headed to bed but an answer came in to your question and wanted to get it to you ASAP: |
06:28
🔗
|
vmbrasseu |
"Yes, in fact. We added a meta.xml element specifically to deal with that. If they give their item a "bookreader-defaults" value of "mode/1up", BookReader will start up in 1-page mode instead of the usual 2-page mode. See, for instance, item CLARION_CALL_1961-1962_v33 and its Read Online link." |
06:28
🔗
|
vmbrasseu |
Give that a go. |
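Following the quoted answer, adding the element to a local copy of an item's meta.xml could look roughly like this. It is a sketch under assumptions: the element name `bookreader-defaults` and the value `mode/1up` come from the answer above, while the flat `<metadata>` root, the sample identifier, and the helper name `set_bookreader_default` are illustrative. How the edited file gets back onto the item is not covered here.

```python
import xml.etree.ElementTree as ET


def set_bookreader_default(meta_xml, value="mode/1up"):
    """Return meta.xml text with a bookreader-defaults element set to `value`."""
    root = ET.fromstring(meta_xml)
    elem = root.find("bookreader-defaults")
    if elem is None:
        # Element not present yet: append it under the root.
        elem = ET.SubElement(root, "bookreader-defaults")
    elem.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal meta.xml, for illustration only.
sample = "<metadata><identifier>example-item</identifier></metadata>"
updated = set_bookreader_default(sample)
print(updated)
```

Calling the function again on its own output leaves a single element in place rather than adding a duplicate.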
06:30
🔗
|
vmbrasseu |
Bonne chance et bonne nuit. |
07:40
🔗
|
Wyatt |
Jason, after tonight, I appreciate your push for metadata curation more than ever. |
07:40
🔗
|
perfinion |
what happened tonight? |
07:41
🔗
|
Wyatt |
Oh, I was explaining some of the issues with crowdsourcing tags for music. And to drive my point home, I went to last.fm. |
07:41
🔗
|
Wyatt |
And even I wasn't fully prepared for that mess. :/ |
07:41
🔗
|
perfinion |
yeeah |
07:41
🔗
|
perfinion |
crowd sourcing is a nice idea |
07:42
🔗
|
perfinion |
but it needs stricter implementations |
07:42
🔗
|
Wyatt |
But it requires a guiding hand |
07:42
🔗
|
perfinion |
yeah |
07:42
🔗
|
perfinion |
i suppose just giving some ppl mod rights would be enough |
07:43
🔗
|
Wyatt |
Well part of the issue is last.fm is really just inadequate for this task in its current form. |
07:43
🔗
|
perfinion |
i never really got the point of lastfm |
07:43
🔗
|
Wyatt |
Tags on last.fm are...third-class citizens? |
07:43
🔗
|
perfinion |
why would i want to advertise exactly what songs im listening to? |
07:44
🔗
|
Wyatt |
At its heart, it's something like a social network for music listeners. |
07:44
🔗
|
perfinion |
i guess i dont really use facebook much either, so im the wrong person to figure it out :P |
07:44
🔗
|
Wyatt |
And it makes recommendations and allows you to listen with people and such. I use it primarily to see data about what I listened to and when and how often and such. |
07:45
🔗
|
perfinion |
my music player on my laptop queries it for recommendations |
07:46
🔗
|
perfinion |
but i dont see why i'd want to scrobble my songs |
07:46
🔗
|
perfinion |
although i suppose enough ppl have to do it otherwise it wont have data for recommendations |
07:46
🔗
|
Wyatt |
Pretty much. It's heuristic based on community similarity rather than actual music traits (Music Genome Project) |
07:48
🔗
|
Wyatt |
It's interesting to me as a case study, and there are valuable lessons to learn from it, but it could use a makeover. |
07:48
🔗
|
perfinion |
indeed |
07:48
🔗
|
Wyatt |
(Though hopefully not like Friendster) |
07:48
🔗
|
perfinion |
hahaha |
07:49
🔗
|
Wyatt |
Funny until it comes true. That'd be one to keep an eye on, come to think of it. :/ |
07:51
🔗
|
ersi |
What's there to grab at last.fm by the way? Every user's individual scrobbles? |
07:51
🔗
|
ersi |
usernames? artists / song names? |
07:52
🔗
|
Wyatt |
It also has user groups with forum functionality, wiki pages per-artist and _per-song_...and I think there's some other stuff. |
07:53
🔗
|
Wyatt |
It started as a radio station/forum hybrid bolted to a CS project as I recall. And I think it never really knew what to grow up into so it became a Web 2.0 chimera. |
07:53
🔗
|
ersi |
oh yeah |
07:54
🔗
|
ersi |
Yeah, definitely |
07:55
🔗
|
Wyatt |
Actually, now that I look at the history of last year, it might be one to watch. Owned by viacom and making moves that upset users? Sounds like an unfavourable recipe. |
07:55
🔗
|
ersi |
indeed |
07:55
🔗
|
ersi |
there's a few scripts made by libre.fm to migrate/gobble user scrobbles at least |
07:56
🔗
|
ersi |
I think one needs to log in with its user to gobble them though |
07:56
🔗
|
Wyatt |
libre.fm? Haha, okay, I guess I should have seen that coming. |
07:57
🔗
|
Wyatt |
Ah, no, "CBS Interactive"? |
07:58
🔗
|
Wyatt |
Oh, right, them. |
07:58
🔗
|
ersi |
CBS Interactive? |
07:58
🔗
|
Wyatt |
Not Viacom; CBS owns last.fm |
07:58
🔗
|
ersi |
ah |
09:08
🔗
|
chronomex |
huh, I had no idea |
14:13
🔗
|
DFJustin |
yeah last.fm drives me nuts because they have an automated metadata correction system and even pull known-correct data from musicbrainz and still utterly fail to meaningfully fix anything |
14:14
🔗
|
DFJustin |
and basically don't seem to give a shit despite blog posts trumpeting all this |
14:15
🔗
|
Wyatt |
Oh my, I wasn't aware of THAT aspect. |
14:16
🔗
|
DFJustin |
this is pretty slick though http://encukou.github.com/lastscrape-gui/ |
14:17
🔗
|
Wyatt |
Ooh, nice |
14:21
🔗
|
DFJustin |
like, people have been robovoting on these since 2009 and half of them still don't pass the autocorrect threshold http://www.last.fm/group/The+Auto-Correct+Correction+Brigade/forum/119632/_/522788 |
14:25
🔗
|
Wyatt |
That doesn't terribly surprise me. |
14:26
🔗
|
Wyatt |
Which goes back to my thesis that the push for curation is much appreciated. |
17:02
🔗
|
chronomex |
metadata curation is the exact opposite of sexy |
17:03
🔗
|
Zebranky |
That's one for /topic |
17:20
🔗
|
DFJustin |
the thing is as a web company you don't even have to do anything, just slap on an edit button and let asperger's do the work for you |
17:22
🔗
|
Coderjoe |
which could turn out bad, as some aspergers don't realize or care that they are actually incorrect. |
17:34
🔗
|
DFJustin |
it's still a huge improvement over routing everything through your staff who don't care |
17:34
🔗
|
DFJustin |
it's amazing to me how many sites don't understand this |
17:34
🔗
|
DFJustin |
like, even archive.org won't let visitors fix metadata, and surprise, their metadata sucks |
17:43
🔗
|
Coderjoe |
they made the assumption that the people adding items would care enough about them, i guess |
18:48
🔗
|
SketchCow |
the OpenLibrary interface allows metadata repair |
18:49
🔗
|
SketchCow |
But the issue is different. The issue isn't the uploaders won't do metadata, it's that there's a severe documentation problem that some people are working on, and which I'm trying to help with. |
19:21
🔗
|
DFJustin |
I mean stuff like this where people can only leave an ineffectual comment https://encrypted.google.com/search?q=%22wrong+book%22+site%3Aarchive.org%2Fdetails&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a |
19:22
🔗
|
DFJustin |
yes, if you e-mail collections-service they'll deal with it but that's a high barrier |
19:28
🔗
|
alard |
Quick statistics update: there are 449.287 free articles on JSTOR (that I know of). |
20:51
🔗
|
Coderjoe |
woohoo |
20:51
🔗
|
Coderjoe |
2 minutes left on this file |
20:54
🔗
|
Coderjoe |
and done |
20:54
🔗
|
Coderjoe |
SketchCow: done with friendster.002800001-002900000.tar.xz |
21:04
🔗
|
SketchCow |
Thanks. |
21:04
🔗
|
SketchCow |
Can you give me the bytesize? |
23:49
🔗
|
Coderjoe |
SketchCow: 102797504180 |
23:50
🔗
|
Coderjoe |
SketchCow: I forgot to move other files out of the directory I was uploading from, so I accidentally started uploading friendster.000104001-000105000.tar.xz again |
23:52
🔗
|
Coderjoe |
there's a .csv file with filenames, sizes, and crc32s of all of the files I have |
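A manifest like the one Coderjoe mentions (filename, size, CRC32) can be produced with a few lines of Python. This is a sketch, not the script that was actually used: the column names, the helper names `file_crc32` and `write_manifest`, and the eight-digit lowercase hex format are assumptions.

```python
import csv
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file in chunks, returned as 8 hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def write_manifest(paths, manifest_path):
    """Write a CSV with one row per file: filename, size in bytes, crc32."""
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "size", "crc32"])
        for path in paths:
            writer.writerow(
                [os.path.basename(path), os.path.getsize(path), file_crc32(path)]
            )
```

Comparing the size and crc32 columns against the uploaded copies gives the check SketchCow asked for before anything is deleted locally.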
23:59
🔗
|
Wyatt |
Now that's curious...what might cause warc-wget to segfault after only 5800 files? |