#archiveteam 2011-09-22,Thu

Time	Nickname	Message
00:57 ^🔗	dashcloud	SketchCow: I'm sure someone else has mentioned this to you, but the big area assembly is still huge in is video encoding
01:04 ^🔗	SketchCo1	21:05 < dashcloud> SketchCow: I'm sure someone else has mentioned this to you, but the big area assembly is still huge in is video encoding
01:04 ^🔗	SketchCo1	What?
01:04 ^🔗	dashcloud	ffmpeg/libav and x264 utilize assembly heavily
01:06 ^🔗	dashcloud	if there's anyone on the cutting edge of assembly & processors, that would be those folks
01:07 ^🔗	SketchCow	oh.
01:40 ^🔗	underscor	For anyone who missed Jason's QA, here's an ad-free version ripped from ustream
01:40 ^🔗	underscor	http://tracker.archive.org/jscott_kickstarter_qa.flv
01:40 ^🔗	primus104	awesome, thanks
01:40 ^🔗	perfinion	archiving the archiver. good job :D
01:42 ^🔗	SketchCow	Yeah, that add bullshit is, in fact, bullshit
01:42 ^🔗	SketchCow	ad
01:58 ^🔗	Wyatts	Is there an adaptor or something for the smaller Betacam tapes?
01:59 ^🔗	SketchCow	The machine just takes them.
02:00 ^🔗	Wyatts	Oh, well that's spiffy
02:02 ^🔗	SketchCow	The big issue is I have a few freakjob Digital Betacams, and nothing to play them on.
02:02 ^🔗	SketchCow	Not even a big issue yet, I have tons of tapes to go.
02:05 ^🔗	Wyatts	Ahh, that's right! What proportion were Beta formats again?
03:07 ^🔗	underscor	alard: chronomex Coderjoe Spread some op goodness?
03:07 ^🔗	dnova	yeah :D
03:08 ^🔗	underscor	vmbrasseu: Hey, I know you! :D
03:08 ^🔗	vmbrasseu	O RLY?
03:08 ^🔗	underscor	:D
03:08 ^🔗	underscor	(Alex, from IA)
03:08 ^🔗	chronomex	underscor: you're alex?!?
03:08 ^🔗	vmbrasseu	Oh, hey!
03:08 ^🔗	underscor	chronomex: Yes?
03:08 ^🔗	vmbrasseu	hugs Alex
03:08 ^🔗	underscor	(Was that sarcasm, chronomex?)
03:08 ^🔗	chronomex	underscor: put some pressure on those guys to document scandata.xml, I'm tired of not being able to number pages properly
03:09 ^🔗	*	chronomex shrug
03:09 ^🔗	chronomex	dunno
03:09 ^🔗	underscor	chronomex: I'll go yell at people now
03:09 ^🔗	chronomex	thanks
03:09 ^🔗	vmbrasseu	chronomex: I am one of those "guys"
03:09 ^🔗	underscor	Yeah, vm's from the archive too
03:10 ^🔗	vmbrasseu	I'll add it to the queue but please don't hold your breath. There's rather a backlog of documentation (read: NONE).
03:10 ^🔗	underscor	I didn't want to say anything, in case she wanted to go 'incognito'
03:10 ^🔗	chronomex	mhm
03:10 ^🔗	vmbrasseu	Meh. I am who I am. One quick web search will out me as someone at IA. ;-)
03:10 ^🔗	chronomex	underscor: well, is there internal documentation that exists, or code to read it? I'd take -anything-
03:11 ^🔗	underscor	Possibly
03:11 ^🔗	chronomex	I
03:11 ^🔗	vmbrasseu	Not as such.
03:11 ^🔗	chronomex	I've got dozens of things with page numbers like G4AD
03:11 ^🔗	chronomex	(technical drawings)
03:12 ^🔗	underscor	chronomex: fyi
03:12 ^🔗	underscor	[11:11:39 PM] rajamaphone: we will automatically create scandata for you if you upload a pdf
03:12 ^🔗	underscor	But I suppose that doesn't help :P
03:14 ^🔗	chronomex	right, I don't have a way to number pdfs either
03:15 ^🔗	chronomex	also I'm scanning to uncompressed tiffs and uploading those; the software I have doesn't do lossless pdf
03:16 ^🔗	underscor	oic
03:16 ^🔗	chronomex	but regardless, I don't have pdf numbering capabilities
03:17 ^🔗	underscor	That blog post I linked may be of use, idk
03:17 ^🔗	chronomex	doesn't look like much in that direction
03:18 ^🔗	*	chronomex shrug
03:35 ^🔗	underscor	alard: How do you plan to get around SOP?
03:38 ^🔗	underscor	Oh, I see how you inject it
03:42 ^🔗	underscor	Man, this is really well done
03:48 ^🔗	chronomex	SOP?
03:58 ^🔗	underscor	Same origin policy
03:59 ^🔗	chronomex	oh
04:17 ^🔗	chronomex	oh dear.
04:17 ^🔗	chronomex	vmbrasseu: are you still here? I've discovered an unpleasant bug in the S3 infrastructure.
04:17 ^🔗	*	vmbrasseu gasps.
04:17 ^🔗	vmbrasseu	Lay it on me.
04:17 ^🔗	vmbrasseu	But no promises.
04:18 ^🔗	chronomex	I uploaded a file using a PUT to http://s3.us.archive.org/CD-1A210-01/CD-1A210-01/bellsystem_CD-1A210-01_images.zip
04:18 ^🔗	chronomex	note the extra slash, it's an error in my script
04:18 ^🔗	chronomex	that last / got turned into %2F
04:18 ^🔗	chronomex	which prevents derive from running to completion; I also cannot delete it with S3 interface (500 error) nor with the web interface
04:19 ^🔗	chronomex	actually not quite
04:19 ^🔗	chronomex	I actually uploaded it to http://s3.us.archive.org/bellsystem_CD-1A210-01/CD-1A210-01%2Fbellsystem_CD-1A210-01_images.zip
04:20 ^🔗	vmbrasseu	Are you trying to delete the file or the item?
04:20 ^🔗	chronomex	er, s,%2F,/,
04:20 ^🔗	chronomex	just the file
04:20 ^🔗	chronomex	I was able to delete it with that url
04:20 ^🔗	chronomex	I got the item id wrong when I was trying to fix it right now
04:20 ^🔗	chronomex	but the undeletable-from-web-interface thing sounds like a bug
04:20 ^🔗	vmbrasseu	Well, deleting in general is a bit of a delicate issue at IA.
04:21 ^🔗	chronomex	understood
04:21 ^🔗	chronomex	the % prevents derive from working properly too
04:21 ^🔗	vmbrasseu	But the encoding seems bug-like.
04:21 ^🔗	chronomex	if I'm not mistaken
04:21 ^🔗	vmbrasseu	Deriving is special voodoo. I'm still working on getting the full lowdown on that one so I can't answer whether the % will bork it here.
04:22 ^🔗	chronomex	aye
04:22 ^🔗	chronomex	tossing that in, I hope it'll get handled properly :)
04:22 ^🔗	chronomex	it seems to parse the url into /{item}/{filename}, then encodes filename to be unix-safe
04:22 ^🔗	vmbrasseu	As soon as I can get someone to define "handled properly" I assure you it'll enter the correct channels. ;-)
04:23 ^🔗	chronomex	hehe okay
04:23 ^🔗	*	chronomex goes to undo the havoc he's wreaked so far today
04:23 ^🔗	vmbrasseu	Yes, that seems like a correct assumption (encoding filename). I'd have to do some code spelunking to confirm.
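
A minimal sketch of the behaviour chronomex describes, assuming the upload path is split at the first slash into item and filename and the filename is then percent-encoded. The function name `split_upload_path` is hypothetical and this is not the Internet Archive's actual code, only an illustration of how an extra slash could end up as `%2F`.

```python
from urllib.parse import quote, urlsplit

def split_upload_path(url):
    """Split an upload URL into (item, filename), percent-encoding the filename.

    Assumption: the first path segment is the item and everything after it
    is treated as one filename, as guessed in the chat.
    """
    path = urlsplit(url).path.lstrip("/")
    item, _, filename = path.partition("/")
    # safe="" makes quote() encode "/" as well, so a stray slash becomes %2F
    return item, quote(filename, safe="")

item, filename = split_upload_path(
    "http://s3.us.archive.org/bellsystem_CD-1A210-01/"
    "CD-1A210-01/bellsystem_CD-1A210-01_images.zip"
)
print(item)
print(filename)
```

Under that assumption the second value printed contains `%2F` where the script's extra slash was, which matches the URL chronomex quotes.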
04:24 ^🔗	DFJustin	do one of you archive.org guys know how to tell the system that you've uploaded a two-page-per-image pdf so the online reader doesn't look retarded http://www.archive.org/stream/DieKoptischenZaubertexteDerSammlungPapyrusErzherzogRainerInWien/stegemann_koptischen_zaubertexte#page/n1/mode/2up
04:24 ^🔗	vmbrasseu	As far as I can tell SO FAR there is no way to declare such a thing.
04:25 ^🔗	vmbrasseu	However that would likely be rolled up in the aforementioned deriving voodoo.
04:25 ^🔗	vmbrasseu	Wait...
04:25 ^🔗	vmbrasseu	You're uploading papyri?
04:25 ^🔗	DFJustin	I guess
04:25 ^🔗	vmbrasseu	Ah, texts about papyri.
04:26 ^🔗	vmbrasseu	Still
04:26 ^🔗	vmbrasseu	This is relevant to my interests!
04:26 ^🔗	chronomex	!
04:26 ^🔗	chronomex	what are you interested in ?
04:27 ^🔗	vmbrasseu	I have a degree in Classical Philology (Latin but mostly Greek) and was headed to grad school for papyrology when The Big Job Offer came through from California.
04:27 ^🔗	DFJustin	heh I guess archiving attracts papyrology geeks
04:27 ^🔗	chronomex	neato
04:28 ^🔗	vmbrasseu	DFJustin: you just got my attention. I'll poke the appropriate personage(s) to see whether there's an answer to your question.
04:28 ^🔗	DFJustin	I'm a computer programmer but I have an amateur interest in philology
04:28 ^🔗	DFJustin	the pdf is from the oriental institute site, they have various stuff that I was going to try to feed in
04:28 ^🔗	vmbrasseu	Computer programming is so much easier than Ancient Greek.
04:28 ^🔗	SketchCow	http://www.archive.org/search.php?query=collection%3Aenter-magazine&sort=-publicdate
04:28 ^🔗	SketchCow	awwww yeah
04:29 ^🔗	BlueMax	lol
04:30 ^🔗	DFJustin	I can crop the pdf manually using briss but it would be nice not to alter it
04:31 ^🔗	vmbrasseu	DFJustin: I've sent your question on to likely suspects.
04:32 ^🔗	DFJustin	thx
04:32 ^🔗	vmbrasseu	Glad to oblige. Stay tuned (probably in a few days).
04:44 ^🔗	DFJustin	I need to get back to greek, it was going so well until the aorists :(
04:44 ^🔗	vmbrasseu	There's method to that madness.
04:45 ^🔗	vmbrasseu	Headed offline here, so we can discuss it off channel sometime.
04:45 ^🔗	SketchCow	I'm up to 4tb of Friendster uploaded.
04:48 ^🔗	Coderjoe	you madman
04:48 ^🔗	chronomex	SketchCow: this is an odd name. http://www.archive.org/details/FRIENDSTER-FRIENDSTER-014200000
04:50 ^🔗	SketchCow	Yes.
04:51 ^🔗	SketchCow	That was me dealing with a big
04:51 ^🔗	SketchCow	bug
04:52 ^🔗	SketchCow	In the code
04:53 ^🔗	SketchCow	And the thing is, until it finishes the deriving and the rest, I can't rename the item.
04:53 ^🔗	*	chronomex nods
04:53 ^🔗	SketchCow	And when you're deriving/dealing with that many gigs, it takes a while.
04:53 ^🔗	chronomex	but you can rename items, that's good. I've got a misnamed item too
04:53 ^🔗	chronomex	uploader bugs--
04:54 ^🔗	SketchCow	I can.
04:54 ^🔗	SketchCow	I am using a script that does the uploading, called FRIENDSMASH
04:54 ^🔗	SketchCow	And I didn't have error checking
04:55 ^🔗	SketchCow	Then stepped away and phrased the argument wrong
04:55 ^🔗	chronomex	FRIENDSMASH
04:55 ^🔗	chronomex	I like it
04:55 ^🔗	chronomex	mine are rather more buttoned down
04:55 ^🔗	chronomex	but then ... this is The Phone Company
04:56 ^🔗	SketchCow	Next is the Yahoo Video stuff.
04:57 ^🔗	SketchCow	In both these cases, I'd like to write scripts that will suck down the final items, analyze them, and upload info files.
04:57 ^🔗	SketchCow	You saw what I do with CD-ROM images, right.
04:58 ^🔗	Coderjoe	whee... only 15 hours left on this file
04:58 ^🔗	SketchCow	What file are you uploading
04:58 ^🔗	Coderjoe	friendster.002800001-002900000.tar.xz
04:58 ^🔗	SketchCow	Uh oh
04:59 ^🔗	SketchCow	I'm sorry, stop and reupload.
04:59 ^🔗	Coderjoe	uh...
04:59 ^🔗	Coderjoe	okay?
04:59 ^🔗	SketchCow	I was sure you were done.
05:00 ^🔗	SketchCow	Sorry.
05:00 ^🔗	Coderjoe	we'll see how it goes... I did use --partial, so it might have kept the dotfile it uploads to
05:01 ^🔗	Coderjoe	(and renamed it)
05:01 ^🔗	Coderjoe	still waiting for it to tell me anything
05:01 ^🔗	SketchCow	Sorry for this. Let's compare the files you have and lengths before you delete them, when you're done
05:02 ^🔗	SketchCow	I'm getting a lot of pressure to get this data into the system and make room for more stuff.
05:02 ^🔗	SketchCow	The Rsync.net guys want their machine back, etc.
05:02 ^🔗	Coderjoe	i suspect it is doing a checksum check on the 95% of the file up there
05:05 ^🔗	Coderjoe	stupid massively-asymmetric internet connections
05:13 ^🔗	db48x	oh, good
05:13 ^🔗	db48x	IO errors on my /dev/sda
05:18 ^🔗	Coderjoe	looks like --partial saved it
05:19 ^🔗	Coderjoe	it's currently listing a speed of 58MB/s, which is in no way going over my internet connection
05:20 ^🔗	chronomex	yeah --partial is awesome
05:21 ^🔗	Coderjoe	chronomex: well, in this case, a combination of --partial and the fact that rsync writes to a dotfile
05:22 ^🔗	chronomex	rsync only writes to a dotfile if you don't say --partial
05:22 ^🔗	Coderjoe	no, it still writes to a dotfile, but then moves the partially-completed dotfile to the final name
05:23 ^🔗	Coderjoe	(it uses the non-dotfile as the source for blocks that match the remote file)
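
The write-to-a-dotfile-then-rename pattern Coderjoe describes can be sketched roughly as follows. This is a simplified illustration, not rsync's implementation: the `receive` function and its `keep_partial` flag are hypothetical stand-ins for the receiving side and for `--partial`, and the block-matching against the existing file is left out.

```python
import os
import tempfile

def receive(chunks, dest, keep_partial=False):
    """Write chunks to a hidden temp file next to dest, then rename into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(
        prefix="." + os.path.basename(dest) + ".", dir=directory
    )
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
    except BaseException:
        if keep_partial:
            # Keep what arrived so far, under the final name
            os.replace(tmp, dest)
        else:
            # Default: throw the incomplete temp file away
            os.remove(tmp)
        raise
    # Transfer finished: move the completed temp file to its final name
    os.replace(tmp, dest)
```

With `keep_partial=True` an interrupted transfer leaves a usable partial file under the real name, which a later run can use as a starting point instead of beginning from zero.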
05:24 ^🔗	db48x	hmm
05:24 ^🔗	db48x	rebooting seems to have "fixed" it
06:08 ^🔗	SketchCow	Rebooting fixes everything
06:28 ^🔗	vmbrasseu	DFJustin: headed to bed but an answer came in to your question and wanted to get it to you ASAP:
06:28 ^🔗	vmbrasseu	"Yes, in fact. We added a meta.xml element specifically to deal with that. If they give their item a "bookreader-defaults" value of "mode/1up", BookReader will start up in 1-page mode instead of the usual 2-page mode. See, for instance, item CLARION_CALL_1961-1962_v33 and its Read Online link."
06:28 ^🔗	vmbrasseu	Give that a go.
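
A small sketch of what setting that value might look like when editing a local copy of an item's meta.xml. The element name `bookreader-defaults` and the value `mode/1up` come from the quoted answer; the `<metadata>` root, the sample title, and the helper name `set_bookreader_default` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def set_bookreader_default(meta_xml, mode="mode/1up"):
    """Return meta_xml with a bookreader-defaults element set to mode."""
    root = ET.fromstring(meta_xml)
    node = root.find("bookreader-defaults")
    if node is None:
        # Element not present yet, so append it under the root
        node = ET.SubElement(root, "bookreader-defaults")
    node.text = mode
    return ET.tostring(root, encoding="unicode")

# Assumed minimal meta.xml; a real one carries many more fields
updated = set_bookreader_default("<metadata><title>Example</title></metadata>")
print(updated)
```

How the edited metadata gets back onto the item is not covered in the chat, so that step is left out here.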
06:30 ^🔗	vmbrasseu	Bonne chance et bonne nuit.
07:40 ^🔗	Wyatt	Jason, after tonight, I appreciate your push for metadata curation more than ever.
07:40 ^🔗	perfinion	what happened tonight?
07:41 ^🔗	Wyatt	Oh, I was explaining some of the issues with crowdsourcing tags for music. And to drive my point home, I went to last.fm.
07:41 ^🔗	Wyatt	And even I wasn't fully prepared for that mess. :/
07:41 ^🔗	perfinion	yeeah
07:41 ^🔗	perfinion	crowd sourcing is a nice idea
07:42 ^🔗	perfinion	but it needs stricter implementations
07:42 ^🔗	Wyatt	But it requires a guiding hand
07:42 ^🔗	perfinion	yeah
07:42 ^🔗	perfinion	i suppose just giving some ppl mod rights would be enough
07:43 ^🔗	Wyatt	Well part of the issue is last.fm is really just inadequate for this task in its current form.
07:43 ^🔗	perfinion	i never really got the point of lastfm
07:43 ^🔗	Wyatt	Tags on last.fm are...third-class citizens?
07:43 ^🔗	perfinion	why would i want to advertise exactly what songs im listening to?
07:44 ^🔗	Wyatt	At its heart, it's something like a social network for music listeners.
07:44 ^🔗	perfinion	i guess i dont really use facebook much either, so im the wrong person to figure it out :P
07:44 ^🔗	Wyatt	And it makes recommendations and allows you to listen with people and such. I use it primarily to see data about what I listened to and when and how often and such.
07:45 ^🔗	perfinion	my music player on my laptop queries it for recommendations
07:46 ^🔗	perfinion	but i dont see why i'd want to scrobble my songs
07:46 ^🔗	perfinion	although i suppose enough ppl hae to do it otherwise it wont have data for recommendations
07:46 ^🔗	Wyatt	Pretty much. It's heuristic based on community similarity rather than actual music traits (Music Genome Project)
07:48 ^🔗	Wyatt	It's interesting to me as a case study, and there are valuable lessons to learn from it, but it could use a makeover.
07:48 ^🔗	perfinion	indeed
07:48 ^🔗	Wyatt	(Though hopefully not like Friendster)
07:48 ^🔗	perfinion	hahaha
07:49 ^🔗	Wyatt	Funny until it comes true. That'd be one to keep an eye on, come to think of it. :/
07:51 ^🔗	ersi	What's there to grab at last.fm by the way? Every users individual scrobbles?
07:51 ^🔗	ersi	usernames? artists / song names?
07:52 ^🔗	Wyatt	It also has user groups with forum functionality, wiki pages per-artist and _per-song_...and I think there's some other stuff.
07:53 ^🔗	Wyatt	It started as a radio station/forum hybrid bolted to a CS project as I recall. And I think it never really knew what to grow up into so it became a Web 2.0 chimera.
07:53 ^🔗	ersi	oh yeah
07:54 ^🔗	ersi	Yeah, definitely
07:55 ^🔗	Wyatt	Actually, now that I look at the history of last year, it might be one to watch. Owned by viacom and making moves that upset users? Sounds like an unfavourable recipe.
07:55 ^🔗	ersi	indeed
07:55 ^🔗	ersi	there's a few scripts made by libre.fm to migrate/gobble user scrobbles atleast
07:56 ^🔗	ersi	I think one needs to log in with it's user to gobble them though
07:56 ^🔗	Wyatt	libre.fm? Haha, okay, I guess I should have seen that coming.
07:57 ^🔗	Wyatt	Ah, no, "CBS Interactive"?
07:58 ^🔗	Wyatt	Oh, right, them.
07:58 ^🔗	ersi	CBS Interactive?
07:58 ^🔗	Wyatt	Not Viacom; CBS owns last.fm
07:58 ^🔗	ersi	ah
09:08 ^🔗	chronomex	huh, I had no idea
14:13 ^🔗	DFJustin	yeah last.fm drives me nuts because they have an automated metadata correction system and even pull known-correct data from musicbrainz and still utterly fail to meaningfully fix anything
14:14 ^🔗	DFJustin	and basically don't seem to give a shit despite blog posts trumpeting all this
14:15 ^🔗	Wyatt	Oh my, I wasn't aware of THAT aspect.
14:16 ^🔗	DFJustin	this is pretty slick though http://encukou.github.com/lastscrape-gui/
14:17 ^🔗	Wyatt	Ooh, nice
14:21 ^🔗	DFJustin	like, people have been robovoting on these since 2009 and half of them still don't pass the autocorrect threshold http://www.last.fm/group/The+Auto-Correct+Correction+Brigade/forum/119632/_/522788
14:25 ^🔗	Wyatt	That doesn't terribly surprise me.
14:26 ^🔗	Wyatt	Which goes back to my thesis that the push for curation is much appreciated.
17:02 ^🔗	chronomex	metadata curation is the exact opposite of sexy
17:03 ^🔗	Zebranky	That's one for /topic
17:20 ^🔗	DFJustin	the thing is as a web company you don't even have to do anything, just slap on an edit button and let asperger's do the work for you
17:22 ^🔗	Coderjoe	which could turn out bad, as some aspergers don't realize or care that they are actually incorrect.
17:34 ^🔗	DFJustin	it's still a huge improvement over routing everything through your staff who don't care
17:34 ^🔗	DFJustin	it's amazing to me how many sites don't understand this
17:34 ^🔗	DFJustin	like, even archive.org won't let visitors fix metadata, and surprise, their metadata sucks
17:43 ^🔗	Coderjoe	they made the assumption that the people adding items would care enough about them, i guess
18:48 ^🔗	SketchCow	the OpenLibrary interface allows metadata repair
18:49 ^🔗	SketchCow	But the issue is different. The issue isn't the uploaders won't do metadata, it's that there's a severe documentation problem that some people are working on, and which I'm trying to help with.
19:21 ^🔗	DFJustin	I mean stuff like this where people can only leave an ineffectual comment https://encrypted.google.com/search?q=%22wrong+book%22+site%3Aarchive.org%2Fdetails&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a
19:22 ^🔗	DFJustin	yes, if you e-mail collections-service they'll deal with it but that's a high barrier
19:28 ^🔗	alard	Quick statistics update: there are 449.287 free articles on JSTOR (that I know of).
20:51 ^🔗	Coderjoe	woohoo
20:51 ^🔗	Coderjoe	2 minutes left on this file
20:54 ^🔗	Coderjoe	and done
20:54 ^🔗	Coderjoe	SketchCow: done with friendster.002800001-002900000.tar.xz
21:04 ^🔗	SketchCow	Thanks.
21:04 ^🔗	SketchCow	Can you give me the bytesize?
23:49 ^🔗	Coderjoe	SketchCow: 102797504180
23:50 ^🔗	Coderjoe	SketchCow: I forgot to move other files out of the directory I was uploading from, so I accidentally started uploading friendster.000104001-000105000.tar.xz again
23:52 ^🔗	Coderjoe	there's a .csv file with filenames, sizes, and crc32s of all of the files I have
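
A manifest like the one Coderjoe mentions could be produced along these lines. The column names, the hex formatting of the CRC, and the function names are assumptions; the chat only says the .csv holds filenames, sizes, and crc32s.

```python
import csv
import os
import zlib

def crc32_of(path, block_size=1 << 20):
    """Compute the CRC32 of a file, reading it in blocks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF

def write_manifest(paths, out_csv):
    """Write one row per file: name, size in bytes, CRC32 as 8 hex digits."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "size", "crc32"])
        for p in paths:
            writer.writerow(
                [os.path.basename(p), os.path.getsize(p), f"{crc32_of(p):08x}"]
            )
```

Comparing the size and checksum columns against the uploaded copies is one way to do the check SketchCow asks for before anything gets deleted.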
23:59 ^🔗	Wyatt	Now that's curious...what might cause warc-wget to segfault after only 5800 files?