#archiveteam 2012-12-03,Mon


Time	Nickname	Message
00:05 ^🔗	SketchCow	poor City of Heroes
00:24 ^🔗	balrog_	http://www.glitch.com/closing
00:25 ^🔗	balrog_	shutting down in 7 days...
04:55 ^🔗	tuankiet	Why do I see two lines, one black and one blue, in the items/hour box?
09:40 ^🔗	Nemo_bis	SketchCow: the most downloaded TGM is the one with empty ISO :p https://archive.org/details/cdrom-gamesmachinedvd-volume-14
09:43 ^🔗	chronomex	paging dr godane
09:50 ^🔗	SketchCow	That is a mess, huh.
10:01 ^🔗	godane	hey all
10:01 ^🔗	godane	going to bed
10:02 ^🔗	chronomex	nite
10:06 ^🔗	godane	i'm uploading some nintendo 64 promo tapes
10:06 ^🔗	godane	nite
12:21 ^🔗	SketchCow	Let me see about shoving more crap off of FOS into the world.
12:23 ^🔗	ersi	Data wants to be free!
12:32 ^🔗	SketchCow	53 CD-ROMs about to dead drop.
12:32 ^🔗	SketchCow	I have a program called bchunk that converts a .bin/.cue to an .iso
12:32 ^🔗	SketchCow	About to run it on all these poor things.
12:39 ^🔗	SketchCow	And after that, assuming it works, I'd like us to go through the cdrom collection and find all bin/cue and I can write something that grabs the bin/cue, makes an iso, and uploads the iso.
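
[Editor's sketch] The "grab the bin/cue, make an iso" step described above could look roughly like the following. It is a minimal sketch, not SketchCow's actual script: the function names, the dry-run default, and the choice to match pairs by identical file stem are assumptions for illustration. The only bchunk behaviour relied on is its documented invocation `bchunk [-w] image.bin image.cue basename`, which writes one output file per track. The upload step is left out.

```python
from pathlib import Path
import subprocess


def find_bin_cue_pairs(root):
    """Return (bin, cue) path pairs under root that share a file stem.

    Assumption: a .cue belongs to the .bin with the same name next to it.
    Matching is case-sensitive on case-sensitive filesystems.
    """
    pairs = []
    for cue in sorted(Path(root).rglob("*.cue")):
        bin_path = cue.with_suffix(".bin")
        if bin_path.is_file():
            pairs.append((bin_path, cue))
    return pairs


def bchunk_command(bin_path, cue_path, wav_audio=False):
    """Build the bchunk argument list for one image.

    bchunk appends a track number to the basename, so a multi-track
    image produces several output files.
    """
    cmd = ["bchunk"]
    if wav_audio:
        cmd.append("-w")  # write audio tracks as .wav instead of .cdr
    basename = str(Path(bin_path).with_suffix(""))
    cmd += [str(bin_path), str(cue_path), basename]
    return cmd


def convert_all(root, dry_run=True):
    """Convert every pair found; with dry_run only print the commands."""
    for bin_path, cue_path in find_bin_cue_pairs(root):
        cmd = bchunk_command(bin_path, cue_path)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running `convert_all("some/dir")` with the default `dry_run=True` only prints the commands, which is a cheap way to check the pairing before touching a whole collection. The original .bin/.cue files are never modified.
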
12:55 ^🔗	norbert79	SketchCow: Are you sure converting them to ISO instead of MDF is a good choice? ISO is for single-track images only, what if they have more tracks?
12:57 ^🔗	SketchCow	Then it makes more.
12:58 ^🔗	SketchCow	Regardless, the reason to convert them to .iso (AS WELL, I include the original .bin/cue) is that the archive.org viewer works with iso.
12:58 ^🔗	SketchCow	I have both the original AND the convert.
13:01 ^🔗	SketchCow	http://archive.org/details/gambler_cdrom_01 (First one!)
13:02 ^🔗	SketchCow	http://ia601500.us.archive.org/isoview.php?iso=/20/items/gambler_cdrom_01/GAMBLER_01_1996_0901.iso therefore works.
13:02 ^🔗	SketchCow	Now, please allow me to do this 53 more times.
13:09 ^🔗	SketchCow	http://archive.org/details/cdrom_gambler
13:09 ^🔗	SketchCow	Aaaand, they're appearing!
13:09 ^🔗	SketchCow	Not bad, only takes a few minutes.
13:26 ^🔗	void__	SketchCow: is the code of that isoview.php free? is it public somewhere?
13:29 ^🔗	SketchCow	I assume so, no idea where though.
13:29 ^🔗	SketchCow	It's just using tar.
13:31 ^🔗	void__	ok
13:48 ^🔗	SketchCow	Hey, want to hear a pet peeve?
13:49 ^🔗	SketchCow	When someone criticizes my shit, and puts a :) at the end.
13:49 ^🔗	SketchCow	Want. to. dragon. punch.
14:07 ^🔗	underscor	void__: It's basically just formatting the output of 7z l
14:07 ^🔗	underscor	The code isn't very exciting, and is nearly all just archive-specific header spewing, etc
14:09 ^🔗	void__	underscor: 7z called with a system() or equivalent?
14:09 ^🔗	underscor	Yes
14:09 ^🔗	void__	ok, thank you
14:09 ^🔗	underscor	We wrap system calls in a bunch of layers of abstraction, but that's what ends up happening at the very end
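
[Editor's sketch] The idea underscor describes, calling 7z to list an archive and formatting the result, can be illustrated as below. This is not the isoview.php code, which does not appear in this log. It is a minimal Python sketch that assumes the `7z` binary is on the PATH and that `7z l -slt` prints "Key = Value" blocks separated by blank lines, with file entries following a line of dashes. The function names are hypothetical.

```python
import subprocess


def parse_slt(text):
    """Parse `7z l -slt` style output into a list of dicts, one per entry.

    Assumed layout: archive-level properties first, then a line of
    dashes, then one "Key = Value" block per entry, blank-line separated.
    """
    entries = []
    current = {}
    in_entries = False
    for line in text.splitlines():
        if not in_entries:
            # Skip the archive header; entries start after the dashes.
            if line.startswith("----------"):
                in_entries = True
            continue
        if not line.strip():
            # A blank line closes the current entry block.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            current[key] = value
    if current:
        entries.append(current)
    return entries


def list_archive(path):
    """Run 7z on one file and return its parsed entries."""
    # An argument list (no shell) avoids quoting problems in file names.
    result = subprocess.run(
        ["7z", "l", "-slt", path],
        capture_output=True, text=True, check=True,
    )
    return parse_slt(result.stdout)
```

Passing the command as a list rather than a shell string is a deliberate difference from a bare system() call: file names with spaces or shell metacharacters are handed to 7z untouched.
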
15:13 ^🔗	DFJustin	SketchCow: is https://archive.org/details/gambler_cdrom_32b supposed to have 10 cds on it
15:18 ^🔗	SketchCow	They're tracks.
15:18 ^🔗	SketchCow	This is what BCHUNK does.
15:21 ^🔗	DFJustin	oic
15:26 ^🔗	DFJustin	underscor: any chance of using lsar/unar instead of 7z? there are currently a lot of archives that confuse it for one reason or another
15:34 ^🔗	DFJustin	and they understand .bin without needing all this tomfoolery
15:48 ^🔗	SketchCow	I asked about this. There was a Reason.
15:48 ^🔗	SketchCow	It was mostly related to not shaking up the poor cluster of machines to do this new thing.
15:48 ^🔗	SketchCow	Until, I guess, a more powerful set happens AND has an opportunity to update.
16:02 ^🔗	underscor	Yeah, basically what SketchCow said. Adding things to the workers won't happen until we roll out a new set
17:02 ^🔗	schbiridi	is there some linux tool/script to nicely archive websites via their rss feeds?
18:11 ^🔗	obscure__	Hey sketch, you around?
18:46 ^🔗	DFJustin	BCHUNK apparently supports .wav instead of .cdr output, might be more ia-friendly
18:58 ^🔗	schbiridi	it does, i use that all the time
19:27 ^🔗	schbiridi	just saw jason's tweet, http://mdf2iso.berlios.de is nice too
19:27 ^🔗	schbiridi	i also have mdfextract installed but do not recall using it
21:29 ^🔗	riordan	Know I'm probably late to the party on this, but is there any desire to archive The Daily?
21:31 ^🔗	chronomex	the daily what?
21:32 ^🔗	riordan	Newscorp's "Revolutionary" ipad-only newspaper
21:32 ^🔗	riordan	Articles were posted to a web CMS but could only be accessed when sent from a subscriber
21:32 ^🔗	chronomex	sure, why not
21:33 ^🔗	riordan	Andy Baio started indexing it back in 2011 when it launched
21:33 ^🔗	chronomex	The Daily is also the name of my alma mater's newspaper, which happens to be the oldest news periodical in the state
21:34 ^🔗	riordan	hah
21:36 ^🔗	riordan	http://waxy.org/2011/02/the_daily_indexed/
21:37 ^🔗	riordan	Basically Steve Jobs made Murdoch fall in love with the iPad, convinced him it'd save the business model of newspapers. It didn't.
21:38 ^🔗	riordan	But Murdoch hired some of the smartest writers and locked their content up so almost nobody was able to read it
21:38 ^🔗	balrog_	yup
21:39 ^🔗	riordan	It'd be a shitty process; probably involve people with iPads subscribing or using the week free trial, grabbing as much of the backlog as possible, and sharing as many article links per issue as they could to an email address (or set of email accounts) that we could use to get the links to the stories and scrape them
21:40 ^🔗	riordan	http://www.theatlantic.com/technology/archive/2012/12/3-theses-about-the-dailys-demise/265842/
21:40 ^🔗	ersi	Grab anything you can
21:40 ^🔗	ersi	If you can
21:40 ^🔗	balrog_	"It's also worth noting that Google's slowly indexing all the articles too, and search engines aren't blocked in their robots.txt file."
21:41 ^🔗	riordan	balrog_: not anymore
21:41 ^🔗	riordan	Their robots.txt now blocks robots
21:41 ^🔗	balrog_	riordan: they're still showing up in google
21:41 ^🔗	riordan	rly?
21:42 ^🔗	balrog_	also, http://www.thedaily.com/robots.txt
21:42 ^🔗	balrog_	maybe they once had excluded ia_archiver?
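
[Editor's sketch] The question of which crawlers a robots.txt file blocks can be checked mechanically with Python's standard `urllib.robotparser`. The rules below are an invented sample, not the contents of thedaily.com's robots.txt, which is not reproduced in this log. The example.com URL is a placeholder too.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented sample: block the Internet Archive crawler, allow everyone else.
SAMPLE_RULES = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, allowed(SAMPLE_RULES, agent, "http://example.com/page/2012"))
```

With rules like these a page can stay visible in a search engine's index while being excluded from the Wayback Machine, which is one possible reading of what balrog_ suggests.
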
21:42 ^🔗	riordan	oh damn
21:42 ^🔗	riordan	looks like they already pulled the app out
21:42 ^🔗	balrog_	that or they did "If you cannot put a robots.txt file up, read our exclusion policy. If you think it applies to you, send a request to us at info@archive.org."
21:42 ^🔗	riordan	so it's all moot
21:42 ^🔗	balrog_	they're still google indexed at least
21:43 ^🔗	balrog_	site:thedaily.com inurl:page/2012 and site:thedaily.com inurl:page/2011
21:44 ^🔗	riordan	For some reason I'm not getting anything from the google index for site:thedaily.com
21:45 ^🔗	balrog_	I get 42700 results
21:45 ^🔗	riordan	hmmm
21:45 ^🔗	balrog_	with just the query "site:thedaily.com"
21:45 ^🔗	balrog_	(no quotes)
21:46 ^🔗	riordan	hmmm - well, something's up with what google's serving me then
21:46 ^🔗	balrog_	try in another browser
21:46 ^🔗	riordan	got it
21:46 ^🔗	balrog_	ok
21:49 ^🔗	riordan	and they've got their recent material in a sitemap
21:49 ^🔗	riordan	http://www.thedaily.com/sitemap-news.xml.gz
21:49 ^🔗	balrog_	only the past month or so
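
[Editor's sketch] A gzipped sitemap like the one linked above can be turned into a plain URL list for a crawler with a few lines of standard-library Python. This is a minimal sketch assuming the file follows the sitemaps.org 0.9 schema (a news sitemap adds extra elements, but `<loc>` stays in that namespace). The function name is hypothetical, and fetching the file is left to the caller.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by <urlset>, <url> and <loc> in the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(data):
    """Return every <loc> URL from sitemap bytes, gzipped or plain."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]
```

The resulting list can be written one URL per line and handed to any downloader. As balrog_ notes, a sitemap covering only the past month would leave the older issues to be found some other way.
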
21:51 ^🔗	riordan	Unlike most newspapers, which get sucked up into lexisnexis, this thing's probably going nowhere
21:52 ^🔗	balrog_	yeah I'm afraid so
21:53 ^🔗	riordan	I've got a friend who knows their editor in chief - I'll see if I can pass a message along to ask if there's a plan of succession for content?
21:53 ^🔗	balrog_	it would be nice if the old content could at least be archived
21:53 ^🔗	riordan	precisely
21:54 ^🔗	chronomex	I like how the website is just images of text
21:54 ^🔗	chronomex	that's A+ CMS right there
21:55 ^🔗	ersi	AAA+
21:56 ^🔗	riordan	a+++ business model. Would fail again
