#archiveteam 2013-01-10,Thu


Time	Nickname	Message
02:06 ^🔗	arisboch	Hi, does anyone know about yourfanfiction.com? They went offline. Any word from Archive Team about backups? yourfanfiction.com said some time in advance that they were in danger.
02:10 ^🔗	chronomex	never heard of them
03:16 ^🔗	SketchCow	OK, that's enough of nemo's stuff
03:16 ^🔗	SketchCow	That's ONE way to spend six hours
06:07 ^🔗	twrist	So..
06:07 ^🔗	twrist	I cannot seem to run yahooblog-grab
06:08 ^🔗	twrist	Running pipeline.py just quits without any output
11:51 ^🔗	Pimpollo	www.jizzday.com
14:04 ^🔗	Nemo_bis	All accounts will be backed up and made available for download (actually, you can do this now, but the new backups will be offline on archive.org and available forever.)
14:04 ^🔗	Nemo_bis	http://status.net/2013/01/09/preview-of-changes-to-identi-ca
14:04 ^🔗	Nemo_bis	To be trusted?
14:05 ^🔗	GLaDOS	Could be a decoy to hold us off.
14:06 ^🔗	GLaDOS	Do it anyway, for the sake of doing it.
14:06 ^🔗	GLaDOS	...unless they're in here.
16:05 ^🔗	Coderjoe	if anything has been uploaded to IA, it can be checked on, even if it is dark
16:08 ^🔗	Coderjoe	though knowing part of the identifier, the uploader, or the collection helps find it
16:17 ^🔗	Nemo_bis	Coderjoe: what do you mean checked on?
16:17 ^🔗	Coderjoe	it can be looked up in the catalog
16:18 ^🔗	Coderjoe	though only admins or possibly local IA users would be able to access the files
16:18 ^🔗	Nemo_bis	with wildcard search you mean
16:18 ^🔗	Nemo_bis	I should use that I guess.
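A wildcard catalog lookup of the kind discussed here can be sketched against the public Internet Archive `advancedsearch.php` endpoint. This is a minimal sketch, assuming that endpoint and its `q`, `fl[]`, `rows` and `output` parameters; the `uploader:` and `collection:` field filters are included as assumptions, and whether dark items show up in these results is not established in the conversation. The identifier prefix in the usage example comes from the log; the collection name is hypothetical.

```python
from urllib.parse import urlencode

# Public Internet Archive search endpoint. Whether dark items appear in its
# results is an open question here; this only builds the query URL.
ADVANCEDSEARCH = "https://archive.org/advancedsearch.php"

def build_search_url(prefix, collection=None, uploader=None, rows=100):
    """Build a wildcard identifier search, optionally narrowed by collection/uploader."""
    clauses = [f"identifier:{prefix}*"]  # trailing * = wildcard on the identifier
    if collection:
        clauses.append(f"collection:{collection}")
    if uploader:
        clauses.append(f"uploader:{uploader}")  # assumed searchable field
    params = [
        ("q", " AND ".join(clauses)),
        ("fl[]", "identifier"),  # only return identifiers
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return ADVANCEDSEARCH + "?" + urlencode(params)

# Hypothetical usage: the collection name is illustrative only.
print(build_search_url("The.Screen.Savers", collection="opensource"))
```

Narrowing by collection or uploader matters because a bare prefix wildcard can match many unrelated items.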
16:18 ^🔗	Coderjoe	or the metamanager
16:19 ^🔗	Coderjoe	though I don't know whether access to it is limited
16:20 ^🔗	Coderjoe	i think the features to do any changes require rights, but just doing queries doesn't
16:21 ^🔗	Nemo_bis	changes require shell access; IIRC, queries require adminship
16:28 ^🔗	Coderjoe	i don't think shell access is needed for the changes I refer to in the metamanager
16:28 ^🔗	Coderjoe	(like moving items between collections and the like)
16:34 ^🔗	Nemo_bis	hm
16:35 ^🔗	godane1	I started uploading The Screen Savers, one per item: http://archive.org/details/The.Screen.Savers.2004.04.01
17:05 ^🔗	underscor	Viewing metamgr requires you to pass User::any_admin()
17:05 ^🔗	underscor	Which effectively means you need to be a collection owner
17:05 ^🔗	underscor	You get (more?) buttons if you're User::slash_admin()
17:06 ^🔗	underscor	which basically means you can frobnicate any item
17:06 ^🔗	underscor	To view a dark item's files, though, you have to have shell access
17:06 ^🔗	underscor	on the datanodes
17:15 ^🔗	godane1	underscor: can you find out why newer robots.txt files override older ones?
17:15 ^🔗	godane1	example of older robots.txt: http://web.archive.org/web/20040630192118/http://cetips.com/robots.txt
17:16 ^🔗	godane1	example of the newer ones: http://web.archive.org/web/20111004042827/http://cetips.com/robots.txt
17:17 ^🔗	Coderjoe	"oh crap. we didn't want that to be stored on IA." perhaps?
17:18 ^🔗	godane1	more likely the newer one is caused by that bad website squatter that blocks IA bots
17:24 ^🔗	underscor	It is a policy decision to avoid legal drama.
17:25 ^🔗	underscor	But, in theory, we could tie website captures to whatever state the robots.txt had at the time of the capture
17:25 ^🔗	underscor	instead of the current one
17:25 ^🔗	underscor	However, there are a lot of people who use the robots.txt block thinking that their stuff wouldn't show up again, so we'd need some other way to "opt out"
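Tying each capture to "whatever state the robots.txt had at the time of the capture" amounts to looking up the most recent robots.txt fetched at or before the capture time and evaluating only that file. A minimal sketch, assuming robots.txt snapshots are available as a time-sorted list of `(fetch_time, body)` pairs; the function names and the example bodies and dates are hypothetical, not IA's actual data or code. It uses Python's standard `urllib.robotparser`.

```python
import bisect
from datetime import datetime
from urllib import robotparser

def robots_in_effect(snapshots, capture_time):
    """Return the robots.txt body most recently fetched at or before capture_time.

    snapshots: list of (fetch_time, body) tuples sorted by fetch_time.
    Returns None when no robots.txt had been fetched yet.
    """
    fetch_times = [t for t, _ in snapshots]
    i = bisect.bisect_right(fetch_times, capture_time)
    return snapshots[i - 1][1] if i > 0 else None

def capture_visible(snapshots, capture_time, url, agent="ia_archiver"):
    """Decide visibility from the robots.txt in effect at capture time, not today's."""
    body = robots_in_effect(snapshots, capture_time)
    if body is None:
        return True  # assumption: no robots.txt on record means no restriction
    parser = robotparser.RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical history for a hypothetical site: open at first, later blocking ia_archiver.
snapshots = [
    (datetime(2004, 6, 1), "User-agent: *\nDisallow:\n"),
    (datetime(2011, 10, 1), "User-agent: ia_archiver\nDisallow: /\n"),
]
print(capture_visible(snapshots, datetime(2005, 1, 1), "http://example.com/"))  # True
print(capture_visible(snapshots, datetime(2012, 1, 1), "http://example.com/"))  # False
```

As underscor notes, this alone would not give site owners an opt-out for old captures, so it would need to sit alongside some other removal mechanism.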
17:58 ^🔗	balrog_	godane1: the new robots.txt doesn't seem to imply that ia_archiver should be blocked at all?
18:00 ^🔗	godane1	I was able to look at this site more than a year ago
18:00 ^🔗	godane1	but now i can't
18:00 ^🔗	godane1	this is in the newer robots.txt: User-Agent: ia_archiver
18:01 ^🔗	godane1	and a Disallow: line is under that
18:03 ^🔗	balrog_	godane1: yes but nothing is specified to be disallowed.
18:04 ^🔗	godane1	i know that
18:04 ^🔗	balrog_	isn't that a bug in ia_archiver? :/
18:04 ^🔗	godane1	maybe
18:04 ^🔗	balrog_	I brought this up a week or two ago
18:04 ^🔗	balrog_	it sure looks like a bug
18:05 ^🔗	godane1	maybe it just blocks anything if ia_archiver comes up
18:05 ^🔗	balrog_	yeah but what if I specifically want to ALLOW ia_archiver for a site? (as it seems to be the case here)
18:05 ^🔗	balrog_	http://www.robotstxt.org/robotstxt.html states that the syntax on that site is "To allow a single robot"
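Under the robots exclusion convention an empty `Disallow:` line disallows nothing, so a `User-agent: ia_archiver` record followed by an empty `Disallow:` allows that robot everything. The sketch below checks this with Python's standard `urllib.robotparser` on an illustrative file following the robotstxt.org "To allow a single robot" pattern, with ia_archiver as the allowed robot. The file is an assumption for demonstration, not cetips.com's actual robots.txt, and `allowed` is a hypothetical helper name.

```python
from urllib import robotparser

# Illustrative "allow a single robot" file: ia_archiver may crawl everything,
# every other robot is blocked.
ALLOW_ONLY_IA = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""

def allowed(robots_txt, agent, url):
    """Parse a robots.txt body and report whether agent may fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(allowed(ALLOW_ONLY_IA, "ia_archiver", "http://example.com/page.html"))   # True
print(allowed(ALLOW_ONLY_IA, "SomeOtherBot", "http://example.com/page.html"))  # False
```

A crawler that treats any record naming it as a block would get the first case wrong, which is what the "bug" suspicion here is about.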
18:05 ^🔗	balrog_	this is most definitely a bug ;(
18:06 ^🔗	balrog_	err no it isn't
18:06 ^🔗	balrog_	the reason it's blocked is this: http://web.archive.org/web/20120819150435/http://spi.domainsponsor.com/ds_robots.txt
18:06 ^🔗	balrog_	a rogue robots.txt
18:06 ^🔗	godane1	thats what i thought too
18:07 ^🔗	godane1	maybe the Archive should have a special blacklist of robots.txt files
18:07 ^🔗	godane1	if something like that comes up, just ignore it
18:07 ^🔗	balrog_	or apply only to current/future crawls
18:07 ^🔗	balrog_	and don't black out older ones
18:08 ^🔗	balrog_	that would make the most sense
18:08 ^🔗	godane1	that works too
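godane1's blacklist idea could be sketched as fingerprinting robots.txt bodies and ignoring any that exactly match a known domain-parking template. This is a minimal sketch, assuming exact-match SHA-256 fingerprints after normalizing whitespace and line endings; the template text, `fingerprint` and `honor_robots` are hypothetical and do not reproduce the real domainsponsor file or any IA mechanism.

```python
import hashlib

def fingerprint(body):
    """SHA-256 of a robots.txt body, ignoring line-ending and per-line whitespace differences."""
    lines = [line.strip() for line in body.strip().splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def honor_robots(body, parking_fingerprints):
    """Return False if this robots.txt matches a known parking template, so it is ignored."""
    return fingerprint(body) not in parking_fingerprints

# Hypothetical parking-service template (NOT the real domainsponsor file).
PARKING_TEMPLATE = (
    "# parked-domain robots.txt (hypothetical sample)\n"
    "User-agent: *\n"
    "Disallow: /\n"
)
PARKING_FINGERPRINTS = {fingerprint(PARKING_TEMPLATE)}

print(honor_robots(PARKING_TEMPLATE.replace("\n", "\r\n"), PARKING_FINGERPRINTS))  # False
print(honor_robots("User-agent: *\nDisallow: /\n", PARKING_FINGERPRINTS))        # True
```

Exact matching is deliberately conservative: a generic `Disallow: /` written by a real site owner is still honored, but a squatter that varies its file would slip past, which is why the "only apply to current/future crawls" option is attractive.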
18:08 ^🔗	balrog_	http://archive.org/post/423432/domainsponsorcom-erasing-prior-archived-copies-of-135000-domains - though jory2 derailed the thread :(
18:10 ^🔗	balrog_	(at the end)
18:10 ^🔗	balrog_	http://archive.org/post/433169/domainsponsorcom-monikercom-re-deleted-archive-after-domain-backorder etc
18:16 ^🔗	schbiridi	maybe IA could block only if the WHOIS did not change between the time of retrieval and the current robots.txt
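The WHOIS idea reduces to a comparison: apply today's robots.txt to an old capture only if the domain still appears to have the same owner. A minimal sketch, assuming a WHOIS record captured alongside each crawl is available (the conversation does not say IA keeps one) and that registrant plus creation date are enough to detect a drop and re-registration; the `WhoisRecord` fields and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class WhoisRecord:
    # Hypothetical fields; real WHOIS output varies widely between registries.
    registrant: str
    created: date

def apply_current_robots(capture_whois, current_whois):
    """Apply today's robots.txt to an old capture only if ownership looks unchanged.

    A domain that lapsed and was re-registered (e.g. by a squatter) normally gets
    a new creation date and/or a different registrant, so either change counts.
    """
    return (capture_whois.registrant == current_whois.registrant
            and capture_whois.created == current_whois.created)

then = WhoisRecord("Example Owner", date(1999, 3, 1))
print(apply_current_robots(then, WhoisRecord("Example Owner", date(1999, 3, 1))))  # True
print(apply_current_robots(then, WhoisRecord("Parking Co", date(2010, 7, 15))))    # False
```

Privacy-proxy registrations would make the registrant field unreliable, so in practice this would likely complement, not replace, special-casing the large squatters.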
18:19 ^🔗	DFJustin	it seems like the main problem is these large-scale squatters and that could be taken care of with a few special cases
18:20 ^🔗	balrog_	DFJustin: yes, this is the main problem: the large-scale squatters
18:40 ^🔗	SketchCow	The bitsavers ingestion has hit its stride!
18:50 ^🔗	Nemo_bis	Nice, the derivers were being lazy again.
18:54 ^🔗	Smiley	MOAR DATA.
19:27 ^🔗	SketchCow	http://archive.org/details/bitsavers now is starting to have individual companies
21:09 ^🔗	guigui	hello! how do I know/limit how much disk space the warrior uses?
21:38 ^🔗	SketchCow	I think it's under a gig, isn't it?
21:40 ^🔗	alard	The disk image can grow to up to 60 GB.
21:41 ^🔗	SketchCow	!
21:41 ^🔗	alard	There's a way to give it more space, or less: disconnect the "data" disk from the VM, create a new virtual disk image of the size you want and connect it.
21:41 ^🔗	alard	The warrior will format the new drive when it boots.
21:43 ^🔗	alard	(The problem with these virtual disk images is that they only grow, never shrink. So even though the warrior removes the downloaded files, the disk image will eventually grow to its full size.)
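alard's procedure (detach the "data" disk, create a new virtual disk of the size you want, attach it) can be sketched for VirtualBox as three `VBoxManage` invocations. This is a minimal sketch, assuming VirtualBox 5.0 or later (`createmedium disk`; older releases use `createhd`) and a powered-off VM. The VM name, storage controller name and port are hypothetical placeholders; check the real ones with `VBoxManage showvminfo <vm>`. `createmedium` makes a dynamically allocated disk by default, which is why the image only grows.

```python
def replace_data_disk_cmds(vm_name, disk_path, size_gb, controller, port):
    """Return VBoxManage argument lists to swap the warrior's data disk.

    vm_name, controller and port are placeholders: look them up with
    `VBoxManage showvminfo <vm>`. Run with the VM powered off.
    """
    size_mb = size_gb * 1024  # VBoxManage expects --size in megabytes
    attach = ["VBoxManage", "storageattach", vm_name, "--storagectl", controller,
              "--port", str(port), "--device", "0"]
    return [
        attach + ["--medium", "none"],  # detach the current data disk
        ["VBoxManage", "createmedium", "disk", "--filename", disk_path,
         "--size", str(size_mb), "--format", "VDI"],  # new, dynamically allocated disk
        attach + ["--type", "hdd", "--medium", disk_path],  # attach the new disk
    ]

# Hypothetical names; the warrior formats the new drive on its next boot.
for cmd in replace_data_disk_cmds("archiveteam-warrior", "warrior-data-20g.vdi", 20, "SATA", 1):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are confirmed. Keeping the commands as data makes it easy to review them before anything touches the VM.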
22:41 ^🔗	ersi	sigh.
