#archiveteam 2013-01-10,Thu

Time Nickname Message
02:06 arisboch Hi, does anyone know about yourfanfiction.com? They went offline. Any word from Archive Team about backups? yourfanfiction.com said some time in advance that they were in danger.
02:10 chronomex never heard of them
03:16 SketchCow OK, that's enough of nemo's stuff
03:16 SketchCow That's ONE way to spend six hours
06:07 twrist So..
06:07 twrist I cannot seem to run yahooblog-grab
06:08 twrist Running pipeline.py just quits without any output
11:51 Pimpollo www.jizzday.com
11:51 Pimpollo www.jizzday.com
14:04 Nemo_bis All accounts will be backed up and made available for download (actually, you can do this now, but the new backups will be offline on archive.org and available forever.)
14:04 Nemo_bis http://status.net/2013/01/09/preview-of-changes-to-identi-ca
14:04 Nemo_bis To be trusted?
14:05 GLaDOS Could be a decoy to hold us off.
14:06 GLaDOS Do it anyway, for the sake of doing it.
14:06 GLaDOS ...unless they're in here.
16:05 Coderjoe if anything has been uploaded to IA, it can be checked on, even if it is dark
16:08 Coderjoe though knowing part of the identifier, or the uploader or collection helps find it
16:17 Nemo_bis Coderjoe: what do you mean checked on?
16:17 Coderjoe it can be looked up in the catalog
16:18 Coderjoe though only admins or possibly local IA users would be able to access the files
16:18 Nemo_bis with wildcard search you mean
16:18 Nemo_bis I should use that I guess.
16:18 Coderjoe or the metamanager
16:19 Coderjoe which I don't know if there is limited access to
16:20 Coderjoe i think the features to do any changes require rights, but just doing queries doesn't
16:21 Nemo_bis changes require shell access; queries require adminship, IIRC
16:28 Coderjoe i don't think shell access is needed for the changes I refer to in the metamanager
16:28 Coderjoe (like moving items between collections and the like)
16:34 Nemo_bis hm
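
A rough sketch of the wildcard catalog lookup Nemo_bis mentions, using archive.org's public advancedsearch endpoint. Dark items do not appear in public search results, which matches Coderjoe's point that those can only be checked via the catalog or metamanager with the right access. The "bitsavers" prefix and the field list are only illustrative.

    # Wildcard identifier search against archive.org's public advancedsearch endpoint.
    # Dark items will NOT show up here; per the discussion above, those need admin access.
    import json
    import urllib.parse
    import urllib.request

    def wildcard_search(prefix, rows=50):
        params = urllib.parse.urlencode({
            "q": "identifier:{}*".format(prefix),  # wildcard on the item identifier
            "fl[]": "identifier",
            "rows": rows,
            "output": "json",
        })
        url = "https://archive.org/advancedsearch.php?" + params
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read().decode("utf-8"))
        return [doc["identifier"] for doc in data["response"]["docs"]]

    if __name__ == "__main__":
        for identifier in wildcard_search("bitsavers"):
            print(identifier)
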
16:35 godane1 i started my uploading of The Screen Savers; there is one episode per item: http://archive.org/details/The.Screen.Savers.2004.04.01
17:05 underscor Viewing metamgr requires you to pass User::any_admin()
17:05 underscor Which effectively means you need to be a collection owner
17:05 underscor You get (more?) buttons if you're User::slash_admin()
17:06 underscor which basically means you can frobnicate any item
17:06 underscor To view a dark item's files, though, you have to have shell access
17:06 underscor on the datanodes
17:15 godane1 underscor: can you find out why a new robots.txt overrides older ones?
17:15 godane1 example of older robots.txt: http://web.archive.org/web/20040630192118/http://cetips.com/robots.txt
17:16 godane1 example of the newer ones: http://web.archive.org/web/20111004042827/http://cetips.com/robots.txt
17:17 Coderjoe "oh crap. we didn't want that to be stored on IA." perhaps?
17:18 godane1 more like the newer one is because of that bad website sitter that blocks IA bots
17:24 underscor It is a policy decision to avoid legal drama.
17:25 underscor But, in theory, we could tie website captures to whatever state the robots.txt had at the time of the capture
17:25 underscor instead of the current one
17:25 underscor However, there are a lot of people who use the robots.txt block thinking that their stuff wouldn't show up again, so we'd need some other way to "opt out"
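
A hypothetical sketch of the idea underscor describes: evaluate each capture against the robots.txt that was in force at capture time rather than against the current one. The snapshot store, timestamps, and robots.txt contents below are invented for illustration; this is not how the Wayback Machine is actually implemented.

    import urllib.robotparser

    # Invented robots.txt snapshots keyed by Wayback-style timestamps (YYYYMMDDHHMMSS).
    robots_snapshots = {
        "20040101000000": "User-agent: *\nDisallow:\n",    # older snapshot: allows everything
        "20110101000000": "User-agent: *\nDisallow: /\n",  # newer snapshot: blocks everything
    }

    def allowed_at_capture_time(url, capture_ts, user_agent="ia_archiver"):
        """Check a capture against the newest robots.txt snapshot at or before its timestamp."""
        eligible = [ts for ts in robots_snapshots if ts <= capture_ts]
        if not eligible:
            return True  # no robots.txt known at capture time; treat as allowed
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_snapshots[max(eligible)].splitlines())
        return parser.can_fetch(user_agent, url)

    # A capture made in 2005 stays visible even after the 2011 robots.txt blocks crawling.
    print(allowed_at_capture_time("http://example.com/page", "20050601000000"))  # True
    print(allowed_at_capture_time("http://example.com/page", "20120601000000"))  # False
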
17:58 balrog_ godane1: the new robots.txt doesn't seem to imply that ia_archiver should be blocked at all?
18:00 godane1 i was able to look at this site like more than a year ago
18:00 godane1 but now i can't
18:00 godane1 this is in the newer robots.txt: User-Agent: ia_archiver
18:01 godane1 and a Disallow: is under that
18:03 balrog_ godane1: yes but nothing is specified to be disallowed.
18:04 godane1 i know that
18:04 balrog_ isn't that a bug in ia_archiver? :/
18:04 godane1 maybe
18:04 balrog_ I brought this up a week or two ago
18:04 balrog_ it sure looks like a bug
18:05 godane1 maybe it just blocks anything if ia_archiver comes up
18:05 balrog_ yeah but what if I specifically want to ALLOW ia_archiver for a site? (as it seems to be the case here)
18:05 balrog_ http://www.robotstxt.org/robotstxt.html states that the syntax on that site is "To allow a single robot"
18:05 balrog_ this is most definitely a bug ;(
18:06 balrog_ err no it isn't
18:06 balrog_ the reason it's blocked is this: http://web.archive.org/web/20120819150435/http://spi.domainsponsor.com/ds_robots.txt
18:06 balrog_ a rogue robots.txt
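
The syntax point balrog_ makes can be checked with Python's stdlib parser: a "User-agent: ia_archiver" record with an empty Disallow disallows nothing, so the site's own robots.txt was not the blocker. The parked-domain robots.txt is approximated below with a blanket "Disallow: /"; the exact contents of the real ds_robots.txt are not reproduced here.

    import urllib.robotparser

    def can_fetch(robots_txt, agent, url):
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return parser.can_fetch(agent, url)

    site_robots = "User-Agent: ia_archiver\nDisallow:\n"      # empty Disallow: nothing is disallowed
    parked_robots = "User-Agent: ia_archiver\nDisallow: /\n"  # blanket block, approximating the rogue file

    print(can_fetch(site_robots, "ia_archiver", "http://example.com/page"))    # True (allowed)
    print(can_fetch(parked_robots, "ia_archiver", "http://example.com/page"))  # False (blocked)
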
18:06 godane1 that's what i thought too
18:07 godane1 maybe archive should have a special blacklist of robots.txt files
18:07 godane1 if something like that comes up, then just ignore it
18:07 balrog_ or apply only to current/future crawls
18:07 balrog_ and don't black out older ones
18:08 balrog_ that would make the most sense
18:08 godane1 that works too
18:08 balrog_ http://archive.org/post/423432/domainsponsorcom-erasing-prior-archived-copies-of-135000-domains - though jory2 derailed the thread :(
18:10 balrog_ (at the end)
18:10 balrog_ http://archive.org/post/433169/domainsponsorcom-monikercom-re-deleted-archive-after-domain-backorder etc
18:16 schbiridi maybe ia could only block if the whois did not change between the time of retrieval and the current robots.txt
18:19 DFJustin it seems like the main problem is these large-scale squatters and that could be taken care of with a few special cases
18:20 balrog_ DFJustin: yes, this is the main problem - the large-scale squatters
18:40 SketchCow The bitsavers ingestion has hit its stride!
18:50 Nemo_bis Nice, the derivers were being lazy again.
18:54 Smiley MOAR DATA.
19:27 SketchCow http://archive.org/details/bitsavers now is starting to have individual companies
21:09 guigui hello! how do I know/limit how much disk space the warrior uses?
21:38 SketchCow I think it's under a gig, isn't it?
21:40 alard The disk image can grow up to 60 GB.
21:41 SketchCow !
21:41 alard There's a way to give it more space, or less: disconnect the "data" disk from the VM, create a new virtual disk image of the size you want and connect it.
21:41 alard The warrior will format the new drive when it boots.
21:43 alard (The problem with these virtual disk images is that they only grow, never shrink. So even though the warrior removes the downloaded files, the disk image will eventually grow to its full size.)
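
A hedged sketch of alard's procedure, driving the VirtualBox CLI from Python. The VM name ("archiveteam-warrior"), storage controller name, and port/device numbers below are assumptions (check the output of VBoxManage showvminfo for the real values), the 20480 MB size is just an example, and the VM must be powered off first.

    import subprocess

    VM = "archiveteam-warrior"    # assumed VM name
    CTL = "SATA Controller"       # assumed storage controller name
    PORT, DEVICE = "1", "0"       # assumed slot of the "data" disk

    def run(*args):
        print("$", " ".join(args))
        subprocess.check_call(args)

    # 1. Detach the old data disk (VM must be powered off).
    run("VBoxManage", "storageattach", VM, "--storagectl", CTL,
        "--port", PORT, "--device", DEVICE, "--medium", "none")

    # 2. Create a new, smaller virtual disk image (size is in MB).
    run("VBoxManage", "createhd", "--filename", "warrior-data.vdi", "--size", "20480")

    # 3. Attach the new image; the warrior formats it on next boot.
    run("VBoxManage", "storageattach", VM, "--storagectl", CTL,
        "--port", PORT, "--device", DEVICE, "--type", "hdd",
        "--medium", "warrior-data.vdi")
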
22:41 ersi sigh.
