[09:20] is there any way to upload something to archive.org without creative commons?
[09:21] sure
[09:21] i see, it's optional
[09:27] there's no generic 'data' category?
[09:27] has to be audio, movie, or text?
[09:29] you're using the form, aren't you?
[09:29] yes
[09:29] is there an api?
[09:29] sorry, i've never really investigated this before :)
[09:29] there are other categories, just not available through the web form
[09:30] ah excellent
[09:30] http://archive.org/help/abouts3.txt
[09:30] oh god, perfect
[09:30] thank you
[09:32] i'm building a 'blackbox' system for everything i ever create
[09:32] and the goal is for it to be as permanent as possible
[09:33] however, unless you are an admin, you can only upload to one of a few collections
[09:33] Coderjoe: wonder if i can get a collection added for myself
[09:33] (which the web form picked via the category you chose)
[09:34] that'd be ideal
[09:49] ideally i'll have a warc for everything too
[09:49] but we'll see
[10:10] Coderjoe: you can be added to the approve list for a collection, of course
[10:27] mediatype can be set to anything by anyone
[10:27] i'm starting to hate the speed of ftp
[10:27] godane: only now?
[10:28] it normally works fine
[10:28] No. It doesn't.
[10:28] for me it does
[10:28] but every so often the speed becomes very slow
[10:28] Maybe you're the only user left. https://archive.org/~tracey/mrtg/ftp.html
[10:29] Every time a single other person tries to use it, you're both ruined. ;)
[10:29] I'm using it
[10:30] i'm not that good with scripting uploads to s3
[10:30] I kept getting errors that the drive was full earlier
[10:30] is there anyone here i should bother for a 'kennethreitz' collection, or should i go through the normal process?
[10:30] /cc @chronomex
[10:30] hi
[10:31] I think underscor or SketchCow are the people to ask
[10:31] /cc underscor :)
[11:40] i think s3 is very slow too
[11:41] not just ftp
[11:41] What does this collection have?
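[Editor's note: a minimal sketch of scripting an upload against archive.org's S3-like API, per the abouts3.txt link above. The item name, filename, and keys are placeholders, and the header set is an assumption based on that document (LOW auth, `x-archive-meta-*` metadata, auto-make-bucket); this builds the PUT request without sending it.]

```python
import urllib.request

def build_ia_upload_request(item, filename, access_key, secret_key,
                            mediatype="data", collection="opensource"):
    """Build (but do not send) a PUT request for archive.org's S3-like API.

    Setting x-archive-meta-mediatype lets you use categories such as
    'data' that the web upload form does not offer.
    """
    url = f"http://s3.us.archive.org/{item}/{filename}"
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        "x-amz-auto-make-bucket": "1",          # create the item if it doesn't exist
        "x-archive-meta-mediatype": mediatype,
        "x-archive-meta-collection": collection,
    }
    return urllib.request.Request(url, headers=headers, method="PUT")

# Placeholder item and credentials for illustration only.
req = build_ia_upload_request("my-example-item", "backup.warc.gz",
                              "ACCESSKEY", "SECRETKEY")
# To actually upload:
#   urllib.request.urlopen(req, data=open("backup.warc.gz", "rb").read())
```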
[11:42] WARCs of everything he's done.
[12:14] what the
[12:14] the ia donate page no longer has the 3-to-1 match blurb
[12:15] That's unfortunate, because maybe there's a few holding out to the absolute last day for some reason
[12:15] the amounts reflect it, and the blog post about it says it goes to the 31st
[12:16] but the progress meter is gone
[12:17] maybe the goal was reached?
[12:18] It was lacking 17k yesterday
[12:18] ah, doubtful then
[12:55] * SmileyG looks in
[13:13] SketchCow: i'm working on a continual archive of everything i create, including articles, tweets, photos, music, etc
[13:13] SketchCow: the plan is to have it back itself up to archive.org in case I have an untimely demise :)
[13:18] it's coming along quite nicely so far
[13:18] http://blackbox.kennethreitz.org/records/1e7f3c62-96e4-4be4-a26a-f62c61ce939d
[13:18] http://blackbox.kennethreitz.org/records/1e7f3c62-96e4-4be4-a26a-f62c61ce939d/download
[14:03] * Nemo_bis has 1200 tasks waiting for admin. :/
[15:45] i think web archive should open up old 90s versions of sites, it sucks that some domains now seem to be totally gone due to a NEW robots.txt put on the active site
[15:45] bla bla bla whine old bla bla
[15:45] It's been iterated over a billion times already.
[15:47] ah sorry, didn't think about that
[15:48] But I agree that it's unfortunate that a new owner of a domain can make the previous owner's data hidden in the Wayback Machine.
[15:49] There's a lot of data public for all I know, look in the crawldata collection @ IA. It's not everything though, I think. And besides, the data will continue to exist - it's just hidden/darkened (until it's public again, if IA undarks or robots.txt goes away)
[15:51] yeah, there's still a chance to see some of it some time later i guess
[15:51] it hasn't been a huge thing or anything, only a few sites
[15:51] Yeah, but it comes up so often it makes me almost angry every time it comes up
[15:52] i have had a similar reaction :P
[15:52] ^_^
[15:53] it's hard to solve though i would think, sometimes a legitimate owner wants to block the whole history and i reckon he should be able to
[15:53] i think other times they don't even know about IA maybe
[15:54] some have forbidden everything by default and it seems senseless
[15:54] I know that the Wayback Machine does an HTTP GET on the robots.txt when it's going to serve something from a crawled domain - every time
[15:54] ah
[15:55] Maybe I'm wrong, but I have a faint memory of that from fiddling with the code and trying to set the Wayback Machine up (http://github.com/internetarchive/wayback/)
[15:57] guess it can also be tested, i have a couple of old domains indexed, i could set them up again and do before/after robots.txt
[15:57] but it does feel that way
[15:57] it was restrictive just earlier: a site i was totally excited to see is blocked
[15:57] some very old site
[15:57] brb
[15:57] ehe
[15:59] Yeah, sucks when you run into the problem
[16:43] That's an interesting tactic, kennethre
[16:43] SketchCow: thanks, i like it more the longer i think about it
[17:47] push: archive should have old copies of robots.txt ?
[19:25] anyone here familiar with archiving yahoo groups?
[19:25] I found this tool: http://grabyahoogroup.sourceforge.net
[20:12] it's giving me error 500s though
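[Editor's note: a small sketch of the behavior discussed above, where a blanket disallow in the *current* robots.txt hides a site's old captures. The domain, URLs, and robots.txt content are made up for illustration; this uses Python's standard `urllib.robotparser` to evaluate the rule locally, with no network access.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a new domain owner might publish. A blanket
# Disallow aimed at the crawler's user agent darkens every old capture.
robots_txt = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The archive's crawler agent is blocked from every path on the site...
blocked = not parser.can_fetch("ia_archiver", "http://example.com/old-page.html")
print(blocked)  # True

# ...while agents not named in the file fall through to the default (allowed).
allowed = parser.can_fetch("SomeOtherBot", "http://example.com/old-page.html")
print(allowed)  # True
```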