[02:09] Found another mobileme user recursing on homepage.mac.com
[02:11] Username is tauran; haven't looked at why; busy with the CGI thing.
[02:41] Hello
[02:42] Was someone here archiving revision 3 shows?
[04:59] mm
[05:00] a bunch of curl connection failures. i suspect either my network or the s3 endpoint hiccupped
[05:35] christ, mobileme user has been uploading for a day now
[05:47] chronomex: How big is it?
[05:47] (And it's not something that got into a big loop or mirrored some other users?)
[05:48] -rw-r--r-- 1 duncan duncan 2.3G Nov 3 11:48 data/c/cr/cra/craig.schmidt/public.me.com/public.me.com-craig.schmidt.warc.gz
[05:49] Ah, I see. Sorry, thinking in terms of 100Mb connections. ^^;;
[11:09] shaqfu: ah, shame. :)
[11:09] runs pretty well here
[11:11] how could we get the fileplanet stuff nicely to archive.org? i guess it would be 100k+ items.
[11:11] so it might be a bad idea to upload them individually
[11:22] so far i have an average of <3MB per item. but then i am still <50000 and they will just get bigger and bigger
[12:07] http://www.gamefront.com/breaking-ign-to-close-fileplanet/
[12:07] "We have decided to archive FilePlanet and will eventually stop operating the site"
[12:07] so "archived" is a misleading term
[12:08] “While the site will no longer be updated,” IGN told us, “for now users can still use the site as a repository of file content. If/when we remove all site content completely, we’ll be sure to communicate that to users before it happens.”
[12:29] Schbirid, why will they get bigger and bigger? you're downloading them chronologically and recent files are bigger?
[12:29] 10k files per item should be ok anyway
[12:31] soo, I'm at about 1700 wikis downloaded for #wikiteam, but nobody is working on the uploading script
[12:32] aka https://code.google.com/p/wikiteam/source/browse/trunk/uploader.py
[12:38] yeah, i go IDs upwards and that means later files are bigger (guessing but i am 100% sure)
[12:39] yep
[12:40] ok, 10k parts sounds like a good idea
[12:40] tarred maybe
[12:41] Hey, people.
[12:41] Anything I need to know about?
[12:42] That I'm eager to see ISO images flooding archive.org?
[12:42] tar would mean that the files would not be accessible easily
[12:43] Well, if the tar archive is under 5-10 GB there's the tar viewer.
[12:43] But if you have to load an item description with 10k elements the HTML will take forever.
[12:43] tar viewer! i never heard of that
[12:43] oh, that is true
[12:43] Probably it's better to download everything and then ask SketchCow with real data
[12:44] take the /download link and add a / at the end of the URL
[12:44] is that indexed by crawlers?
[12:45] only if you put links I guess; or maybe not even in that case because there's nofollow even for internal links?
[12:45] anyway, for instance: http://archive.org/download/mobileme-hero-1335947007/mobileme-full-1335947007.tar/
[12:46] that is awesome
[12:46] nice
[12:47] ugh
[12:47] I fear I am uploading a bunch of tv show episodes to IA now >_<
[12:47] I do agree that fos is getting a little slow.
[12:48] Not sure why.
[12:57] I'm cleaning up uploaded mobileme sets right now.
[12:57] wow
[12:57] what about splinder?
[12:57] What about it?
[12:57] is fos slowed down by the splinder tidying up^
[12:57] No, no.
[12:57] or did it finish
[12:57] It's at a halt point, has been.
[13:03] Just verified and removed 1.7tb of mobileme from the machine.
[13:04] Another 2tb is being uploaded now.
[18:03] e.g. by me
[18:07] do you know how to change the spotlight item (on the left sidebar)? http://archive.org/details/spanishrevolution
[18:46] there's either a thing in the metadata for the collection or the item
[18:47] I think the collection
[20:27] hello everyone
[20:38] hi
[20:39] why
[20:49] ...why?
[20:49] WHY!
[20:52] Schbirid: Is there any useful metadata coming down with the FP files?
[20:52] yeah, i save the download page too
[20:52] and the url the file comes from
[20:52] Awesome
[20:52] eg http://www.fileplanet.com/224884/download
[20:53] has a full title "Gas Guzzlers: Combat Carnage Beta Client"
[20:53] and their category "Home / Gaming / RPG / Massively Multiplayer / Gas Guzzlers: Combat Carnage / Game Clients"
[20:54] perfect would be to save http://www.fileplanet.com/224884/220000/fileinfo/Gas-Guzzlers:-Combat-Carnage-Beta-Client too i guess. but i could not find a way to easily find those URLs so i just do the numeric id increments
[20:54] their older files have informative download urls like http://download.direct2drive.com/ftp2/bgchronicles/agportraits/vance/celeb.zip
[20:55] So long as the script grabs the page with basic metadata, it should be good
[20:55] yeah
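Editor's sketch of the approach discussed above: walk the FilePlanet numeric IDs upwards, build the download page URL for each, and group the IDs into chunks of 10k for upload as one item each. Only the URL pattern is taken from the log. The function names are made up for illustration, and chunking by ID range is an assumption: IDs with no file behind them would leave an item with fewer than 10k files. Nothing here touches the network.

```python
from typing import Iterator

# URL pattern quoted in the log (e.g. http://www.fileplanet.com/224884/download).
DOWNLOAD_URL = "http://www.fileplanet.com/{file_id}/download"


def download_page_url(file_id: int) -> str:
    """Return the download page URL for one numeric file ID."""
    return DOWNLOAD_URL.format(file_id=file_id)


def id_batches(first_id: int, last_id: int, size: int = 10_000) -> Iterator[range]:
    """Yield consecutive ID ranges of at most `size` IDs, covering first_id..last_id inclusive.

    Hypothetical helper: each yielded range stands for one archive.org item.
    """
    for start in range(first_id, last_id + 1, size):
        yield range(start, min(start + size, last_id + 1))


if __name__ == "__main__":
    for batch in id_batches(1, 25_000):
        # Show the first and last download page URL of each batch.
        print(download_page_url(batch[0]), "...", download_page_url(batch[-1]))
```

Whether each batch is then tarred or uploaded as loose files is left open, as it is in the discussion.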