[12:49] bleh, totally need an easy way to add a new wiki
[12:49] editing the wiki sucks.. ironically :)
[12:49] I'll put this here and hope I'll remember it for later http://www.tuhs.org/wiki/The_Unix_Heritage_Society
[13:44] ersi, add a new wiki where?
[13:46] to some kind of list
[13:46] or make it known to 'the team'
[14:26] tsk -NC
[14:26] meh from within function "SiteStatsUpdate::cacheUpdate". MySQL returned error "1054: Unknown column 'ss_active_users' in 'field list' (localhost)".
[14:27] 36 pages
[14:33] ersi, downloading
[14:37] neat
[14:40] ersi, done
[14:41] now let's wait for underscor to produce the script for archive.org upload and it will get into the bunch
[15:05] lol, i didn't expect these file sizes http://airto.hosted.ats.ucla.edu/wiki/index.php?title=Special:ListFiles&sort=img_size&limit=50&desc=1
[15:11] I have a wiki with 6 GB of images
[15:11] 467 wikis downloaded btw
[15:12] you want a place in the hall of hardcore wiki archivists, huh?
[15:13] nah
[15:13] I only want to steal from my ISP all the bandwidth I can.
[15:13] Upload bandwidth is easy with p2p; downloading constantly is quite hard.
[15:14] if you don't pay your bill, you are dividing by zero, so you get the optimum stolen bandwidth
[15:14] INFINITE.
[15:15] heh
[15:15] that's my uny's bandwidth
[15:15] *uni
[15:15] btw it's a bit silly to 7z 6 GB of images
[15:16] or even worse a collection of thousands of PDFs (of dubious copyright status I'd say)
[15:18] Did you hear that Internet Archive crawls the entire web?
[15:19] emijrp, yeah, in fact I was saying that those should be made better available, maybe in a directory, and then happily derived too
[15:19] emijrp, I can safely assume that a 32 B 7z has something wrong, delete it and rerun the dump?
[15:19] Can't I.
[15:20] just remove the 7z
[15:21] yep
[15:21] emijrp, is 1.2 KiB reasonable? let's check
[15:21] maybe a wiki with a wrong api
[15:21] or empty
[15:23] 70 pages and no xml
[15:23] 34 lines of 2012-04-10 09:04:48: Error while retrieving the full history of "Main_Page". Trying to save only the last revision for this page
[15:25] Special:Export issues
[15:25] shit happens
[15:27] emijrp, what should we do then?
[15:28] obviously, a tiny percent of wikis will fail
[15:28] don't care
[15:28] Perhaps we need a script to check that there's something within the 7z I upload. Or just upload everything, even a list of titles is useful.
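The check script floated at [15:28] could look roughly like the sketch below. It is only a sketch: the `*-wikidump.7z` naming, the `*-history.xml` test and the 10 KiB threshold are my assumptions based on the tiny 32 B / 1.2 KiB archives discussed above, not anything the channel settled on.

```python
#!/usr/bin/env python
# Rough sketch of the 7z sanity check discussed at [15:19]-[15:28].
# Assumptions (mine, not the channel's): dumps are named *-wikidump.7z,
# a healthy dump contains a *-history.xml entry, and anything under
# ~10 KiB is almost certainly a failed dump that should be re-run.
import os
import subprocess
import sys

MIN_SIZE = 10 * 1024  # bytes

def archive_contents(path):
    """Return the file names stored in a 7z archive, or None if 7z can't read it."""
    try:
        out = subprocess.check_output(['7z', 'l', '-slt', path])
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.decode('utf-8', 'replace').splitlines()
    # In -slt output every stored file gets a "Path = ..." line;
    # the first one is the archive itself, so skip it.
    return [l.split('=', 1)[1].strip() for l in lines if l.startswith('Path = ')][1:]

def check(dumpdir):
    for name in sorted(os.listdir(dumpdir)):
        if not name.endswith('-wikidump.7z'):
            continue
        path = os.path.join(dumpdir, name)
        size = os.path.getsize(path)
        if size < MIN_SIZE:
            print('SUSPICIOUS (%d bytes, delete and re-dump?): %s' % (size, name))
            continue
        contents = archive_contents(path)
        if contents is None:
            print('UNREADABLE ARCHIVE: %s' % name)
        elif not any(f.endswith('-history.xml') for f in contents):
            print('NO XML HISTORY INSIDE: %s' % name)

if __name__ == '__main__':
    check(sys.argv[1] if len(sys.argv) > 1 else '.')
```

Anything it flags matches the "just remove the 7z and rerun the dump" advice from [15:20]; everything else can go on to the archive.org uploader.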
[16:44] Why the frack are people 7zipping a bunch of images?
[18:08] ersi, because we're 7z'ing everything together. Very good for XML, less useful for images.
[18:09] Yeah.. but.. you know.. why
[18:10] ersi, because 7z'ing the XML etc. and then tar'ing the 7z archive with the image directory is more code to write?
[18:10] Dunno, ask emijrp. :D
[18:44] from a fast count, about 10 wikis die every day around the web
[18:45] 13,000 died since 2009 (Andrew Pavlo's list)
[18:58] well, maybe they were moved and we don't know where
[18:58] we should rerun the crawler to know
[19:07] hm, some python process consuming 2+ GiB of memory
[20:41] 'the crawler'? Which one?
[20:46] http://www.cs.brown.edu/~pavlo/mediawiki/
[20:46] ah, alright
[20:47] what do you think about archiving all the pages that Wikipedias delete?
[20:47] There's DeletionPedia for en:, but it is inactive, and two German DeletionPedias that look active.
[20:48] it's a really good idea
[20:49] how would you go about that?
[20:49] grab anything that has an RfD?
[20:49] yes..
[20:50] There are speedy deletions, just crap, so it is deleted quickly.
[20:50] And deletion discussions for low-notability topics. These are deleted after a week or so.
[20:51] just crap = test edits, ISFDOSJIFIOSDJOFJAPSF , spam, links, etc
[20:51] yeah
[20:51] but non-notable things I'd like to preserve
[20:51] steal a staff account password and screenscrape the deletion archive on all wikis
[20:51] hahahaha
[20:51] ransom ariel
[20:51] or kindly ask someone with shell access
[20:51] you know, the biggest thing to preserve, imo, are deleted media
[20:52] but I guess most of that is deleted for a reason?
[20:52] copyvio
[20:52] yeah
[20:52] and out of educational scope
[20:52] or pron
[20:53] the only problem is pages that attack people
[20:53] if we archive everything, we are going to archive those dangerous pages
[20:53] well, I'd rather have us 'dark' out such items
[20:53] and still keep it
[20:53] it is the worst problem DeletionPedias face
[20:54] well, the worst are supposedly oversighted
[20:54] so if you browse the standard deletion archive you won't find them
[20:54] http://wikiindex.org/Deletionpedia and See also
[20:55] i have a copy of deletionpedia
[20:55] downloading pluspedia
[20:57] marjorie-wiki fails
[20:58] emijrp, I've already downloaded deletionpedia, you know
[20:58] It's horribly slow IIRC
[20:58] cool
[20:58] of course redownloading doesn't har,
[20:58] *harm
[20:58] yes, slow and buggy, i had to repair my dump to exclude some broken tags
[20:59] yeah
[20:59] but where is my dump
[20:59] on a CD in a box under my bed
[20:59] lul
[20:59] that sounds plausible
[21:01] http://archive.org/details/wiki-deletionpedia.dbatley.com
[21:01] took 3 months
[21:02] mine was faster
[21:05] but histories contain impossible dates, i remember 2012 dates in 2011
[21:05] i don't know why
[21:06] hm
[21:06] Probably a broken import/upload script?
[21:07] I suppose they alter history a bit and something goes wrong sometimes.
[21:07] emijrp, wanna upload your version to that item?
[21:08] it is probably the same; DeletionPedia hasn't uploaded new pages since 2010 or so
[21:10] even better then, our dumps will be broken in different ways :)
[21:10] but broken versions of the same thing
[21:15] with the sidebar trick, the documentation is now much better http://code.google.com/p/wikiteam/wiki/AvailableBackups
[21:16] Yes.
[21:16] I love how Google Code managed to make wiki syntax even more complex. :p
[21:27] seeya
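A very rough reading of the "grab anything that has an RfD" idea from [20:49]: ask the MediaWiki API for every page that currently transcludes the deletion template, then export those titles before the discussions close. The wiki URL, the template name and the `requests` dependency below are placeholders I've assumed for illustration; speedy-deletion and PROD templates would need their own runs, and none of this reaches text that is already deleted, which is why the log talks about needing someone with shell access.

```python
#!/usr/bin/env python
# Sketch of the "grab anything tagged for deletion" idea from [20:49].
# Assumptions (mine, not the channel's): the target wiki, the template
# name and the `requests` dependency are placeholders for illustration.
import requests

API = 'https://en.wikipedia.org/w/api.php'
TEMPLATE = 'Template:Article for deletion'  # speedy/PROD templates need separate runs

def tagged_pages():
    """Yield titles of articles that currently transclude the deletion template."""
    params = {'action': 'query', 'list': 'embeddedin', 'eititle': TEMPLATE,
              'einamespace': 0, 'eilimit': 'max', 'format': 'json', 'continue': ''}
    while True:
        data = requests.get(API, params=params).json()
        for page in data['query']['embeddedin']:
            yield page['title']
        if 'continue' not in data:
            break
        params.update(data['continue'])

def export(titles):
    """Return a Special:Export-style XML dump (current revision only) of the given titles."""
    r = requests.post(API, data={'action': 'query', 'export': 1, 'exportnowrap': 1,
                                 'titles': '|'.join(titles)})
    return r.text

if __name__ == '__main__':
    titles = list(tagged_pages())
    for i in range(0, len(titles), 50):  # the query API takes at most 50 titles per request
        with open('deletion-candidates-%04d.xml' % (i // 50), 'w') as out:
            out.write(export(titles[i:i + 50]))
```

This only saves what is still visible while the tag is up; anything already deleted (or oversighted, per [20:54]) stays out of reach without admin help.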