[09:33] I'm going to create profiles for a fuck ton of wikis, in wikiindex
[11:19] http://wikiindex.org/index.php?title=Special:NewPages&limit=100
[11:21] http://awa.wikkii.com/wiki/Anti-Wikia_Alliance ha
[13:00] hmpf
[15:22] emijrp: I've been meaning to poke you about wikis
[15:23] tell me
[15:23] emijrp: I have some code that uses pywikipediabot to archive a wiki via the mediawiki api.
[15:23] In my experience, nearly every wiki I have checked has left their api on.
[15:24] And this way it lets me get the wikimarkup text and/or mediawiki's rendered html.
[15:25] dumpgenerator.py uses the api to get the page titles, and Special:Export to export all of them
[15:26] What format does it get them in?
[15:26] xml
[15:26] I remember looking at the wikiteam repo, but it has been a few weeks (months?)
[15:26] Hrrm.
[15:27] Where are wikis being archived/saved?
[15:27] Is there a single location?
[15:27] no
[15:27] internet archive, google code, and private servers
[15:29] dumpgenerator.py works well on most mediawikis (some old versions or servers are buggy, but that is 1%)
[15:29] now, we want to generate a list of all internet mediawikis, and ragedownload them all
[15:30] mmm. That would be ideal, but many wikis are not under any kind of open license.
[15:30] Does the xml dump include history?
[15:30] sure, the whole history
[15:30] Nice
[15:31] Forgive me if I wax philosophic for a minute, I have been thinking about archiving as a commitment lately.
[15:32] That if I scrape something and store it elsewhere, it is my responsibility to maintain that archive.
[15:33] we try to upload all of it to internet archive
[15:34] those guys are good at archiving stuff
[15:34] Sure, but there are other responsibilities involved, archive.org will keep things around for sure.
[15:34] But going back every N months and getting a fresh dump. Or keeping an eye out for forks of a given wiki and getting an archive of them.
[15:35] I happened to have a copy of the encyclopedia dramatica wiki when the O' Internet thing happened
[15:36] and I felt that it was my responsibility to publicize the existence of the archive to folks who wanted to fork
[20:03] sethish: <3
[20:13] ?
[20:18] 08:35:15 < sethish> I happened to have a copy of the encyclopedia dramatica wiki when the O' Internet thing happened
[20:22] Oh, yeah. I am not cited on wikipedia tho :-[ and I can show a direct correlation to my dropping the dox and them having content
[20:22] but oh well.
[20:26] you did it for teh lulz
[20:31] 2011-04-05: First commit.
[20:32] http://code.google.com/p/wikiteam/wiki/News
[20:33] Happy Birthday WikiTeam.
[20:40] funny, I just got a nastygram from the bing maps api robot saying that I've exceeded some sort of unknown account limit
[20:40] which appears to be the "thanks for using our api for the last year, now fuck off!"
[23:57] chronomex, probably that's because everyone's using OSM anyway nowadays
[23:57] probably
[23:57] I'll just have to transition over, I guess ...
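
The 15:25 exchange describes dumpgenerator.py's basic approach: enumerate page titles through the MediaWiki API, then pull full-history XML through Special:Export. The sketch below illustrates that idea in Python; it is not the actual WikiTeam code. The example wiki URLs, the use of the requests library, and the modern "continue" continuation syntax are assumptions for illustration, and a real dump tool handles many more edge cases (old API versions, namespaces, throttling, resuming).

```python
# Minimal sketch of the API-titles + Special:Export flow described in the log.
# Assumptions: the wiki exposes api.php and index.php at the URLs below and
# runs a reasonably modern MediaWiki; dumpgenerator.py itself works differently.
import requests

API_URL = "http://example-wiki.org/w/api.php"       # hypothetical endpoint
EXPORT_URL = "http://example-wiki.org/w/index.php"  # hypothetical endpoint


def list_all_titles(api_url):
    """Yield every main-namespace page title via the MediaWiki API (list=allpages)."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(api_url, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        # Follow the API's continuation token until the page list is exhausted.
        if "continue" not in data:
            break
        params.update(data["continue"])


def export_page_xml(export_url, title):
    """Fetch the XML export of one page through Special:Export."""
    response = requests.post(
        export_url,
        params={"title": "Special:Export"},
        data={"pages": title, "history": "1"},  # ask for all revisions, not just the current one
    )
    return response.text


if __name__ == "__main__":
    for title in list_all_titles(API_URL):
        xml = export_page_xml(EXPORT_URL, title)
        print(f"{title}: {len(xml)} bytes of XML")
```

The split matches what is said at 15:25 and 15:30: the API is only used to discover titles, while Special:Export produces the XML dump that carries the whole revision history.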