#wikiteam 2012-04-05,Thu


Time Nickname Message
09:33 🔗 emijrp I'm going to create profiles for a fuck ton of wikis in WikiIndex
11:19 🔗 emijrp http://wikiindex.org/index.php?title=Special:NewPages&limit=100
11:21 🔗 emijrp http://awa.wikkii.com/wiki/Anti-Wikia_Alliance ha
13:00 🔗 Nemo_bis hmpf
15:22 🔗 sethish emijrp: I've been meaning to poke you about wikis
15:23 🔗 emijrp tell me
15:23 🔗 sethish emijrp: I have some code that uses pywikipediabot to archive a wiki via the MediaWiki API.
15:23 🔗 sethish In my experience, nearly every wiki I have checked has left its API on.
15:24 🔗 sethish And this way I can get the wikitext and/or MediaWiki's rendered HTML.
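pywikipediabot (now pywikibot) wraps the same MediaWiki API calls that any HTTP client can make. A minimal sketch of the idea sethish describes, using requests against the raw API instead of pywikipediabot; the endpoint URL and page title are placeholder assumptions:

    # Minimal sketch: fetch wikitext and rendered HTML via the MediaWiki API.
    # API endpoint and page title below are hypothetical placeholders.
    import requests

    API = "http://example.org/w/api.php"  # hypothetical wiki API endpoint

    def get_wikitext(title):
        """Fetch the current wikitext of a page via action=query."""
        r = requests.get(API, params={
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": title,
            "format": "json",
        })
        pages = r.json()["query"]["pages"]
        page = next(iter(pages.values()))
        return page["revisions"][0]["*"]

    def get_rendered_html(title):
        """Fetch MediaWiki's rendered HTML for a page via action=parse."""
        r = requests.get(API, params={
            "action": "parse",
            "page": title,
            "format": "json",
        })
        return r.json()["parse"]["text"]["*"]

    print(get_wikitext("Main Page")[:200])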
15:25 🔗 emijrp dumpgenerator.py uses the API to get the page titles, and Special:Export to export everything
15:26 🔗 sethish What format does it get them in?
15:26 🔗 emijrp xml
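A rough sketch of the two-step approach emijrp describes (not dumpgenerator.py itself): enumerate titles with the API's list=allpages, then pull full-history XML through Special:Export. The URLs are placeholders, and the continuation handling covers both the older query-continue and newer continue styles:

    # Sketch of dumpgenerator.py's strategy, under assumed placeholder URLs:
    # list page titles via the API, then export full-history XML.
    import requests

    API = "http://example.org/w/api.php"        # hypothetical API endpoint
    INDEX = "http://example.org/w/index.php"    # Special:Export lives here

    def all_titles():
        """Yield every main-namespace page title, following continuation."""
        params = {"action": "query", "list": "allpages",
                  "aplimit": "500", "format": "json"}
        while True:
            data = requests.get(API, params=params).json()
            for page in data["query"]["allpages"]:
                yield page["title"]
            if "continue" not in data and "query-continue" not in data:
                break
            # newer wikis return "continue", older ones "query-continue"
            cont = data.get("continue") or data["query-continue"]["allpages"]
            params.update(cont)

    def export_xml(titles):
        """POST a batch of titles to Special:Export; history=1 asks for
        the whole revision history, not just the current revision."""
        r = requests.post(INDEX, params={"title": "Special:Export"},
                          data={"pages": "\n".join(titles), "history": "1"})
        return r.text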
15:26 🔗 sethish I remember looking at the wikiteam repo, but it has been a few weeks (months?)
15:26 🔗 sethish Hrrm.
15:27 🔗 sethish Where are wikis being archived/saved?
15:27 🔗 sethish Is there a single location?
15:27 🔗 emijrp no
15:27 🔗 emijrp internet archive, google code, and private servers
15:29 🔗 emijrp dumpgenerator.py works well on most MediaWikis (some old versions or servers are buggy, but that's about 1%)
15:29 🔗 emijrp now we want to generate a list of all MediaWikis on the internet, and rage-download them all
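The mass-download step could be as simple as looping over a list of API endpoints and shelling out to dumpgenerator.py. A hedged sketch, assuming a hypothetical mediawikis.txt with one api.php URL per line; the --api/--xml/--images flags follow WikiTeam's documented usage, but verify them against your copy of the script:

    # Hedged sketch of the "rage-download" loop over many wikis.
    # mediawikis.txt is a hypothetical file of api.php URLs, one per line.
    import subprocess

    with open("mediawikis.txt") as f:
        for api in (line.strip() for line in f):
            if not api:
                continue
            subprocess.call(["python", "dumpgenerator.py",
                             "--api=" + api, "--xml", "--images"])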
15:30 🔗 sethish mmm. That would be ideal, but many wikis are not under any kind of open license.
15:30 🔗 sethish Does the xml dump include history?
15:30 🔗 emijrp sure, the whole history
15:30 🔗 sethish Nice
15:31 🔗 sethish Forgive me if I wax philosophic for a minute; I have been thinking about archiving as a commitment lately.
15:32 🔗 sethish That if I scrape something and store it elsewhere, it is my responsibility to maintain that archive.
15:33 🔗 emijrp we try to upload everything to the Internet Archive
15:34 🔗 emijrp those guys are good at archiving stuff
15:34 🔗 sethish Sure, but there are other responsibilities involved; archive.org will keep things around for sure.
15:34 🔗 sethish But going back every N months and getting a fresh dump, or keeping an eye out for forks of a given wiki and archiving those too.
15:35 🔗 sethish I happened to have a copy of the Encyclopedia Dramatica wiki when the Oh Internet thing happened
15:36 🔗 sethish and I felt it was my responsibility to publicize the existence of the archive to folks who wanted to fork
20:03 🔗 chronomex sethish: <3
20:13 🔗 sethish ?
20:18 🔗 chronomex 08:35:15 < sethish> I happened to have a copy of the Encyclopedia Dramatica wiki when the Oh Internet thing happened
20:22 🔗 sethish Oh, yeah. I am not cited on Wikipedia though :-[ and I can show a direct correlation between my dropping the dox and them having content
20:22 🔗 sethish but oh well.
20:26 🔗 emijrp you did it for teh lulz
20:31 🔗 emijrp 2011-04-05: First commit.
20:32 🔗 emijrp http://code.google.com/p/wikiteam/wiki/News
20:33 🔗 emijrp Happy Birthday WikiTeam.
20:40 🔗 chronomex funny, I just got a nastygram from the Bing Maps API robot saying that I've exceeded some sort of unknown account limit
20:40 🔗 chronomex which appears to be the "thanks for using our api for the last year, now fuck off!"
23:57 🔗 Nemo_bis chronomex, that's probably because everyone's using OSM anyway nowadays
23:57 🔗 chronomex probably
23:57 🔗 chronomex I'll just have to transition over, I guess ...
