Time | Nickname | Message
09:33 | emijrp | I'm going to create profiles for a fuck ton of wikis on WikiIndex
11:19 | emijrp | http://wikiindex.org/index.php?title=Special:NewPages&limit=100
11:21 | emijrp | http://awa.wikkii.com/wiki/Anti-Wikia_Alliance ha
13:00 | Nemo_bis | hmpf
15:22 | sethish | emijrp: I've been meaning to poke you about wikis
15:23 | emijrp | tell me
15:23 | sethish | emijrp: I have some code that uses pywikipediabot to archive a wiki via the MediaWiki API.
15:23 | sethish | In my experience, nearly every wiki I have checked has left its API on.
15:24 | sethish | And this way I can get the wikimarkup text and/or MediaWiki's rendered HTML.
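The API calls sethish describes map to two standard MediaWiki API actions. A minimal sketch in Python with the requests library, where the endpoint URL and page title are placeholders rather than anything from this log:

    import requests

    API = "http://example.com/w/api.php"  # placeholder endpoint

    # Wikimarkup: the raw text of the latest revision of a page.
    wikitext = requests.get(API, params={
        "action": "query", "prop": "revisions", "rvprop": "content",
        "titles": "Main Page", "format": "json",
    }).json()

    # Rendered HTML: MediaWiki's parser output for the same page.
    html = requests.get(API, params={
        "action": "parse", "page": "Main Page", "format": "json",
    }).json()

pywikipediabot wraps equivalent requests behind its own page objects; the raw calls are shown here only to make the mechanism visible.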
15:25 | emijrp | dumpgenerator.py uses the API to get the page titles, and Special:Export to export them all
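The title-listing half of what emijrp describes can be sketched as a walk over the API's list=allpages with continuation. The endpoint is again a placeholder, and the continuation key differs across MediaWiki versions:

    import requests

    API = "http://example.com/w/api.php"  # placeholder endpoint

    def all_titles():
        """Yield every page title on the wiki via list=allpages."""
        params = {"action": "query", "list": "allpages",
                  "aplimit": "500", "format": "json"}
        while True:
            data = requests.get(API, params=params).json()
            for page in data["query"]["allpages"]:
                yield page["title"]
            # Older MediaWikis return "query-continue", newer ones "continue".
            cont = data.get("continue") or data.get("query-continue", {}).get("allpages")
            if not cont:
                break
            params.update(cont)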
15:26 | sethish | What format does it get them in?
15:26 | emijrp | XML
15:26 | sethish | I remember looking at the WikiTeam repo, but it has been a few weeks (months?)
15:27 | sethish | Hrrm.
15:27 | sethish | Where are wikis being archived/saved?
15:27 | sethish | Is there a single location?
15:27 | emijrp | no
15:29 | emijrp | the Internet Archive, Google Code, and private servers
15:29 | emijrp | dumpgenerator.py works well on most MediaWikis (some old versions or servers are buggy, but that's about 1%)
15:30 | emijrp | now we want to generate a list of all the MediaWikis on the internet and ragedownload them all
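Batch-downloading from such a list could be as simple as the loop below. The input file name is made up, and the dumpgenerator.py flags shown (--api, --xml, --images) are assumed from WikiTeam's usual invocation; check the script's own help for the exact interface:

    import subprocess

    # mediawikis.txt: one api.php URL per line (hypothetical file).
    with open("mediawikis.txt") as f:
        for api_url in (line.strip() for line in f):
            if api_url:
                subprocess.call(["python", "dumpgenerator.py",
                                 "--api=" + api_url, "--xml", "--images"])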
15:30 | sethish | mmm. That would be ideal, but many wikis are not under any kind of open license.
15:30 | sethish | Does the XML dump include history?
15:30 | emijrp | sure, the whole history
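The Special:Export half, with full history, might look like the sketch below. The URL is a placeholder; "pages" takes newline-separated titles, and the history parameter asks for every revision rather than just the current one (some wikis disable Special:Export entirely):

    import requests

    EXPORT = "http://example.com/w/index.php?title=Special:Export"  # placeholder

    xml = requests.post(EXPORT, data={
        "pages": "Main Page\nAnother Page",  # newline-separated titles
        "history": "1",                      # full revision history
        "action": "submit",
    }).text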
15:30
🔗
|
sethish |
Nice |
15:31
🔗
|
sethish |
Forgive me if I wax philosophic for a minute, I have been thinking about archiving as a commitment lately. |
15:32
🔗
|
sethish |
That if I scrape something and store it elsewhere, it is my responsibility to maintain that archive. |
15:33
🔗
|
emijrp |
we try to upload all to internet archive |
15:34
🔗
|
emijrp |
that guys are good at archiving stuff |
15:34
🔗
|
sethish |
Sure, but there are other responsibilities involved, archive.org will keep things around for sure. |
15:34
🔗
|
sethish |
But going back every N months and getting a fresh dump. Or keeping an eye out for fork of a given wiki and getting an archive of them. |
15:35
🔗
|
sethish |
I happened to have a copy of the encyclopedia dramatica wiki when the O' Internet thing happened |
15:36
🔗
|
sethish |
and I felt that it was my responsibility to publicize the existance of the archive to folks who wanted to fork |
20:03
🔗
|
chronomex |
sethish: <3 |
20:13
🔗
|
sethish |
? |
20:18
🔗
|
chronomex |
08:35:15 < sethish> I happened to have a copy of the encyclopedia dramatica wiki when the O' Internet thing happened |
20:22
🔗
|
sethish |
Oh, yeah. I am not cited on wikipedia tho :-[ and I can show a direct correlation to my dropping the dox and them having content |
20:22
🔗
|
sethish |
but oh well. |
20:26
🔗
|
emijrp |
you did it for teh lulz |
20:31
🔗
|
emijrp |
2011-04-05: First commit. |
20:32
🔗
|
emijrp |
http://code.google.com/p/wikiteam/wiki/News |
20:33
🔗
|
emijrp |
Happy Birthday WikiTeam. |
20:40
🔗
|
chronomex |
funny, I just got a nastygram from the bing maps api robot saying that I've exceeded some sort of unknown account limit |
20:40
🔗
|
chronomex |
which appears to be the "thanks for using our api for the last year, now fuck off!" |
23:57
🔗
|
Nemo_bis |
chronomex, probably that's because everyone's using OSM anyway nowadays |
23:57
🔗
|
chronomex |
probably |
23:57
🔗
|
chronomex |
I'll just have to transition over, I guess ... |