Time |
Nickname |
Message |
12:49
🔗
|
ersi |
bleh, totally need an easy way to add a new wiki |
12:49
🔗
|
ersi |
editing the wiki sucks.. ironically :) |
12:49
🔗
|
ersi |
I'll put this here and hope I'll remember it for later http://www.tuhs.org/wiki/The_Unix_Heritage_Society |
13:44
🔗
|
Nemo_bis |
ersi, add a new wiki where? |
13:46
🔗
|
ersi |
to some kind of list |
13:46
🔗
|
ersi |
or make it known to 'the team' |
14:26
🔗
|
Nemo_bis |
tsk -NC |
14:26
🔗
|
Nemo_bis |
meh from within function "SiteStatsUpdate::cacheUpdate". MySQL returned error "1054: Unknown column 'ss_active_users' in 'field list' (localhost)". |
14:27
🔗
|
Nemo_bis |
36 pages |
14:33
🔗
|
Nemo_bis |
ersi, downloading |
14:37
🔗
|
ersi |
neat |
14:40
🔗
|
Nemo_bis |
ersi, done |
14:41
🔗
|
Nemo_bis |
now let's wait for underscor to produce the script for archive.org upload and it will get into the bunch |
15:05
🔗
|
emijrp |
lol, i didn't expect these file sizes http://airto.hosted.ats.ucla.edu/wiki/index.php?title=Special:ListFiles&sort=img_size&limit=50&desc=1 |
15:11
🔗
|
Nemo_bis |
I have a wiki with 6 GB of images |
15:11
🔗
|
Nemo_bis |
467 wikis downloaded btw |
15:12
🔗
|
emijrp |
you want a place in the hall of hardcore wiki archivists, uh? |
15:13
🔗
|
Nemo_bis |
nah |
15:13
🔗
|
Nemo_bis |
I only want to steal all the bandwidth I can from my ISP. |
15:13
🔗
|
Nemo_bis |
Upload bandwidth is easy with p2p, downloading constantly quite hard. |
15:14
🔗
|
emijrp |
if you don't pay your bill, you are dividing by zero, so you get the optimal stolen bandwidth |
15:14
🔗
|
emijrp |
INFINITE. |
15:15
🔗
|
Nemo_bis |
heh |
15:15
🔗
|
Nemo_bis |
that's my uny's bandwidth |
15:15
🔗
|
Nemo_bis |
*uni |
15:15
🔗
|
Nemo_bis |
btw it's a bit silly to 7z 6 GB of images |
15:16
🔗
|
Nemo_bis |
or even worse a collection of thousands of PDFs (of dubious copyright status I'd say) |
15:18
🔗
|
emijrp |
Did you hear that Internet Archive crawls the entire web? |
15:19
🔗
|
Nemo_bis |
emijrp, yeah, in fact I was saying that those should be made better available, maybe in a directory, and then happily derived too |
15:19
🔗
|
Nemo_bis |
emijrp, I can safely assume that a 32 B 7z has something wrong, delete it and rerun the dump? |
15:19
🔗
|
Nemo_bis |
Can't I. |
15:20
🔗
|
emijrp |
just remove the 7z |
15:21
🔗
|
Nemo_bis |
yep |
15:21
🔗
|
Nemo_bis |
emijrp, is 1.2 KiB reasonable? let's check |
15:21
🔗
|
emijrp |
maybe a wiki with a wrong api |
15:21
🔗
|
emijrp |
or empty |
15:23
🔗
|
Nemo_bis |
70 pages and no xml |
15:23
🔗
|
Nemo_bis |
34 lines of 2012-04-10 09:04:48: Error while retrieving the full history of "Main_Page". Trying to save only the last revision for this page |
15:25
🔗
|
emijrp |
special:export issues |
15:25
🔗
|
emijrp |
shit happens |
15:27
🔗
|
Nemo_bis |
emijrp, what should we do then? |
15:28
🔗
|
emijrp |
obviously, a tiny percent of wikis will fail |
15:28
🔗
|
emijrp |
dont care |
15:28
🔗
|
Nemo_bis |
Perhaps we need a script to check that there's something within the 7z I upload. Or just upload everything, even a list of titles is useful. |
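The check suggested here could look roughly like the following. This is a minimal sketch, assuming the dumps sit as `*.7z` files in one directory; the function name `find_suspect_archives` and the 2 KiB `min_size` threshold are hypothetical and not part of the WikiTeam scripts. It relies only on two properties of the 7z format: every file starts with a fixed 6-byte signature, and the fixed signature header is 32 bytes long, which is why a 32 B archive holds no entries at all.

```python
from pathlib import Path

SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"  # 6-byte signature at the start of every 7z file
HEADER_SIZE = 32                      # fixed signature header; nothing else fits in 32 B


def find_suspect_archives(dump_dir, min_size=2048):
    """Return (path, reason) pairs for .7z files that look empty or broken.

    min_size is an assumed threshold (hypothetical), tune it to your dumps.
    """
    suspects = []
    for path in sorted(Path(dump_dir).glob("*.7z")):
        size = path.stat().st_size
        with path.open("rb") as fh:
            magic = fh.read(len(SEVENZ_MAGIC))
        if magic != SEVENZ_MAGIC:
            suspects.append((path, "not a 7z file"))
        elif size <= HEADER_SIZE:
            suspects.append((path, "empty archive"))
        elif size < min_size:
            suspects.append((path, "suspiciously small"))
    return suspects
```

A flagged archive is not necessarily useless: as noted above, even a dump that only contains a list of titles may be worth uploading, so the result is better treated as a review list than as a delete list.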
16:44
🔗
|
ersi |
Why the frack are people 7zipping a bunch of images? |
18:08
🔗
|
Nemo_bis |
ersi, because we're 7z'ing everything together. Very good for xml, less useful for images. |
18:09
🔗
|
ersi |
Yeah.. but.. you know.. why |
18:10
🔗
|
Nemo_bis |
ersi, because 7z'ing the xml etc. and then tar'ing the 7z archive with the image directory is more code to write? |
18:10
🔗
|
Nemo_bis |
Dunno, ask emijrp. :D |
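The alternative mentioned above (7z only the XML, then tar that archive together with the image directory) does not need much code. A minimal sketch using Python's standard `tarfile` module; the function name `bundle_dump` and all paths are hypothetical, and it assumes the `.7z` with the XML has already been produced by some other step.

```python
import tarfile
from pathlib import Path


def bundle_dump(archive_7z, image_dir, out_tar):
    """Store an existing .7z and an image directory side by side in a plain tar."""
    archive_7z, image_dir = Path(archive_7z), Path(image_dir)
    # Mode "w" writes an uncompressed tar, so the images are not recompressed.
    with tarfile.open(str(out_tar), "w") as tar:
        tar.add(str(archive_7z), arcname=archive_7z.name)
        tar.add(str(image_dir), arcname=image_dir.name)  # directories are added recursively
    return out_tar
```

Already-compressed formats such as JPEG or PNG gain little from a second compression pass, which is the point of keeping them outside the 7z.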
18:44
🔗
|
emijrp |
from a fast count, about 10 wikis die every day around the web |
18:45
🔗
|
emijrp |
13,000 have died since 2009 (Andrew Pavlo's list) |
18:58
🔗
|
Nemo_bis |
well, maybe they were moved and we don't know where |
18:58
🔗
|
Nemo_bis |
we should rerun the crawler to know |
19:07
🔗
|
Nemo_bis |
hm, some python process consuming 2+ GiB of memory |
20:41
🔗
|
ersi |
'the crawler'? Which one? |
20:46
🔗
|
emijrp |
http://www.cs.brown.edu/~pavlo/mediawiki/ |
20:46
🔗
|
ersi |
ah, alright |
20:47
🔗
|
emijrp |
what do you think about archiving all the pages that Wikipedias delete? |
20:47
🔗
|
emijrp |
There is DeletionPedia for en:, but it is inactive, and two German DeletionPedias that look active. |
20:48
🔗
|
ersi |
it's a really good idea |
20:49
🔗
|
underscor |
how would you go about that? |
20:49
🔗
|
underscor |
grab anything that has a rfd? |
20:49
🔗
|
emijrp |
yes.. |
20:50
🔗
|
emijrp |
There are speedy deletions, just crap, so it is deleted quickly. |
20:50
🔗
|
emijrp |
And deletion discussions for low-notability topics. These are deleted after a week or so. |
20:51
🔗
|
emijrp |
just crap = test edits, ISFDOSJIFIOSDJOFJAPSF , spam, links, etc |
20:51
🔗
|
underscor |
yeah |
20:51
🔗
|
underscor |
but non-notable things I'd like to preserve |
20:51
🔗
|
Nemo_bis |
steal a staff account password and screenscrape the deletion archive on all wikis |
20:51
🔗
|
underscor |
hahahaha |
20:51
🔗
|
underscor |
ransom ariel |
20:51
🔗
|
Nemo_bis |
or ask kindly to someone with shell access |
20:51
🔗
|
underscor |
you know, the biggest thing to preserve, imo, are deleted media |
20:52
🔗
|
underscor |
but I guess most of that is deleted for a reason? |
20:52
🔗
|
emijrp |
copyvio |
20:52
🔗
|
underscor |
yeah |
20:52
🔗
|
emijrp |
and out of educational scope |
20:52
🔗
|
underscor |
or pron |
20:53
🔗
|
emijrp |
the only problem is pages that attack people |
20:53
🔗
|
emijrp |
if we archive everything, we are going to archive those dangerous pages |
20:53
🔗
|
ersi |
well, I'd rather have us 'dark' out such items |
20:53
🔗
|
ersi |
and still keep it |
20:53
🔗
|
emijrp |
it is the worst problem DeletionPedias face |
20:54
🔗
|
Nemo_bis |
well, the worst are supposedly oversighted |
20:54
🔗
|
Nemo_bis |
so if you browse the standard deletion archive you won't find them |
20:54
🔗
|
emijrp |
http://wikiindex.org/Deletionpedia and See also |
20:55
🔗
|
emijrp |
i have a copy of deletionpedia |
20:55
🔗
|
emijrp |
downloading pluspedia |
20:57
🔗
|
emijrp |
marjorie-wiki fails |
20:58
🔗
|
Nemo_bis |
emijrp, I've already downloaded deletionpedia, you know |
20:58
🔗
|
Nemo_bis |
It's horribly slow IIRC |
20:58
🔗
|
emijrp |
cool |
20:58
🔗
|
Nemo_bis |
of course redownloading doesn't har, |
20:58
🔗
|
Nemo_bis |
*harm |
20:58
🔗
|
emijrp |
yes, slow and buggy, i had to repair my dump to exclude some broken <page> tags |
20:59
🔗
|
Nemo_bis |
yeah |
20:59
🔗
|
Nemo_bis |
but where is my dump |
20:59
🔗
|
emijrp |
in a CD in a box under my bed |
20:59
🔗
|
emijrp |
lul |
20:59
🔗
|
Nemo_bis |
that sounds plausible |
21:01
🔗
|
Nemo_bis |
http://archive.org/details/wiki-deletionpedia.dbatley.com |
21:01
🔗
|
Nemo_bis |
took 3 months |
21:02
🔗
|
emijrp |
mine was faster |
21:05
🔗
|
emijrp |
but histories contain impossible dates, i remember 2012 dates in 2011 |
21:05
🔗
|
emijrp |
i dont know why |
21:06
🔗
|
Nemo_bis |
hm |
21:06
🔗
|
Nemo_bis |
Probably a broken import/upload script? |
21:07
🔗
|
Nemo_bis |
I suppose they alter history a bit and something goes wrong sometimes. |
21:07
🔗
|
Nemo_bis |
emijrp, wanna upload your version to that item? |
21:08
🔗
|
emijrp |
it is probably the same, DeletionPedia has not uploaded new pages since 2010 or so |
21:10
🔗
|
Nemo_bis |
even better then, our dumps will be broken in different ways :) |
21:10
🔗
|
Nemo_bis |
but broken versions of the same thing |
21:15
🔗
|
emijrp |
with the sidebar trick, the documentation is now much better http://code.google.com/p/wikiteam/wiki/AvailableBackups |
21:16
🔗
|
Nemo_bis |
Yes. |
21:16
🔗
|
Nemo_bis |
I love how Google Code managed to make wiki syntax even more complex. :p |
21:27
🔗
|
emijrp |
seeya |