01:15 <xmc> \o/
11:22 <ersi> snails
11:22 <ersi> snails everywhereeee
12:07 * Nemo_bis hands a hammer
12:07 <Nemo_bis> useful also if they lose an s
18:15 <pft> if i pull wikia backups should i put them in https://archive.org/details/wikiteam ?
18:33 <Nemo_bis> pft: you can't
18:34 <Nemo_bis> but please add the WikiTeam keyword so that an admin can later move them
18:34 <pft> ok
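
A minimal sketch of the tagging Nemo_bis asks for, using the internetarchive Python library; the item identifier, filename, and title are hypothetical:

    # Upload a dump to archive.org with the "wikiteam" subject keyword,
    # so an admin can later find the item and move it into the collection.
    from internetarchive import upload

    upload(
        "wiki-memoryalpha-dump",                  # hypothetical item identifier
        files=["memoryalpha_pages_full.xml.gz"],  # hypothetical local dump file
        metadata={
            "mediatype": "web",
            "title": "Memory Alpha XML dump",     # hypothetical title
            "subject": "wikiteam",                # the keyword requested above
        },
    )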
18:34 <Nemo_bis> pft: What sort of Wikia backups are you pulling?
18:34 <pft> just their own dumps
18:34 <Nemo_bis> yes but which
18:34 <pft> memory alpha and wookiepedia at the moment
18:34 <pft> i might pull more later
18:34 <Nemo_bis> ah ok, small scale
18:34 <pft> yes
18:35 <pft> wookiepedia is 400M so it's pretty small
18:35 <balrog> are their dumps complete?
18:35 <pft> i haven't tested yet
18:35 <Nemo_bis> I'm in contact with them about uploading all their dumps to archive.org, but I've been told it needs to be discussed in some senior staff meeting
18:35 <pft> i will ungzip at home and probably load into mediawikis
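
A sketch of that import step, assuming a local MediaWiki install with PHP available; importDump.php is MediaWiki's stock maintenance script and reads the XML dump on stdin, so the gzip can be streamed without unpacking to disk first:

    import gzip
    import shutil
    import subprocess

    DUMP = "memoryalpha_pages_full.xml.gz"                   # hypothetical filename
    MAINT = "/var/www/mediawiki/maintenance/importDump.php"  # hypothetical path

    # Stream the decompressed XML straight into importDump.php's stdin.
    proc = subprocess.Popen(["php", MAINT], stdin=subprocess.PIPE)
    with gzip.open(DUMP, "rb") as src:
        shutil.copyfileobj(src, proc.stdin)
    proc.stdin.close()
    proc.wait()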
18:35 <Nemo_bis> balrog: define complete?
18:36 <balrog> have all the data that's visible
18:36 <balrog> Nemo_bis: why is a senior staff meeting required, if I may ask?
18:36 <Nemo_bis> how would I know :)
18:36 <Nemo_bis> and no, not all data visible, that's impossible with XML dumps
18:37 <Nemo_bis> but all the data needed to regenerate what's visible, minus logs and private user data :) except I don't see image dumps any longer and they don't dump all wikis
18:37 <pft> yeah i didn't see any image dumps anywhere, which is frustrating
18:37 <balrog> they don't provide image dumps? :/
18:38 <balrog> and not all wikis? can wiki administrators turn it off individually?
18:38 <pft> well dumps.wikia.net appears to be gone, and the downloads available on Special:Statistics seem to be generated on demand by staff and have a limited duration
18:38 <pft> http://en.memory-alpha.org/wiki/Special:Statistics
18:38 <pft> that page has a "current pages and history" link but i don't see anything about images
18:38 <Nemo_bis> it never did, but they made them nevertheless
18:39 <Nemo_bis> perhaps we just need to find out the filenames
18:39 <balrog> s3://wikia_xml_dumps/w/wo/wowwiki_pages_current.xml.gz etc
18:39 <Nemo_bis> for images
18:40 <Nemo_bis> this is how it used to be https://ia801507.us.archive.org/zipview.php?zip=/28/items/wikia_dump_20121204/c.zip
18:41 <balrog> hmm
18:41 <balrog> wikia's source code is open
18:41 <balrog> including the part that uploads the dumps to S3
18:42 <pft> interesting
18:42 <balrog> https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Close/maintenance.php
18:42 <balrog> look for DumpsOnDemand::putToAmazonS3
18:42 <Nemo_bis> well, not actually all of it
18:43 <Nemo_bis> though they are working on open sourcing it all
18:43 <balrog> Nemo_bis: interesting
18:43 <balrog> https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Dumps/DumpsOnDemand.php
18:43 <balrog> "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_current.xml.gz" ),
18:43 <balrog> "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_full.xml.gz" ),
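
The getPath() scheme isn't shown in the snippet, but balrog's earlier example path (w/wo/wowwiki_...) suggests it prefixes the filename with the first letter and then the first two letters of the database name. A sketch of reconstructing a dump URL on that assumption; the exact rules (e.g. for names starting with digits) are not verified here:

    # Rebuild the public S3 URL for a wiki's XML dump, mirroring the
    # w/wo/wowwiki-style prefixing inferred from the example above.
    def dump_url(dbname: str, kind: str = "pages_full") -> str:
        filename = f"{dbname}_{kind}.xml.gz"
        return f"http://s3.amazonaws.com/wikia_xml_dumps/{dbname[0]}/{dbname[:2]}/{filename}"

    print(dump_url("wowwiki", "pages_current"))
    # -> http://s3.amazonaws.com/wikia_xml_dumps/w/wo/wowwiki_pages_current.xml.gz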
18:43 <balrog> don't see anything for images
18:45 <pft> yeah, they appear as tars in the link Nemo_bis pasted
18:45 <pft> i'm guessing that was more of a manual thing they did
18:45 <balrog> "Wikia does not perform dumps of images (but see m:Wikix)."
18:45 <balrog> http://meta.wikimedia.org/wiki/Wikix
18:46 <balrog> ...interesting
18:46 <balrog> that will extract and grab all images in an xml dump
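
Wikix itself is a C program; a rough Python sketch of the same idea, scanning the dump's wikitext for image links and fetching each file through MediaWiki's Special:FilePath redirect (the regex, dump filename, and base URL are simplifications and assumptions, not Wikix's actual code):

    import re
    import urllib.parse
    import urllib.request

    BASE = "http://en.memory-alpha.org/wiki/Special:FilePath/"  # example wiki
    IMAGE_LINK = re.compile(r"\[\[(?:File|Image):([^|\]]+)", re.IGNORECASE)

    # Collect distinct image names referenced anywhere in the dump.
    names = set()
    with open("memoryalpha_pages_full.xml", encoding="utf-8") as dump:
        for line in dump:
            names.update(m.strip() for m in IMAGE_LINK.findall(line))

    # Special:FilePath redirects to the actual file URL.
    for name in sorted(names):
        url = BASE + urllib.parse.quote(name.replace(" ", "_"))
        urllib.request.urlretrieve(url, name.replace("/", "_"))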
18:46 <pft> nice!
18:46 <Nemo_bis> pft: it was not manual
18:46 <pft> ah, ok
18:47 <Nemo_bis> wikix is horribly painful
18:47 <Nemo_bis> and it's not designed to handle 300k wikis
18:47 <pft> ahhh
18:47 <pft> sorry, i realize this is all ground you have covered before
18:48 <balrog> Nemo_bis: really? :/
18:48 <pft> just trying to figure out how to help
18:48 <balrog> Nemo_bis: is there a reference to what's been done?
18:55 <Nemo_bis> balrog: about?
18:55 <balrog> with regard to what tools have been tested and such
18:56 <Nemo_bis> for what
18:57 <balrog> dumping large wikis
18:57 <Nemo_bis> most Wikia wikis are very tiny
18:57 <Nemo_bis> there isn't much to test, we only need to see if Wikia is helpful or not
18:58 <Nemo_bis> if it's not helpful, we'll have to run dumpgenerator on all their 350k wikis to get all the text and images
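
That brute-force route would look roughly like this: loop WikiTeam's dumpgenerator.py over a list of wikis. The --api, --xml, and --images flags are dumpgenerator's standard options; wikis.txt (one api.php URL per line) is a hypothetical input file:

    import subprocess

    with open("wikis.txt") as f:
        for api in (line.strip() for line in f if line.strip()):
            # One full text+image dump per wiki; keep going past failures.
            subprocess.run(
                ["python", "dumpgenerator.py", f"--api={api}", "--xml", "--images"],
                check=False,
            )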
18:58 <balrog> ouch
18:58 <Nemo_bis> but that's not particularly painful, just a bit boring
18:58 <balrog> how difficult would it be to submit a PR to their repo that would cause images to also be archived?
18:59 <Nemo_bis> unless they go really rogue and disable the API or so, which I don't think they'd do
18:59 <Nemo_bis> they allegedly have problems with space
18:59 <balrog> how many wikis have we run into which have disabled API access?
18:59 <Nemo_bis> this is probably what the seniors have to discuss, whether to spend $10 instead of $5 for the space on S3 :)
18:59 <Nemo_bis> thousands
18:59 <balrog> how do we dump those? :/
18:59 <Nemo_bis> with the pre-API method
19:00 <Nemo_bis> Special:Export
19:00 <Nemo_bis> some disable even that, but it's been only a couple wikis so far
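
A minimal sketch of that pre-API method: MediaWiki's Special:Export page returns the XML for the requested titles even where api.php is disabled. The wiki URL and page title are placeholders:

    import requests

    resp = requests.get(
        "http://wiki.example.com/index.php",
        params={
            "title": "Special:Export",
            "pages": "Main_Page",  # newline-separated page titles
            "history": "1",        # include full revision history
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open("export.xml", "wb") as out:
        out.write(resp.content)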
19:00 <pft> i tried to grab memory-alpha but couldn't find the api page for it before i did more reading and found that I could download the dump
19:00 <Nemo_bis> usually the problem with wiki sysadmins is stupidity, not malice
19:04 <xmc> same with forums, too
19:06 <Nemo_bis> :)
19:08 <balrog> what's the best way to dump forums though? they're not as rough on wget at least
19:11 <pft> we need to start contributing to open-source projects to put in easy backup features that are publicly enabled by default ;)
19:12 <Nemo_bis> pft: you're welcome :) https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&query_format=advanced&component=Export%2FImport
19:13 <pft> nice