Time |
Nickname |
Message |
06:15
|
Nemo_bis |
FYI, I'm getting about 1100 new dumps per day on dumps.wikia.net |
06:16
|
Nemo_bis |
If it goes on like this, in 32 weeks we're done O_o |
06:26
|
cmx |
awesome! |
06:26
|
cmx |
done with wikia, that is :P |
06:27
|
omf_ |
and I am going to download all those when they get up on IA ;) |
06:44
|
Nemo_bis |
omf_: why don't you instead help me by downloading them all and uploading to IA? :) |
06:44
|
Nemo_bis |
like http://archive.org/details/wikia_dump_20121204 |
06:44
|
Nemo_bis |
it's just recursive wget and some zipping :) |
06:46
|
omf_ |
I can help a little this month and more next month. See I already blew my budget for the month on screenshots of posterous and only got so far |
06:46
|
omf_ |
I am trying to keep costs down since I used to rack them up pretty high |
06:47
|
omf_ |
My home internet is shit |
06:48
|
omf_ |
I tried the higher speeds before but the quality of service was shittier so I downgraded. |
06:48
|
Nemo_bis |
Ah, not the best person to ask then. :) |
06:48
|
omf_ |
Posterous was different |
06:49
|
omf_ |
that required a lot of CPU power so it cost more |
06:49
|
Nemo_bis |
I can do it, I just have to format this stupid external sata hdd |
06:49
|
omf_ |
how big a chunk are we talking about? |
06:49
|
omf_ |
size wise |
06:49
|
Nemo_bis |
no chunks |
06:49
|
Nemo_bis |
just a wget of some 3-400 GB |
06:50
|
Nemo_bis |
I can do it myself, no worries |
06:50
|
Nemo_bis |
aww underscor is not even in this channel any longer, traitor :| |
10:28
|
Nemo_bis |
as soon as I fix this I'll archive the Wikia dumps http://p.defau.lt/?eeBaa4Jb1zuYi0JR0DPZYg |
16:19
|
underscor |
Nemo_bis: I'm not? |
16:19
|
underscor |
SketchCow: ops spread |
16:20
|
* |
Nemo_bis faceplams |
16:20
|
Nemo_bis |
(twice for typos) |
16:20
|
Nemo_bis |
underscor: so can you do a wget -r of dumps.wikia.net and upload the zips to archive.org? :D |
16:22
|
underscor |
I suppose ;p |
16:22
|
underscor |
Do these include the media? |
16:46
|
Nemo_bis |
underscor: they should but it's not clear to me under what conditions they are created |
16:46
|
Nemo_bis |
I guess most wikis just don't have any images |
16:47
|
Nemo_bis |
Last time I just made a .zip for each letter https://archive.org/details/wikia_dump_20121204 |
16:51
|
omf_ |
I too have noticed many of the wikia community sites are seriously lacking images, I always put it down to not having enough volunteers |
16:52
|
omf_ |
just getting a good skeleton of data into place takes a lot of effort |
16:57
|
Nemo_bis |
well, or they just use Wikimedia Commons |
17:00
|
omf_ |
wikimedia commons has pretty harsh standards for inclusion, that doesn't work on content sites like Star Trek where there is no CC content unless it was fan generated at a conference |
17:00
|
omf_ |
most wikia sites that have images do not own the images at least from what I have observed of the Scifi wikia sites |
17:48
|
SketchCow |
WHAT DID YOU ALL DO |
17:51
|
cmx |
I did nothing wrong! |
17:56
|
sethish |
:-| |
18:52
|
underscor |
I blame poor training by our leader |
19:11
|
sethish |
underscor, have you backed up any semantic mediawiki wikis? Is there anything special one needs to do? |
19:12
|
underscor |
semantic? |
19:12
|
underscor |
I'm not familiar :c |
19:15
|
Nemo_bis |
sethish: afaik, no |
19:15
|
Nemo_bis |
everything comes from the wiki pages and can be rebuilt from there |
19:16
|
Nemo_bis |
the question is rather what it takes to *import* a semantic wiki |
19:53
|
sethish |
mmm, how are y'all generating dumps these days? Scrape script, or do you have a mediawiki-api wrapper to dump wikimarkup? |
19:53
|
sethish |
Both are important, scrape always works |
19:54
|
sethish |
underscor, I'm also isforinsects, we've spoken before. I did the backup of Encyclopedia Dramatica a few years ago that ended up getting used to rebuild the site |
19:58
|
Nemo_bis |
we use dumpgenerator.py |
19:58
|
Nemo_bis |
we'd like to use the API more |
20:00
|
sethish |
Have you ever done any mass upload scripts? I have a big collection of images from the CDC that I need to get over to wiki commons and I would love help |
20:02
|
cmx |
cult of dead cow or center for disease control? |
20:02
|
cmx |
:P |
20:08
|
sethish |
Center for Disease Control (and Prevention) |
20:08
|
sethish |
It's the Public Health Image Library |
20:08
|
sethish |
I scraped it ages ago |
20:08
|
sethish |
With good metadata |
20:08
|
cmx |
nice |
20:13
|
omf_ |
sethish, I have mass upload scripts for the Internet Archive using their s3 api if you want to upload it there too :) |
20:27
|
Nemo_bis |
sethish: for Commons we have several tools |
20:27
|
Nemo_bis |
there's the classic uploader.py |
20:28
|
Nemo_bis |
https://outreach.wikimedia.org/wiki/GLAM/Resources/Tools |
20:29
|
Nemo_bis |
https://commons.wikimedia.org/wiki/Commons:Batch_uploading#Tools |
20:30
|
Nemo_bis |
then I think ingester.py or so on PWB |
20:31
|
cmx |
ingester.py makes me think of a chipper-shredder |
20:35
|
omf_ |
Didn't we need an Encyclopedia Dramatica grab? |
20:36
|
Nemo_bis |
did we? |
20:36
|
Nemo_bis |
I made one some time ago iirc |
20:57
|
omf_ |
I just remember being asked about it more than once |
20:57
|
omf_ |
how old is your backup Nemo_bis |
21:00
|
Nemo_bis |
just search? :) |
21:02
|
Nemo_bis |
https://archive.org/details/wiki-encyclopediadramatica.ch |
21:03
|
Nemo_bis |
Publicdate: 2012-02-29 08:38:47 |
21:03
|
Nemo_bis |
should be this |
21:10
|
sethish |
My ED dump is from 2011 |
21:11
|
sethish |
ED.ch posts links to several more recent dumps |
21:29
|
Nemo_bis |
upload them then |
21:32
|
omf_ |
Is this their new official home? http://dramatica.in/ |
21:37
|
Nemo_bis |
no idea |
21:45
|
omf_ |
That url before was just a landing page for https://encyclopediadramatica.se/Main_Page |
21:50
|
sethish |
I think encyclopediadramatica.ch is the canonical url |
22:01
|
omf_ |
OpenDNS says that encyclopediadramatica.ch does not exist |
22:01
|
omf_ |
hence why I am poking around at it |
22:08
|
omf_ |
ED is starting to move TLDs as often as TPB |
22:37
|
sethish |
Oh, I had it backwards |
22:38
|
sethish |
it _used_ to be .ch, and moved to .se |
22:38
|
sethish |
sorry |
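
Editor's sketch of the "some zipping" step Nemo_bis describes (one .zip per letter, after a recursive wget of dumps.wikia.net). This is only an illustration, not the script actually used: the function names, the `wikia_dump` archive prefix, the `other` bucket for non-letter names, and the choice to store files uncompressed are all assumptions. The download and the archive.org upload are left out.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def group_by_first_letter(paths):
    """Map a lowercase first letter to the files whose names start with it."""
    groups = defaultdict(list)
    for path in paths:
        first = path.name[0].lower()
        # Assumption: names not starting with a-z share one "other" bucket.
        key = first if first.isascii() and first.isalpha() else "other"
        groups[key].append(path)
    return dict(groups)


def zip_per_letter(src_dir, out_dir, prefix="wikia_dump"):
    """Write one zip per bucket into out_dir; return the archive paths."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # List the files before writing anything, so new archives are never re-read.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    created = []
    for key, members in sorted(group_by_first_letter(files).items()):
        archive = out / f"{prefix}_{key}.zip"
        # ZIP_STORED: assumes the dumps are already compressed.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for member in members:
                zf.write(member, arcname=member.relative_to(src).as_posix())
        created.append(archive)
    return created
```

Calling `zip_per_letter("dumps.wikia.net", "zips")` on a local mirror directory (hypothetical paths) would leave one archive per starting letter in `zips/`, ready to be uploaded by whatever tool is preferred.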