#wikiteam 2013-05-23,Thu


Time Nickname Message
06:15 🔗 Nemo_bis FYI, I'm getting about 1100 new dumps per day on dumps.wikia.net
06:16 🔗 Nemo_bis If it goes on like this, in 32 weeks we're done O_o
06:26 🔗 cmx awesome!
06:26 🔗 cmx done with wikia, that is :P
06:27 🔗 omf_ and I am going to download all those when they get up on IA ;)
06:44 🔗 Nemo_bis omf_: why don't you instead help me by downloading them all and uploading to IA? :)
06:44 🔗 Nemo_bis like http://archive.org/details/wikia_dump_20121204
06:44 🔗 Nemo_bis it's just recursive wget and some zipping :)
06:46 🔗 omf_ I can help a little this month and more next month. See I already blew my budget for the month on screenshots of posterous and only got so far
06:46 🔗 omf_ I am trying to keep costs down since I used to rack them up pretty high
06:47 🔗 omf_ My home internet is shit
06:48 🔗 omf_ I tried the higher speeds before but the quality of service was shittier so I downgraded.
06:48 🔗 Nemo_bis Ah, not the best person to ask then. :)
06:48 🔗 omf_ Posterous was different
06:49 🔗 omf_ that required a lot of CPU power so it cost more
06:49 🔗 Nemo_bis I can do it, I just have to format this stupid external sata hdd
06:49 🔗 omf_ how big a chunk are we talking about?
06:49 🔗 omf_ size wise
06:49 🔗 Nemo_bis no chunks
06:49 🔗 Nemo_bis just a wget of some 3-400 GB
06:50 🔗 Nemo_bis I can do it myself, no worries
06:50 🔗 Nemo_bis aww underscor is not even in this channel any longer, traitor :|
10:28 🔗 Nemo_bis as soon as I fix this I'll archive the Wikia dumps http://p.defau.lt/?eeBaa4Jb1zuYi0JR0DPZYg
16:19 🔗 underscor Nemo_bis: I'm not?
16:19 🔗 underscor SketchCow: ops spread
16:20 🔗 * Nemo_bis faceplams
16:20 🔗 Nemo_bis (twice for typos)
16:20 🔗 Nemo_bis underscor: so can you do a wget -r of dumps.wikia.net and upload the zips to archive.org? :D
16:22 🔗 underscor I suppose ;p
16:22 🔗 underscor Do these include the media?
16:46 🔗 Nemo_bis underscor: they should but it's not clear to me under what conditions they are created
16:46 🔗 Nemo_bis I guess most wikis just don't have any images
16:47 🔗 Nemo_bis Last time I just made a .zip for each letter https://archive.org/details/wikia_dump_20121204
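
The "one .zip per letter" packaging mentioned above could look roughly like the sketch below. It assumes the recursive wget has already produced a local mirror directory; the function names, the `wikia_dump` prefix, and the rule that files are grouped by the first character of their file name are illustrative assumptions, not the script actually used for the 2012 upload.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def group_by_first_letter(paths):
    """Map the lowercase first character of each file name to its paths.

    Names that start with a non-alphanumeric character go under "_".
    (Hypothetical grouping rule; the log only says "a .zip for each letter".)
    """
    groups = defaultdict(list)
    for path in paths:
        name = Path(path).name
        key = name[0].lower() if name[0].isalnum() else "_"
        groups[key].append(Path(path))
    return dict(groups)


def zip_by_letter(mirror_dir, out_dir, prefix="wikia_dump"):
    """Write one zip archive per first letter and return the archive paths."""
    mirror_dir, out_dir = Path(mirror_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in mirror_dir.rglob("*") if p.is_file())
    archives = []
    for letter, members in sorted(group_by_first_letter(files).items()):
        archive = out_dir / f"{prefix}_{letter}.zip"
        # ZIP_STORED skips recompression, on the assumption that the
        # downloaded dumps are already compressed files.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for member in members:
                # Keep the path relative to the mirror root inside the zip.
                zf.write(member, member.relative_to(mirror_dir).as_posix())
        archives.append(archive)
    return archives
```

Calling `zip_by_letter("dumps.wikia.net", "zips")` on a finished mirror would then leave archives such as `zips/wikia_dump_a.zip` ready for upload.
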
16:51 🔗 omf_ I too have noticed many of the wikia community sites are seriously lacking images, I always put it down to not having enough volunteers
16:52 🔗 omf_ just getting a good skeleton of data into place takes a lot of effort
16:57 🔗 Nemo_bis well, or they just use Wikimedia Commons
17:00 🔗 omf_ wikimedia commons has pretty harsh standards for inclusion, that doesn't work on content sites like Star Trek where there is no CC content unless it was fan generated at a conference
17:00 🔗 omf_ most wikia sites that have images do not own the images at least from what I have observed of the Scifi wikia sites
17:48 🔗 SketchCow WHAT DID YOU ALL DO
17:51 🔗 cmx I did nothing wrong!
17:56 🔗 sethish :-|
18:52 🔗 underscor I blame poor training by our leader
19:11 🔗 sethish underscor, have you backed up any semantic mediawiki wikis? Is there anything special one needs to do?
19:12 🔗 underscor semantic?
19:12 🔗 underscor I'm not familiar :c
19:15 🔗 Nemo_bis sethish: afaik, no
19:15 🔗 Nemo_bis everything comes from the wiki pages and can be rebuilt from there
19:16 🔗 Nemo_bis the question is rather what it takes to *import* a semantic wiki
19:53 🔗 sethish mmm, how are y'all generating dumps these days? Scrape script, or do you have a mediawiki-api wrapper to dump wikimarkup?
19:53 🔗 sethish Both are important, scrape always works
19:54 🔗 sethish underscor, I'm also isforinsects, we've spoken before. I did the backup of Encyclopedia Dramatica a few years ago that ended up getting used to rebuild the site
19:58 🔗 Nemo_bis we use dumpgenerator.py
19:58 🔗 Nemo_bis we'd like to use the API more
20:00 🔗 sethish Have you ever done any mass upload scripts? I have a big collection of images from the CDC that I need to get over to wiki commons and I would love help
20:02 🔗 cmx cult of dead cow or center for disease control?
20:02 🔗 cmx :P
20:08 🔗 sethish Center for Disease Control (and Prevention)
20:08 🔗 sethish It's the Public Health Image Library
20:08 🔗 sethish I scraped it ages ago
20:08 🔗 sethish With good metadata
20:08 🔗 cmx nice
20:13 🔗 omf_ sethish, I have mass upload scripts for the Internet Archive using their s3 api if you want to upload it there too :)
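
omf_'s scripts are not shown in the log. As a rough illustration of what an upload through the Internet Archive's S3-like API involves, the sketch below only builds the PUT request and does not send it. The helper name, the item identifier, and the metadata values are hypothetical; the endpoint and header names follow the IAS3 conventions as I understand them, so check them against the current IA documentation before relying on this.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed IAS3 endpoint; one item ("bucket") per identifier.
IAS3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, data,
                         access_key, secret_key, metadata=None):
    """Build (but do not send) a PUT request for one file of one item."""
    headers = {
        # IAS3 uses a simple "LOW key:secret" authorization scheme.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask the archive to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    # Item metadata travels as x-archive-meta-<field> headers.
    # Non-ASCII values would need extra encoding, which this sketch omits.
    for field, value in (metadata or {}).items():
        headers[f"x-archive-meta-{field}"] = str(value)
    url = f"{IAS3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    return Request(url, data=data, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; a mass uploader would loop this over every file and add retries.
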
20:27 🔗 Nemo_bis sethish: for Commons we have several tools
20:27 🔗 Nemo_bis there's the classic uploader.py
20:28 🔗 Nemo_bis https://outreach.wikimedia.org/wiki/GLAM/Resources/Tools
20:29 🔗 Nemo_bis https://commons.wikimedia.org/wiki/Commons:Batch_uploading#Tools
20:30 🔗 Nemo_bis then I think ingester.py or so on PWB
20:31 🔗 cmx ingester.py makes me think of a chipper-shredder
20:35 🔗 omf_ Didn't we need an Encyclopedia Dramatica grab?
20:36 🔗 Nemo_bis did we?
20:36 🔗 Nemo_bis I made one some time ago iirc
20:57 🔗 omf_ I just remember being asked about it more than once
20:57 🔗 omf_ how old is your backup Nemo_bis
21:00 🔗 Nemo_bis just search? :)
21:02 🔗 Nemo_bis https://archive.org/details/wiki-encyclopediadramatica.ch
21:03 🔗 Nemo_bis Publicdate: 2012-02-29 08:38:47
21:03 🔗 Nemo_bis should be this
21:10 🔗 sethish My ED dump is from 2011
21:11 🔗 sethish ED.ch posts links to several more recent dumps
21:29 🔗 Nemo_bis upload them then
21:32 🔗 omf_ Is this their new official home? http://dramatica.in/
21:37 🔗 Nemo_bis no idea
21:45 🔗 omf_ That url before was just a landing page for https://encyclopediadramatica.se/Main_Page
21:50 🔗 sethish I think encyclopediadramatica.ch is the canonical url
22:01 🔗 omf_ OpenDNS says that encyclopediadramatica.ch does not exist
22:01 🔗 omf_ hence why I am poking around at it
22:08 🔗 omf_ ED is starting to move TLDs as often as TPB
22:37 🔗 sethish Oh, I had it backwards
22:38 🔗 sethish it _used_ to be .ch, and moved to .se
22:38 🔗 sethish sorry
