[06:15] FYI, I'm getting about 1100 new dumps per day on dumps.wikia.net
[06:16] If it goes on like this, in 32 weeks we're done O_o
[06:26] awesome!
[06:26] done with wikia, that is :P
[06:27] and I am going to download all those when they get up on IA ;)
[06:44] omf_: why don't you instead help me by downloading them all and uploading to IA? :)
[06:44] like http://archive.org/details/wikia_dump_20121204
[06:44] it's just recursive wget and some zipping :)
[06:46] I can help a little this month and more next month. See, I already blew my budget for the month on screenshots of posterous and only got so far
[06:46] I am trying to keep costs down since I used to rack them up pretty high
[06:47] My home internet is shit
[06:48] I tried the higher speeds before, but the quality of service was shittier, so I downgraded.
[06:48] Ah, not the best person to ask then. :)
[06:48] Posterous was different
[06:49] that required a lot of CPU power, so it cost more
[06:49] I can do it, I just have to format this stupid external SATA hdd
[06:49] how big a chunk are we talking about?
[06:49] size wise
[06:49] no chunks
[06:49] just a wget of some 300-400 GB
[06:50] I can do it myself, no worries
[06:50] aww, underscor is not even in this channel any longer, traitor :|
[10:28] as soon as I fix this I'll archive the Wikia dumps http://p.defau.lt/?eeBaa4Jb1zuYi0JR0DPZYg
[16:19] Nemo_bis: I'm not?
[16:19] SketchCow: ops spread
[16:20] * Nemo_bis faceplams
[16:20] (twice for typos)
[16:20] underscor: so can you do a wget -r of dumps.wikia.net and upload the zips to archive.org? :D
[16:22] I suppose ;p
[16:22] Do these include the media?
[16:46] underscor: they should, but it's not clear to me under what conditions they are created
[16:46] I guess most wikis just don't have any images
[16:47] The last few times I just made a .zip for each letter https://archive.org/details/wikia_dump_20121204
[16:51] I too have noticed many of the wikia community sites are seriously lacking images; I always put it down to not having enough volunteers
[16:52] just getting a good skeleton of data into place takes a lot of effort
[16:57] well, or they just use Wikimedia Commons
[17:00] Wikimedia Commons has pretty harsh standards for inclusion; that doesn't work for content sites like Star Trek, where there is no CC content unless it was fan-generated at a conference
[17:00] most wikia sites that have images do not own the images, at least from what I have observed of the sci-fi wikia sites
[17:48] WHAT DID YOU ALL DO
[17:51] I did nothing wrong!
[17:56] :-|
[18:52] I blame poor training by our leader
[19:11] underscor, have you backed up any semantic mediawiki wikis? Is there anything special one needs to do?
[19:12] semantic?
[19:12] I'm not familiar :c
[19:15] sethish: afaik, no
[19:15] everything comes from the wiki pages and can be rebuilt from there
[19:16] the question is rather what it takes to *import* a semantic wiki
[19:53] mmm, how are y'all generating dumps these days? Scrape script, or do you have a mediawiki-api wrapper to dump wikimarkup?
[19:53] Both are important, scrape always works
[19:54] underscor, I'm also isforinsects, we've spoken before. I did the backup of Encyclopedia Dramatica a few years ago that ended up getting used to rebuild the site
[19:58] we use dumpgenerator.py
[19:58] we'd like to use the API more
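For reference: dumpgenerator.py is WikiTeam's dump script. A minimal invocation against a wiki's MediaWiki API endpoint looks roughly like the sketch below; the wiki URL and dump directory name are made-up examples, and exact flags can vary between versions.

    # Export full page histories as XML, plus all images, via the wiki's API.
    python dumpgenerator.py --api=http://muppet.wikia.com/api.php --xml --images

    # Resume an interrupted dump from its existing dump directory.
    python dumpgenerator.py --api=http://muppet.wikia.com/api.php --xml --images \
        --resume --path=muppetwikiacom-20130101-wikidump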
[20:00] Have you ever done any mass upload scripts? I have a big collection of images from the CDC that I need to get over to wiki commons and I would love help
[20:02] cult of the dead cow or center for disease control?
[20:02] :P
[20:08] Centers for Disease Control (and Prevention)
[20:08] It's the Public Health Image Library
[20:08] I scraped it ages ago
[20:08] With good metadata
[20:08] nice
[20:13] sethish, I have mass upload scripts for the Internet Archive using their S3 API, if you want to upload it there too :)
[20:27] sethish: for Commons we have several tools
[20:27] there's the classic uploader.py
[20:28] https://outreach.wikimedia.org/wiki/GLAM/Resources/Tools
[20:29] https://commons.wikimedia.org/wiki/Commons:Batch_uploading#Tools
[20:30] then I think ingester.py or so on PWB
[20:31] ingester.py makes me think of a chipper-shredder
[20:35] Didn't we need an Encyclopedia Dramatica grab?
[20:36] did we?
[20:36] I made one some time ago, iirc
[20:57] I just remember being asked about it more than once
[20:57] how old is your backup, Nemo_bis?
[21:00] just search? :)
[21:02] https://archive.org/details/wiki-encyclopediadramatica.ch
[21:03] Publicdate: 2012-02-29 08:38:47
[21:03] should be this
[21:10] My ED dump is from 2011
[21:11] ED.ch posts links to several more recent dumps
[21:29] upload them then
[21:32] Is this their new official home? http://dramatica.in/
[21:37] no idea
[21:45] That URL before was just a landing page for https://encyclopediadramatica.se/Main_Page
[21:50] I think encyclopediadramatica.ch is the canonical URL
[22:01] OpenDNS says that encyclopediadramatica.ch does not exist
[22:01] hence why I am poking around at it
[22:08] ED is starting to move TLDs as often as TPB
[22:37] Oh, I had it backwards
[22:38] it _used_ to be .ch, and moved to .se
[22:38] sorry
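A sketch of the "recursive wget and some zipping" flow from [06:44] and [16:47], for reference. That the mirror groups wikis under one directory per leading letter is an assumption, inferred from the per-letter zips in the wikia_dump_20121204 item.

    # Mirror the whole dump tree (some 300-400 GB at the time of this log).
    wget -r -np -nH http://dumps.wikia.net/

    # Pack one zip per leading letter, matching the wikia_dump_20121204 layout.
    for letter in {a..z}; do
        zip -r "wikia_dump_${letter}.zip" "${letter}/"
    done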
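The Internet Archive S3 API mentioned at [20:13] can be driven with a plain curl PUT against s3.us.archive.org; the item identifier, filename, and metadata below are invented examples, and the keys come from your archive.org account's S3 settings.

    # The first PUT auto-creates the item and attaches basic metadata;
    # "LOW accesskey:secret" is the IA-S3 authorization header format.
    curl --location \
         --header "authorization: LOW ACCESS_KEY:SECRET_KEY" \
         --header "x-amz-auto-make-bucket:1" \
         --header "x-archive-meta-mediatype:web" \
         --header "x-archive-meta-title:Wikia dumps (a)" \
         --upload-file wikia_dump_a.zip \
         http://s3.us.archive.org/wikia_dump_20130101/wikia_dump_a.zip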
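For the Commons side discussed at [20:27]-[20:30], one option besides the linked tools is pywikibot's stock upload script. This is a sketch only: the filename and description are invented, and the flag names should be checked against the script's own help output.

    # Upload one file to Wikimedia Commons, keeping its local filename;
    # a batch run would loop this over the scraped PHIL files.
    python pwb.py upload -keep "phil_01234_example.jpg" \
        "{{Information|description=CDC PHIL image 01234|source=https://phil.cdc.gov/|author=CDC/PHIL}}"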