[08:18] http://www.archive.org/search.php?query=subject%3A%22wikiteam%22
[20:13] * Nemo_bis uploading a 20 GiB image dump of a wiki to IA :-O
[20:14] emijrp, http://www.archive.org/search.php?query=subject%3A%22wikiteam%22
[20:15] nice
[20:16] im going to contact jason scott, to try add wikiteam items onto archive team collection http://www.archive.org/details/archiveteam
[20:17] Yes, I wanted to ask you about it.
[20:17] I tried to upload them there but it says I have no permission (obviously).
[20:18] By the way, I'm using http://www.archive.org/help/abouts3.txt after trying browser and FTP; it's sooo sweet
[20:20] i check .xml before upload them to google code
[20:20] i use this line
[20:20] inside dump directory
[20:20] wc -l *.txt;grep "<title>" *.xml -c;grep "<page>" *.xml -c;grep "</page>" *.xml -c;grep "<revision>" *.xml -c;grep "</revision>" *.xml -c
[20:21] <emijrp> it first 3 numbers are equal, and last two too, them, the dump is ok
[20:21] <emijrp> most times there is no problem
[20:21] <emijrp> but, check them before upload
[20:22] <emijrp> you can remove the first wc -l
[20:22] <emijrp> grep "<title>" *.xml -c;grep "<page>" *.xml -c;grep "</page>" *.xml -c;grep "<revision>" *.xml -c;grep "</revision>" *.xml -c
[20:22] <emijrp> if first 3 numbers*
[20:43] <Nemo_bis> Thank you.
[20:43] <Nemo_bis> Did my dumps have problems?
[20:45] <emijrp> i have no check them
[20:46] <Nemo_bis> ok. It would be useful to add to the Tutorials which changes/checks you do on dumps before uploading them, so we can save you some time.
[20:46] <Nemo_bis> It's also quite boring to uncompress multi-GiB 7z archives to run grep on them.
[20:47] <emijrp> not much, only the integrity check
[20:48] <Nemo_bis> that is?
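For readers without a shell at hand, the integrity check from the grep one-liner above (the `<title>`, `<page>` and `</page>` counts should match, and so should the `<revision>` and `</revision>` counts) can be approximated in Python. This is a minimal sketch, not the tool used in the WikiTeam tutorial; the names `count_tags` and `counts_ok` are hypothetical. Like `grep -c`, it counts matching lines, which assumes each tag sits on its own line as in MediaWiki XML exports, and it skips the optional `wc -l *.txt` title-list comparison.

```python
import glob

# Tags counted by the grep one-liner. The first three counts should match
# each other (one title per page), and so should the last two.
TAGS = ("<title>", "<page>", "</page>", "<revision>", "</revision>")


def count_tags(lines):
    """Count lines containing each tag, mimicking `grep -c TAG`."""
    counts = dict.fromkeys(TAGS, 0)
    for line in lines:
        for tag in TAGS:
            if tag in line:
                counts[tag] += 1
    return counts


def counts_ok(counts):
    """Rule from the log: titles == opened pages == closed pages,
    and opened revisions == closed revisions."""
    pages_ok = counts["<title>"] == counts["<page>"] == counts["</page>"]
    revisions_ok = counts["<revision>"] == counts["</revision>"]
    return pages_ok and revisions_ok


if __name__ == "__main__":
    # Run inside the dump directory. Files are read line by line,
    # so multi-GiB dumps are never loaded into memory at once.
    for path in sorted(glob.glob("*.xml")):
        with open(path, encoding="utf-8", errors="replace") as f:
            counts = count_tags(f)
        print(path, counts, "ok" if counts_ok(counts) else "MISMATCH")
```

Wikitext inside a dump is XML-escaped (a literal `<page>` in an article appears as `&lt;page&gt;`), so page content should not produce false matches.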
[20:51] <emijrp> the grep thing
[20:51] <emijrp> last section http://code.google.com/p/wikiteam/wiki/Tutorial
[20:55] <emijrp> 30513
[20:55] <emijrp> 30513
[20:55] <emijrp> 75202
[20:55] <Nemo_bis> yes, I understand how it works
[20:55] * Nemo_bis loves grep
[20:55] <emijrp> infictivecom-20110712-history is ok
[20:58] <Nemo_bis> what CPU do you have?
[20:58] <Nemo_bis> mine is very slow, but I could uncompress some of them and check myself
[20:58] <emijrp> dualcore
[20:58] <emijrp> 2ghz
[20:59] <emijrp> acwikkiinet_w-20110712-history is ok too
[21:00] <Nemo_bis> there's also https://docs.google.com/leaf?id=0By9Ct0yopDdVNzExNzIxMWQtN2Q5Ny00NzQzLTgyOWQtMTdkZjcwNDNhY2E0&hl=it
[21:01] <emijrp> what the hell you saved 911dataset? that site had 700k pages
[21:01] <emijrp> sure you did to entirely?
[21:02] <Nemo_bis> took a while but it was quite fast
[21:02] <Nemo_bis> not completely sure, though
[21:03] <emijrp> by the way the site is now offline
[21:03] <emijrp> i hope you did not waste their bandwidth lol
[21:03] <Nemo_bis> :-D
[21:03] <Nemo_bis> 6 days
[21:04] <Nemo_bis> but only 274 MB ??
[21:04] <Nemo_bis> perhaps something got lost
[21:05] <Nemo_bis> Can you access http://www.wikilib.com/ ? Either it's offline or they blocked my IP
[21:05] <Nemo_bis> couldn't complete download
[21:05] <emijrp> what connection do you have?
[21:06] <Nemo_bis> 10 Mb/s full duplex, fiber
[21:06] <Nemo_bis> Fastweb (Milan)
[21:07] <Nemo_bis> If those websites had enough bandwidth I'd use some PC at university with 60-100 Mb/s connection...
[21:07] <emijrp> openwetwareorg-20110712-history is ok
[21:08] <emijrp> sometimes the bottleneck is on cpu, special:export is resources consuming
[21:09] <emijrp> boobpedia crashes if you dont add a delay
[21:09] <emijrp> wikilib error 503
[21:10] <emijrp> i developed a script to repair corrupt dumps
[21:11] <emijrp> it removes corrupt pages
[21:11] <emijrp> but i want to test it before uploading it to svn
[21:13] <Nemo_bis> ok
[21:14] <emijrp> dumpgenerator.py includes some checks before merge a page to the whole dump
[21:14] <emijrp> but man, this is a script to recover wikis, not websites with 100,000 pages some of them with hundreds of revisions
[21:15] <emijrp> : P
[21:16] <emijrp> a user reported an issue because he cant download all the images of English Wikipedia using dumpgenerator.py, maaaannn
[21:16] <Nemo_bis> :-D
[21:17] <Nemo_bis> just had a memoryerror on some huge page
[21:17] <Nemo_bis> but was only 900 MiB RAM, :-p I had more free :-/
[21:19] <emijrp> you saved some big wikis
[21:20] <emijrp> wikieducator users requesting dump http://wikieducator.org/WikiEducator:Community_Council/Meetings/First/Motiondump
[21:21] <Nemo_bis> lol
[21:21] <Nemo_bis> didn't have much support
[21:21] <Nemo_bis> wowpedia dump is currently at 12 Gib and increasing
[21:21] <Nemo_bis> *GiB
[21:22] <emijrp> you didnt upload images for wikieducator
[21:22] <emijrp> wowpedia is a wikia site? they offer backups in that case
[21:23] <Nemo_bis> I'm uploading still images
[21:23] <emijrp> i use to search of google like this: dump site:wikieducator.org
[21:24] <emijrp> looking for official dumps
[21:24] <Nemo_bis> yes
[21:24] <emijrp> before run the script
[21:24] <Nemo_bis> www.wowpedia.org doesn't seem to be Wikia
[21:24] <emijrp> wikieducatororg-20110712-history is ok
[21:25] <Nemo_bis> argh, why do people let pages grow at 20000 150 KB revisions??
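The repair script mentioned at 21:10 had not been published at this point, so its real behaviour is not known from the log. Assuming "removes corrupt pages" means dropping every `<page>` block that is not well-formed XML (for example one cut off mid-revision), the idea can be sketched as below. `repair_dump` and `is_well_formed` are hypothetical names; the same well-formedness test is one plausible kind of check to run before merging a page into a dump, though the checks in dumpgenerator.py may differ. This sketch keeps the output in a list; a version for multi-GiB dumps would write pages out as it goes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(block):
    """True if the text parses as a single XML element."""
    try:
        ET.fromstring(block)
        return True
    except ET.ParseError:
        return False


def repair_dump(lines):
    """Return (kept_lines, dropped_count) for a MediaWiki XML dump.

    Lines outside <page>...</page> (header, <siteinfo>, footer) are kept
    unchanged; each page block is kept only if it is well-formed XML.
    """
    kept, page, dropped = [], None, 0
    for line in lines:
        if "<page>" in line:
            if page is not None:          # previous page was never closed
                dropped += 1
            page = [line]
        elif page is not None and "</mediawiki>" in line:
            dropped += 1                  # dump footer reached inside a page
            page = None
            kept.append(line)
        elif page is not None:
            page.append(line)
            if "</page>" in line:
                if is_well_formed("".join(page)):
                    kept.extend(page)
                else:
                    dropped += 1
                page = None
        else:
            kept.append(line)
    if page is not None:                  # file truncated mid-page, no footer
        dropped += 1
    return kept, dropped
```

Usage would look like `kept, dropped = repair_dump(open(path, encoding="utf-8"))`, then writing `"".join(kept)` to a new file and re-running the grep count check on it.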
[21:26] <Nemo_bis> I need all my RAM for that page .-/
[21:27] <Nemo_bis> I think this project could be a threat: dump your wiki or I'll DoS it, with a good use. .-p
[21:29] <emijrp> also, you can add links to backups in wikiindex
[21:30] <Nemo_bis> boooring
[21:30] <Nemo_bis> We need some automated system for this, sooner or later
[21:30] <Nemo_bis> (I mean, the whole project)
[21:31] <emijrp> look the infobox last parameter http://wikiindex.org/Rezepte-Wiki
[21:33] <Nemo_bis> yes. wikiindex also needs an API parameter in infobox
[21:34] <Nemo_bis> although some evil wikis disable API or are just too outdated
[21:39] <emijrp> thats a problem
[21:42] <emijrp> what is the limit of google doc hosting?
[21:43] <emijrp> s23org_w-20110707-wikidump is ok
[21:46] <Nemo_bis> Google Docs is only 1 GiB
[21:46] <Nemo_bis> although you can expand
[21:46] <Nemo_bis> I have several Google accounts :-p
[21:58] <emijrp> strategywiki is ok
[22:14] <Nemo_bis> for some reason can't export http://wiki.guildwars.com/index.php?title=Guild:Its_Mocha_(historical)&action=history
[22:14] <Nemo_bis> (only last revision export works)
[22:17] <emijrp> stupidediaorg-20110712-history is ok
[22:17] <emijrp> all the dumps you uploaded to IA are ok, please check future dumps before upload them
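The memory problem raised at 21:17 and 21:25 is easy to size: a page with 20,000 revisions of about 150 KB each carries roughly 20,000 × 150 KB = 3,000,000 KB, about 3 GB of wikitext, far more than 900 MiB of free RAM. One way to inspect such a page without holding its full history at once is to stream the XML and discard each revision after reading it. The sketch below uses the standard library's `xml.etree.ElementTree.XMLPullParser`; `revision_stats`, `end_events` and `local_name` are hypothetical helpers, and nothing here describes how dumpgenerator.py handles pages internally.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def end_events(chunks):
    """Feed byte chunks to a pull parser and yield each completed element."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            yield elem
    parser.close()
    for _event, elem in parser.read_events():
        yield elem


def revision_stats(chunks):
    """Yield (title, revision_count, text_bytes) for each <page>.

    Each <revision> is cleared right after it is counted, so its text is
    released instead of accumulating for the whole page history.
    """
    title, revisions, text_bytes = None, 0, 0
    for elem in end_events(chunks):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text_bytes += len((elem.text or "").encode("utf-8"))
        elif name == "revision":
            revisions += 1
            elem.clear()
        elif name == "page":
            yield title, revisions, text_bytes
            elem.clear()
            title, revisions, text_bytes = None, 0, 0


if __name__ == "__main__":
    # Read the dump given on the command line in 1 MiB chunks.
    with open(sys.argv[1], "rb") as f:
        for row in revision_stats(iter(lambda: f.read(1 << 20), b"")):
            print(*row, sep="\t")
```

Memory use is then bounded roughly by one revision plus small per-page bookkeeping, and the per-page output makes it easy to spot pages with unusually long histories before attempting a full export.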