[01:37] Nemo_bis: is http://citywiki.ugr.es/w/api.php down for you?
[05:02] underscor: yes, did you DOS it?
[05:17] Nemo_bis: no, haven't been able to start because it was down
[05:17] was/is
[05:17] underscor: ok
[05:18] underscor: can you claim emijrp's batches?
[05:18] sure, I can, but I don't have a free box atm
[05:18] I assume that's not a problem
[05:19] I mean, waiting until I finish one of the other 7 lists I'm running or w/e
[05:19] my downstream is maxed
[05:19] sure
[05:19] nice, how much are you downloading?
[05:20] 67GB in ~2.5 days
[05:25] underscor: how do I launch a command in screen -dm so that its output is redirected to a file?
[05:25] I use tmux
[05:25] no idea
[05:25] sorry
[05:25] hm
[05:25] ok
[05:26] (and I just do command > filename 2>&1
[05:26] )
[05:28] underscor: can you do something about the uploader's todos? :)
[05:31] perhaps, we'll see how much time I have
[07:30] underscor: Nemo_bis http://archive.org/details/wiki-citywiki.ugr.es city wiki was dumped before, it is huge, stop your thread
[07:31] emijrp: it's down anyway
[07:31] ok
[07:31] perhaps the panic download was just a time
[07:32] in*
[07:34] mhm!
[07:38] I think that wiki has frequent problems
[07:38] emijrp: where did I reupload it?
[07:38] I'm pretty sure I had deleted the file
[07:39] Nemo_bis: it was a notice for underscor
[07:40] emijrp: ah ok
[07:41] emijrp: I'm not sure it's correct, but the script claims to have finished downloading all my batches
[07:41] so now I'm generating logs and I started uploads for 2249 more wikis
[07:42] (not all of them actually existing of course)
[07:42] hahahaha
[07:42] lines like that are so awesome
[07:42] so now I'm generating logs and I started uploads for 2249 more wikis
[07:51] 2249 wikis? that is 1% of the wikis out there
[07:51] ; )
[07:51] emijrp: give me more lists instead of whining! ;-)
[07:52] is it normal to have to rerun the launcher over the list a bunch of times?
[07:52] underscor: yep
[07:52] ok
[07:53] wanted to make sure I wasn't doing something wrong
[07:53] well, ideally that should be fixed of course
[07:53] rerunning doesn't hurt, it skips those already completed
[07:53] Always things like:
[07:53] Template:--, 1 edits
[07:53] An error have occurred while retrieving "Template:.com"
[07:53] Server is slow... Waiting some seconds and retrying...
[07:53] Template:.cn, 1 edits
[07:53] Please, resume the dump, --resume
[07:53] No tag found: dump failed, needs fixing; resume didn't work. Exiting.
[07:53] so they're fine to just rerun launcher?
[07:53] underscor: it was worse before, it compressed everything and forgot about it!
[07:53] yes
[07:54] kk, awesome
[07:54] typical slow wiki
[07:54] when a download is stuck, you can ctrl-c it and trigger a "Server is slow" retry, most often
[07:57] wikis are the new books, and we are saving incunabula, wikis created just 10 years after Wikipedia was founded
[07:58] wikis will appear in the OpenLibrary in the XXVth century
[08:15] I hope 7z will still work! :D
[08:20] you troll, 7z specifications are free and attached to every 7z software instance
[08:21] The official 7z file format specification is distributed with the program's source code. The specification can be found in plain text format in the 'doc' subdirectory of the source code distribution. http://en.wikipedia.org/wiki/7-Zip
[08:29] emijrp: come on, that doesn't ensure a format can last for 5 centuries
[08:30] 7z is saved on more computers than the dumps.
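
Note on the screen -dm question at 05:25: screen does not redirect the command's output by itself, so the usual approach is to wrap the command in a shell that applies the "> filename 2>&1" pattern mentioned at 05:26. A minimal sketch, assuming a POSIX shell; the session name, launcher.py invocation, list file and log name are placeholders rather than taken from the log:

    # start a detached screen session; the inner shell handles the redirect
    screen -dmS wikiteam bash -c 'python launcher.py wikilist.txt > launcher.log 2>&1'
    # reattach later with: screen -r wikiteam
    # rerunning the same command over the same list is safe, finished wikis are skipped (cf. 07:53)
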
[08:31] also, in 5 centuries there will be cryptoanalysts focused on zip formats
[08:32] a new and cool occupation, studying those old zip formats humans used long ago when hard disks didn't have infinite capacity
[08:33] super-long-term digital preservation seems mostly free territory to me
[08:33] exercise 1 in cryptoanalyst school: create a 7z unzipper from scratch (3 points)
[08:44] one of these, full of wikis, is delivered by Nemo_bis each hour http://en.wikipedia.org/wiki/File:BW_Fjord_an_Glameyer_Stack_2007-12-15.JPG
[08:47] another one arriving at IA headquarters in San Francisco http://commons.wikimedia.org/wiki/File:Golden_Gate_Bridge_Yang_Ming_Line.jpg
[11:03] lol
[11:51] ha, that image is fucking saved in the grab http://ia601207.us.archive.org/zipview.php?zip=/19/items/wikimediacommons-200606/2006-06-03.zip&file=2006%2F06%2F03%2FGolden_Gate_Bridge_Yang_Ming_Line.jpg
[12:20] in case you wonder why no new wikis arrived in a while: 26G Jun 9 06:21 halodemomodscom_mediawiki-20120602-wikidump.7z
[12:27] what is your up bandwidth?
[12:33] emijrp: 10 Mb/s
[12:33] (fiber)
[12:33] so it will take some more hours
[12:34] I could have uploaded from university of course, but the uploader script sometimes requires human input and this ruled it out
[12:35] http://p.defau.lt/?sMBkkjIwT_jiFahHxKfocw
[12:38] that paste is amazing, I don't really know how many wikis we are saving from the fire
[13:16] well $ ls -1 *7z | wc -l
[13:16] 8354
[13:26] /2
[13:28] yep
[13:47] Oh god why, January + February 2007 of the Commons grab already amounts to 91GB
[13:48] wait no, should halve that number...
[13:52] Hydriz: are you still doing this from home?
[13:55] yep, why? :P
[13:55] No, I am not using the Toolserver :P
[13:56] (although the available space seems to be decreasing ATM)
[14:01] what is your up bandwidth Hydriz ?
[14:02] never wondered
[14:02] but depends on what server I am using
[14:02] which servers?
[14:02] For very large files, I use the Sourceforge shell service
[14:03] large files means above 10GB up to 80 perhaps
[14:03] then the rest is my home network
[14:03] recently upgraded to a faster speed
[14:04] so around 1-3MB/s
[14:06] what's with ia601206
[14:07] or rather, ia701206
[14:07] zzz
[14:27] Ah, I see the Archive got hit by spam yesterday?
[14:33] did it?
[14:34] http://archive.org/details/BuyFluconazole_269 :)
[14:35] don't click the link in the description please
[14:36] why not, it's so compelling
[14:37] heh
[14:37] what are those? pills for archiving faster?
[14:37] I will take two
[14:39] * Hydriz shall write a disclaimer for this
[14:40] Mail: Are you interested in old Wikipedia and old Wikia dumps? I have a few
[14:40] terabytes of data that I've been looking for a stable home for, so I
[14:40] can use the disk space for other purposes.
[14:43] emijrp: the WMF is looking for those
[14:43] bla bla bla
[14:43] http://dumps.wikimedia.org/archive/
[14:43] WMF can wait
[14:44] why do they want old dumps if they delete old dumps?
[14:44] I mean that it would be easy to share them with them
[14:44] they want a sample of them...
[14:45] underscor: can you provide a place for that 2TB stuff?
[14:53] Lol, no point asking the WMF
[14:54] they are already struggling with space with the current dumps
[14:55] it depends on how valuable they are
[14:56] emijrp: Do you know what years those dumps are?
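
A guess at the "/2" at 13:26: each wiki typically ends up as two 7z files (a -wikidump.7z and a -history.xml.7z), so the raw 7z count is halved to get the number of wikis; counting only one of the two suffixes gives the same figure directly. The filename pattern below is an assumption based on that convention:

    # count wikis rather than archives, assuming two 7z files per wiki
    ls -1 *-wikidump.7z | wc -l
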
[14:57] I meant the Wikipedia dumps
[15:00] nope
[15:02] emijrp: anyway, it's just a matter of how he wants to deliver them, what's easiest for him
[15:02] if he has enough bandwidth, IA is easy
[15:02] emaip :P
[15:02] *email :P
[15:03] yes, but it may require a bit of curation
[15:03] and I don't know if he wants to curate it or just upload
[15:04] that's why I ask underscor for a machine to host all that, and we will see how many items to create and so on
[15:13] haha Hydriz chats from the Toolserver
[16:25] Yeah, we're *really* getting hammered with spam over the last two days
[16:25] :(
[16:25] emijrp: hi, yeah, I have space
[16:29] emijrp: wait, so this is someone who has WMF dumps, not from WMF themselves?
[16:30] underscor: yes
[16:30] cool!
[18:35] underscor: ok I CC: to
[18:35] you*
[18:38] underscor: can you provide ftp or what?
[18:58] FTP?? hoooooooooorrible
[18:58] lack of progress info from curl kills me
[19:01] ah, s3 is being slow again
[19:20] Didn't you know? s3 actually stands for "Slow, Slow, Slow"
[19:45] heh
[19:46] ersi: I got used to IA's FTP, which is soooooooo much worse
[19:46] I can't be convinced that s3 isn't wonderful
[19:47] dunno, s3 always seems like the worst place to pull things from
[19:47] slow as shit clogging up the toilet :/
[19:47] FTP is worse!
[19:47] and the web thingy always fails for me
[19:51] best is gopher
[19:54] 3,321 http://archive.org/details/wikiteam
[20:23] I have great speeds from FTP but use it internally.
[20:23] Also, I FedEx in hard drives.
[20:44] Nemo_bis: Curl displays a progress bar on uploads when you redirect standard out to a file. (You can find out more about that in the manpage for curl)
[20:45] soultcer: yes I know, but the script doesn't do so
[20:46] Change the script ;-)
[20:46] soultcer: too late, launched already :)
[20:46] (for all wikis I have)
[21:31] s3 is great if you use multiput
[21:31] A single TCP upload stream just sucks from most residential providers
[21:33] When you say S3, do you mean the IA S3 interface or Amazon's S3?
[21:35] underscor: it's never been a problem on my end
[21:35] * Nemo_bis always means IA
[21:43] soultcer: ias3
[21:43] * underscor always means IA
[21:43] yeah
[21:44] Nemo_bis: No, I know. That was more towards ersi
[21:44] The slowest part of uploading to the IA was that everything got routed via Hurricane Electric, which has crazy congestion on its transatlantic cables
[21:45] ah, I see
[21:45] he.net?
[21:45] yeah, IA doesn't always have the most efficient routing
[21:45] but multihoming with a lot of providers is expensive
[22:10] I know, and I definitely don't blame IA for the slow upload speed.
[22:11] Though I do dislike that I pay my provider good money, and then everything is routed via HE.net because they favor cheap connectivity over quality.
[22:38] underscor: I mean 'the real s3'
[22:38] i.e. Amazon S3
[22:48] aha
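
On the curl remark at 20:44 and the s3/ias3 thread: curl prints its progress meter on stderr, but mutes it when the server response would also land on the terminal, so redirecting standard out to a file brings the meter back. A minimal sketch of a plain single-stream upload to the IA S3 interface, assuming the documented ias3-style headers; the item name, file name and credential variables are placeholders, and underscor's "multiput" (splitting the upload into several parallel parts) is not shown here:

    # single PUT to the IA s3 endpoint; with stdout redirected, the progress bar stays visible on stderr
    curl --location \
         --header "authorization: LOW $IA_ACCESS_KEY:$IA_SECRET_KEY" \
         --header "x-archive-auto-make-bucket:1" \
         --upload-file wikiexample-20120609-wikidump.7z \
         "http://s3.us.archive.org/wiki-wikiexample/wikiexample-20120609-wikidump.7z" \
         > upload-response.xml
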