#wikiteam 2012-08-10,Fri

Time Nickname Message
01:37 🔗 underscor Nemo_bis: is http://citywiki.ugr.es/w/api.php down for you?
05:02 🔗 Nemo_bis underscor: yes, did you DOS it?
05:17 🔗 underscor Nemo_bis: no, haven't been able to start cause it was down
05:17 🔗 underscor was/is
05:17 🔗 Nemo_bis underscor: ok
05:18 🔗 Nemo_bis underscor: can you claim emijrp's batches?
05:18 🔗 underscor sure, I can, but I don't have a free box atm
05:18 🔗 underscor I assume that's not a problem
05:19 🔗 underscor I mean, waiting until I finish one of the other 7 lists I'm running or w/e
05:19 🔗 underscor my downstream is maxed
05:19 🔗 Nemo_bis sure
05:19 🔗 Nemo_bis nice, how much are you downloading?
05:20 🔗 underscor 67GB in ~2.5 days
05:25 🔗 Nemo_bis underscor: how do I launch a command in screen -dm so that its output is redirected to a file?
05:25 🔗 underscor I use tmux
05:25 🔗 underscor no idea
05:25 🔗 underscor sorry
05:25 🔗 Nemo_bis hm
05:25 🔗 Nemo_bis ok
05:26 🔗 underscor (and I just do command > filename 2>&1
05:26 🔗 underscor )
05:28 🔗 Nemo_bis underscor: can you do something about the uploader's todos? :)
05:31 🔗 underscor perhaps, we'll see how much time I have
07:30 🔗 emijrp underscor: Nemo_bis http://archive.org/details/wiki-citywiki.ugr.es city wiki was dumped before, it is huge, stop your thread
07:31 🔗 underscor emijrp: it's down anyway
07:31 🔗 emijrp ok
07:31 🔗 emijrp perhaps the panic download was just a time
07:32 🔗 emijrp in*
07:34 🔗 underscor mhm!
07:38 🔗 Nemo_bis I think that wiki has frequent problems
07:38 🔗 Nemo_bis emijrp: where did I reupload it?
07:38 🔗 Nemo_bis I'm pretty sure I had deleted the file
07:39 🔗 emijrp Nemo_bis: it was a notice for underscor
07:40 🔗 Nemo_bis emijrp: ah ok
07:41 🔗 Nemo_bis emijrp: I'm not sure it's correct, but script claims to have finished download of all my batches
07:41 🔗 Nemo_bis so now I'm generating logs and I started uploads for 2249 more wikis
07:42 🔗 Nemo_bis (not all of them actually existing of course)
07:42 🔗 underscor hahahaha
07:42 🔗 underscor lines like that are so awesome
07:42 🔗 underscor <Nemo_bis> so now I'm generating logs and I started uploads for 2249 more wikis
07:51 🔗 emijrp 2249 wikis? that is the 1% of wikis out there
07:51 🔗 emijrp ; )
07:51 🔗 Nemo_bis emijrp: give me more lists instead of whining! ;-)
07:52 🔗 underscor is it normal to have to rerun the launcher over the list a bunch of times?
07:52 🔗 Nemo_bis underscor: yep
07:52 🔗 underscor ok
07:53 🔗 underscor wanted to make sure I wasn't doing something wrong
07:53 🔗 Nemo_bis well, ideally that should be fixed of course
07:53 🔗 emijrp rerun doesnt make bad, it skips those completed
07:53 🔗 underscor Always things like:
07:53 🔗 underscor Template:--, 1 edits
07:53 🔗 underscor An error have occurred while retrieving "Template:.com"
07:53 🔗 underscor Server is slow... Waiting some seconds and retrying...
07:53 🔗 underscor Template:.cn, 1 edits
07:53 🔗 underscor Please, resume the dump, --resume
07:53 🔗 underscor No </mediawiki> tag found: dump failed, needs fixing; resume didn't work. Exiting.
07:53 🔗 underscor so they're fine to just rerun launcher?
07:53 🔗 Nemo_bis underscor: it was worse before, it compressed everything and forgot about it!
07:53 🔗 Nemo_bis yes
07:54 🔗 underscor kk, awesome
07:54 🔗 emijrp typical slow wiki
07:54 🔗 Nemo_bis when a download is stuck, you can ctrl-c it and trigger a "Server is slow" retry, most often
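The "No </mediawiki> tag found" error quoted above means the XML dump does not end with its closing tag. A rough sketch of that kind of check, and of picking which dumps to retry, might look like the following. The function names `dump_is_complete` and `needs_rerun` are hypothetical, the dump is assumed to be an uncompressed XML file on disk, and this is not the launcher's actual code.

```python
import os


def dump_is_complete(xml_path, tail_bytes=1024):
    """Return True if the file's last bytes contain the closing </mediawiki> tag."""
    size = os.path.getsize(xml_path)
    with open(xml_path, "rb") as f:
        # Only the end of the file matters, so skip straight to the tail.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return b"</mediawiki>" in tail


def needs_rerun(xml_paths):
    """List dumps that are missing or truncated and should be retried."""
    return [p for p in xml_paths
            if not os.path.exists(p) or not dump_is_complete(p)]
```

Completed dumps drop out of the list, which matches the point made above that a rerun skips what is already done.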
07:57 🔗 emijrp wikis are the new book, and we are saving incunabula, wikis created just 10 years after wikipedia was founded
07:58 🔗 emijrp wikis will appear in the OpenLibrary XXVth century
08:15 🔗 Nemo_bis I hope 7z will still work! :D
08:20 🔗 emijrp you troll, 7z specifications are free and attached to every 7z software instance
08:21 🔗 emijrp The official 7z file format specification is distributed with the program's source code. The specification can be found in plain text format in the 'doc' subdirectory of the source code distribution. http://en.wikipedia.org/wiki/7-Zip
08:29 🔗 Nemo_bis emijrp: come on, that doesn't ensure a format can last for 5 centuries
08:30 🔗 emijrp 7z is saved in more computers that the dumps.
08:31 🔗 emijrp also, in 5 centuries there will be cryptoanalysts focused on zip formats
08:32 🔗 emijrp a new and cool occupation, to study that old zip formats humans used long time ago when hard disk hadn't infinite capacity
08:33 🔗 Nemo_bis super-long-term digital preservation seems mostly free territory to me
08:33 🔗 emijrp exercise 1 in cryptoanalyst school: create a 7zunzip from the scratch (3 points)
08:44 🔗 emijrp one of this full of wikis is delivered by Nemo_bis each hour http://en.wikipedia.org/wiki/File:BW_Fjord_an_Glameyer_Stack_2007-12-15.JPG
08:47 🔗 emijrp another one arriving to IA headquarters in San Francisco http://commons.wikimedia.org/wiki/File:Golden_Gate_Bridge_Yang_Ming_Line.jpg
11:03 🔗 Nemo_bis lol
11:51 🔗 emijrp ha that image is fucking saved in the grab http://ia601207.us.archive.org/zipview.php?zip=/19/items/wikimediacommons-200606/2006-06-03.zip&file=2006%2F06%2F03%2FGolden_Gate_Bridge_Yang_Ming_Line.jpg
12:20 🔗 Nemo_bis in case you wonder why no new wikis arrived in a while: 26G Jun 9 06:21 halodemomodscom_mediawiki-20120602-wikidump.7z
12:27 🔗 emijrp what is your up bandwidth?
12:33 🔗 Nemo_bis emijrp: 10 Mb/s
12:33 🔗 Nemo_bis (fiber)
12:33 🔗 Nemo_bis so it will take some more hours
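As a back-of-the-envelope check of "some more hours", the transfer time can be estimated as below. Two assumptions are made here that the log does not confirm: that the `26G` reported by `ls` means 26 GiB, and that the full 10 Mb/s is sustained with no protocol overhead. The function name `upload_hours` is hypothetical.

```python
def upload_hours(size_gib, link_mbit_per_s):
    """Ideal transfer time in hours for size_gib GiB over a link_mbit_per_s link."""
    bits = size_gib * 1024**3 * 8
    seconds = bits / (link_mbit_per_s * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    print(f"{upload_hours(26, 10):.1f} hours")
```

Under those assumptions the single 26G archive needs a little over six hours, so a real upload would likely take somewhat longer.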
12:34 🔗 Nemo_bis I could have uploaded from university of course, but the uploader script sometimes requires human input and this ruled it out
12:35 🔗 Nemo_bis http://p.defau.lt/?sMBkkjIwT_jiFahHxKfocw
12:38 🔗 emijrp that paste is amazing, i dont really know how many wikis we are saving from fire
13:16 🔗 Nemo_bis well $ ls -1 *7z | wc -l
13:16 🔗 Nemo_bis 8354
13:26 🔗 emijrp /2
13:28 🔗 Nemo_bis yep
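The `/2` above reflects that the file count is twice the number of wikis. A small sketch that counts distinct wikis directly instead of halving is shown below. It assumes each wiki produces a `-wikidump.7z` archive (as in the filename quoted earlier in the log) plus a second archive whose `-history.xml.7z` suffix is an assumption here; `count_wikis` is a hypothetical name.

```python
import glob
import os

# "-wikidump.7z" appears in the log; "-history.xml.7z" is assumed.
SUFFIXES = ("-wikidump.7z", "-history.xml.7z")


def count_wikis(directory="."):
    """Count distinct dump prefixes among the .7z files in directory."""
    prefixes = set()
    for path in glob.glob(os.path.join(directory, "*.7z")):
        name = os.path.basename(path)
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                # Strip the suffix so both archives of a wiki share one key.
                prefixes.add(name[:-len(suffix)])
                break
    return len(prefixes)
```

Counting prefixes also copes with wikis that have only one of the two archives, which plain halving would miscount.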
13:47 🔗 Hydriz Oh god why, January + February 2007 of Commons grab already amounts to 91GB
13:48 🔗 Hydriz wait no, should half that number...
13:52 🔗 Nemo_bis Hydriz: are you still doing this from home?
13:55 🔗 Hydriz yep, why? :P
13:55 🔗 Hydriz No, I am not using the Toolserver :P
13:56 🔗 Hydriz (although the available space seems to be decreasing ATM)
14:01 🔗 emijrp what is your up bandwidth Hydriz ?
14:02 🔗 Hydriz never wondered
14:02 🔗 Hydriz but depends on what server I am using
14:02 🔗 emijrp which servers?
14:02 🔗 Hydriz For very large files, I use the Sourceforge shell service
14:03 🔗 Hydriz large files means above 10GB up to 80 perhaps
14:03 🔗 Hydriz then the rest is my home network
14:03 🔗 Hydriz recently upgraded to a faster speed
14:04 🔗 Hydriz so around 1-3MB/s
14:06 🔗 Hydriz whats with ia601206
14:07 🔗 Hydriz or rather, ia701206
14:07 🔗 Hydriz zzz
14:27 🔗 Hydriz Ah, I see the Archive got hit by spam yesterday?
14:33 🔗 Nemo_bis did it
14:34 🔗 Hydriz http://archive.org/details/BuyFluconazole_269 :)
14:35 🔗 Hydriz don't click the link in the description please
14:36 🔗 Nemo_bis why not, it's so compelling
14:37 🔗 Hydriz heh
14:37 🔗 emijrp what are that? pills for archiving faster?
14:37 🔗 emijrp i will take two
14:39 🔗 * Hydriz shall write a disclaimer for this
14:40 🔗 emijrp Mail: Are you interested in old Wikipedia and old Wikia dumps? I have a few
14:40 🔗 emijrp terabytes of data that I've been looking for a stable home for, so I
14:40 🔗 emijrp can use the disk space for other purposes.
14:43 🔗 Nemo_bis emijrp: the WMF is looking for those
14:43 🔗 emijrp bla bla bla
14:43 🔗 Nemo_bis http://dumps.wikimedia.org/archive/
14:43 🔗 emijrp WMF can wait
14:44 🔗 emijrp why do they want old dumps if they delete old dumps?
14:44 🔗 Nemo_bis I mean that it would be easy to share them with them
14:44 🔗 Nemo_bis they want a sample of them...
14:45 🔗 emijrp underscor: can you provide a place for that 2tb stuff?
14:53 🔗 Hydriz Lol, no point asking the WMF
14:54 🔗 Hydriz they are already struggling with space with the current dumps
14:55 🔗 Nemo_bis it depends on how valuable they are
14:56 🔗 Hydriz emijrp: Do you know what years those dumps are?
14:57 🔗 Hydriz I meant the Wikipedia dumps
15:00 🔗 emijrp nope
15:02 🔗 Nemo_bis emijrp: anyway, it's just a matter of how he wants to deliver them, what's easiest for him
15:02 🔗 Nemo_bis if he has enough bandwidth, IA is easy
15:02 🔗 Hydriz emaip :P
15:02 🔗 Hydriz *email :P
15:03 🔗 emijrp yes, but it may require a bit curation
15:03 🔗 emijrp and i dont know if he want to curate it or just upload
15:04 🔗 emijrp thats why i ask underscor for a machine where to host all that, and we will see how many items to create and so
15:13 🔗 emijrp haha hydriz chat from toolserver
16:25 🔗 underscor Yeah, we're *really* getting hammered with spam over the last two days
16:25 🔗 underscor :(
16:25 🔗 underscor emijrp: hi, yeah, I have space
16:29 🔗 underscor emijrp: wait, so this is someone who has WMF dumps, not from WMF themselves?
16:30 🔗 Nemo_bis underscor: yes
16:30 🔗 underscor cool!
18:35 🔗 emijrp underscor: ok i CC: to
18:35 🔗 emijrp you*
18:38 🔗 emijrp underscor: can you provide ftp or what?
18:58 🔗 Nemo_bis FTP?? hoooooooooorrible
18:58 🔗 Nemo_bis lack of progress info from curl kills me
19:01 🔗 Nemo_bis ah, s3 is being slow again
19:20 🔗 ersi Didn't you know? s3 actually stands for "Slow, Slow, Slow"
19:45 🔗 Nemo_bis heh
19:46 🔗 Nemo_bis ersi: I got used at IA's FTP which is soooooooo much worse
19:46 🔗 Nemo_bis I can't be convinced that s3 isn't wonderful
19:47 🔗 ersi dunno, s3 always seems like the worst place to pull things from
19:47 🔗 ersi slow as shit clogged up the toilett :/
19:47 🔗 Nemo_bis FTP is worse!
19:47 🔗 Nemo_bis and the web thingy always fails for me
19:51 🔗 emijrp best is gopher
19:54 🔗 emijrp 3,321 http://archive.org/details/wikiteam
20:23 🔗 SketchCow I have great speeds from FTP but use it internally.
20:23 🔗 SketchCow Also, I fedex in hard drives.
20:44 🔗 soultcer Nemo_bis: Curl displays a progress bar on uploads when you redirect standard out to a file. (You can find out more about that in the manpage for curl)
20:45 🔗 Nemo_bis soultcer: yes I know but the script doesn't do so
20:46 🔗 soultcer Change the script ;-)
20:46 🔗 Nemo_bis soultcer: too late, launched already :)
20:46 🔗 Nemo_bis (for all wikis I have)
21:31 🔗 underscor s3 is great if you use multiput
21:31 🔗 underscor A single tcp upload stream just sucks from most residential providers
21:33 🔗 soultcer When you say S3, do you mean the IA S3-interface or Amazon's S3?
21:35 🔗 Nemo_bis underscor: it's never been a problem on my end
21:35 🔗 * Nemo_bis always means IA
21:43 🔗 underscor soultcer: ias3
21:43 🔗 * underscor always means IA
21:43 🔗 underscor yeah
21:44 🔗 underscor Nemo_bis: No, I know. That was more towards ersi
21:44 🔗 soultcer Slowest part of uploading to the IA was that everything got routed via Hurricane Electric, which has crazy congestion on its transatlantic cables
21:45 🔗 underscor ah, I see
21:45 🔗 Nemo_bis he.net?
21:45 🔗 underscor yeah, IA doesn't always have the most efficient routing
21:45 🔗 underscor but multihoming a lot of providers is expensive
22:10 🔗 soultcer I know, and I definitely don't blame IA for the slow upload speed.
22:11 🔗 soultcer Though I do dislike that I pay my provider good money, and then everything is routed via HE.net because they favor cheap connectivity over quality.
22:38 🔗 ersi underscor: I mean 'the real s3'
22:38 🔗 ersi ie. Amazon S3
22:48 🔗 underscor aha
