#wikiteam 2012-08-06 (Mon)

Time Nickname Message
12:09 πŸ”— emijrp Nemo_bis: im working on the uploader
12:09 πŸ”— emijrp http://archive.org/details/wiki-androidjimsh
12:11 πŸ”— emijrp check the fields
12:11 πŸ”— emijrp what to do with those wikis with no license metadata in api query?
12:13 πŸ”— emijrp also, appending date to item name is ok, right?
12:13 πŸ”— emijrp the curl command you sent me doesnt include it, example http://archive.org/details/wiki-androidjimsh
12:15 πŸ”— Nemo_bis emijrp: looking
12:16 πŸ”— Nemo_bis emijrp: no, item identifier shouldn't contain date
12:16 πŸ”— emijrp but that may produce collisions
12:16 πŸ”— emijrp distinct users uploading dump for a wiki in different dates
12:17 πŸ”— Nemo_bis emijrp: how so? I think it's better if we put different dates in the same item
12:17 πŸ”— Nemo_bis you only need error handling, but I think IA doesn't let you disrupt anything
12:17 πŸ”— Nemo_bis such items will need to be uploaded by a wikiteam collection admin (like you, me and underscor IIRC)
12:18 πŸ”— emijrp SketchCow: opinion?
12:19 πŸ”— emijrp Nemo_bis: also, description is almost empty
12:19 πŸ”— Nemo_bis it's early morning there
12:19 πŸ”— Nemo_bis emijrp: where are you fetching the description from
12:19 πŸ”— Nemo_bis emijrp: the item id pattern is what Alexis told us to use
12:19 πŸ”— Nemo_bis no date in it
12:20 πŸ”— Nemo_bis emijrp: as for the license info, what API info are using? I think there are (or have been) multiple places across the MW releases
12:22 πŸ”— emijrp http://android.jim.sh/api.php?action=query&meta=siteinfo&siprop=general|rightsinfo&format=xml
12:22 πŸ”— Nemo_bis emijrp: I think that when you can't find a proper license URL you should just take whatever you find (non-URLs, names in the rightsinfo fiels, even main page or Project:About or Project:COpyright and dump it in the "rights" field, and in any case tag the item so that we can check it later
12:23 πŸ”— Nemo_bis emijrp: according to docs that didn't exist before MW 1.15+
12:25 πŸ”— emijrp or just a link to the mainpage
12:25 πŸ”— Nemo_bis emijrp: also note https://bugzilla.wikimedia.org/show_bug.cgi?id=29918#c1
12:25 πŸ”— emijrp http://wikipapers.referata.com/wiki/Main_Page#footer
12:26 πŸ”— Nemo_bis link is bad, might disappear any time
12:27 πŸ”— Nemo_bis hmm is wikistats.wmflabs.org down?
12:27 πŸ”— emijrp you said link Project:About
12:28 πŸ”— Nemo_bis no, I'd just include its content
12:28 πŸ”— Nemo_bis whatever it is, can't harm
12:28 πŸ”— Nemo_bis (if there's no license URL)
12:30 πŸ”— Nemo_bis btw I'm still without a proper PC right now but I have dumps on an external HDD and connection now works (actually it's freer than usual), so I can upload lots of stuff
12:46 πŸ”— emijrp if you upload a file with the same name to an item, are they overwrited‘
12:46 πŸ”— emijrp ?
12:48 πŸ”— Nemo_bis emijrp: this I don't know
12:49 πŸ”— emijrp the description is not
12:49 πŸ”— Nemo_bis emijrp: description and other metadata is never overwritten unless you add ignore bucket something
12:50 πŸ”— Nemo_bis emijrp: https://www.mediawiki.org/w/index.php?title=Special:Contributions/Nemo_bis&offset=&limit=5&target=Nemo+bis
12:51 πŸ”— Nemo_bis So, I think licenseurl should contain whatever "url" is, additionally if it's not a link to creativecommons.org or fsf.org you should fetch it and add it to "rights" field
12:51 πŸ”— Nemo_bis additionally, it would probably be best to add to "rights" whatever MediaWiki:Copyright contains even if the wiki uses a license, to get cases like the referata wiki footer you linked
12:52 πŸ”— emijrp some people add the copyright info to main page, project:about, project:copyright, etc, it is a mess
12:55 πŸ”— emijrp yuo can run a 100 batch, and check the log, how many wikis havent coypright metadata
12:57 πŸ”— Nemo_bis yes I can do such a batch when the script is ready
12:57 πŸ”— Nemo_bis is it?
13:00 πŸ”— Nemo_bis emijrp: which of those pages are you teching right now?
13:01 πŸ”— emijrp none, just api metadata
13:02 πŸ”— Nemo_bis emijrp: can you add at least mediawiki:copyright?
13:02 πŸ”— Nemo_bis shouldn't be hard with API or ?action=raw
13:02 πŸ”— emijrp no, im going to fetch <li id="copyright"></li> from the mainpage, i think is better
13:03 πŸ”— Nemo_bis emijrp: what's that?
13:04 πŸ”— emijrp http://amsterdam.campuswiki.nl/nl/Hoofdpagina look html code
13:04 πŸ”— emijrp at the bottom
13:04 πŸ”— Nemo_bis ah the footer
13:04 πŸ”— Nemo_bis but this doesn't let you discover if it's customised or not
13:04 πŸ”— Nemo_bis you should include it only if it's custom, otherwise it's worthless crap which doesn't let us wikis which actually need info
13:05 πŸ”— emijrp that <li> is from old mediawiki
13:05 πŸ”— emijrp prior 1.12
13:06 πŸ”— emijrp 1.15*
13:06 πŸ”— Nemo_bis emijrp: ah, so you're going to use it only in those cases, because there's no API?
13:06 πŸ”— emijrp yes
13:06 πŸ”— Nemo_bis ok
13:07 πŸ”— Nemo_bis when there's API, it's still worth including MediaWIki:copyright iff existing
13:21 πŸ”— emijrp in origialurl field, api or mainpage link?
13:22 πŸ”— emijrp Nemo_bis:
13:23 πŸ”— Nemo_bis emijrp: hm?
13:24 πŸ”— balrog_ are there any wikiteam tools for DokuWiki?
13:24 πŸ”— Nemo_bis dump it in the "rights" IA field
13:24 πŸ”— Nemo_bis balrog_: not yet
13:24 πŸ”— balrog_ :/ ok
13:24 πŸ”— Nemo_bis AFAIK
13:25 πŸ”— emijrp Nemo_bis: i mean originalurl field
13:25 πŸ”— emijrp http://archive.org/details/wiki-amsterdamcampuswikinl_wiki
13:29 πŸ”— Nemo_bis emijrp: API
13:30 πŸ”— Nemo_bis that's what we decided, I don't remember all the reasons but one of them is that script path is non trivial
13:31 πŸ”— Nemo_bis emijrp: so you manage to extract the licenseurl even if there's no API now?
13:32 πŸ”— emijrp i copy copyright info from html footer, and link to #footer
13:32 πŸ”— emijrp anyway, there are wikis without copyright info at all
13:33 πŸ”— emijrp first api, if fails then html, if fails then skip dump
13:33 πŸ”— Nemo_bis emijrp: skip??
13:34 πŸ”— emijrp if there is no license data, uplaod anyway ?
13:34 πŸ”— Nemo_bis yes
13:34 πŸ”— Nemo_bis but tag it in some way so that we can review it later
13:34 πŸ”— emijrp ok
13:35 πŸ”— emijrp add unknowncopyright keyword
13:35 πŸ”— Nemo_bis for instance NO_COPYRIGHT_INFO in rights field
13:35 πŸ”— Nemo_bis yeah whatever
13:37 πŸ”— Nemo_bis emijrp: are you adding the footer content to 'rights' field even if you tech license from API?
13:37 πŸ”— emijrp no
13:38 πŸ”— Nemo_bis emijrp: it would be better, if MediaWiki:copyright is custom
13:38 πŸ”— Nemo_bis (or just raw text of the message)
13:39 πŸ”— Nemo_bis emijrp: copyright message in the footer was introduced with Domas Mituzas 2006-01-22 00:49:58 +0000 670) 'copyright' => 'Content is available under $1.',
13:44 πŸ”— Nemo_bis ah no it just was in another file
13:46 πŸ”— emijrp yes, but $1 is defined in lcoalsettings.php
13:46 πŸ”— emijrp and showed in the footer
13:46 πŸ”— emijrp you cant read mediawiki:copyrgiht licence
13:46 πŸ”— emijrp shown* license*
13:50 πŸ”— emijrp Nemo_bis: tell me a field for dump date
13:51 πŸ”— Nemo_bis emijrp: I know you can't, but the text can still contain info
13:51 πŸ”— Nemo_bis like in your referata wiki
13:51 πŸ”— Nemo_bis while the link is already provided by the API
13:51 πŸ”— Nemo_bis now looking for a proper field
13:54 πŸ”— Nemo_bis emijrp: 'addeddate' would seem ok but I still have to dig
13:55 πŸ”— emijrp 'date'
13:57 πŸ”— Nemo_bis emijrp: no, that's for te date of creation
13:57 πŸ”— Nemo_bis but the content is created before the dump
13:57 πŸ”— emijrp creation of what?
13:57 πŸ”— emijrp creation of dump
13:57 πŸ”— Nemo_bis it would be like replacing the date of publication of a book with the date of scanning
13:57 πŸ”— Nemo_bis no, creation of the work
13:58 πŸ”— emijrp downloaddate, backupdate...
13:58 πŸ”— emijrp dumpdate
13:59 πŸ”— emijrp ?
14:00 πŸ”— Nemo_bis emijrp: better to use existing ones
14:01 πŸ”— emijrp addeddate?
14:01 πŸ”— Nemo_bis no addeddate is wrong
14:02 πŸ”— Nemo_bis it's already used by the system on upload
14:02 πŸ”— Nemo_bis soo let's see
14:02 πŸ”— Nemo_bis emijrp: how about a last-updated-date
14:03 πŸ”— Nemo_bis it could contain the date of the last dump
14:03 πŸ”— emijrp anyway if more than 1 dump are allowed per item, date is nonsense
14:04 πŸ”— emijrp date is on filenames
14:04 πŸ”— Nemo_bis emijrp: it's necessarily nonsense for wikis
14:04 πŸ”— emijrp no field for date
14:04 πŸ”— Nemo_bis it should be replaced with something like 2001-2012
14:05 πŸ”— Nemo_bis emijrp: however, a way to see when the dump has been updates last would be useful
14:06 πŸ”— Nemo_bis at some point we could use it to select wikis to be redownloaded
14:06 πŸ”— emijrp ok
14:14 πŸ”— emijrp i think you can use it Nemo_bis http://code.google.com/p/wikiteam/source/browse/trunk/uploader.py
14:14 πŸ”— emijrp you need keys.txt file
14:14 πŸ”— emijrp download it in the same directory you have the dumps
14:14 πŸ”— emijrp and listxxx.txt is required
14:15 πŸ”— emijrp im going to add instructions here http://code.google.com/p/wikiteam/w/edit/TaskForce
14:19 πŸ”— Nemo_bis emijrp: so it uploads only what one has in the list?
14:19 πŸ”— Nemo_bis (just to check)
14:19 πŸ”— emijrp yes
14:19 πŸ”— Nemo_bis oki
14:19 πŸ”— * Nemo_bis eager to try
14:22 πŸ”— emijrp instructions http://code.google.com/p/wikiteam/wiki/TaskForce#footer
14:23 πŸ”— emijrp try
14:28 πŸ”— emijrp Nemo_bis: works?
14:31 πŸ”— emijrp balrog_: dokuwiki is not supported, really only mediawiki is supported
14:31 πŸ”— balrog_ I've used the mediawiki tool and it works well
14:37 πŸ”— emijrp do you know any dokuwiki wikifarm?
14:38 πŸ”— balrog_ well, I was thinking of the MESS wiki, but that's a mess
14:38 πŸ”— balrog_ because it was lost with no backups
14:38 πŸ”— balrog_ so it might have to be scraped from IA :(
14:44 πŸ”— emijrp sure
14:51 πŸ”— emijrp Nemo_bis is fapping watching the script to upload a fuckton of wikis
15:08 πŸ”— Nemo_bis emijrp: been busy, now I should tri
15:10 πŸ”— Nemo_bis emijrp: first batch includes a 100 GB wiki, it won't overwritten right?
15:11 πŸ”— emijrp how overwritten? uploader script just read from harddisk
15:12 πŸ”— Nemo_bis emijrp: I mean on archive.org
15:12 πŸ”— Nemo_bis I don't want to reupload that huge dump
15:12 πŸ”— emijrp i dont know
15:12 πŸ”— Nemo_bis hmm
15:12 πŸ”— emijrp then just remove it from listxxx.txt
15:12 πŸ”— Nemo_bis let's try another batch first
15:12 πŸ”— Nemo_bis right
15:13 πŸ”— emijrp remove from list those dumps you uploaded in the past
15:13 πŸ”— Nemo_bis I'm so smart, I had already removed it
15:13 πŸ”— emijrp but do not make svn commit
15:14 πŸ”— emijrp you should use different directories for svn and the taskforce downlaods
15:14 πŸ”— Nemo_bis IndexError: list index out of range
15:14 πŸ”— Nemo_bis sure
15:14 πŸ”— Nemo_bis hmm
15:14 πŸ”— Nemo_bis wait
15:15 πŸ”— emijrp paste the entire error
15:15 πŸ”— Nemo_bis no, I was just stupid
15:16 πŸ”— Nemo_bis emijrp: http://archive.org/details/wiki-citwikioberlinedu
15:17 πŸ”— Nemo_bis we used to include the dots though
15:17 πŸ”— emijrp no -wikidump.7z ?
15:17 πŸ”— Nemo_bis dunno if it was a good idea
15:17 πŸ”— Nemo_bis emijrp: http://www.us.archive.org/log_show.php?task_id=117329967
15:17 πŸ”— Nemo_bis you should see it now
15:19 πŸ”— Nemo_bis afk for some mins
15:31 πŸ”— Nemo_bis hmpf IOError: [Errno socket error] [Errno 110] Connection timed out
15:31 πŸ”— Nemo_bis now let's see what happens if I just rerun
15:32 πŸ”— Nemo_bis emijrp: ^
15:33 πŸ”— Nemo_bis file gets overwitten
15:34 πŸ”— emijrp no -wikidump.7z
15:34 πŸ”— Nemo_bis there isn't any probably
15:34 πŸ”— emijrp i have used the script to upload 3 small wikis and all is ok
15:35 πŸ”— emijrp perhaps i may move the urllib2 query inside the try catch
15:35 πŸ”— emijrp to avoid time out errors
15:35 πŸ”— emijrp probably some wikis are dead now
15:36 πŸ”— Nemo_bis hm there's an images directory but no archive
15:37 πŸ”— Nemo_bis later I'll have to check all the archives, recompress if needed and reupload
15:37 πŸ”— Nemo_bis so I seriously need the script not to reupload stuff which is already on archive.org
15:37 πŸ”— Nemo_bis it timed out again
15:38 πŸ”— Nemo_bis http://p.defau.lt/?as62zqO_kO6K8Duh_jCG1Q
15:40 πŸ”— emijrp yep cloverfield.despoiler.org/api.php
15:40 πŸ”— emijrp no response
15:40 πŸ”— Nemo_bis it didn't fail before
15:41 πŸ”— Nemo_bis http://archive.org/details/wiki-cloverfielddespoilerorg
15:42 πŸ”— emijrp erratic server
15:42 πŸ”— emijrp update the uplaoder script, i added try except
15:43 πŸ”— Nemo_bis emijrp: how about reupload?
15:44 πŸ”— emijrp ?
15:45 πŸ”— Nemo_bis emijrp: does it still reupload stuff already uploaded?
15:45 πŸ”— emijrp yes
15:45 πŸ”— Nemo_bis hmpf
15:45 πŸ”— emijrp there is any command to skip? in curl?
15:46 πŸ”— emijrp i thought you didnt upload dumps
15:46 πŸ”— emijrp thats why you needed the script
15:46 πŸ”— Nemo_bis sure but for instance now I have to manually edit the list of files every time I have to rerun it
15:47 πŸ”— Nemo_bis and for this wiki I must reupload the 250 MB archive to also upload the small history 7z
15:47 πŸ”— emijrp ok
15:48 πŸ”— Nemo_bis perhaps it would be easier to use the script to put the metadata on a csv file for their bulkuploader?
15:48 πŸ”— Nemo_bis there must be some command though :/
15:49 πŸ”— emijrp the uploader creates a log with the dump status
15:49 πŸ”— emijrp i will read it at first, to skip those uploaded before
15:58 πŸ”— emijrp Nemo_bis: try now, it skips uploaded dumps
15:58 πŸ”— emijrp but you must to conservate the .log
15:59 πŸ”— Nemo_bis emijrp: ok
15:59 πŸ”— Nemo_bis emijrp: are you putting the "lang" from siteinfo in the "language" field?
16:04 πŸ”— emijrp nope
16:04 πŸ”— emijrp i will add it later
16:47 πŸ”— SketchCow Quick question.
16:48 πŸ”— SketchCow Who is James Michael Dupont. One of you guys?
16:59 πŸ”— Nemo_bis SketchCow: somehow
17:00 πŸ”— Nemo_bis he's a Wikipedian who's archiving some speedy deleted articles to a Wikia wiki
17:00 πŸ”— Nemo_bis (from en.wikipedia)
17:02 πŸ”— Nemo_bis SketchCow: is he flooding IA?
17:02 πŸ”— Nemo_bis he wanted to add XML exports of articles to wikiteam collection but it should be in a subcollection at least (if any)
17:20 πŸ”— SketchCow He is not.
17:20 πŸ”— SketchCow He just asked for access to stuff.
17:20 πŸ”— SketchCow I can give him a subcollection
17:20 πŸ”— SketchCow just making sure we're in the relatively correct location.
17:24 πŸ”— Nemo_bis SketchCow: we don't know him very well, of course you own the house but IMvHO better if he's not admin of wikiteam
17:24 πŸ”— Nemo_bis a subcollection would be good
17:28 πŸ”— underscor I've traded a lot of emails, I trust him.
17:28 πŸ”— underscor but a subcollection would organize it better
17:29 πŸ”— Nemo_bis yep
17:30 πŸ”— SketchCow Give him a subcollection, then
17:30 πŸ”— SketchCow I'll link it into archiveteam's wikiteam as a whole
18:33 πŸ”— emijrp Nemo_bis: language must be 'en' or 'English'?
18:33 πŸ”— emijrp api returns 'en'
18:33 πŸ”— emijrp and make a conversor is a pain
18:36 πŸ”— emijrp i can convert the basic ones
18:39 πŸ”— emijrp ok, new version of uploader, detecting languages
18:39 πŸ”— emijrp update
18:47 πŸ”— SketchCow Hey, emijrp
18:48 πŸ”— SketchCow We're giving wikideletion guy a subcollection pointing to the wikiteam collection
18:48 πŸ”— SketchCow So he can add the stuff but it's not in the main thing and he's not an admin
18:48 πŸ”— SketchCow Also, I closed new accounts on archiveteam.org while we clean
18:49 πŸ”— emijrp ok SketchCow, about wikideletion guy
18:50 πŸ”— emijrp about at.org, i wouldnt care about deleting user account but deleting spam pages created by them
18:51 πŸ”— emijrp also, an antispam extension is needed, main issue are bot registration, and extensions may be triggered in that cases, or when adding an external link, etc
18:52 πŸ”— emijrp http://www.mediawiki.org/wiki/Extension:ConfirmEdit + http://www.mediawiki.org/wiki/Extension:QuestyCaptcha
18:53 πŸ”— SketchCow Well, I want deleted accounts AND deleting spam pages.
18:54 πŸ”— emijrp $wgCaptchaQuestions[] = array( 'question' => "...", 'answer' => "..." );
18:54 πŸ”— emijrp $wgCaptchaTriggers['createaccount'] = true;
19:15 πŸ”— emijrp SketchCow: also we are grabbing Wikimedia Commons entirely month by month http://archive.org/details/wikimediacommons-200507
19:15 πŸ”— emijrp 12*7 years = 84 items
19:16 πŸ”— SketchCow !
19:16 πŸ”— emijrp not sure if subcollection is desired
19:16 πŸ”— SketchCow Yeah, it will be.
19:16 πŸ”— SketchCow But for now, just keep going.
19:17 πŸ”— emijrp http://archive.org/search.php?query=%22wikimedia%20commons%20grab%22
19:18 πŸ”— SketchCow http://archive.org/details/wikimediacommons exists already, apparently.
19:20 πŸ”— emijrp but that is an item, not a subcollection, right?
19:20 πŸ”— emijrp it was created by Hydriz probably, but he saw it is many TB for a single item, and separated into months
19:22 πŸ”— SketchCow It's a subcollection now.
19:22 πŸ”— SketchCow I'll move the crap over
19:24 πŸ”— SketchCow Done.
19:24 πŸ”— SketchCow It'll filter into place across the next 5-40 minutes.
19:27 πŸ”— SketchCow http://archive.org/search.php?query=collection%3Awikimediacommons&sort=-publicdate
19:30 πŸ”— emijrp it may be a subcollection inside wikiteam?
19:31 πŸ”— SketchCow It is.
19:31 πŸ”— SketchCow http://archive.org/details/wikimediacommons
19:32 πŸ”— SketchCow if you click on the all items, they're there; archive.org has these walkers that go through and shore up collections later.
19:32 πŸ”— SketchCow http://ia601202.us.archive.org/zipview.php?zip=/22/items/wikimediacommons-200607/2006-07-01.zip&file=2006%2F07%2F01%2FHooters_Girls_calendar_signing_at_Kandahar.jpg
19:32 πŸ”— SketchCow SAVED FOREVER
19:32 πŸ”— emijrp ah ok, it is now shown here http://archive.org/details/wikiteam may be lag
19:32 πŸ”— emijrp not*
19:33 πŸ”— SketchCow Yeah, not lag.
19:33 πŸ”— SketchCow See, a thing goes through over time and shores up collection sets, cleans messes, etc.
19:33 πŸ”— emijrp ok
19:33 πŸ”— SketchCow That's one of archive.org's weirdness that's culturally difficult for folks - how stuff kind of fades into view
19:33 πŸ”— SketchCow It's not 1/0
19:35 πŸ”— emijrp the zip explorer is very useful and cool in this case
19:35 πŸ”— emijrp and the .xml files are opened in firefox too, so you can read the image description and license
19:35 πŸ”— emijrp pure win
19:41 πŸ”— emijrp anyway, some files were lost in wikimedia servers http://upload.wikimedia.org/wikipedia/commons/archive/b/b8/20050415212201!SMS_Bluecher.jpg and they are saved as empty files in the grab
20:06 πŸ”— SketchCow Frau Bluecher!!! (whinny)
20:49 πŸ”— Nemo_bis emijrp: AFAIK language codes work
20:58 πŸ”— Nemo_bis emijrp: launched script for 3557 more wikis
20:58 πŸ”— Nemo_bis not all of them actually downloaded of course
20:58 πŸ”— emijrp cool
20:59 πŸ”— Nemo_bis emijrp: it would be nice if you could redirect curl output so that I can see the progress info
20:59 πŸ”— Nemo_bis I can monitor it with nethogs but it's not quite the same
21:00 πŸ”— emijrp do it yourself and commit
21:00 πŸ”— Nemo_bis emijrp: don't know exactly what output you're expecting in the logs
21:00 πŸ”— Nemo_bis anyway it's not urgent
21:01 πŸ”— Nemo_bis SketchCow: what the command to prevent curl/s3 from replacing an existing file with same filename?
21:02 πŸ”— chronomex -nc --no-clobber
21:03 πŸ”— Nemo_bis so easy? :p
21:05 πŸ”— Nemo_bis chronomex: are you? I mean when one UPloads with s3
21:05 πŸ”— Nemo_bis *you sure
21:05 πŸ”— chronomex oh
21:05 πŸ”— chronomex I was thinking wget
21:05 πŸ”— chronomex sorry
21:06 πŸ”— Nemo_bis chronomex: I want to avoid overwriting existing files/items in IA with my local ones
21:09 πŸ”— Nemo_bis good, saturating bandwidth to IA for once
21:11 πŸ”— Nemo_bis emijrp: it's not the uploader's fault but I wonder if we should do something to catch the duplicates I overlooked: http://archive.org/details/wiki-docwikigumstixcom http://archive.org/details/wiki-docwikigumstixorg
21:12 πŸ”— Nemo_bis emijrp: a pretty feature would be downloading the wiki logo and upload it to the item as well
21:12 πŸ”— Nemo_bis (surely not prioritary :p)
21:13 πŸ”— emijrp k
21:19 πŸ”— emijrp Nemo_bis: add it as a TODO comment in top of uploader.py
21:19 πŸ”— Nemo_bis emijrp: ok
21:31 πŸ”— emijrp Nemo_bis: the speedydeletion guy does not upload the entire history, just the last revision
21:32 πŸ”— emijrp http://speedydeletion.wikia.com/wiki/Salvador_De_Jesus_%28deleted_05_Sep_2008_at_20:47%29
21:32 πŸ”— emijrp http://deletionpedia.dbatley.com/w/index.php?title=Salvador_De_Jesus_%28deleted_05_Sep_2008_at_20:47%29&action=history
21:33 πŸ”— emijrp http://speedydeletion.wikia.com/wiki/Salvador_De_Jesus_%28deleted_05_Sep_2008_at_20:47%29?action=history
21:33 πŸ”— emijrp i think he really paste the text instead of import
21:34 πŸ”— Nemo_bis hmmmmmmmm
21:45 πŸ”— underscor who is alphacorp again?
21:45 πŸ”— Nemo_bis underscor: Hydriz
21:45 πŸ”— underscor thx
21:46 πŸ”— Nemo_bis hm? http://p.defau.lt/?puN_G_zKXbv1lz9TfSliPg
21:47 πŸ”— underscor Nemo_bis: is that a space in the identifier?
21:48 πŸ”— Nemo_bis underscor: it shouldn't, let me check
21:49 πŸ”— Nemo_bis underscor: I don't think so http://archive.org/details/wiki-editionorg_w
21:49 πŸ”— underscor woah, weird
21:50 πŸ”— underscor it started and then failed
21:50 πŸ”— underscor delete and retry?
21:50 πŸ”— Nemo_bis it's two very small files, maybe it didn't manage to finish the first before the second
21:50 πŸ”— underscor ah
21:50 πŸ”— underscor yeah
21:50 πŸ”— underscor need a longer pause
21:52 πŸ”— Nemo_bis underscor: it didn't even manage to set the collection and other metadata http://archive.org/details/wiki-editionorg_w
22:06 πŸ”— Nemo_bis underscor: weird, just 300 KB more and it works http://archive.org/details/wiki-emiswikioecnk12ohus
22:16 πŸ”— Nemo_bis underscor: we also have some escaping problem, fancy fixing it while emijrp is offline? :) http://archive.org/details/wiki-encitizendiumorg
22:16 πŸ”— Nemo_bis shouldn't be too hard
22:19 πŸ”— Nemo_bis sigh http://archive.org/details/wiki-enecgpediaorg http://archive.org/details/wiki-en.ecgpedia.org
22:19 πŸ”— underscor Nemo_bis: Working on it
22:19 πŸ”— underscor the fix is easy, but trying to figure out where we munge the data
22:22 πŸ”— Nemo_bis underscor: from siteinfo API query IIRC
22:22 πŸ”— Nemo_bis otherwise meta HTML tags?
22:22 πŸ”— underscor no, no
22:22 πŸ”— underscor not your fault
22:23 πŸ”— underscor somewhere in the s3 pipline, it's double entity-ized
22:23 πŸ”— underscor pipeline*
