#wikiteam 2012-05-05,Sat

Time Nickname Message
12:49 πŸ”— Nemo_bis hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA
12:51 πŸ”— Nemo_bis uh, I summoned emijrp
12:52 πŸ”— Nemo_bis emijrp, do you know if dumpgenerator is able to resume broken xml like these? hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA
12:53 πŸ”— Nemo_bis and btw I'm at 3418 7z to upload
12:54 πŸ”— emijrp did you try?
12:54 πŸ”— emijrp it removes the last <page> until EOF
12:54 πŸ”— emijrp and resume
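The resume behaviour emijrp describes (cut everything from the last <page> to EOF, then continue) can be sketched roughly as below. This is only an illustration of the idea, not dumpgenerator's actual code; the function name drop_last_page is made up for the example, and it is meant only for dumps already known to be incomplete.

```python
def drop_last_page(xml_text):
    """Cut the text at the last "<page>" tag so the dump ends on a page boundary.

    Hypothetical helper: it mirrors the description in the log, not the real
    dumpgenerator implementation. The last page is dropped even if it happens
    to be complete, on the assumption that resuming will fetch it again.
    """
    last_open = xml_text.rfind("<page>")
    if last_open == -1:
        # No page has been written yet: nothing to remove.
        return xml_text
    return xml_text[:last_open]


broken = "<mediawiki>\n<page><title>A</title></page>\n<page><title>B</ti"
print(repr(drop_last_page(broken)))
# prints '<mediawiki>\n<page><title>A</title></page>\n'
```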
12:54 πŸ”— Nemo_bis sometimes I get 7z which contain broken xml like those
12:55 πŸ”— Nemo_bis but it might be launcher's fault, I suppose it happens when the download fails with unexpected reasons
12:55 πŸ”— Nemo_bis "sometimes" = at least a hundred dozens times until now
12:55 πŸ”— Nemo_bis s/dozens//
12:55 πŸ”— emijrp there is no check before launching 7z
12:56 πŸ”— Nemo_bis I now added one
12:56 πŸ”— Nemo_bis emijrp, https://code.google.com/p/wikiteam/source/diff?spec=svn648&r=648&format=side&path=/trunk/batchdownload/launcher.py&old_path=/trunk/batchdownload/launcher.py&old=613
12:56 πŸ”— Nemo_bis I hope it works despite me not knowing python at all...
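A check of the kind discussed here could look something like the following. It is a sketch under the assumption that a finished dump ends with the closing </mediawiki> tag; it is not the code from the linked launcher.py revision, and the function name is invented for the example.

```python
import os


def xml_dump_looks_complete(path, closing_tag=b"</mediawiki>"):
    """Return True if the last non-blank bytes of the file are the closing tag.

    Assumption: a complete dump ends with </mediawiki>. Only the tail of the
    file is read, so this stays cheap for multi-gigabyte dumps.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as dump:
        dump.seek(max(0, size - 1024))
        tail = dump.read()
    return tail.rstrip().endswith(closing_tag)
```

A launcher could call this before compressing and skip (or retry) any wiki whose XML fails the check, instead of producing a 7z with a broken dump inside.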
12:58 πŸ”— emijrp i dont know, i didnt test it
12:58 πŸ”— emijrp i get an error using S3
12:58 πŸ”— emijrp http://pastebin.com/czCM8Rt7
12:59 πŸ”— Nemo_bis what's the error?
12:59 πŸ”— emijrp reload
13:00 πŸ”— Nemo_bis are you sure it accepts those filenames?
13:01 πŸ”— Nemo_bis when you use the web interface, it tells you that spaces are removed, it's converted to CamelCase etc.; s3 probably doesn't do it for you
13:02 πŸ”— emijrp there is no spaces in the filename
13:02 πŸ”— Nemo_bis I think underscores are not accepted either
13:04 πŸ”— Nemo_bis Filenames with SPACE characters will be renamed to "camel case".
13:04 πŸ”— Nemo_bis For example, "my LA roadtrip.mov" will become "MyLaRoadtrip.mov"
13:04 πŸ”— Nemo_bis Letters, numbers, periods (.), dashes (-), or underscores (_) are ok, but other characters will be removed.
13:05 πŸ”— Nemo_bis oh, emijrp, perhaps it's just that you have to specify a collection
13:07 πŸ”— emijrp i did it before, but removed it while debugging
13:07 πŸ”— emijrp not work
13:07 πŸ”— Nemo_bis opensource_movies ?
13:07 πŸ”— Nemo_bis that's the only collection you're allowed to upload to AFAIK
13:08 πŸ”— * Nemo_bis idiot
13:08 πŸ”— Nemo_bis it's just that you forgot to put the item name
13:08 πŸ”— Nemo_bis the last URL should have a "directory" which will become the item identifier
13:10 πŸ”— Nemo_bis Identifier "spanishrevolution-Madrid_Manifestacin_24_de_Julio_-_Hacia_el_congreso-00ytP9" is available.
13:10 πŸ”— emijrp max length issues?
13:10 πŸ”— Nemo_bis probably must be ASCII and not longer than that
13:10 πŸ”— Nemo_bis ó
13:11 πŸ”— Nemo_bis --upload-file Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4 http://s3.us.archive.org/spanishrevolution-Madrid_Manifestacion_24_de_Julio_-_Hacia_el_congreso/spanishrevolution-Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4
13:11 πŸ”— Nemo_bis this should work? shorter identifier would be nice though
13:13 πŸ”— emijrp nope
13:13 πŸ”— emijrp error
13:14 πŸ”— Nemo_bis remove the ó from the filename too?
13:17 πŸ”— emijrp too
13:17 πŸ”— Nemo_bis maybe an identifier + filename length limit?
13:22 πŸ”— emijrp if i paste it the command line it works
13:22 πŸ”— emijrp http://archive.org/details/spanishrevolution-Madrid_Manifestacion_24_de_Jul
13:47 πŸ”— emijrp the propble is using subprocess
13:47 πŸ”— emijrp os.system() works
13:47 πŸ”— emijrp problem*
13:50 πŸ”— Nemo_bis emijrp, why problem?
13:50 πŸ”— emijrp not sure
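The log never establishes why subprocess failed where os.system() worked. One common difference is quoting: os.system() hands a single string to the shell, while subprocess takes a list of arguments and passes them through untouched. The sketch below shows the list form for the curl upload quoted earlier. The two headers are assumptions based on archive.org's S3-like API rather than anything shown in this log, and the function names are invented for the example.

```python
import subprocess


def build_upload_command(identifier, filename, access_key, secret_key):
    """Build the curl argument list for uploading one file into one item.

    Assumptions: the "authorization: LOW key:secret" and
    "x-amz-auto-make-bucket:1" headers come from archive.org's S3-like API
    and are not taken from this log.
    """
    url = "http://s3.us.archive.org/%s/%s" % (identifier, filename)
    return [
        "curl", "--location",
        "--header", "x-amz-auto-make-bucket:1",
        "--header", "authorization: LOW %s:%s" % (access_key, secret_key),
        "--upload-file", filename,
        url,
    ]


def upload(identifier, filename, access_key, secret_key):
    """Run curl without a shell, so no shell quoting of the filename is needed."""
    command = build_upload_command(identifier, filename, access_key, secret_key)
    return subprocess.call(command)
```

Because each argument is a separate list element, filenames with spaces or shell metacharacters need no escaping. Non-ASCII characters in the URL part would still have to be dealt with separately.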
13:51 πŸ”— Nemo_bis it seems to work for me
13:51 πŸ”— emijrp http://archive.org/details/spanishrevolution-15M_democracia_real_ya_Madrid-03YgOi0yOfo.mp4
13:51 πŸ”— Nemo_bis oh, you mean for this upload
13:51 πŸ”— Nemo_bis you're creating a YouTube archiver?
13:51 πŸ”— emijrp only for spanish revolution videos
13:52 πŸ”— Nemo_bis ok
13:53 πŸ”— Nemo_bis what error do you get with subprocess?
13:54 πŸ”— Nemo_bis and how do you call it exactly
13:54 πŸ”— Nemo_bis I've fought with it a bit too this morning
13:57 πŸ”— emijrp ^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$
13:57 πŸ”— emijrp that is the item filter
13:57 πŸ”— emijrp <Error><Code>InvalidBucketName</Code><Message>The specified bucket is not valid.</Message><Resource>Bucket names should be valid archive identifiers; try someting matching this regular expression: ^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$
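The regular expression in that error message can be applied locally before attempting an upload. A minimal sketch, with a made-up helper name:

```python
import re

# Pattern copied from the InvalidBucketName error message above.
IDENTIFIER_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$")


def is_valid_identifier(name):
    """Return True if name matches the identifier pattern quoted by archive.org."""
    return IDENTIFIER_RE.match(name) is not None


print(is_valid_identifier("spanishrevolution-15M_democracia_real_ya_Madrid-03YgOi0yOfo.mp4"))
# prints True
print(is_valid_identifier("spanishrevolution-Madrid_Manifestaci\u00f3n"))
# prints False (the accented o is outside the allowed character set)
```

The pattern allows 5 to 101 characters in total: one leading letter or digit, followed by 4 to 100 letters, digits, underscores, periods or dashes.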
15:14 πŸ”— emijrp Nemo_bis: what is your upstream bandwidth?
15:21 πŸ”— emijrp an error while upgrading mediawiki in referata wikifarm has deleted images in 10 wikis
15:21 πŸ”— emijrp forwarding email to the group
15:28 πŸ”— Nemo_bis emijrp, 10 Mb/s at home
15:35 πŸ”— Nemo_bis I don't understand how my upstream can help
15:38 πŸ”— Nemo_bis he might perhaps ask Tim Starling how he recovered those files he lost a while ago
15:38 πŸ”— * Nemo_bis doesn't remember the details
15:38 πŸ”— emijrp they recovered them from cache, some backups and people re-uploading
15:39 πŸ”— emijrp you cant undelete files on linux
15:39 πŸ”— emijrp (easily)
15:39 πŸ”— Nemo_bis ok
15:39 πŸ”— Nemo_bis so, I can help in some way?
15:43 πŸ”— emijrp no
15:44 πŸ”— emijrp i asked because i need to upload many CC videos here http://archive.org/details/spanishrevolution
15:48 πŸ”— emijrp do you want to help? Nemo_bis
15:49 πŸ”— Nemo_bis emijrp, ok if it's just running commands
15:49 πŸ”— emijrp is there any way i can edit the items you upload?
16:34 πŸ”— emijrp Nemo_bis: https://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py
16:35 πŸ”— emijrp and the videos http://pastebin.com/06Vt2qNx
16:35 πŸ”— emijrp the script includes instructions
16:47 πŸ”— emijrp mm
16:47 πŸ”— emijrp im not sure if you can add items to that collection, you are not an admin
16:48 πŸ”— emijrp em.h.
16:53 πŸ”— emijrp in that case, just comment out the collection header line, and the subject "spanishrevolution" will be enough
17:18 πŸ”— Nemo_bis emijrp, can't you give me your s3 keys for now?
17:18 πŸ”— emijrp why?
17:18 πŸ”— Nemo_bis you can recreate them later
17:18 πŸ”— emijrp ah
17:18 πŸ”— Nemo_bis so that I can upload them directly to the collection
17:19 πŸ”— emijrp ok
17:19 πŸ”— emijrp dont upload pr0n
17:19 πŸ”— Nemo_bis heh
17:19 πŸ”— Nemo_bis also, I don't want to have those videos in my "watchlist" :p
17:19 πŸ”— emijrp where is the watchlist?
17:21 πŸ”— Nemo_bis nowhere, I mean the list of contributions
17:24 πŸ”— emijrp is that available?
17:24 πŸ”— emijrp i dont see anything like my profile or so
17:24 πŸ”— Nemo_bis http://archive.org/catalog.php?history=1&justme=1
17:41 πŸ”— Nemo_bis emijrp, I suppose I should see some output?
17:41 πŸ”— Nemo_bis nothing seems to be happening
17:41 πŸ”— emijrp yes
17:42 πŸ”— emijrp did you create download/videostodo.txt
17:42 πŸ”— emijrp keys.txt
17:42 πŸ”— emijrp oh
17:42 πŸ”— emijrp you need youtube-dl
17:42 πŸ”— Nemo_bis yes, did everything
17:42 πŸ”— Nemo_bis but for some reason the list of video is blank now *facepalms*
17:43 πŸ”— Nemo_bis emijrp, what version are you using?
17:43 πŸ”— Nemo_bis youtube-dl: error: no such option: --write-info-json
17:44 πŸ”— emijrp python youtube-dl -U
17:44 πŸ”— emijrp to update
17:48 πŸ”— Nemo_bis emijrp, but I've just downloaded it!
17:48 πŸ”— emijrp weird
17:48 πŸ”— Nemo_bis ah, it's in the same google code directory, too easy
17:49 πŸ”— Nemo_bis I downloaded http://www.textfiles.com/videoyahoo/SCRIPTS/youtube-dl :p
17:50 πŸ”— Nemo_bis it's a bit annoying that the script blanks the list of videos
17:50 πŸ”— Nemo_bis hm, gave same error
17:50 πŸ”— Nemo_bis afk no
17:50 πŸ”— Nemo_bis w
18:04 πŸ”— emijrp glad to see you here SketchCow
18:10 πŸ”— Nemo_bis emijrp, so what youtube-dl should I use?
18:10 πŸ”— emijrp the last version
18:11 πŸ”— emijrp do -U
18:18 πŸ”— Nemo_bis emijrp, ok
18:19 πŸ”— Nemo_bis I seem to be always getting ERROR: unable to download video webpage: HTTP Error 404: Not Found
18:20 πŸ”— emijrp em
18:20 πŸ”— emijrp try to launch youtube-dl alone
18:21 πŸ”— emijrp python youtube-dl -t -i -c url --write-info-json --format 18
18:25 πŸ”— Nemo_bis this way it works
18:25 πŸ”— Nemo_bis probably something wrong about language settings
18:28 πŸ”— Nemo_bis emijrp, can't you remove the 1JCWblUBdH4.mp4 from the identifier in http://archive.org/details/spanishrevolution-Manifestacio_19J_Placa_Catalunya_Barcelona-1JCWblUBdH4.mp4
18:28 πŸ”— Nemo_bis the identifier never contains format, at least
18:29 πŸ”— Nemo_bis seems to have been a false alarm then? the description has been added
18:31 πŸ”— emijrp did you upload that using the script? or by hand?
18:36 πŸ”— Nemo_bis script
18:37 πŸ”— Nemo_bis emijrp, I see only 7 videos uploaded out of 10, 2 have been skipped for the size limit (set to 25 GB now) but I don't understand the last one
18:38 πŸ”— emijrp mmm
18:38 πŸ”— emijrp what it says?, perhaps it was skipped because it exists with that name?
18:39 πŸ”— Nemo_bis can't find it, have to dig the output
18:44 πŸ”— Nemo_bis dunno, no error shown
18:44 πŸ”— Nemo_bis for some reason Manifestació_19J_Plaça_Catalunya_Barcelona-1JCWblUBdH4.mp4 is the working directory
18:45 πŸ”— Nemo_bis but was uploaded
18:47 πŸ”— Nemo_bis emijrp, all ok, RSS feed missed the last entry
18:48 πŸ”— Nemo_bis emijrp, all looks nice, if you don't want to change the identifiers or anything else I'd continue with all the others
18:49 πŸ”— emijrp wait a moment i check
18:49 πŸ”— emijrp the items uploaded
19:08 πŸ”— emijrp the slash is converted to %
19:08 πŸ”— emijrp 7/9 http://archive.org/details/spanishrevolution-Juglares_en_Biblio_Acampada_Sol_7_9-1YPywiNqON4.mp4
19:14 πŸ”— emijrp Nemo_bis: i have added an re.sub to change % to /
19:15 πŸ”— emijrp make another batch of 10 please
19:15 πŸ”— Nemo_bis ok
19:15 πŸ”— emijrp update the script
19:20 πŸ”— Nemo_bis whoops
19:20 πŸ”— Nemo_bis doing too many things at the same time
19:23 πŸ”— Nemo_bis emijrp,
19:23 πŸ”— Nemo_bis [download] 100.0% of 6.80M at 412.54k/s ETA 00:00
19:23 πŸ”— Nemo_bis [download] Destination: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4
19:23 πŸ”— Nemo_bis [info] Video description metadata as JSON to: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4.info.json
19:23 πŸ”— Nemo_bis Traceback (most recent call last):
19:23 πŸ”— Nemo_bis File "youtube2internetarchive.py", line 115, in <module>
19:23 πŸ”— Nemo_bis if not re.search(ur"Item cannot be found", urllib.urlopen('http://archive.org/details/%s' % (itemname)).read()):
19:23 πŸ”— Nemo_bis File "/usr/lib/python2.7/urllib.py", line 84, in urlopen
19:23 πŸ”— Nemo_bis return opener.open(url)
19:23 πŸ”— Nemo_bis File "/usr/lib/python2.7/urllib.py", line 177, in open
19:23 πŸ”— Nemo_bis fullurl = unwrap(toBytes(fullurl))
19:23 πŸ”— Nemo_bis File "/usr/lib/python2.7/urllib.py", line 1050, in toBytes
19:23 πŸ”— Nemo_bis " contains non-ASCII characters")
19:23 πŸ”— Nemo_bis UnicodeError: URL u'http://archive.org/details/spanishrevolution-\u03a3\u03c5\u03bd\u03c4\u03c1\u03bf\u03c6\u03bf\u03c2_\u039b\u03bf\u03c5\u03ba\u03b1\u03bd\u03b9\u03ba\u03bf\u03c2_-_Companero_Loukanikos-28fYNSrOqmI.mp4' contains non-ASCII characters
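The traceback comes from handing urlopen a URL that contains Greek letters. One way to avoid the exception is to percent-encode the item name before building the URL. The sketch below uses Python 3's urllib.parse, whereas the script in the log runs on Python 2.7, and the helper name details_url is invented for the example.

```python
from urllib.parse import quote


def details_url(itemname):
    """Build an ASCII-only details URL by percent-encoding the item name as UTF-8."""
    return "http://archive.org/details/" + quote(itemname, safe="")


print(details_url("spanishrevolution-\u03a3_x"))
# prints http://archive.org/details/spanishrevolution-%CE%A3_x
```

Percent-encoding only avoids the exception. The identifier itself still has to satisfy the naming rule archive.org quotes in its InvalidBucketName error.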
19:33 πŸ”— emijrp paste the error again Nemo_bis
19:34 πŸ”— emijrp ok
19:34 πŸ”— Nemo_bis found it=
19:34 πŸ”— emijrp im watching the channel log
19:34 πŸ”— Nemo_bis ok
19:35 πŸ”— emijrp Nemo_bis: try again with that video alone
19:36 πŸ”— emijrp fixed the script
19:38 πŸ”— Nemo_bis emijrp, it's the same
19:38 πŸ”— Nemo_bis maybe downloaded it too early
19:39 πŸ”— emijrp i change line 115, now it includes unicode(...
19:39 πŸ”— emijrp changed*
19:45 πŸ”— Nemo_bis still getting the same error
19:46 πŸ”— Nemo_bis retrying after deleting the downloaded files
19:46 πŸ”— Nemo_bis same
19:54 πŸ”— Nemo_bis emijrp, ?
20:03 πŸ”— emijrp i will see now, wait
20:14 πŸ”— emijrp fixed
20:14 πŸ”— emijrp http://archive.org/details/spanishrevolution-__-_Companero_Loukanikos-28fYNSrOqmI.mp4
20:15 πŸ”— emijrp i remove all non a-z0-9_.- chars
20:15 πŸ”— emijrp that is what the IA form does
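The fix described here, dropping every character outside letters, digits, underscore, period and dash, can be sketched with a single re.sub. This is an illustration rather than the actual line from youtube2internetarchive.py, and it assumes uppercase letters are kept, as they are in the resulting item name above.

```python
import re


def sanitize_identifier(name):
    """Remove every character that is not a letter, digit, underscore, period or dash."""
    return re.sub(r"[^a-zA-Z0-9_.-]", "", name)


# The item name from the traceback, written with escapes for the Greek letters.
raw = ("spanishrevolution-\u03a3\u03c5\u03bd\u03c4\u03c1\u03bf\u03c6\u03bf\u03c2_"
       "\u039b\u03bf\u03c5\u03ba\u03b1\u03bd\u03b9\u03ba\u03bf\u03c2_-_"
       "Companero_Loukanikos-28fYNSrOqmI.mp4")
print(sanitize_identifier(raw))
# prints spanishrevolution-__-_Companero_Loukanikos-28fYNSrOqmI.mp4
```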
20:16 πŸ”— emijrp Nemo_bis:
20:16 πŸ”— emijrp launch 50
20:49 πŸ”— Nemo_bis emijrp, the following 50 ?
20:49 πŸ”— emijrp yep
21:05 πŸ”— Nemo_bis hey emijrp, now you're ready to apply this experience to the wikis uploader, aren't you? :)
21:05 πŸ”— Nemo_bis hello mutoso, how is the download of wikis going?
21:12 πŸ”— Nemo_bis emijrp, can't you use more compact identifiers, like youtube-2t0HW4Cceuc , and leave the rest to title and filenames?
21:12 πŸ”— emijrp nooooo
21:12 πŸ”— emijrp titles are good for google results
21:12 πŸ”— Nemo_bis the URL you mean?
21:13 πŸ”— Nemo_bis the current solution doesn't even guarantee unique identifiers, does it? the alphanumeric part can be truncated and the rest could be not unique
21:13 πŸ”— emijrp i think that it is a limit case
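One possible middle ground between the two positions, keeping the title for search engines while guaranteeing the video ID survives truncation, is to shorten only the title part. This is a hypothetical sketch, not what the script does. The 101-character limit is derived from the regular expression archive.org quoted earlier, and make_identifier is an invented name.

```python
import re

# 1 leading character plus up to 100 more, per the pattern in the error message.
MAX_LEN = 101


def make_identifier(prefix, title, video_id, max_len=MAX_LEN):
    """Build prefix-title-videoid, truncating only the title when too long."""
    clean_title = re.sub(r"[^a-zA-Z0-9_.-]", "", title.replace(" ", "_"))
    suffix = "-" + video_id
    # Space left for the title once prefix, separator and suffix are counted.
    room = max_len - len(prefix) - 1 - len(suffix)
    return prefix + "-" + clean_title[:max(room, 0)] + suffix


print(make_identifier("spanishrevolution", "15M democracia real ya", "03YgOi0yOfo"))
# prints spanishrevolution-15M_democracia_real_ya-03YgOi0yOfo
```

Since the full video ID is always kept at the end, two different videos cannot end up with the same identifier even when their titles share a long common beginning.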
21:25 πŸ”— Nemo_bis emijrp, have you looked at the check output for the wikis I downloaded?
21:25 πŸ”— emijrp yes
21:25 πŸ”— Nemo_bis emijrp, all ok?
21:25 πŸ”— emijrp how all ok?
21:26 πŸ”— emijrp they are partial dumps, if you resume, what happens?
21:26 πŸ”— Nemo_bis I mean, does it sound reasonably ok or will I have to fix
21:26 πŸ”— Nemo_bis noo I don't mean that
21:26 πŸ”— Nemo_bis the links I put under "Final check" for each row in https://code.google.com/p/wikiteam/wiki/TaskForce
21:27 πŸ”— Nemo_bis and I don't know what happened to those dumps, let's see if I manage to look
21:27 πŸ”— emijrp yes, that looks ok
21:28 πŸ”— Nemo_bis the current batches seem to complete very quickly, I'd start some 50 more instances tonight
21:34 πŸ”— Nemo_bis emijrp, so, some have been happily resumed, 2 are still broken, 1 failed again with some weird error like index.php not found or so
21:35 πŸ”— Nemo_bis and at least it was not compressed and will be retried next time
21:40 πŸ”— emijrp to make the wiki uploader, i need to get the api for every wiki
21:41 πŸ”— emijrp and it is inside the 7z
22:05 πŸ”— Nemo_bis emijrp, no, I kept all the config files
22:05 πŸ”— emijrp i will read the feed lists better
22:05 πŸ”— Nemo_bis ?
22:06 πŸ”— emijrp the uploader will read feed lists, check if there is a 7z for that wiki and upload
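The uploader flow outlined here (read a list of wikis, look for a matching 7z, upload what is found) might start with a lookup step like the one below. The naming rule in prefix_for and the "<prefix>-...7z" file layout are assumptions made for the sake of the example; the log does not spell out how dump files are named.

```python
import os
import re


def prefix_for(api_url):
    """Hypothetical rule mapping an api.php URL to a dump filename prefix."""
    name = re.sub(r"^https?://", "", api_url)
    name = re.sub(r"/api\.php$", "", name)
    return re.sub(r"[^a-zA-Z0-9]", "_", name).lower()


def dumps_ready_for_upload(api_urls, dump_dir):
    """Map each API URL to the .7z files in dump_dir that start with its prefix."""
    archives = sorted(f for f in os.listdir(dump_dir) if f.endswith(".7z"))
    ready = {}
    for url in api_urls:
        wanted = prefix_for(url) + "-"
        matches = [f for f in archives if f.startswith(wanted)]
        if matches:
            ready[url] = matches
    return ready
```

Wikis from the list that have no archive yet are simply left out of the result, so the same list can be re-read on a later run.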
22:06 πŸ”— Nemo_bis the launcher now compresses in non-solid 7z so extracting the config should be very fast btw
22:06 πŸ”— Nemo_bis as you want
22:07 πŸ”— Nemo_bis if it looked for 7z (or directories) it could be used for any 7z generated by dumpgenerator, but I don't mind much
22:07 πŸ”— Nemo_bis feed list = list of api.php, right?
22:07 πŸ”— emijrp do you add the config to the 7z?
22:08 πŸ”— emijrp config.txt?
22:08 πŸ”— Nemo_bis uh, I don't remember
22:08 πŸ”— Nemo_bis doesn't the launcher do so?
22:08 πŸ”— emijrp i dont think so, i excluded it
22:08 πŸ”— Nemo_bis aww why
22:08 πŸ”— Nemo_bis well, using lists as input might be easier anyway
22:09 πŸ”— Nemo_bis script is doing last YouTube video of the 50 bunch
22:09 πŸ”— Nemo_bis I think
22:09 πŸ”— emijrp because config.txt saves your local path, and if you use absolute path you show your PC username
22:10 πŸ”— Nemo_bis oh
22:10 πŸ”— Nemo_bis but launcher.py doesn't use absolute paths
22:16 πŸ”— Nemo_bis emijrp, do you want me to upload all the remaining videos?
22:16 πŸ”— emijrp wait a second
22:17 πŸ”— Nemo_bis ok
22:27 πŸ”— emijrp looks ok
22:27 πŸ”— emijrp upload the rest
22:43 πŸ”— Nemo_bis k
22:52 πŸ”— emijrp i will check them tomorrow
