[12:49] hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA
[12:51] uh, I summoned emijrp
[12:52] emijrp, do you know if dumpgenerator is able to resume broken xml like these? hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA
[12:53] and btw I'm at 3418 7z to upload
[12:54] did you try?
[12:54] it removes the last until EOF
[12:54] and resume
[12:54] sometimes I get 7z which contain broken xml like those
[12:55] but it might be launcher's fault, I suppose it happens when the download fails with unexpected reasons
[12:55] "sometimes" = at least a hundred dozens times until now
[12:55] s/dozens//
[12:55] there is not check before to launch 7z
[12:56] I now added one
[12:56] emijrp, https://code.google.com/p/wikiteam/source/diff?spec=svn648&r=648&format=side&path=/trunk/batchdownload/launcher.py&old_path=/trunk/batchdownload/launcher.py&old=613
[12:56] I hope it works despite me not knowing python at all...
[12:58] i dont know, i didnt tested it
[12:58] i get an error using S3
[12:58] http://pastebin.com/czCM8Rt7
[12:59] what's the error?
[12:59] reload
[13:00] are you sure it accepts those filenames?
[13:01] when you use the web interface, it tells you that spaces are removed, it's converted to CamelCase etc.; s3 probably doesn't do it for you
[13:02] there is no spaces in the filename
[13:02] I think underscores are not accepted either
[13:04] Filenames with SPACE characters will be renamed to "camel case".
[13:04] For example, "my LA roadtrip.mov" will become "MyLaRoadtrip.mov"
[13:04] Letters, numbers, periods (.), dashes (-), or underscores (_) are ok, but other characters will be removed.
[13:05] oh, emijrp, perhaps it's just that you have to specify a collection
[13:07] i did it before, but removed it while debugging
[13:07] not work
[13:07] opensource_movies ?
[13:07] that's the only collection you're allowed to upload to AFAIK
[13:08] * Nemo_bis idiot
[13:08] it's just that you forgot to put the item name
[13:08] the last URL should have a "directory" which will become the item identifier
[13:10] Identifier "spanishrevolution-Madrid_Manifestacin_24_de_Julio_-_Hacia_el_congreso-00ytP9" is available.
[13:10] max length issues?
[13:10] probably must be ASCII and not longer than that
[13:10] ó
[13:11] --upload-file Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4 http://s3.us.archive.org/spanishrevolution-Madrid_Manifestacion_24_de_Julio_-_Hacia_el_congreso/spanishrevolution-Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4
[13:11] this should work? shorter identifier would be nice though
[13:13] nope
[13:13] error
[13:14] remove the ó from the filename too?
[13:17] too
[13:17] maybe an identifier + filename length limit?
[13:22] if i paste it the command line it works
[13:22] http://archive.org/details/spanishrevolution-Madrid_Manifestacion_24_de_Jul
[13:47] the propble is using subprocess
[13:47] os.system() works
[13:47] problem*
[13:50] emijrp, why problem?
[13:50] not sure
[13:51] it seems to work for me
[13:51] http://archive.org/details/spanishrevolution-15M_democracia_real_ya_Madrid-03YgOi0yOfo.mp4
[13:51] oh, you mean for this upload
[13:51] you're creating a YouTube archiver?
[13:51] only for spanish revolution videos
[13:52] ok
[13:53] what error do you get with subprocess?
[13:54] and how do you call it exactly
[13:54] I've fought with it a bit too this morning
[13:57] ^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$
[13:57] that is the item filter
[13:57] InvalidBucketName: The specified bucket is not valid. Bucket names should be valid archive identifiers; try someting matching this regular expression: ^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$
[15:14] Nemo_bis: what is your upstream bandwidth?
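The item filter quoted at 13:57 can be checked locally before attempting an upload. The following is a minimal sketch in Python 3, not code from youtube2internetarchive.py; the function name `is_valid_identifier` is hypothetical, and the only thing taken from the log is the regular expression in the InvalidBucketName message.

```python
import re

# Pattern copied verbatim from the InvalidBucketName error quoted in the log.
IDENTIFIER_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$")


def is_valid_identifier(name: str) -> bool:
    """Return True if name matches the identifier pattern from the error message.

    Hypothetical helper for illustration; not part of the original script.
    """
    return IDENTIFIER_RE.match(name) is not None


if __name__ == "__main__":
    candidates = [
        # ASCII-only identifier from the 13:11 command line
        "spanishrevolution-Madrid_Manifestacion_24_de_Julio_-_Hacia_el_congreso",
        # same idea but with a non-ASCII letter
        "spanishrevolution-Madrid_Manifestación_24_de_Julio",
        # too short: one leading character plus at least four more are required
        "abcd",
    ]
    for candidate in candidates:
        print(candidate, is_valid_identifier(candidate))
```

Under this pattern the accented "ó" alone is enough to reject a name, which is consistent with the guess at 13:10 that identifiers must be ASCII.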
[15:21] an error while upgrading mediawiki in referata wikifarm has deleted images in 10 wikis
[15:21] forwarding email to the group
[15:28] emijrp, 10 Mb/s at home
[15:35] I don't understand how my upstream can help
[15:38] he might perhaps ask Tim Starling how he recovered those files he lost a while ago
[15:38] * Nemo_bis doesn't remember the details
[15:38] they recovered them from cache, some backups and people re-uploading
[15:39] you cant undelete files on linux
[15:39] (easily)
[15:39] ok
[15:39] so, I can help in some way?
[15:43] no
[15:44] i asked because i need to upload many CC videos here http://archive.org/details/spanishrevolution
[15:48] do you want to help? Nemo_bis
[15:49] emijrp, ok if it's just running commands
[15:49] is there any way i can edit the items you upload?
[16:34] Nemo_bis: https://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py
[16:35] and the videos http://pastebin.com/06Vt2qNx
[16:35] the script include instructions
[16:47] mm
[16:47] im not sure if you can add items to that collection, you are not an admin
[16:48] em.h.
[16:53] if that case, just comment the collection header line, and the subject "spanishrevolution" will be enough
[17:18] emijrp, can't you give me your s3 keys for now?
[17:18] why?
[17:18] you can recreate them later
[17:18] ah
[17:18] so that I can upload them directly to the collection
[17:19] ok
[17:19] dont upload pr0n
[17:19] heh
[17:19] also, I don't want to have those videos in my "watchlist" :p
[17:19] where is the watchlist?
[17:21] nowhere, I mean the list of contributions
[17:24] is that available?
[17:24] i dont see anything like my profile or so
[17:24] http://archive.org/catalog.php?history=1&justme=1
[17:41] emijrp, I suppose I should see some output?
[17:41] nothing seems to be happening
[17:41] yes
[17:42] did you create download/videostodo.txt
[17:42] keys.txt
[17:42] oh
[17:42] you need youtube-dl
[17:42] yes, did everything
[17:42] but for some reason the list of video is blank now *facepalms*
[17:43] emijrp, what version are you using?
[17:43] youtube-dl: error: no such option: --write-info-json
[17:44] python youtube-dl -U
[17:44] to update
[17:48] emijrp, but I've just downloaded it!
[17:48] weird
[17:48] ah, it's in the same google code directory, too easy
[17:49] I downloaded http://www.textfiles.com/videoyahoo/SCRIPTS/youtube-dl :p
[17:50] it's a bit annoying that the script blanks the list of videos
[17:50] hm, gave same error
[17:50] afk no
[17:50] w
[18:04] glad to see you here SketchCow
[18:10] emijrp, so what youtube-dl should I use?
[18:10] the last version
[18:11] do -U
[18:18] emijrp, ok
[18:19] I seem to be always getting ERROR: unable to download video webpage: HTTP Error 404: Not Found
[18:20] em
[18:20] try to launch youtube-dl alone
[18:21] python youtube-dl -t -i -c url --write-info-json --format 18
[18:25] this way it works
[18:25] probably something wrong about language settings
[18:28] emijrp, can't you remove the 1JCWblUBdH4.mp4 from the identifier in http://archive.org/details/spanishrevolution-Manifestacio_19J_Placa_Catalunya_Barcelona-1JCWblUBdH4.mp4
[18:28] the identifier never contains format, at least
[18:29] seems to have been a false alarm then? the description has been added
[18:31] did you upload that using the script? or handy?
[18:36] script
[18:37] emijrp, I see only 7 videos uploaded out of 10, 2 have been skipped for the size limit (set to 25 GB now) but I don't understand the last one
[18:38] mmm
[18:38] what it says?, perhaps it was skipped because it exists with that name?
[18:39] can't find it, have to dig the output
[18:44] dunno, no error shown
[18:44] for some reason Manifestació_19J_Plaça_Catalunya_Barcelona-1JCWblUBdH4.mp4 is the working directory
[18:45] but was uploaded
[18:47] emijrp, all ok, RSS feed missed the last entry
[18:48] emijrp, all looks nice, if you don't want to change the identifiers or anything else I'd continue with all the others
[18:49] wait a moment i check
[18:49] the items uploaded
[19:08] the slash is converted to %
[19:08] 7/9 http://archive.org/details/spanishrevolution-Juglares_en_Biblio_Acampada_Sol_7_9-1YPywiNqON4.mp4
[19:14] Nemo_bis: i have added an re.sub to change % to /
[19:15] make another batch of 10 please
[19:15] ok
[19:15] update the script
[19:20] whoops
[19:20] doing too many things at the same time
[19:23] emijrp,
[19:23] [download] 100.0% of 6.80M at 412.54k/s ETA 00:00
[19:23] [download] Destination: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4
[19:23] [info] Video description metadata as JSON to: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4.info.json
[19:23] Traceback (most recent call last):
[19:23] File "youtube2internetarchive.py", line 115, in <module>
[19:23] if not re.search(ur"Item cannot be found", urllib.urlopen('http://archive.org/details/%s' % (itemname)).read()):
[19:23] File "/usr/lib/python2.7/urllib.py", line 84, in urlopen
[19:23] return opener.open(url)
[19:23] File "/usr/lib/python2.7/urllib.py", line 177, in open
[19:23] fullurl = unwrap(toBytes(fullurl))
[19:23] File "/usr/lib/python2.7/urllib.py", line 1050, in toBytes
[19:23] " contains non-ASCII characters")
[19:23] UnicodeError: URL u'http://archive.org/details/spanishrevolution-\u03a3\u03c5\u03bd\u03c4\u03c1\u03bf\u03c6\u03bf\u03c2_\u039b\u03bf\u03c5\u03ba\u03b1\u03bd\u03b9\u03ba\u03bf\u03c2_-_Companero_Loukanikos-28fYNSrOqmI.mp4' contains non-ASCII characters
[19:33] paste the error again Nemo_bis
[19:34] ok
[19:34] found it=
[19:34] im watching the channel log
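The UnicodeError in the 19:23 traceback comes from passing a URL with non-ASCII characters straight to urllib. One generic way to avoid that class of error is to percent-encode the item name first. The sketch below uses Python 3's `urllib.parse.quote` rather than the Python 2.7 `urllib` seen in the traceback, and `details_url` is a hypothetical helper, not the script's code. It is also not the fix eventually adopted in this conversation, which was to strip the offending characters from the identifier.

```python
from urllib.parse import quote


def details_url(itemname: str) -> str:
    """Build an ASCII-only archive.org details URL for itemname.

    Hypothetical helper: quote() encodes the text as UTF-8 and
    percent-escapes every byte outside the unreserved ASCII set.
    """
    return "http://archive.org/details/" + quote(itemname, safe="")


if __name__ == "__main__":
    url = details_url("spanishrevolution-Συντροφος_Λουκανικος_-_Companero_Loukanikos")
    print(url)
    print(url.isascii())
```

Percent-encoding only makes the request well-formed. Since identifiers are restricted to ASCII by the filter quoted earlier, a lookup for such a name would presumably just report that the item cannot be found.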
[19:34] ok
[19:35] Nemo_bis: try again with that video alone
[19:36] fixed the script
[19:38] emijrp, it's the same
[19:38] maybe downloaded it too early
[19:39] i change line 115, now it includes unicode(...
[19:39] changed*
[19:45] still getting the same error
[19:46] retrying after deleting the downloaded files
[19:46] same
[19:54] emijrp, ?
[20:03] i will see now, wait
[20:14] fixed
[20:14] http://archive.org/details/spanishrevolution-__-_Companero_Loukanikos-28fYNSrOqmI.mp4
[20:15] i remove all non a-z0-9_.- chars
[20:15] that is IA form does
[20:16] Nemo_bis:
[20:16] launch 50
[20:49] emijrp, the following 50 ?
[20:49] yep
[21:05] hey emijrp, now you're ready to apply this experience to the wikis uploader, aren't you? :)
[21:05] hello mutoso, how is the download of wikis going?
[21:12] emijrp, can't you use more compact identifiers, like youtube-2t0HW4Cceuc , and leave the rest to title and filenames?
[21:12] nooooo
[21:12] titles are good for google results
[21:12] the URL you mean?
[21:13] the current solution doesn't even guarantee unique identifiers, does it? the alphanumeric part can be truncated and the rest could be not unique
[21:13] i think that it is a limit case
[21:25] emijrp, have you looked at the check output for the wikis I downloaded?
[21:25] yes
[21:25] emijrp, all ok?
[21:25] how all ok?
[21:26] they are partial dumps, if you resume, what happens?
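The fix described at 20:15 ("i remove all non a-z0-9_.- chars") can be sketched as follows. This is an illustration in Python 3, not the actual change to youtube2internetarchive.py. It assumes accents are stripped by Unicode decomposition before filtering, which would explain why "Compañero" appears as "Companero" in the resulting identifier. The function name `sanitize_identifier` and the uppercase range in the character class are assumptions.

```python
import re
import unicodedata


def sanitize_identifier(text: str) -> str:
    """Reduce text to the characters allowed in an archive.org identifier.

    Hypothetical helper. NFKD decomposition splits accented letters into a
    base letter plus combining marks (for example "ñ" becomes "n" + U+0303);
    the regex then drops the marks and any other character outside
    a-z, A-Z, 0-9, "_", "." and "-".
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return re.sub(r"[^a-zA-Z0-9_.-]", "", decomposed)


if __name__ == "__main__":
    # Assumed original title; the Greek words vanish entirely, leaving "__".
    title = "Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4"
    print("spanishrevolution-" + sanitize_identifier(title))
```

As raised at 21:13, this kind of filtering can make different titles collapse to the same string, so the trailing video ID is what keeps identifiers distinct.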
[21:26] I mean, does it sound reasonably ok or will I have to fix
[21:26] noo I don't mean that
[21:26] the links I put under "Final check" for each row in https://code.google.com/p/wikiteam/wiki/TaskForce
[21:27] and I don't know what happened to those dumps, let's see if I manage to look
[21:27] yes, that looks ok
[21:28] the current batches seem to complete very quickly, I'd start some 50 more instances tonight
[21:34] emijrp, so, some have been happily resumed, 2 are still broken, 1 failed again with some weird error like index.php not found or so
[21:35] and at least it was not compressed and will be retried next time
[21:40] to make the wiki uploader, i need to get the api for every wiki
[21:41] and it is inside the 7z
[22:05] emijrp, no, I kept all the config files
[22:05] i will read the feed lists better
[22:05] ?
[22:06] the uploader will read feed lists, check if there is a 7z for that wiki and upload
[22:06] the launcher now compresses in non-solid 7z so extracting the config should be very fast btw
[22:06] as you want
[22:07] if it looked for 7z (or directories) it could be used for any 7z generated by dumpgenerator, but I don't mind much
[22:07] feed list = list of api.php, right?
[22:07] do you add the config to the 7z?
[22:08] config.txt?
[22:08] uh, I don't remember
[22:08] doesn't the launcher do so?
[22:08] i dont think so, i excluded it
[22:08] aww why
[22:08] well, using lists as input might be easier anyway
[22:09] script is doing last YouTube video of the 50 bunch
[22:09] I think
[22:09] because config.txt saves your local path, and if you use absolute path you show your PC username
[22:10] oh
[22:10] but launcher.py doesn't use absolute paths
[22:16] emijrp, do you want me to upload all the remaining videos?
[22:16] wait a second
[22:17] ok
[22:27] looks ok
[22:27] upload the rest
[22:43] k
[22:52] i will check them tomorrow