Time |
Nickname |
Message |
12:49
π
|
Nemo_bis |
hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA |
12:51
π
|
Nemo_bis |
uh, I summoned emijrp |
12:52
π
|
Nemo_bis |
emijrp, do you know if dumpgenerator is able to resume broken xml like these? hmpf http://p.defau.lt/?kiCBGoPJdWS009nCngWidA |
12:53
π
|
Nemo_bis |
and btw I'm at 3418 7z to upload |
12:54
π
|
emijrp |
did you try? |
12:54
π
|
emijrp |
it removes the last <page> until EOF |
12:54
π
|
emijrp |
and resume |
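A minimal sketch of the resume step emijrp describes above: cut the dump back to the start of the last `<page>` element so the download can continue from there. This is not dumpgenerator's actual code; the function name `drop_last_page` is hypothetical, and the sketch assumes the dump fits in memory as a string.

```python
def drop_last_page(xml_text):
    """Remove everything from the last <page> opening tag to EOF.

    The last page may be incomplete, so it is dropped entirely and
    would be fetched again on resume. Text without any <page> tag is
    returned unchanged.
    """
    start = xml_text.rfind("<page>")
    if start == -1:
        return xml_text
    return xml_text[:start]


# A dump that broke off in the middle of the second page.
broken = "<mediawiki>\n<page><title>A</title></page>\n<page><title>B</ti"
repaired = drop_last_page(broken)
```

After the call, `repaired` ends right after the first complete page, which is where a resumed download would append new pages.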
12:54
π
|
Nemo_bis |
sometimes I get 7z which contain broken xml like those |
12:55
π
|
Nemo_bis |
but it might be launcher's fault, I suppose it happens when the download fails with unexpected reasons |
12:55
π
|
Nemo_bis |
"sometimes" = at least a hundred dozens times until now |
12:55
π
|
Nemo_bis |
s/dozens// |
12:55
π
|
emijrp |
there is no check before launching 7z |
12:56
π
|
Nemo_bis |
I now added one |
12:56
π
|
Nemo_bis |
emijrp, https://code.google.com/p/wikiteam/source/diff?spec=svn648&r=648&format=side&path=/trunk/batchdownload/launcher.py&old_path=/trunk/batchdownload/launcher.py&old=613 |
12:56
π
|
Nemo_bis |
I hope it works despite me not knowing python at all... |
12:58
π
|
emijrp |
i dont know, i didnt test it |
12:58
π
|
emijrp |
i get an error using S3 |
12:58
π
|
emijrp |
http://pastebin.com/czCM8Rt7 |
12:59
π
|
Nemo_bis |
what's the error? |
12:59
π
|
emijrp |
reload |
13:00
π
|
Nemo_bis |
are you sure it accepts those filenames? |
13:01
π
|
Nemo_bis |
when you use the web interface, it tells you that spaces are removed, it's converted to CamelCase etc.; s3 probably doesn't do it for you |
13:02
π
|
emijrp |
there are no spaces in the filename |
13:02
π
|
Nemo_bis |
I think underscores are not accepted either |
13:04
π
|
Nemo_bis |
Filenames with SPACE characters will be renamed to "camel case". |
13:04
π
|
Nemo_bis |
For example, "my LA roadtrip.mov" will become "MyLaRoadtrip.mov" |
13:04
π
|
Nemo_bis |
Letters, numbers, periods (.), dashes (-), or underscores (_) are ok, but other characters will be removed. |
13:05
π
|
Nemo_bis |
oh, emijrp, perhaps it's just that you have to specify a collection |
13:07
π
|
emijrp |
i did it before, but removed it while debugging |
13:07
π
|
emijrp |
not work |
13:07
π
|
Nemo_bis |
opensource_movies ? |
13:07
π
|
Nemo_bis |
that's the only collection you're allowed to upload to AFAIK |
13:08
π
|
* |
Nemo_bis idiot |
13:08
π
|
Nemo_bis |
it's just that you forgot to put the item name |
13:08
π
|
Nemo_bis |
the last URL should have a "directory" which will become the item identifier |
13:10
π
|
Nemo_bis |
Identifier "spanishrevolution-Madrid_Manifestacin_24_de_Julio_-_Hacia_el_congreso-00ytP9" is available. |
13:10
π
|
emijrp |
max length issues? |
13:10
π
|
Nemo_bis |
probably must be ASCII and not longer than that |
13:10
π
|
Nemo_bis |
ó |
13:11
π
|
Nemo_bis |
--upload-file Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4 http://s3.us.archive.org/spanishrevolution-Madrid_Manifestacion_24_de_Julio_-_Hacia_el_congreso/spanishrevolution-Madrid_Manifestación_24_de_Julio_-_Hacia_el_congreso-00ytP96jGKU.mp4 |
13:11
π
|
Nemo_bis |
this should work? shorter identifier would be nice though |
13:13
π
|
emijrp |
nope |
13:13
π
|
emijrp |
error |
13:14
π
|
Nemo_bis |
remove the ó from the filename too? |
13:17
π
|
emijrp |
too |
13:17
π
|
Nemo_bis |
maybe an identifier + filename length limit? |
13:22
π
|
emijrp |
if i paste it in the command line it works |
13:22
π
|
emijrp |
http://archive.org/details/spanishrevolution-Madrid_Manifestacion_24_de_Jul |
13:47
π
|
emijrp |
the propble is using subprocess |
13:47
π
|
emijrp |
os.system() works |
13:47
π
|
emijrp |
problem* |
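The log never settles why `subprocess` failed where `os.system()` worked. One common difference is that `os.system()` hands a single string to the shell, while `subprocess` without a shell expects each argument as a separate list item. The sketch below shows that list form in Python 3 (the script in this log is Python 2); `run_command` is a hypothetical helper, and the command is a stand-in rather than the real upload call.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments, without a shell.

    Returns (exit_code, stdout_text, stderr_text) so a caller can log
    the error output when the exit code is not zero.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in command. For an upload, each option and its value (for
# example "--upload-file", then the file name, then the URL) would be
# a separate list item, so spaces or accents need no shell quoting.
code, out, err = run_command([sys.executable, "-c", "print('uploaded')"])
```

Capturing stderr this way also answers the "what error do you get" question directly, since the message is available in `err` instead of being lost.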
13:50
π
|
Nemo_bis |
emijrp, why problem? |
13:50
π
|
emijrp |
not sure |
13:51
π
|
Nemo_bis |
it seems to work for me |
13:51
π
|
emijrp |
http://archive.org/details/spanishrevolution-15M_democracia_real_ya_Madrid-03YgOi0yOfo.mp4 |
13:51
π
|
Nemo_bis |
oh, you mean for this upload |
13:51
π
|
Nemo_bis |
you're creating a YouTube archiver? |
13:51
π
|
emijrp |
only for spanish revolution videos |
13:52
π
|
Nemo_bis |
ok |
13:53
π
|
Nemo_bis |
what error do you get with subprocess? |
13:54
π
|
Nemo_bis |
and how do you call it exactly |
13:54
π
|
Nemo_bis |
I've fought with it a bit too this morning |
13:57
π
|
emijrp |
^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$ |
13:57
π
|
emijrp |
that is the item filter |
13:57
π
|
emijrp |
<Error><Code>InvalidBucketName</Code><Message>The specified bucket is not valid.</Message><Resource>Bucket names should be valid archive identifiers; try someting matching this regular expression: ^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$ |
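The regular expression in this error message can be checked locally before any upload is attempted. A minimal sketch in Python 3, using the pattern exactly as quoted; the helper name `is_valid_identifier` is hypothetical.

```python
import re

# Pattern copied from the error message above: one alphanumeric
# character, then 4 to 100 letters, digits, "_", "." or "-".
IDENTIFIER_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,100}$")


def is_valid_identifier(name):
    """Return True if name matches the quoted identifier pattern."""
    return IDENTIFIER_RE.fullmatch(name) is not None


ascii_name = "spanishrevolution-Madrid_Manifestacion_24_de_Julio_-_Hacia_el_congreso"
accented_name = "spanishrevolution-Madrid_Manifestaci\u00f3n"

ascii_ok = is_valid_identifier(ascii_name)
accented_ok = is_valid_identifier(accented_name)
```

The ASCII name passes and the accented one does not, which is consistent with the guess earlier in the log that identifiers must be ASCII.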
15:14
π
|
emijrp |
Nemo_bis: what is your upstream bandwidth? |
15:21
π
|
emijrp |
an error while upgrading mediawiki in referata wikifarm has deleted images in 10 wikis |
15:21
π
|
emijrp |
forwarding email to the group |
15:28
π
|
Nemo_bis |
emijrp, 10 Mb/s at home |
15:35
π
|
Nemo_bis |
I don't understand how my upstream can help |
15:38
π
|
Nemo_bis |
he might perhaps ask Tim Starling how he recovered those files he lost a while ago |
15:38
π
|
* |
Nemo_bis doesn't remember the details |
15:38
π
|
emijrp |
they recovered them from cache, some backups and people re-uploading |
15:39
π
|
emijrp |
you cant undelete files on linux |
15:39
π
|
emijrp |
(easily) |
15:39
π
|
Nemo_bis |
ok |
15:39
π
|
Nemo_bis |
so, I can help in some way? |
15:43
π
|
emijrp |
no |
15:44
π
|
emijrp |
i asked because i need to upload many CC videos here http://archive.org/details/spanishrevolution |
15:48
π
|
emijrp |
do you want to help? Nemo_bis |
15:49
π
|
Nemo_bis |
emijrp, ok if it's just running commands |
15:49
π
|
emijrp |
is there any way i can edit the items you upload? |
16:34
π
|
emijrp |
Nemo_bis: https://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py |
16:35
π
|
emijrp |
and the videos http://pastebin.com/06Vt2qNx |
16:35
π
|
emijrp |
the script includes instructions |
16:47
π
|
emijrp |
mm |
16:47
π
|
emijrp |
im not sure if you can add items to that collection, you are not an admin |
16:48
π
|
emijrp |
em.h. |
16:53
π
|
emijrp |
in that case, just comment out the collection header line, and the subject "spanishrevolution" will be enough |
17:18
π
|
Nemo_bis |
emijrp, can't you give me your s3 keys for now? |
17:18
π
|
emijrp |
why? |
17:18
π
|
Nemo_bis |
you can recreate them later |
17:18
π
|
emijrp |
ah |
17:18
π
|
Nemo_bis |
so that I can upload them directly to the collection |
17:19
π
|
emijrp |
ok |
17:19
π
|
emijrp |
dont upload pr0n |
17:19
π
|
Nemo_bis |
heh |
17:19
π
|
Nemo_bis |
also, I don't want to have those videos in my "watchlist" :p |
17:19
π
|
emijrp |
where is the watchlist? |
17:21
π
|
Nemo_bis |
nowhere, I mean the list of contributions |
17:24
π
|
emijrp |
is that available? |
17:24
π
|
emijrp |
i dont see anything like my profile or so |
17:24
π
|
Nemo_bis |
http://archive.org/catalog.php?history=1&justme=1 |
17:41
π
|
Nemo_bis |
emijrp, I suppose I should see some output? |
17:41
π
|
Nemo_bis |
nothing seems to be happening |
17:41
π
|
emijrp |
yes |
17:42
π
|
emijrp |
did you create download/videostodo.txt |
17:42
π
|
emijrp |
keys.txt |
17:42
π
|
emijrp |
oh |
17:42
π
|
emijrp |
you need youtube-dl |
17:42
π
|
Nemo_bis |
yes, did everything |
17:42
π
|
Nemo_bis |
but for some reason the list of videos is blank now *facepalms* |
17:43
π
|
Nemo_bis |
emijrp, what version are you using? |
17:43
π
|
Nemo_bis |
youtube-dl: error: no such option: --write-info-json |
17:44
π
|
emijrp |
python youtube-dl -U |
17:44
π
|
emijrp |
to update |
17:48
π
|
Nemo_bis |
emijrp, but I've just downloaded it! |
17:48
π
|
emijrp |
weird |
17:48
π
|
Nemo_bis |
ah, it's in the same google code directory, too easy |
17:49
π
|
Nemo_bis |
I downloaded http://www.textfiles.com/videoyahoo/SCRIPTS/youtube-dl :p |
17:50
π
|
Nemo_bis |
it's a bit annoying that the script blanks the list of videos |
17:50
π
|
Nemo_bis |
hm, gave same error |
17:50
π
|
Nemo_bis |
afk no |
17:50
π
|
Nemo_bis |
w |
18:04
π
|
emijrp |
glad to see you here SketchCow |
18:10
π
|
Nemo_bis |
emijrp, so what youtube-dl should I use? |
18:10
π
|
emijrp |
the last version |
18:11
π
|
emijrp |
do -U |
18:18
π
|
Nemo_bis |
emijrp, ok |
18:19
π
|
Nemo_bis |
I seem to be always getting ERROR: unable to download video webpage: HTTP Error 404: Not Found |
18:20
π
|
emijrp |
em |
18:20
π
|
emijrp |
try to launch youtube-dl alone |
18:21
π
|
emijrp |
python youtube-dl -t -i -c url --write-info-json --format 18 |
18:25
π
|
Nemo_bis |
this way it works |
18:25
π
|
Nemo_bis |
probably something wrong about language settings |
18:28
π
|
Nemo_bis |
emijrp, can't you remove the 1JCWblUBdH4.mp4 from the identifier in http://archive.org/details/spanishrevolution-Manifestacio_19J_Placa_Catalunya_Barcelona-1JCWblUBdH4.mp4 |
18:28
π
|
Nemo_bis |
the identifier never contains format, at least |
18:29
π
|
Nemo_bis |
seems to have been a false alarm then? the description has been added |
18:31
π
|
emijrp |
did you upload that using the script? or by hand? |
18:36
π
|
Nemo_bis |
script |
18:37
π
|
Nemo_bis |
emijrp, I see only 7 videos uploaded out of 10, 2 have been skipped for the size limit (set to 25 GB now) but I don't understand the last one |
18:38
π
|
emijrp |
mmm |
18:38
π
|
emijrp |
what does it say? perhaps it was skipped because an item with that name already exists? |
18:39
π
|
Nemo_bis |
can't find it, have to dig the output |
18:44
π
|
Nemo_bis |
dunno, no error shown |
18:44
π
|
Nemo_bis |
for some reason Manifestació_19J_Plaça_Catalunya_Barcelona-1JCWblUBdH4.mp4 is the working directory |
18:45
π
|
Nemo_bis |
but was uploaded |
18:47
π
|
Nemo_bis |
emijrp, all ok, RSS feed missed the last entry |
18:48
π
|
Nemo_bis |
emijrp, all looks nice, if you don't want to change the identifiers or anything else I'd continue with all the others |
18:49
π
|
emijrp |
wait a moment i check |
18:49
π
|
emijrp |
the items uploaded |
19:08
π
|
emijrp |
the slash is converted to % |
19:08
π
|
emijrp |
7/9 http://archive.org/details/spanishrevolution-Juglares_en_Biblio_Acampada_Sol_7_9-1YPywiNqON4.mp4 |
19:14
π
|
emijrp |
Nemo_bis: i have added an re.sub to change % to / |
19:15
π
|
emijrp |
make another batch of 10 please |
19:15
π
|
Nemo_bis |
ok |
19:15
π
|
emijrp |
update the script |
19:20
π
|
Nemo_bis |
whoops |
19:20
π
|
Nemo_bis |
doing too many things at the same time |
19:23
π
|
Nemo_bis |
emijrp, |
19:23
π
|
Nemo_bis |
[download] 100.0% of 6.80M at 412.54k/s ETA 00:00 |
19:23
π
|
Nemo_bis |
[download] Destination: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4 |
19:23
π
|
Nemo_bis |
[info] Video description metadata as JSON to: Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos-28fYNSrOqmI.mp4.info.json |
19:23
π
|
Nemo_bis |
Traceback (most recent call last): |
19:23
π
|
Nemo_bis |
File "youtube2internetarchive.py", line 115, in <module> |
19:23
π
|
Nemo_bis |
if not re.search(ur"Item cannot be found", urllib.urlopen('http://archive.org/details/%s' % (itemname)).read()): |
19:23
π
|
Nemo_bis |
File "/usr/lib/python2.7/urllib.py", line 84, in urlopen |
19:23
π
|
Nemo_bis |
return opener.open(url) |
19:23
π
|
Nemo_bis |
File "/usr/lib/python2.7/urllib.py", line 177, in open |
19:23
π
|
Nemo_bis |
fullurl = unwrap(toBytes(fullurl)) |
19:23
π
|
Nemo_bis |
File "/usr/lib/python2.7/urllib.py", line 1050, in toBytes |
19:23
π
|
Nemo_bis |
" contains non-ASCII characters") |
19:23
π
|
Nemo_bis |
UnicodeError: URL u'http://archive.org/details/spanishrevolution-\u03a3\u03c5\u03bd\u03c4\u03c1\u03bf\u03c6\u03bf\u03c2_\u039b\u03bf\u03c5\u03ba\u03b1\u03bd\u03b9\u03ba\u03bf\u03c2_-_Companero_Loukanikos-28fYNSrOqmI.mp4' contains non-ASCII characters |
19:33
π
|
emijrp |
paste the error again Nemo_bis |
19:34
π
|
emijrp |
ok |
19:34
π
|
Nemo_bis |
found it= |
19:34
π
|
emijrp |
im watching the channel log |
19:34
π
|
Nemo_bis |
ok |
19:35
π
|
emijrp |
Nemo_bis: try again with that video alone |
19:36
π
|
emijrp |
fixed the script |
19:38
π
|
Nemo_bis |
emijrp, it's the same |
19:38
π
|
Nemo_bis |
maybe downloaded it too early |
19:39
π
|
emijrp |
i change line 115, now it includes unicode(... |
19:39
π
|
emijrp |
changed* |
19:45
π
|
Nemo_bis |
still getting the same error |
19:46
π
|
Nemo_bis |
retrying after deleting the downloaded files |
19:46
π
|
Nemo_bis |
same |
19:54
π
|
Nemo_bis |
emijrp, ? |
20:03
π
|
emijrp |
i will see now, wait |
20:14
π
|
emijrp |
fixed |
20:14
π
|
emijrp |
http://archive.org/details/spanishrevolution-__-_Companero_Loukanikos-28fYNSrOqmI.mp4 |
20:15
π
|
emijrp |
i remove all non a-z0-9_.- chars |
20:15
π
|
emijrp |
that is what the IA form does |
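A sketch of the filtering emijrp describes, in Python 3. Two assumptions are made here that the log does not confirm: accented Latin letters are first decomposed so that the base letter survives (the item URL above shows "Companero"), and uppercase letters are kept (the same URL keeps "C" and "L"). The function name `sanitize_identifier` is hypothetical, and this is not the code in youtube2internetarchive.py.

```python
import re
import unicodedata


def sanitize_identifier(title):
    """Strip every character outside A-Z, a-z, 0-9, "_", "." and "-"."""
    # Decompose accented letters (for example n with tilde becomes
    # "n" plus a combining mark) so the plain ASCII letter remains.
    decomposed = unicodedata.normalize("NFKD", title)
    # Remove everything outside the allowed set, including the
    # combining marks and all Greek letters.
    return re.sub(r"[^a-zA-Z0-9_.-]", "", decomposed)


print(sanitize_identifier("Σύντροφος_Λουκάνικος_-_Compañero_Loukanikos"))
# prints "__-_Companero_Loukanikos"
```

The output matches the middle part of the item URL pasted at 20:14, where the Greek words vanish and only their separating underscores remain.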
20:16
π
|
emijrp |
Nemo_bis: |
20:16
π
|
emijrp |
launch 50 |
20:49
π
|
Nemo_bis |
emijrp, the following 50 ? |
20:49
π
|
emijrp |
yep |
21:05
π
|
Nemo_bis |
hey emijrp, now you're ready to apply this experience to the wikis uploader, aren't you? :) |
21:05
π
|
Nemo_bis |
hello mutoso, how is the download of wikis going? |
21:12
π
|
Nemo_bis |
emijrp, can't you use more compact identifiers, like youtube-2t0HW4Cceuc , and leave the rest to title and filenames? |
21:12
π
|
emijrp |
nooooo |
21:12
π
|
emijrp |
titles are good for google results |
21:12
π
|
Nemo_bis |
the URL you mean? |
21:13
π
|
Nemo_bis |
the current solution doesn't even guarantee unique identifiers, does it? the alphanumeric part can be truncated and the rest could be not unique |
21:13
π
|
emijrp |
i think that it is a limit case |
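One way to keep the readable title and still address the uniqueness concern is to shorten only the title part, so the video ID always survives. This is a sketch of that idea in Python 3, not what the script does; `build_identifier` is a hypothetical helper, and the 101-character limit is read off the regular expression quoted earlier in the log (one leading character plus up to 100 more).

```python
MAX_LEN = 101  # 1 leading character + up to 100 more, per the quoted regex


def build_identifier(prefix, title, video_id, max_len=MAX_LEN):
    """Join prefix, title and video ID, shortening only the title."""
    # Space taken by the parts that must never be cut, plus two "-".
    fixed = len(prefix) + len(video_id) + 2
    room = max_len - fixed
    if room < 0:
        raise ValueError("prefix and video id alone exceed the length limit")
    short_title = title[:room]
    if short_title:
        return "%s-%s-%s" % (prefix, short_title, video_id)
    return "%s-%s" % (prefix, video_id)


short = build_identifier(
    "spanishrevolution", "15M_democracia_real_ya_Madrid", "03YgOi0yOfo"
)
long = build_identifier("spanishrevolution", "A" * 200, "2t0HW4Cceuc")
```

Because the video ID is appended after the cut, two videos with long, similar titles still get different identifiers.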
21:25
π
|
Nemo_bis |
emijrp, have you looked at the check output for the wikis I downloaded? |
21:25
π
|
emijrp |
yes |
21:25
π
|
Nemo_bis |
emijrp, all ok? |
21:25
π
|
emijrp |
how all ok? |
21:26
π
|
emijrp |
they are partial dumps, if you resume, what happens? |
21:26
π
|
Nemo_bis |
I mean, does it sound reasonably ok or will I have to fix |
21:26
π
|
Nemo_bis |
noo I don't mean that |
21:26
π
|
Nemo_bis |
the links I put under "Final check" for each row in https://code.google.com/p/wikiteam/wiki/TaskForce |
21:27
π
|
Nemo_bis |
and I don't know what happened to those dumps, let's see if I manage to look |
21:27
π
|
emijrp |
yes, that looks ok |
21:28
π
|
Nemo_bis |
the current batches seem to complete very quickly, I'd start some 50 more instances tonight |
21:34
π
|
Nemo_bis |
emijrp, so, some have been happily resumed, 2 are still broken, 1 failed again with some weird error like index.php not found or so |
21:35
π
|
Nemo_bis |
and at least it was not compressed and will be retried next time |
21:40
π
|
emijrp |
to make the wiki uploader, i need to get the api for every wiki |
21:41
π
|
emijrp |
and it is inside the 7z |
22:05
π
|
Nemo_bis |
emijrp, no, I kept all the config files |
22:05
π
|
emijrp |
i will read the feed lists better |
22:05
π
|
Nemo_bis |
? |
22:06
π
|
emijrp |
the uploader will read feed lists, check if there is a 7z for that wiki and upload |
22:06
π
|
Nemo_bis |
the launcher now compresses in non-solid 7z so extracting the config should be very fast btw |
22:06
π
|
Nemo_bis |
as you want |
22:07
π
|
Nemo_bis |
if it looked for 7z (or directories) it could be used for any 7z generated by dumpgenerator, but I don't mind much |
22:07
π
|
Nemo_bis |
feed list = list of api.php, right? |
22:07
π
|
emijrp |
do you add the config to the 7z? |
22:08
π
|
emijrp |
config.txt? |
22:08
π
|
Nemo_bis |
uh, I don't remember |
22:08
π
|
Nemo_bis |
doesn't the launcher do so? |
22:08
π
|
emijrp |
i dont think so, i excluded it |
22:08
π
|
Nemo_bis |
aww why |
22:08
π
|
Nemo_bis |
well, using lists as input might be easier anyway |
22:09
π
|
Nemo_bis |
script is doing last YouTube video of the 50 bunch |
22:09
π
|
Nemo_bis |
I think |
22:09
π
|
emijrp |
because config.txt saves your local path, and if you use absolute path you show your PC username |
22:10
π
|
Nemo_bis |
oh |
22:10
π
|
Nemo_bis |
but launcher.py doesn't use absolute paths |
22:16
π
|
Nemo_bis |
emijrp, do you want me to upload all the remaining videos? |
22:16
π
|
emijrp |
wait a second |
22:17
π
|
Nemo_bis |
ok |
22:27
π
|
emijrp |
looks ok |
22:27
π
|
emijrp |
upload the rest |
22:43
π
|
Nemo_bis |
k |
22:52
π
|
emijrp |
i will check them tomorrow |