Time |
Nickname |
Message |
10:09 |
midas |
right so, the delay is different? |
10:17 |
Nemo_bis |
midas: until some time ago, --delay was not applied to all kinds of requests |
10:18 |
Nemo_bis |
I *think* I've added it everywhere now |
10:18 |
midas |
lol :p I'll check if my current version is different from the one online now |
10:18 |
Nemo_bis |
But in addition to that there is some automatic delay when the server feels slow |
10:19 |
Nemo_bis |
On top of that, the python delay function used doesn't literally delay as much as you tell it to, AFAIK |
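One way to check the claim about the delay call is simply to time it. This is a minimal sketch, assuming the delay is implemented with `time.sleep` (the log does not name the function) and written for Python 3 rather than the Python 2.7 mentioned later in the log; `measure_sleep` is a hypothetical helper, not part of dumpgenerator.py.

```python
import time


def measure_sleep(requested):
    """Return how many seconds time.sleep(requested) actually took."""
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    time.sleep(requested)
    return time.monotonic() - start


if __name__ == "__main__":
    for requested in (0.01, 0.1, 0.5):
        actual = measure_sleep(requested)
        print("requested %.3fs, slept %.3fs" % (requested, actual))
```

Comparing the two columns over a few runs shows how far the real pause drifts from the requested one on a given machine.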
10:19 |
midas |
and to top that, it's IMSLP, seriously not cool |
10:19 |
midas |
but fuckit, im going to grab it. |
10:33 |
danneh_ |
Nemo_bis: Just wondering, I haven't done too much wiki archiving before, but I've got some old dumps uploaded on Archive.org (automatically uploaded by uploader.py) |
10:34 |
danneh_ |
If I do some new dumps of the same wikis and upload them with uploader.py , will it conflict with the old dumps already up there? |
10:34 |
danneh_ |
The old dumps were done at least a few months ago or so, just wanna try updating them |
11:07 |
SketchCow |
No. |
11:07 |
SketchCow |
Just be careful you're not doing dumps of massive wikis frequently |
11:08 |
danneh_ |
Fair enough. Will do, thanks for the advice |
11:09 |
danneh_ |
I can't throw up the archiveteam.org grab I just finished using uploader.py, but given it says "Item already exists.", I'm guessing it's just because someone else already 'has' the archiveteam wiki archive item, and they need to upload it themselves (or I could, changing the metadata/etc) |
11:11 |
Nemo_bis |
danneh_: that's because you don't have the rights to edit/manage those items |
11:11 |
Nemo_bis |
Can you tell me what wikis those are? |
11:13 |
Nemo_bis |
Ouch, I had downloaded it twice in about a month by mistake :P https://archive.org/details/wiki-archiveteamorg |
11:13 |
Nemo_bis |
But it's a small wiki anyway ;) |
11:13 |
danneh_ |
Yep. The one that's failing is the ArchiveTeam.org archive I just did, but if all else fails I'll just leave it 'til whoever normally archives/manages it does it |
11:13 |
danneh_ |
Aha, fair enough |
11:14 |
Nemo_bis |
If you upload your dump anywhere I'll update the item and delete one of mine |
11:14 |
Nemo_bis |
In general, if you want to archive many wikis we can just make you admin :) |
11:14 |
danneh_ |
It's on my server at home, not too sure where I'd be able to easily upload it via command line |
11:16 |
Nemo_bis |
If you want to try some mass downloading we'd need some of those: https://code.google.com/p/wikiteam/source/browse/trunk/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt |
11:16 |
Nemo_bis |
Let me know if you're interested and I'll update the list |
11:17 |
danneh_ |
Ah, those are currently undownloaded? |
11:17 |
Nemo_bis |
Those are wikis that failed download for me, for one reason or another |
11:18 |
danneh_ |
I've got about 250GB I can spare right now, unfortunately rest is allocated to archiving personal junk |
11:18 |
danneh_ |
But I'll do my best and letcha know if I'm able to grab stuff |
11:19 |
midas |
hmm, we should be thinking about organising this a little bit so we don't do them over and over again (multiple people grabbing the same wiki and such) |
11:19 |
danneh_ |
launcher.py is giving me a little trouble, though: http://pastebin.com/niyNQfH8 |
11:20 |
danneh_ |
Running Python 2.7.6 on Arch, dumpgenerator.py itself works (just used it to do that grab) |
11:20 |
danneh_ |
I'll have a look and try to see where it's failing, got some spare time |
11:21 |
midas |
seems you need to add some extra information in your command, but i never used launcher.py yet |
11:22 |
danneh_ |
oh, wait |
11:23 |
danneh_ |
it's because here: subprocess.call('python dumpgenerator.py --api=%s --xml --images' % wiki, shell=True) |
11:23 |
danneh_ |
that line assumes python is python2, which it is for pretty well everything except Arch (where python defaults to py3) |
11:24 |
Nemo_bis |
hmpf, thought I had fixed that |
11:25 |
Nemo_bis |
can just replace "python " with "./"? |
11:25 |
danneh_ |
that's what I was thinking, lemme try it out |
11:25 |
Nemo_bis |
What happens on Windows (horror!) I have no idea |
11:26 |
midas |
probably default to blue screen |
11:26 |
danneh_ |
Yep, that works fine when it's ./dumpgenerator.py instead of the python ... |
11:27 |
danneh_ |
Windows is a bit silly, especially in that it's not usually in their $PATH by default (have to manually add it, at least last time I installed it on Win) |
11:31 |
Nemo_bis |
./ will stop working if one forgot to make the file executable, but we can live with that I suppose |
11:31 |
danneh_ |
ah yeah, I had to chmod +x it |
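The `chmod +x` step could also be done from Python before a script is launched with `./`. A minimal sketch for POSIX systems, written for Python 3; `ensure_executable` is a hypothetical helper and not something launcher.py is known to contain.

```python
import os
import stat
import tempfile


def ensure_executable(path):
    """Add the owner-execute bit to path if missing; return True if it was changed."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False  # already executable by the owner
    os.chmod(path, mode | stat.S_IXUSR)
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real script.
    fd, demo_path = tempfile.mkstemp(suffix=".py")
    os.close(fd)
    print(ensure_executable(demo_path))
    os.remove(demo_path)
```

On Windows the execute bits are not meaningful, so this only helps for the POSIX case discussed here.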
11:32 |
danneh_ |
do file permissions carry over with svn source control? |
11:39 |
Nemo_bis |
I hope so on UNIX, but again no idea what happens on Windows |
11:50 |
danneh_ |
If we're really looking to go all-out on Windows support, you'd probably wanna do something like: locate_python() that finds the system python and throws the path in front of the command |
11:50 |
danneh_ |
we couldn't just do argv[-1] or something silly like that, to get the actual name/path of the program that started running launcher.py, maybe |
11:54 |
danneh_ |
ah, but even if we could that wouldn't work for ./launcher.py |
11:55 |
danneh_ |
we'd probably just need to do the function, which returns "./" on Posix-like and otherwise searches the usual Win32 directories for it |
11:55 |
danneh_ |
if we super wanted to |
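An alternative to searching for the interpreter, not raised in the chat, is to reuse the one that is already running launcher.py: `sys.executable` holds its absolute path regardless of whether the script was started as `python2 launcher.py` or `./launcher.py`. The sketch below assumes that approach and is written for Python 3; `build_dump_command` and `run_dump` are hypothetical names, and the flags are the ones from the `subprocess.call` line quoted earlier in the log.

```python
import subprocess
import sys


def build_dump_command(api_url, script="dumpgenerator.py"):
    """Build the argument list for one dump, run with the current interpreter."""
    # sys.executable is the absolute path of the running Python binary, so the
    # child process uses the same interpreter that launched this script.
    return [sys.executable, script, "--api=%s" % api_url, "--xml", "--images"]


def run_dump(api_url):
    # Passing a list (no shell=True) avoids shell quoting problems in the URL.
    return subprocess.call(build_dump_command(api_url))
```

This sidesteps both the `python` versus `python2` naming problem and the need for the executable bit, though `sys.executable` can be empty in unusual embedded setups, so a fallback may still be wanted.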
11:56 |
Nemo_bis |
if something breaks and there is someone in the world who cares, we'll get to know them at last |
11:56 |
Nemo_bis |
danneh_: https://code.google.com/p/wikiteam/source/browse/trunk/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt is all for you, I removed some 100 wikis which completed in the last few months |
11:57 |
Nemo_bis |
It's always like that, 2000 wikis get done in one day and then the last 100 take months |
11:57 |
danneh_ |
Nemo_bis: if you're on a Windows system, mind trying this for me? http://pastebin.com/uWXLtaES |
11:57 |
Nemo_bis |
I'm not |
11:57 |
danneh_ |
hopefully that should be a quick, simple answer to it |
11:57 |
Nemo_bis |
Didn't touch Windows in many years |
11:57 |
danneh_ |
Ah, I'll try it at work tomorrow and letcha know if it returns the right thing |
11:58 |
danneh_ |
Also, thanks for the list. Do you know whether most of them are smallish? I've only got about 250GB to kick around right now, unfortunately |
11:59 |
danneh_ |
I'll probably just glance through before I try downloading, get some sorta measure of their size |
11:59 |
Nemo_bis |
Sob, many of those wikis fail for certificate errors |
12:01 |
SketchCow |
We are their last hope. |
12:02 |
danneh_ |
I'll go through and do my best |
12:02 |
danneh_ |
'specially the ones just running off direct IPs |
12:02 |
Nemo_bis |
You could try some s/https/http/ |
12:02 |
danneh_ |
Yeah, that's what I'm hoping |
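The s/https/http/ idea can be applied to just the scheme, so that an "https" appearing later in the path or query is left alone. A minimal Python 3 sketch; `to_http` is a hypothetical helper, and whether a given wiki actually answers over plain HTTP is something only a real request can tell.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Return url with an https scheme downgraded to http; other URLs unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "https":  # urlsplit already lowercases the scheme
        return url
    return urlunsplit(parts._replace(scheme="http"))
```

An explicit port such as `:443` is kept as written, so those URLs would need separate handling.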
12:02 |
Nemo_bis |
I have no idea what makes most of those fail :) |
12:03 |
danneh_ |
All else fails, I'll just go through and try to coax Py to ignore the cert errors :) |
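For reference, coaxing Python into ignoring certificate errors might look like the following. This is a sketch using the Python 3 standard library, not the urllib code in dumpgenerator.py; `unverified_context` and `fetch_ignoring_cert_errors` are hypothetical names.

```python
import ssl
import urllib.request


def unverified_context():
    """Return an SSL context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode is relaxed
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_ignoring_cert_errors(url, timeout=30):
    # The connection is not authenticated, so this only suits public pages.
    with urllib.request.urlopen(
        url, timeout=timeout, context=unverified_context()
    ) as response:
        return response.read()
```

Keeping the relaxed context in its own function makes it easy to use it only for the wikis that fail with the default settings.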
13:09 |
danneh_ |
Nemo_bis: Currently grabbing 163.13.175.46 , the first one failed with "Error. You forget mandatory parameters:" (suspect it's due to api.php not being enabled), and the second one failed to grab the main page ~5 times before I killed it and skipped it manually |
13:10 |
danneh_ |
But ah well, this one has about 60k pages, some Chinese baseball wiki so I'll leave it running for a few days and see how it goes |
13:24 |
Nemo_bis |
danneh_: ah, many many of those fail because they redirect to different URLs. If you also want to do some coding, you can try replacing urllib with Requests and see how many more wikis work https://code.google.com/p/wikiteam/issues/detail?id=104 |
13:26 |
danneh_ |
aha, Requests is the best |
13:27 |
danneh_ |
I'll do my best, hopefully it shouldn't be too difficult/annoying to port over |
13:28 |
danneh_ |
I'm pretty busy right now, so can't promise anything, but I'll do my best |
13:30 |
Nemo_bis |
Sure, I'm just giving ideas :) |
21:45 |
danneh_ |
Nemo_bis: Had things fail like this much before? http://pastebin.com/GrG3wckD |
22:02 |
danneh_ |
.n |
22:02 |
danneh_ |
Also, sorry 'bout bugging you in particular, you've just seemed to do most dev stuff from what I've seen in here! |