Time |
Nickname |
Message |
14:38
|
bebzol |
arkiver - if you are reading this, could you please check my ownlog-grab repository? It's rewritten and all scripts seem to work fine |
16:01
|
arkiver |
bebzol: sure! |
16:03
|
bebzol |
thanks :) |
16:04
|
bebzol |
I'm not on IRC all the time so please, write me an email: bebum@o2.pl :). Thanks in advance |
16:04
|
arkiver |
looks fine to me |
16:04
|
arkiver |
are you sure there are no URLs that need to be scraped which are not automatically scraped by wget? |
16:05
|
arkiver |
did you check the files that are downloaded when you load a page to check for important pages that may not be scraped by wget? |
16:08
|
bebzol |
yeah, I confirmed it with debugging mode in Firefox - all website dependencies like CSS/images are handled by the Lua script |
16:10
|
arkiver |
ok, looks great then! |
16:11
|
arkiver |
so this https://github.com/ArchiveTeam/ownlog-grab/blob/master/pipeline.py#L176 will download the whole website |
16:11
|
bebzol |
cool :) |
16:12
|
bebzol |
yes |
16:12
|
bebzol |
I must go - I'll catch you later :) |
16:13
|
arkiver |
ok |
16:13
|
bebzol |
those blogs are really simple in structure |
16:13
|
arkiver |
bebzol |
16:13
|
arkiver |
I'll make a pull request later tonight |
16:37
|
arkiver |
bebzol: made a pull request |
19:40
|
bebzol |
arkiver, are you here? |
20:30
|
arkiver |
bebzol: yes |
20:32
|
bebzol |
thanks for the changes in the code :). Actually, the --domain parameter should only be set for one particular subdomain, like xyz.ownlog.com, so wget doesn't grab the main ownlog.com site and catalogues |
20:33
|
bebzol |
there are no external dependencies outside the subdomain |
20:33
|
arkiver |
--domain for ownlog.com will also grab xyz.ownlog.com |
20:33
|
arkiver |
please take a look at the changes in the lua script |
20:33
|
arkiver |
it makes sure nothing other than *item_value*.ownlog.com is being downloaded |
20:34
|
arkiver |
that is the easiest way to do those tings |
20:34
|
arkiver |
things* |
20:34
|
arkiver |
and you can easily edit it and add other kinds of domains to it |
20:35
|
bebzol |
all right, now I see it |
20:35
|
bebzol |
tomorrow I'll run tests and I think we can go to production with it |
20:35
|
arkiver |
yep |
20:35
|
arkiver |
good luck! |
20:35
|
bebzol |
ok, thanks :) |
20:36
|
bebzol |
I'll let you know tomorrow |
20:39
|
arkiver |
I'll try to take a better look at the websites and make a second pull request if needed |
20:51
|
DFJustin |
https://twitter.com/DigitCurator/status/526810006629654528 |
22:49
|
SketchCow |
https://github.com/mamedev/mame/commit/f22371f389f3abac59a40028dc488406ec27f670 |
22:49
|
SketchCow |
Sorry, awesome mispaste |
23:51
|
ArloJames |
This must get annoying after a while (why not have a separate channel?) but |
23:51
|
ArloJames |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
23:52
|
chronomex |
yahoosucks |
23:52
|
ArloJames |
Thanks. |
23:53
|
chronomex |
ArloJames: if you're going to work on the wiki, please hang around in irc for a while |
23:54
|
ArloJames |
I had not planned to do any editing yet, just create an account. I will lurk moar, fear not. |
23:58
|
chronomex |
ok, cool |
23:58
|
chronomex |
good to meet you then |
23:58
|
chronomex |
or something i guess |
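
Note on the 20:32 to 20:34 exchange: the host check arkiver describes (download nothing other than *item_value*.ownlog.com, even when wget is allowed to follow all of ownlog.com) could look roughly like the sketch below. It is written in Python for illustration only; the project's actual Lua script is not shown in this log. The function name `allowed_host`, its parameters, and the example URLs are hypothetical; only `item_value` and the ownlog.com hostnames come from the conversation.

```python
from urllib.parse import urlsplit


def allowed_host(url: str, item_value: str, base_domain: str = "ownlog.com") -> bool:
    """Return True only if the URL's host is exactly <item_value>.<base_domain>.

    Hypothetical helper: mirrors the rule described in the chat, not the
    real ownlog-grab code.
    """
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    # Exact match, so the main site and other users' blogs are rejected.
    return host.lower() == f"{item_value}.{base_domain}".lower()


if __name__ == "__main__":
    for candidate in (
        "http://xyz.ownlog.com/post/1",   # the blog being grabbed
        "http://ownlog.com/catalogue",    # main site
        "http://abc.ownlog.com/",         # someone else's blog
    ):
        print(candidate, allowed_host(candidate, "xyz"))
```

Comparing the whole hostname, rather than checking that it ends with ownlog.com, is what keeps the main site and its catalogues out of the grab, which was bebzol's concern. Extending the rule to other domains, as arkiver suggests, would mean adding further accepted hostnames to the comparison.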