Time |
Nickname |
Message |
00:24
🔗
|
underscor |
ivan`: they wouldn't be in hadoop or any easily queryable state until they were fully ingested into the wayback machine |
00:24
🔗
|
underscor |
the landing/temp storage repositories don't have enough juice to really do anything to the data they're receiving on-the-fly |
00:29
🔗
|
ivan` |
okay |
00:30
🔗
|
ivan` |
underscor: what if I gave you a Python program that read all the warcs at low nice/ionice, and wrote out some URL list |
00:30
🔗
|
ivan` |
that's what I meant by querying, sorry for being vague |
00:31
🔗
|
underscor |
ah, yeah, that would probably work |
00:31
🔗
|
ivan` |
cool |
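The kind of program ivan` describes could look roughly like the sketch below. It is not ivan`'s actual script: it uses only the Python standard library, assumes well-formed WARC records with a `Content-Length` header, and keeps only `response` records so that each fetched URL is listed once. The names `iter_target_uris` and `main` are made up for illustration.

```python
import gzip
import os
import sys


def iter_target_uris(stream):
    """Yield WARC-Target-URI values of response records from a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        uri, record_type, length = None, None, 0
        # Read the record headers up to the blank line that ends them.
        while True:
            header = stream.readline()
            if header in (b"\r\n", b"\n", b""):
                break
            name, _, value = header.partition(b":")
            name = name.strip().lower()
            if name == b"warc-target-uri":
                uri = value.strip().decode("utf-8", "replace")
            elif name == b"warc-type":
                record_type = value.strip().lower()
            elif name == b"content-length":
                length = int(value.strip())
        # Skip the body in chunks so large records never sit in memory.
        remaining = length
        while remaining > 0:
            chunk = stream.read(min(remaining, 1 << 20))
            if not chunk:
                return  # truncated file
            remaining -= len(chunk)
        if uri is not None and record_type == b"response":
            yield uri


def main(paths, out=sys.stdout):
    # Lower CPU priority where the platform allows it (assumption: Unix-like).
    try:
        os.nice(19)
    except (AttributeError, OSError):
        pass
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as stream:
            for uri in iter_target_uris(stream):
                out.write(uri + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

The standard library has no ionice call, so I/O priority would have to be set from outside, for example `nice -n 19 ionice -c 3 python3 list_urls.py *.warc.gz > urls.txt` on Linux (the script name is hypothetical).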
00:31
🔗
|
omf_ |
ivan`, I would recommend looking at warcat. It is the newest warc program written by chfoo |
00:32
🔗
|
omf_ |
http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem |
00:32
🔗
|
ivan` |
aaaaa Python 3 |
00:33
🔗
|
ivan` |
thanks though, I might give it a try |
05:09
🔗
|
ersi |
Cool. 15,439 downloads of seesaw from PyPI |
06:54
🔗
|
ivan` |
<alard> [19:37:29] The custom data is generated by a little bit of Ruby that runs on the tracker. This script gets the item name and can fill the data object with other things. |
06:55
🔗
|
ivan` |
where do I put this ruby code, and what calls it? |
07:07
🔗
|
ivan` |
oh, I see if rules = redis.get("#{ prefix }extra_parameters") |
07:07
🔗
|
ivan` |
eval(rules) |
07:12
🔗
|
ivan` |
so my universal-tracker is using a redis server I have running, but when I telnet to the redis server and do KEYS, I get nothing? |
07:12
🔗
|
ivan` |
*0 |
07:12
🔗
|
ivan` |
KEYS * |
07:15
🔗
|
ivan` |
oh, I assume it has multiple databases |
07:17
🔗
|
ivan` |
heh "By default there are 16 databases (indexed from 0 to 15) and you can navigate between them using select command. Number of databases can be changed in redis config file with databases setting." |
07:42
🔗
|
ivan` |
for the lucky person googling the log in the future, |
07:42
🔗
|
ivan` |
# redis-cli |
07:42
🔗
|
ivan` |
redis 127.0.0.1:6379> select 13 |
07:42
🔗
|
ivan` |
OK |
07:42
🔗
|
ivan` |
then insert some code like |
07:42
🔗
|
ivan` |
redis 127.0.0.1:6379[13]> set greader:extra_parameters 'data["task_urls"] = ["http://127.0.0.1/", "http://127.0.0.2/"]' |
07:43
🔗
|
ivan` |
OK |
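For anyone poking at Redis over a raw connection the way ivan` did with telnet: the `*0` he saw is how the Redis wire protocol (RESP) spells an empty array, i.e. no keys in the database he was connected to. The sketch below replays the same session from Python over a plain socket. It assumes a server on 127.0.0.1:6379 and only handles commands whose reply is a single line, such as SELECT and SET; `encode_command` and `run_simple` are illustrative names, not part of any Redis client library.

```python
import socket


def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def run_simple(commands, host="127.0.0.1", port=6379):
    """Send commands one at a time and return the first reply line of each.

    Only suitable for commands with single-line replies (e.g. +OK).
    """
    replies = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("rb") as reader:
            for command in commands:
                sock.sendall(encode_command(*command))
                replies.append(reader.readline().rstrip(b"\r\n"))
    return replies


# Example (needs a running server, so it is left commented out):
# run_simple([
#     ("SELECT", "13"),
#     ("SET", "greader:extra_parameters",
#      'data["task_urls"] = ["http://127.0.0.1/", "http://127.0.0.2/"]'),
# ])
```

The database number 13 is simply the one from the session above; which database a given tracker install uses depends on its own configuration.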
07:45
🔗
|
ersi |
yay |
07:45
🔗
|
ersi |
Looks like you're streaming ahead :-) |
07:48
🔗
|
ivan` |
universal-tracker appears to be accepting items that do not match the valid item regexp |
07:48
🔗
|
ivan` |
I'm submitting through the textarea |
08:17
🔗
|
omf_ |
ivan`, we have an #archiveteam-tracker channel |
08:17
🔗
|
omf_ |
it might get people's attention there faster |
08:20
🔗
|
ersi |
He's developing a pipeline project - it's not just tracker talk dude |
08:20
🔗
|
ersi |
And that channel has another purpose |
08:35
🔗
|
ivan` |
too many channels |
08:35
🔗
|
ivan` |
omf_: but thanks, joined |
08:42
🔗
|
ersi |
That channel is more for talking about the tracker operations though |
12:29
🔗
|
ivan` |
alard: any idea why this block of code can be evaled inside irb just fine, but not inside calculate_extra_parameters? https://www.refheap.com/a7038a1414918ee6c6201fbb2/raw |
12:30
🔗
|
ivan` |
hm, I guess data can be nil |
12:31
🔗
|
ivan` |
oh, I think I have to look inside item instead of data |
12:34
🔗
|
ivan` |
yep, and item is just a string |
18:47
🔗
|
alard |
ivan`: Yes, I think that's it. I had the same problem at some point. item_name is set after calculate_extra_parameters, that's what makes it confusing. |
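The mechanism discussed above (a snippet stored under `<prefix>extra_parameters` that gets eval'd with `data` and `item` in scope) can be mimicked in a few lines. This is a Python analogue for illustration only, not the tracker's Ruby code: the dict stands in for Redis, the prefix `greader:` is inferred from the key used earlier in the log, and the function body is a guess at the shape of `calculate_extra_parameters`, not a copy of it.

```python
def calculate_extra_parameters(store, prefix, item):
    """Run the stored snippet, if any, and return the data it filled in.

    `store` is any mapping with .get(); here it stands in for Redis.
    `item` is passed through as a plain string, as noted in the log.
    """
    data = {}
    rules = store.get(prefix + "extra_parameters")
    if rules:
        # The snippet only sees the names placed in this namespace.
        exec(rules, {"data": data, "item": item})
    return data


# Stand-in for the key set with redis-cli earlier in the log.
store = {
    "greader:extra_parameters":
        'data["task_urls"] = ["http://127.0.0.1/", "http://127.0.0.2/"]',
}
params = calculate_extra_parameters(store, "greader:", "some-item")
```

A snippet that works in an interactive shell can still fail here if it relies on names that are not in scope at the point of the eval, which is consistent with the `data` / `item` / `item_name` confusion described above.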