[00:24] ivan`: they wouldn't be in hadoop or any easily queryable state until they were fully ingested into the wayback machine
[00:24] the landing/temp storage repositories don't have enough juice to really do anything to the data they're receiving on-the-fly
[00:29] okay
[00:30] underscor: what if I gave you a Python program that read all the warcs at low nice/ionice, and wrote out some URL list
[00:30] that's what I meant by querying, sorry for being vague
[00:31] ah, yeah, that would probably work
[00:31] cool
[00:31] ivan`, I would recommend looking at warcat. It is the newest warc program written by chfoo
[00:32] http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
[00:32] aaaaa Python 3
[00:33] thanks though, I might give it a try
[05:09] Cool. 15,439 downloads of seesaw from PyPI
[06:54] [19:37:29] The custom data is generated by a little bit of Ruby that runs on the tracker. This script gets the item name and can fill the data object with other things.
[06:55] where do I put this ruby code, and what calls it?
[07:07] oh, I see: if rules = redis.get("#{ prefix }extra_parameters")
[07:07] eval(rules)
[07:12] so my universal-tracker is using a redis server I have running, but when I telnet to the redis server and do KEYS, I get nothing?
[07:12] *0
[07:12] KEYS *
[07:15] oh, I assume it has multiple databases
[07:17] heh "By default there are 16 databases (indexed from 0 to 15) and you can navigate between them using select command. Number of databases can be changed in redis config file with databases setting."
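ivan`'s proposal at [00:30] (read every WARC at low priority and print a URL list) could look roughly like the minimal stdlib-only sketch below. It assumes `.warc.gz` files are compressed per record (which `gzip.open` reads as one continuous stream), keeps only `response` records, and ignores folded header lines and other edge cases a real WARC library such as warcat would handle. It lowers CPU priority with `os.nice(19)` (Unix only); the standard library has no ionice call, so idle I/O priority is left to launching it under `ionice -c3`. The function names are hypothetical.

```python
import gzip
import os
import sys
from typing import BinaryIO, Iterable, Iterator


def _skip(stream: BinaryIO, n: int) -> None:
    """Discard n bytes in bounded chunks so large payloads never sit in memory."""
    while n > 0:
        chunk = stream.read(min(n, 1 << 20))
        if not chunk:
            raise ValueError("truncated WARC record")
        n -= len(chunk)


def iter_response_urls(stream: BinaryIO) -> Iterator[str]:
    """Yield the WARC-Target-URI of every 'response' record in a decompressed WARC stream."""
    while True:
        first = stream.readline()
        if not first:
            return  # end of input
        if not first.strip():
            continue  # the CRLF pair that terminates the previous record
        if not first.startswith(b"WARC/"):
            raise ValueError(f"unexpected record start: {first!r}")
        headers = {}
        for line in iter(stream.readline, b""):
            line = line.rstrip(b"\r\n")
            if not line:
                break  # blank line ends the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        _skip(stream, int(headers.get(b"content-length", b"0")))
        uri = headers.get(b"warc-target-uri")
        if headers.get(b"warc-type") == b"response" and uri:
            # Tolerate angle brackets around the URI if a writer added them.
            yield uri.strip(b"<>").decode("utf-8", "replace")


def main(paths: Iterable[str]) -> None:
    if hasattr(os, "nice"):
        os.nice(19)  # lowest CPU priority; launch under `ionice -c3` for idle I/O
    for path in paths:
        # gzip.open reads concatenated per-record gzip members as one stream.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as f:
            for url in iter_response_urls(f):
                print(url)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as something like `ionice -c3 python3 list_urls.py *.warc.gz > urls.txt` (the script name is hypothetical). Streaming record by record keeps memory flat, which matters on storage boxes that, per [00:24], don't have much juice to spare.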
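The empty `KEYS *` result at [07:12] came from looking at the default database while the tracker's keys lived in another numbered one. In redis-cli, `INFO keyspace` lists only the databases that hold keys. The sketch below does the same check from Python by asking each database for its key count. `connect` is a hypothetical factory; with redis-py it could be `lambda db: redis.Redis(host="127.0.0.1", port=6379, db=db)`, whose `dbsize()` returns the number of keys. The default of 16 comes from the `databases` setting quoted at [07:17].

```python
from typing import Callable, Dict, Protocol


class SupportsDbSize(Protocol):
    def dbsize(self) -> int: ...


def nonempty_databases(
    connect: Callable[[int], SupportsDbSize], databases: int = 16
) -> Dict[int, int]:
    """Return {db_index: key_count} for every database that holds at least one key."""
    counts: Dict[int, int] = {}
    for db in range(databases):  # default config exposes databases 0..15
        size = connect(db).dbsize()
        if size:
            counts[db] = size
    return counts
```

If your redis.conf raises `databases`, pass the new count so the higher indexes are scanned too.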
[07:42] for the lucky person googling the log in the future,
[07:42] # redis-cli
[07:42] OK
[07:42] redis 127.0.0.1:6379> select 13
[07:42] then insert some code like
[07:42] redis 127.0.0.1:6379[13]> set greader:extra_parameters 'data["task_urls"] = ["http://127.0.0.1/", "http://127.0.0.2/"]'
[07:43] OK
[07:45] yay
[07:45] Looks like you're streaming ahead :-)
[07:48] universal-tracker appears to be accepting items that do not match the valid item regexp
[07:48] I'm submitting through the textarea
[08:17] ivan`, we have an #archiveteam-tracker channel
[08:17] it might get people's attention there faster
[08:20] He's developing a pipeline project - it's not just tracker talk dude
[08:20] And that channel has another purpose
[08:35] too many channels
[08:35] omf_: but thanks, joined
[08:42] That channel is more for talking about the tracker operations though
[12:29] alard: any idea why this block of code can be evaled inside irb just fine, but not inside calculate_extra_parameters? https://www.refheap.com/a7038a1414918ee6c6201fbb2/raw
[12:30] hm, I guess data can be nil
[12:31] oh, I think I have to look inside item instead of data
[12:34] yep, and item is just a string
[18:47] ivan`: Yes, I think that's it. I had the same problem at some point. item_name is set after calculate_extra_parameters, that's what makes it confusing.
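The redis-cli steps at [07:42] store a Ruby snippet under `greader:extra_parameters` in database 13, and the tracker `eval`s whatever it finds under `"#{ prefix }extra_parameters"` ([07:07]). A sketch of generating that same snippet from a list of URLs instead of typing it by hand is below. `client` is anything with a redis-py style `set(name, value)` (e.g. `redis.Redis(db=13)`), and `install_task_urls` is a hypothetical name. It relies on `json.dumps` producing a double-quoted array that Ruby also reads as an array literal, which assumes the URLs contain no `#{`, `#$` or `#@` sequences (Ruby would interpolate those inside double quotes).

```python
import json
from typing import Protocol, Sequence


class SupportsSet(Protocol):
    def set(self, name: str, value: str) -> object: ...


def install_task_urls(client: SupportsSet, prefix: str, urls: Sequence[str]) -> str:
    """Store a Ruby snippet that the tracker evals to fill data["task_urls"]."""
    # A JSON array of plain URL strings doubles as a Ruby array literal.
    rules = 'data["task_urls"] = ' + json.dumps(list(urls))
    client.set(f"{prefix}extra_parameters", rules)
    return rules
```

Called with prefix `"greader:"` and the two loopback URLs, this writes exactly the value from the log. Per the exchange from [12:29] to [18:47], a snippet that needs the item name should read it from `item` (a plain string), since `item_name` is only set after `calculate_extra_parameters` runs.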
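Since universal-tracker was accepting textarea submissions that didn't match the valid item regexp ([07:48]), one stopgap is to filter the list before pasting it. The sketch below splits candidate lines into valid and rejected items. The pattern in the usage line is a hypothetical placeholder for the project's real regexp, and using `re.fullmatch` assumes the tracker's regexp is meant to match the whole item name.

```python
import re
from typing import Iterable, List, Tuple


def split_items(lines: Iterable[str], pattern: str) -> Tuple[List[str], List[str]]:
    """Partition candidate item names into (valid, rejected) by full regex match."""
    regex = re.compile(pattern)
    valid: List[str] = []
    rejected: List[str] = []
    for line in lines:
        item = line.strip()
        if not item:
            continue  # ignore blank lines in the pasted list
        (valid if regex.fullmatch(item) else rejected).append(item)
    return valid, rejected
```

For example, `split_items(open("items.txt"), r"[a-z0-9]+")` (file name and pattern hypothetical) returns the lines that are safe to paste and the ones to inspect by hand.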