#warrior 2013-05-28,Tue


Time Nickname Message
00:24 🔗 underscor ivan`: they wouldn't be in hadoop or any easily queryable state until they were fully ingested into the wayback machine
00:24 🔗 underscor the landing/temp storage repositories don't have enough juice to really do anything to the data they're receiving on-the-fly
00:29 🔗 ivan` okay
00:30 🔗 ivan` underscor: what if I gave you a Python program that read all the warcs at low nice/ionice, and wrote out some URL list
00:30 🔗 ivan` that's what I meant by querying, sorry for being vague
00:31 🔗 underscor ah, yeah, that would probably work
00:31 🔗 ivan` cool
00:31 🔗 omf_ ivan`, I would recommend looking at warcat. It is the newest warc program written by chfoo
00:32 🔗 omf_ http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
00:32 🔗 ivan` aaaaa Python 3
00:33 🔗 ivan` thanks though, I might give it a try
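
The program ivan` describes (read the WARCs at low priority, write out a URL list) could look roughly like the sketch below. It is an illustration only, not the script that was actually run, and it does not use warcat: it relies on the Python standard library and assumes each record carries `Content-Length` and `WARC-Target-URI` headers. The names `iter_target_uris` and `write_url_list` are hypothetical. `os.nice` only lowers CPU priority on Unix; I/O priority would have to be set from outside, for example by launching the script under `ionice`.

```python
import gzip
import os


def _skip(stream, n):
    """Read and discard n bytes without holding them all in memory."""
    while n > 0:
        chunk = stream.read(min(n, 1 << 16))
        if not chunk:
            return
        n -= len(chunk)


def iter_target_uris(stream):
    """Yield the WARC-Target-URI of each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body by its declared length so its contents
        # are never mistaken for the start of a new record.
        _skip(stream, int(headers.get(b"content-length", b"0")))
        uri = headers.get(b"warc-target-uri")
        if uri:
            # Some writers wrap the URI in angle brackets; strip them.
            yield uri.decode("utf-8", "replace").strip("<>")


def write_url_list(warc_paths, out_path):
    """Hypothetical driver: lower CPU priority, then dump one URL per line."""
    os.nice(19)  # Unix only; assumption: the host allows renicing
    with open(out_path, "w", encoding="utf-8") as out:
        for path in warc_paths:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as stream:
                for uri in iter_target_uris(stream):
                    out.write(uri + "\n")
```

Because `gzip.open` reads concatenated gzip members transparently, the same loop should handle the per-record-compressed `.warc.gz` files without special casing.
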
05:09 🔗 ersi Cool. 15,439 downloads of seesaw from PyPI
06:54 🔗 ivan` <alard> [19:37:29] The custom data is generated by a little bit of Ruby that runs on the tracker. This script gets the item name and can fill the data object with other things.
06:55 🔗 ivan` where do I put this ruby code, and what calls it?
07:07 🔗 ivan` oh, I see if rules = redis.get("#{ prefix }extra_parameters")
07:07 🔗 ivan` eval(rules)
07:12 🔗 ivan` so my universal-tracker is using a redis server I have running, but when I telnet to the redis server and do KEYS, I get nothing?
07:12 🔗 ivan` *0
07:12 🔗 ivan` KEYS *
07:15 🔗 ivan` oh, I assume it has multiple databases
07:17 🔗 ivan` heh "By default there are 16 databases (indexed from 0 to 15) and you can navigate between them using select command. Number of databases can be changed in redis config file with databases setting."
07:42 🔗 ivan` for the lucky person googling the log in the future,
07:42 🔗 ivan` # redis-cli
07:42 🔗 ivan` OK
07:42 🔗 ivan` redis 127.0.0.1:6379> select 13
07:42 🔗 ivan` then insert some code like
07:42 🔗 ivan` redis 127.0.0.1:6379[13]> set greader:extra_parameters 'data["task_urls"] = ["http://127.0.0.1/", "http://127.0.0.2/"]'
07:43 🔗 ivan` OK
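
For anyone retracing the steps above: the `*0` that came back over telnet is the Redis wire-protocol encoding of an empty array, meaning `KEYS *` matched nothing in the currently selected database (a new connection starts in database 0). The sketch below shows, using only the Python standard library, how `SELECT 13` followed by `KEYS *` is framed on the wire. The function names are hypothetical, database 13 is simply the number used in this log, and `raw_keys_reply` is a deliberately naive helper that does a single `recv`, so it may return a partial reply when there are many keys.

```python
import socket


def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def is_empty_array(reply):
    """True if a raw reply is the RESP empty array, i.e. no keys matched."""
    return reply == b"*0\r\n"


def raw_keys_reply(db, host="127.0.0.1", port=6379):
    """Naive helper (assumption: a local Redis without auth): raw bytes
    returned for SELECT <db> followed by KEYS *."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(encode_command("SELECT", db) + encode_command("KEYS", "*"))
        return conn.recv(65536)
```

With a server running, `raw_keys_reply(0)` and `raw_keys_reply(13)` can be compared to see which database the tracker is actually writing to.
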
07:45 🔗 ersi yay
07:45 🔗 ersi Looks like you're streaming ahead :-)
07:48 🔗 ivan` universal-tracker appears to be accepting items that do not match the valid item regexp
07:48 🔗 ivan` I'm submitting through the textarea
08:17 🔗 omf_ ivan`, we have an #archiveteam-tracker channel
08:17 🔗 omf_ it might get people's attention there faster
08:20 🔗 ersi He's developing a pipeline project - it's not just tracker talk dude
08:20 🔗 ersi And that channel has another purpose
08:35 🔗 ivan` too many channels
08:35 🔗 ivan` omf_: but thanks, joined
08:42 🔗 ersi That channel is more for talking about the tracker operations though
12:29 🔗 ivan` alard: any idea why this block of code can be evaled inside irb just fine, but not inside calculate_extra_parameters? https://www.refheap.com/a7038a1414918ee6c6201fbb2/raw
12:30 🔗 ivan` hm, I guess data can be nil
12:31 🔗 ivan` oh, I think I have to look inside item instead of data
12:34 🔗 ivan` yep, and item is just a string
18:47 🔗 alard ivan`: Yes, I think that's it. I had the same problem at some point. item_name is set after calculate_extra_parameters, that's what makes it confusing.
