#urlteam 2013-10-02,Wed


Time Nickname Message
07:36 🔗 soultcer GLaDOS: I'm afraid I don't have an HTTP server with enough disk space for that
07:41 🔗 soultcer GLaDOS, I think there is a problem with the tracker
07:41 🔗 soultcer The tracker makes sure to only hand out one task for each service to the same IP, so that the scrapers don't get blocked
07:41 🔗 soultcer But your tracker only hands out one task for each service, for all IPs
07:41 🔗 omf_ soultcer, is the data available anywhere else online besides bit torrent?
07:42 🔗 soultcer I think the tracker does not get the real IP of the http request sender, but the IP of your reverse proxy
07:44 🔗 soultcer omf_, not that I know of. But the torrent should be seeded, according to the trackers
07:45 🔗 soultcer I'll try to add another seed
07:49 🔗 omf_ So 80gb for everything
07:49 🔗 GLaDOS Ah, derp, my mistake
07:49 🔗 GLaDOS How would I make apache pass the real IP?
07:53 🔗 GLaDOS Also, soultcer, sending it on an external HDD would be fine.
07:53 🔗 GLaDOS Actually, I have a perfect external for this
07:53 🔗 soultcer I think web.py uses the X-Forwarded-For header
07:54 🔗 soultcer Using a reverse proxy before the tracker is good though, because then the front page with the stats (which is expensive to generate) can be cached
08:07 🔗 GLaDOS ugh
08:07 🔗 GLaDOS not working
08:09 🔗 soultcer stupid nat and only barely working trackers
08:10 🔗 GLaDOS I'm going to try using varnish instead
08:10 🔗 soultcer Whatever works for you. I'm a huge nginx fan myself
08:11 🔗 GLaDOS Eh, I've yet to get into nginx
08:12 🔗 soultcer Ok, now I have found two peers for the torrent, both seeders
08:18 🔗 GLaDOS Ok, I'll try launching it again
08:41 🔗 omf_ well?
08:43 🔗 GLaDOS Seems like varnish fixed it.
08:43 🔗 GLaDOS All 6 of my scrapers got jobs
08:44 🔗 soultcer Yeah, each of them got one job for a different service
08:44 🔗 soultcer The problem still exists
08:45 🔗 GLaDOS Yeah, nevermind.
08:50 🔗 GLaDOS Yeah, web.ctx.ip is supposed to be returning the x-forwarded-for ip
08:51 🔗 GLaDOS 127.0.0.1:54662 - - [02/Oct/2013 10:53:07] "HTTP/1.1 GET /task/get" - 200 OK
08:51 🔗 GLaDOS Tracker only sees 127.0.0.1 though:
09:14 🔗 GLaDOS soultcer: do you suggest replacing web.ctx.ip with this function: http://code.activestate.com/recipes/577795-get-users-ip-address-even-when-theyre-behind-a-pro/
09:15 🔗 GLaDOS Or hell, just web.ctx.env['HTTP_X_FORWARDED_FOR']
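(For context: the approach discussed above amounts to reading the X-Forwarded-For header out of the WSGI environment instead of trusting the socket address. A minimal sketch, not the tracker's actual code; the helper name `real_ip` is made up here:)

```python
def real_ip(env):
    """Return the client address for a WSGI-style environ dict.

    Behind a reverse proxy the socket peer is the proxy itself
    (e.g. 127.0.0.1), so prefer X-Forwarded-For when present.
    """
    forwarded = env.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        # The header may be a chain: "client, proxy1, proxy2".
        # The leftmost entry is the original client.
        return forwarded.split(",")[0].strip()
    # No proxy header: fall back to the raw socket address.
    return env.get("REMOTE_ADDR", "")
```

(Note this only makes sense when the proxy is trusted to set the header; a client talking to the tracker directly could spoof it.)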
09:27 🔗 GLaDOS funfact: disabling cache for /task/(.*) might help
09:27 🔗 GLaDOS OH GOD IP ADDRESS IS NOW NULL
09:28 🔗 GLaDOS (didn't change code btw)
09:48 🔗 GLaDOS I think it's handing out requests properly now
09:50 🔗 GLaDOS IT IS
09:50 🔗 GLaDOS soultcer: I FIXED IT
09:50 🔗 GLaDOS Turns out varnish wasn't passing X-FORWARDED-FOR at times
09:50 🔗 GLaDOS Forced it to do that, and not keep connection open
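(The fix described above — making Varnish always pass X-Forwarded-For and never cache task handouts — would look roughly like this in VCL. A sketch only, not the actual config; the `/task` URL pattern is taken from the log:)

```
sub vcl_recv {
    # Append the client address so the backend sees the real IP.
    if (req.http.X-Forwarded-For) {
        set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
    # Never serve /task responses from cache; each request must
    # reach the tracker so it can assign work per IP.
    if (req.url ~ "^/task") {
        return (pass);
    }
}
```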
09:59 🔗 Cameron_D /data is returning nothing now :(
10:01 🔗 GLaDOS what
10:02 🔗 GLaDOS Cameron_D: did you mean: http://urlteam.terrywri.st/data/
10:02 🔗 GLaDOS the end slash matters, apparently
10:04 🔗 Cameron_D yeah, blank here
10:05 🔗 GLaDOS ..
10:06 🔗 GLaDOS It returns for me
10:09 🔗 Cameron_D it's working on my phone, so maybe my proxy at home had cached nothing
10:09 🔗 GLaDOS probably.
10:10 🔗 ersi /data and /data/ are technically two very different things.
10:11 🔗 omf_ http://urlteam.terrywri.st/data/ works for me
10:12 🔗 ersi It takes a few moments, but works for me as well.
11:17 🔗 soultcer You are only allowing one concurrent access to the backend, right?
11:29 🔗 GLaDOS For everything but /task
11:29 🔗 GLaDOS If I did that with /task, the same issue would happen, just with the first connecting IP
11:39 🔗 soultcer huh?
11:39 🔗 soultcer sqlite3 doesn't support locks and the task assignment is a two-step process: a) Select a suitable task b) assign task to user
11:40 🔗 GLaDOS I know, but it can't be a continuous connection if we want to keep sending the x-forwarded-for header
11:40 🔗 ersi ie: at least one read and one write
11:40 🔗 GLaDOS At least that's what I read
11:42 🔗 soultcer HTTP is stateless. You can send whatever headers you want with each request.
11:42 🔗 soultcer The problem is, there should not be more than one write access on the db. sqlite supports transactions and so on, so it won't break or corrupt the database
11:42 🔗 soultcer But it could accidentally hand out the same task twice
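(The two-step select-then-assign race described above can be avoided by doing both steps in one immediate transaction, so only one request holds the write lock at a time. A sketch under assumed names — the `tasks` table and `assigned_to` column are hypothetical, not the tracker's real schema:)

```python
import sqlite3

def claim_task(conn, ip):
    """Atomically claim one unassigned task for `ip`, or return None.

    BEGIN IMMEDIATE takes the write lock up front, so the SELECT
    and the UPDATE behave as a single step: two concurrent requests
    cannot both be handed the same task.
    Expects a connection opened with isolation_level=None.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM tasks WHERE assigned_to IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        conn.execute(
            "UPDATE tasks SET assigned_to = ? WHERE id = ?", (ip, row[0])
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

(With this shape, "handing out the same task twice" becomes impossible even with several tracker processes sharing the database file, since sqlite serializes writers at the file level.)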
11:48 🔗 GLaDOS Okay, removed the connection closing.
12:35 🔗 soultcer When you get the time you should just throw the whole tracker thing away and rewrite it to actually work without any hacks ;-)
12:37 🔗 GLaDOS That's implying that I can code for shit.
12:37 🔗 GLaDOS Might do it as a school project
12:37 🔗 GLaDOS "And why did you do this?" "Because URL shorteners were a fucking awful idea."
13:02 🔗 ersi "See this short link"
13:03 🔗 ersi -"It.. doesn't work."
13:03 🔗 ersi "Point taken yet?"
13:05 🔗 GLaDOS Put all my references as short urls on an about-to-die URL shortener
22:58 🔗 SketchCow Do we need anything here?
22:58 🔗 SketchCow Tracker is not working?
