#urlteam 2013-10-02,Wed


Time	Nickname	Message
07:36 ^🔗	soultcer	GLaDOS: I'm afraid I don't have a http server with enough diskspace for that
07:41 ^🔗	soultcer	GLaDOS, I think there is a problem with the tracker
07:41 ^🔗	soultcer	The tracker makes sure to only hand out one task for each service to the same IP, so that the scrapers don't get blocked
07:41 ^🔗	soultcer	But your tracker only hands out one task for each service, for all IPs
07:41 ^🔗	omf_	soultcer, is the data available anywhere else online besides bit torrent?
07:42 ^🔗	soultcer	I think the tracker does not get the real IP of the http request sender, but the IP of your reverse proxy
07:44 ^🔗	soultcer	omf_, not that I know of. But the torrent should be seeded, according to the trackers
07:45 ^🔗	soultcer	I'll try to add another seed
07:49 ^🔗	omf_	So 80 GB for everything
07:49 ^🔗	GLaDOS	Ah, derp, my mistake
07:49 ^🔗	GLaDOS	How would I make apache pass the real IP?
07:53 ^🔗	GLaDOS	Also, soultcer, sending it on an external HDD would be fine.
07:53 ^🔗	GLaDOS	Actually, I have a perfect external for this
07:53 ^🔗	soultcer	I think web.py uses the X-Forwarded-For header
07:54 ^🔗	soultcer	Using a reverse proxy before the tracker is good though, because then the front page with the stats (which is expensive to generate) can be cached
08:07 ^🔗	GLaDOS	ugh
08:07 ^🔗	GLaDOS	not working
08:09 ^🔗	soultcer	stupid nat and only barely working trackers
08:10 ^🔗	GLaDOS	I'm going to try using varnish instead
08:10 ^🔗	soultcer	Whatever works for you. I'm a huge nginx fan myself
08:11 ^🔗	GLaDOS	Eh, I've yet to get into nginx
08:12 ^🔗	soultcer	Ok, now I have found two peers for the torrent, both seeders
08:18 ^🔗	GLaDOS	Ok, I'll try launching it again
08:41 ^🔗	omf_	well?
08:43 ^🔗	GLaDOS	Seems like varnish fixed it.
08:43 ^🔗	GLaDOS	All 6 of my scrapers got jobs
08:44 ^🔗	soultcer	Yeah, each of them got one job for a different service
08:44 ^🔗	soultcer	The problem still exists
08:45 ^🔗	GLaDOS	Yeah, nevermind.
08:50 ^🔗	GLaDOS	Yeah, web.ctx.ip is supposed to be returning the x-forwarded-for ip
08:51 ^🔗	GLaDOS	Tracker only sees 127.0.0.1 though:
08:51 ^🔗	GLaDOS	127.0.0.1:54662 - - [02/Oct/2013 10:53:07] "HTTP/1.1 GET /task/get" - 200 OK
09:14 ^🔗	GLaDOS	soultcer: do you suggest replacing web.ctx.ip with this function: http://code.activestate.com/recipes/577795-get-users-ip-address-even-when-theyre-behind-a-pro/
09:15 ^🔗	GLaDOS	Or hell, just web.ctx.env['HTTP_X_FORWARDED_FOR']
09:27 ^🔗	GLaDOS	funfact: disabling cache for /task/(.*) might help
09:27 ^🔗	GLaDOS	OH GOD IP ADDRESS IS NOW NULL
09:28 ^🔗	GLaDOS	(didn't change code btw)
09:48 ^🔗	GLaDOS	I think it's handing out requests properly now
09:50 ^🔗	GLaDOS	IT IS
09:50 ^🔗	GLaDOS	soultcer: I FIXED IT
09:50 ^🔗	GLaDOS	Turns out varnish wasn't passing X-FORWARDED-FOR at times
09:50 ^🔗	GLaDOS	Forced it to do that, and not keep connection open
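
A minimal sketch of the header handling discussed above, assuming the tracker sits behind a single reverse proxy on localhost. The function name `client_ip` and the `trusted_proxies` parameter are hypothetical and not part of the tracker; the function takes a plain WSGI-style dict such as the `web.ctx.env` mentioned in the log.

```python
def client_ip(environ, trusted_proxies=("127.0.0.1", "::1")):
    """Return a best-guess client IP from a WSGI-style environ dict.

    Hypothetical helper: assumes exactly one reverse proxy in front of
    the application, listening on one of `trusted_proxies`.
    """
    remote = environ.get("REMOTE_ADDR", "")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # Only believe the header when the direct peer is our own proxy;
    # otherwise any client could claim an arbitrary address.
    if remote in trusted_proxies and forwarded:
        # The header is a comma-separated list. A proxy appends the peer
        # it saw, so with one proxy the last entry is the real client.
        hops = [h.strip() for h in forwarded.split(",") if h.strip()]
        if hops:
            return hops[-1]
    # No usable header: fall back to the socket peer address.
    return remote
```

Taking the last entry rather than the first is a design choice: earlier entries are supplied by the client and can be spoofed, which would let one scraper pose as many IPs.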
09:59 ^🔗	Cameron_D	/data is returning nothing now :(
10:01 ^🔗	GLaDOS	what
10:02 ^🔗	GLaDOS	Cameron_D: did you mean: http://urlteam.terrywri.st/data/
10:02 ^🔗	GLaDOS	the end slash matters, apparently
10:04 ^🔗	Cameron_D	yeah, blank here
10:05 ^🔗	GLaDOS	..
10:06 ^🔗	GLaDOS	It returns for me
10:09 ^🔗	Cameron_D	it's working on my phone, so maybe my proxy at home had cached nothing
10:09 ^🔗	GLaDOS	probably.
10:10 ^🔗	ersi	/data and /data/ are technically two very different things.
10:11 ^🔗	omf_	http://urlteam.terrywri.st/data/ works for me
10:12 ^🔗	ersi	It takes a few moments, but works for me as well.
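
One way to see why the trailing slash matters is relative URL resolution, sketched here with Python's standard `urllib.parse.urljoin`. The file name `file.txt` is a made-up example, not something known to exist on that server.

```python
from urllib.parse import urljoin

# Without the trailing slash, "data" is treated as a file name, so a
# relative link replaces it.
no_slash = urljoin("http://urlteam.terrywri.st/data", "file.txt")

# With the trailing slash, "data/" is a directory, so a relative link
# is resolved inside it.
with_slash = urljoin("http://urlteam.terrywri.st/data/", "file.txt")

print(no_slash)    # http://urlteam.terrywri.st/file.txt
print(with_slash)  # http://urlteam.terrywri.st/data/file.txt
```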
11:17 ^🔗	soultcer	You are only allowing one concurrent access to the backend, right?
11:29 ^🔗	GLaDOS	For everything but /task
11:29 ^🔗	GLaDOS	If I did that with /task, the same issue would happen, just with the first connecting IP
11:39 ^🔗	soultcer	huh?
11:39 ^🔗	soultcer	sqlite3 doesn't support locks and the task assignment is a two-step process: a) Select a suitable task b) assign task to user
11:40 ^🔗	GLaDOS	I know, but it can't be a continuous connection if we want to keep sending the x-forwarded-for header
11:40 ^🔗	ersi	ie: at least one read and one write
11:40 ^🔗	GLaDOS	At least that's what I read
11:42 ^🔗	soultcer	HTTP is stateless. You can send whatever headers you want with each request.
11:42 ^🔗	soultcer	The problem is, there should not be more than one write access on the db. sqlite supports transactions and so on, so it won't break or corrupt the database
11:42 ^🔗	soultcer	But it could accidentally hand out the same task twice
11:48 ^🔗	GLaDOS	Okay, removed the connection closing.
12:35 ^🔗	soultcer	When you get the time you should just throw the whole tracker thing away and rewrite it to actually work without any hacks ;-)
12:37 ^🔗	GLaDOS	That's implying that I can code for shit.
12:37 ^🔗	GLaDOS	Might do it as a school project
12:37 ^🔗	GLaDOS	"And why did you do this?" "Because URL shorteners were a fucking awful idea."
13:02 ^🔗	ersi	"See this short link"
13:03 ^🔗	ersi	-"It.. doesn't work."
13:03 ^🔗	ersi	"Point taken yet?"
13:05 ^🔗	GLaDOS	Put all my references as short urls on an about-to-die URL shortener
22:58 ^🔗	SketchCow	Do we need anything here?
22:58 ^🔗	SketchCow	Tracker is not working?
