Time | Nickname | Message |
07:36 | soultcer | GLaDOS: I'm afraid I don't have an HTTP server with enough disk space for that |
07:41 | soultcer | GLaDOS, I think there is a problem with the tracker |
07:41 | soultcer | The tracker makes sure to only hand out one task for each service to the same IP, so that the scrapers don't get blocked |
07:41 | soultcer | But your tracker only hands out one task for each service, for all IPs |
07:41 | omf_ | soultcer, is the data available anywhere else online besides BitTorrent? |
07:42 | soultcer | I think the tracker does not get the real IP of the HTTP request sender, but the IP of your reverse proxy |
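A rough sketch of the rule soultcer describes, and of why it breaks behind a reverse proxy. This is not the tracker's actual code: `pick_service`, the `active` set, and the service names are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: hand out at most one task per service to any one IP.
def pick_service(services, active, client_ip):
    """Return the first service this IP is not already scraping, or None.

    `active` is a set of (client_ip, service) pairs for tasks in progress.
    """
    for service in services:
        if (client_ip, service) not in active:
            active.add((client_ip, service))
            return service
    return None


services = ["svc-a", "svc-b"]  # placeholder service names

# Distinct client IPs: every scraper can get a task for the first service.
active = set()
direct = [
    pick_service(services, active, ip)
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")
]

# Behind a proxy that hides the client IP, every request appears to come
# from 127.0.0.1, so only one task per service is handed out in total.
active = set()
proxied = [pick_service(services, active, "127.0.0.1") for _ in range(3)]
```

In the proxied case the third request gets nothing, which matches the symptom reported above: one task per service for all IPs combined.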
07:44 | soultcer | omf_, not that I know of. But the torrent should be seeded, according to the trackers |
07:45 | soultcer | I'll try to add another seed |
07:49 | omf_ | So 80 GB for everything |
07:49 | GLaDOS | Ah, derp, my mistake |
07:49 | GLaDOS | How would I make Apache pass the real IP? |
07:53 | GLaDOS | Also, soultcer, sending it on an external HDD would be fine. |
07:53 | GLaDOS | Actually, I have a perfect external for this |
07:53 | soultcer | I think web.py uses the X-Forwarded-For header |
07:54 | soultcer | Using a reverse proxy in front of the tracker is good though, because then the front page with the stats (which is expensive to generate) can be cached |
08:07 | GLaDOS | ugh |
08:07 | GLaDOS | not working |
08:09 | soultcer | stupid NAT and only barely working trackers |
08:10 | GLaDOS | I'm going to try using Varnish instead |
08:10 | soultcer | Whatever works for you. I'm a huge nginx fan myself |
08:11 | GLaDOS | Eh, I've yet to get into nginx |
08:12 | soultcer | Ok, now I have found two peers for the torrent, both seeders |
08:18 | GLaDOS | Ok, I'll try launching it again |
08:41 | omf_ | well? |
08:43 | GLaDOS | Seems like Varnish fixed it. |
08:43 | GLaDOS | All 6 of my scrapers got jobs |
08:44 | soultcer | Yeah, each of them got one job for a different service |
08:44 | soultcer | The problem still exists |
08:45 | GLaDOS | Yeah, never mind. |
08:50 | GLaDOS | Yeah, web.ctx.ip is supposed to be returning the X-Forwarded-For IP |
08:51 | GLaDOS | Tracker only sees 127.0.0.1 though: |
08:51 | GLaDOS | 127.0.0.1:54662 - - [02/Oct/2013 10:53:07] "HTTP/1.1 GET /task/get" - 200 OK |
09:14 | GLaDOS | soultcer: do you suggest replacing web.ctx.ip with this function: http://code.activestate.com/recipes/577795-get-users-ip-address-even-when-theyre-behind-a-pro/ |
09:15 | GLaDOS | Or hell, just web.ctx.env['HTTP_X_FORWARDED_FOR'] |
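A minimal sketch of the header lookup GLaDOS suggests, written against a plain WSGI-style environ dict so it runs without web.py (in web.py that dict would be `web.ctx.env`). The function name `client_ip` and the `trusted_proxies` parameter are assumptions for illustration, as is the setup of a single local proxy that appends the address it saw to the header.

```python
def client_ip(environ, trusted_proxies=("127.0.0.1",)):
    """Best-effort client IP from a WSGI-style environ dict."""
    remote = environ.get("REMOTE_ADDR", "")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # Only trust the header when the request really came through our own
    # proxy; otherwise any client could claim an arbitrary address.
    if remote in trusted_proxies and forwarded:
        # The header can hold a chain like "a, b, c". Assuming one trusted
        # proxy that appends the address it saw, the last entry is the one
        # we can rely on.
        return forwarded.split(",")[-1].strip()
    return remote


# Request relayed by the local proxy, header present.
via_proxy = client_ip(
    {"REMOTE_ADDR": "127.0.0.1", "HTTP_X_FORWARDED_FOR": "203.0.113.7"}
)
# Request relayed by the local proxy, but the header was dropped.
header_missing = client_ip({"REMOTE_ADDR": "127.0.0.1"})
```

When the proxy drops the header, the function falls back to the proxy's own address, which is the same 127.0.0.1 symptom seen in the tracker log above.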
09:27 | GLaDOS | funfact: disabling cache for /task/(.*) might help |
09:27 | GLaDOS | OH GOD IP ADDRESS IS NOW NULL |
09:28 | GLaDOS | (didn't change code btw) |
09:48 | GLaDOS | I think it's handing out requests properly now |
09:50 | GLaDOS | IT IS |
09:50 | GLaDOS | soultcer: I FIXED IT |
09:50 | GLaDOS | Turns out Varnish wasn't passing X-FORWARDED-FOR at times |
09:50 | GLaDOS | Forced it to do that, and not keep connection open |
09:59 | Cameron_D | /data is returning nothing now :( |
10:01 | GLaDOS | what |
10:02 | GLaDOS | Cameron_D: did you mean: http://urlteam.terrywri.st/data/ |
10:02 | GLaDOS | the end slash matters, apparently |
10:04 | Cameron_D | yeah, blank here |
10:05 | GLaDOS | .. |
10:06 | GLaDOS | It returns for me |
10:09 | Cameron_D | it's working on my phone, so maybe my proxy at home had cached nothing |
10:09 | GLaDOS | probably. |
10:10 | ersi | /data and /data/ are technically two very different things. |
10:11 | omf_ | http://urlteam.terrywri.st/data/ works for me |
10:12 | ersi | It takes a few moments, but works for me as well. |
11:17 | soultcer | You are only allowing one concurrent access to the backend, right? |
11:29 | GLaDOS | For everything but /task |
11:29 | GLaDOS | If I did that with /task, the same issue would happen, just with the first connecting IP |
11:39 | soultcer | huh? |
11:39 | soultcer | sqlite3 doesn't support locks and the task assignment is a two-step process: a) select a suitable task b) assign task to user |
11:40 | GLaDOS | I know, but it can't be a continuous connection if we want to keep sending the x-forwarded-for connection |
11:40 | ersi | i.e. at least one read and one write |
11:40 | GLaDOS | At least that's what I read |
11:42 | soultcer | HTTP is stateless. You can send whatever headers you want with each request. |
11:42 | soultcer | The problem is, there should not be more than one write access on the db. sqlite supports transactions and so on, so it won't break or corrupt the database |
11:42 | soultcer | But it could accidentally hand out the same task twice |
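One way to keep the two-step assignment from handing out the same task twice, sketched with Python's `sqlite3` module. SQLite has no row-level locks, but `BEGIN IMMEDIATE` takes its database-wide write lock before the select, so a second writer has to wait. The `tasks` table, its columns, and `assign_task` are made up for this example and are not the tracker's real schema.

```python
import sqlite3


def assign_task(conn, user):
    """Select a free task and assign it to `user` in one write transaction."""
    # Take the write lock up front so no other connection can select the
    # same row between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM tasks WHERE assigned_to IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        task_id = row[0] if row else None
        if task_id is not None:
            conn.execute(
                "UPDATE tasks SET assigned_to = ? WHERE id = ?", (user, task_id)
            )
        conn.execute("COMMIT")
        return task_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, assigned_to TEXT)")
conn.executemany("INSERT INTO tasks (assigned_to) VALUES (?)", [(None,), (None,)])

first = assign_task(conn, "scraper-1")
second = assign_task(conn, "scraper-2")
third = assign_task(conn, "scraper-3")  # no free tasks left
```

With this in place the proxy would not need to limit /task to a single concurrent backend connection just to protect the assignment step, though that remains an assumption about how the tracker uses its database.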
11:48 | GLaDOS | Okay, removed the connection closing. |
12:35 | soultcer | When you get the time you should just throw the whole tracker thing away and rewrite it to actually work without any hacks ;-) |
12:37 | GLaDOS | That's implying that I can code for shit. |
12:37 | GLaDOS | Might do it as a school project |
12:37 | GLaDOS | "And why did you do this?" "Because URL shorteners were a fucking awful idea." |
13:02 | ersi | "See this short link" |
13:03 | ersi | -"It.. doesn't work." |
13:03 | ersi | "Point taken yet?" |
13:05 | GLaDOS | Put all my references as short URLs on an about-to-die URL shortener |
22:58 | SketchCow | Do we need anything here? |
22:58 | SketchCow | Tracker is not working? |