Time |
Nickname |
Message |
08:33
🔗
|
ersi |
soultcer: Caught a utf8 decode error when submitting the result back to the tracker o_o http://pejsta.nu/1148 |
08:35
🔗
|
ersi |
I guess it's related to me specifying a username for the scraper |
08:39
🔗
|
ersi |
Okay, snipurl.com seems to have rate limiting as well |
08:47
🔗
|
ersi |
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-4: invalid data |
08:47
🔗
|
ersi |
so, apparently not from the username :p I'll take a looksie |
08:52
🔗
|
ersi |
hmm, maybe related to me running this in python2.6 |
11:51
🔗
|
ersi |
Ran it on python2.7 - with issues as well >_> |
15:35
🔗
|
soultcer |
ersi: Those utf8 errors usually happened when web.py was trying to parse POST data. The gzipped result file is sent via POST and web.py always tried to decode it. |
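A minimal sketch of why such a decode fails, assuming Python 3 and the standard `gzip` module rather than web.py's actual request handling: gzip output starts with the magic bytes 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte. The payload and the `try_decode` helper are hypothetical.

```python
import gzip

# Hypothetical payload; the real result file is not shown in the log.
payload = gzip.compress(b"abc|http://example.com/\n")


def try_decode(data):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


result = try_decode(payload)
# Prints the name of whatever came back: str on success, the error class on failure.
print(type(result).__name__)
```

Treating the POST body as raw bytes and decompressing it first avoids the problem.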
15:36
🔗
|
soultcer |
What web.py version are you running? |
15:39
🔗
|
ersi |
soultcer: Seems like I'm running web.py-0.37 |
15:39
🔗
|
ersi |
Are you running a newer/older version? |
15:40
🔗
|
soultcer |
0.34 |
15:40
🔗
|
ersi |
Exciting |
15:40
🔗
|
soultcer |
But if there is a bug in the tracker so it doesn't work with web.py 0.37, I rather fix the bug in the tracker |
15:40
🔗
|
ersi |
Yeah, that's the sane approach. :-) |
15:41
🔗
|
soultcer |
Unfortunately the whole tracker is pretty much a big hack, so fixing bugs is hard :D |
15:42
🔗
|
ersi |
Oh, I'm used to that from my work. Everything is a hack. Especially "Enterprise" software |
15:54
🔗
|
ersi |
I'm sure snipurls have rate limiting now btw. Gonna investigate that after/if I get this working :) |
15:54
🔗
|
soultcer |
ersi: Try creating a directory called "files" in the tracker directory |
15:54
🔗
|
ersi |
Alright |
15:55
🔗
|
soultcer |
The utf8 error was just shown because web.py caught some kind of exception and tried to display debug information about that |
15:57
🔗
|
ersi |
Ugly, but fair enough. I created the directory, I'll see if it works better now :) |
15:57
🔗
|
ersi |
Well, yes and no. It created a tmpfile there. But it still hands off 500's |
15:59
🔗
|
ersi |
Ah, and the tmpfile is the actual list of codes|urls - gzipped. |
15:59
🔗
|
soultcer |
Run the app with debug disabled so it just shows the original exception (web.config.debug = False in tracker.py) |
15:59
🔗
|
ersi |
Will do |
16:00
🔗
|
ersi |
I captured the traffic with tcpdump btw, if that'd be of any help - meaning we got the HTTP PUT and all |
16:01
🔗
|
soultcer |
The exception will probably be more useful |
16:03
🔗
|
ersi |
It'll come any second now (I'm too lazy to create smaller tasks.. currently chunks of 50 codes) |
16:03
🔗
|
soultcer |
Hehe, when I debug or test I simply modify tinyback to only do 10 lookups and then stop. (Just don't forget to remove that "feature" before you run it on the real tracker again) |
16:05
🔗
|
ersi |
Heh, d'oh! :-) |
16:06
🔗
|
ersi |
On a side note; I've been thinking about "How to kick start a web crawl" and this is a great source (ie output from URL shorteners) of seed urls.. You get quite the randomness |
16:07
🔗
|
ersi |
Here we go! Finally, an exception! http://pejsta.nu/1149 |
16:07
🔗
|
ersi |
woot |
16:08
🔗
|
soultcer |
Yes, lot of randomness but also probably a lot of spam sites. Spammers love URL shorteners |
16:09
🔗
|
ersi |
Hehe, since I don't specify a username - I guess I trigger this issue ^_^ |
16:09
🔗
|
ersi |
data variable seems to be defined in 'if username' |
16:09
🔗
|
soultcer |
yeah, you can probably simply move the data = outside the if and it will work |
16:10
🔗
|
ersi |
Or should the last line be within the first if? |
16:10
🔗
|
ersi |
ie. db.update() call |
16:12
🔗
|
soultcer |
No, it should be outside the if |
16:12
🔗
|
soultcer |
Inside the if is the code that does the stats for every user |
16:12
🔗
|
soultcer |
Outside is the code that does the stats for the url shorteners |
16:13
🔗
|
soultcer |
If a downloader doesn't have a username, we still want to count his task in the url shortener statistic |
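A minimal sketch of the fix discussed above, not the tracker's actual code: `record_result`, the `stats` dictionary and its keys are hypothetical stand-ins for the real database calls. The point is only that `data` is computed before the `if username` block, so the per-shortener update can use it either way.

```python
def record_result(stats, service, username, finished_codes):
    """Update per-user and per-service counters for one finished task."""
    # Computed outside the 'if' so it exists whether or not a username was sent.
    data = {"count": len(finished_codes)}

    if username:
        # Per-user statistics: only for downloaders who gave a username.
        users = stats.setdefault("users", {})
        users[username] = users.get(username, 0) + data["count"]

    # Per-shortener statistics: always counted, even for anonymous downloaders.
    services = stats.setdefault("services", {})
    services[service] = services.get(service, 0) + data["count"]
    return stats
```

With `data` assigned inside the `if`, the last update would raise a NameError for any task submitted without a username.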
16:13
🔗
|
ersi |
Yay, now it works. |
16:14
🔗
|
soultcer |
Man, luckily I put the whole thing into a single DB transaction. This means all contributions by downloaders without a username were rejected, but at least the task would be reassigned. |
16:14
🔗
|
soultcer |
Without the transaction, all work done by users without a username would have been lost with no way for me to know which had been lost :-/ |
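A minimal sketch of that rollback behaviour, assuming SQLite through Python's standard `sqlite3` module; the tracker's real schema and database are not shown in the log, so the `task` and `stats` tables and the `finish_task` helper are hypothetical.

```python
import sqlite3


def finish_task(conn, task_id, fail=False):
    """Mark a task finished and bump a counter inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE task SET status = 'done' WHERE id = ?", (task_id,))
            if fail:
                # Stand-in for the bug: an exception after the first write.
                raise NameError("data")
            conn.execute("UPDATE stats SET finished = finished + 1")
        return True
    except NameError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE stats (finished INTEGER)")
conn.execute("INSERT INTO task VALUES (1, 'assigned')")
conn.execute("INSERT INTO stats VALUES (0)")
conn.commit()

finish_task(conn, 1, fail=True)
# The failed attempt was rolled back, so the task is still open for reassignment.
print(conn.execute("SELECT status FROM task WHERE id = 1").fetchone()[0])
```

Because both updates share one transaction, a failure in the second leaves no half-finished task behind.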
16:15
🔗
|
* |
ersi shrugs |
16:45
🔗
|
soultcer |
ersi: What kind of rate limiting did you find btw? |
16:50
🔗
|
ersi |
soultcer: I've only concluded that there is rate limiting - which blocks/holds requests for a period - then it resumes as usual |
16:50
🔗
|
ersi |
I'll investigate the rate soon |
16:54
🔗
|
soultcer |
Also, snipurl hands out URLs sequentially, but they did not start with "0" as first ID, but apparently with "20wa5rt" |
16:54
🔗
|
ersi |
haha, wtf |
16:54
🔗
|
ersi |
So.. what would the next logical code be, after "20wa5rt"? :D |
16:55
🔗
|
ersi |
"20wa5ru"? |
16:56
🔗
|
soultcer |
In that case yes, since they do numbers before letters |
16:56
🔗
|
soultcer |
And they never assign a code with -_~ in it, even though it would be a valid code |
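A minimal sketch of stepping through such sequential codes, assuming an alphabet of digits followed by lowercase letters; the log only says numbers come before letters, so the exact alphabet and the `next_code` helper are assumptions.

```python
# Assumed alphabet: digits before lowercase letters, as described above.
# Whether snipurl also uses uppercase letters is not stated in the log.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def next_code(code):
    """Return the code that follows `code` in sequential order."""
    chars = list(code)
    i = len(chars) - 1
    while i >= 0:
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        # Last symbol: wrap to the first one and carry to the left.
        chars[i] = ALPHABET[0]
        i -= 1
    # Every position wrapped, so the code grows by one character.
    return ALPHABET[1] + "".join(chars)


print(next_code("20wa5rt"))  # prints 20wa5ru
```

Custom codes containing `-`, `_` or `~` fall outside this alphabet and would need separate handling.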
16:59
🔗
|
ersi |
Huh |
16:59
🔗
|
ersi |
Hm, maybe we should set up GitHub Notifications for tinyarchive to here as well |
17:04
🔗
|
ersi |
oh, hehe. I pushed up a branch with the same fix :p I'll just delete it |
17:05
🔗
|
ersi |
soultcer: If you write "closes issue #1" or "fixes issue #1" in your commit message, github will automatically close the issue for you btw |
17:05
🔗
|
soultcer |
Ah |
17:06
🔗
|
soultcer |
I actually made the commit before you opened the issue, I just forgot to push to github |
17:06
🔗
|
ersi |
ah :) |
17:06
🔗
|
ersi |
Hmm, how do I delete a remote branch? |
17:06
🔗
|
soultcer |
git push <remote> :<branch-to-delete> |
17:07
🔗
|
ersi |
as in; "git push origin :fixing-issue-1" (if that's my branch name) ? |
17:07
🔗
|
soultcer |
Yes |
17:07
🔗
|
ersi |
Cool |
17:08
🔗
|
soultcer |
I should have just let you make the commit, after all it's your bug and your fix |
17:08
🔗
|
soultcer |
I just went through the tracker.tinyarchive.org logs, and apparently nobody ever tried to submit a task with no username |
17:09
🔗
|
ersi |
cool :D |
17:09
🔗
|
ersi |
That, or it doesn't get logged? |
17:09
🔗
|
soultcer |
Nah, every exception gets logged |
17:09
🔗
|
ersi |
Regarding the bug fix, don't worry about it. I'll hang on and fix more of these :-) |
17:10
🔗
|
soultcer |
There are tons more stupid bugs in both the tinyback and the tracker code |
17:10
🔗
|
ersi |
What are you running tinyarchive under btw? Apache with mod_wsgi? Nginx with wsgi/gunicorn? |
17:10
🔗
|
soultcer |
If you want to do something more fun you can also play around with other stuff like making the graphs more useful |
17:10
🔗
|
ersi |
I'm not much of a UI guy :) |
17:10
🔗
|
ersi |
Besides, I got to report an issue - that's always fun \o/ |
17:11
🔗
|
ersi |
I'll get to investigating snipurl's rate limiting now. |
17:18
🔗
|
ersi |
How do I 'checkout' a branch I don't have locally, but is available remotely? |
17:21
🔗
|
ersi |
By the way: 2012-12-07 18:20:20,659 tinyback.Reaper DEBUG: Fetching code 6~, try 1 |
17:21
🔗
|
ersi |
2012-12-07 18:20:43,827 tinyback.Reaper DEBUG: Code 6~ leads to URL 'http://ganga-japan.com/unsecured-payday-loans-immediate-cash-without-collateral.html' |
17:21
🔗
|
ersi |
Seems like they do use ~ though :) |
17:35
🔗
|
soultcer |
Only if you select it as custom "nickname" |
17:38
🔗
|
ersi |
Oh, I see |
22:41
🔗
|
deathy |
yay...got tinyback running also on my raspberry pi, now sleep time.. |
22:51
🔗
|
chronomex |
heh |
22:51
🔗
|
chronomex |
cool |