| Time | 
    Nickname | 
    Message | 
    
        | 
            08:33
            
            
         | 
        ersi | 
        soultcer: Caught an utf8 decode error when submitting the result back to the tracker o_o http://pejsta.nu/1148 | 
    
    
        | 
            08:35
            
            
         | 
        ersi | 
        I guess it's related to me specifying a username for the scraper | 
    
    
        | 
            08:39
            
            
         | 
        ersi | 
        Okay, snipurl.com seems to have rate limiting as well | 
    
    
        | 
            08:47
            
            
         | 
        ersi | 
        UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-4: invalid data | 
    
    
        | 
            08:47
            
            
         | 
        ersi | 
        so, apparently not from the username :p I'll take a looksie | 
    
    
        | 
            08:52
            
            
         | 
        ersi | 
        hmm, maybe related to me running this in python2.6 | 
    
    
        | 
            11:51
            
            
         | 
        ersi | 
        Ran it on python2.7 - with issues as well >_> | 
    
    
        | 
            15:35
            
            
         | 
        soultcer | 
        ersi: Those utf8 errors usually happened when web.py was trying to parse POST data. The gzipped result file is sent via POST and web.py always tried to decode it. | 
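
A minimal sketch of why decoding a gzipped POST body as UTF-8 fails. The payload line and the `try_decode` helper are made up for illustration; this is not web.py's actual parsing code, only the failure mode soultcer describes.

```python
import gzip

# Hypothetical result line in a "code|url" shape; the real file contents are not shown in the log.
payload = "abc|http://example.com/\n".encode("utf-8")
compressed = gzip.compress(payload)


def try_decode(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A gzip stream starts with the bytes 0x1f 0x8b; 0x8b is not a valid
# UTF-8 start byte, so decoding the raw body always fails.
decodes_raw = try_decode(compressed)

# Decompressing first gives back the original, decodable text.
roundtrip = gzip.decompress(compressed)
```

Treating the body as opaque bytes until it has been decompressed avoids the error.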
    
    
        | 
            15:36
            
            
         | 
        soultcer | 
        What web.py version are you running? | 
    
    
        | 
            15:39
            
            
         | 
        ersi | 
        soultcer: Seems like I'm running web.py-0.37 | 
    
    
        | 
            15:39
            
            
         | 
        ersi | 
        Are you running a newer/older version? | 
    
    
        | 
            15:40
            
            
         | 
        soultcer | 
        0.34 | 
    
    
        | 
            15:40
            
            
         | 
        ersi | 
        Exciting | 
    
    
        | 
            15:40
            
            
         | 
        soultcer | 
        But if there is a bug in the tracker so it doesn't work with web.py 0.37, I'd rather fix the bug in the tracker | 
    
    
        | 
            15:40
            
            
         | 
        ersi | 
        Yeah, that's the sane approach. :-) | 
    
    
        | 
            15:41
            
            
         | 
        soultcer | 
        Unfortunately the whole tracker is pretty much a big hack, so fixing bugs is hard :D | 
    
    
        | 
            15:42
            
            
         | 
        ersi | 
        Oh, I'm used to that from my work. Everything is a hack. Especially "Enterprise" software | 
    
    
        | 
            15:54
            
            
         | 
        ersi | 
        I'm sure snipurls have rate limiting now btw. Gonna investigate that after/if I get this working :) | 
    
    
        | 
            15:54
            
            
         | 
        soultcer | 
        ersi: Try creating a directory called "files" in the tracker directory | 
    
    
        | 
            15:54
            
            
         | 
        ersi | 
        Alright | 
    
    
        | 
            15:55
            
            
         | 
        soultcer | 
        The utf8 error was just shown because web.py caught some kind of exception and tried to display debug information about that | 
    
    
        | 
            15:57
            
            
         | 
        ersi | 
        Ugly, but fair enough. I created the directory, I'll see if it works better now :) | 
    
    
        | 
            15:57
            
            
         | 
        ersi | 
        Well, yes and no. It created a tmpfile there. But it still hands off 500's | 
    
    
        | 
            15:59
            
            
         | 
        ersi | 
        Ah, and the tmpfile is the actual list of codes|urls - gzipped. | 
    
    
        | 
            15:59
            
            
         | 
        soultcer | 
        Run the app with debug disabled so it just shows the original exception (web.config.debug = False in tracker.py) | 
    
    
        | 
            15:59
            
            
         | 
        ersi | 
        Will do | 
    
    
        | 
            16:00
            
            
         | 
        ersi | 
        I captured the traffic with tcpdump btw, if that'd be of any help - meaning we got the HTTP PUT and all | 
    
    
        | 
            16:01
            
            
         | 
        soultcer | 
        The exception will probably be more useful | 
    
    
        | 
            16:03
            
            
         | 
        ersi | 
        It'll come any second now (I'm too lazy to create smaller tasks.. currently chunks of 50 codes) | 
    
    
        | 
            16:03
            
            
         | 
        soultcer | 
        Hehe when I debug test I simply modify tinyback to only do 10 lookups and then stop. (Just don't forget to remove that "feature" before you run it on the real tracker again) | 
    
    
        | 
            16:05
            
            
         | 
        ersi | 
        Heh, d'oh! :-) | 
    
    
        | 
            16:06
            
            
         | 
        ersi | 
        On a side note; I've been thinking about "How to kick start a web crawl" and this is a great source (ie output from URL shorteners) of seed urls.. You get quite the randomness | 
    
    
        | 
            16:07
            
            
         | 
        ersi | 
        Here we go! Finally, an exception! http://pejsta.nu/1149 | 
    
    
        | 
            16:07
            
            
         | 
        ersi | 
        woot | 
    
    
        | 
            16:08
            
            
         | 
        soultcer | 
        Yes, lot of randomness but also probably a lot of spam sites. Spammers love URL shorteners | 
    
    
        | 
            16:09
            
            
         | 
        ersi | 
        Hehe, since I don't specify a username - I guess I trigger this issue ^_^ | 
    
    
        | 
            16:09
            
            
         | 
        ersi | 
        data variable seems to be defined in 'if username' | 
    
    
        | 
            16:09
            
            
         | 
        soultcer | 
        yeah, you can probably simply move the data = outside the if and it will work | 
    
    
        | 
            16:10
            
            
         | 
        ersi | 
        Or should the last line be within the first if? | 
    
    
        | 
            16:10
            
            
         | 
        ersi | 
        ie. db.update() call | 
    
    
        | 
            16:12
            
            
         | 
        soultcer | 
        No, it should be outside the if | 
    
    
        | 
            16:12
            
            
         | 
        soultcer | 
        Inside the if is the code that does the stats for every user | 
    
    
        | 
            16:12
            
            
         | 
        soultcer | 
        Outside is the code that does the stats for the url shorteners | 
    
    
        | 
            16:13
            
            
         | 
        soultcer | 
        If a downloader doesn't have a username, we still want to count his task in the url shortener statistic | 
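
The bug and fix discussed above can be sketched roughly as follows. The function names, arguments and the plain dicts standing in for the database are all hypothetical; the tracker's real code is not shown in the log. The sketch only illustrates a variable assigned inside `if username` and then used outside it.

```python
def record_result_buggy(username, results, user_stats, service_stats):
    if username:
        data = len(results)
        # Per-user statistics: only meaningful when a username was given.
        user_stats[username] = user_stats.get(username, 0) + data
    # Shortener statistics: fails when username is empty,
    # because `data` was never assigned in that case.
    service_stats["count"] = service_stats.get("count", 0) + data


def record_result_fixed(username, results, user_stats, service_stats):
    # Assignment moved outside the `if`, so it always happens.
    data = len(results)
    if username:
        user_stats[username] = user_stats.get(username, 0) + data
    # Anonymous submissions still count towards the shortener statistics.
    service_stats["count"] = service_stats.get("count", 0) + data
```

In the buggy version an anonymous submission raises `UnboundLocalError` (a subclass of `NameError`); in the fixed version it updates only the shortener count.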
    
    
        | 
            16:13
            
            
         | 
        ersi | 
        Yay, now it works. | 
    
    
        | 
            16:14
            
            
         | 
        soultcer | 
        Man, luckily I put the whole thing into a single DB transaction. This means all contributions by downloaders without a username were rejected, but at least the task would be reassigned. | 
    
    
        | 
            16:14
            
            
         | 
        soultcer | 
        Without the transaction, all work done by unassigned users would have been lost with no way for me to know which had been lost :-/ | 
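
A small sketch of the transaction behaviour described above, using the standard-library `sqlite3` module as a stand-in. The tracker's actual database layer, table layout and status values are not shown in the log, so the `task` table and the `submit` helper here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO task VALUES (1, 'assigned')")
conn.commit()


def submit(conn, task_id, fail):
    """Mark a task done, optionally failing later in the same transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE task SET status = 'done' WHERE id = ?", (task_id,)
            )
            if fail:
                # Stands in for the exception raised by the statistics code.
                raise RuntimeError("simulated bug in statistics code")
    except RuntimeError:
        return False
    return True


def status(conn, task_id):
    row = conn.execute(
        "SELECT status FROM task WHERE id = ?", (task_id,)
    ).fetchone()
    return row[0]
```

Because the update and the failing step share one transaction, a failed submission leaves the task in its previous state, so it can be handed out again instead of being silently marked as finished.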
    
    
        | 
            16:15
            
            
         | 
        * | 
        ersi shrugs | 
    
    
        | 
            16:45
            
            
         | 
        soultcer | 
        ersi: What kind of rate limiting did you find btw? | 
    
    
        | 
            16:50
            
            
         | 
        ersi | 
        soultcer: I've only concluded that there is rate limiting - which blocks/holds requests for a period - then it resumes as usual | 
    
    
        | 
            16:50
            
            
         | 
        ersi | 
        I'll investigate the rate soon | 
    
    
        | 
            16:54
            
            
         | 
        soultcer | 
        Also, snipurl hands out URLs sequentially, but they did not start with "0" as first ID, but apparently with "20wa5rt" | 
    
    
        | 
            16:54
            
            
         | 
        ersi | 
        haha, wtf | 
    
    
        | 
            16:54
            
            
         | 
        ersi | 
        So.. what would the next logical code be, after "20wa5rt"? :D | 
    
    
        | 
            16:55
            
            
         | 
        ersi | 
        "20wa5ru"? | 
    
    
        | 
            16:56
            
            
         | 
        soultcer | 
        In that case yes, since they do numbers before letters | 
    
    
        | 
            16:56
            
            
         | 
        soultcer | 
        And they never assign a code with -_~ in it, even though it would be a valid code | 
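
A rough sketch of how the "next" sequential code could be computed under the ordering described above. It assumes an alphabet of digits followed by lowercase letters, with `-`, `_` and `~` left out; snipurl's real alphabet and rollover behaviour are not confirmed in the log, and `next_code` is a hypothetical helper.

```python
import string

# Assumed ordering: numbers before letters, no "-", "_" or "~".
ALPHABET = string.digits + string.ascii_lowercase


def next_code(code, alphabet=ALPHABET):
    """Increment a short code like an odometer over the given alphabet."""
    chars = list(code)
    i = len(chars) - 1
    while i >= 0:
        pos = alphabet.index(chars[i])
        if pos + 1 < len(alphabet):
            # No carry needed: bump this position and stop.
            chars[i] = alphabet[pos + 1]
            return "".join(chars)
        # Last symbol reached: wrap to the first one and carry left.
        chars[i] = alphabet[0]
        i -= 1
    # Every position wrapped, so the code grows by one character.
    return alphabet[1] + "".join(chars)


print(next_code("20wa5rt"))  # 20wa5ru
```

With this ordering, `9` is followed by `a`, and `z` rolls over to `10`.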
    
    
        | 
            16:59
            
            
         | 
        ersi | 
        Huh | 
    
    
        | 
            16:59
            
            
         | 
        ersi | 
        Hm, maybe we should set up GitHub Notifications for tinyarchive to here as well | 
    
    
        | 
            17:04
            
            
         | 
        ersi | 
        oh, hehe. I pushed up a branch with the same fix :p I'll just delete it | 
    
    
        | 
            17:05
            
            
         | 
        ersi | 
        soultcer: If you write "closes issue #1" or "fixes issue #1" in your commit message, github will automatically close the issue for you btw | 
    
    
        | 
            17:05
            
            
         | 
        soultcer | 
        Ah | 
    
    
        | 
            17:06
            
            
         | 
        soultcer | 
        I actually made the commit before you opened the issue, I just forgot to push to github | 
    
    
        | 
            17:06
            
            
         | 
        ersi | 
        ah :) | 
    
    
        | 
            17:06
            
            
         | 
        ersi | 
        Hmm, how do I delete a remote branch? | 
    
    
        | 
            17:06
            
            
         | 
        soultcer | 
        git push <remote> :<branch-to-delete> | 
    
    
        | 
            17:07
            
            
         | 
        ersi | 
        as in; "git push origin :fixing-issue-1" (if that's my branch name) ? | 
    
    
        | 
            17:07
            
            
         | 
        soultcer | 
        Yes | 
    
    
        | 
            17:07
            
            
         | 
        ersi | 
        Cool | 
    
    
        | 
            17:08
            
            
         | 
        soultcer | 
        I should have just let you make the commit, after all it's your bug and your fix | 
    
    
        | 
            17:08
            
            
         | 
        soultcer | 
        I just went through the tracker.tinyarchive.org logs, and apparently nobody ever tried to submit a task with no username | 
    
    
        | 
            17:09
            
            
         | 
        ersi | 
        cool :D | 
    
    
        | 
            17:09
            
            
         | 
        ersi | 
        That, or it doesn't get logged? | 
    
    
        | 
            17:09
            
            
         | 
        soultcer | 
        Nah, every exception gets logged | 
    
    
        | 
            17:09
            
            
         | 
        ersi | 
        Regarding the bug fix, don't worry about it. I'll hang on and fix more of these :-) | 
    
    
        | 
            17:10
            
            
         | 
        soultcer | 
        There are tons more stupid bugs in both the tinyback and the tracker code | 
    
    
        | 
            17:10
            
            
         | 
        ersi | 
        What are you running tinyarchive under btw? Apache with mod_wsgi? Nginx with wsgi/gunicorn? | 
    
    
        | 
            17:10
            
            
         | 
        soultcer | 
        If you want to do something more fun you can also play around with other stuff like making the graphs more useful | 
    
    
        | 
            17:10
            
            
         | 
        ersi | 
        I'm not much of a UI guy :) | 
    
    
        | 
            17:10
            
            
         | 
        ersi | 
        Besides, I got to report an issue - that's always fun \o/ | 
    
    
        | 
            17:11
            
            
         | 
        ersi | 
        I'll get to investigating snipurl's rate limiting now. | 
    
    
        | 
            17:18
            
            
         | 
        ersi | 
        How do I 'checkout' a branch I don't have locally, but is available remotely? | 
    
    
        | 
            17:21
            
            
         | 
        ersi | 
        By the way: 2012-12-07 18:20:20,659 tinyback.Reaper DEBUG: Fetching code 6~, try 1 | 
    
    
        | 
            17:21
            
            
         | 
        ersi | 
        2012-12-07 18:20:43,827 tinyback.Reaper DEBUG: Code 6~ leads to URL 'http://ganga-japan.com/unsecured-payday-loans-immediate-cash-without-collateral.html' | 
    
    
        | 
            17:21
            
            
         | 
        ersi | 
        Seems like they do use ~ though :) | 
    
    
        | 
            17:35
            
            
         | 
        soultcer | 
        Only if you select it as custom "nickname" | 
    
    
        | 
            17:38
            
            
         | 
        ersi | 
        Oh, I see | 
    
    
        | 
            22:41
            
            
         | 
        deathy | 
        yay...got tinyback running also on my raspberry pi, now sleep time.. | 
    
    
        | 
            22:51
            
            
         | 
        chronomex | 
        heh | 
    
    
        | 
            22:51
            
            
         | 
        chronomex | 
        cool |