[12:57] <edsu> xmc: i'm a software developer, if there are any tasks that need doing to help get the tracker working
[12:57] <edsu> xmc: just offering to help, if any is needed :)
[13:57] <ersi> Well, the whole thing kind of needs to be rewritten, edsu.
[13:58] <twrist> https://github.com/ArchiveTeam/tinyback go at it
[13:59] <twrist> er, https://github.com/ArchiveTeam/tinyarchive
[14:01] <ersi> though patching up whatever keeps breaking the tracker would probably be low-hanging fruit
[15:04] <GLaDOS> That's the thing; the entire thing is patches.
[15:05] <ersi> Uh, not quite.
[15:05] <GLaDOS> many thanks for ops
[15:06] <ersi> But it is mostly a prototype gone wild. :)
[15:06] <GLaDOS> Well from how soult described it, it sounded like that
[15:06] <ersi> It's hacky, sure - but not "patches"
[15:06] <GLaDOS> Eh, close enough.
[15:09] <edsu> so i'm still learning my way around ; is the urlteam tracker different from the archiveteam tracker?
[15:10] <xmc> yes
[15:11] <ersi> Indeed
[15:12] <ersi> https://github.com/ArchiveTeam/tinyarchive is the urlteam tracker.
[15:12] <ersi> https://github.com/ArchiveTeam/tinyback is the urlteam "worker"
[15:18] <edsu> got it
[15:19] <edsu> i imagine those berkeleydbs are getting quite large? or are they purged periodically and saved off to a file?
[15:19] <ersi> yeah
[15:20] * edsu is still reading obv :)
[15:20] <ersi> purged periodically
[15:20] <ersi> I got some info that isn't in the source code from soultcer (the author), here in this chat
[15:21] <GLaDOS> Oh, yeah, they get purged
[15:21] <GLaDOS> WHEN THE PURGING WORKS
[15:21] <edsu> :-D
[15:21] <edsu> is that the main problem?
[15:21] <GLaDOS> Possibly
[15:21] <GLaDOS> The tracker either just freezes up or gets killed
[15:21] <GLaDOS> Literally, just killed.
[15:21] <GLaDOS> No traceback
[15:21] <edsu> is it running on a vm somewhere?
[15:22] <edsu> oom killer killing it because it is using too much memory?
[15:22] <GLaDOS> Nah, it's on some crappy kimsufi dedi
[15:22] <edsu> huh, ok
[15:22] <GLaDOS> And the killing stopped like a week ago, it just freezes
[15:27] <edsu> looks like it's using web.py; is it running as a single process, or under something else like apache/mod_wsgi?
[15:30] * edsu wonders how big the sqlite db is
[15:36] <GLaDOS> edsu: well it's launched by running tracker.py, so unless that creates multiple web.py processes..
[15:37] <GLaDOS> Also, "45M tasks.sqlite"
[15:41] <edsu> got it, any chance of getting a copy of the db?
[15:43] <GLaDOS> http://dumpground.archivingyoursh.it/tasks.sqlite
[15:44] <_46bit> What am I downloading?
[15:46] <GLaDOS> An sqlite database which is probably corrupted beyond repair
[15:46] <GLaDOS> but meh
[15:48] <edsu> sqlite file opens at least :)
[15:48] <edsu> sqlite> select count(*) from task;
[15:48] <edsu> 191000
[15:49] <GLaDOS> THAT'S THE CAUSE!
[15:50] <edsu> ?
[15:50] <GLaDOS> probably
[15:50] <GLaDOS> hopefully
[15:50] <edsu> it's my fault, isn't it
[15:50] <GLaDOS> nah
[15:50] <GLaDOS> i haven't dived into the internals yet
[15:50] <GLaDOS> wanted to keep my sanity
[15:50] <edsu> cause is too many rows?
[15:50] <GLaDOS> maybe, no idea.
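The row count edsu pulled from the sqlite shell can be reproduced with Python's standard sqlite3 module, and SQLite's built-in `PRAGMA integrity_check` is one way to test GLaDOS's guess that the file is corrupted. This is a minimal sketch: the `task` table and the tasks.sqlite file name come from the log, while the helper names are hypothetical and nothing here reflects how tinyarchive itself queries the database.

```python
import sqlite3


def count_tasks(conn: sqlite3.Connection) -> int:
    """Equivalent of `select count(*) from task;` in the sqlite shell."""
    (n,) = conn.execute("SELECT count(*) FROM task").fetchone()
    return n


def is_intact(conn: sqlite3.Connection) -> bool:
    """SQLite's integrity check returns a single row 'ok' when it finds no problems."""
    row = conn.execute("PRAGMA integrity_check").fetchone()
    return row is not None and row[0] == "ok"
```

Passing `sqlite3.connect("tasks.sqlite")` to both helpers would give the task count and a quick corruption check for the downloaded copy; a file that is not a valid SQLite database raises `sqlite3.DatabaseError` on the first query rather than at connect time.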
[15:51] <edsu> the python i've been looking at so far is quite readable
[15:51] <GLaDOS> (spoiler: I cannot into programming. I merely host things.
[15:51] <edsu> have only looked at a small pocket of it though
[15:51] <GLaDOS> )
[15:51] <edsu> hosting things is the hard part :)
[15:52] <GLaDOS> I find it easy now.
[15:54] <edsu> http://ec2-54-204-142-70.compute-1.amazonaws.com:8080/
[15:55] <GLaDOS> Leave it for a day, it'll freeze up.
[15:55] <edsu> is the symptom of it freezing up that it just stops responding to http requests?
[15:56] <GLaDOS> Yeah, as people seem to complain
[15:56] <edsu> so it might require people to actually use it i guess
[15:56] <GLaDOS> 50x, but I'm not sure if that's from varnish
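Since the freeze shows up as the tracker no longer answering HTTP requests (or as 50x errors, possibly from the varnish cache in front of it), a simple liveness probe can distinguish "frozen" from "up". Below is a minimal sketch using only Python's standard library; the function name and the 10-second default timeout are assumptions, and it only checks that something answers, not that tasks are handed out correctly.

```python
import urllib.error
import urllib.request


def tracker_is_responsive(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a non-5xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx still means the app is answering; a 5xx (e.g. a proxy such as
        # varnish reporting a dead backend) counts as down.
        return err.code < 500
    except OSError:
        # Refused connection, DNS failure, or timeout (URLError and socket
        # timeouts are both OSError subclasses): treat as frozen/down.
        return False
```

Run periodically (from cron, say) against the tracker URL, a change from True to False would timestamp roughly when the freeze starts, which could then be matched against the tracker's own logs.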
[15:56] <edsu> i guess i could use it myself
[15:57] <edsu> just point an instance of tinyback at it
[15:58] <edsu> could that disturb anything, or can they run in isolation from the rest of the infrastructure ok?
[15:58] <GLaDOS> yeah, they run in isolation fine
[16:01] <GLaDOS> WELL, NOW I SLEEP
[16:02] <edsu> 'nite :)
[16:02] <edsu> and thanks for the help
[16:06] <edsu> i wonder if there are some cron jobs that run periodically
[16:11] <edsu> i guess David Triendl (soult) isn't on the scene anymore?
[16:12] <edsu> or he is back doing his studies :)
[17:30] <edsu> GLaDOS: when you wake up again, i'd be interested to know if there's anything running from cron for tinyarchive
[19:31] <edsu> if anyone else wants to point there tinyback at http://ec2-54-204-142-70.compute-1.amazonaws.com:8080 to help me debug it that would be swell
[19:31] <edsu> s/there/their/
[19:58] <pft> edsu: i'm pointing at it
[19:59] <pft> theoretically. see two WARNING messages about ServiceExceptions
[20:04] <edsu> pft: awesome, can you paste?
[20:04] <edsu> are you Levon?
[20:04] <pft> 2013-11-05 20:57:49,923 tinyback.Reaper WARNING: ServiceException(Unexpected response on status 200) on code b4ont9
[20:04] <pft> yes
[20:04] <pft> 2013-11-05 20:57:51,780 tinyback.Reaper WARNING: ServiceException(HTTP status changed from 200 to 301 on second request) on code b4ont9
[20:04] <pft> those are failures on the shortener i assume
[20:04] <pft> 2013-11-05 21:03:43,322 tinyback.Tracker WARNING: Server refused data for task 571a69c8-29ad-11e3-8225-00224d7a9dd0
[20:04] <pft> not sure what that is
[20:05] <edsu> yes, that does look like a shortener error
[20:05] <edsu> but that last one is on the tinyarchive side
[20:05] <pft> yeah that's what i was suspecting
[20:12] <edsu> looks like the server refused is when the task is PUT back to tinyarchive and tinyback gets a 409 Conflict error
[20:15] <pft> ahh
[20:16] <edsu> looks like that can happen when the task isn't 'properly assigned' https://github.com/ArchiveTeam/tinyarchive/blob/master/tracker/tracker.py#L244
[20:16] <edsu> maybe that prevents multiple people from reporting back on the same task?
[20:17] <pft> seems likely
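edsu's reading is that tinyarchive answers a worker's PUT with 409 Conflict when the task is not properly assigned, which tinyback then logs as "Server refused data for task ...". The check can be sketched roughly as below. This is a hypothetical illustration, not the code at tracker.py#L244: the `Assignment` record, the per-worker identity, and the rule that a second submission is refused are all assumptions, chosen to show how such a check would stop two workers (or one worker twice) from reporting back on the same task.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Assignment:
    worker: str              # hypothetical: who the tracker handed the task to
    submitted: bool = False  # hypothetical: whether results were already accepted


def handle_task_put(assignments: Dict[str, Assignment], task_id: str, worker: str) -> int:
    """Return the HTTP status a tracker might send for a worker's PUT of task results.

    409 Conflict if the task is unknown, assigned to someone else, or already
    submitted; 200 otherwise (and the task is marked as submitted).
    """
    a = assignments.get(task_id)
    if a is None or a.worker != worker or a.submitted:
        return 409
    a.submitted = True
    return 200
```

Under these assumptions, a worker whose task was reassigned (for example, after taking too long) would see exactly the kind of refusal pft logged, even though its own scraping went fine.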
[20:17] <edsu> i could add some more logging if you keep seeing errors like that
[20:17] <edsu> i can see you've submitted 5 tasks so far ok
[20:18] <edsu> and an anonymous user is doing some as well
[20:19] <pft> yeah i see some things going by
[20:20] <edsu> thanks for kicking the tires, it's funny when you run software and want to see it fail :)
[20:21] <pft> that's most of my life over here ;)
[20:31] <pft> and it appears to be out of tasks now
[20:33] <edsu> yes
[20:33] <edsu> any idea how to populate more?
[20:33] <pft> I have no idea
[20:33] * edsu eyes task_create.py
[20:33] <pft> ahahah
[20:34] <pft> maybe if you punch some random buttons on that!
[20:34] <edsu> hehe
[20:35] <pft> i keep meaning to look into more archive team code stuff but with everything going on for me right now i simply haven't got the time
[20:37] <edsu> i know the feeling
[20:53] <edsu> hmm, task_create.py seems to be part of the story
[20:53] <edsu> but not all of it
[20:54] <pft> :|
[20:55] <edsu> looks like you picked up some more work though?
[20:55] <pft> it seems to have
[20:56] <edsu> hmm well, ok then :-D
[20:57] <pft> now it says not asks
[20:57] <pft> er no tasks
[20:57] <pft> now it got a task!
[20:57] <pft> unpredictable
[20:57] <edsu> hmm i guess i need to look at how that works
[20:57] <edsu> i think it's somewhat time dependent
[20:57] <edsu> "Any sufficiently advanced technology is indistinguishable from magic."
[20:58] <pft> yeah
[20:58] <pft> probably more logging and more status outputs would be helpful
[20:58] <pft> there's my random software development statement that's applicable to everything for the day
[21:01] <edsu> it's so true, though
[21:02] <edsu> print considered helpful
[21:03] <edsu> pft: you're catching up http://ec2-54-204-142-70.compute-1.amazonaws.com:8080/
[21:05] <pft> i am ON FIRE!
[21:05] <edsu> dude, yes
[21:11] <Benjojo> Hey, How do I get involved in this? Whats the software requirements?
[21:12] <pft> how involved do you want to be? you can always download and run the archiveteam warrior VM
[21:12] <pft> http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior
[21:12] <Benjojo> Well, I'm interested in the URLTeam part, I have the server infrastructure to help
[21:13] <pft> well in terms of fetching-and-submitting the warrior is the easiest way
[21:13] <Benjojo> Okay, Sorry I have no idea how any of this works :)
[21:13] <pft> if you want to actually get into running server infrastructure I'm not sure, GLaDOS runs some of that, I believe
[21:14] <pft> in terms of downloading data from the url shorteners and submitting it to the archive you just need to get the warrior vms running
[21:14] <Benjojo> Hm, would I be able to load that into ESXi?
[21:15] <pft> the VM is distributed as an .ova file, i believe you can pull that into vmware
[21:16] <pft> hmm, perhaps not, though http://badcheese.com/~steve/atlogs/?chan=warrior&day=2012-10-13
[21:18] <Benjojo> Hmm
[21:19] <pft> i futzed with trying to get the warrior to work in vmware fusion at one point and gave up and just installed virtualbox
[21:19] <Benjojo> Yeah, all of my systems are ESXi for simplicity, so it would be useful if I could just load a few templates onto a few boxes
[21:19] <Benjojo> But that hope seems to look a little distant
[21:20] <pft> you could try pulling the .ova file into VMWare and see if it works
[21:21] <Benjojo> Na
[21:21] <Benjojo> I tried
[21:21] <Benjojo> and it said it was not supported
[21:21] <pft> ahh :(
[21:21] <Benjojo> http://i.imgur.com/aDUd3qo.png
[21:22] <pft> http://tad-do.net/2012/01/30/converting-virtualbox-to-vmware-esxi/
[21:22] <pft> hmm
[21:22] <pft> i haven't used esx in years so i have no idea
[21:24] <Benjojo> Hm, I think I will look into this later
[21:24] <pft> roger that
[21:24] <pft> it would probably help out the project a lot if you could figure out how to make the vm load in esx and update the wiki
[21:28] <edsu> i saw you can load a vmware image into ec2 ; but haven't tried w/ the warrior
[21:29] <edsu> Benjojo: when the tracker is working, you can also run a python program to get urlteam tasks and submit them back
[21:29] <Benjojo> Oh?
[21:30] <edsu> Benjojo: i say this as someone who has been lurking in here for 1 day ; so please apply buolder of salt
[21:30] <pft> yeah. that's a little more involved because you have to know what's going on with the tracker, but it will run on a base-level debian install
[21:30] <edsu> boulder
[21:30] <Benjojo> Ah, Alrighty
[21:31] <pft> that's essentially what the warrior VM is: a debian install with some packages installed and the seesaw script
[21:31] <edsu> Benjojo: you see the section on TinyBack here? http://archiveteam.org/?title=URLTeam
[21:31] <pft> the nice thing about the warrior is that it phones home so if new projects are added or trackers change, it can get that information and update accordingly
[21:31] <edsu> git clone ; ./run.py
[21:32] <edsu> alas the tracker is currently doing this http://urlteam.terrywri.st/ :-|
[21:32] <edsu> pft: i'm assuming, perhaps wrongly, that the warrior talks to that tracker too by default?
[21:33] <pft> i believe it does
[21:33] <pft> right now people running the warrior are probably hanging out with a screen that says "No item received. Retrying after 30 seconds..."
[21:35] <pft> i think the blip.tv project is on hold and formspring seems to have 0 pending, so the warrior is probably idle for most people
[21:35] <pft> http://tracker.archiveteam.org/
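The idle warrior pft describes behaves like a polling loop: ask the tracker for an item, and if none comes back, wait and ask again. A minimal sketch of that behaviour follows. `get_item` is a hypothetical stand-in for whatever request the warrior's seesaw script actually makes; only the message text and the 30-second delay come from the log.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_item(
    get_item: Callable[[], Optional[T]],
    delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Poll get_item() until it returns something, sleeping `delay` seconds between tries.

    Returns None only if max_attempts is set and every attempt came back empty.
    """
    attempt = 0
    while True:
        item = get_item()
        if item is not None:
            return item
        attempt += 1
        if max_attempts is not None and attempt >= max_attempts:
            return None
        print(f"No item received. Retrying after {delay:g} seconds...")
        sleep(delay)
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting; a real client might also back off progressively instead of retrying on a fixed delay.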
[21:42] <edsu> pft: is there a way to see stats on what's happening at http://tracker.archiveteam.org/ ?
[21:42] <edsu> or do you have to run the warrior to see the leaderboard, etc?
[21:42] <edsu> or perhaps i'm just not clicking on the right link :)
[21:43] <edsu> oh i see they each have their own leaderboard, duh
[21:43] <pft> yeah
[21:43] <pft> you can see stats for particular projects but urlteam works differently
[21:43] <edsu> e.g. http://tracker.archiveteam.org/bloopertv/
[23:23] <pft> edsu: do you want me to leave this going?