#urlteam 2013-11-08,Fri


Time Nickname Message
14:32 🔗 edsu GLaDOS: sure, yeah
14:34 🔗 edsu ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5/yScdTEOtHvkwh92s4Ry4I+gUfk3UC/+6M4LuM/kdAF69QONR4JLyR9baesCOrj64ajvlCYFwWJaP1/tMLup2ECCTvtEpazh0Jp0/iFLLb+kJVWqKxpbf6qqWihW3mErQqUxgdkJ05GhPC8DjoBY9EI01f2JuOWLdJP0Iw9mnt/T8hEmzh5VTeL9m3/+UJ+KXQRGlH811IkOTHVILD+DaoVBPH+W1O8LMjfH6O3hJFWksHmACSshhj2xyvY3Xrpc/bp32dKChVv2NDITh/iAEf3mlZ7drsu/Dw3ikLy9/kDhI5Z29XhMv2XeQTi3HSU+eBUM3AJlORvyI8v7/moN ed@prajna.home
14:37 🔗 GLaDOS Okay, try connecting to urlteam@urlteam.terrywri.st
14:45 🔗 GLaDOS It's running in a tmux session
14:48 🔗 edsu ok, i'm there
14:48 🔗 edsu can definitely see the warrior traffic with varnishncsa
14:48 🔗 edsu :-)
14:49 🔗 edsu tmux uses ctrl-b instead of ctrl-a like screen?
14:50 🔗 edsu is it ok if i enable varnish logging?
14:53 🔗 edsu well i did it, you should see /var/log/varnish/varnishncsa.log now
14:55 🔗 GLaDOS Yeah, it uses ^B
14:55 🔗 GLaDOS Also yeah, go ahead
14:56 🔗 PepsiMax :O
14:56 🔗 PepsiMax warrior for urlvoid
14:56 🔗 PepsiMax uh, urlteam?
14:57 🔗 PepsiMax I like
14:57 🔗 edsu so can i start up the tracker again, so we can see what errors we might get?
14:57 🔗 edsu i still haven't seen any on my dev instances after thousands of tasks from 5-6 clients
14:58 🔗 edsu jeezum there are 39732 files in /home/urlteam/tinyarchive/tracker/files
14:59 🔗 edsu that's where the task results are stored temporarily before they are processed by other bits of the code i haven't looked at yet
15:03 🔗 edsu looks like one of them is corrupted, won't uncompress tracker/files/tmp025hl6
15:04 🔗 * edsu is counting how many url mappings are in tracker/files/*
15:04 🔗 edsu zcat tracker/files/* | wc -l
15:04 🔗 GLaDOS You can, it should be in the history for the top tmux window
15:04 🔗 edsu 609,394
15:04 🔗 edsu i wonder if that corrupted gzip file was messing things up
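A quick way to find which of the ~40k gzipped result files fails to decompress is to stream-read each one and catch decoding errors. A minimal sketch, not tracker code; `find_corrupt` and the demo files are illustrative:

```python
import gzip
import os
import tempfile
import zlib

def find_corrupt(directory):
    """Return names of files in `directory` that fail gzip decompression."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            with gzip.open(path, "rb") as fh:
                while fh.read(1 << 20):  # stream through the whole file
                    pass
        except (OSError, EOFError, zlib.error):
            bad.append(name)
    return bad

# Demo: one intact file and one truncated (corrupt) copy.
d = tempfile.mkdtemp()
payload = gzip.compress(b"http://tinyurl.com/abc\thttp://example.com/\n" * 100)
with open(os.path.join(d, "good"), "wb") as fh:
    fh.write(payload)
with open(os.path.join(d, "bad"), "wb") as fh:
    fh.write(payload[: len(payload) // 2])  # cut off mid-stream

print(find_corrupt(d))
```

Run against tracker/files this would flag entries like tmp025hl6 without aborting on the first bad file the way a plain `zcat tracker/files/*` pipeline does.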
15:05 🔗 edsu ok, i'll restart
15:05 🔗 GLaDOS Also, possibly.
15:05 🔗 edsu i think i can remove the task for that file
15:05 🔗 edsu from the sqlite db
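Dropping the task row tied to the corrupt file is a one-statement SQLite operation. A minimal sketch; the table and column names here (`tasks`, `file`) are hypothetical stand-ins, since the real layout lives in the tracker's schema.sql:

```python
import sqlite3

# Hypothetical minimal schema standing in for the real tracker database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, file TEXT)")
con.executemany("INSERT INTO tasks VALUES (?, ?)",
                [("t1", "tmp025hl6"), ("t2", "other")])

# Remove the task whose result file failed to uncompress.
con.execute("DELETE FROM tasks WHERE file = ?", ("tmp025hl6",))
con.commit()

remaining = con.execute("SELECT count(*) FROM tasks").fetchone()[0]
print(remaining)
```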
15:05 🔗 PepsiMax ok, I have an tinyback instance running now :)
15:07 🔗 edsu ok, i removed the task associated with the corrupted file
15:07 🔗 GLaDOS edsu: notice the initial influx of people?
15:07 🔗 GLaDOS oh wait, you will
15:07 🔗 edsu i haven't started it up yet
15:08 🔗 edsu i'll do that now :-)
15:08 🔗 GLaDOS oh god, that's not the version of tmux i think it is
15:08 🔗 edsu although it might be nice to let it clean up what it has before opening up to the world?
15:08 🔗 GLaDOS Or maybe just try exporting it all?
15:08 🔗 edsu so what commands would you normally do when you remembered?
15:08 🔗 edsu or was something else hitting the admin urls?
15:09 🔗 GLaDOS It was just a run of cleanup.py every so often
15:09 🔗 edsu ok, how about we try that first before opening up to the world?
15:10 🔗 edsu i can bring it up on another port and we can issue cleanup?
15:11 🔗 GLaDOS You could just take varnish down..
15:12 🔗 edsu alright i'll do that
15:12 🔗 edsu actually i'll just use a different port
15:12 🔗 edsu right now varnish is giving a nice 503 error
15:13 🔗 edsu which is better for the warriors than some kind of connection refused error ; although maybe it doesn't matter
15:13 🔗 edsu i can already see the problem :)
15:13 🔗 edsu i brought it up on :7091
15:16 🔗 edsu oops, i mean :8000
15:17 🔗 edsu doing an /admin/cleanup now
15:18 🔗 edsu still have the files in tinyarchive/tracker/files
15:22 🔗 edsu i'm still confused about where the data goes after tasks have been stored
15:23 🔗 edsu is there something that comes along periodically and collects up the data via the web api?
15:24 🔗 edsu is David Triendl still around?
15:26 🔗 edsu soultcer: ping :)
15:27 🔗 edsu i suspect that there is something that comes along and collects stuff
15:31 🔗 GLaDOS fetch_finished is supposed to be able to do it
15:33 🔗 edsu oh
15:33 🔗 edsu you want to try running that?
15:34 🔗 edsu the tracker is running at http://localhost:8000
15:34 🔗 edsu still
15:35 🔗 edsu from the varnishncsa log it looks like there are 37 active warriors
15:36 🔗 edsu i can run fetch_finished i guess
15:39 🔗 edsu seems to be working
15:43 🔗 edsu waiting for it to stop :)
15:47 🔗 GLaDOS you didn't define an output directory, did you?
15:47 🔗 edsu no, so it is going in '.' :-)
15:47 🔗 edsu i can move them elsewhere when it's done if you want
15:47 🔗 GLaDOS Ah, true.
15:47 🔗 edsu where do you typically dump them?
15:47 🔗 GLaDOS Funny thing, never was able to dump them
15:48 🔗 edsu yeah, it uses the web app
15:48 🔗 GLaDOS ah
15:48 🔗 edsu the tracker api, so if the tracker was frozen, script would get jammed up too i guess
15:48 🔗 GLaDOS well, im going to sleep
15:48 🔗 GLaDOS have fun, i guess.
15:48 🔗 edsu what would you do with the files when they are done normally?
15:48 🔗 edsu ok, where are you btw?
15:49 🔗 GLaDOS Perth, 'straya
15:49 🔗 edsu nice
15:49 🔗 edsu i'm ehs@pobox.com btw
15:49 🔗 GLaDOS I'd just leave them in a directory. I have a copy of the URLTeam torrent locally
15:49 🔗 edsu if async communication works
15:49 🔗 GLaDOS ah yeah
15:49 🔗 GLaDOS i rarely use email
15:49 🔗 edsu so you normally create the files, and then create a torrent of it
15:49 🔗 edsu ?
15:50 🔗 GLaDOS Yeah, we have scripts in the same directory for importing them into the torrent db
15:50 🔗 edsu does the torrent get put up on internet archive i guess?
15:50 🔗 GLaDOS The torrent doesn't get put up AFAIK, but we give a copy to 301works
15:50 🔗 edsu gotcha
15:51 🔗 GLaDOS night
15:51 🔗 edsu which script does the torrent db thing?
15:51 🔗 edsu g'night
15:51 🔗 GLaDOS release_import
15:51 🔗 edsu thnx
15:51 🔗 GLaDOS and stuff
19:15 🔗 pft so is the real tracker back up?
20:07 🔗 Chat3769 Hello
20:07 🔗 Chat3769 Hello
20:18 🔗 edsu it's not back up yet
20:18 🔗 edsu getting close, the fetch_finished finally finished
20:19 🔗 edsu now to run release_import
20:20 🔗 edsu fetch_finished kept failing because there were 5 tasks that looked to be complete but lacked task files
20:20 🔗 edsu which caused it to abort
20:30 🔗 edsu might have to wait for GLaDOS to return
20:31 🔗 edsu i don't think release_import gets run after fetch_finished
20:31 🔗 edsu think there needs to be a call to create_release.py
20:32 🔗 edsu while the code is quite clean
20:32 🔗 edsu the process for creating a release does need better documentation, and perhaps automation
20:41 🔗 edsu pft: ok, i started it back up
20:41 🔗 edsu figure the release can wait, perhaps
20:41 🔗 edsu immediately i see a bunch of errors about the database being locked :)
20:41 🔗 edsu although some requests seem to be working ok
20:42 🔗 edsu might make sense to move from sqlite to mysql/postgres
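The "database is locked" errors are what SQLite's single-writer locking looks like under load: while one connection holds the write lock, other writers fail as soon as their busy timeout expires. A small self-contained demonstration of the failure mode, not tracker code:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tracker.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE tasks (id TEXT)")
writer.commit()
writer.execute("BEGIN EXCLUSIVE")  # hold the write lock, as a slow request would

# A second connection (another warrior's request) with no busy timeout:
other = sqlite3.connect(path, timeout=0)
try:
    other.execute("INSERT INTO tasks VALUES ('t1')")
    locked = False
except sqlite3.OperationalError as exc:
    locked = "locked" in str(exc)

print(locked)
writer.rollback()
```

With dozens of warriors hitting the tracker concurrently, this is why a client/server database with row-level locking (MySQL/Postgres) handles the load better.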
20:42 🔗 pft ugh it's using sqlite?
20:42 🔗 edsu yeah
20:43 🔗 pft bluh
20:43 🔗 pft based on the number of clients that are probably running it is most likely spending its life blocking, then
20:43 🔗 edsu right, yeah
20:45 🔗 pft it's not actually submitting the sqlite files anywhere, is it?
20:45 🔗 pft it didn't seem like a complicated schema when i glanced at the source earlier this week
20:45 🔗 edsu eventually they are dumped to a file that is torrented
20:45 🔗 edsu the schema is very lightweight, the submitted url mappings are stored in gzipped text files
20:46 🔗 pft so it purges the sqlite file once they've dumped to a file?
20:46 🔗 edsu it just flags them as 'deleted'
20:46 🔗 edsu it doesn't actually delete them from the db
20:46 🔗 pft ahhh ok
20:47 🔗 pft so the database size does grow to infinity ;)
20:47 🔗 edsu lots of 500 errors now
20:47 🔗 edsu i guess, but there isn't a row for every mapping
20:47 🔗 edsu so maybe it's not too bad
20:47 🔗 pft hmm ok
20:47 🔗 pft i'm happy to modify it to use mysql or postgresql but i guess we need to know what GLaDOS would be happier running
21:02 🔗 edsu looks like it might be a one line change
21:02 🔗 edsu thanks to the abstraction web.py gives you
21:04 🔗 pft nice!
21:05 🔗 edsu do archiveteam folks tend to prefer mysql to postgres or the other way around?
21:05 🔗 edsu i guess it is a question for GlaDOS
21:05 🔗 pft that i don't know
21:05 🔗 pft yeah
21:05 🔗 edsu when he is around
21:06 🔗 edsu i prefer postgres, but actually use mysql more (due to constraints at work)
21:06 🔗 * edsu shrugs
21:06 🔗 pft yeah, i prefer postgres
21:08 🔗 pft though i mostly use mysql
21:10 🔗 edsu here's the line btw
21:10 🔗 edsu https://github.com/ArchiveTeam/tinyarchive/blob/master/tracker/tracker.py#L36
21:10 🔗 pft well
21:11 🔗 pft once you port the create scripts and get the data types correct, that's the line that needs to change ;)
21:28 🔗 edsu yeah, i think the create script is just sql
21:28 🔗 edsu didn't seem to be particularly sqlite specific
21:29 🔗 edsu https://github.com/ArchiveTeam/tinyarchive/blob/master/tracker/schema.sql
21:29 🔗 edsu right?
21:29 🔗 edsu might be some stuff to strip out
21:30 🔗 edsu ON CONFLICT IGNORE sticks out
21:31 🔗 edsu but other than that, seems pretty standardish?
21:44 🔗 pft i'm not sure about "TEXT" as a field type
21:44 🔗 pft i guess that is ok
21:45 🔗 pft When an applicable constraint violation occurs, the IGNORE resolution algorithm skips the one row that contains the constraint violation and continues processing subsequent rows of the SQL statement as if nothing went wrong.
21:45 🔗 pft hahaha NOTHING TO SEE HERE! MOVE ALONG!
21:46 🔗 pft yeah this might actually be as easy as you said initially
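`ON CONFLICT IGNORE` is indeed SQLite-only: it silently skips rows that would violate a constraint instead of raising an error. A minimal demonstration of that behavior (for a Postgres port, today's equivalent would be `INSERT ... ON CONFLICT DO NOTHING`, or a pre-insert existence check on versions that lack it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Column-level conflict clause, in the style of the tracker's schema.sql.
con.execute(
    "CREATE TABLE urls (code TEXT PRIMARY KEY ON CONFLICT IGNORE, url TEXT)"
)

con.execute("INSERT INTO urls VALUES ('abc', 'http://example.com/first')")
# Duplicate primary key: the row is skipped rather than raising IntegrityError.
con.execute("INSERT INTO urls VALUES ('abc', 'http://example.com/second')")

row = con.execute("SELECT url FROM urls WHERE code = 'abc'").fetchone()
print(row[0])
```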
21:49 🔗 gex Hey, http://urlte.am/ says you guys need a server for storing / seeding the release torrent. I've got a dedicated server with 1Gbps that isn't doing much right now, so I could definitely seed that torrent pretty well. Is there anything else you guys could use a dedicated server for?
