#urlteam 2013-11-08,Fri


Time	Nickname	Message
14:32 ^🔗	edsu	GLaDOS: sure, yeah
14:34 ^🔗	edsu	ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5/yScdTEOtHvkwh92s4Ry4I+gUfk3UC/+6M4LuM/kdAF69QONR4JLyR9baesCOrj64ajvlCYFwWJaP1/tMLup2ECCTvtEpazh0Jp0/iFLLb+kJVWqKxpbf6qqWihW3mErQqUxgdkJ05GhPC8DjoBY9EI01f2JuOWLdJP0Iw9mnt/T8hEmzh5VTeL9m3/+UJ+KXQRGlH811IkOTHVILD+DaoVBPH+W1O8LMjfH6O3hJFWksHmACSshhj2xyvY3Xrpc/bp32dKChVv2NDITh/iAEf3mlZ7drsu/Dw3ikLy9/kDhI5Z29XhMv2XeQTi3HSU+eBUM3AJlORvyI8v7/moN ed@prajna.home
14:37 ^🔗	GLaDOS	Okay, try connecting to urlteam@urlteam.terrywri.st
14:45 ^🔗	GLaDOS	It's running in a tmux session
14:48 ^🔗	edsu	ok, i'm there
14:48 ^🔗	edsu	can definitely see the warrior traffic with varnishncsa
14:48 ^🔗	edsu	:-)
14:49 ^🔗	edsu	tmux uses ctrl-b instead of ctrl-a like screen?
14:50 ^🔗	edsu	is it ok if i enable varnish logging?
14:53 ^🔗	edsu	well i did it, you should see /var/log/varnish/varnishncsa.log now
14:55 ^🔗	GLaDOS	Yeah, it uses ^B
14:55 ^🔗	GLaDOS	Also yeah, go ahead
14:56 ^🔗	PepsiMax	:O
14:56 ^🔗	PepsiMax	warrior for urlvoid
14:56 ^🔗	PepsiMax	uh, urlteam?
14:57 ^🔗	PepsiMax	I like
14:57 ^🔗	edsu	so can i start up the tracker again, so we can see what errors we might get?
14:57 ^🔗	edsu	i still haven't seen any on my dev instances after thousands of tasks from 5-6 clients
14:58 ^🔗	edsu	jeezum there are 39732 files in /home/urlteam/tinyarchive/tracker/files
14:59 ^🔗	edsu	that's where the task results are stored temporarily before they are processed by other bits of the code i haven't looked at yet
15:03 ^🔗	edsu	looks like one of them is corrupted, won't uncompress tracker/files/tmp025hl6
15:04 ^🔗	*	edsu is counting how many url mappings are in tracker/files/*
15:04 ^🔗	edsu	zcat tracker/files/* | wc -l
15:04 ^🔗	GLaDOS	You can, it should be in the history for the top tmux window
15:04 ^🔗	edsu	609,394
15:04 ^🔗	edsu	i wonder if that corrupted gzip file was messing things up
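The count above came from `zcat ... | wc -l`, which gives no clear report of which file is broken. A minimal Python sketch of the same tally that also lists unreadable files might look like the following. The function name `count_mappings` is hypothetical and not part of tinyarchive, and it assumes one mapping per line in each gzipped file. Unlike `zcat`, it drops the partial line count of a file that fails midway.

```python
import gzip
import zlib
from pathlib import Path


def count_mappings(directory):
    """Count lines across gzip files; return (total_lines, unreadable_paths)."""
    total = 0
    unreadable = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        lines = 0
        try:
            with gzip.open(path, "rb") as handle:
                for _ in handle:
                    lines += 1
        except (OSError, EOFError, zlib.error):
            # Bad gzip header, truncated stream, or corrupt deflate data.
            # The partial count for this file is discarded.
            unreadable.append(path)
            continue
        total += lines
    return total, unreadable
```

Calling `count_mappings("tracker/files")` would return the total alongside the paths that could not be decompressed, so a file like the corrupted one mentioned above shows up by name.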
15:05 ^🔗	edsu	ok, i'll restart
15:05 ^🔗	GLaDOS	Also, possibly.
15:05 ^🔗	edsu	i think i can remove the task for that file
15:05 ^🔗	edsu	from the sqlite db
15:05 ^🔗	PepsiMax	ok, I have a tinyback instance running now :)
15:07 ^🔗	edsu	ok, i removed the task associated with the corrupted file
15:07 ^🔗	GLaDOS	edsu: notice the initial influx of people?
15:07 ^🔗	GLaDOS	oh wait, you will
15:07 ^🔗	edsu	i haven't started it up yet
15:08 ^🔗	edsu	i'll do that now :-)
15:08 ^🔗	GLaDOS	oh god, that's not the version of tmux i think it is
15:08 ^🔗	edsu	although it might be nice to let it clean up what it has before opening up to the world?
15:08 ^🔗	GLaDOS	Or maybe just try exporting it all?
15:08 ^🔗	edsu	so what commands would you normally do when you remembered?
15:08 ^🔗	edsu	or was something else hitting the admin urls?
15:09 ^🔗	GLaDOS	It was just a run of cleanup.py every so often
15:09 ^🔗	edsu	ok, how about we try that first before opening up to the world?
15:10 ^🔗	edsu	i can bring it up on another port and we can issue cleanup?
15:11 ^🔗	GLaDOS	You could just take varnish down..
15:12 ^🔗	edsu	alright i'll do that
15:12 ^🔗	edsu	actually i'll just use a different port
15:12 ^🔗	edsu	right now varnish is giving a nice 503 error
15:13 ^🔗	edsu	which is better for the warriors than some kind of connection refused error ; although maybe it doesn't matter
15:13 ^🔗	edsu	i can already see the problem :)
15:13 ^🔗	edsu	i brought it up on :7091
15:16 ^🔗	edsu	oops, i mean :8000
15:17 ^🔗	edsu	doing an /admin/cleanup now
15:18 ^🔗	edsu	still have the files in tinyarchive/tracker/files
15:22 ^🔗	edsu	i'm still confused about where the data goes after tasks have been stored
15:23 ^🔗	edsu	is there something that comes along periodically and collects up the data via the web api?
15:24 ^🔗	edsu	is David Triendl still around?
15:26 ^🔗	edsu	soultcer: ping :)
15:27 ^🔗	edsu	i suspect that there is something that comes along and collects stuff
15:31 ^🔗	GLaDOS	fetch_finished is supposed to be able to do it
15:33 ^🔗	edsu	oh
15:33 ^🔗	edsu	you want to try running that?
15:34 ^🔗	edsu	the tracker is running at http://localhost:8000
15:34 ^🔗	edsu	still
15:35 ^🔗	edsu	from the varnishncsa log it looks like there are 37 active warriors
15:36 ^🔗	edsu	i can run fetch_finished i guess
15:39 ^🔗	edsu	seems to be working
15:43 ^🔗	edsu	waiting for it to stop :)
15:47 ^🔗	GLaDOS	you didn't define an output directory, did you?
15:47 ^🔗	edsu	no, so it is going in '.' :-)
15:47 ^🔗	edsu	i can move them elsewhere when it's done if you want
15:47 ^🔗	GLaDOS	Ah, true.
15:47 ^🔗	edsu	where do you typically dump them?
15:47 ^🔗	GLaDOS	Funny thing, never was able to dump them
15:48 ^🔗	edsu	yeah, it uses the web app
15:48 ^🔗	GLaDOS	ah
15:48 ^🔗	edsu	the tracker api, so if the tracker was frozen, script would get jammed up too i guess
15:48 ^🔗	GLaDOS	well, im going to sleep
15:48 ^🔗	GLaDOS	have fun, i guess.
15:48 ^🔗	edsu	what would you do with the files when they are done normally?
15:48 ^🔗	edsu	ok, where are you btw?
15:49 ^🔗	GLaDOS	Perth, 'straya
15:49 ^🔗	edsu	nice
15:49 ^🔗	edsu	i'm ehs@pobox.com btw
15:49 ^🔗	GLaDOS	I'd just leave them in a directory. I have a copy of the URLTeam torrent locally
15:49 ^🔗	edsu	if async communication works
15:49 ^🔗	GLaDOS	ah yeah
15:49 ^🔗	GLaDOS	i rarely use email
15:49 ^🔗	edsu	so you normally create the files, and then create a torrent of it
15:49 ^🔗	edsu	?
15:50 ^🔗	GLaDOS	Yeah, we have scripts in the same directory for importing them into the torrent db
15:50 ^🔗	edsu	does the torrent get put up on internet archive i guess?
15:50 ^🔗	GLaDOS	The torrent doesn't get put up AFAIK, but we give a copy to 301works
15:50 ^🔗	edsu	gotcha
15:51 ^🔗	GLaDOS	night
15:51 ^🔗	edsu	which script does the torrent db thing?
15:51 ^🔗	edsu	g'night
15:51 ^🔗	GLaDOS	release_import
15:51 ^🔗	edsu	thnx
15:51 ^🔗	GLaDOS	and stuff
19:15 ^🔗	pft	so is the real tracker back up?
20:07 ^🔗	Chat3769	Hello
20:07 ^🔗	Chat3769	Hello
20:18 ^🔗	edsu	it's not back up yet
20:18 ^🔗	edsu	getting close, the fetch_finished finally finished
20:19 ^🔗	edsu	now to run release_import
20:20 ^🔗	edsu	fetch_finished kept failing because there were 5 tasks that looked to be complete but lacked task files
20:20 ^🔗	edsu	which caused it to abort
20:30 ^🔗	edsu	might have to wait for GLaDOS to return
20:31 ^🔗	edsu	i don't think release_import gets run after fetch_finished
20:31 ^🔗	edsu	think there needs to be a call to create_release.py
20:32 ^🔗	edsu	while the code is quite clean
20:32 ^🔗	edsu	the process for creating a release does need better documentation, and perhaps automation
20:41 ^🔗	edsu	pft: ok, i started it back up
20:41 ^🔗	edsu	figure the release can wait, perhaps
20:41 ^🔗	edsu	immediately i see a bunch of errors about the database being locked :)
20:41 ^🔗	edsu	although some requests seem to be working ok
20:42 ^🔗	edsu	might make sense to move from sqlite to mysql/postgres
20:42 ^🔗	pft	ugh it's using sqlite?
20:42 ^🔗	edsu	yeah
20:43 ^🔗	pft	bluh
20:43 ^🔗	pft	based on the number of clients that are probably running it is most likely spending its life blocking, then
20:43 ^🔗	edsu	right, yeah
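The "database is locked" errors are what SQLite reports when one connection holds the write lock and another tries to write without waiting long enough. A small sketch using Python's standard `sqlite3` module reproduces the effect. The `task` table and the function name `demonstrate_lock` are made up for illustration and are not the tracker's schema or code.

```python
import sqlite3


def demonstrate_lock(db_path):
    """Return the error a second writer sees while the first holds the write lock."""
    # isolation_level=None means autocommit, so BEGIN/COMMIT are issued explicitly.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0 makes the second connection give up immediately instead of waiting.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute(
            "CREATE TABLE IF NOT EXISTS task (id INTEGER PRIMARY KEY, status TEXT)"
        )
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO task (status) VALUES ('assigned')")
        try:
            second.execute("INSERT INTO task (status) VALUES ('assigned')")
        except sqlite3.OperationalError as error:
            return str(error)
        finally:
            first.execute("COMMIT")  # release the lock either way
        return None
    finally:
        first.close()
        second.close()
```

SQLite allows only one writer per database file at a time, so with dozens of warriors submitting results the writers queue up behind each other, which matches the blocking described above.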
20:45 ^🔗	pft	it's not actually submitting the sqlite files anywhere, is it?
20:45 ^🔗	pft	it didn't seem like a complicated schema when i glanced at the source earlier this week
20:45 ^🔗	edsu	eventually they are dumped to a file that is torrented
20:45 ^🔗	edsu	the schema is very lightweight, the submitted url mappings are stored in gzipped text files
20:46 ^🔗	pft	so it purges the sqlite file once they've dumped to a file?
20:46 ^🔗	edsu	it just flags them as 'deleted'
20:46 ^🔗	edsu	it doesn't actually delete them from the db
20:46 ^🔗	pft	ahhh ok
20:47 ^🔗	pft	so the database size does grow to infinity ;)
20:47 ^🔗	edsu	lots of 500 errors now
20:47 ^🔗	edsu	i guess, but there isn't a row for every mapping
20:47 ^🔗	edsu	so maybe it's not too bad
20:47 ^🔗	pft	hmm ok
20:47 ^🔗	pft	i'm happy to modify it to use mysql or postgresql but i guess we need to know what GLaDOS would be happier running
21:02 ^🔗	edsu	looks like it might be a one line change
21:02 ^🔗	edsu	thanks to the abstraction web.py gives you
21:04 ^🔗	pft	nice!
21:05 ^🔗	edsu	do archiveteam folks tend to prefer mysql to postgres or the other way around?
21:05 ^🔗	edsu	i guess it is a question for GLaDOS
21:05 ^🔗	pft	that i don't know
21:05 ^🔗	pft	yeah
21:05 ^🔗	edsu	when he is around
21:06 ^🔗	edsu	i prefer postgres, but actually use mysql more (due to constraints at work)
21:06 ^🔗	*	edsu shrugs
21:06 ^🔗	pft	yeah, i prefer postgres
21:08 ^🔗	pft	though i mostly use mysql
21:10 ^🔗	edsu	here's the line btw
21:10 ^🔗	edsu	https://github.com/ArchiveTeam/tinyarchive/blob/master/tracker/tracker.py#L36
21:10 ^🔗	pft	well
21:11 ^🔗	pft	once you make create scripts and get the data types correct then that's the line that needs to change ;)
21:28 ^🔗	edsu	yeah, i think the create script is just sql
21:28 ^🔗	edsu	didn't seem to be particularly sqlite specific
21:29 ^🔗	edsu	https://github.com/ArchiveTeam/tinyarchive/blob/master/tracker/schema.sql
21:29 ^🔗	edsu	right?
21:29 ^🔗	edsu	might be some stuff to strip out
21:30 ^🔗	edsu	ON CONFLICT IGNORE sticks out
21:31 ^🔗	edsu	but other than that, seems pretty standardish?
21:44 ^🔗	pft	i'm not sure about "TEXT" as a field type
21:44 ^🔗	pft	i guess that is ok
21:45 ^🔗	pft	When an applicable constraint violation occurs, the IGNORE resolution algorithm skips the one row that contains the constraint violation and continues processing subsequent rows of the SQL statement as if nothing went wrong.
21:45 ^🔗	pft	hahaha NOTHING TO SEE HERE! MOVE ALONG!
21:46 ^🔗	pft	yeah this might actually be as easy as you said initially
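The IGNORE behaviour quoted above can be seen with a short sketch against an in-memory SQLite database. The `mapping` table and the function name `insert_with_ignore` are hypothetical and not taken from `schema.sql`. When porting, note that this clause in a table definition is SQLite syntax: PostgreSQL (9.5 and later) and MySQL express the same idea per statement, with `INSERT ... ON CONFLICT DO NOTHING` and `INSERT IGNORE` respectively, so the inserts would probably need touching as well as the schema.

```python
import sqlite3


def insert_with_ignore(codes):
    """Insert codes into a table whose UNIQUE constraint uses ON CONFLICT IGNORE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mapping ("
            " code TEXT NOT NULL,"
            " UNIQUE (code) ON CONFLICT IGNORE)"
        )
        # A duplicate code violates the UNIQUE constraint; SQLite skips that
        # row silently and carries on with the remaining rows.
        conn.executemany(
            "INSERT INTO mapping (code) VALUES (?)", [(code,) for code in codes]
        )
        conn.commit()
        rows = conn.execute("SELECT code FROM mapping ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Passing a list with a repeated code returns each code once, in first-seen order, and no exception is raised for the repeat.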
21:49 ^🔗	gex	Hey, http://urlte.am/ says you guys need a server for storing / seeding the release torrent. I've got a dedicated server with 1Gbps that isn't doing much right now, so I could definitely seed that torrent pretty well. Is there anything else you guys could use a dedicated server for?
