#archiveteam-ot 2018-11-05,Mon

Time Nickname Message
00:45 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
00:52 🔗 VerifiedJ has quit IRC (Leaving)
01:27 🔗 dashcloud has joined #archiveteam-ot
02:29 🔗 Stiletto has joined #archiveteam-ot
02:31 🔗 Stilett0 has quit IRC (Ping timeout: 265 seconds)
04:25 🔗 Stilett0 has joined #archiveteam-ot
04:30 🔗 Stiletto has quit IRC (Read error: Operation timed out)
04:30 🔗 Stiletto has joined #archiveteam-ot
04:33 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
05:09 🔗 BlueMaxim has joined #archiveteam-ot
05:12 🔗 BlueMax has quit IRC (Ping timeout: 260 seconds)
05:42 🔗 jodizzle has joined #archiveteam-ot
06:42 🔗 fenn has quit IRC (Remote host closed the connection)
06:51 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
06:51 🔗 Mateon1 has joined #archiveteam-ot
10:54 🔗 BlueMaxim has quit IRC (Read error: Connection reset by peer)
12:45 🔗 VerifiedJ has joined #archiveteam-ot
12:51 🔗 alex____ has quit IRC (Ping timeout: 360 seconds)
12:54 🔗 alex__ has joined #archiveteam-ot
13:28 🔗 Stilett0 has joined #archiveteam-ot
13:28 🔗 Stiletto has quit IRC (Ping timeout: 265 seconds)
13:55 🔗 dashcloud has quit IRC (No Ping reply in 180 seconds.)
13:57 🔗 dashcloud has joined #archiveteam-ot
14:18 🔗 JAA Interesting: https://github.com/Microsoft/ProcDump-for-Linux
14:55 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
14:56 🔗 dashcloud has joined #archiveteam-ot
16:38 🔗 SimpBrain has quit IRC (Ping timeout: 252 seconds)
17:17 🔗 wp494 has quit IRC (Read error: Operation timed out)
17:17 🔗 wp494 has joined #archiveteam-ot
18:40 🔗 SimpBrain has joined #archiveteam-ot
19:21 🔗 schbirid how shit is https://dronebl.org/? my current IP was added more than a year ago, and I have now been waiting a week to have it removed
19:28 🔗 jut they have an IRC channel
19:31 🔗 schbirid i'll let you take a wild guess
19:31 🔗 schbirid * You are banned from this server- [Listed in DroneBL] Please review https://dronebl.org/lookup?network=AlphaChat&ip=[stripped] and https://www.alphachat.net/blocked.xhtml
19:31 🔗 schbirid * *** Your IP address [stripped] is listed in dnsbl.dronebl.org
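For context on the ban message: DNS-based blocklists such as dnsbl.dronebl.org are normally queried as described in RFC 5782. The client reverses the octets of the IPv4 address, appends the list's zone and asks for an A record; any answer (conventionally an address in 127.0.0.0/8) means the IP is listed, while NXDOMAIN means it is not. The sketch below is a minimal illustration assuming IPv4 only; `dnsbl_query_name` and `is_listed` are hypothetical helper names, 192.0.2.10 is a documentation address, and the lookup itself needs working DNS.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "dnsbl.dronebl.org") -> str:
    """Build the DNSBL query name for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.dronebl.org") -> bool:
    """Return True if the zone has an A record for the address (i.e. it is listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN or any other resolver failure is treated as "not listed" here.
        return False
    return True


print(dnsbl_query_name("192.0.2.10"))  # 10.2.0.192.dnsbl.dronebl.org
```

Calling `is_listed(...)` on the affected address would show whether a delisting has propagated; treating every resolver error as "not listed" is a simplification.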
19:46 🔗 odemg has joined #archiveteam-ot
19:50 🔗 t2t2 has quit IRC (Remote host closed the connection)
19:52 🔗 t2t2 has joined #archiveteam-ot
20:43 🔗 icedice has joined #archiveteam-ot
21:03 🔗 icedice has quit IRC (Quit: Leaving)
21:30 🔗 schbirid has quit IRC (Remote host closed the connection)
21:35 🔗 BlueMax has joined #archiveteam-ot
21:57 🔗 ivan JAA: did you do any performance testing on the new sqlalchemy query with the priority? I can't prove to myself that it'll be fast because I have no idea how sqlalchemy will arrange the PK
22:02 🔗 ivan afaik the query would be fast with an index on (status, priority) and maybe not with separate indexes
22:04 🔗 JAA ivan: I didn't, and that's something I've thought about more generally as well: a performance test to ensure we don't introduce regressions of that kind.
22:04 🔗 ivan I mean (status, priority, id), based on:
22:04 🔗 ivan .filter(QueuedURL.status.in_((Status.todo.value, Status.error.value))) \
22:04 🔗 ivan .order_by(QueuedURL.priority.desc(), QueuedURL.status.desc(), QueuedURL.id) \
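Translated to plain SQL against SQLite, the query above and the index ivan suggests look roughly like this. The table name `queued_urls`, the column types, the index name and the status strings `'todo'` and `'error'` are assumptions standing in for whatever SQLAlchemy actually generates; only the WHERE and ORDER BY clauses follow the snippet. EXPLAIN QUERY PLAN reports which index SQLite picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema loosely modelled on the QueuedURL columns used in the snippet.
conn.executescript("""
    CREATE TABLE queued_urls (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        priority INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX ix_status_priority_id ON queued_urls (status, priority, id);
""")

QUERY = """
    SELECT id FROM queued_urls
    WHERE status IN ('todo', 'error')
    ORDER BY priority DESC, status DESC, id
"""

# Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in plan:
    print(step)
```

If the plan also contains `USE TEMP B-TREE FOR ORDER BY`, SQLite is sorting the matching rows after the index lookup, because the ORDER BY starts with priority rather than status; that sort is the in-memory cost that grows with the number of todo/error rows.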
22:05 🔗 ivan regressing that would be bad
22:05 🔗 ivan I assume you know that if you have an index on (a, b, c) you can do fast lookups on any prefix (a), (a, b), or (a, b, c)
22:06 🔗 JAA Yeah, right.
22:06 🔗 JAA Would an ORDER BY b, a, c also be fast though?
22:09 🔗 ivan I have no idea.
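Both the prefix rule and JAA's ORDER BY b, a, c question can be checked empirically with EXPLAIN QUERY PLAN on a toy table. This is a sketch with a made-up table `t` and index `ix_abc`; the plan wording differs slightly between SQLite versions, so read the comparison from the printed output of whichever version is installed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, payload TEXT);
    CREATE INDEX ix_abc ON t (a, b, c);
""")


def plan(sql: str) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Prefix lookups: each of these can seek into ix_abc.
prefix_plans = {
    "a": plan("SELECT payload FROM t WHERE a = 1"),
    "a, b": plan("SELECT payload FROM t WHERE a = 1 AND b = 2"),
    "a, b, c": plan("SELECT payload FROM t WHERE a = 1 AND b = 2 AND c = 3"),
}
# Not a prefix: the index cannot be used for a plain seek on b alone.
non_prefix_plan = plan("SELECT payload FROM t WHERE b = 2")
# JAA's question: ORDER BY in index order vs. with the first two columns swapped.
order_abc = plan("SELECT a, b, c FROM t ORDER BY a, b, c")
order_bac = plan("SELECT a, b, c FROM t ORDER BY b, a, c")

for name, steps in prefix_plans.items():
    print(name, steps)
print("b only:", non_prefix_plan)
print("ORDER BY a, b, c:", order_abc)
print("ORDER BY b, a, c:", order_bac)
```

A `USE TEMP B-TREE FOR ORDER BY` step in the output means SQLite cannot reuse the index order for that ORDER BY and sorts the rows separately.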
22:09 🔗 JAA I also wonder how fast that ORDER BY status DESC is in general. Since there is no ENUM in SQLite, it would just store strings, so the ordering would be a dumb string comparison, I think.
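To see what ORDER BY status DESC actually does with string-valued statuses, here is a tiny example. It assumes the enum is stored as its string value and that the values are literally `'todo'` and `'error'` (hypothetical; the real `Status` values may differ). SQLite's default BINARY collation compares the encoded bytes, so the ordering is plain lexicographic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the enum is persisted as its string value, e.g. 'todo' / 'error'.
conn.execute("CREATE TABLE q (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO q (status) VALUES (?)",
    [("error",), ("todo",), ("error",), ("todo",)],
)

# Default BINARY collation: 't' sorts after 'e', so DESC puts 'todo' first.
rows = conn.execute("SELECT status FROM q ORDER BY status DESC, id").fetchall()
ordered = [status for (status,) in rows]
print(ordered)  # ['todo', 'todo', 'error', 'error']
```

One consequence worth noting: the relative order of statuses follows their spelling, so renaming an enum value could silently change which URLs come first.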
22:10 🔗 ivan things will depend on how much sqlite has to sort in memory
22:12 🔗 ivan maybe you can populate a giant table of URLs with priorities and see how fast these queries are in a repl
22:13 🔗 ivan and see which indexes sqlalchemy created on the table
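ivan's suggestion can be sketched as a small benchmark: fill a synthetic table, time the query, add the composite index, time it again, and list the indexes with `PRAGMA index_list`. Everything here is hypothetical (row count, status values including `'done'` and `'in_progress'`, priority range, index name), and an in-memory database is only a rough stand-in for a real crawl database; sitemaps from a large site would give more representative data.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queued_urls (id INTEGER PRIMARY KEY, status TEXT NOT NULL,"
    " priority INTEGER NOT NULL)"
)

# Synthetic data: N_ROWS rows with random (hypothetical) statuses and priorities.
N_ROWS = 100_000
rng = random.Random(0)  # fixed seed so runs are comparable
statuses = ["todo", "error", "done", "in_progress"]
conn.executemany(
    "INSERT INTO queued_urls (status, priority) VALUES (?, ?)",
    ((rng.choice(statuses), rng.randint(0, 5)) for _ in range(N_ROWS)),
)

QUERY = (
    "SELECT id FROM queued_urls WHERE status IN ('todo', 'error') "
    "ORDER BY priority DESC, status DESC, id"
)


def time_query(label: str, repeats: int = 5) -> float:
    """Run QUERY several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(QUERY).fetchone()  # the sort must finish before the first row
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best * 1000:.1f} ms")
    return best


time_query("no secondary index")
conn.execute("CREATE INDEX ix_status_priority_id ON queued_urls (status, priority, id)")
time_query("with (status, priority, id)")

# PRAGMA index_list rows are (seq, name, unique, origin, partial); keep the names.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('queued_urls')")]
print(index_names)
```

Timings vary by machine, so treat the printed numbers as a relative comparison only; pairing them with EXPLAIN QUERY PLAN output shows why one run is faster.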
22:13 🔗 JAA By the way, when I was working on a distributed version of wpull through a central pgsql DB, I quickly realised that SQLAlchemy is incredibly inefficient. It's still a nice piece of software since it makes code easily portable between different databases, but it's not exactly performant.
22:14 🔗 JAA Yeah, will try that out. Throwing some huge site into it with big sitemaps, then do some EXPLAIN queries.
22:16 🔗 Flashfire I can provide sitemaps
22:16 🔗 JAA One example of SQLAlchemy ORM working against us is the parent_url. SQLAlchemy won't do a JOIN. Instead, it will lazily load that entry through a separate SELECT on the PK when it is first accessed (I think).
22:17 🔗 JAA Which easily doubles the number of queries.
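The extra-query effect JAA describes is the classic N+1 pattern. The sketch below reproduces it with the standard-library `sqlite3` module rather than SQLAlchemy, counting statements via `Connection.set_trace_callback`: fetching rows and then looking up each parent separately costs one query per row, whereas a JOIN returns the parent URL in a single statement. The schema (`url_strings`, `queued_urls`, `parent_url_string_id`) is a hypothetical simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url_strings (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
    CREATE TABLE queued_urls (
        id INTEGER PRIMARY KEY,
        url_string_id INTEGER NOT NULL REFERENCES url_strings (id),
        parent_url_string_id INTEGER REFERENCES url_strings (id)
    );
""")
conn.executemany("INSERT INTO url_strings (id, url) VALUES (?, ?)",
                 [(i, f"https://example.com/{i}") for i in range(1, 11)])
# Nine queued URLs, all discovered from URL 1.
conn.executemany(
    "INSERT INTO queued_urls (url_string_id, parent_url_string_id) VALUES (?, ?)",
    [(i, 1) for i in range(2, 11)],
)

# Record every statement SQLite runs on this connection from here on.
statements: list[str] = []
conn.set_trace_callback(statements.append)


def count_selects() -> int:
    """Count traced SELECTs, ignoring any transaction-control statements."""
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


# Lazy style: one query for the rows, then one extra SELECT per parent access.
statements.clear()
rows = conn.execute("SELECT id, parent_url_string_id FROM queued_urls").fetchall()
lazy_parents = [
    conn.execute("SELECT url FROM url_strings WHERE id = ?", (parent_id,)).fetchone()[0]
    for _, parent_id in rows
]
lazy_count = count_selects()

# JOIN style: the parent URL comes back with each row in a single statement.
statements.clear()
joined_parents = [
    url for (url,) in conn.execute(
        "SELECT p.url FROM queued_urls AS q"
        " JOIN url_strings AS p ON p.id = q.parent_url_string_id"
    )
]
join_count = count_selects()

print(lazy_count, join_count)  # 10 1
```

In SQLAlchemy itself the loading strategy of a relationship can be changed, for example with `lazy="joined"` on the relationship or `joinedload()` in a query's options, which makes the ORM emit the JOIN instead; whether that pays off for this schema would need the same kind of measurement.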
22:18 🔗 JAA ivan: Oh also, I think there's a significant potential for improving DB performance. Currently, check_in selects the table entry by comparing against the URL string. That means there has to be a huge index on url_strings.url. We should replace that by a queued_urls.id value in URLRecord at some point.
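The difference JAA points out shows up in the query plans. In this hypothetical schema, checking in by URL has to go through an index on the full URL text, while checking in by `queued_urls.id` is a direct rowid seek that needs no extra index. Table, index and column names are assumptions for illustration only; this is not the actual check_in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url_strings (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
    CREATE UNIQUE INDEX ix_url_strings_url ON url_strings (url);
    CREATE TABLE queued_urls (
        id INTEGER PRIMARY KEY,
        url_string_id INTEGER NOT NULL REFERENCES url_strings (id),
        status TEXT NOT NULL
    );
    CREATE INDEX ix_queued_urls_url_string_id ON queued_urls (url_string_id);
""")


def plan(sql: str) -> list[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Lookup by URL text: resolving the URL needs the index on url_strings.url.
by_url = plan(
    "UPDATE queued_urls SET status = 'done' WHERE url_string_id ="
    " (SELECT id FROM url_strings WHERE url = 'https://example.com/')"
)
# Lookup by id: the caller already knows queued_urls.id, so this is a rowid seek.
by_id = plan("UPDATE queued_urls SET status = 'done' WHERE id = 42")

print("by url:", by_url)
print("by id: ", by_id)
```

Besides the cheaper seek, an id-based check_in could make the large text index unnecessary on that path, since a SQLite index stores a copy of every indexed URL string; the index may still be needed elsewhere, for example to de-duplicate URLs.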
23:16 🔗 VerifiedJ has quit IRC (Quit: Leaving)
23:32 🔗 bakJAA has quit IRC (Ping timeout: 260 seconds)
23:33 🔗 betamax has quit IRC (Remote host closed the connection)
23:34 🔗 betamax has joined #archiveteam-ot
23:35 🔗 bakJAA has joined #archiveteam-ot
23:35 🔗 svchfoo1 sets mode: +o bakJAA
23:36 🔗 JAA sets mode: +o bakJAA
23:38 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
23:38 🔗 BlueMax has joined #archiveteam-ot
23:42 🔗 Odd0002 has quit IRC (Ping timeout: 260 seconds)
23:45 🔗 Odd0002 has joined #archiveteam-ot
