[00:45] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[00:52] *** VerifiedJ has quit IRC (Leaving)
[01:27] *** dashcloud has joined #archiveteam-ot
[02:29] *** Stiletto has joined #archiveteam-ot
[02:31] *** Stilett0 has quit IRC (Ping timeout: 265 seconds)
[04:25] *** Stilett0 has joined #archiveteam-ot
[04:30] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:30] *** Stiletto has joined #archiveteam-ot
[04:33] *** Stilett0 has quit IRC (Read error: Operation timed out)
[05:09] *** BlueMaxim has joined #archiveteam-ot
[05:12] *** BlueMax has quit IRC (Ping timeout: 260 seconds)
[05:42] *** jodizzle has joined #archiveteam-ot
[06:42] *** fenn has quit IRC (Remote host closed the connection)
[06:51] *** Mateon1 has quit IRC (Ping timeout: 255 seconds)
[06:51] *** Mateon1 has joined #archiveteam-ot
[10:54] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[12:45] *** VerifiedJ has joined #archiveteam-ot
[12:51] *** alex____ has quit IRC (Ping timeout: 360 seconds)
[12:54] *** alex__ has joined #archiveteam-ot
[13:28] *** Stilett0 has joined #archiveteam-ot
[13:28] *** Stiletto has quit IRC (Ping timeout: 265 seconds)
[13:55] *** dashcloud has quit IRC (No Ping reply in 180 seconds.)
[13:57] *** dashcloud has joined #archiveteam-ot
[14:18] Interesting: https://github.com/Microsoft/ProcDump-for-Linux
[14:55] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[14:56] *** dashcloud has joined #archiveteam-ot
[16:38] *** SimpBrain has quit IRC (Ping timeout: 252 seconds)
[17:17] *** wp494 has quit IRC (Read error: Operation timed out)
[17:17] *** wp494 has joined #archiveteam-ot
[18:40] *** SimpBrain has joined #archiveteam-ot
[19:21] how shit is https://dronebl.org/ ? my current ip was added more than a year ago and now i have been waiting for a week to have it removed
[19:28] they have an irc
[19:31] i'll let you take a wild guess
[19:31] * You are banned from this server- [Listed in DroneBL] Please review https://dronebl.org/lookup?network=AlphaChat&ip=[stripped] and https://www.alphachat.net/blocked.xhtml
[19:31] * *** Your IP address [stripped] is listed in dnsbl.dronebl.org
[19:46] *** odemg has joined #archiveteam-ot
[19:50] *** t2t2 has quit IRC (Remote host closed the connection)
[19:52] *** t2t2 has joined #archiveteam-ot
[20:43] *** icedice has joined #archiveteam-ot
[21:03] *** icedice has quit IRC (Quit: Leaving)
[21:30] *** schbirid has quit IRC (Remote host closed the connection)
[21:35] *** BlueMax has joined #archiveteam-ot
[21:57] JAA: did you do any performance testing on the new sqlalchemy query with the priority? I can't prove to myself that it'll be fast because I have no idea how sqlalchemy will arrange the PK
[22:02] afaik the query would be fast with an index on (status, priority) and maybe not with separate indexes
[22:04] ivan: I didn't, and that's something I thought about in general as well, a test for performance to ensure we don't introduce regressions in that sense.
[22:04] (status, priority, id) I mean based on
[22:04] .filter(QueuedURL.status.in_((Status.todo.value, Status.error.value))) \
[22:04] .order_by(QueuedURL.priority.desc(), QueuedURL.status.desc(), QueuedURL.id) \
[22:05] regressing that would be bad
[22:05] I assume you know that if you have an index on (a, b, c) you can do fast lookups on any prefix (a), (a, b), or (a, b, c)
[22:06] Yeah, right.
[22:06] Would an ORDER BY b, a, c also be fast though?
[22:09] I have no idea.
[22:09] I also wonder how fast that ORDER BY status DESC is in general. Since there is no ENUM in SQLite, it would just store strings, so the ordering would be a dumb string comparison, I think.
[22:10] things will depend on how much sqlite has to sort in memory
[22:12] maybe you can populate a giant table of URLs with priorities and see how fast these queries are in a repl
[22:13] and see which indexes sqlalchemy created on the table
[22:13] By the way, when I was working on a distributed version of wpull through a central pgsql DB, I quickly realised that SQLAlchemy is incredibly inefficient. It's still a nice piece of software since it makes code easily portable between different databases, but it's not exactly performant.
[22:14] Yeah, will try that out. Throwing some huge site into it with big sitemaps, then do some EXPLAIN queries.
[22:16] I can provide sitemaps
[22:16] One example of SQLAlchemy ORM working against us is the parent_url. SQLAlchemy won't do a JOIN. Instead, it will lazily load that entry through a separate SELECT on the PK when it is first accessed (I think).
[22:17] Which easily doubles the number of queries.
[22:18] ivan: Oh also, I think there's a significant potential for improving DB performance. Currently, check_in selects the table entry by comparing against the URL string. That means there has to be a huge index on url_strings.url. We should replace that by a queued_urls.id value in URLRecord at some point.
[23:16] *** VerifiedJ has quit IRC (Quit: Leaving)
[23:32] *** bakJAA has quit IRC (Ping timeout: 260 seconds)
[23:33] *** betamax has quit IRC (Remote host closed the connection)
[23:34] *** betamax has joined #archiveteam-ot
[23:35] *** bakJAA has joined #archiveteam-ot
[23:35] *** svchfoo1 sets mode: +o bakJAA
[23:36] *** JAA sets mode: +o bakJAA
[23:38] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[23:38] *** BlueMax has joined #archiveteam-ot
[23:42] *** Odd0002 has quit IRC (Ping timeout: 260 seconds)
[23:45] *** Odd0002 has joined #archiveteam-ot
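
A rough sketch of the experiment suggested between 22:12 and 22:14 (populate a table, then look at EXPLAIN output), using only Python's built-in sqlite3 module. The table here is a hypothetical, simplified stand-in for wpull's queued_urls, not its real schema: the column set, the status strings, the index names, and the helper functions are all assumptions made for illustration. Statuses are stored as TEXT, following the remark at 22:09. The script makes no claim about which index wins; it only prints what SQLite's planner reports for each candidate, so the plans can be compared by eye (for instance, whether a temporary sort step shows up).

```python
import sqlite3

# Hypothetical, simplified stand-in for wpull's queued_urls table.
SCHEMA = """
CREATE TABLE queued_urls (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,            -- assumption: statuses stored as strings
    priority INTEGER NOT NULL DEFAULT 0
);
"""

# Plain-SQL approximation of the filter/order_by pasted at 22:04.
QUERY = """
SELECT id FROM queued_urls
WHERE status IN ('todo', 'error')
ORDER BY priority DESC, status DESC, id
LIMIT 1
"""

# Candidate indexes to compare; names are made up for this sketch.
CANDIDATES = {
    "no index": None,
    "(status, priority, id)":
        "CREATE INDEX ix_spi ON queued_urls (status, priority, id)",
    "(priority DESC, status DESC, id)":
        "CREATE INDEX ix_psi ON queued_urls (priority DESC, status DESC, id)",
}


def make_db(index_sql=None, rows=10000):
    """Build an in-memory table with synthetic rows and an optional index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    statuses = ("todo", "error", "done", "in_progress")
    conn.executemany(
        "INSERT INTO queued_urls (status, priority) VALUES (?, ?)",
        ((statuses[i % 4], i % 7) for i in range(rows)),
    )
    if index_sql:
        conn.execute(index_sql)
    conn.execute("ANALYZE")  # give the planner statistics to work with
    conn.commit()
    return conn


def query_plan(conn):
    """Return the human-readable steps of EXPLAIN QUERY PLAN for QUERY."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


def next_id(conn):
    """Run QUERY and return the id it selects, or None if nothing matches."""
    row = conn.execute(QUERY).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    for label, index_sql in CANDIDATES.items():
        conn = make_db(index_sql)
        print(label)
        for step in query_plan(conn):
            print("   ", step)
        conn.close()
```

The plan text differs between SQLite versions, so it is best read on the same build that wpull would actually run against. Timing the query against a table filled from real sitemaps would be the natural next step.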