Time |
Nickname |
Message |
00:45
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
00:52
🔗
|
|
VerifiedJ has quit IRC (Leaving) |
01:27
🔗
|
|
dashcloud has joined #archiveteam-ot |
02:29
🔗
|
|
Stiletto has joined #archiveteam-ot |
02:31
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 265 seconds) |
04:25
🔗
|
|
Stilett0 has joined #archiveteam-ot |
04:30
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
04:30
🔗
|
|
Stiletto has joined #archiveteam-ot |
04:33
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
05:09
🔗
|
|
BlueMaxim has joined #archiveteam-ot |
05:12
🔗
|
|
BlueMax has quit IRC (Ping timeout: 260 seconds) |
05:42
🔗
|
|
jodizzle has joined #archiveteam-ot |
06:42
🔗
|
|
fenn has quit IRC (Remote host closed the connection) |
06:51
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 255 seconds) |
06:51
🔗
|
|
Mateon1 has joined #archiveteam-ot |
10:54
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
12:45
🔗
|
|
VerifiedJ has joined #archiveteam-ot |
12:51
🔗
|
|
alex____ has quit IRC (Ping timeout: 360 seconds) |
12:54
🔗
|
|
alex__ has joined #archiveteam-ot |
13:28
🔗
|
|
Stilett0 has joined #archiveteam-ot |
13:28
🔗
|
|
Stiletto has quit IRC (Ping timeout: 265 seconds) |
13:55
🔗
|
|
dashcloud has quit IRC (No Ping reply in 180 seconds.) |
13:57
🔗
|
|
dashcloud has joined #archiveteam-ot |
14:18
🔗
|
JAA |
Interesting: https://github.com/Microsoft/ProcDump-for-Linux |
14:55
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
14:56
🔗
|
|
dashcloud has joined #archiveteam-ot |
16:38
🔗
|
|
SimpBrain has quit IRC (Ping timeout: 252 seconds) |
17:17
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
17:17
🔗
|
|
wp494 has joined #archiveteam-ot |
18:40
🔗
|
|
SimpBrain has joined #archiveteam-ot |
19:21
🔗
|
schbirid |
how shit is https://dronebl.org/ ? my current ip was added more than a year ago and now i have been waiting for a week to have it removed |
19:28
🔗
|
jut |
they have an irc |
19:31
🔗
|
schbirid |
i'll let you take a wild guess |
19:31
🔗
|
schbirid |
* You are banned from this server- [Listed in DroneBL] Please review https://dronebl.org/lookup?network=AlphaChat&ip=[stripped] and https://www.alphachat.net/blocked.xhtml |
19:31
🔗
|
schbirid |
* *** Your IP address [stripped] is listed in dnsbl.dronebl.org |
19:46
🔗
|
|
odemg has joined #archiveteam-ot |
19:50
🔗
|
|
t2t2 has quit IRC (Remote host closed the connection) |
19:52
🔗
|
|
t2t2 has joined #archiveteam-ot |
20:43
🔗
|
|
icedice has joined #archiveteam-ot |
21:03
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
21:30
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
21:35
🔗
|
|
BlueMax has joined #archiveteam-ot |
21:57
🔗
|
ivan |
JAA: did you do any performance testing on the new sqlalchemy query with the priority? I can't prove to myself that it'll be fast because I have no idea how sqlalchemy will arrange the PK |
22:02
🔗
|
ivan |
afaik the query would be fast with an index on (status, priority) and maybe not with separate indexes |
22:04
🔗
|
JAA |
ivan: I didn't, and that's something I thought about in general as well, a test for performance to ensure we don't introduce regressions in that sense. |
22:04
🔗
|
ivan |
(status, priority, id) I mean based on |
22:04
🔗
|
ivan |
.filter(QueuedURL.status.in_((Status.todo.value, Status.error.value))) \ |
22:04
🔗
|
ivan |
.order_by(QueuedURL.priority.desc(), QueuedURL.status.desc(), QueuedURL.id) \ |
22:05
🔗
|
ivan |
regressing that would be bad |
22:05
🔗
|
ivan |
I assume you know that if you have an index on (a, b, c) you can do fast lookups on any prefix (a), (a, b), or (a, b, c) |
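The prefix rule ivan describes can be checked directly in SQLite with EXPLAIN QUERY PLAN. The following is a minimal sketch, not taken from wpull: the table `t`, its columns and the index name `idx_abc` are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical table and index, used only to illustrate the prefix rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Lookups that constrain a prefix of (a, b, c).
prefix_plans = [
    plan("SELECT payload FROM t WHERE a = 1"),
    plan("SELECT payload FROM t WHERE a = 1 AND b = 2"),
    plan("SELECT payload FROM t WHERE a = 1 AND b = 2 AND c = 3"),
]

# A lookup on c alone does not constrain a prefix of the index.
non_prefix_plan = plan("SELECT payload FROM t WHERE c = 3")

for p in prefix_plans:
    print("prefix:    ", p)
print("non-prefix:", non_prefix_plan)
```

The prefix lookups should report a search using `idx_abc`, while the lookup on `c` alone should fall back to scanning the table.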
22:06
🔗
|
JAA |
Yeah, right. |
22:06
🔗
|
JAA |
Would an ORDER BY b, a, c also be fast though? |
22:09
🔗
|
ivan |
I have no idea. |
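One way to answer JAA's question for SQLite specifically is to ask the planner whether it needs a separate sort step. This is a rough sketch under assumptions: the table `t` and index `idx_abc` are hypothetical, and the check simply looks for the "TEMP B-TREE" note that SQLite adds to the plan when it has to sort rows itself instead of reading them in index order.

```python
import sqlite3

# Hypothetical table and index, used only to compare ORDER BY variants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")


def plan(sql):
    """Return the 'detail' strings of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def needs_sort(details):
    """True if the plan contains an explicit sorting step."""
    return any("TEMP B-TREE" in d for d in details)


same_order = plan("SELECT payload FROM t ORDER BY a, b, c")
swapped_order = plan("SELECT payload FROM t ORDER BY b, a, c")

print("ORDER BY a, b, c:", same_order, "sort step:", needs_sort(same_order))
print("ORDER BY b, a, c:", swapped_order, "sort step:", needs_sort(swapped_order))
```

An index on (a, b, c) stores rows in a, b, c order, so it cannot hand back rows ordered by b first. The swapped ORDER BY therefore has to be sorted separately, and how expensive that is depends on how many rows survive the WHERE clause.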
22:09
🔗
|
JAA |
I also wonder how fast that ORDER BY status DESC is in general. Since there is no ENUM in SQLite, it would just store strings, so the ordering would be a dumb string comparison, I think. |
22:10
🔗
|
ivan |
things will depend on how much sqlite has to sort in memory |
22:12
🔗
|
ivan |
maybe you can populate a giant table of URLs with priorities and see how fast these queries are in a repl |
22:13
🔗
|
ivan |
and see which indexes sqlalchemy created on the table |
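The experiment ivan suggests could look roughly like the sketch below. It uses the plain `sqlite3` module rather than SQLAlchemy, and everything about the schema is an assumption: the `queued_urls` layout, the status strings, the `LIMIT 1`, the row count and the index name `ix_status_priority` are invented for the test and are not wpull's actual definitions.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema; the real table has more columns.
conn.execute(
    "CREATE TABLE queued_urls ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, priority INTEGER NOT NULL)"
)

rng = random.Random(0)  # fixed seed so runs are comparable
statuses = ["todo", "error", "done", "in_progress", "skipped"]  # assumed values
rows = [(rng.choice(statuses), rng.randint(0, 10)) for _ in range(200_000)]
conn.executemany("INSERT INTO queued_urls (status, priority) VALUES (?, ?)", rows)

# SQL equivalent of the filter/order_by quoted above; LIMIT 1 is an assumption.
QUERY = (
    "SELECT id FROM queued_urls WHERE status IN ('todo', 'error') "
    "ORDER BY priority DESC, status DESC, id LIMIT 1"
)


def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    result = conn.execute(sql).fetchall()
    return result, time.perf_counter() - start


before, t_before = timed(QUERY)
conn.execute("CREATE INDEX ix_status_priority ON queued_urls (status, priority)")
after, t_after = timed(QUERY)

# List the indexes that exist on the table, then show the chosen plan.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('queued_urls')")]
query_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

print("indexes:", index_names)
print("plan:", query_plan)
print(f"without index: {t_before:.4f}s, with index: {t_after:.4f}s")
```

Timings from an in-memory toy table only give a rough indication; running the same statements against a real wpull database would be more telling.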
22:13
🔗
|
JAA |
By the way, when I was working on a distributed version of wpull through a central pgsql DB, I quickly realised that SQLAlchemy is incredibly inefficient. It's still a nice piece of software since it makes code easily portable between different databases, but it's not exactly performant. |
22:14
🔗
|
JAA |
Yeah, will try that out. Throwing some huge site into it with big sitemaps, then do some EXPLAIN queries. |
22:16
🔗
|
Flashfire |
I can provide sitemaps |
22:16
🔗
|
JAA |
One example of SQLAlchemy ORM working against us is the parent_url. SQLAlchemy won't do a JOIN. Instead, it will lazily load that entry through a separate SELECT on the PK when it is first accessed (I think). |
22:17
🔗
|
JAA |
Which easily doubles the number of queries. |
22:18
🔗
|
JAA |
ivan: Oh also, I think there's a significant potential for improving DB performance. Currently, check_in selects the table entry by comparing against the URL string. That means there has to be a huge index on url_strings.url. We should replace that by a queued_urls.id value in URLRecord at some point. |
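The difference between the two lookup styles can be seen in SQLite's query plans. This is a simplified sketch: the `url_strings` table here is an assumed two-column stand-in, not wpull's real schema, and the example URL is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed stand-in schema: the UNIQUE constraint makes SQLite build an
# index over every URL string, which is the large index mentioned above.
conn.execute("CREATE TABLE url_strings (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO url_strings (url) VALUES ('https://example.com/')")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Lookup by URL string: goes through the index on the url column.
by_url_plan = plan("SELECT id FROM url_strings WHERE url = 'https://example.com/'")

# Lookup by integer id: uses the table's own rowid B-tree, no extra index.
by_id_plan = plan("SELECT url FROM url_strings WHERE id = 1")

print("by url:", by_url_plan)
print("by id: ", by_id_plan)
```

Both lookups are index searches, so the gain from switching to ids would come mainly from comparing small integers instead of long strings and from no longer depending on the string index for check-in.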
23:16
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
23:32
🔗
|
|
bakJAA has quit IRC (Ping timeout: 260 seconds) |
23:33
🔗
|
|
betamax has quit IRC (Remote host closed the connection) |
23:34
🔗
|
|
betamax has joined #archiveteam-ot |
23:35
🔗
|
|
bakJAA has joined #archiveteam-ot |
23:35
🔗
|
|
svchfoo1 sets mode: +o bakJAA |
23:36
🔗
|
|
JAA sets mode: +o bakJAA |
23:38
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
23:38
🔗
|
|
BlueMax has joined #archiveteam-ot |
23:42
🔗
|
|
Odd0002 has quit IRC (Ping timeout: 260 seconds) |
23:45
🔗
|
|
Odd0002 has joined #archiveteam-ot |