[00:03] *** schbirid has quit IRC (Quit: Leaving)
[00:10] *** apache2 has joined #archiveteam-ot
[00:37] *** robogoat_ has quit IRC (Ping timeout: 258 seconds)
[00:37] *** robogoat has joined #archiveteam-ot
[01:03] Mateon1: I don't get it, why does node need that much stuff in memory for crawling annotations :-)
[01:04] how was it stored on disk?
[01:26] *** yawkat has quit IRC (Ping timeout: 246 seconds)
[01:35] *** yawkat has joined #archiveteam-ot
[02:30] yano: This would interest you. It's about the relationship of public libraries and ebooks, and new lending restrictions on the horizon. https://www.eff.org/deeplinks/2019/11/publishers-should-be-making-e-book-licensing-better-not-worse
[02:31] Publishers are trying to erase library expectations of 'first sale doctrine' behavior / use in the e-book realm.
[02:48] Raccoon: yeah, i saw that :-\
[02:50] Really waiting for them to overstep, like Disney ending licensing of old films from the Fox catalog, like The Day The Earth Stood Still.
[02:51] That act should immediately convert the copyright to public domain
[02:53] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[02:55] *** ShellyRol has joined #archiveteam-ot
[03:14] *** kiskabak has quit IRC (Ping timeout (120 seconds))
[03:15] *** kiskabak has joined #archiveteam-ot
[03:15] *** Fusl sets mode: +o kiskabak
[03:15] *** Fusl__ sets mode: +o kiskabak
[03:15] *** Fusl_ sets mode: +o kiskabak
[03:39] *** m007a83 has joined #archiveteam-ot
[03:46] *** manjaro-u has quit IRC (Read error: Operation timed out)
[03:53] *** BlueMax has joined #archiveteam-ot
[04:39] *** manjaro-u has joined #archiveteam-ot
[04:40] *** qw3rty has joined #archiveteam-ot
[04:49] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[04:51] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[05:38] *** manjaro-u has joined #archiveteam-ot
[05:50] *** nataraj has joined #archiveteam-ot
[05:57] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[06:00] *** nataraj has quit IRC (Read error: Operation timed out)
[06:05] *** nataraj has joined #archiveteam-ot
[06:17] ivan: Sorry, I missed that as I was asleep already. I need a fast way to check whether a page was visited before queuing that page. At first I just stored the video IDs, playlist IDs and user IDs as strings in a Set(), then when I ran into GC issues I decoded the string IDs into a 64-bit integer, and stored that.
[06:19] For the annotation crawler most of this was done server side, and there was only a small set of cached IDs not to recrawl. With this one, there is no server, I'm just dumping lines of text into a TSV file in append mode for later processing.
[06:19] *** manjaro-u has joined #archiveteam-ot
[06:20] Mateon1: ah, I guess rocksdb might be a better place to store such a set
[06:24] *** nataraj has quit IRC (Quit: Konversation terminated!)
[06:32] I just took a look, but I have no idea how to use that from an application, the C++ bindings are quite a mess, and I'd prefer to avoid Java if possible. I need to rethink the problem... and I'll probably end up reinventing the database, or something
[06:32] I just attempted to replicate the thing that Raccoon just to make sure it's not only 'em; yep, with uBlock being active, I was unable to download https://cdn4.vectorstock.com/i/1000x1000/00/93/of-thief-vector-23180093.jpg (but, I can still download it even with that supposed block by dragging the image from the window into the desktop),
[06:32] With uBlock disabled, I was able to download the image normally
[06:32] *** dhyan_nat has joined #archiveteam-ot
[06:34] Neat, I didn't know Chrome did drag-drop to the desktop.
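
Editor's note on the [06:17] message: the ID-packing idea described there can be sketched roughly as follows. This is not the crawler's actual code. It assumes video IDs are 11-character base64url strings that carry 64 bits of data (with the two lowest bits of the last character unused), and the names `videoIdToUint64`, `uint64ToVideoId`, `visited` and `markVisited` are hypothetical. Playlist and user IDs are not covered, since their lengths differ.

```javascript
// Base64url alphabet, assumed to be the one used by the video IDs.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: decode an 11-character ID into a 64-bit BigInt.
function videoIdToUint64(id) {
  if (id.length !== 11) {
    throw new RangeError(`expected 11 characters, got ${id.length}`);
  }
  let value = 0n;
  for (const ch of id) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new RangeError(`invalid character: ${ch}`);
    value = (value << 6n) | BigInt(digit);
  }
  // 11 characters * 6 bits = 66 bits; the lowest two are assumed to be padding.
  if ((value & 3n) !== 0n) throw new RangeError("non-zero padding bits");
  return value >> 2n;
}

// Hypothetical helper: encode a 64-bit BigInt back into the 11-character form.
function uint64ToVideoId(value) {
  let v = BigInt.asUintN(64, value) << 2n;
  let out = "";
  for (let i = 0; i < 11; i++) {
    out = ALPHABET[Number(v & 63n)] + out;
    v >>= 6n;
  }
  return out;
}

// In-memory "already queued" set keyed by the decoded integer.
const visited = new Set();

// Returns true the first time an ID is seen, false on repeats.
function markVisited(id) {
  const key = videoIdToUint64(id);
  if (visited.has(key)) return false;
  visited.add(key);
  return true;
}
```

A `Set` of BigInt values is only the simplest stand-in here. The same 64-bit keys could instead go into a `BigUint64Array` or an on-disk store such as the rocksdb suggestion at [06:20]; whether that helps with the GC pressure mentioned in the log would need measuring.
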
[06:37] Yeah, I usually save images because I'm too lazy to right-click the image, click "Save image as...", and have to either browse for a folder for the image to save in, or just skip it and have it set as the default download and confirm with "Save"
[06:39] Speaking of drag and drop, there's one thing that Firefox has that Google Chrome appears to never implement (without an extension, being Tab-Snap) most likely due to what's basically 'dummy-proofing' in the eyes of the developers, being able to open multiple links from dragging text onto the window
[06:40] I can drag one link onto Google Chrome, but not multiple at all
[06:50] *** kiska has quit IRC (Remote host closed the connection)
[06:50] *** Flashfire has quit IRC (Remote host closed the connection)
[06:51] *** kiska has joined #archiveteam-ot
[06:51] *** Flashfire has joined #archiveteam-ot
[06:51] *** Fusl__ sets mode: +o kiska
[06:51] *** Fusl sets mode: +o kiska
[06:51] *** Fusl_ sets mode: +o kiska
[07:42] Mateon1: people have bindings to it for node of course
[07:42] and Rust, and a bunch of others
[07:56] compared to using postgresql you get compression and more transactions per second while losing multi-user and interesting querying capabilities, so it can make sense for applications where you have a single process touching it
[08:11] I'll look into it, thanks for the suggestion. I'll have free time for experimenting on the weekend.
[09:10] *** Raccoon has quit IRC (Ping timeout: 612 seconds)
[09:17] *** lunik1 has quit IRC (Read error: Connection reset by peer)
[09:17] *** lunik1 has joined #archiveteam-ot
[09:25] *** Raccoon has joined #archiveteam-ot
[09:48] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[10:00] *** manjaro-u has joined #archiveteam-ot
[10:06] *** HP_Archiv has quit IRC (Ping timeout: 263 seconds)
[10:14] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
[10:55] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:14] *** IAmbience has quit IRC (Quit: Connection closed for inactivity)
[11:26] *** Tenebrae has quit IRC (Read error: Operation timed out)
[11:41] *** Tenebrae has joined #archiveteam-ot
[12:06] *** tuluu_ has joined #archiveteam-ot
[12:06] *** tuluu has quit IRC (Read error: Connection reset by peer)
[12:18] *** martini has joined #archiveteam-ot
[12:45] *** IAmbience has joined #archiveteam-ot
[12:53] *** hata has joined #archiveteam-ot
[13:21] *** deevious has quit IRC (Read error: Connection reset by peer)
[13:22] *** deevious has joined #archiveteam-ot
[13:28] *** deevious has quit IRC (Ping timeout: 252 seconds)
[14:31] *** systwi_ is now known as systwi
[14:44] *** deevious has joined #archiveteam-ot
[15:14] *** manjaro-u has joined #archiveteam-ot
[15:54] *** martini has quit IRC (Read error: Connection reset by peer)
[15:55] *** martini has joined #archiveteam-ot
[16:05] neat, https://archivebox.io/
[16:30] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[16:31] *** [X-Scale] has joined #archiveteam-ot
[16:31] *** [X-Scale] is now known as X-Scale
[16:32] *** deevious has quit IRC (Ping timeout: 252 seconds)
[16:36] *** manjaro-u has quit IRC (Konversation terminated!)
[16:42] *** Video has joined #archiveteam-ot
[16:43]