[00:00] *** BlueMax has quit IRC (Quit: Leaving)
[01:04] *** ZizzyDizz has joined #archiveteam-ot
[01:04] Hello, I was wondering if anyone here has a way to archive a disqus channel?
[01:05] I just found out today they were getting deleted and there's two I really need to save in some capacity.
[01:05] And I don't have the bandwidth to do it myself on my home PC, as I have less than 600kbps.
[01:10] *** BlueMax has joined #archiveteam-ot
[01:22] *** killsushi has quit IRC (Read error: Connection reset by peer)
[01:26] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[01:38] *** nepeat has quit IRC (Read error: Connection reset by peer)
[01:39] *** nepeat has joined #archiveteam-ot
[01:46] *** nepeat has quit IRC (Quit: ZNC 1.7.4 - https://znc.in)
[01:47] I can't seem to get grab-site to respect --wpull-args=
[01:47] *** nepeat has joined #archiveteam-ot
[01:55] *** m007a83_ is now known as m007a83
[02:42] is disqus public or password protected?
[02:44] *** qw3rty115 has joined #archiveteam-ot
[02:49] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[03:42] *** qw3rty116 has joined #archiveteam-ot
[03:47] *** qw3rty115 has quit IRC (Ping timeout: 612 seconds)
[04:00] *** lunik1 has quit IRC (:x)
[04:05] *** ZizzyDizz has quit IRC (Ping timeout: 260 seconds)
[04:10] *** dhyan_nat has joined #archiveteam-ot
[07:18] *** Mateon1 has joined #archiveteam-ot
[09:13] ZizzyDizz: Disqus is heavily JS-based, so you won't be able to get much with wpull/grab-site, wget, etc.
[09:28] *** BlueMax has quit IRC (Quit: Leaving)
[10:33] *** tuluu has quit IRC (Read error: Connection refused)
[10:33] *** tuluu has joined #archiveteam-ot
[11:52] *** lunik1 has joined #archiveteam-ot
[12:29] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[12:59] *** killsushi has joined #archiveteam-ot
[15:40] *** h3ndr1k_ has quit IRC (Ping timeout: 252 seconds)
[15:53] *** dhyan_nat has joined #archiveteam-ot
[17:18] *** h3ndr1k has joined #archiveteam-ot
[17:50] How can I become a good steward once we get this 500/500 fiber installed
[17:50] need a Quickstart Guide
[18:32] Where do you live?
[18:32] Can I put a rsync target on that connection?
[18:37] I haven't even set up a box yet
[18:37] will probably need to do some bouncy bouncy before renting to anyone who'll get us terminated for DMCA
[18:59] *** h3ndr1k has quit IRC (Quit: )
[19:03] *** h3ndr1k has joined #archiveteam-ot
[19:41] *** ShellyRol has quit IRC (Ping timeout: 745 seconds)
[19:52] *** ShellyRol has joined #archiveteam-ot
[20:03] *** ZizzyDizz has joined #archiveteam-ot
[20:03] No, markedL
[20:13] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[20:38] ZizzyDizz: sounds like you might need to use chromebot. What's the URL so we can experiment?
[20:38] Have fun with that. You'll need a *lot* of resources.
[20:38] Also, archival talk should happen in -bs.
[20:38] Raccoon: over quota is the bigger risk than DMCA with some of these folk
[20:38] I've been grabbing it for a few hours now.
[20:57] *** Hani111 has joined #archiveteam-ot
[21:08] *** Hani has quit IRC (Ping timeout: 745 seconds)
[21:08] *** Hani111 is now known as Hani
[21:23] markedL, yeah, but i wonder if they have a quota for 500/500 on a business fiber
[21:24] if we do this, i'm dripping every drop of that sht
[21:24] 5.15 TB/day
[21:29] I want to start an archive group called Going Postal, where we circulate 4 or 10 TB harddrives between high-bandwidth transmission lines and low-bandwidth archivists.
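(A back-of-envelope check on the 5.15 TB/day figure above: a 500 Mbps link running flat out moves 5.4 TB per day in raw bits; the quoted 5.15 TB presumably subtracts a few percent for protocol overhead. A minimal sketch, where the ~95% goodput factor is an assumption, not a measured value:)

```python
# Raw daily throughput of a 500 Mbps (500e6 bits/s) link
link_bps = 500e6
seconds_per_day = 86_400

raw_tb_per_day = link_bps * seconds_per_day / 8 / 1e12  # bits -> bytes -> TB
# -> 5.4 TB/day of raw capacity

# Assuming ~95% goodput after TCP/IP and Ethernet overhead (illustrative
# assumption only; the chat's 5.15 TB/day implies a factor close to this)
usable_tb_per_day = raw_tb_per_day * 0.954
```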
[21:31] Which provider?
[21:32] Century Link
[21:39] FWIW, "The data usage limit applies to residential HSI. It does not apply to business-class HSI." from https://www.centurylink.com/aboutus/legal/internet-service-disclosure/full-version.html
[21:40] (HSI = High Speed Internet)
[22:36] anyone know of an easy way to get current tab count in firefox?
[22:36] ideally without an addon
[22:42] *** BlueMax has joined #archiveteam-ot
[22:48] are you setting up a prometheus metric on your tab addiction
[22:53] You could parse the sessionstore file in the Firefox profile directory.
[22:53] https://superuser.com/questions/1363747/how-to-decode-decipher-mozilla-firefox-proprietary-jsonlz4-format-sessionstor
[22:54] "Proprietary" *twitch*
[22:54] I see only sessionstore-backups
[22:54] Yup
[22:54] and it's got some jsonlz4 junk that lz4cat can't read
[22:54] They changed that a while ago.
[22:55] It's not really "junk". One of the answers on SU explains it: "Unfortunately, due to a non-standard header, standard tools won't work. There's an open proposal to change that. Apparently the Mozilla header was devised before a standard lz4 frame format existed; it does wrap a standard lz4 block."
[22:56] https://github.com/badboy/jsonlz4cat works
[22:59] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/windows/getAll#Examples
[23:00] *** ShellyRol has quit IRC (Read error: Operation timed out)
[23:01] Hmm, all those tools just skip the first 8 bytes and then decompress the rest. But then why doesn't tail -c+9 | lz4cat work?
[23:03] dd can do a byte skip also
[23:04] So?
[23:04] rephrased, I'd trust dd more than tail, but maybe that's not the issue
[23:05] They're both fine.
[23:05] And yes, definitely not the issue.
[23:05] the size in the header might be different
[23:05] *** ShellyRol has joined #archiveteam-ot
[23:05] Hmm
[23:08] JAA: > "Residential Fiber Gigabit plans are also not subject to data usage limits."
[23:08] raaaah why can't this be simple
[23:08] Mozilla likes to overengineer things.
[23:09] Many things used to be simple. Adding a custom search engine was a simple modification of a .json file. Now it's essentially impossible.
[23:11] I suspect some of these profile annoyances are intentional things designed to discourage other software from touching the profile
[23:11] Chrome has a pretty crazy session format
[23:13] If I'm reading https://github.com/badboy/jsonlz4cat/blob/master/src/main.rs correctly, it does a read(8) for the magic bytes, then a read(4) for the outsize, and then throws the rest into LZ4 decompression. Meanwhile https://gist.github.com/Tblue/62ff47bef7f894e92ed5 , which reportedly also works (haven't tested it), only skips the magic bytes. ¯\_(ツ)_/¯
[23:15] I got tired of browser session management. use Session Buddy on chrome, prolly ff too
[23:15] saved my ass many a time
[23:28] *** qw3rty117 has joined #archiveteam-ot
[23:34] *** qw3rty116 has quit IRC (Ping timeout: 612 seconds)
[23:59] Apparently the problem is that lz4cat uses a different decompression routine than those tools. Specifically, it uses LZ4_decompress_safe, not LZ4_decompress_safe_partial. I think.
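(The layout the chat pieces together above — 8 magic bytes, a 4-byte little-endian decompressed size, then a bare LZ4 block rather than a framed stream — can be sketched as a small Python helper. The header split matches what jsonlz4cat reads; the `"windows"`/`"tabs"` keys assumed in the tab-count helper, and the use of the third-party `lz4` package for the final decompression step, are assumptions based on the discussion, not a tested implementation:)

```python
import json
import struct

MAGIC = b"mozLz40\0"  # 8-byte magic at the start of Firefox's .jsonlz4 files


def split_mozlz4(data: bytes):
    """Split a mozLz4 file into (declared decompressed size, raw LZ4 block)."""
    if not data.startswith(MAGIC):
        raise ValueError("not a mozLz4 file")
    # 4-byte little-endian uint32 after the magic: the decompressed size
    (out_size,) = struct.unpack("<I", data[8:12])
    return out_size, data[12:]


def count_tabs(session_json: bytes) -> int:
    """Count tabs across all windows in an already-decompressed sessionstore JSON."""
    session = json.loads(session_json)
    return sum(len(w.get("tabs", [])) for w in session.get("windows", []))


# The block itself is a bare LZ4 block, not a framed stream, which is why
# plain `tail -c+9 | lz4cat` fails. Decompressing it needs a raw-block
# routine, e.g. the third-party `lz4` package (assumed, not stdlib):
#   import lz4.block
#   size, block = split_mozlz4(open(path, "rb").read())
#   print(count_tabs(lz4.block.decompress(block, uncompressed_size=size)))
```

A quick sanity check on a hand-built header: `split_mozlz4(MAGIC + struct.pack("<I", 42) + b"xyz")` returns `(42, b"xyz")`.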