#archiveteam-ot 2019-08-31,Sat

Time Nickname Message
00:00 🔗 BlueMax has quit IRC (Quit: Leaving)
01:04 🔗 ZizzyDizz has joined #archiveteam-ot
01:04 🔗 ZizzyDizz Hello, I was wondering if anyone here has a way to archive a Disqus channel?
01:05 🔗 ZizzyDizz I just found out today they were getting deleted and there are two I really need to save in some capacity.
01:05 🔗 ZizzyDizz And I don't have the bandwidth to do it myself on my home PC, as I have less than 600kbps.
01:10 🔗 BlueMax has joined #archiveteam-ot
01:22 🔗 killsushi has quit IRC (Read error: Connection reset by peer)
01:26 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
01:38 🔗 nepeat has quit IRC (Read error: Connection reset by peer)
01:39 🔗 nepeat has joined #archiveteam-ot
01:46 🔗 nepeat has quit IRC (Quit: ZNC 1.7.4 - https://znc.in)
01:47 🔗 ZizzyDizz I can't seem to get grab-site to respect --wpull-args=
01:47 🔗 nepeat has joined #archiveteam-ot
01:55 🔗 m007a83_ is now known as m007a83
02:42 🔗 markedL Is Disqus public or password-protected?
02:44 🔗 qw3rty115 has joined #archiveteam-ot
02:49 🔗 qw3rty114 has quit IRC (Read error: Operation timed out)
03:42 🔗 qw3rty116 has joined #archiveteam-ot
03:47 🔗 qw3rty115 has quit IRC (Ping timeout: 612 seconds)
04:00 🔗 lunik1 has quit IRC (:x)
04:05 🔗 ZizzyDizz has quit IRC (Ping timeout: 260 seconds)
04:10 🔗 dhyan_nat has joined #archiveteam-ot
07:18 🔗 Mateon1 has joined #archiveteam-ot
09:13 🔗 JAA ZizzyDizz: Disqus is heavily JS-based, so you won't be able to get much with wpull/grab-site, wget, etc.
09:28 🔗 BlueMax has quit IRC (Quit: Leaving)
10:33 🔗 tuluu has quit IRC (Read error: Connection refused)
10:33 🔗 tuluu has joined #archiveteam-ot
11:52 🔗 lunik1 has joined #archiveteam-ot
12:29 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
12:59 🔗 killsushi has joined #archiveteam-ot
15:40 🔗 h3ndr1k_ has quit IRC (Ping timeout: 252 seconds)
15:53 🔗 dhyan_nat has joined #archiveteam-ot
17:18 🔗 h3ndr1k has joined #archiveteam-ot
17:50 🔗 Raccoon How can I become a good steward once we get this 500/500 fiber installed?
17:50 🔗 Raccoon need a Quickstart Guide
18:32 🔗 kiska Where do you live?
18:32 🔗 kiska Can I put an rsync target on that connection?
18:37 🔗 Raccoon I haven't even set up a box yet
18:37 🔗 Raccoon will probably need to do some bouncy bouncy before renting to anyone who'll get us terminated for DMCA
18:59 🔗 h3ndr1k has quit IRC (Quit: )
19:03 🔗 h3ndr1k has joined #archiveteam-ot
19:41 🔗 ShellyRol has quit IRC (Ping timeout: 745 seconds)
19:52 🔗 ShellyRol has joined #archiveteam-ot
20:03 🔗 ZizzyDizz has joined #archiveteam-ot
20:03 🔗 ZizzyDizz No markedL
20:13 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
20:38 🔗 markedL ZizzyDizz: sounds like you might need to use chromebot. What's the URL so we can experiment?
20:38 🔗 JAA Have fun with that. You'll need a *lot* of resources.
20:38 🔗 JAA Also, archival talk should happen in -bs.
20:38 🔗 markedL Raccoon: going over quota is a bigger risk than DMCA with some of these folks
20:38 🔗 JAA I've been grabbing it for a few hours.
20:57 🔗 Hani111 has joined #archiveteam-ot
21:08 🔗 Hani has quit IRC (Ping timeout: 745 seconds)
21:08 🔗 Hani111 is now known as Hani
21:23 🔗 Raccoon markedL, yeah, but I wonder if they have a quota for 500/500 on business fiber
21:24 🔗 Raccoon if we do this, i'm dripping every drop of that sht
21:24 🔗 Raccoon 5.15 TB/day
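For reference, the daily figure can be checked with a little arithmetic. This is a minimal sketch, assuming the line is saturated around the clock with no protocol overhead or downtime, so it is an upper bound. At 500 Mbit/s it gives 5.4 TB/day in decimal units (about 4.91 TiB); the 5.15 figure appears to come from dividing decimal megabytes by 1024², i.e. mixing unit systems.

```python
# Idealized daily transfer for a line saturated 24/7
# (assumption: no protocol overhead, no downtime).
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(mbit_per_s: float) -> float:
    """Bytes moved in one day at a constant rate given in megabits per second."""
    return mbit_per_s * 1_000_000 / 8 * SECONDS_PER_DAY


if __name__ == "__main__":
    total = bytes_per_day(500)
    print(f"{total / 1e12:.2f} TB/day (decimal)")      # 5.40
    print(f"{total / 2**40:.2f} TiB/day (binary)")     # 4.91
    # Decimal megabytes divided by 1024**2 mixes units and yields ~5.15.
    print(f"{total / 1e6 / 1024**2:.2f} 'TB'/day (mixed units)")
```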
21:29 🔗 Raccoon I want to start an archive group called Going Postal, where we circulate 4 or 10 TB hard drives between high-bandwidth transmission lines and low-bandwidth archivists.
21:31 🔗 JAA Which provider?
21:32 🔗 Raccoon Century Link
21:39 🔗 JAA FWIW, "The data usage limit applies to residential HSI. It does not apply to business-class HSI." from https://www.centurylink.com/aboutus/legal/internet-service-disclosure/full-version.html
21:40 🔗 JAA (HSI = High Speed Internet)
22:36 🔗 Kaz anyone know of an easy way to get current tab count in firefox?
22:36 🔗 Kaz ideally without an addon
22:42 🔗 BlueMax has joined #archiveteam-ot
22:48 🔗 ivan_ are you setting up a prometheus metric on your tab addiction?
22:53 🔗 JAA You could parse the sessionstore file in the Firefox profile directory.
22:53 🔗 ivan_ https://superuser.com/questions/1363747/how-to-decode-decipher-mozilla-firefox-proprietary-jsonlz4-format-sessionstor
22:54 🔗 JAA "Proprietary" *twitch*
22:54 🔗 ivan_ I see only sessionstore-backups
22:54 🔗 JAA Yup
22:54 🔗 ivan_ and it's got some jsonlz4 junk that lz4cat can't read
22:54 🔗 JAA They changed that a while ago.
22:55 🔗 JAA It's not really "junk". One of the answers on SU explains it: "Unfortunately, due to a non-standard header, standard tools won't work. There's an open proposal to change that. Apparently the Mozilla header was devised before a standard lz4 frame format existed; it does wrap a standard lz4 block."
22:56 🔗 ivan_ https://github.com/badboy/jsonlz4cat works
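Once a session file has been decoded to JSON (for example with the jsonlz4cat tool linked above), counting tabs is a matter of walking the JSON. This is a minimal sketch, assuming the decoded document has a top-level `windows` list whose entries each carry a `tabs` list; that layout is an assumption about current profiles, not a documented or stable format. `count_tabs` is a hypothetical helper name.

```python
import json
import sys


def count_tabs(session: dict) -> int:
    """Count open tabs across all open windows in a decoded session-store dict.

    Assumes the layout {"windows": [{"tabs": [...]}, ...]}; closed windows,
    if stored under another key, are not counted.
    """
    return sum(len(window.get("tabs", [])) for window in session.get("windows", []))


if __name__ == "__main__":
    # Reads decoded session JSON from stdin, e.g. a jsonlz4 decoder's output.
    print(count_tabs(json.load(sys.stdin)))
```

Missing keys fall back to empty lists, so an unexpected layout yields 0 rather than a crash; that is worth checking by eye the first time.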
22:59 🔗 markedL https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/windows/getAll#Examples
23:00 🔗 ShellyRol has quit IRC (Read error: Operation timed out)
23:01 🔗 JAA Hmm, all those tools just skip the first 8 bytes and then decompress the rest. But then why doesn't tail -c+9 | lz4cat work?
23:03 🔗 markedL dd can do a byte skip also
23:04 🔗 JAA So?
23:04 🔗 markedL Rephrased: I'd trust dd more than tail, but maybe that's not the issue
23:05 🔗 JAA They're both fine.
23:05 🔗 JAA And yes, definitely not the issue.
23:05 🔗 ivan_ the size in the header might be different
23:05 🔗 ShellyRol has joined #archiveteam-ot
23:05 🔗 JAA Hmm
23:08 🔗 Raccoon JAA: > "Residential Fiber Gigabit plans are also not subject to data usage limits."
23:08 🔗 Kaz raaaah why can't this be simple
23:08 🔗 JAA Mozilla likes to overengineer things.
23:09 🔗 JAA Many things used to be simple. Adding a custom search engine was a simple modification of a .json file. Now it's essentially impossible.
23:11 🔗 ivan_ I suspect some of these profile annoyances are intentional things designed to discourage other software from touching the profile
23:11 🔗 ivan_ Chrome has a pretty crazy session format
23:13 🔗 JAA If I'm reading https://github.com/badboy/jsonlz4cat/blob/master/src/main.rs correctly, it does a read(8) for the magic bytes, then a read(4) for the outsize, and then throws the rest into LZ4 decompression. Meanwhile https://gist.github.com/Tblue/62ff47bef7f894e92ed5 , which reportedly also works (haven't tested it), only skips the magic bytes. ¯\_(ツ)_/¯
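The layout JAA reads out of jsonlz4cat can be sketched as a small pure-Python decoder: the 8-byte magic `mozLz40\0`, a 4-byte little-endian decompressed size, then one raw LZ4 block (no LZ4 frame header). This is a minimal sketch, not Mozilla's code; the function names are hypothetical and it implements the LZ4 block format by hand to avoid a dependency, so it is much slower than a native library. The gist working after skipping only the magic is likely explained by python-lz4's `lz4.block.decompress`, which by default expects a 4-byte size prefix before the block, exactly what follows the magic here.

```python
import struct

MOZLZ4_MAGIC = b"mozLz40\0"  # 8-byte header in front of .jsonlz4 files


def _read_length(src: bytes, i: int, length: int) -> tuple[int, int]:
    """Extend a 4-bit length of 15 with continuation bytes (255 means 'more')."""
    if length == 15:
        while True:
            b = src[i]
            i += 1
            length += b
            if b != 255:
                break
    return length, i


def lz4_block_decompress(src: bytes, expected_size: int) -> bytes:
    """Decode one raw LZ4 block (no LZ4 frame header around it)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1
        # High nibble of the token: number of literal bytes copied verbatim.
        lit_len, i = _read_length(src, i, token >> 4)
        if i + lit_len > len(src):
            raise ValueError("corrupt LZ4 block: literals run past end of input")
        out += src[i:i + lit_len]
        i += lit_len
        if i >= len(src):
            break  # the final sequence carries literals only
        # Two-byte little-endian distance back into the output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("corrupt LZ4 block: bad match offset")
        # Low nibble: match length minus the 4-byte minimum match.
        match_len, i = _read_length(src, i, token & 0x0F)
        match_len += 4
        # Copy one byte at a time because a match may overlap its own output.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    if len(out) != expected_size:
        raise ValueError(f"size mismatch: header says {expected_size}, got {len(out)}")
    return bytes(out)


def decode_mozlz4(data: bytes) -> bytes:
    """Check the magic, read the 4-byte little-endian size, decode the block."""
    if data[:8] != MOZLZ4_MAGIC:
        raise ValueError("not a mozLz4 file (magic bytes missing)")
    (size,) = struct.unpack_from("<I", data, 8)
    return lz4_block_decompress(data[12:], size)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(decode_mozlz4(f.read()).decode("utf-8"))
```

Run as a script, it takes the path of a `.jsonlz4` file as its only argument and prints the JSON. Checking the header size against the decoded length is optional, but it catches truncated files early.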
23:15 🔗 Raccoon I got tired of browser session management. Use Session Buddy on Chrome, prolly FF too
23:15 🔗 Raccoon saved my ass many a time
23:28 🔗 qw3rty117 has joined #archiveteam-ot
23:34 🔗 qw3rty116 has quit IRC (Ping timeout: 612 seconds)
23:59 🔗 JAA Apparently the problem is that lz4cat uses a different decompression routine than those tools. Specifically, it uses LZ4_decompress_safe, not LZ4_decompress_safe_partial. I think.
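One further factor worth checking, offered as an assumption rather than a confirmed diagnosis: the `lz4`/`lz4cat` command-line tool reads the LZ4 frame format (and an older legacy stream format), each of which starts with its own 4-byte magic number. Mozilla's file wraps a bare LZ4 block instead, so after `tail -c+9` the stream begins with the 4-byte size field rather than a frame magic, and there is no frame header for the tool to parse. The sketch below sniffs which container a byte string starts with; `sniff` is a hypothetical helper, and the magic values are those of the LZ4 frame and legacy formats.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204   # LZ4 frame format, as used by the lz4 CLI
LZ4_LEGACY_MAGIC = 0x184C2102  # older "legacy" LZ4 stream format
MOZLZ4_MAGIC = b"mozLz40\0"    # Mozilla's header in front of a raw LZ4 block


def sniff(data: bytes) -> str:
    """Guess which LZ4 container, if any, a byte string starts with."""
    if data.startswith(MOZLZ4_MAGIC):
        return "mozlz4"
    if len(data) >= 4:
        (magic,) = struct.unpack_from("<I", data, 0)  # magics are little-endian
        if magic == LZ4_FRAME_MAGIC:
            return "lz4-frame"
        if magic == LZ4_LEGACY_MAGIC:
            return "lz4-legacy"
    return "unknown (possibly a bare LZ4 block)"
```

Fed the output of `tail -c+9` on a `.jsonlz4` file, this would report "unknown", since the first four bytes are the decompressed size rather than either frame magic.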
