#archiveteam-ot 2019-08-31,Sat


Time	Nickname	Message
00:00 ^🔗		BlueMax has quit IRC (Quit: Leaving)
01:04 ^🔗		ZizzyDizz has joined #archiveteam-ot
01:04 ^🔗	ZizzyDizz	Hello, I was wondering if anyone here has a way to archive a disqus channel?
01:05 ^🔗	ZizzyDizz	I just found out today they were getting deleted and there's two I really need to save in some capacity.
01:05 ^🔗	ZizzyDizz	And I don't have the bandwidth to do it myself on my home PC, as I have less than 600kbps.
01:10 ^🔗		BlueMax has joined #archiveteam-ot
01:22 ^🔗		killsushi has quit IRC (Read error: Connection reset by peer)
01:26 ^🔗		DogsRNice has quit IRC (Read error: Connection reset by peer)
01:38 ^🔗		nepeat has quit IRC (Read error: Connection reset by peer)
01:39 ^🔗		nepeat has joined #archiveteam-ot
01:46 ^🔗		nepeat has quit IRC (Quit: ZNC 1.7.4 - https://znc.in)
01:47 ^🔗	ZizzyDizz	I can't seem to get grab-site to respect --wpull-args=
01:47 ^🔗		nepeat has joined #archiveteam-ot
01:55 ^🔗		m007a83_ is now known as m007a83
02:42 ^🔗	markedL	is disqus public or password protected?
02:44 ^🔗		qw3rty115 has joined #archiveteam-ot
02:49 ^🔗		qw3rty114 has quit IRC (Read error: Operation timed out)
03:42 ^🔗		qw3rty116 has joined #archiveteam-ot
03:47 ^🔗		qw3rty115 has quit IRC (Ping timeout: 612 seconds)
04:00 ^🔗		lunik1 has quit IRC (:x)
04:05 ^🔗		ZizzyDizz has quit IRC (Ping timeout: 260 seconds)
04:10 ^🔗		dhyan_nat has joined #archiveteam-ot
07:18 ^🔗		Mateon1 has joined #archiveteam-ot
09:13 ^🔗	JAA	ZizzyDizz: Disqus is heavily JS-based, so you won't be able to get much with wpull/grab-site, wget, etc.
09:28 ^🔗		BlueMax has quit IRC (Quit: Leaving)
10:33 ^🔗		tuluu has quit IRC (Read error: Connection refused)
10:33 ^🔗		tuluu has joined #archiveteam-ot
11:52 ^🔗		lunik1 has joined #archiveteam-ot
12:29 ^🔗		dhyan_nat has quit IRC (Read error: Operation timed out)
12:59 ^🔗		killsushi has joined #archiveteam-ot
15:40 ^🔗		h3ndr1k_ has quit IRC (Ping timeout: 252 seconds)
15:53 ^🔗		dhyan_nat has joined #archiveteam-ot
17:18 ^🔗		h3ndr1k has joined #archiveteam-ot
17:50 ^🔗	Raccoon	How can I become a good steward once we get this 500/500 fiber installed
17:50 ^🔗	Raccoon	need a Quickstart Guide
18:32 ^🔗	kiska	Where do you live?
18:32 ^🔗	kiska	Can I put a rsync target on that connection?
18:37 ^🔗	Raccoon	I haven't even set up a box yet
18:37 ^🔗	Raccoon	will probably need to do some bouncy bouncy before renting to anyone who'll get us terminated for DMCA
18:59 ^🔗		h3ndr1k has quit IRC (Quit: )
19:03 ^🔗		h3ndr1k has joined #archiveteam-ot
19:41 ^🔗		ShellyRol has quit IRC (Ping timeout: 745 seconds)
19:52 ^🔗		ShellyRol has joined #archiveteam-ot
20:03 ^🔗		ZizzyDizz has joined #archiveteam-ot
20:03 ^🔗	ZizzyDizz	No markedL
20:13 ^🔗		dhyan_nat has quit IRC (Read error: Operation timed out)
20:38 ^🔗	markedL	ZizzyDizz: sounds like you might need to use chromebot. What's the URL so we can experiment?
20:38 ^🔗	JAA	Have fun with that. You'll need a lot of resources.
20:38 ^🔗	JAA	Also, archival talk should happen in -bs.
20:38 ^🔗	markedL	Raccoon: over quota is the bigger risk than DMCA with some of these folk
20:38 ^🔗	JAA	I've been grabbing it for a few hours.
20:57 ^🔗		Hani111 has joined #archiveteam-ot
21:08 ^🔗		Hani has quit IRC (Ping timeout: 745 seconds)
21:08 ^🔗		Hani111 is now known as Hani
21:23 ^🔗	Raccoon	markedL, yeah, but i wonder if they have a quota for 500/500 on a business fiber
21:24 ^🔗	Raccoon	if we do this, i'm dripping every drop of that sht
21:24 ^🔗	Raccoon	5.15 TB/day
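
The 5.15 figure above can be reproduced with a short calculation. This is a minimal sketch, assuming the line is saturated in one direction for a full 24 hours with no protocol overhead. The function name `bytes_per_day` and the `mega` parameter are hypothetical. Whether "500" means 500 × 10^6 or 500 × 2^20 bits per second is an assumption; only the binary reading lands on 5.15.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(megabits_per_second, mega=10**6):
    """Bytes moved in one day at a sustained rate (hypothetical helper)."""
    bits_per_second = megabits_per_second * mega
    return bits_per_second / 8 * SECONDS_PER_DAY


# Decimal reading: 500 * 10^6 bit/s, result in terabytes (10^12 bytes).
tb_decimal = bytes_per_day(500) / 10**12

# Binary reading: 500 * 2^20 bit/s, result in tebibytes (2^40 bytes).
tib_binary = bytes_per_day(500, mega=2**20) / 2**40

print(f"{tb_decimal:.2f} TB/day (decimal units)")
print(f"{tib_binary:.2f} TiB/day (binary units)")
```

With decimal units throughout, the same link works out to about 5.4 TB per day, so the two conventions differ by roughly 5%.
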
21:29 ^🔗	Raccoon	I want to start an archive group called Going Postal, where we circulate 4 or 10 TB harddrives between high-bandwidth transmission lines and low-bandwidth archivists.
21:31 ^🔗	JAA	Which provider?
21:32 ^🔗	Raccoon	Century Link
21:39 ^🔗	JAA	FWIW, "The data usage limit applies to residential HSI. It does not apply to business-class HSI." from https://www.centurylink.com/aboutus/legal/internet-service-disclosure/full-version.html
21:40 ^🔗	JAA	(HSI = High Speed Internet)
22:36 ^🔗	Kaz	anyone know of an easy way to get current tab count in firefox?
22:36 ^🔗	Kaz	ideally without an addon
22:42 ^🔗		BlueMax has joined #archiveteam-ot
22:48 ^🔗	ivan_	are you setting up a prometheus metric on your tab addiction
22:53 ^🔗	JAA	You could parse the sessionstore file in the Firefox profile directory.
22:53 ^🔗	ivan_	https://superuser.com/questions/1363747/how-to-decode-decipher-mozilla-firefox-proprietary-jsonlz4-format-sessionstor
22:54 ^🔗	JAA	"Proprietary" twitch
22:54 ^🔗	ivan_	I see only sessionstore-backups
22:54 ^🔗	JAA	Yup
22:54 ^🔗	ivan_	and it's got some jsonlz4 junk that lz4cat can't read
22:54 ^🔗	JAA	They changed that a while ago.
22:55 ^🔗	JAA	It's not really "junk". One of the answers on SU explains it: "Unfortunately, due to a non-standard header, standard tools won't work. There's an open proposal to change that. Apparently the Mozilla header was devised before a standard lz4 frame format existed; it does wrap a standard lz4 block."
22:56 ^🔗	ivan_	https://github.com/badboy/jsonlz4cat works
22:59 ^🔗	markedL	https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/windows/getAll#Examples
23:00 ^🔗		ShellyRol has quit IRC (Read error: Operation timed out)
23:01 ^🔗	JAA	Hmm, all those tools just skip the first 8 bytes and then decompress the rest. But then why doesn't tail -c+9 | lz4cat work?
23:03 ^🔗	markedL	dd can do a byte skip also
23:04 ^🔗	JAA	So?
23:04 ^🔗	markedL	rephrased, I'd trust dd more than tail, but maybe that's not the issue
23:05 ^🔗	JAA	They're both fine.
23:05 ^🔗	JAA	And yes, definitely not the issue.
23:05 ^🔗	ivan_	the size in the header might be different
23:05 ^🔗		ShellyRol has joined #archiveteam-ot
23:05 ^🔗	JAA	Hmm
23:08 ^🔗	Raccoon	JAA: > "Residential Fiber Gigabit plans are also not subject to data usage limits."
23:08 ^🔗	Kaz	raaaah why can't this be simple
23:08 ^🔗	JAA	Mozilla likes to overengineer things.
23:09 ^🔗	JAA	Many things used to be simple. Adding a custom search engine was a simple modification of a .json file. Now it's essentially impossible.
23:11 ^🔗	ivan_	I suspect some of these profile annoyances are intentional things designed to discourage other software from touching the profile
23:11 ^🔗	ivan_	Chrome has a pretty crazy session format
23:13 ^🔗	JAA	If I'm reading https://github.com/badboy/jsonlz4cat/blob/master/src/main.rs correctly, it does a read(8) for the magic bytes, then a read(4) for the outsize, and then throws the rest into LZ4 decompression. Meanwhile https://gist.github.com/Tblue/62ff47bef7f894e92ed5 , which reportedly also works (haven't tested it), only skips the magic bytes. ¯\_(ツ)_/¯
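
The layout described above (8 magic bytes, a 4-byte output size, then the LZ4 data) can be sketched in Python. This is a minimal sketch under stated assumptions, not a port of either linked tool. The magic value `mozLz40\0`, the little-endian size field, and the `windows`/`tabs` keys in the session JSON are assumptions not stated in the log. The function names are hypothetical, and the decompressor is a small hand-written LZ4 block decoder so that the example needs no third-party package.

```python
import json
import struct

MAGIC = b"mozLz40\0"  # assumed 8-byte magic at the start of .jsonlz4 files


def lz4_block_decompress(src, expected_size):
    """Decode one raw LZ4 block (no frame header). Unoptimized sketch."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        token = src[i]
        i += 1
        # High nibble: literal length, extended by extra bytes when it is 15.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:
            break  # the final sequence carries literals only
        # Two-byte little-endian offset back into the output written so far.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        # Low nibble: match length minus 4, extended the same way.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4
        start = len(out) - offset
        for k in range(match_len):  # byte by byte, so overlapping copies work
            out.append(out[start + k])
    if len(out) != expected_size:
        raise ValueError("decompressed size does not match header")
    return bytes(out)


def read_mozlz4(data):
    """Parse the bytes of a .jsonlz4 file into a Python object."""
    if data[:8] != MAGIC:
        raise ValueError("missing mozLz4 magic bytes")
    (out_size,) = struct.unpack("<I", data[8:12])
    return json.loads(lz4_block_decompress(data[12:], out_size))


def count_tabs(session):
    """Sum the tabs over all windows (assumed session JSON layout)."""
    return sum(len(w.get("tabs", [])) for w in session.get("windows", []))


# Usage, with a hypothetical path inside the profile directory:
# with open("sessionstore-backups/recovery.jsonlz4", "rb") as f:
#     print(count_tabs(read_mozlz4(f.read())))
```

One possible reason a plain `tail -c+9 | lz4cat` fails, offered as a guess rather than a confirmed diagnosis: the lz4 command-line tool reads the LZ4 frame format, while the payload here appears to be a bare block preceded only by its size.
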
23:15 ^🔗	Raccoon	I got tired of browser session management. use Session Buddy on chrome, prolly ff too
23:15 ^🔗	Raccoon	saved my ass many a time
23:28 ^🔗		qw3rty117 has joined #archiveteam-ot
23:34 ^🔗		qw3rty116 has quit IRC (Ping timeout: 612 seconds)
23:59 ^🔗	JAA	Apparently the problem is that lz4cat uses a different decompression routine than those tools. Specifically, it uses LZ4_decompress_safe, not LZ4_decompress_safe_partial. I think.
