Time |
Nickname |
Message |
00:00
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
01:04
🔗
|
|
ZizzyDizz has joined #archiveteam-ot |
01:04
🔗
|
ZizzyDizz |
Hello, I was wondering if anyone here has a way to archive a disqus channel? |
01:05
🔗
|
ZizzyDizz |
I just found out today they were getting deleted and there's two I really need to save in some capacity. |
01:05
🔗
|
ZizzyDizz |
And I don't have the bandwidth to do it myself on my home PC, as I have less than 600kbps. |
01:10
🔗
|
|
BlueMax has joined #archiveteam-ot |
01:22
🔗
|
|
killsushi has quit IRC (Read error: Connection reset by peer) |
01:26
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
01:38
🔗
|
|
nepeat has quit IRC (Read error: Connection reset by peer) |
01:39
🔗
|
|
nepeat has joined #archiveteam-ot |
01:46
🔗
|
|
nepeat has quit IRC (Quit: ZNC 1.7.4 - https://znc.in) |
01:47
🔗
|
ZizzyDizz |
I can't seem to get grab-site to respect --wpull-args= |
01:47
🔗
|
|
nepeat has joined #archiveteam-ot |
01:55
🔗
|
|
m007a83_ is now known as m007a83 |
02:42
🔗
|
markedL |
is disqus public or password protected? |
02:44
🔗
|
|
qw3rty115 has joined #archiveteam-ot |
02:49
🔗
|
|
qw3rty114 has quit IRC (Read error: Operation timed out) |
03:42
🔗
|
|
qw3rty116 has joined #archiveteam-ot |
03:47
🔗
|
|
qw3rty115 has quit IRC (Ping timeout: 612 seconds) |
04:00
🔗
|
|
lunik1 has quit IRC (:x) |
04:05
🔗
|
|
ZizzyDizz has quit IRC (Ping timeout: 260 seconds) |
04:10
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
07:18
🔗
|
|
Mateon1 has joined #archiveteam-ot |
09:13
🔗
|
JAA |
ZizzyDizz: Disqus is heavily JS-based, so you won't be able to get much with wpull/grab-site, wget, etc. |
09:28
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
10:33
🔗
|
|
tuluu has quit IRC (Read error: Connection refused) |
10:33
🔗
|
|
tuluu has joined #archiveteam-ot |
11:52
🔗
|
|
lunik1 has joined #archiveteam-ot |
12:29
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
12:59
🔗
|
|
killsushi has joined #archiveteam-ot |
15:40
🔗
|
|
h3ndr1k_ has quit IRC (Ping timeout: 252 seconds) |
15:53
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
17:18
🔗
|
|
h3ndr1k has joined #archiveteam-ot |
17:50
🔗
|
Raccoon |
How can I become a good steward once we get this 500/500 fiber installed |
17:50
🔗
|
Raccoon |
need a Quickstart Guide |
18:32
🔗
|
kiska |
Where do you live? |
18:32
🔗
|
kiska |
Can I put a rsync target on that connection? |
18:37
🔗
|
Raccoon |
I haven't even set up a box yet |
18:37
🔗
|
Raccoon |
will probably need to do some bouncy bouncy before renting to anyone who'll get us terminated for DMCA |
18:59
🔗
|
|
h3ndr1k has quit IRC (Quit: ) |
19:03
🔗
|
|
h3ndr1k has joined #archiveteam-ot |
19:41
🔗
|
|
ShellyRol has quit IRC (Ping timeout: 745 seconds) |
19:52
🔗
|
|
ShellyRol has joined #archiveteam-ot |
20:03
🔗
|
|
ZizzyDizz has joined #archiveteam-ot |
20:03
🔗
|
ZizzyDizz |
No markedL |
20:13
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
20:38
🔗
|
markedL |
ZizzyDizz: sounds like you might need to use chromebot. What's the URL so we can experiment? |
20:38
🔗
|
JAA |
Have fun with that. You'll need a *lot* of resources. |
20:38
🔗
|
JAA |
Also, archival talk should happen in -bs. |
20:38
🔗
|
markedL |
Raccoon: over quota is the bigger risk than DMCA with some of these folk |
20:38
🔗
|
JAA |
I've been grabbing it for a few hours. |
20:57
🔗
|
|
Hani111 has joined #archiveteam-ot |
21:08
🔗
|
|
Hani has quit IRC (Ping timeout: 745 seconds) |
21:08
🔗
|
|
Hani111 is now known as Hani |
21:23
🔗
|
Raccoon |
markedL, yeah, but i wonder if they have a quota for 500/500 on a business fiber |
21:24
🔗
|
Raccoon |
if we do this, i'm dripping every drop of that sht |
21:24
🔗
|
Raccoon |
5.15 TB/day |
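For reference, the daily figure can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the 500 Mbit/s link is saturated in one direction for a full 24 hours; the function name is made up for illustration.

```python
def daily_transfer_bytes(link_mbps: float, seconds: int = 86_400) -> float:
    """Bytes moved in `seconds` at a sustained rate of `link_mbps` megabits per second."""
    bits = link_mbps * 1_000_000 * seconds
    return bits / 8  # 8 bits per byte


total = daily_transfer_bytes(500)

# Decimal terabytes (10**12 bytes) and binary tebibytes (2**40 bytes).
print(f"{total / 1000**4:.2f} TB/day (decimal)")
print(f"{total / 1024**4:.2f} TiB/day (binary)")

# Starting from 62.5 MB/s and dividing by 1024 twice mixes decimal and binary units.
print(f"{500 / 8 * 86_400 / 1024**2:.2f} (MB/s scaled by 1024 twice)")
```

The quoted 5.15 appears to come from the mixed-unit variant on the last line; with consistent units the result is closer to 5.4 TB or 4.9 TiB per day.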
21:29
🔗
|
Raccoon |
I want to start an archive group called Going Postal, where we circulate 4 or 10 TB harddrives between high-bandwidth transmission lines and low-bandwidth archivists. |
21:31
🔗
|
JAA |
Which provider? |
21:32
🔗
|
Raccoon |
CenturyLink |
21:39
🔗
|
JAA |
FWIW, "The data usage limit applies to residential HSI. It does not apply to business-class HSI." from https://www.centurylink.com/aboutus/legal/internet-service-disclosure/full-version.html |
21:40
🔗
|
JAA |
(HSI = High Speed Internet) |
22:36
🔗
|
Kaz |
anyone know of an easy way to get current tab count in firefox? |
22:36
🔗
|
Kaz |
ideally without an addon |
22:42
🔗
|
|
BlueMax has joined #archiveteam-ot |
22:48
🔗
|
ivan_ |
are you setting up a prometheus metric on your tab addiction |
22:53
🔗
|
JAA |
You could parse the sessionstore file in the Firefox profile directory. |
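Once the session file has been decoded to plain JSON, counting tabs is a short loop. This is a rough sketch, assuming the decoded data has a top-level "windows" list whose entries each carry a "tabs" list; that layout is an assumption about the session format rather than a documented contract, and `count_tabs` is a hypothetical helper name.

```python
import json


def count_tabs(session: dict) -> int:
    """Sum the number of tabs over all windows in a decoded session object."""
    # Missing keys are treated as empty so a partial file does not crash the count.
    return sum(len(window.get("tabs", [])) for window in session.get("windows", []))


# Tiny stand-in for a decoded session: two windows holding 2 and 1 tabs.
example = json.loads('{"windows": [{"tabs": [{}, {}]}, {"tabs": [{}]}]}')
print(count_tabs(example))
```

The same function could be pointed at the JSON recovered from a file in sessionstore-backups after decompression.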
22:53
🔗
|
ivan_ |
https://superuser.com/questions/1363747/how-to-decode-decipher-mozilla-firefox-proprietary-jsonlz4-format-sessionstor |
22:54
🔗
|
JAA |
"Proprietary" *twitch* |
22:54
🔗
|
ivan_ |
I see only sessionstore-backups |
22:54
🔗
|
JAA |
Yup |
22:54
🔗
|
ivan_ |
and it's got some jsonlz4 junk that lz4cat can't read |
22:54
🔗
|
JAA |
They changed that a while ago. |
22:55
🔗
|
JAA |
It's not really "junk". One of the answers on SU explains it: "Unfortunately, due to a non-standard header, standard tools won't work. There's an open proposal to change that. Apparently the Mozilla header was devised before a standard lz4 frame format existed; it does wrap a standard lz4 block." |
22:56
🔗
|
ivan_ |
https://github.com/badboy/jsonlz4cat works |
22:59
🔗
|
markedL |
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/windows/getAll#Examples |
23:00
🔗
|
|
ShellyRol has quit IRC (Read error: Operation timed out) |
23:01
🔗
|
JAA |
Hmm, all those tools just skip the first 8 bytes and then decompress the rest. But then why doesn't tail -c+9 | lz4cat work? |
23:03
🔗
|
markedL |
dd can do a byte skip also |
23:04
🔗
|
JAA |
So? |
23:04
🔗
|
markedL |
Rephrased: I'd trust dd more than tail, but maybe that's not the issue |
23:05
🔗
|
JAA |
They're both fine. |
23:05
🔗
|
JAA |
And yes, definitely not the issue. |
23:05
🔗
|
ivan_ |
the size in the header might be different |
23:05
🔗
|
|
ShellyRol has joined #archiveteam-ot |
23:05
🔗
|
JAA |
Hmm |
23:08
🔗
|
Raccoon |
JAA: > "Residential Fiber Gigabit plans are also not subject to data usage limits." |
23:08
🔗
|
Kaz |
raaaah why can't this be simple |
23:08
🔗
|
JAA |
Mozilla likes to overengineer things. |
23:09
🔗
|
JAA |
Many things used to be simple. Adding a custom search engine was a simple modification of a .json file. Now it's essentially impossible. |
23:11
🔗
|
ivan_ |
I suspect some of these profile annoyances are intentional things designed to discourage other software from touching the profile |
23:11
🔗
|
ivan_ |
Chrome has a pretty crazy session format |
23:13
🔗
|
JAA |
If I'm reading https://github.com/badboy/jsonlz4cat/blob/master/src/main.rs correctly, it does a read(8) for the magic bytes, then a read(4) for the outsize, and then throws the rest into LZ4 decompression. Meanwhile https://gist.github.com/Tblue/62ff47bef7f894e92ed5 , which reportedly also works (haven't tested it), only skips the magic bytes. ¯\_(ツ)_/¯ |
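The layout described here (8 magic bytes, a 4-byte output size, then a raw LZ4 block) can be sketched in pure Python. This is an illustration under stated assumptions, not a port of either linked tool: the magic value `mozLz40\0` and the little-endian size field are assumptions about the file format, and both function names are hypothetical.

```python
import struct

MAGIC = b"mozLz40\0"  # assumed 8-byte magic at the start of a .jsonlz4 file


def _extra_length(src: bytes, i: int) -> tuple:
    """Read LZ4 length-extension bytes: keep adding until a byte below 255."""
    total = 0
    while True:
        b = src[i]
        i += 1
        total += b
        if b != 255:
            return total, i


def lz4_block_decompress(src: bytes) -> bytes:
    """Decode a raw LZ4 block (no frame header) into bytes."""
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        token = src[i]
        i += 1
        lit_len = token >> 4  # high nibble: literal length
        if lit_len == 15:
            extra, i = _extra_length(src, i)
            lit_len += extra
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:
            break  # the final sequence carries literals only
        offset = src[i] | (src[i + 1] << 8)  # 2-byte little-endian match offset
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        match_len = (token & 0x0F) + 4  # low nibble: match length minus 4
        if (token & 0x0F) == 15:
            extra, i = _extra_length(src, i)
            match_len += extra
        start = len(out) - offset
        for k in range(match_len):  # byte by byte, since matches may overlap
            out.append(out[start + k])
    return bytes(out)


def decode_mozlz4(data: bytes) -> bytes:
    """Check the magic, read the declared size, and decompress the remainder."""
    if data[:8] != MAGIC:
        raise ValueError("missing mozLz40 magic")
    (expected_size,) = struct.unpack_from("<I", data, 8)
    payload = lz4_block_decompress(data[12:])
    if len(payload) != expected_size:
        raise ValueError("decompressed size does not match header")
    return payload


# Hand-built sample: one sequence with five literal bytes and no match.
sample = MAGIC + struct.pack("<I", 5) + bytes([0x50]) + b"hello"
print(decode_mozlz4(sample))
```

Note that the payload starts at byte 12 in this layout, so skipping only the first 8 bytes would leave the size field at the front of the data handed to the decompressor.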
23:15
🔗
|
Raccoon |
I got tired of browser session management. use Session Buddy on chrome, prolly ff too |
23:15
🔗
|
Raccoon |
saved my ass many a time |
23:28
🔗
|
|
qw3rty117 has joined #archiveteam-ot |
23:34
🔗
|
|
qw3rty116 has quit IRC (Ping timeout: 612 seconds) |
23:59
🔗
|
JAA |
Apparently the problem is that lz4cat uses a different decompression routine than those tools. Specifically, it uses LZ4_decompress_safe, not LZ4_decompress_safe_partial. I think. |