#archiveteam-ot 2019-12-22,Sun

Time Nickname Message
00:05 🔗 Frogging When I had a monthly transfer limit my ISP would inject a warning into HTML pages when I was close to the limit
00:06 🔗 Frogging Which is "evil" because they're modifying stuff in transit, but I guess it's not malicious. Just something I'd really rather they did not do
00:06 🔗 Frogging It's moot now since I don't have a transfer cap anymore
00:07 🔗 JAA Yeah, that's one that was mentioned in the earlier discussion in #warrior.
00:07 🔗 Frogging I have a considerably lower opinion of the practice of redirecting NXDOMAINs to ad serving pages
00:08 🔗 Frogging That can fuck right off
00:13 🔗 Frogging although for the HTML injection, it does bother me in that it implies that they have created infrastructure to parse and modify data in transit
00:14 🔗 Myself Yeah. Last time they did that, I saved a copy of the page, it's sitting around here somewhere.
00:15 🔗 Myself I was thinking it'd be interesting to pull some signatures out of that and have some sort of proactive detection of such hijinks, instead of relying on users' caution.
00:16 🔗 JAA Yes, in fact, I asked for exactly that in #warrior before so we can add this.
00:16 🔗 Frogging I think I've seen some warrior scripts that have checks of some kind for tampering
00:16 🔗 Frogging I don't remember anything that matters about it though
00:16 🔗 JAA Yes, but it's only very basic checks.
00:16 🔗 JAA Covers WiFi portals for example.
00:17 🔗 Frogging If better checks could be implemented in the Warrior itself that all projects can benefit from, that'd be neat
00:17 🔗 JAA It simply queries the IPs for a few well-known hosts (Twitter, Facebook, Google, and a few others) and verifies that they're all different.
00:17 🔗 Frogging ah
00:17 🔗 Frogging yeah, I didn't remember what it actually did
00:17 🔗 JAA Well, it shouldn't be in the warrior, but yes.
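A minimal sketch of the check JAA describes above: resolve a few well-known hosts and confirm that no two of them share an address, since a WiFi portal or hijacking resolver tends to answer every name with the same IP. The host list and the function names are assumptions for illustration; this is not the actual pipeline code.

```python
import socket
from typing import Dict, Iterable, Set

# Hypothetical host list; the real scripts may query different names.
WELL_KNOWN_HOSTS = ["twitter.com", "facebook.com", "google.com"]


def resolve_all(hosts: Iterable[str]) -> Dict[str, Set[str]]:
    """Resolve each host to its set of IPv4 addresses (needs network access)."""
    result = {}
    for host in hosts:
        _, _, addresses = socket.gethostbyname_ex(host)
        result[host] = set(addresses)
    return result


def addresses_all_distinct(resolved: Dict[str, Set[str]]) -> bool:
    """Return True if every host resolved and no IP is shared between hosts."""
    seen: Set[str] = set()
    for addresses in resolved.values():
        # An empty answer or an address already seen for another host is suspicious.
        if not addresses or seen & addresses:
            return False
        seen |= addresses
    return True
```

On a machine with network access, `addresses_all_distinct(resolve_all(WELL_KNOWN_HOSTS))` would be expected to return `True` on an untampered connection. As noted in the discussion, this only catches blanket redirection, not occasional injection into individual pages.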
00:19 🔗 jamiew has joined #archiveteam-ot
00:21 🔗 Frogging The Warrior couldn't fetch some URLs and check for tampering?
00:21 🔗 JAA The warrior is only the VM.
00:21 🔗 Myself Well, the once-in-a-blue-moon interception wouldn't happen very often, you'd need to grep for the tamper signatures in each warc, I think
00:22 🔗 JAA Those checks you mentioned are currently in the pipeline scripts, but it might be a good idea to move them to seesaw.
00:22 🔗 Frogging ah, right
00:22 🔗 Frogging I meant the thing that runs in the warrior that orchestrates running jbos
00:22 🔗 Frogging jobs*
00:22 🔗 JAA Yep, that's seesaw.
00:22 🔗 Myself like if you get the bandwidth warning or something, you can't trigger that by just requesting a page
00:22 🔗 JAA Hmm, it's not injected into every page?
00:23 🔗 Myself no, when I used to get the maintenance reminder, it'd come up in like one or two pages and the rest of my session was unmolested
00:23 🔗 Frogging Depends on how the ISP implements it
00:23 🔗 JAA Right
00:24 🔗 JAA Ok, better solution: always retrieve everything over HTTPS, and if something requires HTTP, only run that on trusted machines/connections.
00:24 🔗 Frogging Trusted how?
00:24 🔗 Myself then you need a way to track trustedness, but yeah
00:24 🔗 JAA Well, connected over providers that aren't doing this shit.
00:25 🔗 JAA If a datacentre started doing this, they'd go bankrupt in days.
00:25 🔗 JAA So that's a start.
00:25 🔗 JAA Basically, no consumer ISPs.
00:26 🔗 JAA Although there are of course ones that actually do their job properly.
00:27 🔗 Frogging My ISP would be one of those that does this (unless they don't do it anymore), but they don't do it to *me* since I don't have a transfer cap
00:27 🔗 Myself yeah, I avoided running a warrior for years because I wasn't confident that I'd completely opted out of that stuff -- there's no confirmation -- but it's been years since I've seen it so I think I'm okay.
00:27 🔗 Frogging And yeah, I also haven't seen it in years
00:28 🔗 JAA Depending on what you do on the internet, you might browse 99+ % over HTTPS anyway.
00:30 🔗 Frogging Looking at my history, I definitely regularly browse some non-HTTPS pages
00:34 🔗 JAA There are other solutions to this, namely we could have a system that retrieves everything multiple times from different locations, then verifies that they're the same and punishes those that provide mismatching responses.
00:34 🔗 Frogging That's how BOINC works
00:34 🔗 JAA But we normally don't have time to do things multiple times here.
00:35 🔗 JAA Yeah, that YouTube annotation archival project did it as well I think.
00:35 🔗 JAA (That wasn't AT.)
00:36 🔗 OrIdow6 That would require some nuance for checking if things are the "same" in some cases - random tracking IDs or element IDs (to prevent adblocking), apparently random order of some page elements (Youtube in some cases), etc.
00:37 🔗 Frogging this is getting real complicated :p
00:37 🔗 JAA Yep, that as well.
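The multi-location idea above could be sketched roughly as follows: hash each worker's copy of the same URL, treat the most common digest as the reference, and flag the workers that disagree. The `normalize` hook is where the nuance OrIdow6 mentions (random tracking IDs, reordered elements) would have to live. All names here are hypothetical, and this is not how any existing Archive Team tool works.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Set


def find_mismatching(
    responses: Dict[str, bytes],
    normalize: Callable[[bytes], bytes] = lambda body: body,
) -> Set[str]:
    """Return the workers whose (normalized) response differs from the most common one."""
    digests = {
        worker: hashlib.sha256(normalize(body)).hexdigest()
        for worker, body in responses.items()
    }
    if not digests:
        return set()
    # On a tie, Counter picks the digest that was encountered first.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return {worker for worker, digest in digests.items() if digest != majority_digest}
```

For example, with two workers returning `b"<html>ok</html>"` and a third returning the same page plus an injected script, only the third worker is flagged. A majority vote needs at least three fetches per URL to be meaningful, which is the cost JAA points out.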
00:43 🔗 OrIdow6 If it could be made to work, the insecure-to-datacenter approach would probably cover most cases; the only thing you'd have to worry about (I think) would be outright disconnection as opposed to injection
00:44 🔗 JAA Is that a thing?
00:44 🔗 Frogging Depends if you could get enough people running the scripts in a datacentre
00:45 🔗 JAA That hasn't been a problem lately.
00:45 🔗 JAA Just ping yano or Fusl, and they'll throw a couple thousand workers at it. lol
00:45 🔗 Frogging No, not really. A server provider won't ban you for scraping a website unless they get abuse reports
00:45 🔗 Frogging never happened to me
00:45 🔗 pnJay I don't think we've had capacity problems on OUR side of things for a while o_o
00:45 🔗 Frogging ah, well that's good then
00:45 🔗 JAA Yeah, the target server might ban the IP or similar, but that's not unique to DC connections.
00:46 🔗 JAA Well... We haven't had capacity issues on the worker side of things.
00:46 🔗 JAA Targets are a different matter.
00:46 🔗 endrift I have a setup now that lets me spin up hundreds of workers too :D
00:46 🔗 pnJay wibblywobbly targetywargety things
00:46 🔗 Frogging that's what I thought pnJay meant in the first place
00:47 🔗 endrift Is there some way I can keep abreast of which projects need bunches of warriors without checking the wiki or something every day
00:47 🔗 JAA Project launches are normally announced in #archiveteam, but other than that, you'd have to keep an eye on the project channel.
00:48 🔗 pnJay jasons twitter, i guess
00:48 🔗 JAA Not really, no.
00:48 🔗 endrift pnJay: that's how I usually find out about these things
00:48 🔗 endrift JAA: well given how low traffic that channel is that kinda works for me
00:49 🔗 JAA Jason's tweets are fine to find out that something needs to be archived, but not for "does this project need workers?" since he's typically not at all involved in the actual project.
00:49 🔗 endrift last I checked I didn't get billed for all that overage bandwidth I used on Vultr either <_<
00:49 🔗 endrift but I'll find out in a week and a half
00:50 🔗 endrift I can also just leave more workers running on my home server if that help
00:50 🔗 endrift right now they're mostly just idle or URLteam
00:50 🔗 endrift *helps
00:50 🔗 endrift I have one warrior (6 threads) up atm
00:51 🔗 endrift JAA: better than nothing. I can jump in the channel and gauge from there
01:32 🔗 DogsRNice has quit IRC (Ping timeout: 276 seconds)
01:32 🔗 DogsRNice has joined #archiveteam-ot
02:09 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
02:40 🔗 SoraUta has joined #archiveteam-ot
02:45 🔗 ivan IPv6 on digitalocean: "We support a maximum of 16 addresses (a subnet mask of /124 ) per Droplet. Additional addresses are not available. "
02:48 🔗 JAA Hey, still better than OVH, where you get a single address.
02:48 🔗 JAA (On VPS)
02:48 🔗 icedice has quit IRC (Quit: Leaving)
02:49 🔗 ivan wow
03:05 🔗 cerca has quit IRC (Remote host closed the connection)
03:43 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:50 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
03:51 🔗 ShellyRol has joined #archiveteam-ot
03:51 🔗 LowLevelM has quit IRC (Quit: Ping timeout (120 seconds))
03:51 🔗 LowLevelM has joined #archiveteam-ot
04:36 🔗 kiska Or any provider at colocrossing where you get 0 ipv6
04:47 🔗 qw3rty2 has joined #archiveteam-ot
04:56 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
05:45 🔗 fuzzy802 has joined #archiveteam-ot
05:46 🔗 superkuh has quit IRC (Excess Flood)
05:46 🔗 nyany has quit IRC (Read error: Operation timed out)
05:47 🔗 voltagex has quit IRC (Ping timeout: 262 seconds)
05:48 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
05:48 🔗 superkuh has joined #archiveteam-ot
05:49 🔗 VADemon_ has joined #archiveteam-ot
05:49 🔗 arkiver has quit IRC (Ping timeout: 360 seconds)
05:49 🔗 swebb has quit IRC (Read error: Operation timed out)
05:49 🔗 swebb has joined #archiveteam-ot
05:50 🔗 fuzzy802 has quit IRC ()
05:50 🔗 fuzzy8021 has joined #archiveteam-ot
05:51 🔗 arkiver has joined #archiveteam-ot
05:51 🔗 svchfoo3 sets mode: +o arkiver
05:51 🔗 svchfoo1 sets mode: +o arkiver
05:51 🔗 ShellyRol has quit IRC (Read error: Operation timed out)
05:52 🔗 chfoo has quit IRC (Ping timeout: 360 seconds)
05:53 🔗 ShellyRol has joined #archiveteam-ot
05:54 🔗 Igloo has quit IRC (Read error: Connection reset by peer)
05:55 🔗 nyany has joined #archiveteam-ot
05:55 🔗 voltagex has joined #archiveteam-ot
05:56 🔗 Igloo has joined #archiveteam-ot
05:56 🔗 chfoo has joined #archiveteam-ot
05:57 🔗 svchfoo3 sets mode: +o chfoo
05:57 🔗 svchfoo1 sets mode: +o chfoo
05:59 🔗 VADemon has quit IRC (Read error: Operation timed out)
06:01 🔗 jamiew has quit IRC (Textual IRC Client: www.textualapp.com)
06:12 🔗 jamiew has joined #archiveteam-ot
06:16 🔗 jamiew has quit IRC (Read error: Operation timed out)
06:17 🔗 bluefoo has joined #archiveteam-ot
06:17 🔗 jamiew has joined #archiveteam-ot
07:43 🔗 ShellyRol has quit IRC (Ping timeout: 610 seconds)
07:54 🔗 ShellyRol has joined #archiveteam-ot
08:27 🔗 schbirid has joined #archiveteam-ot
08:33 🔗 LowLevelM has quit IRC (Read error: Operation timed out)
08:36 🔗 wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
08:38 🔗 LowLevelM has joined #archiveteam-ot
08:44 🔗 wp494 has joined #archiveteam-ot
08:50 🔗 killsushi has quit IRC (Leaving)
08:56 🔗 BlueMaxim has joined #archiveteam-ot
09:09 🔗 BlueMax has quit IRC (Ping timeout: 745 seconds)
09:44 🔗 Atom-- has joined #archiveteam-ot
09:50 🔗 Atom__ has quit IRC (Read error: Operation timed out)
10:06 🔗 dhyan_nat has joined #archiveteam-ot
11:15 🔗 BlueMax has joined #archiveteam-ot
11:24 🔗 BlueMax has quit IRC (Ping timeout: 276 seconds)
11:25 🔗 BlueMax has joined #archiveteam-ot
11:26 🔗 BlueMaxim has quit IRC (Ping timeout: 745 seconds)
11:27 🔗 DigiDigi has quit IRC (Remote host closed the connection)
11:32 🔗 cerca has joined #archiveteam-ot
11:55 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:55 🔗 BlueMax has joined #archiveteam-ot
12:40 🔗 ola_norsk has joined #archiveteam-ot
12:41 🔗 VoynichCr Archiving is futile https://www.youtube.com/watch?v=uD4izuDMUQA
12:42 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
12:42 🔗 ola_norsk When archiving a driver for a USB oscilloscope, i happened to notice one of the vendor's software/driver installer is said by WinXP to be corrupted. The file in question https://www.linkinstruments.com/mso19_3_setup_web_32bit.exe
12:43 🔗 ola_norsk is there any way to verify that it's not simply something awry with my virtualbox?
12:43 🔗 ola_norsk i've tried downloading the file several times, and unfortunately i don't have a "real" windows box to try it on
12:44 🔗 ola_norsk item in question: https://archive.org/details/Link_Instruments_MSO-19_usb_oscilloscope_software_2019
12:44 🔗 ola_norsk it's such a dated file, i would think the manufacturer should have noticed and fixed it by now if it was indeed a corrupted file
12:46 🔗 schbirid tried wine?
12:50 🔗 ola_norsk schbirid: hmmm..weird. Through wine the same installer actually launches
12:50 🔗 ola_norsk i'm guessing it's just an XP problem then
12:53 🔗 OrIdow6 Works for me in XP SP3 in QEMU
12:54 🔗 ola_norsk OrIdow6: here's the message i get when trying to run it https://i.imgur.com/WhqgdAo.png
12:55 🔗 ola_norsk but i take it the file is fine then, and the shit is with my vm
12:55 🔗 Raccoon ola_norsk: download HashCheck Shell Extension so we can compare the file hash
12:56 🔗 ola_norsk Raccoon: couldn't md5sum or shasum do that as well?
12:56 🔗 Raccoon sure
12:57 🔗 Raccoon File: mso19_3_setup_web_32bit.exe CRC-32: c4f17baf MD5: b4332bf1a3f361ba1ad7b176d39b6be4 SHA-1: c7434ba7468d058d978051717853c67509e496c7
12:57 🔗 ola_norsk md5sum say b4332bf1a3f361ba1ad7b176d39b6be4
12:58 🔗 OrIdow6 Likewise
12:58 🔗 Raccoon as long as that's the file on your xp machine, then it's not corruption
12:58 🔗 ola_norsk something wonky with the winxp then, whatever it could be
12:59 🔗 Raccoon did you perform the hash check within xp
12:59 🔗 ola_norsk no i just md5sum'ed it from the same virtualbox shared folder
12:59 🔗 Raccoon just wondering if xp sees the file correctly, too
13:00 🔗 ola_norsk i'll try the HashCheck Shell Extension thingy now within the vm
13:00 🔗 OrIdow6 Perhaps also move it from the shared folder to the native VM drive
13:10 🔗 ola_norsk odd, the same file through the WinXP, it states the md5 to be 55b4d3ee442371e2bc25e19eeeb5b762 ..
13:11 🔗 OrIdow6 The shell plugin keeps b4332bf1a3f361ba1ad7b176d39b6be4 for me
13:11 🔗 ola_norsk anywho, as long Raccoon got the same as in https://ia601408.us.archive.org/18/items/Link_Instruments_MSO-19_usb_oscilloscope_software_2019/Link_Instruments_MSO-19_usb_oscilloscope_software_2019_files.xml , i'll go with the file being actually fine and that it's my vm that is shit
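For reference, the comparison done above with md5sum and HashCheck can also be reproduced with a short script. This is a sketch assuming Python is available on the machine being checked; the function name is made up for illustration.

```python
import hashlib
import zlib
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute CRC-32, MD5 and SHA-1 of a file in one pass, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }
```

Running `file_digests("mso19_3_setup_web_32bit.exe")` on the host and again inside the VM, then comparing the `md5` value against the `b4332bf1a3f361ba1ad7b176d39b6be4` that Raccoon posted, would show on which side the file differs.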
13:14 🔗 ola_norsk has quit IRC (Quit: leaving)
13:27 🔗 MrRadar has joined #archiveteam-ot
13:28 🔗 yano JAA: that's my speciality :3
13:28 🔗 yano i'm a little competitive
13:30 🔗 MrRadar has quit IRC (Read error: Operation timed out)
13:51 🔗 dhyan_nat has joined #archiveteam-ot
14:00 🔗 oxguy3 has joined #archiveteam-ot
14:01 🔗 oxguy3 hey anyone ever run into archive.org's web uploader rejecting a PDF file? getting 400 Bad Data with the error "Syntax error detected in pdf data. You may be able to repair the pdf file with a repair tool, pdftk is one such tool."
14:02 🔗 oxguy3 the pdf opens fine on my PC -- i tried opening it in Acrobat Pro and resaving it, which made a file with a different md5sum but apparently still one that archive.org didn't like. i'm on mac so can't get pdftk
14:03 🔗 schbirid is it better to use the first 256 bits of a sha512 or is that no better than just a sha256?
14:08 🔗 LowLevelM In my experience, archive.org's web uploader sucks. It almost never works.
14:09 🔗 oxguy3 i've been uploading a ton of pdfs without issue. this error i presume is on the backend (that error message actually comes from some raw XML that i had to hit "show details" to view)
14:12 🔗 oxguy3 i worked around it by zipping the PDF. not ideal since it means no derivative files get generated, but better than no file at all. if anyone cares to take a crack at figuring out the issue... https://archive.org/details/seattlesoundersfc2017mediaguide
14:32 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
14:41 🔗 eientei95 oxguy3: You should be able to install pdftk via Homebrew https://brew.sh however, even after fixing the PDF with pdftk, I still get that same error on the fixed file
14:42 🔗 oxguy3 oh good to know, thanks
14:45 🔗 oxguy3 i also tried ilovepdf.com's repair tool, which curiously knocked about 1MB off the file size, but also failed to resolve the error. bizarre issue...
14:47 🔗 eientei95 yup, errors even via using the command-line tool
15:17 🔗 SoraUta has quit IRC (Read error: Operation timed out)
15:28 🔗 oxguy3 damn, it happened again -- another supposedly corrupt PDF that opens fine on my computer https://archive.org/details/dcunited2017mediaguide
16:03 🔗 oxguy3 has quit IRC (My MacBook has gone to sleep. ZZZzzz…)
17:19 🔗 MrRadar has joined #archiveteam-ot
17:25 🔗 Frogging schbirid: "SHA-256 and SHA-512 are novel hash functions computed with 32-bit and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds."
17:25 🔗 Frogging https://en.wikipedia.org/wiki/SHA-2
17:26 🔗 Frogging not really sure what "rounds" means
17:28 🔗 Frogging I think for the purposes of file verification, sha256 and sha512 truncated to 256 would be the same
17:29 🔗 Frogging well, not the same outputs, but the same usefulness
17:39 🔗 JAA 256 bits of the SHA-512 hash is slightly safer than the SHA-256 hash because it prevents length extension attacks and some other things. Whether that matters depends on what you're trying to do, obviously.
17:43 🔗 JAA Also, on decently modern machines, SHA-512 is a bit faster than SHA-256 due to the 64-bit arithmetics.
17:47 🔗 schbirid nice
17:48 🔗 VerifiedJ has joined #archiveteam-ot
17:48 🔗 wp494 has quit IRC (Ping timeout: 745 seconds)
17:49 🔗 wp494 has joined #archiveteam-ot
17:52 🔗 yano or you could use sha3-256 or sha3-256 via the command `rhash`
17:52 🔗 yano or sha3-512
17:52 🔗 yano for example: `rhash --sha3-512 myfile.txt`
18:02 🔗 JAA At least the OpenSSL implementation is a fair bit slower than SHA-2 though.
18:02 🔗 markedL oxguy3 , that pdf has issues. what's the original link?
18:04 🔗 JAA type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
18:04 🔗 JAA sha512 34398.78k 137441.77k 276030.81k 438686.72k 544983.72k 569671.68k
18:04 🔗 JAA sha3-512 25492.26k 100461.65k 125587.97k 146192.04k 168471.21k 167242.83k
18:04 🔗 JAA Ew
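The table above appears to be `openssl speed` output, with throughput in thousands of bytes per second for each block size. A rough way to repeat the comparison from Python is sketched below; the function name and default sizes are assumptions, and because Python's `hashlib` may use a different SHA-3 implementation than the OpenSSL command-line tool, the numbers are not directly comparable to the ones JAA pasted.

```python
import hashlib
import time


def throughput_mb_per_s(
    name: str, block_size: int = 16384, total_bytes: int = 8 * 1024 * 1024
) -> float:
    """Hash total_bytes of zeros in block_size pieces and return MB/s (10**6 bytes)."""
    block = b"\x00" * block_size
    hasher = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(total_bytes // block_size):
        hasher.update(block)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6
```

Calling it as `throughput_mb_per_s("sha512")` and `throughput_mb_per_s("sha3_512")` gives two figures to compare; results vary with the machine and the library build.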
18:07 🔗 Dallas has joined #archiveteam-ot
18:21 🔗 DigiDigi has joined #archiveteam-ot
18:36 🔗 schbirid has quit IRC (Quit: Leaving)
18:53 🔗 apache2_ Frogging: rounds has to do with the number of times the algorithm runs internally to produce a final result
18:54 🔗 apache2_ the fips standard for SHA-512 standardized two algorithms, SHA-512 and "SHA-512/t", the latter uses a different initialization vector (a lot less sketchy than the one for SHA-512 and SHA-256) and truncates the final output
18:55 🔗 apache2_ what JAA was describing before is SHA-512/256 (SHA-512/t with t := 256), which has become a popular algorithm with followers of daniel bernstein
18:56 🔗 JAA Actually, not exactly. SHA-512/256 is specifically the *first* 256 bits of the SHA-512 hash.
18:56 🔗 apache2_ if SHA-512/256 is not available to you (it frequently is not), you could consider using SHA-384 instead, which also truncates its output and doesn't lend itself to length extension attacks trivially either
18:58 🔗 JAA But you could really keep any 256 bit for an equivalent security level. (Not that there's any good reason to do so though.)
19:00 🔗 apache2_ ah, calling the 256 leftmost bits of sha-512 a "sha-512/256" hash is wildly misleading
19:00 🔗 JAA Well yeah, it's not the same due to the different IV.
19:02 🔗 apache2_ :-(
19:02 🔗 apache2_ fucking cryptographers
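The distinction settled above can be checked directly: the leftmost 256 bits of a plain SHA-512 digest are not the same value as SHA-512/256, because the latter starts from a different initialization vector. The sketch below assumes Python's `hashlib`; the `sha512_256` algorithm name is only present when the underlying OpenSSL build provides it, so the helper returns `None` otherwise. Function names are made up for illustration.

```python
import hashlib
from typing import Optional


def truncated_sha512(data: bytes) -> bytes:
    """First 256 bits (32 bytes) of the plain SHA-512 digest."""
    return hashlib.sha512(data).digest()[:32]


def sha512_256(data: bytes) -> Optional[bytes]:
    """Real SHA-512/256, or None if this Python/OpenSSL build lacks it."""
    try:
        return hashlib.new("sha512_256", data).digest()
    except ValueError:
        return None
```

Where `sha512_256` is available, `truncated_sha512(b"abc")` and `sha512_256(b"abc")` are both 32 bytes long but differ, and both differ from `hashlib.sha256(b"abc").digest()`.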
20:07 🔗 Dj-Wawa has quit IRC (Dj-Wawa)
20:08 🔗 Dj-Wawa has joined #archiveteam-ot
20:09 🔗 oxguy3 has joined #archiveteam-ot
20:21 🔗 oxguy3 does anyone know of any tools that can download the original PDFs from issuu/scribd without a subscription? there's a couple of free sites out there that will rip PDFs, but they're actually just generating new PDFs from JPGs which means you have to OCR them
20:29 🔗 X-Scale` has joined #archiveteam-ot
20:34 🔗 X-Scale has quit IRC (Ping timeout: 610 seconds)
20:34 🔗 X-Scale` is now known as X-Scale
20:44 🔗 SoraUta has joined #archiveteam-ot
20:44 🔗 tuluu has quit IRC (Ping timeout: 276 seconds)
20:49 🔗 oxguy3 has quit IRC (Ping timeout: 246 seconds)
21:20 🔗 VerifiedJ has quit IRC (Quit: Leaving)
21:25 🔗 oxguy3 has joined #archiveteam-ot
21:28 🔗 benjins has quit IRC (Read error: Connection reset by peer)
21:30 🔗 benjins has joined #archiveteam-ot
21:31 🔗 benjins has quit IRC (Remote host closed the connection)
21:33 🔗 benjins has joined #archiveteam-ot
22:48 🔗 oxguy3 has quit IRC (My MacBook has gone to sleep. ZZZzzz…)
22:48 🔗 oxguy3 has joined #archiveteam-ot
22:49 🔗 oxguy3 has quit IRC (Client Quit)
22:49 🔗 oxguy3 has joined #archiveteam-ot
22:49 🔗 oxguy3 has quit IRC (Client Quit)
22:50 🔗 oxguy3 has joined #archiveteam-ot
22:50 🔗 oxguy3 has quit IRC (Client Quit)
22:51 🔗 oxguy3 has joined #archiveteam-ot
22:51 🔗 oxguy3 has quit IRC (Client Quit)
22:52 🔗 oxguy3 has joined #archiveteam-ot
22:52 🔗 oxguy3 has quit IRC (Client Quit)
22:53 🔗 oxguy3 has joined #archiveteam-ot
22:53 🔗 oxguy3 has quit IRC (Client Quit)
22:54 🔗 oxguy3 has joined #archiveteam-ot
22:54 🔗 oxguy3 has quit IRC (Client Quit)
22:55 🔗 oxguy3 has joined #archiveteam-ot
22:55 🔗 oxguy3 has quit IRC (Client Quit)
23:12 🔗 BlueMax has joined #archiveteam-ot
23:33 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
