[00:00] *** ats has quit IRC (Read error: Operation timed out)
[00:01] 4chan does this a lot. also with sticking rar files at the end of shockwave flash .swf files
[00:06] and i've stuck zip files inside GIF89a animated gifs
[00:06] *** ats has joined #archiveteam-ot
[00:17] *** phirephly has joined #archiveteam-ot
[00:47] *** killsushi has joined #archiveteam-ot
[01:23] @Raccoon There's a guy on Twitter, angealbertini, who specializes in making this kind of chimera (polyglot). There's also a zine (PoC || GTFO) that heavily embraces this: each issue is not only a PDF, but a number of other things as well
[01:26] Yeah, PoC||GTFO is lovely.
[01:28] *** yawkat has quit IRC (Ping timeout: 246 seconds)
[01:34] dashcloud, will check it out :) indeed, lots of crafty tricks out there, heavily dependent on the software of the day though. any of the big web services typically validate / re-encode everything to prevent this.
[01:35] don't want someone's webforum avatar to contain the text phrase "startkeylogger" for instance
[01:52] *** yawkat has joined #archiveteam-ot
[02:25] *** SynMonger has quit IRC (Quit: Wait, what?)
[02:26] *** SynMonger has joined #archiveteam-ot
[02:28] *** SynMonger has quit IRC (Client Quit)
[02:29] *** SynMonger has joined #archiveteam-ot
[02:37] *** SynMonger has quit IRC (Wait, what?)
[02:37] *** SynMonger has joined #archiveteam-ot
[02:43] *** SynMonger has quit IRC (Quit: Wait, what?)
[02:44] *** SynMonger has joined #archiveteam-ot
[02:46] *** SynMonger has quit IRC (Client Quit)
[02:48] *** SynMonger has joined #archiveteam-ot
[03:18] *** manjaro-u has quit IRC (Ping timeout: 610 seconds)
[04:00] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[04:45] *** qw3rty has joined #archiveteam-ot
[04:48] *** odemg has quit IRC (Ping timeout: 745 seconds)
[04:53] *** X-Scale` has joined #archiveteam-ot
[04:55] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[04:55] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[04:55] *** X-Scale` is now known as X-Scale
[06:56] *** X-Scale` has joined #archiveteam-ot
[06:56] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[06:56] *** X-Scale` is now known as X-Scale
[07:02] *** nataraj has joined #archiveteam-ot
[07:12] *** deevious has joined #archiveteam-ot
[07:19] *** Atom-- has joined #archiveteam-ot
[07:23] *** Atom__ has quit IRC (Read error: Operation timed out)
[07:25] *** katocala has quit IRC (Read error: Operation timed out)
[07:25] *** katocala has joined #archiveteam-ot
[07:51] *** eythian_ has joined #archiveteam-ot
[07:51] *** MrRadar has quit IRC (Read error: Operation timed out)
[08:01] *** superkuh has quit IRC (Excess Flood)
[08:02] *** superkuh has joined #archiveteam-ot
[08:02] *** MrRadar has joined #archiveteam-ot
[08:04] *** eythian has quit IRC (Read error: Operation timed out)
[09:07] Where can I learn about writing page scraper scripts to collect metadata from a site?
[09:33] Not sure if that's what you mean, but last time I had to scrape webpages I used python requests and BeautifulSoup.
[09:44] I think some people here use chrome extensions
[09:44] which would be ideal for my uses if it could scrape while naturally visiting the specific pages
[09:44] Yeah, BeautifulSoup will not execute JavaScript.
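Editor's note: a minimal sketch of the kind of metadata scraper being asked about, under stated assumptions. It uses only the Python standard library, with html.parser standing in for BeautifulSoup and urllib.request standing in for requests, so it is not the exact stack mentioned in the channel. The names MetadataParser, extract_metadata and scrape are hypothetical, and "metadata" is assumed here to mean the page title plus any meta tags. Like BeautifulSoup, it does not execute JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... / property=...> content values."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Plain meta tags use "name"; Open Graph style tags use "property".
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.metadata[key] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.metadata["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html):
    """Return a dict of metadata found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


def scrape(url):
    """Fetch one page and return its metadata (no JavaScript is executed)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_metadata(html)
```

To cover a list of pages, scrape() could be called once per line of a plain-text file of URLs, one URL per line.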
[09:45] but probably more realistic if I just did wget -i linklist.txt
[10:00] *** BlueMax has quit IRC (Quit: Leaving)
[12:09] *** robogoat has quit IRC (Read error: Operation timed out)
[12:09] *** robogoat has joined #archiveteam-ot
[12:42] *** killsushi has quit IRC (Quit: Leaving)
[13:21] *** mls has quit IRC (Ping timeout: 258 seconds)
[13:32] *** SketchCow has joined #archiveteam-ot
[13:32] *** Fusl sets mode: +o SketchCow
[13:32] *** Fusl__ sets mode: +o SketchCow
[13:32] *** Fusl_ sets mode: +o SketchCow
[13:53] *** mls has joined #archiveteam-ot
[13:54] if you are writing from scratch and want to run as an extension while browsing as a human, you would google: html selectors in javascript
[13:58] for the extension part, flip through the extension store for an extension with similar functionality already, then repurpose the code
[14:05] if you want to use python+beautiful soup it might be possible with an HTTP proxy like squid or warcprox
[14:17] *** mls has quit IRC (Read error: Connection reset by peer)
[14:17] *** mls has joined #archiveteam-ot
[14:22] *** mls has quit IRC (Ping timeout: 258 seconds)
[14:41] *** deevious has quit IRC (Quit: deevious)
[15:12] *** deevious has joined #archiveteam-ot
[15:40] *** nataraj has quit IRC (Read error: Operation timed out)
[17:40] *** slyphic has quit IRC (Read error: Operation timed out)
[17:44] *** slyphic has joined #archiveteam-ot
[17:52] *** nataraj has joined #archiveteam-ot
[18:33] Huh, so Sumo Logic being acquired by JASK; the acquired company has relations with Pokemon apparently: https://www.sumologic.com/
[18:37] *** manjaro-u has joined #archiveteam-ot
[19:14] *** bluefoo has quit IRC (Quit: bluefoo)
[19:42] https://variety.com/2019/film/news/project-silica-superman-warner-bros-microsoft-1203390459/
[19:57] latest digitize tapes: https://www.patreon.com/posts/digitize-tapes-31312387
[21:24] https://spectrum.ieee.org/computing/it/the-lost-picture-show-hollywood-archivists-cant-outpace-obsolescence
[21:30] *** BlueMax has joined #archiveteam-ot
[21:42] *** nataraj has quit IRC (Read error: Operation timed out)
[23:27] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
[23:27] *** markedL has quit IRC (Quit: The Lounge - https://thelounge.chat)
[23:31] *** bluefoo has joined #archiveteam-ot
[23:35] *** asdf0101 has joined #archiveteam-ot
[23:35] *** markedL has joined #archiveteam-ot
[23:42] *** katocala has quit IRC ()