Time |
Nickname |
Message |
00:00
🔗
|
|
ats has quit IRC (Read error: Operation timed out) |
00:01
🔗
|
Raccoon |
4chan does this a lot. also with sticking rar files at the end of shockwave flash .swf files |
00:06
🔗
|
Raccoon |
and i've stuck zip files inside GIF89a animated gifs |
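A rough sketch of the trick Raccoon describes, not anyone's actual tooling: image viewers stop reading a GIF at its trailer byte, while ZIP readers locate the archive from the end of the file, so a ZIP appended to a GIF is readable both ways. The helper names (`make_gif_zip_polyglot`, `list_hidden_members`) and the `note.txt` payload are made up for illustration; only the Python standard library is used.

```python
import base64
import io
import zipfile

# A commonly used 1x1 transparent GIF89a image, base64-encoded.
TINY_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def make_gif_zip_polyglot(gif_bytes, members):
    """Return gif_bytes followed by a ZIP archive holding `members`.

    `members` maps archive file names to their byte contents.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    # Plain concatenation: the GIF part is untouched, the ZIP rides at the end.
    return gif_bytes + archive.getvalue()


def list_hidden_members(blob):
    """Read the appended ZIP back out of the combined file."""
    # zipfile finds the central directory from the end of the data and
    # compensates for the bytes that precede the archive.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


polyglot = make_gif_zip_polyglot(TINY_GIF, {"note.txt": b"hello from inside a GIF"})
hidden = list_hidden_members(polyglot)
```

A service that re-encodes uploaded images, as mentioned later in the log, would drop the appended archive, because only the decoded pixels survive the re-encode.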
00:06
🔗
|
|
ats has joined #archiveteam-ot |
00:17
🔗
|
|
phirephly has joined #archiveteam-ot |
00:47
🔗
|
|
killsushi has joined #archiveteam-ot |
01:23
🔗
|
dashcloud |
@Raccoon There's a guy on Twitter, angealbertini, who specializes in making this kind of chimera (polyglot). There's also a zine (PoC || GTFO) that heavily embraces this: each issue is not only a PDF, but a number of other things as well |
01:26
🔗
|
JAA |
Yeah, PoC||GTFO is lovely. |
01:28
🔗
|
|
yawkat has quit IRC (Ping timeout: 246 seconds) |
01:34
🔗
|
Raccoon |
dashcloud, will check it out :) indeed, lots of crafty tricks out there, heavily dependent on the software of the day though. any of the big web services typically validate / re-encode everything to prevent this. |
01:35
🔗
|
Raccoon |
don't want someone's webforum avatar to contain the text phrase "startkeylogger" for instance |
01:52
🔗
|
|
yawkat has joined #archiveteam-ot |
02:25
🔗
|
|
SynMonger has quit IRC (Quit: Wait, what?) |
02:26
🔗
|
|
SynMonger has joined #archiveteam-ot |
02:28
🔗
|
|
SynMonger has quit IRC (Client Quit) |
02:29
🔗
|
|
SynMonger has joined #archiveteam-ot |
02:37
🔗
|
|
SynMonger has quit IRC (Wait, what?) |
02:37
🔗
|
|
SynMonger has joined #archiveteam-ot |
02:43
🔗
|
|
SynMonger has quit IRC (Quit: Wait, what?) |
02:44
🔗
|
|
SynMonger has joined #archiveteam-ot |
02:46
🔗
|
|
SynMonger has quit IRC (Client Quit) |
02:48
🔗
|
|
SynMonger has joined #archiveteam-ot |
03:18
🔗
|
|
manjaro-u has quit IRC (Ping timeout: 610 seconds) |
04:00
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
04:45
🔗
|
|
qw3rty has joined #archiveteam-ot |
04:48
🔗
|
|
odemg has quit IRC (Ping timeout: 745 seconds) |
04:53
🔗
|
|
X-Scale` has joined #archiveteam-ot |
04:55
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
04:55
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
04:55
🔗
|
|
X-Scale` is now known as X-Scale |
06:56
🔗
|
|
X-Scale` has joined #archiveteam-ot |
06:56
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
06:56
🔗
|
|
X-Scale` is now known as X-Scale |
07:02
🔗
|
|
nataraj has joined #archiveteam-ot |
07:12
🔗
|
|
deevious has joined #archiveteam-ot |
07:19
🔗
|
|
Atom-- has joined #archiveteam-ot |
07:23
🔗
|
|
Atom__ has quit IRC (Read error: Operation timed out) |
07:25
🔗
|
|
katocala has quit IRC (Read error: Operation timed out) |
07:25
🔗
|
|
katocala has joined #archiveteam-ot |
07:51
🔗
|
|
eythian_ has joined #archiveteam-ot |
07:51
🔗
|
|
MrRadar has quit IRC (Read error: Operation timed out) |
08:01
🔗
|
|
superkuh has quit IRC (Excess Flood) |
08:02
🔗
|
|
superkuh has joined #archiveteam-ot |
08:02
🔗
|
|
MrRadar has joined #archiveteam-ot |
08:04
🔗
|
|
eythian has quit IRC (Read error: Operation timed out) |
09:07
🔗
|
Raccoon |
Where can I learn about writing page scraper scripts to collect metadata from a site |
09:33
🔗
|
h3ndr1k |
Not sure if that's what you mean, but last time I had to scrape webpages I used Python requests and BeautifulSoup. |
09:44
🔗
|
Raccoon |
I think some people here use chrome extensions |
09:44
🔗
|
Raccoon |
which would be ideal for my uses if it could scrape while naturally visiting the specific pages |
09:44
🔗
|
h3ndr1k |
Yeah, BeautifulSoup will not execute JavaScript. |
09:45
🔗
|
Raccoon |
but probably more realistic if I just did wget -i linklist.txt |
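A minimal sketch of the "work through a link list and pull out metadata" idea from the exchange above. It swaps in Python's standard-library `html.parser` and `urllib` for the requests and BeautifulSoup combination h3ndr1k mentions, so it runs without extra installs. The names `MetaExtractor`, `extract_metadata` and `scrape_link_list` are hypothetical, and the choice of fields (page title plus `<meta>` name/property pairs) is an assumption about what "metadata" means here. Like BeautifulSoup, it does not execute JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name|property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Parse one HTML document and return its title and meta tags."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


def scrape_link_list(path):
    """Fetch every URL listed in `path` (one per line, like wget -i)."""
    with open(path) as fh:
        urls = [line.strip() for line in fh if line.strip()]
    results = {}
    for url in urls:
        with urlopen(url, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        results[url] = extract_metadata(html)
    return results
```

Calling `scrape_link_list("linklist.txt")` would return a dict keyed by URL. A real run would also want error handling and a delay between requests, both left out here to keep the sketch short.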
10:00
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
12:09
🔗
|
|
robogoat has quit IRC (Read error: Operation timed out) |
12:09
🔗
|
|
robogoat has joined #archiveteam-ot |
12:42
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
13:21
🔗
|
|
mls has quit IRC (Ping timeout: 258 seconds) |
13:32
🔗
|
|
SketchCow has joined #archiveteam-ot |
13:32
🔗
|
|
Fusl sets mode: +o SketchCow |
13:32
🔗
|
|
Fusl__ sets mode: +o SketchCow |
13:32
🔗
|
|
Fusl_ sets mode: +o SketchCow |
13:53
🔗
|
|
mls has joined #archiveteam-ot |
13:54
🔗
|
markedL |
if you are writing from scratch and want to run as an extension while browsing as a human, you would google: html selectors in javascript |
13:58
🔗
|
markedL |
for the extension part, flip through the extension store for an extension with similar functionality already, then repurpose the code |
14:05
🔗
|
markedL |
if you want to use python+beautiful soup it might be possible with a HTTP proxy like squid or warcprox |
14:17
🔗
|
|
mls has quit IRC (Read error: Connection reset by peer) |
14:17
🔗
|
|
mls has joined #archiveteam-ot |
14:22
🔗
|
|
mls has quit IRC (Ping timeout: 258 seconds) |
14:41
🔗
|
|
deevious has quit IRC (Quit: deevious) |
15:12
🔗
|
|
deevious has joined #archiveteam-ot |
15:40
🔗
|
|
nataraj has quit IRC (Read error: Operation timed out) |
17:40
🔗
|
|
slyphic has quit IRC (Read error: Operation timed out) |
17:44
🔗
|
|
slyphic has joined #archiveteam-ot |
17:52
🔗
|
|
nataraj has joined #archiveteam-ot |
18:33
🔗
|
Ryz |
Huh, so Sumo Logic being acquired by JASK; the acquired company has relations with Pokemon apparently: https://www.sumologic.com/ |
18:37
🔗
|
|
manjaro-u has joined #archiveteam-ot |
19:14
🔗
|
|
bluefoo has quit IRC (Quit: bluefoo) |
19:42
🔗
|
godane |
https://variety.com/2019/film/news/project-silica-superman-warner-bros-microsoft-1203390459/ |
19:57
🔗
|
godane |
latest digitize tapes : https://www.patreon.com/posts/digitize-tapes-31312387 |
21:24
🔗
|
eythian_ |
https://spectrum.ieee.org/computing/it/the-lost-picture-show-hollywood-archivists-cant-outpace-obsolescence |
21:30
🔗
|
|
BlueMax has joined #archiveteam-ot |
21:42
🔗
|
|
nataraj has quit IRC (Read error: Operation timed out) |
23:27
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
23:27
🔗
|
|
markedL has quit IRC (Quit: The Lounge - https://thelounge.chat) |
23:31
🔗
|
|
bluefoo has joined #archiveteam-ot |
23:35
🔗
|
|
asdf0101 has joined #archiveteam-ot |
23:35
🔗
|
|
markedL has joined #archiveteam-ot |
23:42
🔗
|
|
katocala has quit IRC () |