Time |
Nickname |
Message |
00:12
🔗
|
|
BlueMax has joined #archiveteam-bs |
00:23
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
00:26
🔗
|
phillipsj |
I was not aware of a #youtubearchive channel. |
00:26
🔗
|
Raccoon |
it's an unproject |
00:27
🔗
|
phillipsj |
One of the issues I ran into with my crawls is that some of the videos were removed for personal safety reasons (i.e. death threats), and I can't really share them. |
00:29
🔗
|
Raccoon |
people seem to share just about anything on liveleak |
00:30
🔗
|
phillipsj |
Death threats against the original author(s) of the video(s), not me. |
00:34
🔗
|
|
ShellyRol has quit IRC (Read error: Connection reset by peer) |
00:35
🔗
|
|
ShellyRol has joined #archiveteam-bs |
00:39
🔗
|
|
killsushi has quit IRC (Quit: Leaving) |
01:56
🔗
|
|
freemint_ has quit IRC (Read error: Operation timed out) |
01:58
🔗
|
|
tech234a has joined #archiveteam-bs |
02:11
🔗
|
|
larryv has joined #archiveteam-bs |
02:29
🔗
|
|
MrRadar2 has quit IRC (Read error: Operation timed out) |
02:29
🔗
|
|
Frogging has quit IRC (Read error: Operation timed out) |
02:29
🔗
|
|
yano has quit IRC (Write error: Broken pipe) |
02:29
🔗
|
|
foureyes has quit IRC (Read error: Operation timed out) |
02:29
🔗
|
|
ats has quit IRC (Read error: Operation timed out) |
02:29
🔗
|
|
Dallas has quit IRC (Read error: Operation timed out) |
02:29
🔗
|
|
ats has joined #archiveteam-bs |
02:29
🔗
|
|
yano has joined #archiveteam-bs |
02:29
🔗
|
|
Frogging has joined #archiveteam-bs |
02:30
🔗
|
|
Tenebrae has quit IRC (Read error: Operation timed out) |
02:30
🔗
|
|
foureyes has joined #archiveteam-bs |
02:32
🔗
|
|
Fusl____ has quit IRC (Read error: Operation timed out) |
02:35
🔗
|
|
RichardG_ has joined #archiveteam-bs |
02:35
🔗
|
|
Fusl____ has joined #archiveteam-bs |
02:35
🔗
|
|
Fusl_ sets mode: +o Fusl____ |
02:35
🔗
|
|
Fusl sets mode: +o Fusl____ |
02:35
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
02:36
🔗
|
|
Xibalba has quit IRC (Read error: Operation timed out) |
02:36
🔗
|
|
BnAboyZ has quit IRC (Read error: Connection reset by peer) |
02:37
🔗
|
|
asie has quit IRC (Read error: Operation timed out) |
02:37
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
02:40
🔗
|
|
larryv has quit IRC (Max SendQ exceeded) |
02:42
🔗
|
|
larryv has joined #archiveteam-bs |
02:42
🔗
|
|
joshua_ has quit IRC (Read error: Operation timed out) |
02:45
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
02:46
🔗
|
|
brayden has quit IRC (Ping timeout: 864 seconds) |
02:46
🔗
|
|
Tenebrae has joined #archiveteam-bs |
02:47
🔗
|
|
asie has joined #archiveteam-bs |
02:50
🔗
|
|
MrRadar2 has quit IRC (Remote host closed the connection) |
02:52
🔗
|
|
joshua_ has joined #archiveteam-bs |
02:52
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
02:53
🔗
|
|
BnAboyZ has quit IRC (Read error: Connection reset by peer) |
02:55
🔗
|
|
Dallas has joined #archiveteam-bs |
02:56
🔗
|
|
Xibalba has joined #archiveteam-bs |
02:57
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
03:02
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
03:04
🔗
|
|
brayden has joined #archiveteam-bs |
03:05
🔗
|
|
kiska has quit IRC (Remote host closed the connection) |
03:06
🔗
|
|
kiska has joined #archiveteam-bs |
03:06
🔗
|
|
Fusl____ sets mode: +o kiska |
03:06
🔗
|
|
Fusl sets mode: +o kiska |
03:06
🔗
|
|
Fusl_ sets mode: +o kiska |
03:06
🔗
|
|
Flashfire has joined #archiveteam-bs |
03:37
🔗
|
|
odemgi_ has joined #archiveteam-bs |
03:42
🔗
|
|
odemgi has quit IRC (Read error: Operation timed out) |
03:45
🔗
|
|
qw3rty has joined #archiveteam-bs |
03:53
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
04:04
🔗
|
|
Dallas6 has joined #archiveteam-bs |
04:04
🔗
|
|
Dallas has quit IRC (Read error: Connection reset by peer) |
04:08
🔗
|
|
Tenebrae has quit IRC (Read error: Operation timed out) |
04:17
🔗
|
|
MrRadar2 has quit IRC (Read error: Connection reset by peer) |
04:19
🔗
|
|
brayden has quit IRC (Ping timeout: 864 seconds) |
04:21
🔗
|
|
Dallas6 has quit IRC (Read error: Operation timed out) |
04:22
🔗
|
|
BnAboyZ has quit IRC (Read error: Operation timed out) |
04:26
🔗
|
|
asie has quit IRC (Ping timeout: 864 seconds) |
04:28
🔗
|
|
asie has joined #archiveteam-bs |
04:30
🔗
|
|
Dallas6 has joined #archiveteam-bs |
04:30
🔗
|
|
Tenebrae has joined #archiveteam-bs |
04:31
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
04:31
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
04:31
🔗
|
|
brayden has joined #archiveteam-bs |
04:38
🔗
|
|
Xibalba has quit IRC (Read error: Operation timed out) |
04:40
🔗
|
|
MrRadar2 has quit IRC (Read error: Operation timed out) |
04:40
🔗
|
|
BnAboyZ has quit IRC (Read error: Operation timed out) |
04:45
🔗
|
|
Xibalba has joined #archiveteam-bs |
04:46
🔗
|
|
Tenebrae has quit IRC (Read error: Operation timed out) |
04:46
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
04:47
🔗
|
|
Tenebrae has joined #archiveteam-bs |
04:47
🔗
|
|
asie has quit IRC (Read error: Operation timed out) |
04:47
🔗
|
|
MrRadar2 has joined #archiveteam-bs |
04:47
🔗
|
|
asie has joined #archiveteam-bs |
04:48
🔗
|
|
brayden has joined #archiveteam-bs |
04:49
🔗
|
|
Dallas6 has quit IRC (Read error: Operation timed out) |
04:49
🔗
|
|
BnAboyZ has joined #archiveteam-bs |
04:50
🔗
|
|
Dallas6 has joined #archiveteam-bs |
05:07
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
05:14
🔗
|
|
m007a83 has joined #archiveteam-bs |
05:23
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
06:09
🔗
|
|
purplebot has joined #archiveteam-bs |
06:09
🔗
|
PurpleSym |
JAA: Restarted. |
06:22
🔗
|
|
larryv has quit IRC (Quit: larryv) |
07:01
🔗
|
|
DigiDigi` has quit IRC (Read error: Operation timed out) |
07:03
🔗
|
|
DigiDigi` has joined #archiveteam-bs |
07:37
🔗
|
|
anarchat has quit IRC (Read error: Connection reset by peer) |
07:40
🔗
|
|
anarcat has joined #archiveteam-bs |
07:45
🔗
|
|
william has joined #archiveteam-bs |
07:48
🔗
|
asie |
Continuing from #archiveteam; I think backing up stallman.org would be reasonable. The rest, I wouldn't say there's a big rush |
07:49
🔗
|
asie |
But I'm not AT, it's up to AT to decide if they want to do something about it. |
07:53
🔗
|
PurpleSym |
You are here, so you are ArchiveTeam. |
07:56
🔗
|
markedL |
there's no "I" in Team. But oddly no "U" either. |
07:56
🔗
|
markedL |
presumably rms.sexy and stallman.org he has personal control over? |
07:57
🔗
|
PurpleSym |
Grab both, I’d say. |
07:58
🔗
|
asie |
I thought rms.sexy is fanmade? |
07:58
🔗
|
asie |
But if so, its contents are likely to change in short order as well. |
08:12
🔗
|
|
william has quit IRC (Remote host closed the connection) |
08:12
🔗
|
hook54321 |
it's fanmade |
08:22
🔗
|
markedL |
rms.sexy has a surprising amount of .js for something so simple, but when I turn off JS, it's still being reloaded by: <meta http-equiv="refresh" content="3;/"> |
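A minimal sketch of how one might spot that kind of reload without running any JS, assuming only Python's standard library. The class name `MetaRefreshFinder` is made up for illustration and is not part of any ArchiveTeam tool.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay_seconds, target) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "3;/" or "3; url=/".
        delay, _, target = (attributes.get("content") or "").partition(";")
        target = target.strip()
        if target.lower().startswith("url="):
            target = target[4:]
        self.refreshes.append((float(delay.strip() or 0), target))


finder = MetaRefreshFinder()
finder.feed('<meta http-equiv="refresh" content="3;/">')
print(finder.refreshes)
```

With the tag quoted above, this reports a 3-second delay and the target `/`, i.e. the page keeps reloading itself.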
08:49
🔗
|
|
godane has joined #archiveteam-bs |
08:54
🔗
|
markedL |
Joi Ito is leaving MIT and Harvard, so https://www.media.mit.edu/people/joi/overview/ |
08:55
🔗
|
markedL |
https://www.media.mit.edu/posts/my-apology-regarding-jeffrey-epstein/ |
09:07
🔗
|
Sanqui |
did y'all put them in archivebot? |
09:11
🔗
|
Igloo_ |
Doesn't look like it. |
09:11
🔗
|
|
Igloo_ is now known as Igloo |
09:12
🔗
|
|
svchfoo3 sets mode: +o Igloo |
09:12
🔗
|
|
svchfoo1 sets mode: +o Igloo |
10:30
🔗
|
|
Leslie has joined #archiveteam-bs |
11:03
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
12:58
🔗
|
|
SmileyG has joined #archiveteam-bs |
12:59
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
14:09
🔗
|
|
freemint_ has joined #archiveteam-bs |
14:17
🔗
|
|
bluefoo has quit IRC (Read error: Connection reset by peer) |
14:40
🔗
|
|
tech234a has joined #archiveteam-bs |
14:51
🔗
|
|
DogsRNice has joined #archiveteam-bs |
15:50
🔗
|
|
DigiDigi` has quit IRC (Remote host closed the connection) |
15:50
🔗
|
|
DigiDigi has joined #archiveteam-bs |
15:54
🔗
|
|
RichardG_ has quit IRC (Ping timeout: 496 seconds) |
16:13
🔗
|
|
william has joined #archiveteam-bs |
16:15
🔗
|
markedL |
do we want a channel for Financial Times? |
16:15
🔗
|
|
closure_ has quit IRC (Read error: Operation timed out) |
16:17
🔗
|
JAA |
Don't think that's necessary. |
16:20
🔗
|
|
closure has joined #archiveteam-bs |
16:38
🔗
|
|
Ryz has quit IRC (Remote host closed the connection) |
16:38
🔗
|
|
Ryz has joined #archiveteam-bs |
16:39
🔗
|
|
Fusl____ sets mode: +o Ryz |
16:39
🔗
|
|
Fusl sets mode: +o Ryz |
16:39
🔗
|
|
Fusl_ sets mode: +o Ryz |
16:39
🔗
|
|
kiska1 has quit IRC (Read error: Connection reset by peer) |
16:39
🔗
|
|
kiska18 has joined #archiveteam-bs |
16:50
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
17:15
🔗
|
|
icedice has joined #archiveteam-bs |
17:17
🔗
|
icedice |
Does anyone here have Take180 archived? Especially their Electric Spoofaloo series? |
17:19
🔗
|
icedice |
Take180 used to be a Disney-owned sketch comedy YouTube channel started about nine years ago or so. They used to collaborate a lot with old-school YouTubers back in the day and from what I remembered their skits were pretty funny. |
17:24
🔗
|
|
Stiletto has quit IRC (Ping timeout: 246 seconds) |
17:26
🔗
|
|
Stiletto has joined #archiveteam-bs |
17:41
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
17:50
🔗
|
|
MaximeLeG has joined #archiveteam-bs |
17:50
🔗
|
|
MaximeLeG has quit IRC (Client Quit) |
18:30
🔗
|
|
william has quit IRC (Remote host closed the connection) |
18:53
🔗
|
JAA |
I'm grabbing some parts of ft.com now. Specifically, I'm traversing the sitemaps and grabbing all content pages (i.e. anything starting with https://www.ft.com/content/) and the images referenced (with URLs beginning with https://www.ft.com/__origami/service/image/v2/images/raw/http). I'm not grabbing anything else, so no videos, stylesheets, etc. |
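The scope JAA describes can be sketched as a simple prefix filter. This is only an illustration, not JAA's actual code; the function name `should_grab` is hypothetical, and the two prefixes are copied from the message above.

```python
# Prefixes taken from the message above.
CONTENT_PREFIX = "https://www.ft.com/content/"
IMAGE_PREFIX = "https://www.ft.com/__origami/service/image/v2/images/raw/http"


def should_grab(url: str) -> bool:
    """Return True only for content pages and the images they reference."""
    return url.startswith((CONTENT_PREFIX, IMAGE_PREFIX))


# The example URLs below are placeholders, not real article IDs.
print(should_grab("https://www.ft.com/content/example-article"))
print(should_grab("https://www.ft.com/video/example"))
```

Everything outside those two prefixes (videos, stylesheets, etc.) is skipped.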
18:53
🔗
|
JAA |
Force the cookie FTCookieConsentGDPR=true to get rid of the "Cookies on FT Sites" thingy in the bottom left. You need to force this value for every request as the server will try to set it to false. |
18:54
🔗
|
JAA |
(There's probably a way around that, but I didn't bother investigating further since it works fine for me.) |
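A rough sketch of forcing the cookie on every request, assuming plain `urllib` from the standard library rather than whatever JAA actually used. Because no cookie jar is involved, a `Set-Cookie` from the server is never stored, so it cannot flip the value back to false. The helper names `build_request` and `fetch` are made up for this example.

```python
import urllib.request

# Cookie name and value as given in the message above.
FORCED_COOKIE = "FTCookieConsentGDPR=true"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that always carries the forced consent cookie."""
    return urllib.request.Request(url, headers={"Cookie": FORCED_COOKIE})


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL; urlopen's default opener keeps no cookie jar."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Placeholder URL; nothing is fetched here.
request = build_request("https://www.ft.com/content/example-article")
print(request.get_header("Cookie"))
```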
19:04
🔗
|
jodizzle |
JAA: Are we going to put the full FT site through archivebot, or will that not work? |
19:07
🔗
|
|
freemint_ has quit IRC (Ping timeout: 246 seconds) |
19:10
🔗
|
JAA |
jodizzle: We can try. I guess it won't be able to get rid of that cookie thing though, unfortunately. |
19:17
🔗
|
JAA |
I'm a little surprised that FT lets me hammer them with 50 requests per second, but I won't complain. :-) |
19:22
🔗
|
jodizzle |
JAA: Could we use mips? There's a way to set cookies with grab-site, right? |
19:23
🔗
|
JAA |
jodizzle: Yes, but the problem is that the server resets that cookie to false immediately. I believe there's no way to override that in grab-site or wpull (without extra code). |
19:24
🔗
|
|
freemint_ has joined #archiveteam-bs |
19:26
🔗
|
jodizzle |
JAA: Ah okay. Well, I'll put it in archivebot for good measure I guess. |
19:27
🔗
|
jodizzle |
I imagine a bunch of people are trying to grab it today, but hopefully it will hold up. |
19:28
🔗
|
markedL |
anyone know if this archive, I think they called it epub, is normally paywalled: http://digital.olivesoftware.com/Olive/APA/FinancialTimesUK/default.aspx |
19:30
🔗
|
|
TC01_ has joined #archiveteam-bs |
19:31
🔗
|
|
TC01 has quit IRC (Read error: Operation timed out) |
19:36
🔗
|
|
odemg has joined #archiveteam-bs |
19:36
🔗
|
jodizzle |
markedL: Don't know, but that seems like a totally different site. |
19:36
🔗
|
jodizzle |
Unless FT is using that site as a service? |
19:40
🔗
|
markedL |
when I playback pages grabbed from wget, I'm not getting the cookie pop-up |
19:41
🔗
|
markedL |
there are a lot of variables, like I'm not requesting page requisites, yet many are absolute-pathed to ft.com |
19:56
🔗
|
|
icedice has quit IRC (Ping timeout: 252 seconds) |
20:04
🔗
|
|
britmob has joined #archiveteam-bs |
20:12
🔗
|
|
tech234a has joined #archiveteam-bs |
20:17
🔗
|
Fusl |
britmob: JAA is the dev of qwarc so he's able to answer all the questions you have |
20:18
🔗
|
britmob |
JAA: Hello, are there any guides/binary downloads for qwarc? I've had no luck building it myself. |
20:18
🔗
|
britmob |
Thank you for pointing me in the right direction. |
20:21
🔗
|
ivan_ |
britmob: did you try pip3 install --upgrade --user git+https://github.com/JustAnotherArchivist/qwarc |
20:22
🔗
|
britmob |
I don't remember exactly what I used, I will try that now. |
20:24
🔗
|
britmob |
https://i.imgur.com/gEc1eUQ.png |
20:24
🔗
|
britmob |
ivan_: This is the error I have been getting from your command and the others I have tried. |
20:24
🔗
|
britmob |
Is this meant to be used with mono on linux? |
20:25
🔗
|
britmob |
It's very possible I am being an idiot and missing a step here. |
20:26
🔗
|
markedL |
os is different on Windows than Linux |
20:26
🔗
|
|
trc has quit IRC (Quit: Leaving) |
20:26
🔗
|
ivan_ |
britmob: that Python function is Unix-only |
20:26
🔗
|
britmob |
Well, there's my issue |
20:27
🔗
|
markedL |
do you want to install WSL ? |
20:27
🔗
|
britmob |
Tried building on Linux, got some error and assumed it was not designed for Linux. I am building it on my Linux server now. |
20:27
🔗
|
ivan_ |
WSL 1 has no guarantee of providing enough Linux compatibility :-) |
20:27
🔗
|
britmob |
IIRC WSL did not work either. |
20:29
🔗
|
markedL |
Linux server is the most likely so we'll just wait for that error message |
20:30
🔗
|
britmob |
Gonna take a second- adding more RAM to my proxmox host :) |
20:56
🔗
|
markedL |
the FT archive's PDF api seems to instead return a 700KB png per page |
20:56
🔗
|
markedL |
this is what drives the print page button |
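One way to confirm what an endpoint like that really returns is to look at the first bytes of the response instead of trusting the URL or headers. A small sketch, assuming the response body is already in hand; `sniff_type` is a hypothetical helper name.

```python
# Standard file signatures: every PNG starts with these 8 bytes,
# and every PDF starts with "%PDF-".
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PDF_MAGIC = b"%PDF-"


def sniff_type(data: bytes) -> str:
    """Classify a response body as 'png', 'pdf', or 'unknown' by its signature."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
print(sniff_type(b"%PDF-1.4\n"))
```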
20:59
🔗
|
|
freemint_ has quit IRC (Read error: Connection reset by peer) |
21:00
🔗
|
|
freemint has joined #archiveteam-bs |
21:02
🔗
|
JAA |
My ft.com grab is done already. :-) |
21:03
🔗
|
JAA |
britmob: I'm pretty sure that qwarc will only work properly on Linux or Linux-like systems at the moment. |
21:04
🔗
|
JAA |
And since that's all I'm running, I won't invest any time in making it Windows-compatible either. If someone wants to send a non-invasive PR for it though, I'll happily get that merged. |
21:09
🔗
|
|
dxrt_ has quit IRC (Read error: Operation timed out) |
21:10
🔗
|
|
dxrt_ has joined #archiveteam-bs |
21:10
🔗
|
|
dxrt sets mode: +o dxrt_ |
21:10
🔗
|
|
Fusl____ sets mode: +o dxrt_ |
21:10
🔗
|
|
Fusl sets mode: +o dxrt_ |
21:10
🔗
|
|
Fusl_ sets mode: +o dxrt_ |
21:10
🔗
|
|
Pixi has quit IRC (Quit: Pixi) |
21:16
🔗
|
|
Pixi has joined #archiveteam-bs |
21:16
🔗
|
|
sep332 has quit IRC (Ping timeout: 745 seconds) |
21:35
🔗
|
|
RichardG has joined #archiveteam-bs |
21:37
🔗
|
|
jrwr has quit IRC (Read error: Connection reset by peer) |
21:40
🔗
|
|
jrwr has joined #archiveteam-bs |
21:44
🔗
|
JAA |
britmob: Oh, also, regarding guides on how to actually use qwarc, nothing exists yet. |
22:15
🔗
|
|
freemint has quit IRC (Remote host closed the connection) |
22:15
🔗
|
|
freemint has joined #archiveteam-bs |
22:22
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
22:31
🔗
|
|
britmob has quit IRC (Read error: Connection reset by peer) |
22:33
🔗
|
|
kiskabak has quit IRC (Remote host closed the connection) |
22:33
🔗
|
|
kiskabak has joined #archiveteam-bs |
22:33
🔗
|
|
Fusl_ sets mode: +o kiskabak |
22:33
🔗
|
|
Fusl____ sets mode: +o kiskabak |
22:33
🔗
|
|
Fusl sets mode: +o kiskabak |
22:56
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:05
🔗
|
|
dxrt has quit IRC (ZNC - http://znc.sourceforge.net) |
23:05
🔗
|
|
dxrt has joined #archiveteam-bs |
23:05
🔗
|
|
Fusl____ sets mode: +o dxrt |
23:05
🔗
|
|
Fusl sets mode: +o dxrt |
23:05
🔗
|
|
Fusl_ sets mode: +o dxrt |
23:10
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
23:13
🔗
|
|
RichardG has joined #archiveteam-bs |
23:36
🔗
|
|
bluefoo has joined #archiveteam-bs |
23:45
🔗
|
hook54321 |
!ig 81vuwl012gnmdplpk7yjkwrg3 ^https?://www\.gnu\.org/server/select-language\.html\?.+language=..$ |
23:45
🔗
|
hook54321 |
oops |
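For anyone curious what the ignore pattern pasted above matches, here is a small sketch using Python's `re` module. The test URLs are made-up examples, and `is_ignored` is a hypothetical helper, not an ArchiveBot function.

```python
import re

# Pattern copied verbatim from the !ig command above.
IGNORE_PATTERN = re.compile(
    r"^https?://www\.gnu\.org/server/select-language\.html\?.+language=..$"
)


def is_ignored(url: str) -> bool:
    """True if the URL is a select-language link ending in a two-character language code."""
    return IGNORE_PATTERN.search(url) is not None


print(is_ignored("https://www.gnu.org/server/select-language.html?callback=/&language=de"))
print(is_ignored("https://www.gnu.org/server/select-language.html"))
```

Note that the trailing `..$` only covers two-character codes, so a URL ending in something like `language=zh-cn` would not be matched by this pattern.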
23:47
🔗
|
hook54321 |
JAA: are there still memory leaks in qwarc? |
23:53
🔗
|
|
tech234a has joined #archiveteam-bs |
23:55
🔗
|
JAA |
hook54321: No but yes. There is no memory leak in qwarc and never has been. However, memory consumption of qwarc processes still increases with time due to heap fragmentation. I haven't been able to find a proper solution for that yet. My workaround for the time being is to set MALLOC_MMAP_THRESHOLD_=4096 (default being 128 KiB or more), which means that most memory allocations happen through mmap. |
23:55
🔗
|
JAA |
This is a performance hit but reduces memory consumption since glibc won't constantly resize the heap. For more details, check the -dev logs at the end of August. |
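The workaround amounts to setting one environment variable before the process starts. A minimal sketch of doing that from Python, assuming a glibc-based system; the helper names are hypothetical, and the child command here is just a stand-in that prints the variable, not a real qwarc invocation.

```python
import os
import subprocess
import sys


def env_with_mmap_threshold(threshold_bytes: int = 4096) -> dict:
    """Copy the current environment and add glibc's MALLOC_MMAP_THRESHOLD_ tunable."""
    env = dict(os.environ)
    env["MALLOC_MMAP_THRESHOLD_"] = str(threshold_bytes)
    return env


def run_with_mmap_threshold(argv, threshold_bytes: int = 4096):
    """Run a command with the tunable set; glibc reads it when the process starts."""
    return subprocess.run(argv, env=env_with_mmap_threshold(threshold_bytes), check=False)


# Stand-in child process that only echoes the variable it inherited.
run_with_mmap_threshold(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_MMAP_THRESHOLD_'])"]
)
```

The variable has to be in place before the interpreter starts; changing `os.environ` inside an already running process would not affect its own allocator.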
23:59
🔗
|
JAA |
Chances are this can't really be fixed in qwarc since CPython handles all the memory allocation things. So the only way I could potentially solve it is to change how qwarc works (e.g. how the retrieved data is kept in memory structures) and thereby how CPython/glibc allocates memory. But that will require a lot of additional analysis to figure out exactly what is the problem. |
23:59
🔗
|
hook54321 |
ah ok |