#archiveteam-bs 2019-09-17,Tue


Time Nickname Message
00:12 🔗 BlueMax has joined #archiveteam-bs
00:23 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
00:26 🔗 phillipsj I was not aware of a #youtubearchive channel.
00:26 🔗 Raccoon it's an unproject
00:27 🔗 phillipsj One of the issues I ran into with my crawls is that some of the videos were removed for personal safety reasons (e.g. death threats), and I can't really share them.
00:29 🔗 Raccoon people seem to share just about anything on liveleak
00:30 🔗 phillipsj Death threats against the original author(s) of the video(s), not me.
00:34 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
00:35 🔗 ShellyRol has joined #archiveteam-bs
00:39 🔗 killsushi has quit IRC (Quit: Leaving)
01:56 🔗 freemint_ has quit IRC (Read error: Operation timed out)
01:58 🔗 tech234a has joined #archiveteam-bs
02:11 🔗 larryv has joined #archiveteam-bs
02:29 🔗 MrRadar2 has quit IRC (Read error: Operation timed out)
02:29 🔗 Frogging has quit IRC (Read error: Operation timed out)
02:29 🔗 yano has quit IRC (Write error: Broken pipe)
02:29 🔗 foureyes has quit IRC (Read error: Operation timed out)
02:29 🔗 ats has quit IRC (Read error: Operation timed out)
02:29 🔗 Dallas has quit IRC (Read error: Operation timed out)
02:29 🔗 ats has joined #archiveteam-bs
02:29 🔗 yano has joined #archiveteam-bs
02:29 🔗 Frogging has joined #archiveteam-bs
02:30 🔗 Tenebrae has quit IRC (Read error: Operation timed out)
02:30 🔗 foureyes has joined #archiveteam-bs
02:32 🔗 Fusl____ has quit IRC (Read error: Operation timed out)
02:35 🔗 RichardG_ has joined #archiveteam-bs
02:35 🔗 Fusl____ has joined #archiveteam-bs
02:35 🔗 Fusl_ sets mode: +o Fusl____
02:35 🔗 Fusl sets mode: +o Fusl____
02:35 🔗 RichardG has quit IRC (Read error: Operation timed out)
02:36 🔗 Xibalba has quit IRC (Read error: Operation timed out)
02:36 🔗 BnAboyZ has quit IRC (Read error: Connection reset by peer)
02:37 🔗 asie has quit IRC (Read error: Operation timed out)
02:37 🔗 BnAboyZ has joined #archiveteam-bs
02:40 🔗 larryv has quit IRC (Max SendQ exceeded)
02:42 🔗 larryv has joined #archiveteam-bs
02:42 🔗 joshua_ has quit IRC (Read error: Operation timed out)
02:45 🔗 MrRadar2 has joined #archiveteam-bs
02:46 🔗 brayden has quit IRC (Ping timeout: 864 seconds)
02:46 🔗 Tenebrae has joined #archiveteam-bs
02:47 🔗 asie has joined #archiveteam-bs
02:50 🔗 MrRadar2 has quit IRC (Remote host closed the connection)
02:52 🔗 joshua_ has joined #archiveteam-bs
02:52 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
02:53 🔗 BnAboyZ has quit IRC (Read error: Connection reset by peer)
02:55 🔗 Dallas has joined #archiveteam-bs
02:56 🔗 Xibalba has joined #archiveteam-bs
02:57 🔗 MrRadar2 has joined #archiveteam-bs
03:02 🔗 BnAboyZ has joined #archiveteam-bs
03:04 🔗 brayden has joined #archiveteam-bs
03:05 🔗 kiska has quit IRC (Remote host closed the connection)
03:06 🔗 kiska has joined #archiveteam-bs
03:06 🔗 Fusl____ sets mode: +o kiska
03:06 🔗 Fusl sets mode: +o kiska
03:06 🔗 Fusl_ sets mode: +o kiska
03:06 🔗 Flashfire has joined #archiveteam-bs
03:37 🔗 odemgi_ has joined #archiveteam-bs
03:42 🔗 odemgi has quit IRC (Read error: Operation timed out)
03:45 🔗 qw3rty has joined #archiveteam-bs
03:53 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
04:04 🔗 Dallas6 has joined #archiveteam-bs
04:04 🔗 Dallas has quit IRC (Read error: Connection reset by peer)
04:08 🔗 Tenebrae has quit IRC (Read error: Operation timed out)
04:17 🔗 MrRadar2 has quit IRC (Read error: Connection reset by peer)
04:19 🔗 brayden has quit IRC (Ping timeout: 864 seconds)
04:21 🔗 Dallas6 has quit IRC (Read error: Operation timed out)
04:22 🔗 BnAboyZ has quit IRC (Read error: Operation timed out)
04:26 🔗 asie has quit IRC (Ping timeout: 864 seconds)
04:28 🔗 asie has joined #archiveteam-bs
04:30 🔗 Dallas6 has joined #archiveteam-bs
04:30 🔗 Tenebrae has joined #archiveteam-bs
04:31 🔗 BnAboyZ has joined #archiveteam-bs
04:31 🔗 MrRadar2 has joined #archiveteam-bs
04:31 🔗 brayden has joined #archiveteam-bs
04:38 🔗 Xibalba has quit IRC (Read error: Operation timed out)
04:40 🔗 MrRadar2 has quit IRC (Read error: Operation timed out)
04:40 🔗 BnAboyZ has quit IRC (Read error: Operation timed out)
04:45 🔗 Xibalba has joined #archiveteam-bs
04:46 🔗 Tenebrae has quit IRC (Read error: Operation timed out)
04:46 🔗 brayden has quit IRC (Read error: Operation timed out)
04:47 🔗 Tenebrae has joined #archiveteam-bs
04:47 🔗 asie has quit IRC (Read error: Operation timed out)
04:47 🔗 MrRadar2 has joined #archiveteam-bs
04:47 🔗 asie has joined #archiveteam-bs
04:48 🔗 brayden has joined #archiveteam-bs
04:49 🔗 Dallas6 has quit IRC (Read error: Operation timed out)
04:49 🔗 BnAboyZ has joined #archiveteam-bs
04:50 🔗 Dallas6 has joined #archiveteam-bs
05:07 🔗 m007a83 has quit IRC (Read error: Connection reset by peer)
05:14 🔗 m007a83 has joined #archiveteam-bs
05:23 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
06:09 🔗 purplebot has joined #archiveteam-bs
06:09 🔗 PurpleSym JAA: Restarted.
06:22 🔗 larryv has quit IRC (Quit: larryv)
07:01 🔗 DigiDigi` has quit IRC (Read error: Operation timed out)
07:03 🔗 DigiDigi` has joined #archiveteam-bs
07:37 🔗 anarchat has quit IRC (Read error: Connection reset by peer)
07:40 🔗 anarcat has joined #archiveteam-bs
07:45 🔗 william has joined #archiveteam-bs
07:48 🔗 asie Continuing from #archiveteam; I think backing up stallman.org would be reasonable. The rest, I wouldn't say there's a big rush
07:49 🔗 asie But I'm not AT, it's up to AT to decide if they want to do something about it.
07:53 🔗 PurpleSym You are here, so you are ArchiveTeam.
07:56 🔗 markedL there's no "I" in Team. But oddly no "U" either.
07:56 🔗 markedL presumably rms.sexy and stallman.org he has personal control over?
07:57 🔗 PurpleSym Grab both, I’d say.
07:58 🔗 asie I thought rms.sexy is fanmade?
07:58 🔗 asie But if so, its contents are likely to change in short order as well.
08:12 🔗 william has quit IRC (Remote host closed the connection)
08:12 🔗 hook54321 it's fanmade
08:22 🔗 markedL rms.sexy has a surprising amount of .js for something so simple, but when I turn off JS, it's still being reloaded by: <meta http-equiv="refresh" content="3;/">
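
The reload markedL describes comes from a meta refresh tag, which works even with JavaScript disabled. A minimal sketch of how a crawler could detect such tags, using only Python's standard html.parser; the class and function names are hypothetical, and the sketch does not handle the optional "url=" prefix that some pages put in the content attribute:

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MetaRefreshFinder(HTMLParser):
    """Collects (delay_seconds, target) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.refreshes: List[Tuple[float, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None for bare attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "3;/": delay in seconds, then the target.
        delay, _, target = attr.get("content", "").partition(";")
        try:
            self.refreshes.append((float(delay.strip()), target.strip()))
        except ValueError:
            # Ignore content values whose delay is not a number.
            pass


def find_meta_refresh(html: str) -> List[Tuple[float, str]]:
    """Return every meta refresh found in the given HTML string."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes
```

Feeding it the tag quoted above yields a three-second delay pointing back at "/", which matches the reload loop markedL saw.
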
08:49 🔗 godane has joined #archiveteam-bs
08:54 🔗 markedL Joi Ito is leaving MIT and Harvard, so https://www.media.mit.edu/people/joi/overview/
08:55 🔗 markedL https://www.media.mit.edu/posts/my-apology-regarding-jeffrey-epstein/
09:07 🔗 Sanqui did y'all put them in archivebot?
09:11 🔗 Igloo_ Doesn't look like it.
09:11 🔗 Igloo_ is now known as Igloo
09:12 🔗 svchfoo3 sets mode: +o Igloo
09:12 🔗 svchfoo1 sets mode: +o Igloo
10:30 🔗 Leslie has joined #archiveteam-bs
11:03 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
12:58 🔗 SmileyG has joined #archiveteam-bs
12:59 🔗 Smiley has quit IRC (Read error: Operation timed out)
14:09 🔗 freemint_ has joined #archiveteam-bs
14:17 🔗 bluefoo has quit IRC (Read error: Connection reset by peer)
14:40 🔗 tech234a has joined #archiveteam-bs
14:51 🔗 DogsRNice has joined #archiveteam-bs
15:50 🔗 DigiDigi` has quit IRC (Remote host closed the connection)
15:50 🔗 DigiDigi has joined #archiveteam-bs
15:54 🔗 RichardG_ has quit IRC (Ping timeout: 496 seconds)
16:13 🔗 william has joined #archiveteam-bs
16:15 🔗 markedL do we want a channel for Financial Times?
16:15 🔗 closure_ has quit IRC (Read error: Operation timed out)
16:17 🔗 JAA Don't think that's necessary.
16:20 🔗 closure has joined #archiveteam-bs
16:38 🔗 Ryz has quit IRC (Remote host closed the connection)
16:38 🔗 Ryz has joined #archiveteam-bs
16:39 🔗 Fusl____ sets mode: +o Ryz
16:39 🔗 Fusl sets mode: +o Ryz
16:39 🔗 Fusl_ sets mode: +o Ryz
16:39 🔗 kiska1 has quit IRC (Read error: Connection reset by peer)
16:39 🔗 kiska18 has joined #archiveteam-bs
16:50 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
17:15 🔗 icedice has joined #archiveteam-bs
17:17 🔗 icedice Does anyone here have Take180 archived? Especially their Electric Spoofaloo series?
17:19 🔗 icedice Take180 used to be a Disney-owned sketch comedy YouTube channel started about nine years ago or so. They used to collaborate a lot with old-school YouTubers back in the day and from what I remembered their skits were pretty funny.
17:24 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
17:26 🔗 Stiletto has joined #archiveteam-bs
17:41 🔗 schbirid has quit IRC (Remote host closed the connection)
17:50 🔗 MaximeLeG has joined #archiveteam-bs
17:50 🔗 MaximeLeG has quit IRC (Client Quit)
18:30 🔗 william has quit IRC (Remote host closed the connection)
18:53 🔗 JAA I'm grabbing some parts of ft.com now. Specifically, I'm traversing the sitemaps and grabbing all content pages (i.e. anything starting with https://www.ft.com/content/) and the images referenced (with URLs beginning with https://www.ft.com/__origami/service/image/v2/images/raw/http). I'm not grabbing anything else, so no videos, stylesheets, etc.
18:53 🔗 JAA Force the cookie FTCookieConsentGDPR=true to get rid of the "Cookies on FT Sites" thingy in the bottom left. You need to force this value for every request as the server will try to set it to false.
18:54 🔗 JAA (There's probably a way around that, but I didn't bother investigating further since it works fine for me.)
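
The scope and cookie handling JAA describes can be sketched roughly as follows. This is not JAA's actual code; the two URL prefixes and the cookie name/value are quoted from the messages above, while the function names are hypothetical. The idea is to rebuild the Cookie header for every request instead of keeping a cookie jar, so the server's attempt to set the value to false never takes effect:

```python
from typing import Dict, Optional

# Prefixes quoted from the log: content pages and the raw-image service.
CONTENT_PREFIX = "https://www.ft.com/content/"
IMAGE_PREFIX = "https://www.ft.com/__origami/service/image/v2/images/raw/http"

# Cookie name and value quoted from the log.
CONSENT_COOKIE = "FTCookieConsentGDPR=true"


def should_grab(url: str) -> bool:
    """Hypothetical scope filter: keep only content pages and their images."""
    return url.startswith((CONTENT_PREFIX, IMAGE_PREFIX))


def request_headers(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the headers for one request, always forcing the consent cookie.

    Any Cookie value passed in via `extra` is overwritten, so a value the
    server tried to set on an earlier response cannot leak back in.
    """
    headers = dict(extra or {})
    headers["Cookie"] = CONSENT_COOKIE
    return headers
```

A caller would pass request_headers() to whatever HTTP client it uses and skip any URL for which should_grab() returns False.
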
19:04 🔗 jodizzle JAA: Are we going to put the full FT site through archivebot, or will that not work?
19:07 🔗 freemint_ has quit IRC (Ping timeout: 246 seconds)
19:10 🔗 JAA jodizzle: We can try. I guess it won't be able to get rid of that cookie thing though, unfortunately.
19:17 🔗 JAA I'm a little surprised that FT lets me hammer them with 50 requests per second, but I won't complain. :-)
19:22 🔗 jodizzle JAA: Could we use mips? There's a way to set cookies with grab-site, right?
19:23 🔗 JAA jodizzle: Yes, but the problem is that the server resets that cookie to false immediately. I believe there's no way to override that in grab-site or wpull (without extra code).
19:24 🔗 freemint_ has joined #archiveteam-bs
19:26 🔗 jodizzle JAA: Ah okay. Well, I'll put it in archivebot for good measure I guess.
19:27 🔗 jodizzle I imagine a bunch of people are trying to grab it today, but hopefully it will hold up.
19:28 🔗 markedL anyone know if this archive, I think they called it epub, is normally paywalled: http://digital.olivesoftware.com/Olive/APA/FinancialTimesUK/default.aspx
19:30 🔗 TC01_ has joined #archiveteam-bs
19:31 🔗 TC01 has quit IRC (Read error: Operation timed out)
19:36 🔗 odemg has joined #archiveteam-bs
19:36 🔗 jodizzle markedL: Don't know, but that seems like a totally different site.
19:36 🔗 jodizzle Unless FT is using that site as a service?
19:40 🔗 markedL when I playback pages grabbed from wget, I'm not getting the cookie pop-up
19:41 🔗 markedL there's a lot of variables, like I'm not requesting page reqs yet many are absolute pathed to ft.com
19:56 🔗 icedice has quit IRC (Ping timeout: 252 seconds)
20:04 🔗 britmob has joined #archiveteam-bs
20:12 🔗 tech234a has joined #archiveteam-bs
20:17 🔗 Fusl britmob: JAA is the dev of qwarc so he's able to answer all the questions you have
20:18 🔗 britmob JAA: Hello, are there any guides/binary downloads for qwarc? I've had no luck building it myself.
20:18 🔗 britmob Thank you for pointing me in the right direction.
20:21 🔗 ivan_ britmob: did you try pip3 install --upgrade --user git+https://github.com/JustAnotherArchivist/qwarc
20:22 🔗 britmob I don't remember exactly what I used, I will try that now.
20:24 🔗 britmob https://i.imgur.com/gEc1eUQ.png
20:24 🔗 britmob ivan_: This is the error I have been getting from your command and the others I have tried.
20:24 🔗 britmob Is this meant to be used with mono on linux?
20:25 🔗 britmob It's very possible I am being an idiot and missing a step here.
20:26 🔗 markedL os is different on Windows than Linux
20:26 🔗 trc has quit IRC (Quit: Leaving)
20:26 🔗 ivan_ britmob: that Python function is Unix-only
20:26 🔗 britmob Well, there's my issue
20:27 🔗 markedL do you want to install WSL ?
20:27 🔗 britmob Tried building on Linux, got some error and assumed it was not designed for Linux. I am building it on my Linux server now.
20:27 🔗 ivan_ WSL 1 has no guarantee of providing enough Linux compatibility :-)
20:27 🔗 britmob IIRC WSL did not work either.
20:29 🔗 markedL Linux server is the most likely so we'll just wait for that error message
20:30 🔗 britmob Gonna take a second- adding more RAM to my proxmox host :)
20:56 🔗 markedL the FT archive's PDF api seems to instead return a 700KB png per page
20:56 🔗 markedL this is what drives the print page button
20:59 🔗 freemint_ has quit IRC (Read error: Connection reset by peer)
21:00 🔗 freemint has joined #archiveteam-bs
21:02 🔗 JAA My ft.com grab is done already. :-)
21:03 🔗 JAA britmob: I'm pretty sure that qwarc will only work properly on Linux or Linux-like systems at the moment.
21:04 🔗 JAA And since that's all I'm running, I won't invest any time in making it Windows-compatible either. If someone wants to send a non-invasive PR for it though, I'll happily get that merged.
21:09 🔗 dxrt_ has quit IRC (Read error: Operation timed out)
21:10 🔗 dxrt_ has joined #archiveteam-bs
21:10 🔗 dxrt sets mode: +o dxrt_
21:10 🔗 Fusl____ sets mode: +o dxrt_
21:10 🔗 Fusl sets mode: +o dxrt_
21:10 🔗 Fusl_ sets mode: +o dxrt_
21:10 🔗 Pixi has quit IRC (Quit: Pixi)
21:16 🔗 Pixi has joined #archiveteam-bs
21:16 🔗 sep332 has quit IRC (Ping timeout: 745 seconds)
21:35 🔗 RichardG has joined #archiveteam-bs
21:37 🔗 jrwr has quit IRC (Read error: Connection reset by peer)
21:40 🔗 jrwr has joined #archiveteam-bs
21:44 🔗 JAA britmob: Oh, also, regarding guides on how to actually use qwarc, nothing exists yet.
22:15 🔗 freemint has quit IRC (Remote host closed the connection)
22:15 🔗 freemint has joined #archiveteam-bs
22:22 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
22:31 🔗 britmob has quit IRC (Read error: Connection reset by peer)
22:33 🔗 kiskabak has quit IRC (Remote host closed the connection)
22:33 🔗 kiskabak has joined #archiveteam-bs
22:33 🔗 Fusl_ sets mode: +o kiskabak
22:33 🔗 Fusl____ sets mode: +o kiskabak
22:33 🔗 Fusl sets mode: +o kiskabak
22:56 🔗 BlueMax has joined #archiveteam-bs
23:05 🔗 dxrt has quit IRC (ZNC - http://znc.sourceforge.net)
23:05 🔗 dxrt has joined #archiveteam-bs
23:05 🔗 Fusl____ sets mode: +o dxrt
23:05 🔗 Fusl sets mode: +o dxrt
23:05 🔗 Fusl_ sets mode: +o dxrt
23:10 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
23:13 🔗 RichardG has joined #archiveteam-bs
23:36 🔗 bluefoo has joined #archiveteam-bs
23:45 🔗 hook54321 !ig 81vuwl012gnmdplpk7yjkwrg3 ^https?://www\.gnu\.org/server/select-language\.html\?.+language=..$
23:45 🔗 hook54321 oops
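
For reference, the ignore pattern in the command above can be checked with Python's re module. This is only a sketch of what the pattern matches, not of how the bot applies it; the helper name and the sample URLs are made up for illustration:

```python
import re

# Pattern copied verbatim from the command above.
IGNORE = re.compile(
    r"^https?://www\.gnu\.org/server/select-language\.html\?.+language=..$"
)


def is_ignored(url: str) -> bool:
    """Return True if the URL matches the ignore pattern."""
    return IGNORE.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, chosen to show the edge cases of the pattern.
    samples = [
        "https://www.gnu.org/server/select-language.html?callback=/philosophy/&language=de",
        "https://www.gnu.org/server/select-language.html?callback=/philosophy/&language=pt-br",
        "https://www.gnu.org/server/select-language.html?language=de",
    ]
    for url in samples:
        print(is_ignored(url), url)
```

Only the first sample matches. The pattern needs at least one character between the "?" and "language=", and exactly two characters after "language=", so longer codes such as pt-br fall through.
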
23:47 🔗 hook54321 JAA: are there still memory leaks in qwarc?
23:53 🔗 tech234a has joined #archiveteam-bs
23:55 🔗 JAA hook54321: No but yes. There is no memory leak in qwarc and never has been. However, memory consumption of qwarc processes still increases with time due to heap fragmentation. I haven't been able to find a proper solution for that yet. My workaround for the time being is to set MALLOC_MMAP_THRESHOLD_=4096 (default being 128 KiB or more), which means that most memory allocations happen through mmap.
23:55 🔗 JAA This is a performance hit but reduces memory consumption since glibc won't constantly resize the heap. For more details, check the -dev logs at the end of August.
23:59 🔗 JAA Chances are this can't really be fixed in qwarc since CPython handles all the memory allocation things. So the only way I could potentially solve it is to change how qwarc works (e.g. how the retrieved data is kept in memory structures) and thereby how CPython/glibc allocates memory. But that will require a lot of additional analysis to figure out exactly what is the problem.
23:59 🔗 hook54321 ah ok
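
The workaround JAA mentions is an environment variable read by glibc's malloc at process start, so it has to be set before the process launches. A minimal sketch of a launcher that does this; the variable name and the 4096 value are from the messages above, the function names are hypothetical, and the setting has no effect on systems that do not use glibc:

```python
import os
import subprocess
import sys
from typing import Dict, List


def env_with_mmap_threshold(threshold_bytes: int = 4096) -> Dict[str, str]:
    """Copy the current environment and add the glibc malloc setting."""
    env = dict(os.environ)
    # Note the trailing underscore; it is part of the variable name.
    env["MALLOC_MMAP_THRESHOLD_"] = str(threshold_bytes)
    return env


def run_with_mmap_threshold(
    cmd: List[str], threshold_bytes: int = 4096
) -> subprocess.CompletedProcess:
    """Run a command with the threshold set, capturing its output as text."""
    return subprocess.run(
        cmd,
        env=env_with_mmap_threshold(threshold_bytes),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Demo: the child process only prints the variable it inherited.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MALLOC_MMAP_THRESHOLD_'])"]
    print(run_with_mmap_threshold(child).stdout.strip())
```

In a shell the same effect comes from prefixing the command with MALLOC_MMAP_THRESHOLD_=4096. As JAA notes, this trades some performance for lower memory consumption.
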
