[00:12] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[00:25] *** VerfiedJ has quit IRC (Quit: Leaving)
[00:35] *** m007a83 has joined #archiveteam-bs
[00:40] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[02:25] *** tomaspark has quit IRC (Read error: Operation timed out)
[02:31] *** wp494 has quit IRC (Read error: Operation timed out)
[02:32] *** wp494 has joined #archiveteam-bs
[02:33] I stood up a warrior VM, but it looks like there's no work to be done for any of the projects. Is my warrior broken, or is that actually the case?
[02:34] long and short of it is warrior has certain incompatibilities with our 'active' projects atm
[02:34] #flickrfckr is available if you want to run the scripts manually though
[02:36] Oh cool. I'll make a new jail on my server and start running those then.
[02:37] turned out i had old NBC Nightly News Netcast (the Podcast edition) for 2013 i didn't upload
[02:37] i'm uploading those files now
[02:44] *** Sian1468 has joined #archiveteam-bs
[02:55] *** Sian1468 has quit IRC (Quit: Quit)
[02:56] *** marked has joined #archiveteam-bs
[04:02] *** kiska1 has quit IRC (Read error: Operation timed out)
[04:02] *** alembic has joined #archiveteam-bs
[04:03] *** kiska1 has joined #archiveteam-bs
[04:17] *** qw3rty113 has joined #archiveteam-bs
[04:22] *** qw3rty112 has quit IRC (Ping timeout: 600 seconds)
[04:37] Catbox.moe has just done a large amount of purging
[04:39] Also beware hosted on WPDtalk subreddit but this has a list of a bunch of Hosting sites we could look into https://www.reddit.com/r/WPDtalk/comments/a9b7m4/the_reason_why_catbox_videos_no_longer_work_also/?st=JQ39FKEJ&sh=0567e439
[04:39] https://docs.google.com/spreadsheets/d/1vrKixs_ItQlLnGK6_D22qP3NhRPiD8n7SATX81CGzzs/htmlview#gid=0
[04:40] Here is the spreadsheet
[04:51] *** BlueMax has joined #archiveteam-bs
[04:57] *** odemg has quit IRC (Ping timeout: 265 seconds)
[05:09] *** odemg has joined #archiveteam-bs
[05:13] *** hdch has joined #archiveteam-bs
[05:16] *** Odd0002 has joined #archiveteam-bs
[05:26] *** hdch has quit IRC (Quit: oops)
[06:12] *** alembic has quit IRC (Quit: Connection closed for inactivity)
[06:14] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[06:15] *** wp494 has joined #archiveteam-bs
[06:36] *** tapos has quit IRC (Quit: Leaving)
[06:54] *** alembic has joined #archiveteam-bs
[07:33] *** Pixi has quit IRC (Read error: Operation timed out)
[08:51] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[09:03] *** alembic has quit IRC (Quit: Connection closed for inactivity)
[09:50] *** godane has quit IRC (Ping timeout: 265 seconds)
[09:57] *** godane has joined #archiveteam-bs
[10:55] *** Sian1468 has joined #archiveteam-bs
[11:22] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[11:37] what might cause wpull to barf "WARNING ERROR: Unable to download webpage: (caused by URLError(OSError('Tunnel connection failed: 501 CONNECT is intentionally not supported')))" all over me?
[11:37] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[11:37] *** odemgi has joined #archiveteam-bs
[11:37] using ludios' fork but same happened with others before iirc
[12:13] *** BartoCH has joined #archiveteam-bs
[12:29] schbirid: Are you using youtube-dl?
[12:30] If so, that's broken on wpull 2.0.x: https://github.com/ArchiveTeam/wpull/issues/392
[13:10] *** Dj-Wawa has joined #archiveteam-bs
[13:30] I need a good name for my new archival tool. Any suggestions? I thought I was so clever when I came up with "aardwarc", only to find that a software with that name already exists... :-(
[13:32] It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs.
[13:37] The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab.
[13:49] *** Pixi has joined #archiveteam-bs
[13:50] *** odemgi_ has joined #archiveteam-bs
[13:52] *** odemgi has quit IRC (Read error: Operation timed out)
[13:55] *** Sian1468 has quit IRC (Quit: Leaving)
[14:52] hta high throughput archiver
[14:57] you can knock letters off the aard part ?
[14:58] warc-damacy too obscure?
[14:59] warcfast
[15:00] where does src live now?
[15:00] https://www.onelook.com/reverse-dictionary.shtml?s=fast
[15:18] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[15:19] *** wp494 has joined #archiveteam-bs
[17:00] *** Mateon1 has quit IRC (Read error: Operation timed out)
[17:02] *** Mateon1 has joined #archiveteam-bs
[17:40] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[19:02] can you archive my website https://aam.autistic.space/
[19:06] it's at risk of going down by suicide
[19:15] jaartywarcfast
[19:15] JAA: yeah i had --youtube-dl
[19:15] let me try without
[19:16] *** ubahn has joined #archiveteam-bs
[19:19] yeah that fixed those
[19:24] *** ubahn has quit IRC (Quit: ubahn)
[19:30] *** Odd0002 has quit IRC (ZNC - http://znc.in)
[19:33] *** VerfiedJ has joined #archiveteam-bs
[19:36] *** Odd0002 has joined #archiveteam-bs
[19:37] *** Odd0002 has quit IRC (Client Quit)
[19:41] *** Odd0002 has joined #archiveteam-bs
[19:53] *** ubahn has joined #archiveteam-bs
[19:58] *** ubahn has quit IRC (Client Quit)
[22:33] *** BlueMax has joined #archiveteam-bs
[22:46] *** Dj-Wawa has joined #archiveteam-bs
[22:50] I'm thinking something like Very Fast Archiver now (in reference to the Very Large Telescope).
[22:58] I'd like to know what techniques make it different, what makes it fast. does it use async io, multiprocess, twisted, http2, predict URLs... etc
[22:59] flac files
[23:00] he pretty much explained it
[23:00] It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs.
[23:00] The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab.
[23:00] to repeat
[23:00] also JAA: nice!
[23:01] so I meant async used to be in the name, now it's not.
[23:01] asyncio and aiohttp for retrieval, multi-process support by coordination through an SQLite database.
[23:03] It still needs some polishing to make it useful as a more general archival tool, but it's usable already.
[23:05] https://www.reddit.com/r/Archiveteam/comments/a9j6en/textream_is_to_be_shut_down_within_the_month/
[23:05] Oh surprise, it's a Yahoo product.
[23:07] wooh
[23:07] thanks Jens!
[23:08] let's make a channel :)
[23:11] Drawing a blank on channel names. "Textream" doesn't lend itself well to puns.
[23:14] what's the opposite of ream ?
[23:15] I only know the word from machining.
[23:15] A reamer is for making very accurately sized holes in metal.
[23:16] .jp is known for butchering the engrish language
[23:17] There should almost be a dedicated channel for Yahoo garbage fires.
[23:18] *** Odd0002 has quit IRC (ZNC - http://znc.in)
[23:18] Wasn't there #woohoo or something?
[23:19] Ah no, only the wiki page https://archiveteam.org/index.php?title=Woohoo
[23:21] *** Odd0002 has joined #archiveteam-bs
[23:29] *** twoTBHetz has joined #archiveteam-bs
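
Editor's note on the [23:01] message ("multi-process support by coordination through an SQLite database"): the log does not show how the tool does this, so the following is only a minimal sketch of one common way such coordination can work, not the tool's actual code. The table layout, the status values, and the function names (open_queue, add_items, claim_item, finish_item) are all hypothetical. It uses only Python's standard sqlite3 module; each worker process would open its own connection to the same database file and claim items inside a write transaction so that no two workers get the same item.

```python
import sqlite3


def open_queue(path):
    """Open (and if needed create) a shared item queue. Hypothetical schema."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN/COMMIT statements below control transactions explicitly.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT UNIQUE NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'todo')"
    )
    return conn


def add_items(conn, names):
    """Queue new items; names already present are ignored."""
    conn.execute("BEGIN IMMEDIATE")
    conn.executemany(
        "INSERT OR IGNORE INTO items (name) VALUES (?)",
        [(name,) for name in names],
    )
    conn.execute("COMMIT")


def claim_item(conn):
    """Atomically pick one 'todo' item and mark it 'in_progress'.

    BEGIN IMMEDIATE takes the database write lock up front, so another
    process cannot select the same row between our SELECT and UPDATE.
    Returns the item name, or None if nothing is left to do.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, name FROM items WHERE status = 'todo' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE items SET status = 'in_progress' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]


def finish_item(conn, name):
    """Mark an item as done once its retrieval has been written out."""
    conn.execute("UPDATE items SET status = 'done' WHERE name = ?", (name,))
```

In this sketch a worker would loop on claim_item(), run its site-specific retrieval code for the returned item, and call finish_item() afterwards. Handling of crashed workers (items stuck in 'in_progress') is left out.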