#archiveteam-bs 2018-12-25,Tue


Time Nickname Message
00:12 πŸ”— BlueMax has quit IRC (Read error: Connection reset by peer)
00:25 πŸ”— VerfiedJ has quit IRC (Quit: Leaving)
00:35 πŸ”— m007a83 has joined #archiveteam-bs
00:40 πŸ”— Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
02:25 πŸ”— tomaspark has quit IRC (Read error: Operation timed out)
02:31 πŸ”— wp494 has quit IRC (Read error: Operation timed out)
02:32 πŸ”— wp494 has joined #archiveteam-bs
02:33 πŸ”— pawbs I stood up a warrior VM, but it looks like there's no work to be done for any of the projects. Is my warrior broken, or is that actually the case?
02:34 πŸ”— Kaz long and short of it is warrior has certain incompatibilities with our 'active' projects atm
02:34 πŸ”— Kaz #flickrfckr is available if you want to run the scripts manually though
02:36 πŸ”— pawbs Oh cool. I'll make a new jail on my server and start running those then.
02:37 πŸ”— godane turned out i had old NBC Nightly News Netcast (the Podcast edition) for 2013 i didn't upload
02:37 πŸ”— godane i'm uploading those files now
02:44 πŸ”— Sian1468 has joined #archiveteam-bs
02:55 πŸ”— Sian1468 has quit IRC (Quit: Quit)
02:56 πŸ”— marked has joined #archiveteam-bs
04:02 πŸ”— kiska1 has quit IRC (Read error: Operation timed out)
04:02 πŸ”— alembic has joined #archiveteam-bs
04:03 πŸ”— kiska1 has joined #archiveteam-bs
04:17 πŸ”— qw3rty113 has joined #archiveteam-bs
04:22 πŸ”— qw3rty112 has quit IRC (Ping timeout: 600 seconds)
04:37 πŸ”— Flashfire Catbox.moe has just done a large amount of purging
04:39 πŸ”— Flashfire Also, beware: this is hosted on the WPDtalk subreddit, but it has a list of a bunch of hosting sites we could look into https://www.reddit.com/r/WPDtalk/comments/a9b7m4/the_reason_why_catbox_videos_no_longer_work_also/?st=JQ39FKEJ&sh=0567e439
04:39 πŸ”— Flashfire https://docs.google.com/spreadsheets/d/1vrKixs_ItQlLnGK6_D22qP3NhRPiD8n7SATX81CGzzs/htmlview#gid=0
04:40 πŸ”— Flashfire Here is the spreadsheet
04:51 πŸ”— BlueMax has joined #archiveteam-bs
04:57 πŸ”— odemg has quit IRC (Ping timeout: 265 seconds)
05:09 πŸ”— odemg has joined #archiveteam-bs
05:13 πŸ”— hdch has joined #archiveteam-bs
05:16 πŸ”— Odd0002 has joined #archiveteam-bs
05:26 πŸ”— hdch has quit IRC (Quit: oops)
06:12 πŸ”— alembic has quit IRC (Quit: Connection closed for inactivity)
06:14 πŸ”— wp494 has quit IRC (Ping timeout: 506 seconds)
06:15 πŸ”— wp494 has joined #archiveteam-bs
06:36 πŸ”— tapos has quit IRC (Quit: Leaving)
06:54 πŸ”— alembic has joined #archiveteam-bs
07:33 πŸ”— Pixi has quit IRC (Read error: Operation timed out)
08:51 πŸ”— BlueMax has quit IRC (Read error: Connection reset by peer)
09:03 πŸ”— alembic has quit IRC (Quit: Connection closed for inactivity)
09:50 πŸ”— godane has quit IRC (Ping timeout: 265 seconds)
09:57 πŸ”— godane has joined #archiveteam-bs
10:55 πŸ”— Sian1468 has joined #archiveteam-bs
11:22 πŸ”— BartoCH has quit IRC (Ping timeout: 615 seconds)
11:37 πŸ”— schbirid what might cause wpull to barf "WARNING ERROR: Unable to download webpage: <urlopen error Tunnel connection failed: 501 CONNECT is intentionally not supported> (caused by URLError(OSError('Tunnel connection failed: 501 CONNECT is intentionally not supported')))" all over me?
11:37 πŸ”— odemgi_ has quit IRC (Ping timeout: 252 seconds)
11:37 πŸ”— odemgi has joined #archiveteam-bs
11:37 πŸ”— schbirid using ludios' fork but same happened with others before iirc
12:13 πŸ”— BartoCH has joined #archiveteam-bs
12:29 πŸ”— JAA schbirid: Are you using youtube-dl?
12:30 πŸ”— JAA If so, that's broken on wpull 2.0.x: https://github.com/ArchiveTeam/wpull/issues/392
13:10 πŸ”— Dj-Wawa has joined #archiveteam-bs
13:30 πŸ”— JAA I need a good name for my new archival tool. Any suggestions? I thought I was so clever when I came up with "aardwarc", only to find that a software with that name already exists... :-(
13:32 πŸ”— JAA It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs.
13:37 πŸ”— JAA The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab.
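
[Editor's sketch, not JAA's code] The design described above (many concurrent retrievals fed from a queue, with no HTML parsing or link extraction) might look roughly like the following with asyncio. The `fetch` coroutine is a hypothetical stand-in for an aiohttp request, and the `results` dict stands in for writing WARC records with warcio; the function names and the concurrency value are assumptions made for illustration only.

```python
import asyncio


async def fetch(url):
    # Hypothetical placeholder for an HTTP request (e.g. via aiohttp).
    # It performs no network access; it only yields to the event loop.
    await asyncio.sleep(0)
    return f"response for {url}"


async def worker(queue, results):
    # Each worker pulls URLs until it is cancelled. There is no parsing
    # or link extraction: a URL goes in, a response is stored.
    while True:
        url = await queue.get()
        try:
            results[url] = await fetch(url)  # a real tool would write a WARC record here
        except Exception as exc:
            results[url] = exc  # keep the failure so the queue can still drain
        finally:
            queue.task_done()


async def run(urls, concurrency):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


def retrieve_all(urls, concurrency=100):
    # Synchronous entry point for the sketch.
    return asyncio.run(run(urls, concurrency))
```

Because nothing is parsed, the caller has to supply every URL up front, which matches the downside mentioned above: retrieval code has to be written per site.
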
13:49 πŸ”— Pixi has joined #archiveteam-bs
13:50 πŸ”— odemgi_ has joined #archiveteam-bs
13:52 πŸ”— odemgi has quit IRC (Read error: Operation timed out)
13:55 πŸ”— Sian1468 has quit IRC (Quit: Leaving)
14:52 πŸ”— ivan_ hta high throughput archiver
14:57 πŸ”— marked you can knock letters off the aard part ?
14:58 πŸ”— ivan_ warc-damacy too obscure?
14:59 πŸ”— ivan_ warcfast
15:00 πŸ”— marked where does src live now?
15:00 πŸ”— ivan_ https://www.onelook.com/reverse-dictionary.shtml?s=fast
15:18 πŸ”— wp494 has quit IRC (Ping timeout: 492 seconds)
15:19 πŸ”— wp494 has joined #archiveteam-bs
17:00 πŸ”— Mateon1 has quit IRC (Read error: Operation timed out)
17:02 πŸ”— Mateon1 has joined #archiveteam-bs
17:40 πŸ”— Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
19:02 πŸ”— Soni can you archive my website https://aam.autistic.space/
19:06 πŸ”— Soni it's at risk of going down by suicide
19:15 πŸ”— schbirid jaartywarcfast
19:15 πŸ”— schbirid JAA: yeah i had --youtube-dl
19:15 πŸ”— schbirid let me try without
19:16 πŸ”— ubahn has joined #archiveteam-bs
19:19 πŸ”— schbirid yeah that fixed those
19:24 πŸ”— ubahn has quit IRC (Quit: ubahn)
19:30 πŸ”— Odd0002 has quit IRC (ZNC - http://znc.in)
19:33 πŸ”— VerfiedJ has joined #archiveteam-bs
19:36 πŸ”— Odd0002 has joined #archiveteam-bs
19:37 πŸ”— Odd0002 has quit IRC (Client Quit)
19:41 πŸ”— Odd0002 has joined #archiveteam-bs
19:53 πŸ”— ubahn has joined #archiveteam-bs
19:58 πŸ”— ubahn has quit IRC (Client Quit)
22:33 πŸ”— BlueMax has joined #archiveteam-bs
22:46 πŸ”— Dj-Wawa has joined #archiveteam-bs
22:50 πŸ”— JAA I'm thinking something like Very Fast Archiver now (in reference to the Very Large Telescope).
22:58 πŸ”— marked I'd like to know what techniques make it different, what makes it fast. does it use async io, multiprocess, twisted, http2, predict URLs... etc
22:59 πŸ”— psi flac files
23:00 πŸ”— arkiver he pretty much explained it
23:00 πŸ”— arkiver <JAA> It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs.
23:00 πŸ”— arkiver <JAA> The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab.
23:00 πŸ”— arkiver to repeat
23:00 πŸ”— arkiver also JAA: nice!
23:01 πŸ”— marked so I meant async used to be in the name, now it's not.
23:01 πŸ”— JAA asyncio and aiohttp for retrieval, multi-process support by coordination through an SQLite database.
23:03 πŸ”— JAA It still needs some polishing to make it useful as a more general archival tool, but it's usable already.
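
[Editor's sketch, not JAA's code] One way several processes could coordinate through an SQLite database, as mentioned above, is to have each process claim an item inside a write transaction so that no two processes get the same item. The table layout, the status values, and the function names below are assumptions made for illustration; the actual tool's schema is not described in this log.

```python
import sqlite3


def open_db(path):
    # isolation_level=None disables implicit transactions, so the
    # explicit BEGIN/COMMIT statements below are the only ones issued.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT UNIQUE, "
        "status TEXT NOT NULL DEFAULT 'todo')"
    )
    return conn


def add_items(conn, names):
    conn.execute("BEGIN IMMEDIATE")
    conn.executemany(
        "INSERT OR IGNORE INTO items (name) VALUES (?)", [(n,) for n in names]
    )
    conn.execute("COMMIT")


def claim_item(conn):
    # BEGIN IMMEDIATE takes the write lock before the SELECT, so another
    # process cannot pick the same row between our read and our update.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, name FROM items WHERE status = 'todo' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE items SET status = 'in_progress' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]


def finish_item(conn, name):
    # Runs in autocommit mode, so the change is committed immediately.
    conn.execute("UPDATE items SET status = 'done' WHERE name = ?", (name,))
```

Each worker process would open its own connection with `open_db`, loop on `claim_item` until it returns `None`, and call `finish_item` after retrieving an item.
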
23:05 πŸ”— Jens https://www.reddit.com/r/Archiveteam/comments/a9j6en/textream_is_to_be_shut_down_within_the_month/
23:05 πŸ”— Jens Oh surprise, it's a Yahoo product.
23:07 πŸ”— arkiver wooh
23:07 πŸ”— arkiver thanks Jens!
23:08 πŸ”— arkiver let's make a channel :)
23:11 πŸ”— Jens Drawing a blank on channel names. "Textream" doesn't lend itself well to puns.
23:14 πŸ”— marked what's the opposite of ream ?
23:15 πŸ”— Jens I only know the word from machining.
23:15 πŸ”— Jens A reamer is for making very accurately sized holes in metal.
23:16 πŸ”— marked .jp is known for butchering the engrish language
23:17 πŸ”— Jens There should almost be a dedicated channel for Yahoo garbage fires.
23:18 πŸ”— Odd0002 has quit IRC (ZNC - http://znc.in)
23:18 πŸ”— JAA Wasn't there #woohoo or something?
23:19 πŸ”— JAA Ah no, only the wiki page https://archiveteam.org/index.php?title=Woohoo
23:21 πŸ”— Odd0002 has joined #archiveteam-bs
23:29 πŸ”— twoTBHetz has joined #archiveteam-bs
