Time |
Nickname |
Message |
00:12
π
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
00:25
π
|
|
VerfiedJ has quit IRC (Quit: Leaving) |
00:35
π
|
|
m007a83 has joined #archiveteam-bs |
00:40
π
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
02:25
π
|
|
tomaspark has quit IRC (Read error: Operation timed out) |
02:31
π
|
|
wp494 has quit IRC (Read error: Operation timed out) |
02:32
π
|
|
wp494 has joined #archiveteam-bs |
02:33
π
|
pawbs |
I stood up a warrior VM, but it looks like there's no work to be done for any of the projects. Is my warrior broken, or is that actually the case? |
02:34
π
|
Kaz |
long and short of it is warrior has certain incompatibilities with our 'active' projects atm |
02:34
π
|
Kaz |
#flickrfckr is available if you want to run the scripts manually though |
02:36
π
|
pawbs |
Oh cool. I'll make a new jail on my server and start running those then. |
02:37
π
|
godane |
turned out i had old NBC Nightly News Netcast (the Podcast edition) for 2013 i didn't upload |
02:37
π
|
godane |
i'm uploading those files now |
02:44
π
|
|
Sian1468 has joined #archiveteam-bs |
02:55
π
|
|
Sian1468 has quit IRC (Quit: Quit) |
02:56
π
|
|
marked has joined #archiveteam-bs |
04:02
π
|
|
kiska1 has quit IRC (Read error: Operation timed out) |
04:02
π
|
|
alembic has joined #archiveteam-bs |
04:03
π
|
|
kiska1 has joined #archiveteam-bs |
04:17
π
|
|
qw3rty113 has joined #archiveteam-bs |
04:22
π
|
|
qw3rty112 has quit IRC (Ping timeout: 600 seconds) |
04:37
π
|
Flashfire |
Catbox.moe has just done a large amount of purging |
04:39
π
|
Flashfire |
Also beware hosted on WPDtalk subreddit but this has a list of a bunch of Hosting sites we could look into https://www.reddit.com/r/WPDtalk/comments/a9b7m4/the_reason_why_catbox_videos_no_longer_work_also/?st=JQ39FKEJ&sh=0567e439 |
04:39
π
|
Flashfire |
https://docs.google.com/spreadsheets/d/1vrKixs_ItQlLnGK6_D22qP3NhRPiD8n7SATX81CGzzs/htmlview#gid=0 |
04:40
π
|
Flashfire |
Here is the spreadsheet |
04:51
π
|
|
BlueMax has joined #archiveteam-bs |
04:57
π
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
05:09
π
|
|
odemg has joined #archiveteam-bs |
05:13
π
|
|
hdch has joined #archiveteam-bs |
05:16
π
|
|
Odd0002 has joined #archiveteam-bs |
05:26
π
|
|
hdch has quit IRC (Quit: oops) |
06:12
π
|
|
alembic has quit IRC (Quit: Connection closed for inactivity) |
06:14
π
|
|
wp494 has quit IRC (Ping timeout: 506 seconds) |
06:15
π
|
|
wp494 has joined #archiveteam-bs |
06:36
π
|
|
tapos has quit IRC (Quit: Leaving) |
06:54
π
|
|
alembic has joined #archiveteam-bs |
07:33
π
|
|
Pixi has quit IRC (Read error: Operation timed out) |
08:51
π
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
09:03
π
|
|
alembic has quit IRC (Quit: Connection closed for inactivity) |
09:50
π
|
|
godane has quit IRC (Ping timeout: 265 seconds) |
09:57
π
|
|
godane has joined #archiveteam-bs |
10:55
π
|
|
Sian1468 has joined #archiveteam-bs |
11:22
π
|
|
BartoCH has quit IRC (Ping timeout: 615 seconds) |
11:37
π
|
schbirid |
what might cause wpull to barf "WARNING ERROR: Unable to download webpage: <urlopen error Tunnel connection failed: 501 CONNECT is intentionally not supported> (caused by URLError(OSError('Tunnel connection failed: 501 CONNECT is intentionally not supported')))" all over me? |
11:37
π
|
|
odemgi_ has quit IRC (Ping timeout: 252 seconds) |
11:37
π
|
|
odemgi has joined #archiveteam-bs |
11:37
π
|
schbirid |
using ludios' fork but same happened with others before iirc |
12:13
π
|
|
BartoCH has joined #archiveteam-bs |
12:29
π
|
JAA |
schbirid: Are you using youtube-dl? |
12:30
π
|
JAA |
If so, that's broken on wpull 2.0.x: https://github.com/ArchiveTeam/wpull/issues/392 |
13:10
π
|
|
Dj-Wawa has joined #archiveteam-bs |
13:30
π
|
JAA |
I need a good name for my new archival tool. Any suggestions? I thought I was so clever when I came up with "aardwarc", only to find that a software with that name already exists... :-( |
13:32
π
|
JAA |
It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs. |
13:37
π
|
JAA |
The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab. |
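A minimal sketch of the retrieval pattern JAA describes above (a queue of items, many concurrent fetches, no HTML parsing or link extraction). It is not JAA's code. The names `fetch_stub`, `worker`, and `run` are hypothetical, and the fetch function is a stand-in so the sketch runs without network access or third-party packages; a real tool would presumably issue the request with aiohttp and write the response to a WARC with warcio instead of appending to a list.

```python
import asyncio


async def fetch_stub(url):
    # Hypothetical stand-in for a real HTTP request; returns fake bytes.
    await asyncio.sleep(0)
    return f"body of {url}".encode()


async def worker(queue, fetch, records):
    # Pull URLs until cancelled. The response is stored as-is:
    # no HTML parsing and no link extraction.
    while True:
        url = await queue.get()
        try:
            records.append((url, await fetch(url)))
        except Exception as exc:
            # Record the failure so one bad URL does not kill the worker.
            records.append((url, exc))
        finally:
            queue.task_done()


async def run(urls, fetch=fetch_stub, concurrency=100):
    # Fill the queue first, then let `concurrency` workers drain it.
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    records = []
    workers = [
        asyncio.create_task(worker(queue, fetch, records))
        for _ in range(concurrency)
    ]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return records
```

Calling `asyncio.run(run(list_of_urls))` returns one `(url, body)` pair per URL. Because nothing is parsed, the caller has to supply every URL up front, which matches the downside JAA mentions: retrieval code must be written for each site.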
13:49
π
|
|
Pixi has joined #archiveteam-bs |
13:50
π
|
|
odemgi_ has joined #archiveteam-bs |
13:52
π
|
|
odemgi has quit IRC (Read error: Operation timed out) |
13:55
π
|
|
Sian1468 has quit IRC (Quit: Leaving) |
14:52
π
|
ivan_ |
hta high throughput archiver |
14:57
π
|
marked |
you can knock letters off the aard part ? |
14:58
π
|
ivan_ |
warc-damacy too obscure? |
14:59
π
|
ivan_ |
warcfast |
15:00
π
|
marked |
where does src live now? |
15:00
π
|
ivan_ |
https://www.onelook.com/reverse-dictionary.shtml?s=fast |
15:18
π
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
15:19
π
|
|
wp494 has joined #archiveteam-bs |
17:00
π
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
17:02
π
|
|
Mateon1 has joined #archiveteam-bs |
17:40
π
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
19:02
π
|
Soni |
can you archive my website https://aam.autistic.space/ |
19:06
π
|
Soni |
it's at risk of going down by suicide |
19:15
π
|
schbirid |
jaartywarcfast |
19:15
π
|
schbirid |
JAA: yeah i had --youtube-dl |
19:15
π
|
schbirid |
let me try without |
19:16
π
|
|
ubahn has joined #archiveteam-bs |
19:19
π
|
schbirid |
yeah that fixed those |
19:24
π
|
|
ubahn has quit IRC (Quit: ubahn) |
19:30
π
|
|
Odd0002 has quit IRC (ZNC - http://znc.in) |
19:33
π
|
|
VerfiedJ has joined #archiveteam-bs |
19:36
π
|
|
Odd0002 has joined #archiveteam-bs |
19:37
π
|
|
Odd0002 has quit IRC (Client Quit) |
19:41
π
|
|
Odd0002 has joined #archiveteam-bs |
19:53
π
|
|
ubahn has joined #archiveteam-bs |
19:58
π
|
|
ubahn has quit IRC (Client Quit) |
22:33
π
|
|
BlueMax has joined #archiveteam-bs |
22:46
π
|
|
Dj-Wawa has joined #archiveteam-bs |
22:50
π
|
JAA |
I'm thinking something like Very Fast Archiver now (in reference to the Very Large Telescope). |
22:58
π
|
marked |
I'd like to know what techniques make it different, make it fast. does it use async io, multiprocess, twisted, http2, predicts URLs... etc |
22:59
π
|
psi |
flac files |
23:00
π
|
arkiver |
he pretty much explained it |
23:00
π
|
arkiver |
<JAA> It's based on the code I used for Storify, Outpost, AMO, and a few other sites. Python 3, asyncio/aiohttp/warcio, very high concurrency on a single machine, very low-level tool which doesn't do any link extraction or similar, just queue management in the form of items (think like tracker/warrior), retrieval, and writing to WARCs. |
23:00
π
|
arkiver |
<JAA> The main advantage over wpull, wget, & Co. is the higher throughput due to the lack of HTML parsing and other processing; I managed to retrieve roughly 100 million URLs from Outpost in less than 4 days runtime from a single machine, for example. The main downside is that you need to write retrieval code for everything you want to grab. |
23:00
π
|
arkiver |
to repeat |
23:00
π
|
arkiver |
also JAA: nice! |
23:01
π
|
marked |
so I meant async used to be in the name, now it's not. |
23:01
π
|
JAA |
asyncio and aiohttp for retrieval, multi-process support by coordination through an SQLite database. |
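One way several processes could coordinate through an SQLite database, sketched under assumptions: the table layout, the status values, and the function names (`open_queue`, `add_items`, `claim_item`, `finish_item`) are all hypothetical and not taken from JAA's tool. The idea shown is that each process claims an item inside a `BEGIN IMMEDIATE` transaction, which takes the write lock up front so that two processes cannot claim the same row.

```python
import sqlite3


def open_queue(path):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT UNIQUE, "
        "status TEXT NOT NULL DEFAULT 'todo')"
    )
    return conn


def add_items(conn, names):
    # Duplicate names are ignored so items can be re-submitted safely.
    conn.executemany(
        "INSERT OR IGNORE INTO items (name) VALUES (?)",
        [(name,) for name in names],
    )


def claim_item(conn):
    # Returns the name of the claimed item, or None if nothing is left.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, name FROM items "
            "WHERE status = 'todo' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE items SET status = 'in_progress' WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return None if row is None else row[1]
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish_item(conn, name):
    conn.execute("UPDATE items SET status = 'done' WHERE name = ?", (name,))
```

Each worker process would open its own connection to the same database file and loop on `claim_item` until it returns `None`. The `timeout=30` argument makes a process wait for the lock instead of failing immediately when another process is mid-claim.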
23:03
π
|
JAA |
It still needs some polishing to make it useful as a more general archival tool, but it's usable already. |
23:05
π
|
Jens |
https://www.reddit.com/r/Archiveteam/comments/a9j6en/textream_is_to_be_shut_down_within_the_month/ |
23:05
π
|
Jens |
Oh surprise, it's a Yahoo product. |
23:07
π
|
arkiver |
wooh |
23:07
π
|
arkiver |
thanks Jens! |
23:08
π
|
arkiver |
let's make a channel :) |
23:11
π
|
Jens |
Drawing a blank on channel names. "Textream" doesn't lend itself well to puns. |
23:14
π
|
marked |
what's the opposite of ream ? |
23:15
π
|
Jens |
I only know the word from machining. |
23:15
π
|
Jens |
A reamer is for making very accurately sized holes in metal. |
23:16
π
|
marked |
.jp is known for butchering the engrish language |
23:17
π
|
Jens |
There should almost be a dedicated channel for Yahoo garbage fires. |
23:18
π
|
|
Odd0002 has quit IRC (ZNC - http://znc.in) |
23:18
π
|
JAA |
Wasn't there #woohoo or something? |
23:19
π
|
JAA |
Ah no, only the wiki page https://archiveteam.org/index.php?title=Woohoo |
23:21
π
|
|
Odd0002 has joined #archiveteam-bs |
23:29
π
|
|
twoTBHetz has joined #archiveteam-bs |