#archiveteam-bs 2020-02-12,Wed

Time Nickname Message
00:00 🔗 er1sian At the moment, I think the best approach for peertube.video would be to use their API to list all accounts, get all their videos, and use my yt-dl pull request (yt-dl currently has an incomplete PeerTube extractor) with TubeUp, then save all the web pages into the IA Wayback Machine to keep the public metadata.
00:00 🔗 er1sian Critique/comments wanted
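
The listing step er1sian describes could be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint paths `/api/v1/accounts` and `/api/v1/accounts/{name}/videos`, the `start`/`count` query parameters, and the `{"total": ..., "data": [...]}` response shape are assumed from PeerTube's REST API and should be checked against the instance's own documentation. The helper names are hypothetical, and remote accounts may need a `name@host` form that this sketch does not handle.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_paginated(base_url, path, fetch=fetch_json, page_size=100):
    """Yield every item from a start/count paginated listing.

    Assumption: each page looks like {"total": N, "data": [...]}.
    """
    start = 0
    while True:
        query = urllib.parse.urlencode({"start": start, "count": page_size})
        page = fetch(f"{base_url}{path}?{query}")
        items = page.get("data", [])
        if not items:
            break
        yield from items
        start += len(items)
        if start >= page.get("total", 0):
            break


def iter_all_videos(base_url, fetch=fetch_json):
    """Walk all accounts, then each account's videos (paths are assumed)."""
    for account in iter_paginated(base_url, "/api/v1/accounts", fetch):
        name = urllib.parse.quote(account["name"])
        yield from iter_paginated(
            base_url, f"/api/v1/accounts/{name}/videos", fetch
        )
```

The `fetch` parameter is injectable so the pagination logic can be exercised without touching a live instance; the resulting video list would then be handed to whatever downloader is used.
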
00:01 🔗 Mateon1 has quit IRC (Ping timeout: 255 seconds)
00:01 🔗 Mateon1 has joined #archiveteam-bs
00:04 🔗 Ctrl has joined #archiveteam-bs
00:06 🔗 AlsoJAA Reddit as a whole is archivable. Individual subreddits or users require you to first build an index of all of Reddit. The same thing applies to a user's saved, upvoted, etc. It's due to how Reddit stores the data internally.
00:08 🔗 AlsoJAA Twitter's better in a sense because at least the search works. Finding old retweets is impossible though as far as I know.
00:08 🔗 AlsoJAA Anyway, for Reddit: #shreddit
00:08 🔗 er1sian has quit IRC (Read error: Operation timed out)
00:11 🔗 AlsoJAA er1sian: We won't archive anything from the Fediverse unless the operators of the affected instance ask us to.
00:24 🔗 er1sian has joined #archiveteam-bs
00:25 🔗 er1sian AFAIK owner is AWOL, so I doubt they will ask. Out of curiosity, why does archiveteam wait for an owner's request when it comes to Fediverse sites? If they never ask, then all the user data gets lost?
00:26 🔗 er1sian I understand if it's to help them do a clean handover, but that doesn't look like it will happen
00:32 🔗 nicolas17 has quit IRC (Read error: Connection reset by peer)
00:35 🔗 robogoat has quit IRC (Read error: Operation timed out)
00:35 🔗 robogoat has joined #archiveteam-bs
00:36 🔗 nicolas17 has joined #archiveteam-bs
00:37 🔗 wp494 er1sian: some jackass decided to get Really Pissed Off that the custodian of their data was going to change hands and wrote a big blog post about how we're all evil SOBs
00:38 🔗 cppchrisc has quit IRC (Read error: Operation timed out)
00:38 🔗 wp494 clearly they missed those assemblies in school where the local police/child protection advocates/etc come in and say "if you don't want it stored forever on somebody's disk somewhere don't post it to begin with" by the way they wrote the whole thing
00:38 🔗 benjinsmi has joined #archiveteam-bs
00:38 🔗 wp494 eventually it hit Jason and then Jason told us to not bother unless a Fedi host/operator explicitly reaches out and says "ARCHIVE THIS THING BECAUSE I'M SHUTTING IT DOWN"
00:39 🔗 britmob_ has joined #archiveteam-bs
00:39 🔗 Datechnom has quit IRC (Ping timeout: 496 seconds)
00:40 🔗 Ryz has quit IRC (Read error: Operation timed out)
00:40 🔗 mistym has quit IRC (Read error: Operation timed out)
00:41 🔗 cppchrisc has joined #archiveteam-bs
00:41 🔗 cppchrisc has quit IRC (Connection closed)
00:41 🔗 cf has quit IRC (Read error: Operation timed out)
00:42 🔗 er1sian Ahh, that's a pain. I can understand why a Fediverse user would dislike data permanence but it's on them. I assume they didn't even try to ask you all to delete their content.
00:42 🔗 er1sian I might just try to contact the most popular users and ask if they'd like my help saving/migrating their channels.
00:42 🔗 er1sian Thanks for explaining :)
00:42 🔗 er1sian I'll go look for the blog and get mad for a while
00:42 🔗 er1sian has left Leaving
00:42 🔗 nyany_ has quit IRC (Read error: Operation timed out)
00:42 🔗 Larsenv has quit IRC (Read error: Operation timed out)
00:43 🔗 cppchrisc has joined #archiveteam-bs
00:43 🔗 benjinss has quit IRC (Ping timeout: 496 seconds)
00:43 🔗 Lord_Nigh has quit IRC (Ping timeout: 496 seconds)
00:43 🔗 svchfoo3 has quit IRC (Ping timeout: 496 seconds)
00:43 🔗 mistym has joined #archiveteam-bs
00:43 🔗 Lord_Nigh has joined #archiveteam-bs
00:44 🔗 britmob has quit IRC (Ping timeout: 496 seconds)
00:46 🔗 ctrl_ has quit IRC (Read error: Operation timed out)
00:47 🔗 PurpleSym has quit IRC (Write error: Broken pipe)
00:50 🔗 PurpleSym has joined #archiveteam-bs
00:51 🔗 svchfoo1 sets mode: +o PurpleSym
00:51 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
00:51 🔗 balrog has quit IRC (Read error: Operation timed out)
00:52 🔗 balrog has joined #archiveteam-bs
00:54 🔗 paul2520 has quit IRC (Read error: Operation timed out)
00:54 🔗 robogoat has quit IRC (Write error: Broken pipe)
00:54 🔗 dxrt_ has quit IRC (Read error: Operation timed out)
00:54 🔗 NIC007a83 has quit IRC (Remote host closed the connection)
00:54 🔗 Lord_Nigh has joined #archiveteam-bs
00:54 🔗 robogoat has joined #archiveteam-bs
00:54 🔗 equant has quit IRC (Read error: Operation timed out)
00:54 🔗 keith20 has quit IRC (Read error: Operation timed out)
00:54 🔗 wabu has quit IRC (Read error: Operation timed out)
00:54 🔗 Dj-Wawa has quit IRC (Read error: Operation timed out)
00:54 🔗 chaz has quit IRC (Read error: Operation timed out)
00:54 🔗 fredgido has joined #archiveteam-bs
00:54 🔗 Dj-Wawa has joined #archiveteam-bs
00:54 🔗 ctrl_ has joined #archiveteam-bs
00:54 🔗 dxrt_ has joined #archiveteam-bs
00:54 🔗 dxrt sets mode: +o dxrt_
00:54 🔗 wabu has joined #archiveteam-bs
00:54 🔗 keith20 has joined #archiveteam-bs
00:54 🔗 asdf0101 has quit IRC (Read error: Connection reset by peer)
00:54 🔗 systwi_ has joined #archiveteam-bs
00:54 🔗 jake_test has quit IRC (Read error: Connection reset by peer)
00:54 🔗 asdf0101 has joined #archiveteam-bs
00:55 🔗 paul2520 has joined #archiveteam-bs
00:55 🔗 jake_test has joined #archiveteam-bs
00:55 🔗 gandalf has quit IRC (Ping timeout: 622 seconds)
00:55 🔗 klg_ has joined #archiveteam-bs
00:55 🔗 gandalf has joined #archiveteam-bs
00:55 🔗 NIC007a83 has joined #archiveteam-bs
00:55 🔗 odemgi has quit IRC (Read error: Operation timed out)
00:55 🔗 Larsenv has joined #archiveteam-bs
00:55 🔗 klg has quit IRC (Read error: Operation timed out)
00:56 🔗 Flashfire has quit IRC (Remote host closed the connection)
00:56 🔗 kiska has quit IRC (Read error: Connection reset by peer)
00:57 🔗 Flashfire has joined #archiveteam-bs
00:58 🔗 jake_test has quit IRC (Read error: Operation timed out)
00:58 🔗 gtwy has quit IRC (Read error: Operation timed out)
01:00 🔗 fredgido_ has quit IRC (Read error: Operation timed out)
01:00 🔗 systwi has quit IRC (Ping timeout: 622 seconds)
01:00 🔗 cf has joined #archiveteam-bs
01:01 🔗 paul2520 has quit IRC (Read error: Operation timed out)
01:03 🔗 logchfoo2 starts logging #archiveteam-bs at Wed Feb 12 01:03:22 2020
01:03 🔗 logchfoo2 has joined #archiveteam-bs
01:03 🔗 Kenshin has joined #archiveteam-bs
01:04 🔗 Auctus has joined #archiveteam-bs
01:04 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
01:04 🔗 Raccoon has quit IRC (Ping timeout: 622 seconds)
01:04 🔗 Raccoon` is now known as Raccoon
01:04 🔗 Ravenloft has joined #archiveteam-bs
01:05 🔗 fredgido has quit IRC (Remote host closed the connection)
01:05 🔗 systwi_ has quit IRC (Read error: Operation timed out)
01:05 🔗 Auctus_ has quit IRC (Read error: Operation timed out)
01:06 🔗 odemgi has joined #archiveteam-bs
01:06 🔗 chaz has joined #archiveteam-bs
01:06 🔗 odemgi_ has joined #archiveteam-bs
01:07 🔗 wp494 has quit IRC (Read error: Operation timed out)
01:07 🔗 fredgido has joined #archiveteam-bs
01:07 🔗 britmob_ has quit IRC (Read error: Connection reset by peer)
01:07 🔗 wp494 has joined #archiveteam-bs
01:07 🔗 Yurume has quit IRC (Read error: Connection reset by peer)
01:09 🔗 Yurume has joined #archiveteam-bs
01:09 🔗 equant has joined #archiveteam-bs
01:09 🔗 paul2520 has joined #archiveteam-bs
01:09 🔗 britmob_ has joined #archiveteam-bs
01:20 🔗 odemgi has quit IRC (Read error: Operation timed out)
01:37 🔗 Ryz has joined #archiveteam-bs
01:37 🔗 svchfoo3 has joined #archiveteam-bs
01:37 🔗 svchfoo1 sets mode: +o svchfoo3
01:38 🔗 nyany_ has joined #archiveteam-bs
01:47 🔗 kiska has joined #archiveteam-bs
01:47 🔗 svchfoo3 sets mode: +o kiska
01:47 🔗 svchfoo1 sets mode: +o kiska
01:50 🔗 Datechnom has joined #archiveteam-bs
02:10 🔗 Ravenloft has quit IRC (Ping timeout: 360 seconds)
02:10 🔗 Ravenloft has joined #archiveteam-bs
02:11 🔗 bsmith093 has quit IRC (Ping timeout: 615 seconds)
02:13 🔗 HP_Archiv has joined #archiveteam-bs
02:38 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
02:40 🔗 asdf0101 has joined #archiveteam-bs
03:00 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
03:35 🔗 thuban2 has joined #archiveteam-bs
03:38 🔗 thuban1 has quit IRC (Ping timeout: 255 seconds)
03:38 🔗 bsmith093 has joined #archiveteam-bs
03:51 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:56 🔗 bsmith093 has quit IRC (Quit: Leaving.)
04:02 🔗 Smiley has quit IRC (Ping timeout: 255 seconds)
04:14 🔗 BlueMax has joined #archiveteam-bs
04:18 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
04:18 🔗 odemgi_ has quit IRC (Ping timeout: 246 seconds)
04:21 🔗 RichardG has joined #archiveteam-bs
04:28 🔗 bsmith093 has joined #archiveteam-bs
04:36 🔗 qw3rty_ has joined #archiveteam-bs
04:40 🔗 qw3rty has quit IRC (Ping timeout: 276 seconds)
04:45 🔗 Smiley has joined #archiveteam-bs
05:16 🔗 HP_Archiv has quit IRC (Ping timeout: 276 seconds)
05:16 🔗 HP_Archiv has joined #archiveteam-bs
05:49 🔗 d5f4a3622 has quit IRC (Read error: Connection reset by peer)
05:50 🔗 d5f4a3622 has joined #archiveteam-bs
06:02 🔗 Flashfire has quit IRC (Remote host closed the connection)
06:02 🔗 kiska has quit IRC (Remote host closed the connection)
06:02 🔗 kiska has joined #archiveteam-bs
06:02 🔗 svchfoo3 sets mode: +o kiska
06:03 🔗 svchfoo1 sets mode: +o kiska
06:03 🔗 Flashfire has joined #archiveteam-bs
06:08 🔗 ranma_ has joined #archiveteam-bs
06:20 🔗 ranma has quit IRC (Ping timeout: 745 seconds)
06:29 🔗 odemgi has joined #archiveteam-bs
06:44 🔗 nicolas17 has quit IRC (Ping timeout: 745 seconds)
06:58 🔗 thuban2 has quit IRC (Read error: Operation timed out)
06:59 🔗 thuban2 has joined #archiveteam-bs
07:10 🔗 HP_Archiv has quit IRC (Ping timeout: 276 seconds)
07:13 🔗 HP_Archiv has joined #archiveteam-bs
07:34 🔗 superkuh has quit IRC (Read error: Operation timed out)
07:35 🔗 superkuh has joined #archiveteam-bs
08:26 🔗 luckcolor has quit IRC (Read error: Operation timed out)
08:29 🔗 luckcolor has joined #archiveteam-bs
08:34 🔗 RichardG_ has joined #archiveteam-bs
08:34 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
09:22 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
09:44 🔗 BlueMax has quit IRC (Quit: Leaving)
09:54 🔗 Smiley has quit IRC (Ping timeout: 496 seconds)
10:09 🔗 VerifiedJ has joined #archiveteam-bs
10:22 🔗 Smiley has joined #archiveteam-bs
10:29 🔗 mtntmnky has quit IRC (Remote host closed the connection)
10:30 🔗 mtntmnky has joined #archiveteam-bs
10:30 🔗 SmileyG has joined #archiveteam-bs
10:41 🔗 Smiley has quit IRC (Ping timeout: 745 seconds)
11:05 🔗 HP_Archiv has quit IRC (Quit: Leaving)
11:25 🔗 bitbit has joined #archiveteam-bs
11:25 🔗 bitbit dxrt: hi :)
11:27 🔗 dxrt Hello
11:27 🔗 dxrt So we grabbed the full site and it is viewable in the wayback machine and the WARCs are also available if interested.
11:28 🔗 d5f4a3622 has quit IRC (Quit: https://i.imgur.com/xacQ09F.mp4)
11:30 🔗 mtntmnky has quit IRC (Remote host closed the connection)
11:30 🔗 bitbit cool! I couldn't find it via archive.org search for "botbot", and web.archive.org/cdx/search also returns too few results to be the full record. can you link me to the WARCs?
11:30 🔗 mtntmnky has joined #archiveteam-bs
11:32 🔗 dxrt https://archive.fart.website/archivebot/viewer/job/6afwa and https://archive.fart.website/archivebot/viewer/job/6egkw - the latter being more recent and without off-site links.
11:36 🔗 d5f4a3622 has joined #archiveteam-bs
11:36 🔗 britmob_ has quit IRC (Read error: Connection reset by peer)
11:36 🔗 britmob has joined #archiveteam-bs
11:39 🔗 bitbit dxrt: that's amazing thank you! I haven't tried to open WARC files yet even though I read about them generally. I should get all the files in this folder yes? and then combine them + use some sort of a WARC cli tool?
11:43 🔗 dxrt Yeah get them all. I usually just extract them with a generic unarchiving tool but something like https://github.com/chfoo/warcat will probably work better. There's a whole list of relevant software here https://www.archiveteam.org/index.php?title=The_WARC_Ecosystem. I gotta run off, but someone else should be able to assist.
11:44 🔗 bitbit dxrt: thanks again
12:05 🔗 NIC007a83 has quit IRC (Ping timeout: 745 seconds)
12:06 🔗 NIC007a83 has joined #archiveteam-bs
12:26 🔗 Dragnog2 has joined #archiveteam-bs
13:13 🔗 eythian has quit IRC (Ping timeout: 246 seconds)
13:31 🔗 eythian has joined #archiveteam-bs
13:33 🔗 trumad has joined #archiveteam-bs
13:33 🔗 trumad AlsoJAA: hey ho
13:58 🔗 n00b151_ has joined #archiveteam-bs
14:13 🔗 n00b151_ has quit IRC (Ping timeout: 260 seconds)
14:26 🔗 thuban3 has joined #archiveteam-bs
14:29 🔗 thuban2 has quit IRC (Ping timeout: 255 seconds)
14:35 🔗 equant has quit IRC (Read error: Connection reset by peer)
14:38 🔗 equant has joined #archiveteam-bs
14:39 🔗 bitbit which command should I invoke with warcat to output the files inside the WARCs here? https://archive.fart.website/archivebot/viewer/job/6egkw
14:39 🔗 bitbit and on which file out of the parts should I invoke the command? maybe on botbot.me-inf-20181016-112202-6egkw-meta.warc.gz ?
14:44 🔗 Yurume_ has joined #archiveteam-bs
14:47 🔗 jake_test has quit IRC (Read error: Operation timed out)
14:47 🔗 NIC007a83 has quit IRC (Remote host closed the connection)
14:48 🔗 kiska has quit IRC (Read error: Operation timed out)
14:48 🔗 gtwy has quit IRC (Read error: Operation timed out)
14:49 🔗 gtwy has joined #archiveteam-bs
14:49 🔗 antomatic has joined #archiveteam-bs
14:49 🔗 systwi_ has joined #archiveteam-bs
14:49 🔗 Yurume has quit IRC (Read error: Operation timed out)
14:49 🔗 kiska has joined #archiveteam-bs
14:50 🔗 NIC007a83 has joined #archiveteam-bs
14:50 🔗 svchfoo1 sets mode: +o kiska
14:50 🔗 svchfoo3 sets mode: +o kiska
14:56 🔗 systwi has quit IRC (Read error: Operation timed out)
14:58 🔗 antomati_ has quit IRC (Read error: Operation timed out)
15:02 🔗 AlsoJAA bitbit: The -meta.warc.gz only contains the log of the retrieval. The actual data is in the numbered ones.
15:11 🔗 nicolas17 has joined #archiveteam-bs
15:15 🔗 bitbit AlsoJAA: thanks! I think I figured it out. first I do "cat <file0001> <file0002> ... > final.warc.gz" and then maybe "warcat extract final.warc.gz --output-dir ./final --progress"?
15:16 🔗 jake_test has joined #archiveteam-bs
15:19 🔗 nicolas17 can you just cat multiple .gz files together?
15:20 🔗 bitbit stackoverflow says yes
15:22 🔗 AlsoJAA Yes, you can.
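
The reason plain `cat` works is that the gzip format allows several compressed members back to back, and a decompressor yields their contents in order. A minimal Python sketch of that property, using small in-memory stand-ins rather than the actual job files:

```python
import gzip

# Two stand-in payloads; the real inputs would be the numbered .warc.gz parts.
part1 = gzip.compress(b"WARC/1.0 first record\r\n")
part2 = gzip.compress(b"WARC/1.0 second record\r\n")

# Byte-level concatenation, the same thing `cat part1 part2 > final` does.
combined = part1 + part2

# gzip.decompress walks every member in the stream, not just the first one.
restored = gzip.decompress(combined)
print(restored)
```
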
15:23 🔗 AlsoJAA bitbit: I think you can also do `warcat extract file0.warc.gz file1.warc.gz ...`, but not entirely sure.
15:23 🔗 AlsoJAA To avoid copying around stuff needlessly.
15:24 🔗 bitbit interesting I will give it a try next
15:34 🔗 bitbit yes! it works. so great
16:10 🔗 JAA has joined #archiveteam-bs
16:10 🔗 AlsoJAA sets mode: +o JAA
16:13 🔗 mtntmnky has quit IRC (Remote host closed the connection)
16:14 🔗 mtntmnky has joined #archiveteam-bs
16:16 🔗 nicolas17 has quit IRC (Quit: Konversation terminated!)
16:45 🔗 systwi has joined #archiveteam-bs
16:52 🔗 systwi_ has quit IRC (Ping timeout: 622 seconds)
17:11 🔗 atphoenix has quit IRC (Read error: Connection reset by peer)
17:15 🔗 atphoenix has joined #archiveteam-bs
17:44 🔗 VerifiedJ has quit IRC (Read error: Connection reset by peer)
18:46 🔗 Dragnog2 has quit IRC (Quit: Connection closed for inactivity)
19:55 🔗 bitbit JAA, AlsoJAA: is it possible to get the size of the resulting files before I begin the extraction?
19:57 🔗 vesi has joined #archiveteam-bs
19:59 🔗 JAA bitbit: A decent estimate would be the size of the decompressed data. Something like `zcat *.warc.gz | wc -c`.
20:00 🔗 JAA (gzip files have the decompressed size at the end of the data, but that's modulo 2 or 4 GiB.)
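
A rough Python equivalent of the `zcat | wc -c` estimate, plus a reader for the trailer field JAA mentions, might look like the sketch below. The function names are made up for illustration. RFC 1952 defines that trailer field (ISIZE) as the uncompressed size modulo 2^32, and in a file made of several gzip members (concatenated parts, or WARCs compressed record by record) it only describes the last member, so it is not a substitute for streaming through the data.

```python
import gzip
import struct


def decompressed_size(path, chunk_size=1 << 20):
    """Count decompressed bytes by streaming, like `zcat file | wc -c`."""
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def trailer_isize(path):
    """Read ISIZE: the last 4 bytes of the file, little-endian.

    This covers only the final gzip member and wraps at 2**32 bytes.
    """
    with open(path, "rb") as raw:
        raw.seek(-4, 2)  # whence=2: offset relative to end of file
        return struct.unpack("<I", raw.read(4))[0]
```

Streaming still costs a full decompression pass, as noted further down in the log, but nothing is written to disk.
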
20:08 🔗 JAA vesi: Do you have a reference for those forums closing? All I see is "If you like, you can still explore archived discussion in these forums." etc.
20:09 🔗 JAA Looks like they were made read-only in April 2017.
20:12 🔗 OrIdow6 JAA: https://i.imgur.com/ZLCdyse.png is what I see when I go there
20:13 🔗 OrIdow6 After I turn JS on
20:13 🔗 JAA Ah, yeah.
20:13 🔗 bitbit JAA: from what I can tell zcat just prints the uncompressed data? so it's like I'm decompressing it anyway, no?
20:13 🔗 kpcyrd I want to throw in #forum76 as a channel name
20:13 🔗 JAA Because JS is totally needed for displaying an error like that. Ugh.
20:13 🔗 Ryz A week's worth of notice D:
20:14 🔗 JAA bitbit: Correct.
20:31 🔗 thuban3 oh man the glossy graphics
20:35 🔗 TC01 has quit IRC (Read error: Operation timed out)
20:36 🔗 TC01 has joined #archiveteam-bs
20:47 🔗 JAA Response time from those forums is horrid.
20:48 🔗 vesi Hey all, I'm here now. I got pinged on Discord by another member. Just thought to pass on the message to anyone I knew.
20:49 🔗 thuban3 it's in archivebot now
20:50 🔗 vesi Thank you. As a noob, some noob questions:
20:51 🔗 vesi Does that mean that archivebot has queued a job to archive the forums? If so, will everything be archived, or only n-levels from root? Is one week enough time for the forums to be archived?
20:51 🔗 JAA Average response time across ~200 requests: 1190 ms. Eww.
20:51 🔗 hook54321 it's not enough time
20:51 🔗 vesi Gross. Can I ask that we prioritize a certain subforum? Just a hunch.
20:51 🔗 JAA A job is running for the forums in ArchiveBot now. It won't finish in time, which is why I'm looking into alternative ways. But with this performance of their servers, well...
20:53 🔗 vesi Given the nature of the community that sent out the alert that these forums are getting taken offline, I think this subforum is the one of most interest to the most people: http://forums.bethsoft.com/forum/16-elder-scrolls-lore/
20:54 🔗 JAA No way to prioritise this with my method of grabbing (bruteforcing topic IDs). Possible in principle though.
20:55 🔗 hook54321 JAA: there's a sitemap
20:56 🔗 JAA Let's see what happens if I throw more connections at it.
20:57 🔗 JAA hook54321: Ah, right, I always ignore those. :-P
20:57 🔗 hook54321 is the response time by chance better over http? I've come across a couple of sites that are like that for some reason
20:58 🔗 JAA I am using HTTP.
20:58 🔗 hook54321 oh
20:58 🔗 JAA Seems pretty similar on HTTPS.
20:59 🔗 JAA 1884 requests, 3153 ms avg response time :-|
20:59 🔗 vesi Another noob question for myself and for anyone who asks: will the public be able to access the archived version? Will it be accessible via the IA's Wayback Machine, or through other means?
20:59 🔗 JAA vesi: Everything we archive goes into the WBM.
20:59 🔗 JAA (The exception confirms the rule.)
21:00 🔗 thuban3 worth trying to contact admins and ask for enough time for archivebot to finish?
21:00 🔗 astrid unlike most other WBM collections, our archives can also be downloaded and converted into a zip file by anyone :)
21:01 🔗 vesi Ah that's great to hear!
21:01 🔗 thuban3 a couple of the admins are listed as last active in late january (there's a contact email address but i don't know whether it's still monitored)
21:03 🔗 vesi So I know you are brute-forcing topic ids, but question — the threads here are paginated. Will the bot be able to catch all the pages, or only the first one?
21:06 🔗 hook54321 thuban3: I'm conflicted about whether or not to try to contact them, that could just make things worse. It largely depends if they're the kind of company that sends out legal threats willy nilly.
21:07 🔗 thuban3 vesi: archive bot is a crawler; it finds pages to save by following links from the root node. i believe JAA was referring to "alternative ways" (we often use that type of brute-forcing in archive jobs where crawls don't apply well)
21:08 🔗 thuban3 so yes, it is paginated
21:08 🔗 thuban3 you can see the urls being archived on the dashboard at http://dashboard.at.ninjawedding.org/3 (click the button in the lower left of a job to show just that one)
21:10 🔗 JAA Starting to see some timeouts, and the response time is increasing as well.
21:11 🔗 JAA 4.3 s by now. :-|
21:12 🔗 vesi Speaking as a webdev, throwing more connections at it might be making it choke.
21:12 🔗 JAA Can happen, yes.
21:12 🔗 vesi Thank you for the info!
21:13 🔗 JAA It really depends on what causes it to be this slow. If it's somehow related to network lag, for example, but not throughput-limited, more connections can help despite horrible response times.
21:13 🔗 JAA Doesn't seem to be the case here though.
21:14 🔗 hook54321 i mean, it's only set to 6 concurrent connections.
21:14 🔗 JAA I'm talking about my qwarc run. ArchiveBot is almost never fast enough to kill web servers.
21:14 🔗 hook54321 ah
21:14 🔗 hook54321 would pausing or aborting the archivebot job help?
21:15 🔗 JAA Nope, probably won't make a difference.
21:15 🔗 JAA I'm doing ~22 requests per second.
21:15 🔗 JAA So 10+ times faster than AB.
21:17 🔗 vesi I'm guessing that over the past 2-3 years, since the forums went into read-only mode, they probably moved them to smaller servers to match reduced activity. So it's probably a pretty small infrastructure that's supporting the current crawl.
21:23 🔗 JAA Seems to have stabilised at just over 4 second response time.
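
The latency versus throughput point above can be made concrete with Little's law: in steady state, the average number of requests in flight equals the request rate times the response time. The sketch below plugs in the figures quoted in the log (about 22 requests per second, about 4 seconds per response). The resulting concurrency is only an estimate under a steady-state assumption, not a reported qwarc setting.

```python
def in_flight(requests_per_second, response_time_seconds):
    """Little's law: average concurrency = request rate * response time."""
    return requests_per_second * response_time_seconds


def throughput(concurrency, response_time_seconds):
    """Rearranged: requests per second a fixed pool of connections sustains."""
    return concurrency / response_time_seconds


# Figures quoted in the channel; steady state is an assumption.
print(in_flight(22, 4.0))  # 88.0
```

Read the other way round, a fixed pool only gets faster if response times drop, and adding connections only helps while the server's response time does not grow in proportion.
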
21:30 🔗 hook54321 vesi: some of them redirect to the new site already. http://forums.bethsoft.com/forum/259-wolfenstein/
21:30 🔗 Flashfire has quit IRC (Remote host closed the connection)
21:30 🔗 kiska has quit IRC (Remote host closed the connection)
21:30 🔗 kiska has joined #archiveteam-bs
21:31 🔗 vesi Hmm that's alright. No big loss there.
21:31 🔗 svchfoo1 sets mode: +o kiska
21:31 🔗 svchfoo3 sets mode: +o kiska
21:31 🔗 vesi Honestly, I think the most important subforum to archive is probably http://forums.bethsoft.com/forum/16-elder-scrolls-lore/
21:31 🔗 vesi Some of the original writers for the games posted there + participated in forum role-play
21:32 🔗 vesi It became a kind of cornerstone of lore for the fandom, etc.
21:56 🔗 HP_Archiv has joined #archiveteam-bs
22:58 🔗 OrIdow6 has quit IRC (Quit: Leaving.)
23:06 🔗 JAA Response time has decreased to ~3 seconds in the last 2 hours. I should be able to comfortably grab all topics in time if it stays like that.
23:13 🔗 OrIdow6 has joined #archiveteam-bs
23:13 🔗 BlueMax has joined #archiveteam-bs
23:35 🔗 RichardG_ is now known as RichardG
23:48 🔗 vesi That's great to hear. Thank you for jumping on the ball for this.
