[00:03] Hello, is there a stronger and faster 'Website down?' detector (or website monitor websites?) than me using https://www.isitdownrightnow.com/ ? There are countless other 'Website down?' website detectors, but I'm not sure if they're any better or what info to compare them with~
[00:04] I ask because I'm trying to at least improve my efforts of placing manual ignores on huge jobs in #archivebot that still need to be finished, in terms of speed and reliability
[00:04] downdetector etc tend to just rely on user reports
[00:05] the best way to know is a) work out what you expect to see, b) work out how quickly you expect to see it, and compare to c) what you actually see/when you see it
[00:10] You certain on it? I do both using a website monitor and checking the links myself; there's tens or hundreds of these kinds of services that do the same thing o.o;
[00:11] Some are based on user reports, others ping or connect to the server in question. No idea which are reliable. But the real question here is: what are you trying to do?
[00:11] Not sure what this has to do with ignores.
[00:13] Well JAA, you mentioned trying to focus more on doing ignores on jobs that last longer than a day; I'm trying to finish 4e8nhc1c60bt0760g31o0pk9h faster by throwing ignores on dead websites that seem to stall the pipeline with 'Connect timed out' and 'DNS resolution error'
[00:14] 'Tis to make sure that the error isn't just an AB problem, but also shows up when I check them myself
[00:14] Right. Well, such ignores only really make much sense if there are a lot of URLs on the same host. Otherwise, it doesn't save much time.
[00:14] (A lot = a few dozen or more)
[00:15] And when I said to focus on older jobs, I meant mainly those with millions and millions of URLs in the queue.
[00:40] *** manjaro-u has quit IRC (Quit: Konversation terminated!)
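The rule of thumb above (a host-level ignore only pays off when "a few dozen or more" failing URLs share a host) could be checked roughly like this. This is only a sketch: the threshold of 24, the function names, and the assumption that an ignore is written as a regular expression anchored on the host are illustrative choices, not something stated in the log.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: "a few dozen or more" is read here as at least 24 URLs.
MIN_URLS_PER_HOST = 24


def hosts_worth_ignoring(failed_urls, min_urls=MIN_URLS_PER_HOST):
    """Return {host: count} for hosts with at least min_urls failing URLs."""
    counts = Counter()
    for url in failed_urls:
        host = urlsplit(url).hostname  # lowercased host, None if missing
        if host:
            counts[host] += 1
    return {host: n for host, n in counts.items() if n >= min_urls}


def ignore_pattern(host):
    """Build a regex matching any http(s) URL on exactly this host (hypothetical helper)."""
    return r"^https?://" + re.escape(host) + r"(:\d+)?/"
```

The input would be the URLs that produced 'Connect timed out' or 'DNS resolution error' in the job log; hosts below the threshold are left alone because ignoring them saves little time.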
[00:46] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[01:05] *** Damme has joined #archiveteam-bs
[01:32] *** VerifiedJ has quit IRC (Quit: Leaving)
[01:33] *** jc86035 has quit IRC (Quit: Connection closed for inactivity)
[02:34] https://reddit.com/r/DataHoarder/comments/e0sb8b/theeyeeu_we_now_host_the_largest_open_repo_of/
[02:43] *** Flashfire has quit IRC (Quit: The Lounge - https://thelounge.chat)
[02:43] *** kiska has quit IRC (Quit: The Lounge - https://thelounge.chat)
[02:46] *** kiska has joined #archiveteam-bs
[02:46] *** Fusl__ sets mode: +o kiska
[02:46] *** Fusl sets mode: +o kiska
[02:46] *** Fusl_ sets mode: +o kiska
[02:46] *** Flashfire has joined #archiveteam-bs
[02:49] *** kiska18 has quit IRC (The Lounge - https://thelounge.chat)
[02:49] *** Ryz has quit IRC (Quit: The Lounge - https://thelounge.chat)
[02:49] *** Ryz has joined #archiveteam-bs
[02:50] *** kiska18 has joined #archiveteam-bs
[02:50] *** Fusl__ sets mode: +o kiska18
[02:50] *** Fusl sets mode: +o kiska18
[02:50] *** Fusl_ sets mode: +o kiska18
[03:04] *** tech234a has joined #archiveteam-bs
[03:36] *** kiska18 has quit IRC (The Lounge - https://thelounge.chat)
[03:36] *** Ryz has quit IRC (Quit: The Lounge - https://thelounge.chat)
[03:36] *** kiska18 has joined #archiveteam-bs
[03:36] *** Fusl__ sets mode: +o kiska18
[03:36] *** Fusl sets mode: +o kiska18
[03:36] *** Fusl_ sets mode: +o kiska18
[03:36] *** Ryz has joined #archiveteam-bs
[03:50] I'll look into archiving the legacy GMC forums independently with qwarc as well since the AB job won't get everything in time.
[03:51] any advice on how I could do that as well
[03:55] Not really, I haven't written any documentation on qwarc yet. There are lots of things that need to be kept in mind when writing the spec file, plus bugs and quirks. I'd love to see others use it, but I'm not sure it's ready for that yet.
[03:56] But if that doesn't scare you away, check out the code and my most recent uploads on IA (the -meta WARC contains the spec file and qwarc command).
[03:58] I'm afraid I'm completely lacking the time or brainpower to go that in-depth
[04:05] Hey JAA, did you managed to grab http://gb64.com/ ?
[04:05] manage*
[04:06] HP_Archiv: Yes, that ArchiveBot job finished.
[04:06] Well, both of them. (One for the site, one for the forums.)
[04:07] It's in the WBM starting from here: http://web.archive.org/web/20191116011644/http://gb64.com/
[04:08] Awesome. Thanks again for grabbing this ^^
[04:09] Looks like not everything is in the WBM yet, but it should be soon.
[04:10] No worries
[04:10] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[04:17] *** wp494 has joined #archiveteam-bs
[04:18] *** odemgi_ has joined #archiveteam-bs
[04:21] *** qw3rty has joined #archiveteam-bs
[04:22] do we have to do the ftp later or does AB do that as well?
[04:22] *** odemgi has quit IRC (Read error: Operation timed out)
[04:22] Which FTP?
[04:24] HP_Archiv: Oh yeah, once the stuff is in the WBM, you should check whether everything in http://gb64.com/articles.php was archived. I have a feeling that it wasn't. Let me know if so and I'll look into it.
[04:24] the gb64 links to roms on FTP
[04:24] Ah, ftp://8bitfiles.net/gamebase_64/Games/
[04:24] No, that wasn't fetched.
[04:25] 1) AB isn't very good with FTP. 2) There are certainly a *lot* of C64 collections on IA already.
[04:26] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[04:27] Noted ^^
[04:28] Does this mean since it can't work well with FTP that not everything will be archived from gb64?
[04:28] But feel free to grab the data from that FTP, dedupe it against the IA collections, and upload the rest.
[04:28] Is there more besides those ROMs?
[04:29] If there are links directly to files, those should've been fetched. I don't know whether they'll be available in the WBM though.
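The "dedupe it against the IA collections" suggestion at [04:28] can be approximated by comparing content hashes, so that renamed copies still count as duplicates. What follows is a minimal sketch under stated assumptions: both the FTP files and the relevant IA files are already downloaded into local directories, SHA-1 is an arbitrary choice of hash, the function names are hypothetical, and files packed inside zips are not unpacked.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root):
    """Map hex digest -> list of paths (relative to root) with that content."""
    root = Path(root)
    result = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            result.setdefault(sha1_of_file(p), []).append(str(p.relative_to(root)))
    return result


def not_yet_archived(ftp_root, known_hashes):
    """Return paths under ftp_root whose content hash is not in known_hashes."""
    missing = []
    for digest, paths in hash_tree(ftp_root).items():
        if digest not in known_hashes:
            missing.extend(paths)
    return sorted(missing)
```

A possible use is `not_yet_archived("ftp_mirror", set(hash_tree("ia_items")))`, where both directory names are placeholders; whatever comes back still needs manual review before upload.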
[04:31] Hm, honestly, I don't know how to do deduplication against the IA material
[04:31] New to all of this so bear with me, heh
[04:33] not knowing is ok, as long as you want to know
[04:33] Essentially, you'd take every file from that FTP server and check whether it's somewhere on the Internet Archive. With filename changes etc., that'll be a lot of work of course.
[04:33] Of course ^^ If given the tools, etc. Always willing to learn.
[04:34] I figured it was going to be a manual process. There's no other way?
[04:35] Well, it can partially be automated for sure. Locate the relevant items on IA, calculate some sort of hash for every ROM file in them (but some will be collections in huge zips etc., so have fun with that). Then calculate hashes for the files on the FTP server, and filter out all files whose hash appears in the IA list, then sort out the rest manually.
[04:37] ^^ I think the first option of manually doing it might be less involved, heh
[04:38] I'll have to set some time aside to do this. That's a lot of work.
[04:38] *** eientei95 has quit IRC (Remote host closed the connection)
[04:39] Yeah, it certainly would be a lot of work.
[04:41] Of course, another way is to designate X number of people in here to certain directories, cut down the amount of time, etc
[04:41] *** eientei95 has joined #archiveteam-bs
[04:41] * JAA ducks out through the back door.
[04:42] ^^ Lol
[04:44] Oh
[04:45] Looks like that entire directory is less than 700 MB.
[04:45] Actually, I have a question. Since ftp was available for gb64, it made seeing what was there/available pretty easy. What do you do in the event that ftp isn't publicly accessible for a given site?
[04:45] In that case, duplication isn't really a big issue.
[04:46] *** mistym has quit IRC (Quit: ZNC - http://znc.in)
[04:47] Looks like the entire server was grabbed a few years ago: https://archive.org/details/ftp-8bitfiles.net
[04:48] I'm not sure I understand the question. How you discover content obviously depends entirely on how the server is set up.
[04:52] Right, I'm asking does ftp make it easier to pull content?
[04:55] Oh and if the entire server was grabbed previously, what does that mean for the new archive job you just finished up - a second version will eventually be on as a separate object?
[05:03] If "content" means "files", then yes, FTP is definitely the best option. That's what it was designed for after all. The reason why it's not so great with ArchiveBot is simply bugs in the underlying tool that let it crash on any kind of error, e.g. a disconnect.
[05:03] I did not grab anything from the FTP server.
[05:30] *** Yurume has quit IRC (Quit: No Ping reply in 180 seconds.)
[05:32] *** Yurume has joined #archiveteam-bs
[05:40] Yes, I meant files ^^
[05:40] And okay, thank you for the explanation
[05:40] *** HP_Archiv has quit IRC (Quit: Leaving)
[05:43] *** mistym has joined #archiveteam-bs
[05:49] *** mistym has quit IRC (Quit: ZNC - http://znc.in)
[06:12] *** raeyulca has quit IRC (Ping timeout: 252 seconds)
[06:15] *** raeyulca has joined #archiveteam-bs
[06:22] *** mistym has joined #archiveteam-bs
[06:24] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[06:53] *** programme has joined #archiveteam-bs
[06:55] *** SmileyG has joined #archiveteam-bs
[06:57] *** bluefoo has quit IRC (se.hub efnet.portlane.se)
[06:57] *** TC01 has quit IRC (se.hub efnet.portlane.se)
[06:57] *** mls has quit IRC (se.hub efnet.portlane.se)
[06:57] *** klg has quit IRC (se.hub efnet.portlane.se)
[06:57] *** prq has quit IRC (se.hub efnet.portlane.se)
[06:57] *** SketchCow has quit IRC (se.hub efnet.portlane.se)
[06:57] *** Smiley has quit IRC (se.hub efnet.portlane.se)
[06:57] *** Gfy has quit IRC (se.hub efnet.portlane.se)
[06:57] *** VoynichCr has quit IRC (se.hub efnet.portlane.se)
[06:57] *** luckcolor has quit IRC (se.hub efnet.portlane.se)
[06:57] *** Deewiant has quit IRC (se.hub efnet.portlane.se)
[06:57] *** yuitimoth has quit IRC (se.hub efnet.portlane.se)
[06:57] *** mc2 has quit IRC (se.hub efnet.portlane.se)
[06:57] *** sHATNER has quit IRC (se.hub efnet.portlane.se)
[06:57] *** Laverne_ has quit IRC (se.hub efnet.portlane.se)
[06:57] *** Shen has quit IRC (se.hub efnet.portlane.se)
[06:58] *** TC01_ has joined #archiveteam-bs
[06:58] *** SketchCo1 has joined #archiveteam-bs
[06:58] *** Fusl sets mode: +o SketchCo1
[06:58] *** Fusl__ sets mode: +o SketchCo1
[06:58] *** Fusl_ sets mode: +o SketchCo1
[06:59] *** MC2Square has joined #archiveteam-bs
[06:59] *** bluefoo_ has joined #archiveteam-bs
[07:03] *** sHATNER_ has joined #archiveteam-bs
[07:11] *** klg_ has joined #archiveteam-bs
[07:12] *** luckcolor has joined #archiveteam-bs
[07:12] *** MC2Square is now known as mc2
[07:38] *** mistym has quit IRC (Quit: ZNC - http://znc.in)
[07:40] *** Gfy has joined #archiveteam-bs
[07:41] *** mls has joined #archiveteam-bs
[07:42] *** VoynichCr has joined #archiveteam-bs
[07:46] *** tech234a has joined #archiveteam-bs
[07:47] *** mistym has joined #archiveteam-bs
[07:58] *** Laverne_ has joined #archiveteam-bs
[08:12] *** mistym has quit IRC (Quit: ZNC - http://znc.in)
[08:14] *** mistym has joined #archiveteam-bs
[08:27] *** mistym has quit IRC (Quit: ZNC - http://znc.in)
[08:35] *** jc86035 has joined #archiveteam-bs
[08:55] *** mistym has joined #archiveteam-bs
[09:17] *** BlueMaxim has joined #archiveteam-bs
[09:17] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[09:54] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[10:44] *** jc86035 has quit IRC (Quit: Connection closed for inactivity)
[11:15] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[11:21] *** ShellyRol has joined #archiveteam-bs
[12:08] *** benjinss has joined #archiveteam-bs
[12:09] *** h3ndr1k_ is now known as h3ndr1k
[12:13] *** benjinsmi has quit IRC (Read error: Operation timed out)
[13:23] *** wp494 has quit IRC (Read error: Operation timed out)
[13:24] *** wp494 has joined #archiveteam-bs
[13:49] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:52] *** killsushi has quit IRC (Quit: Leaving)
[14:09] *** odemgi has joined #archiveteam-bs
[14:15] *** odemgi_ has quit IRC (Read error: Operation timed out)
[14:27] *** Damme has quit IRC (Read error: Connection reset by peer)
[14:44] *** SketchCo1 is now known as SketchCow
[14:45] *** benjinss has quit IRC (Quit: Leaving)
[14:46] *** benjins has joined #archiveteam-bs
[14:52] *** kiskaWee has joined #archiveteam-bs
[14:53] *** kiska sets mode: +o kiskaWee
[15:16] *** godane has joined #archiveteam-bs
[15:47] *** britmob has quit IRC (Read error: Connection reset by peer)
[15:48] *** britmob has joined #archiveteam-bs
[16:24] *** DigiDigi has quit IRC (Quit: Leaving)
[16:30] *** DigiDigi has joined #archiveteam-bs
[16:55] *** Damme has joined #archiveteam-bs
[17:12] *** HP_Archiv has joined #archiveteam-bs
[17:16] *** sHATNER_ is now known as sHATNER
[17:17] *** Fusl has quit IRC (Read error: Operation timed out)
[17:18] *** Fusl has joined #archiveteam-bs
[17:18] *** Fusl__ sets mode: +o Fusl
[17:18] *** Fusl_ sets mode: +o Fusl
[17:19] *** kyledrake has quit IRC (Read error: Operation timed out)
[17:19] *** kyledrake has joined #archiveteam-bs
[17:20] *** icedice has joined #archiveteam-bs
[17:46] Twitter grab question. I'm trying to archive this entire Twitter post w/thread comments. I've captured it into WBM, but when viewing the captured page in WBM it's only showing several comments out of the comment thread
[17:46] https://web.archive.org/web/20191105002320/https:/twitter.com/GagReathle/status/1191509241259077632
[17:46] Actual live page: https://twitter.com/GagReathle/status/1191474201640755201
[17:46] Thoughts?
[17:50] *** SmileyG has quit IRC (Quit: http://www.milkme.co.uk - You'll never understand.)
[17:51] *** Smiley has joined #archiveteam-bs
[18:13] *** X-Scale` has joined #archiveteam-bs
[18:20] *** X-Scale has quit IRC (Read error: Operation timed out)
[18:20] *** X-Scale` is now known as X-Scale
[18:55] I also need some expertise on archiving an old Flash based site. The url is: pdl.warnerbros.com/harrypotter/us/yuleball/swf/yb-main.swf
[18:55] But modern browsers will not open it, and instead will force a download/save as
[18:56] Opening in Edge browser, however, actually displays the old Flash based site correctly w/audio. Is there any way to capture this site 'as is' with Flash enabled or is this not possible?
[19:14] Just a note- Opera opens it fine. I can play it.
[19:17] try webrecorder then webrecorder player https://webrecorder.io/
[19:46] *** X-Scale` has joined #archiveteam-bs
[19:47] Used Bandicam and did a screen recording with internal audio. So I did capture it that way
[19:47] *** X-Scale has quit IRC (Ping timeout: 252 seconds)
[19:47] I forgot about Webrecorder.io, thank you
[19:48] *** X-Scale` is now known as X-Scale
[19:50] Oh and thanks for testing in Operat, britmob. I only tested in Chrome/Firefox before trying Edge
[19:50] Opera*
[20:09] *** raeyulca has quit IRC (Ping timeout: 496 seconds)
[20:13] i'm at 1784k items now
[20:21] *** raeyulca has joined #archiveteam-bs
[21:07] *** raeyulca has quit IRC (Ping timeout: 360 seconds)
[21:24] *** raeyulca has joined #archiveteam-bs
[22:07] *** zino has joined #archiveteam-bs
[22:18] *** godane has quit IRC (Quit: Leaving.)
[22:47] *** godane has joined #archiveteam-bs
[22:48] *** BlueMax has joined #archiveteam-bs
[22:50] Legacy GMC forum grab running now.
[23:27] !a https://youtu.be/swc7TqOfcyY
[23:27] oops wrong room
[23:41] *** Jon has quit IRC (Quit: ZNC - http://znc.in)
[23:41] *** Jon has joined #archiveteam-bs
[23:42] *** thejsa_ has quit IRC (Quit: No Ping reply in 180 seconds.)
[23:42] *** jsa has joined #archiveteam-bs