#archiveteam-bs 2019-11-24,Sun

Time Nickname Message
00:03 🔗 Ryz Hello, is there a stronger and faster 'Website down?' detector (or website monitor websites?) than me using https://www.isitdownrightnow.com/ ? There's countless other 'Website down?' website detectors, but I'm not sure if they're any better or unsure on what info to compare it with~
00:04 🔗 Ryz I ask in trying to at least improve my efforts of placing manual ignores on huge jobs in #archivebot that still need to be finished in terms of speed and reliability
00:04 🔗 Kaz downdetector etc tend to just rely on user reports
00:05 🔗 Kaz the best way to know is a) work out what you expect to see, b) work out how quickly you expect to see it, and compare to c) what you actually see/when you see it
00:10 🔗 Ryz You certain on it? I do both using a website monitor and checking the links myself; there's tens or hundreds of these kinds of services that do the same thing o.o;
00:11 🔗 JAA Some are based on user reports, others ping or connect to the server in question. No idea which are reliable. But the real question here is: what are you trying to do?
00:11 🔗 JAA Not sure what this has to do with ignores.
00:13 🔗 Ryz Well JAA, you mentioned focusing more on doing ignores on jobs that last longer than a day; I'm trying to finish 4e8nhc1c60bt0760g31o0pk9h faster by throwing ignores on dead websites that seem to stall the pipeline with 'Connect timed out' and 'DNS resolution error'
00:14 🔗 Ryz Tis to make sure that it's not an AB problem that it got the error, but also when I check them myself
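A direct check of the two failure modes mentioned here ('DNS resolution error' and 'Connect timed out') could be sketched as below, instead of relying on a third-party "is it down" site. This is a minimal sketch, not how ArchiveBot itself decides: the function name `check_host`, the result labels, and the 10-second default timeout are assumptions made for illustration. The `resolve` and `connect` parameters exist only so the logic can be exercised without touching the network.

```python
import socket


def check_host(host, port=80, timeout=10.0,
               resolve=socket.getaddrinfo,
               connect=socket.create_connection):
    """Classify a host as 'dns_error', 'connect_timeout', 'connect_error' or 'up'.

    Hypothetical helper: the labels are illustrative, not ArchiveBot's own.
    """
    # Step 1: does the name resolve at all?
    try:
        resolve(host, port)
    except socket.gaierror:
        return "dns_error"

    # Step 2: can a TCP connection be opened within the timeout?
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.timeout:
        return "connect_timeout"
    except OSError:
        # Refused, unreachable, reset, etc.
        return "connect_error"

    conn.close()
    return "up"
```

Calling `check_host("gb64.com")` from your own machine gives a second opinion that is independent of the pipeline; a host that fails the same way from both places is more likely to be dead than to be a pipeline problem.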
00:14 🔗 JAA Right. Well, such ignores only really make much sense if there are a lot of URLs on the same host. Otherwise, it doesn't save much time.
00:14 🔗 JAA (A lot = a few dozen or more)
00:15 🔗 JAA And when I said to focus on older jobs, I meant mainly those with millions and millions of URLs in the queue.
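The rule of thumb above (an ignore only pays off when a lot of URLs share the same dead host) can be sketched as a simple count per host. This is a rough illustration under stated assumptions: `failed_urls` is a hypothetical list of URLs you collected from the job's error output, the threshold of 24 is just one reading of "a few dozen", and the pattern shape returned by `ignore_pattern` is an assumed example, not a prescribed ArchiveBot format.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def hosts_worth_ignoring(failed_urls, min_urls=24):
    """Return {host: count} for hosts with at least min_urls failing URLs.

    min_urls=24 is an assumed stand-in for "a few dozen or more".
    """
    counts = Counter(urlsplit(url).hostname for url in failed_urls)
    return {host: n for host, n in counts.items()
            if host is not None and n >= min_urls}


def ignore_pattern(host):
    """Build a regex matching every http(s) URL on the given host (assumed shape)."""
    return "^https?://" + re.escape(host) + "/"
```

Hosts that show up only a handful of times are left out, since ignoring them saves little time.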
00:40 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
00:46 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
01:05 🔗 Damme has joined #archiveteam-bs
01:32 🔗 VerifiedJ has quit IRC (Quit: Leaving)
01:33 🔗 jc86035 has quit IRC (Quit: Connection closed for inactivity)
02:34 🔗 odemgi https://reddit.com/r/DataHoarder/comments/e0sb8b/theeyeeu_we_now_host_the_largest_open_repo_of/
02:43 🔗 Flashfire has quit IRC (Quit: The Lounge - https://thelounge.chat)
02:43 🔗 kiska has quit IRC (Quit: The Lounge - https://thelounge.chat)
02:46 🔗 kiska has joined #archiveteam-bs
02:46 🔗 Fusl__ sets mode: +o kiska
02:46 🔗 Fusl sets mode: +o kiska
02:46 🔗 Fusl_ sets mode: +o kiska
02:46 🔗 Flashfire has joined #archiveteam-bs
02:49 🔗 kiska18 has quit IRC (The Lounge - https://thelounge.chat)
02:49 🔗 Ryz has quit IRC (Quit: The Lounge - https://thelounge.chat)
02:49 🔗 Ryz has joined #archiveteam-bs
02:50 🔗 kiska18 has joined #archiveteam-bs
02:50 🔗 Fusl__ sets mode: +o kiska18
02:50 🔗 Fusl sets mode: +o kiska18
02:50 🔗 Fusl_ sets mode: +o kiska18
03:04 🔗 tech234a has joined #archiveteam-bs
03:36 🔗 kiska18 has quit IRC (The Lounge - https://thelounge.chat)
03:36 🔗 Ryz has quit IRC (Quit: The Lounge - https://thelounge.chat)
03:36 🔗 kiska18 has joined #archiveteam-bs
03:36 🔗 Fusl__ sets mode: +o kiska18
03:36 🔗 Fusl sets mode: +o kiska18
03:36 🔗 Fusl_ sets mode: +o kiska18
03:36 🔗 Ryz has joined #archiveteam-bs
03:50 🔗 JAA I'll look into archiving the legacy GMC forums independently with qwarc as well since the AB job won't get everything in time.
03:51 🔗 BlueMax any advice on how I could do that as well
03:55 🔗 JAA Not really, I haven't written any documentation on qwarc yet. There are lots of things that need to be kept in mind when writing the spec file, plus bugs and quirks. I'd love to see others use it, but I'm not sure it's ready for that yet.
03:56 🔗 JAA But if that doesn't scare you away, check out the code and my most recent uploads on IA (the -meta WARC contains the spec file and qwarc command).
03:58 🔗 BlueMax I'm afraid I'm completely lacking the time or brainpower to go that in-depth
04:05 🔗 HP_Archiv Hey JAA, did you managed to grab http://gb64.com/ ?
04:05 🔗 HP_Archiv manage*
04:06 🔗 JAA HP_Archiv: Yes, that ArchiveBot job finished.
04:06 🔗 JAA Well, both of them. (One for the site, one for the forums.)
04:07 🔗 JAA It's in the WBM starting from here: http://web.archive.org/web/20191116011644/http://gb64.com/
04:08 🔗 HP_Archiv Awesome. Thanks again for grabbing this ^^
04:09 🔗 JAA Looks like not everything is in the WBM yet, but it should be soon.
04:10 🔗 HP_Archiv No worries
04:10 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
04:17 🔗 wp494 has joined #archiveteam-bs
04:18 🔗 odemgi_ has joined #archiveteam-bs
04:21 🔗 qw3rty has joined #archiveteam-bs
04:22 🔗 markedL do we have to do the ftp later or does AB do that as well?
04:22 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:22 🔗 JAA Which FTP?
04:24 🔗 JAA HP_Archiv: Oh yeah, once the stuff is in the WBM, you should check whether everything in http://gb64.com/articles.php was archived. I have a feeling that it wasn't. Let me know if so and I'll look into it.
04:24 🔗 markedL the gb64 links to roms on FTP
04:24 🔗 JAA Ah, ftp://8bitfiles.net/gamebase_64/Games/
04:24 🔗 JAA No, that wasn't fetched.
04:25 🔗 JAA 1) AB isn't very good with FTP. 2) There are certainly a *lot* of C64 collections on IA already.
04:26 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
04:27 🔗 HP_Archiv Noted ^^
04:28 🔗 HP_Archiv Does this mean since it can't work well with FTP that not everything will be archived from gb64?
04:28 🔗 JAA But feel free to grab the data from that FTP, dedupe it against the IA collections, and upload the rest.
04:28 🔗 JAA Is there more besides those ROMs?
04:29 🔗 JAA If there are links directly to files, those should've been fetched. I don't know whether they'll be available in the WBM though.
04:31 🔗 HP_Archiv Hm, honestly, I don't know how to do deduplication against the IA material
04:31 🔗 HP_Archiv New to all of this so bear with me, heh
04:33 🔗 markedL not knowing is ok, as long as you want to know
04:33 🔗 JAA Essentially, you'd take every file from that FTP server and check whether it's somewhere on the Internet Archive. With filename changes etc., that'll be a lot of work of course.
04:33 🔗 HP_Archiv Of course ^^ If given the tools, etc. Always willing to learn.
04:34 🔗 HP_Archiv I figured it was going to be a manual process. There's no other way?
04:35 🔗 JAA Well, it can partially be automated for sure. Locate the relevant items on IA, calculate some sort of hash for every ROM file in them (but some will be collections in huge zips etc., so have fun with that). Then calculate hashes for the files on the FTP server, and filter out all files whose hash appears in the IA list, then sort out the rest manually.
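The hash-and-filter step described above could look roughly like the following. It is a minimal sketch that assumes both sides are already on local disk: `ftp_root` is a hypothetical directory holding a mirror of the FTP server, and `ia_hashes` is a set of SHA-1 hex digests you have gathered for the files in the relevant IA items (files buried inside large zips would have to be extracted first). SHA-1 is just one possible choice of hash.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map every regular file under root to its SHA-1 digest."""
    return {path: sha1_of(path)
            for path in sorted(Path(root).rglob("*"))
            if path.is_file()}


def not_on_ia(ftp_root, ia_hashes):
    """List files under ftp_root whose digest is absent from ia_hashes."""
    return [path for path, digest in hash_tree(ftp_root).items()
            if digest not in ia_hashes]
```

Matching on content hashes rather than names sidesteps the filename changes mentioned earlier; whatever `not_on_ia` returns is the remainder to sort out by hand.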
04:37 🔗 HP_Archiv ^^ I think the first option of manually doing it might be less involved, heh
04:38 🔗 HP_Archiv I'll have to set some time aside to do this. That's a lot of work.
04:38 🔗 eientei95 has quit IRC (Remote host closed the connection)
04:39 🔗 JAA Yeah, it certainly would be a lot of work.
04:41 🔗 HP_Archiv Of course, another way is to designate X number of people in here to certain directories, cut down the amount of time, etc
04:41 🔗 eientei95 has joined #archiveteam-bs
04:41 🔗 * JAA ducks out through the back door.
04:42 🔗 HP_Archiv ^^ Lol
04:44 🔗 JAA Oh
04:45 🔗 JAA Looks like that entire directory is less than 700 MB.
04:45 🔗 HP_Archiv Actually, I have a question. Since ftp was available for gb64, it made seeing what was there/available pretty easy. What do you do in the event that ftp isn't publicly accessible for a given site?
04:45 🔗 JAA In that case, duplication isn't really a big issue.
04:46 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
04:47 🔗 JAA Looks like the entire server was grabbed a few years ago: https://archive.org/details/ftp-8bitfiles.net
04:48 🔗 JAA I'm not sure I understand the question. How you discover content obviously depends entirely on how the server is set up.
04:52 🔗 HP_Archiv Right, I'm asking does ftp make it easier to pull content?
04:55 🔗 HP_Archiv Oh and if the entire server was grabbed previously, what does that mean for the new archive job you just finished up - a second version will eventually be on as a separate object?
05:03 🔗 JAA If "content" means "files", then yes, FTP is definitely the best option. That's what it was designed for after all. The reason why it's not so great with ArchiveBot is simply bugs in the underlying tool that make it crash on any kind of error, e.g. a disconnect.
05:03 🔗 JAA I did not grab anything from the FTP server.
05:30 🔗 Yurume has quit IRC (Quit: No Ping reply in 180 seconds.)
05:32 🔗 Yurume has joined #archiveteam-bs
05:40 🔗 HP_Archiv Yes, I meant files ^^
05:40 🔗 HP_Archiv And okay, thank you for the explanation
05:40 🔗 HP_Archiv has quit IRC (Quit: Leaving)
05:43 🔗 mistym has joined #archiveteam-bs
05:49 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
06:12 🔗 raeyulca has quit IRC (Ping timeout: 252 seconds)
06:15 🔗 raeyulca has joined #archiveteam-bs
06:22 🔗 mistym has joined #archiveteam-bs
06:24 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
06:53 🔗 programme has joined #archiveteam-bs
06:55 🔗 SmileyG has joined #archiveteam-bs
06:57 🔗 bluefoo has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 TC01 has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 mls has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 klg has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 prq has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 SketchCow has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 Smiley has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 VoynichCr has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 luckcolor has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 Deewiant has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 yuitimoth has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 mc2 has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 sHATNER has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 Laverne_ has quit IRC (se.hub efnet.portlane.se)
06:57 🔗 Shen has quit IRC (se.hub efnet.portlane.se)
06:58 🔗 TC01_ has joined #archiveteam-bs
06:58 🔗 SketchCo1 has joined #archiveteam-bs
06:58 🔗 Fusl sets mode: +o SketchCo1
06:58 🔗 Fusl__ sets mode: +o SketchCo1
06:58 🔗 Fusl_ sets mode: +o SketchCo1
06:59 🔗 MC2Square has joined #archiveteam-bs
06:59 🔗 bluefoo_ has joined #archiveteam-bs
07:03 🔗 sHATNER_ has joined #archiveteam-bs
07:11 🔗 klg_ has joined #archiveteam-bs
07:12 🔗 luckcolor has joined #archiveteam-bs
07:12 🔗 MC2Square is now known as mc2
07:38 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
07:40 🔗 Gfy has joined #archiveteam-bs
07:41 🔗 mls has joined #archiveteam-bs
07:42 🔗 VoynichCr has joined #archiveteam-bs
07:46 🔗 tech234a has joined #archiveteam-bs
07:47 🔗 mistym has joined #archiveteam-bs
07:58 🔗 Laverne_ has joined #archiveteam-bs
08:12 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
08:14 🔗 mistym has joined #archiveteam-bs
08:27 🔗 mistym has quit IRC (Quit: ZNC - http://znc.in)
08:35 🔗 jc86035 has joined #archiveteam-bs
08:55 🔗 mistym has joined #archiveteam-bs
09:17 🔗 BlueMaxim has joined #archiveteam-bs
09:17 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
09:54 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
10:44 🔗 jc86035 has quit IRC (Quit: Connection closed for inactivity)
11:15 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
11:21 🔗 ShellyRol has joined #archiveteam-bs
12:08 🔗 benjinss has joined #archiveteam-bs
12:09 🔗 h3ndr1k_ is now known as h3ndr1k
12:13 🔗 benjinsmi has quit IRC (Read error: Operation timed out)
13:23 🔗 wp494 has quit IRC (Read error: Operation timed out)
13:24 🔗 wp494 has joined #archiveteam-bs
13:49 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:52 🔗 killsushi has quit IRC (Quit: Leaving)
14:09 🔗 odemgi has joined #archiveteam-bs
14:15 🔗 odemgi_ has quit IRC (Read error: Operation timed out)
14:27 🔗 Damme has quit IRC (Read error: Connection reset by peer)
14:44 🔗 SketchCo1 is now known as SketchCow
14:45 🔗 benjinss has quit IRC (Quit: Leaving)
14:46 🔗 benjins has joined #archiveteam-bs
14:52 🔗 kiskaWee has joined #archiveteam-bs
14:53 🔗 kiska sets mode: +o kiskaWee
15:16 🔗 godane has joined #archiveteam-bs
15:47 🔗 britmob has quit IRC (Read error: Connection reset by peer)
15:48 🔗 britmob has joined #archiveteam-bs
16:24 🔗 DigiDigi has quit IRC (Quit: Leaving)
16:30 🔗 DigiDigi has joined #archiveteam-bs
16:55 🔗 Damme has joined #archiveteam-bs
17:12 🔗 HP_Archiv has joined #archiveteam-bs
17:16 🔗 sHATNER_ is now known as sHATNER
17:17 🔗 Fusl has quit IRC (Read error: Operation timed out)
17:18 🔗 Fusl has joined #archiveteam-bs
17:18 🔗 Fusl__ sets mode: +o Fusl
17:18 🔗 Fusl_ sets mode: +o Fusl
17:19 🔗 kyledrake has quit IRC (Read error: Operation timed out)
17:19 🔗 kyledrake has joined #archiveteam-bs
17:20 🔗 icedice has joined #archiveteam-bs
17:46 🔗 HP_Archiv Twitter grab question. I'm trying to archive this entire Twitter post w/thread comments. I've captured it into WBM, but when viewing the captured page in WBM it's only showing several comments out of the comment thread
17:46 🔗 HP_Archiv https://web.archive.org/web/20191105002320/https:/twitter.com/GagReathle/status/1191509241259077632
17:46 🔗 HP_Archiv Actual live page: https://twitter.com/GagReathle/status/1191474201640755201
17:46 🔗 HP_Archiv Thoughts?
17:50 🔗 SmileyG has quit IRC (Quit: http://www.milkme.co.uk - You'll never understand.)
17:51 🔗 Smiley has joined #archiveteam-bs
18:13 🔗 X-Scale` has joined #archiveteam-bs
18:20 🔗 X-Scale has quit IRC (Read error: Operation timed out)
18:20 🔗 X-Scale` is now known as X-Scale
18:55 🔗 HP_Archiv I also need some expertise on archiving an old Flash based site. The url is: pdl.warnerbros.com/harrypotter/us/yuleball/swf/yb-main.swf
18:55 🔗 HP_Archiv But modern browsers will not open it, and instead will force a download/save as
18:56 🔗 HP_Archiv Opening in Edge browser, however, actually displays the old Flash based site correctly w/audio. Is there any way to capture this site 'as is' with Flash enabled or is this not possible?
19:14 🔗 britmob Just a note- Opera opens it fine. I can play it.
19:17 🔗 markedL try webrecorder then webrecorder player https://webrecorder.io/
19:46 🔗 X-Scale` has joined #archiveteam-bs
19:47 🔗 HP_Archiv Used Bandicam and did a screen recording with internal audio. So I did capture it that way
19:47 🔗 X-Scale has quit IRC (Ping timeout: 252 seconds)
19:47 🔗 HP_Archiv I forgot about Webrecorder.io, thank you
19:48 🔗 X-Scale` is now known as X-Scale
19:50 🔗 HP_Archiv Oh and thanks for testing in Operat, britmob. I only tested in Chrome/Firefox before trying Edge
19:50 🔗 HP_Archiv Opera*
20:09 🔗 raeyulca has quit IRC (Ping timeout: 496 seconds)
20:13 🔗 godane i'm at 1784k items now
20:21 🔗 raeyulca has joined #archiveteam-bs
21:07 🔗 raeyulca has quit IRC (Ping timeout: 360 seconds)
21:24 🔗 raeyulca has joined #archiveteam-bs
22:07 🔗 zino has joined #archiveteam-bs
22:18 🔗 godane has quit IRC (Quit: Leaving.)
22:47 🔗 godane has joined #archiveteam-bs
22:48 🔗 BlueMax has joined #archiveteam-bs
22:50 🔗 JAA Legacy GMC forum grab running now.
23:27 🔗 HP_Archiv !a https://youtu.be/swc7TqOfcyY
23:27 🔗 HP_Archiv oops wrong room
23:41 🔗 Jon has quit IRC (Quit: ZNC - http://znc.in)
23:41 🔗 Jon has joined #archiveteam-bs
23:42 🔗 thejsa_ has quit IRC (Quit: No Ping reply in 180 seconds.)
23:42 🔗 jsa has joined #archiveteam-bs
