#archiveteam-bs 2019-11-08,Fri


Time Nickname Message
00:02 🔗 HP_Archiv Hey guys, hope everyone's having a nice afternoon. Can someone archive the entirety of ModDB.com?
00:03 🔗 schbirid i would love that
00:03 🔗 schbirid if someone does please keep me in the loop
00:03 🔗 schbirid has quit IRC (Quit: Leaving)
00:05 🔗 HP_Archiv @schbirid, sure thing.
00:06 🔗 HP_Archiv I was told by -Archivist that #archiveteam might have archived the individual/hosted user files (mod creations) in the past, but that nobody's actually archived the entire site before. Considering the nature of the site, can someone in #archiveteam submit this for entire-site archiving?
00:10 🔗 JAA Just in case: no, probably not a good idea to throw this into ArchiveBot. At least the downloads need to be handled differently.
00:10 🔗 apache2 has joined #archiveteam-bs
00:10 🔗 HP_Archiv @JAA, can you explain why?
00:10 🔗 JAA HP_Archiv: ArchiveBot doesn't handle large files very well.
00:12 🔗 HP_Archiv Hmm. Well the mod creations/user files are not that large. Using a map creation from the HP1 PC Game someone created, https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/night-map
00:12 🔗 HP_Archiv As an example ^^
00:12 🔗 JAA I only looked at the last few uploads and saw several files over 1 GB there.
00:13 🔗 JAA E.g. https://www.moddb.com/mods/doom-for-your-lazy-friends/downloads/doom-for-your-lazy-friends-part-1
00:13 🔗 JAA And I mean, the entire thing is 13 TB. That would easily be the largest AB job ever.
00:14 🔗 HP_Archiv Ah, I wasn't aware that people upload files that big on site/I've only ever seen files less than 1GB (until now)
00:14 🔗 HP_Archiv Hmm
00:14 🔗 HP_Archiv Can I curate a selection of individual pages for archiving then, instead of the whole site?
00:15 🔗 HP_Archiv I'd at least like to see all entries Potter-game related archived. But if not, that's fine/understandable.
00:16 🔗 britmob Yes, you can use individual links
00:17 🔗 JAA (In case anyone's wondering, the largest AB job to date to my knowledge was for NDTV at just over 8 TiB.)
00:19 🔗 HP_Archiv I was under the impression it was Google+ Pages? News to me ^^
00:19 🔗 JAA I'm talking only about ArchiveBot.
00:19 🔗 JAA Google+ was the largest distributed project of AT to date at 1.4 PiB.
00:21 🔗 HP_Archiv Oh right, I keep forgetting that there are different, well, aspects to #archiveteam
00:21 🔗 HP_Archiv Okay, well as it happens, I already have a spreadsheet of links ready for archiving ;)
00:22 🔗 HP_Archiv Should I just paste them all here, or what?
00:23 🔗 JAA We'd need a text file of URLs.
00:23 🔗 JAA I'm not sure how this would work for downloads though. Are the download URLs constant?
00:24 🔗 JAA (The link on that "Click to <filename> if it doesn't start automatically" page I mean.)
00:25 🔗 HP_Archiv I believe so. I tested a few entries, and each one prompts with the same thing - Click X File if it doesn't start automatically - and then offers the download as a .zip
00:26 🔗 HP_Archiv Correction: Not all are .zip
00:27 🔗 HP_Archiv How do I get you the text file?
00:28 🔗 JAA https://transfer.notkiska.pw/
00:29 🔗 HP_Archiv Okay, thanks @JAA. Give me a few minutes and I'll send it over. Just want to make sure I'm not missing any links.
00:34 🔗 JAA HP_Archiv: Take your time. I'm going to bed now anyway. Will look into it tomorrow or on the weekend.
00:35 🔗 HP_Archiv Okay, sure thing. Yeah the only way to get links is to open entries one at a time. And for everything Potter-related on ModDB, there are about 10 pages of results that come up when you search 'Potter'. I have 3/4 of them, but need to make some adjustments to the list I have. Have a nice evening.
00:37 🔗 robogoat_ has quit IRC (Ping timeout: 258 seconds)
00:37 🔗 robogoat has joined #archiveteam-bs
01:10 🔗 mike__ has joined #archiveteam-bs
01:11 🔗 mike__ hi, I've got a project to gather data and I need folks help with it. Is there a recommended way to propose things?
01:11 🔗 mike__ (Sorry if this is the wrong IRC room.)
01:11 🔗 arkiver what project is it
01:11 🔗 mike__ Getting data from case.law, a database of scanned legal opinions hosted by Harvard.
01:12 🔗 mike__ It's behind a lock and key until 2024, but anybody can get 500 items/day, so we have tools to automate that.
01:21 🔗 markedL I know that site. What have you done with it already?
01:22 🔗 mike__ We've got a macOS app with some users and a docker image folks can install. Those will check our servers for assignments, then go get the items as requested and send them back to us to push to the Internet Archive.
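The macOS/docker clients mike__ describes weren't shown, so as an illustration only, the bookkeeping needed to stay under the 500-items/day cap might look like the following sketch (all names and structure are assumptions, not their implementation):

```python
# Hypothetical sketch of per-day quota tracking for a case.law-style client
# that may fetch at most 500 items per day. Not the real client's code.
import datetime

DAILY_CAP = 500

class DailyQuota:
    """Track how many items have been taken today; reset at the day boundary."""

    def __init__(self, cap=DAILY_CAP):
        self.cap = cap
        self.day = None   # date the counter applies to
        self.used = 0     # items taken on that date

    def try_take(self, n=1, today=None):
        """Reserve n items if the daily cap allows it. Returns True on success."""
        today = today or datetime.date.today()
        if today != self.day:          # new day: counter resets
            self.day, self.used = today, 0
        if self.used + n > self.cap:
            return False
        self.used += n
        return True
```

A client loop would call `try_take()` before each download and sleep until the next day once it returns False.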
01:26 🔗 markedL is there data already pushed to IA ?
01:27 🔗 mike__ Yeah, it's uploading daily.
01:27 🔗 mike__ We're keeping a bit of a lid on this though until we're done.
01:27 🔗 markedL can you point us to where to find it on IA?
01:28 🔗 wm_ has joined #archiveteam-bs
01:28 🔗 mike__ No, I'd rather not for the moment.
01:29 🔗 mike__ Harvard has a sort of "we don't want to know about it" attitude, so we're only going to broadcast this once it's done.
01:29 🔗 markedL What did you want us to help with, we have our own software for server and client side grabs, so it sounds like you already chose your own platform
01:30 🔗 mike__ Yeah, I just saw that, shoot, but we need people to install the macOS client or run the docker image until the project is done.
01:31 🔗 markedL Harvard is bound by what I believe was a contract with their technology provider, so no, they don't want to be involved. Their agreement for 2024 was already concession enough.
01:32 🔗 mike__ The 500/day is in their contract too.
01:32 🔗 mike__ They made sure of that.
01:32 🔗 arkiver hi sorry, what website?
01:32 🔗 markedL case.law
01:32 🔗 arkiver nice
01:32 🔗 arkiver with login only?
01:33 🔗 arkiver or actually publicly visitable URLs after some login
01:33 🔗 arkiver findable through some login*
01:33 🔗 Somebody2 britmob: If *you* want to reach out, feel free! Maybe they'll mail you a hard drive...
01:33 🔗 mike__ yeah, you need to pop your API key into the macOS client or the docker image
01:34 🔗 arkiver I see they have bulk data links
01:34 🔗 mike__ Only for a couple jurisdictions.
01:34 🔗 mike__ We gathered that already.
01:34 🔗 markedL the bulk data links depend on the rights for each jurisdiction
01:34 🔗 markedL Harvard is very liberal, but is bound by their upstream process
01:35 🔗 arkiver this channel is logged
01:35 🔗 mike__ yeah, if the court checks a few boxes, Harvard can give it away, but only a few have (or probably will)
01:35 🔗 arkiver make another channel maybe
01:36 🔗 arkiver #allthecases
01:36 🔗 arkiver mike__: ^
01:44 🔗 SketchCow TODAY I LEARNED: https://en.wikipedia.org/wiki/More_Product,_Less_Process
01:49 🔗 astrid it's you
02:04 🔗 tech234a has joined #archiveteam-bs
02:20 🔗 mike__ has quit IRC (Ping timeout: 260 seconds)
02:45 🔗 katocala has quit IRC ()
02:49 🔗 n00b161 has joined #archiveteam-bs
02:49 🔗 n00b161 has quit IRC (Client Quit)
02:52 🔗 katocala has joined #archiveteam-bs
02:53 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
02:55 🔗 ShellyRol has joined #archiveteam-bs
03:14 🔗 kiskabak has quit IRC (Ping timeout (120 seconds))
03:15 🔗 kiskabak has joined #archiveteam-bs
03:15 🔗 Fusl__ sets mode: +o kiskabak
03:15 🔗 Fusl sets mode: +o kiskabak
03:15 🔗 Fusl_ sets mode: +o kiskabak
03:39 🔗 m007a83 has joined #archiveteam-bs
03:46 🔗 manjaro-u has quit IRC (Read error: Operation timed out)
03:53 🔗 BlueMax has joined #archiveteam-bs
04:14 🔗 tech234a has quit IRC (Quit: Connection closed for inactivity)
04:33 🔗 odemgi_ has joined #archiveteam-bs
04:39 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:39 🔗 manjaro-u has joined #archiveteam-bs
04:40 🔗 qw3rty has joined #archiveteam-bs
04:49 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
04:51 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
05:38 🔗 Zeryl has joined #archiveteam-bs
05:38 🔗 manjaro-u has joined #archiveteam-bs
05:41 🔗 Zeryl Hrm, is #urlteam private for a reason? Figured it'd be ok to join since it's a warrior project
05:49 🔗 markedL it's not supposed to be private
05:57 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
05:58 🔗 Zeryl I'm a dolt, tried to join on the wrong network >.>
06:19 🔗 manjaro-u has joined #archiveteam-bs
06:49 🔗 omglolbah has quit IRC (Quit: ZNC - https://znc.in)
06:50 🔗 kiska has quit IRC (Remote host closed the connection)
06:50 🔗 Flashfire has quit IRC (Remote host closed the connection)
06:51 🔗 kiska has joined #archiveteam-bs
06:51 🔗 Fusl__ sets mode: +o kiska
06:51 🔗 Fusl sets mode: +o kiska
06:51 🔗 Fusl_ sets mode: +o kiska
06:51 🔗 Flashfire has joined #archiveteam-bs
06:53 🔗 omglolbah has joined #archiveteam-bs
06:55 🔗 godane SketchCow: so i got a old CNN tape called Best of Play of the Day from 1991
06:55 🔗 godane sponsored by The Athlete's Foot
09:10 🔗 Raccoon has quit IRC (Ping timeout: 612 seconds)
09:25 🔗 Raccoon has joined #archiveteam-bs
09:31 🔗 odemgi_ well this is bullshit: https://twitter.com/textfiles/status/1192518085997137920 I didn't know that you'd been working on gfycat and had just scraped half a mil urls myself, I guess someone has a bigger list than I do?
09:33 🔗 Kaz odemgi_: yeah we've got a fair few. pop into #deadcat over on hackint if you're interested
09:43 🔗 odemgi_ has quit IRC (Quit: Leaving)
09:48 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
10:00 🔗 manjaro-u has joined #archiveteam-bs
10:06 🔗 HP_Archiv has quit IRC (Ping timeout: 263 seconds)
10:14 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
10:55 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:14 🔗 IAmbience has quit IRC (Quit: Connection closed for inactivity)
11:26 🔗 Tenebrae has quit IRC (Read error: Operation timed out)
11:41 🔗 Tenebrae has joined #archiveteam-bs
11:49 🔗 HP_Archiv has joined #archiveteam-bs
11:50 🔗 HP_Archiv #archiveteam-bs I have a text file ready with links from ModDB.com that are all entries related to Harry Potter game-related content and development. Uploaded the text file to https://transfer.notkiska.pw/ and the link is:
11:50 🔗 HP_Archiv https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt
11:50 🔗 HP_Archiv Can someone please ingest all of the URLs into Archivebot for archiving?
11:56 🔗 betamax HP_Archiv: I can, but you're welcome to do it yourself (you don't need ops or voice to do a "!ao" command, which is what you'd use for ingesting from a list of URLs)
11:56 🔗 betamax the command you'd want to do (in #archivebot) would be "!ao < https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt"
11:57 🔗 HP_Archiv Oh cool, okay. I didn't know anyone can self-submit content into archivebot
11:57 🔗 HP_Archiv Let me try it out now
11:58 🔗 betamax yup, you need voice / ops for recursive archiving ("!a") but for "!ao" you don't
11:59 🔗 HP_Archiv I think I did it correctly ^^ Thank you
11:59 🔗 betamax looks good to me (you can watch the progress at http://dashboard.at.ninjawedding.org/3 )
12:00 🔗 HP_Archiv Awesome. Thanks again :)
12:04 🔗 HP_Archiv Hm, just watching it now @betamax, it appears to have hit an error?
12:06 🔗 betamax if you mean it gets stuck at the end, I think that happens with all jobs
12:06 🔗 tuluu_ has joined #archiveteam-bs
12:06 🔗 tuluu has quit IRC (Read error: Connection reset by peer)
12:07 🔗 betamax I can't see the log, unfortunately, since it's finished it doesn't show up on the dashboard page if I load it now
12:07 🔗 betamax what was the error?
12:07 🔗 HP_Archiv Oh, I think that's what it was because #archivebot is saying job finished
12:08 🔗 HP_Archiv How do I view the results?
12:08 🔗 betamax I think the download will now be sitting on a staging server, before it's uploaded into archive.org and then added into the wayback machine
12:08 🔗 HP_Archiv What I'd like to do is make sure that it actually captured the direct download options on each mod page/each hosted file
12:10 🔗 betamax I'm not 100% sure there is a way to do that (although I'm not involved with the running of archivebot, just a happy user, so perhaps someone will know)
12:11 🔗 HP_Archiv I actually just checked. It appears all of the links I gathered were just to the entries, and I think you're right. Archivebot only captured those page URLs, not the direct download URLs on each page, example: https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/sorcerers-stone-custom-map-levunr
12:12 🔗 HP_Archiv I'd have to go through the 100+ links I just submitted and pull the sub-download links from each page...
12:13 🔗 betamax yeah, just looked at a URL myself. Since the "Download now" button isn't a direct link itself, but opens a JS popup, the direct links won't have been captured
12:14 🔗 betamax however, there are probably easier ways than going through every page by hand
12:14 🔗 HP_Archiv Do tell, 'cause that would save me a lot of time ^^
12:15 🔗 betamax gimme a few minutes with one of my scripts :)
12:15 🔗 HP_Archiv Sure thing, take your time. Appreciate your help :)
12:34 🔗 JAA HP_Archiv: I can assure you it didn't grab the downloads.
12:35 🔗 JAA It will only have grabbed the pages in your text file plus images, stylesheets, etc.
12:35 🔗 HP_Archiv @JAA, yeah I realized that after the fact. @betamax was kind enough to assist, hopefully I can get the exact URL download paths into #archivebot in an easy fashion
12:36 🔗 JAA And yeah, the uploads go to an intermediate server and will show up on the Internet Archive sometime soonish probably.
12:36 🔗 betamax huh, I've been distracted by the fact that moddb downloads can be discovered using numerically incrementing IDs...
12:37 🔗 betamax so doing a grab of *ALL* content probably would be quite easy :)
12:37 🔗 JAA Yes, it would. Just not with AB.
12:38 🔗 betamax yeah, would probably be a warrior project. I'll try and make a quick wiki page and note for ModDB later
12:38 🔗 betamax in case it's ever needed
12:38 🔗 JAA Yeah, sounds good.
12:38 🔗 JAA The site seems stable at the moment.
12:38 🔗 HP_Archiv @betamax, that would be great. Thank you ^^
12:42 🔗 HP_Archiv Has anyone given attention to the site, TCRF.net ? Example: https://tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:43 🔗 HP_Archiv Bad link ^^ , Correct link: tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:43 🔗 HP_Archiv If not, I'd like to submit the site for archiving
12:44 🔗 JAA Yeah, looks like it was archived with ArchiveBot in March.
12:45 🔗 IAmbience has joined #archiveteam-bs
12:46 🔗 HP_Archiv Okay good. And did it capture all elements on a page with hosted media? example: https://tcrf.net/Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)#Unused_Sounds
12:46 🔗 HP_Archiv I'd check myself but not sure how to do that
12:48 🔗 JAA All ArchiveBot crawls end up in the Wayback Machine (eventually).
12:48 🔗 JAA This is the AB snapshot of that page: https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:50 🔗 JAA To check for the individual audio files in this case, you need to copy their URL and edit it to e.g. https://web.archive.org/web/*/https://tcrf.net/images/5/5c/HPSSWin-bats_squeaking1.ogg . Then you see that AB did indeed capture that as well on 2019-03-02.
12:53 🔗 hata has joined #archiveteam-bs
12:54 🔗 HP_Archiv Awesome. Thank you for the explanation @JAA, appreciate it
12:54 🔗 HP_Archiv I tested that myself and was able to pull up a different link from another Potter entry. Good to go
12:56 🔗 JAA :-)
12:57 🔗 JAA If you want a local copy, the data is somewhere in the ArchiveBot collection on the Internet Archive, but be warned that it'll be a pain to find those files since the viewer is broken currently.
12:58 🔗 HP_Archiv Not sure I follow - I tried to right click save-as on a random audio file hosted from another game entry. Wasn't hard to find?
12:59 🔗 HP_Archiv Or do you mean a local copy of the entire capture?
13:01 🔗 JAA Yeah, the entire thing.
13:01 🔗 JAA And in the actual archival format (WARC) rather than plain files.
13:01 🔗 betamax that list contained 58 URLs to pages that had a "download" button, the rest must be search results, category pages or images
13:01 🔗 betamax https://transfer.notkiska.pw/iOZLe/hp.list
13:03 🔗 betamax I've saved both the actual download link and the page that opens in the popup, as that *should* mean the popup with download link works in the wayback (but no guarantees as the wayback doesn't always get these things right)
13:03 🔗 betamax I'll let you check it over and add it into archivebot
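betamax's actual script was never posted; a minimal sketch of that kind of link extraction, assuming ModDB's "Download now" buttons point at paths like `/addons/start/<id>` or `/downloads/start/<id>` (a pattern guessed from the site, not confirmed), might look like:

```python
# Hypothetical sketch only: not betamax's real script.
# Given the HTML of a ModDB addon/download page, collect the download-start
# URLs so both the popup page and the link can be fed to ArchiveBot via !ao.
import re

BASE = "https://www.moddb.com"

def extract_download_links(page_html):
    """Return absolute URLs of download-start links found in the page HTML.

    The /addons/start/<id> and /downloads/start/<id> path pattern is an
    assumption about ModDB's markup."""
    paths = re.findall(r'href="(/[^"]*(?:addons|downloads)/start/\d+)"',
                       page_html)
    return sorted(BASE + p for p in paths)
```

Run over the 58 saved pages, this would produce exactly the kind of URL list that went into `hp.list`.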
13:03 🔗 HP_Archiv Hm, I've actually never tried to download in WARC before. So for example, how would I download this entire page as an archival file? https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
13:04 🔗 HP_Archiv Okay @betamax. Thank you very much. I'll have a look now
13:06 🔗 JAA HP_Archiv: That's not possible as far as I know. You could save the Wayback Machine page as WARC, but that's not the same as the original data because links get rewritten to the WBM etc. There are partial ways around that, but you can't reproduce the original retrieval from the WBM perfectly. In this case, you'd have to download the WARC files ArchiveBot produced for the entire tcrf.net crawl. (Yes,
13:06 🔗 JAA that will be large.)
13:08 🔗 HP_Archiv Huh, so I see we still have a ways to go for perfect website preservation (or maybe there's no such thing due to the nature of hyperlinks?)
13:09 🔗 HP_Archiv @betamax. Looks good, man. Thank you very much getting all of these links. Saved me a lot of time ^^
13:12 🔗 betamax no problem! it's a very simple python script which I'll add to the wiki when I have time
13:12 🔗 betamax since I have no clue, how reliable is ModDB? Is it stable? Do things ever / regularly get deleted? (wondering if this would be a good archival candidate at some point)
13:13 🔗 JAA Well, the data in those WARCs is pretty much the best you can get. There's still things that need to be improved, e.g. JavaScript handling (solvable by using automated browsers for crawling, but that's *very* slow in comparison due to all the rendering etc.), DNS preservation, and SSL/TLS certificates, but any individual URL capture is essentially perfect in these WARCs.
13:15 🔗 HP_Archiv @Betamax. I'm not a gamer by any means. A quick search brings up the Wikipedia entry for ModDB, https://en.wikipedia.org/wiki/Mod_DB and it seems like it's frequently accessed by a lot of gamers/modders.
13:16 🔗 HP_Archiv So perhaps it's stable - for now. Ingesting the Potter-game entries into archivebot is one small aspect of a much larger project I and others are working on, seeing to it that these early 00's Potter PC games are preserved.
13:17 🔗 HP_Archiv I might have mentioned this in here yesterday, but I'm working with several former and current Warner Bros executives and one person from the LoC's video game workflow to track down a prototype/dev source archive for HP 1, the first ever game
13:19 🔗 HP_Archiv When I was much younger, I played these games as a kid. And if you go on YT for the gameplay, you'll find a lot of people - not even gamers, per se - are nostalgic for these particular games. So they had a fairly strong hold in the culture (obviously, it's HP) and apparently still do, as there is an active HP Modding server on Discord.
13:21 🔗 deevious has quit IRC (Read error: Connection reset by peer)
13:22 🔗 HP_Archiv The prototype source code is like the holy grail for these games, because Sorcerer's Stone is the oldest, almost 20 years old, and as mentioned the first Potter game released. With that, the game can be rebuilt, ground up. And the person from the LoC I've been in talks with has said, 'they're very much interested in participating in conversations around acquiring digital assets/proto dev archives'. Surprisingly.
13:22 🔗 deevious has joined #archiveteam-bs
13:22 🔗 HP_Archiv Anyway, hope that answers your question in a round about sort of way Lol
13:23 🔗 HP_Archiv @JAA, noted. How do I download a WARC file from WBM?
13:26 🔗 JAA HP_Archiv: You don't. You download them from the Internet Archive instead. The WBM is essentially just an index of and interface to all the WARC data residing in IA. The AB data is in https://archive.org/details/archivebot , but as you will quickly realise, all the various ArchiveBot jobs are mixed together, so it's a mess to find the data of a particular job. That's why the AB viewer was written
13:26 🔗 JAA many moons ago, to make it easier to find the files, but as mentioned it's broken at the moment.
13:28 🔗 deevious has quit IRC (Ping timeout: 252 seconds)
13:30 🔗 HP_Archiv @JAA: I'm seeing that, yeah. Huh, why weren't these automatically uploaded to IA with their corresponding website names?
13:31 🔗 Sanqui HP_Archiv: Are you also in contact with Griptonite/KnowWonder folks?
13:36 🔗 HP_Archiv @Sanqui, it's a mix, and it has not been easy. Former head of licensing in the same WB department, who oversaw these exact game titles, put us in touch with the vp of tech at WBIE, Warner B. Interactive Entertainment. We've been emailing for a few months, and the latter pointed us in the direction of several people formerly of Foundation 9, some of whom were working at KnowWonder/Amaze during the dev time for these Potter games.
13:37 🔗 Sanqui HP_Archiv: I'm more than interested in any leads for the GB games, in particular HP1/2 GB and HP3 GBA.
13:38 🔗 HP_Archiv The latest contact, a former dev who worked directly on Sorcerer's Stone, gave me a list of possible leads - people who might've held onto a copy of the proto files. He at one point had the E3 2001 proto - basically a test map - but had it on CD-RWs that became unreadable at a certain point a few years back.
13:39 🔗 Sanqui Image them anyway, some data could be recovered
13:39 🔗 HP_Archiv He has since destroyed the discs...
13:39 🔗 Sanqui good jorb.
13:40 🔗 HP_Archiv But the data he has, as far as we can tell, was not the actual HP 1 proto dev archive of files, which would look like this, tcrf.net/Proto:Harry_Potter_and_the_Chamber_of_Secrets_(Windows,_Mac_OS_Classic,_Mac_OS_X)
13:41 🔗 HP_Archiv We actually have Chamber of Secrets, HP 2's full prototype. A former developer who luckily held onto the entire directory gave it to the HP modding community a few years ago.
13:41 🔗 HP_Archiv Data he had*
13:42 🔗 HP_Archiv Also, we're not focused on other platforms. There's been work with the Gamecube versions of each game, but it's difficult to mod non-PC games, as other platforms had the games released in more of a story mode, rather than open world
13:42 🔗 Sanqui Cool stuff, cool stuff.
13:42 🔗 Sanqui The GBC games were really cool western JRPGs, not platformers
13:43 🔗 HP_Archiv Ah okay, I never played the GBC versions
13:44 🔗 Sanqui I do recommend giving them a shot some day. But that's OT :P
13:44 🔗 HP_Archiv But anyway, yeah. It's a real headache trying to find HP 1 proto. We don't even know if it exists still. Early 2000s was still a time when people used CD-Rs and external drives were not common yet.
13:45 🔗 betamax HP_Archiv: just fyi, I didn't put those links into archivebot, as I thought you'd want to check over them first (in case you thought I had while I thought you would... etc)
13:45 🔗 HP_Archiv We have confirmed with EA Archives that they do have the final source code for the commercial/retail release of the game. But they declined having any proto/development files.
13:45 🔗 Sanqui I'm surprised they're so communicative
13:46 🔗 HP_Archiv I've had a small group of people helping me with this - we've been, I should say, unrelenting in our efforts to push forward and press for information :)
13:46 🔗 HP_Archiv @betamax. Thanks - I actually already submitted into AB. I believe the job is already done ^^
13:50 🔗 HP_Archiv But yeah, the former head of licensing who oversaw licensing for these games was actually quite interested in what we were trying to do. And both people from WB were surprised to hear that the Library of Congress was even interested in participating in these conversations. But I reached out to this guy, https://blogs.loc.gov/thesignal/2012/09/yes-the-library-of-congress-has-video-games-an-interview-with-david-gibson/, about a year a
13:50 🔗 HP_Archiv He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging
13:52 🔗 HP_Archiv It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start. Anyway, I've written a novel in here.
13:52 🔗 HP_Archiv Thank you all for your help :)
13:56 🔗 Sanqui Your IRC client cut off one of your mesages, beginning with "about a year a[...]"
13:56 🔗 Sanqui still, cool stuff. lemme know if you hear anything about/from the gameboy team :D
13:58 🔗 HP_Archiv 'He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging. It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start.'
13:58 🔗 HP_Archiv Heh will do ^^
13:58 🔗 Sanqui Oh, that message came through, just not the "ago" part in "a year ago" I guess XD
13:59 🔗 HP_Archiv Odd, well no worries
14:00 🔗 HP_Archiv Again thanks everyone for the help/explanations ^^
14:00 🔗 JAA HP_Archiv: Uploading one item per job is actually not possible because items are size-limited. This has in fact caused problems before because some pipelines did (attempt to) upload per-job items.
14:01 🔗 JAA And yeah, the web chat thingy sucks. Messages in IRC have a length limit, and that web chat just cuts them off instead of splitting up into multiple messages as any sane client would do.
14:03 🔗 HP_Archiv @JAA I think the last ingest of URLs in the text file, 'https://transfer.notkiska.pw/PvcO6/ModDB_Potter_Downloads_URLs_11.2019.txt' was successful though?
14:05 🔗 JAA HP_Archiv: Seems like it, yes. I suggest you double-check though once it's in the Wayback Machine that it didn't get any "Download Link Expired" pages or similar.
14:06 🔗 JAA Apparently the download URLs are not dependent on the UA or IP, but they do expire periodically.
14:11 🔗 HP_Archiv Okay @JAA will do ^^
14:12 🔗 odemgi has joined #archiveteam-bs
14:31 🔗 systwi_ is now known as systwi
14:44 🔗 deevious has joined #archiveteam-bs
15:14 🔗 manjaro-u has joined #archiveteam-bs
16:25 🔗 Sokar has quit IRC (Remote host closed the connection)
16:30 🔗 X-Scale has quit IRC (Ping timeout: 252 seconds)
16:31 🔗 [X-Scale] has joined #archiveteam-bs
16:31 🔗 [X-Scale] is now known as X-Scale
16:32 🔗 Video has quit IRC (Quit: Page closed)
16:32 🔗 deevious has quit IRC (Ping timeout: 252 seconds)
16:33 🔗 Video has joined #archiveteam-bs
16:36 🔗 manjaro-u has quit IRC (Konversation terminated!)
16:47 🔗 manjaro-u has joined #archiveteam-bs
17:10 🔗 schbirid has joined #archiveteam-bs
17:15 🔗 manjaro-u has quit IRC (Konversation terminated!)
17:17 🔗 Sokar has joined #archiveteam-bs
17:37 🔗 akierig has joined #archiveteam-bs
17:50 🔗 mike__ has joined #archiveteam-bs
17:51 🔗 mike__ We were chatting here last night (PST) about gathering content from case.law. If anybody is interested in discussing that project, I'm over in #allthecases.
17:59 🔗 omglolba- has joined #archiveteam-bs
18:06 🔗 omglolbah has quit IRC (Ping timeout: 745 seconds)
18:11 🔗 tuluu_ has quit IRC (Read error: Connection refused)
18:12 🔗 tuluu has joined #archiveteam-bs
18:15 🔗 bluefoo has quit IRC (Ping timeout: 255 seconds)
18:23 🔗 Video has quit IRC (Quit: Page closed)
18:25 🔗 manjaro-u has joined #archiveteam-bs
18:39 🔗 omglolbah has joined #archiveteam-bs
18:39 🔗 DogsRNice has joined #archiveteam-bs
18:40 🔗 omglolba- has quit IRC (Read error: Operation timed out)
19:23 🔗 akierig has quit IRC (Quit: later_gator)
19:31 🔗 bluefoo has joined #archiveteam-bs
19:33 🔗 HP_Archiv Good morning guys. @JAA, if you're around, how would I go about searching for those ModDB links to see if they're already in WBM?
19:34 🔗 HP_Archiv Apologies if you explained this earlier
19:41 🔗 HP_Archiv Also, how does AB handle links to files hosted in a public Google Drive? eg: A site hosts a link to a Google Drive folder or file
19:53 🔗 betamax HP_Archiv: I think it should be as simple as trying to load the URL in the wayback machine
19:53 🔗 betamax if the file is in the WBM, then you'll see the file
19:54 🔗 betamax otherwise you'll get a message like "this page is available on the web, save it now"
19:56 🔗 HP_Archiv Oh okay, then none of the links you helped pull are on WBM yet and probably still queued.
19:57 🔗 HP_Archiv For Google Drive files - will AB create a copy, pull down, a copy of a file that's hosted with GDrive or will it only archive the link?
19:57 🔗 HP_Archiv For example: https://hp-games.net/343
19:57 🔗 HP_Archiv On this page ^^ Game Mod files are hosted in two locations, one with Yandex, and the other in a Google Drive.
20:00 🔗 HP_Archiv And what I'd like to do with HP-Games.net is similar to ModDB - archive entire pages w/elements and also archive mod files that, while not hosted on the site directly, are linked from the site to online storage eg: Google Drive
20:10 🔗 betamax AB will probably only archive the link
20:11 🔗 betamax I think it archives all outgoing links from the page, but since the actual download link exists two levels deep (hp-games.net > gdrive info page > gdrive download) it won't get captured
20:11 🔗 markedL there's an API for wbm membership, if there's a lot to check
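The membership API markedL mentions is presumably the public Wayback Machine availability endpoint (`https://archive.org/wayback/available?url=...`). A small sketch for checking a list of URLs, with the JSON parsing separated from the network call so the logic can be exercised offline:

```python
# Sketch of bulk WBM-membership checks via the availability API.
# The endpoint is public; treat the response-shape handling below as a
# best-effort reading of its JSON, not authoritative documentation.
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available?url="

def closest_snapshot(api_response):
    """Given a parsed availability-API response dict, return the closest
    snapshot URL, or None if the URL has never been captured."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None

def check_url(url):
    """Query the availability API for one URL (makes a network request)."""
    with urllib.request.urlopen(API + urllib.parse.quote(url, safe="")) as resp:
        return closest_snapshot(json.load(resp))
```

Looping `check_url` over the text file of ModDB links (with a polite delay between requests) would show which ones already made it into the WBM.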
20:30 🔗 mike__ has quit IRC (Ping timeout: 260 seconds)
21:14 🔗 Pixi has quit IRC (Quit: Pixi)
21:36 🔗 BlueMax has joined #archiveteam-bs
22:04 🔗 Pixi has joined #archiveteam-bs
22:18 🔗 schbirid has quit IRC (Quit: Leaving)
22:42 🔗 Jon has quit IRC (Quit: ZNC - http://znc.in)
22:46 🔗 jmtd has joined #archiveteam-bs
23:37 🔗 dd33cc has joined #archiveteam-bs
