00:02 <HP_Archiv> Hey guys, hope everyone's having a nice afternoon. Can someone archive the entirety of ModDB.com?
00:03 <schbirid> i would love that
00:03 <schbirid> if someone does please keep me in the loop
00:03 *** schbirid has quit IRC (Quit: Leaving)
00:05 <HP_Archiv> @schbirid, sure thing.
00:06 <HP_Archiv> I was told by -Archivist that #archiveteam might have archived the individual/hosted user files (mod creations) in the past, but that nobody's actually archived the entire site before. Considering the nature of the site, can someone in the #archiveteam submit this for entire-site archiving?
00:10 <JAA> Just in case: no, probably not a good idea to throw this into ArchiveBot. At least the downloads need to be handled differently.
00:10 *** apache2 has joined #archiveteam-bs
00:10 <HP_Archiv> @JAA, can you explain why?
00:10 <JAA> HP_Archiv: ArchiveBot doesn't handle large files very well.
00:12 <HP_Archiv> Hmm. Well, the mod creations/user files are not that large. Here's a map creation someone made for the HP1 PC game: https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/night-map
00:12 <HP_Archiv> As an example ^^
00:12 <JAA> I only looked at the last few uploads and saw several files over 1 GB there.
00:13 <JAA> E.g. https://www.moddb.com/mods/doom-for-your-lazy-friends/downloads/doom-for-your-lazy-friends-part-1
00:13 <JAA> And I mean, the entire thing is 13 TB. That would easily be the largest AB job ever.
00:14 <HP_Archiv> Ah, I wasn't aware that people upload files that big on the site/I've only ever seen files less than 1 GB (until now)
00:14 <HP_Archiv> Hmm
00:14 <HP_Archiv> Can I curate a selection of individual pages for archiving then, instead of the whole site?
00:15 <HP_Archiv> I'd at least like to see all the Potter-game-related entries archived. But if not, that's fine/understandable.
00:16 <britmob> Yes, you can use individual links
00:17 <JAA> (In case anyone's wondering, the largest AB job to date to my knowledge was for NDTV at just over 8 TiB.)
00:19 <HP_Archiv> I was under the impression it was Google+ Pages? News to me ^^
00:19 <JAA> I'm talking only about ArchiveBot.
00:19 <JAA> Google+ was the largest distributed project of AT to date at 1.4 PiB.
00:21 <HP_Archiv> Oh right, I keep forgetting that there are different, well, aspects to #archiveteam
00:21 <HP_Archiv> Okay, well as it happens, I already have a spreadsheet of links ready for archiving ;)
00:22 <HP_Archiv> Should I just paste them all here, or what?
00:23 <JAA> We'd need a text file of URLs.
00:23 <JAA> I'm not sure how this would work for downloads though. Are the download URLs constant?
00:24 <JAA> (The link on that "Click to <filename> if it doesn't start automatically" page I mean.)
00:25 <HP_Archiv> I believe so. I tested a few entries, and each one prompts with the same thing - Click X File if it doesn't start automatically - and then offers the download as a .zip
00:26 <HP_Archiv> Correction: Not all are .zip
00:27 <HP_Archiv> How do I get you the text file?
00:28 <JAA> https://transfer.notkiska.pw/
00:29 <HP_Archiv> Okay, thanks @JAA. Give me a few minutes and I'll send it over. Just want to make sure I'm not missing any links.
00:34 <JAA> HP_Archiv: Take your time. I'm going to bed now anyway. Will look into it tomorrow or on the weekend.
00:35 <HP_Archiv> Okay, sure thing. Yeah the only way to get links is to open entries one at a time. And for everything Potter-related on ModDB, there are about 10 pages of results that come up when you search 'Potter'. I have 3/4 of them, but need to make some adjustments to the list I have. Have a nice evening.
00:37 *** robogoat_ has quit IRC (Ping timeout: 258 seconds)
00:37 *** robogoat has joined #archiveteam-bs
01:10 *** mike__ has joined #archiveteam-bs
01:11 <mike__> hi, I've got a project to gather data and I need folks' help with it. Is there a recommended way to propose things?
01:11 <mike__> (Sorry if this is the wrong IRC room.)
01:11 <arkiver> what project is it
01:11 <mike__> Getting data from case.law, a database of scanned legal opinions hosted by Harvard.
01:12 <mike__> It's behind a lock and key until 2024, but anybody can get 500 items/day, so we have tools to automate that.
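A minimal sketch of what an automated fetcher for this could look like, assuming the public api.case.law v1 endpoint, token auth, and the full_case/pagination parameters described in the CAP docs (the key name and budget handling here are illustrative, not mike__'s actual tooling):

import requests

API_KEY = "YOUR_CAP_API_KEY"   # assumption: an API key from a case.law account
BASE = "https://api.case.law/v1/cases/"

def fetch_cases(per_day_budget=500):
    """Walk the paginated case list, requesting full case text for each result."""
    fetched = 0
    url, params = BASE, {"full_case": "true", "page_size": 100}
    while url:
        resp = requests.get(url, params=params,
                            headers={"Authorization": f"Token {API_KEY}"})
        resp.raise_for_status()
        data = resp.json()
        for case in data["results"]:
            if fetched >= per_day_budget:   # stay inside the 500 items/day allowance
                return
            yield case                      # push each JSON case to storage/IA from here
            fetched += 1
        url, params = data.get("next"), {}  # the 'next' URL already carries the query string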
01:21 <markedL> I know that site. What have you done with it already?
01:22 <mike__> We've got a macOS app with some users and a docker image folks can install. Those will check our servers for assignments, then go get the items as requested and send them back to us to push to the Internet Archive.
01:26 <markedL> is there data already pushed to IA?
01:27 <mike__> Yeah, it's uploading daily.
01:27 <mike__> We're keeping a bit of a lid on this though until we're done.
01:27 <markedL> can you point us to where to find it on IA?
01:28 *** wm_ has joined #archiveteam-bs
01:28 <mike__> No, I'd rather not for the moment.
01:29 <mike__> Harvard has a sort of "we don't want to know about it" attitude, so we're only going to broadcast this once it's done.
01:29 <markedL> What did you want us to help with? We have our own software for server- and client-side grabs, so it sounds like you already chose your own platform
01:30 <mike__> Yeah, I just saw that, shoot, but we need people to install the macOS client or run the docker image until the project is done.
01:31 <markedL> Harvard is bound by what I believe was a contract with their technology provider, so no, they don't want to be involved. Their agreement for 2024 was already concession enough.
01:32 <mike__> The 500/day is in their contract too.
01:32 <mike__> They made sure of that.
01:32 <arkiver> hi sorry, what website?
01:32 <markedL> case.law
01:32 <arkiver> nice
01:33 <arkiver> with login only?
01:33 <arkiver> or actually publicly visitable URLs after some login
01:33 <arkiver> findable through some login*
01:33 <Somebody2> britmob: If *you* want to reach out, feel free! Maybe they'll mail you a hard drive...
01:34 <mike__> yeah, you need to pop your API key into the macOS client or the docker image
01:34 <arkiver> I see they have bulk data links
01:34 <mike__> Only for a couple jurisdictions.
01:34 <mike__> We gathered that already.
01:34 <markedL> the bulk data links are based on the rights for the jurisdiction
01:35 <markedL> Harvard is very liberal, but is bound by their upstream process
01:35 <arkiver> this channel is logged
01:35 <mike__> yeah, if the court checks a few boxes, Harvard can give it away, but only a few have (or probably will)
01:36 <arkiver> make another channel maybe
01:36 <arkiver> #allthecases
01:36 <arkiver> mike__: ^
01:44 <SketchCow> TODAY I LEARNED: https://en.wikipedia.org/wiki/More_Product,_Less_Process
01:49 <astrid> it's you
02:04 *** tech234a has joined #archiveteam-bs
02:20 *** mike__ has quit IRC (Ping timeout: 260 seconds)
02:45 *** katocala has quit IRC ()
02:49 *** n00b161 has joined #archiveteam-bs
02:49 *** n00b161 has quit IRC (Client Quit)
02:52 *** katocala has joined #archiveteam-bs
02:53 *** ShellyRol has quit IRC (Read error: Connection reset by peer)
02:55 *** ShellyRol has joined #archiveteam-bs
03:14 *** kiskabak has quit IRC (Ping timeout (120 seconds))
03:15 *** kiskabak has joined #archiveteam-bs
03:15 *** Fusl__ sets mode: +o kiskabak
03:15 *** Fusl sets mode: +o kiskabak
03:15 *** Fusl_ sets mode: +o kiskabak
03:39 *** m007a83 has joined #archiveteam-bs
03:46 *** manjaro-u has quit IRC (Read error: Operation timed out)
03:53 *** BlueMax has joined #archiveteam-bs
04:14 *** tech234a has quit IRC (Quit: Connection closed for inactivity)
04:33 *** odemgi_ has joined #archiveteam-bs
04:39 *** odemgi has quit IRC (Read error: Operation timed out)
04:39 *** manjaro-u has joined #archiveteam-bs
04:40 *** qw3rty has joined #archiveteam-bs
04:49 *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
04:51 *** manjaro-u has quit IRC (Quit: Konversation terminated!)
05:38 *** Zeryl has joined #archiveteam-bs
05:38 *** manjaro-u has joined #archiveteam-bs
05:41 <Zeryl> Hrm, is #urlteam private for a reason? Figured it'd be ok to join since it's a warrior project
05:49 <markedL> it's not supposed to be private
05:57 *** manjaro-u has quit IRC (Quit: Konversation terminated!)
05:58 <Zeryl> I'm a dolt, tried to join on the wrong network >.>
06:19 *** manjaro-u has joined #archiveteam-bs
06:49 *** omglolbah has quit IRC (Quit: ZNC - https://znc.in)
06:50 *** kiska has quit IRC (Remote host closed the connection)
06:50 *** Flashfire has quit IRC (Remote host closed the connection)
06:51 *** kiska has joined #archiveteam-bs
06:51 *** Fusl__ sets mode: +o kiska
06:51 *** Fusl sets mode: +o kiska
06:51 *** Fusl_ sets mode: +o kiska
06:51 *** Flashfire has joined #archiveteam-bs
06:53 *** omglolbah has joined #archiveteam-bs
06:55 <godane> SketchCow: so i got an old CNN tape called Best of Play of the Day from 1991
06:55 <godane> sponsored by The Athlete's Foot
09:10 *** Raccoon has quit IRC (Ping timeout: 612 seconds)
09:25 *** Raccoon has joined #archiveteam-bs
09:31 <odemgi_> well this is bullshit: https://twitter.com/textfiles/status/1192518085997137920 I didn't know that you'd been working on gfycat; I'd just scraped half a mil urls myself. I guess someone has a bigger list than I do?
09:33 <Kaz> odemgi_: yeah we've got a fair few. pop into #deadcat over on hackint if you're interested
09:43 *** odemgi_ has quit IRC (Quit: Leaving)
09:48 *** manjaro-u has quit IRC (Quit: Konversation terminated!)
10:00 *** manjaro-u has joined #archiveteam-bs
10:06 *** HP_Archiv has quit IRC (Ping timeout: 263 seconds)
10:14 *** manjaro-u has quit IRC (Quit: Konversation terminated!)
10:55 *** BlueMax has quit IRC (Read error: Connection reset by peer)
11:14 *** IAmbience has quit IRC (Quit: Connection closed for inactivity)
11:26 *** Tenebrae has quit IRC (Read error: Operation timed out)
11:41 *** Tenebrae has joined #archiveteam-bs
11:49 *** HP_Archiv has joined #archiveteam-bs
11:50 <HP_Archiv> #archiveteam-bs I have a text file of links from ModDB.com ready, all entries related to Harry Potter game content and development. Uploaded the text file to https://transfer.notkiska.pw/ and the link is:
11:50 <HP_Archiv> https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt
11:50 <HP_Archiv> Can someone please ingest all of the URLs into Archivebot for archiving?
11:56 <betamax> HP_Archiv: I can, but you're welcome to do it yourself (you don't need ops or voice to do a "!ao" command, which is what you'd use for ingesting from a list of URLs)
11:56 <betamax> the command you'd want to do (in #archivebot) would be "!ao < https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt"
11:57 <HP_Archiv> Oh cool, okay. I didn't know anyone can self-submit content into archivebot
11:57 <HP_Archiv> Let me try it out now
11:58 <betamax> yup, you need voice / ops for recursive archiving ("!a") but for "!ao" you don't
11:59 <HP_Archiv> I think I did it correctly ^^ Thank you
11:59 <betamax> looks good to me (you can watch the progress at http://dashboard.at.ninjawedding.org/3 )
12:00 <HP_Archiv> Awesome. Thanks again :)
12:04 <HP_Archiv> Hm, just watching it now @betamax, it appears to have hit an error?
12:06 <betamax> if you mean it gets stuck at the end, I think that happens with all jobs
12:06 *** tuluu_ has joined #archiveteam-bs
12:06 *** tuluu has quit IRC (Read error: Connection reset by peer)
12:07 <betamax> I can't see the log, unfortunately, since it's finished it doesn't show up on the dashboard page if I load it now
12:07 <betamax> what was the error?
12:08 <HP_Archiv> Oh, I think that's what it was because #archivebot is saying job finished
12:08 <HP_Archiv> How do I view the results?
12:08 <betamax> I think the download will now be sitting on a staging server, before it's uploaded into archive.org and then added into the wayback machine
12:08
🔗
|
HP_Archiv |
What I'd like to do is make sure that it actually captured the direct download options on each mod pages/each hosted file |
12:11 <betamax> I'm not 100% sure there is a way to do that (although I'm not involved with the running of archivebot, just a happy user, so perhaps someone will know)
12:12 <HP_Archiv> I actually just checked. It appears all of the links I gathered were just to the entries, and I think you're right. ArchiveBot only captured those page URLs, not the direct download URLs on each page, example: https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/sorcerers-stone-custom-map-levunr
12:13 <HP_Archiv> I'd have to go through the 100+ links I just submitted and pull the sub-download links from each page...
12:14 <betamax> yeah, just looked at a URL myself. Since the "Download now" button isn't a direct link itself, but opens a JS popup, the direct links won't have been captured
12:14 <betamax> however, there are probably easier ways than going through every page by hand
12:15 <HP_Archiv> Do tell, 'cause that would save me a lot of time ^^
12:15 <betamax> gimme a few minutes with one of my scripts :)
12:34 <HP_Archiv> Sure thing, take your time. Appreciate your help :)
12:35 <JAA> HP_Archiv: I can assure you it didn't grab the downloads.
12:35 <JAA> It will only have grabbed the pages in your text file plus images, stylesheets, etc.
12:36 <HP_Archiv> @JAA, yeah I realized that after the fact. @betamax was kind enough to assist, hopefully I can get the exact URL download paths into #archivebot in an easy fashion
12:36 <JAA> And yeah, the uploads go to an intermediate server and will show up on the Internet Archive sometime soonish probably.
12:37 <betamax> huh, I've been distracted by the fact that moddb downloads can be discovered using numerically incrementing IDs...
12:37 <betamax> so doing a grab of *ALL* content probably would be quite easy :)
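A rough sketch of what such an enumeration could look like, assuming the downloads sit behind /downloads/start/<id> URLs (an assumption based on where the "Download now" buttons appear to point); a real project would add proper rate limiting and WARC output:

import time
import requests

def probe_download_ids(start_id, end_id, delay=1.0):
    """Yield (download_id, HTTP status) for each candidate ModDB download ID."""
    session = requests.Session()
    session.headers["User-Agent"] = "archiveteam-research-sketch"
    for dl_id in range(start_id, end_id + 1):
        url = f"https://www.moddb.com/downloads/start/{dl_id}"   # assumed URL pattern
        resp = session.get(url, allow_redirects=True)
        yield dl_id, resp.status_code
        time.sleep(delay)   # be gentle; this is only a feasibility probe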
12:38 <JAA> Yes, it would. Just not with AB.
12:38 <betamax> yeah, would probably be a warrior project. I'll try and make a quick wiki page and note for ModDB later
12:38 <betamax> in case it's ever needed
12:38 <JAA> Yeah, sounds good.
12:38 <JAA> The site seems stable at the moment.
12:38
🔗
|
HP_Archiv |
@betamax, that would great. Thank you ^^ |
12:43 <HP_Archiv> Has anyone given attention to the site, TCRF.net? Example: https://tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:43 <HP_Archiv> Bad link ^^ , Correct link: tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:44 <HP_Archiv> If not, I'd like to submit the site for archiving
12:45 <JAA> Yeah, looks like it was archived with ArchiveBot in March.
12:46 *** IAmbience has joined #archiveteam-bs
12:46 <HP_Archiv> Okay good. And did it capture all elements on pages with hosted media? example: https://tcrf.net/Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)#Unused_Sounds
12:48 <HP_Archiv> I'd check myself but not sure how to do that
12:48 <JAA> All ArchiveBot crawls end up in the Wayback Machine (eventually).
12:50 <JAA> This is the AB snapshot of that page: https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
12:53 <JAA> To check for the individual audio files in this case, you need to copy their URL and edit it to e.g. https://web.archive.org/web/*/https://tcrf.net/images/5/5c/HPSSWin-bats_squeaking1.ogg . Then you see that AB did indeed capture that as well on 2019-03-02.
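The same check can be done programmatically against the Wayback Machine CDX API; a minimal sketch (the field names follow the CDX JSON output):

import requests

def wayback_captures(url, limit=10):
    """Return up to `limit` (timestamp, status) pairs for captures of `url`."""
    resp = requests.get("https://web.archive.org/cdx/search/cdx",
                        params={"url": url, "output": "json", "limit": limit})
    resp.raise_for_status()
    rows = resp.json() if resp.text.strip() else []   # empty body means no captures
    if not rows:
        return []
    header = rows[0]   # e.g. ["urlkey", "timestamp", "original", "mimetype", "statuscode", ...]
    ts, status = header.index("timestamp"), header.index("statuscode")
    return [(row[ts], row[status]) for row in rows[1:]]

print(wayback_captures("https://tcrf.net/images/5/5c/HPSSWin-bats_squeaking1.ogg"))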
12:54 *** hata has joined #archiveteam-bs
12:54 <HP_Archiv> Awesome. Thank you for the explanation @JAA, appreciate it
12:56 <HP_Archiv> I tested that myself and was able to pull up a different link from another Potter entry. Good to go
12:57 <JAA> :-)
12:58 <JAA> If you want a local copy, the data is somewhere in the ArchiveBot collection on the Internet Archive, but be warned that it'll be a pain to find those files since the viewer is broken currently.
12:58 <HP_Archiv> Not sure I follow - I tried to right click save as on a random audio file hosted from another game entry. Wasn't hard to do find?
12:59 <HP_Archiv> Hard to find*
13:01 <HP_Archiv> Or do you mean a local copy of the entire capture?
13:01 <JAA> Yeah, the entire thing.
13:01 <JAA> And in the actual archival format (WARC) rather than plain files.
13:01 <betamax> that list contained 58 URLs to pages that had a "download" button, the rest must be search results, category pages or images
13:03 <betamax> https://transfer.notkiska.pw/iOZLe/hp.list
13:03 <betamax> I've saved both the actual download link and the page that opens in the popup, as that *should* mean the popup with download link works in the wayback (but no guarantees as the wayback doesn't always get these things right)
13:03 <betamax> I'll let you check it over and add it into archivebot
13:03 <HP_Archiv> Hm, I've actually never tried to download in WARC before. So for example, how would I download this entire page as an archival file? https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)
13:04 <HP_Archiv> Okay @betamax. Thank you very much. I'll have a look now
13:06 <JAA> HP_Archiv: That's not possible as far as I know. You could save the Wayback Machine page as WARC, but that's not the same as the original data because links get rewritten to the WBM etc. There are partial ways around that, but you can't reproduce the original retrieval from the WBM perfectly. In this case, you'd have to download the WARC files ArchiveBot produced for the entire tcrf.net crawl. (Yes, that will be large.)
13:08 <HP_Archiv> Huh, so I see we still have a ways to go for perfect website preservation (or maybe there's no such thing due to the nature of hyperlinks?)
13:09 <HP_Archiv> @betamax. Looks good, man. Thank you very much for getting all of these links. Saved me a lot of time ^^
13:12 <betamax> no problem! it's a very simple python script which I'll add to the wiki when I have time
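betamax's script isn't shown in the log; here is a guess at its general shape, assuming the addon pages link their "Download now" button to a /downloads/start/<id> page whose mirror link is the direct download (the regexes are assumptions about ModDB's markup at the time):

import re
import sys
import requests

START_RE = re.compile(r'href="(/downloads/start/\d+[^"]*)"')    # "Download now" target
MIRROR_RE = re.compile(r'href="(/downloads/mirror/[^"]+)"')     # direct link on that page

def extract_download_urls(page_url):
    html = requests.get(page_url).text
    urls = []
    for start_path in sorted(set(START_RE.findall(html))):
        start_url = "https://www.moddb.com" + start_path
        urls.append(start_url)                                   # the popup page itself
        start_html = requests.get(start_url).text
        for mirror_path in sorted(set(MIRROR_RE.findall(start_html))):
            urls.append("https://www.moddb.com" + mirror_path)   # the direct download
    return urls

if __name__ == "__main__":
    for line in sys.stdin:          # feed it the same URL list that went into !ao
        if not line.strip():
            continue
        for url in extract_download_urls(line.strip()):
            print(url)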
13:13 <betamax> since I have no clue, how reliable is ModDB? Is it stable? Do things ever / regularly get deleted? (wondering if this would be a good archival candidate at some point)
13:15 <JAA> Well, the data in those WARCs is pretty much the best you can get. There's still things that need to be improved, e.g. JavaScript handling (solvable by using automated browsers for crawling, but that's *very* slow in comparison due to all the rendering etc.), DNS preservation, and SSL/TLS certificates, but any individual URL capture is essentially perfect in these WARCs.
13:16 <HP_Archiv> @betamax. I'm not a gamer by any means. A quick search brings up the Wikipedia entry for ModDB, https://en.wikipedia.org/wiki/Mod_DB and it seems like it's frequently accessed by a lot of gamers/modders.
13:17 <HP_Archiv> So perhaps it's stable - for now. Ingesting the Potter-game entries into archivebot is one small aspect of a much larger project I and others are working on, seeing to it that these early 00's Potter PC games are preserved.
13:19 <HP_Archiv> I might have mentioned this in here yesterday, but I'm working with several former and current Warner Bros executives and one person out of the LoC's video game workflow to track down a prototype/dev source archive for HP 1, the first ever game
13:19
🔗
|
HP_Archiv |
When I was much younger, I played these games as a kid. And if you go on YT for the gameplay, you'll find a lot of people - not even gamer, per se - are nostalgica for these particular games. So they had a fairly strong hold in the culture (obviously, it's HP) but apparently still do as there is an active HP Modding server on Discord. |
13:22 *** deevious has quit IRC (Read error: Connection reset by peer)
13:22 <HP_Archiv> The prototype source code is like the holy grail for these games, because Sorcerer's Stone is the oldest, almost 20 years old, and as mentioned the first Potter game released. With that, the game can be rebuilt, ground up. And the person from the LoC I've been in talks with has said, 'they're very much interested in participating in conversations around acquiring digital assets/proto dev archives'. Surprisingly.
13:22 *** deevious has joined #archiveteam-bs
13:23 <HP_Archiv> Anyway, hope that answers your question in a roundabout sort of way Lol
13:26 <HP_Archiv> @JAA, noted. How do I download a WARC file from WBM?
13:26 <JAA> HP_Archiv: You don't. You download them from the Internet Archive instead. The WBM is essentially just an index of and interface to all the WARC data residing in IA. The AB data is in https://archive.org/details/archivebot , but as you will quickly realise, all the various ArchiveBot jobs are mixed together, so it's a mess to find the data of a particular job. That's why the AB viewer was written many moons ago, to make it easier to find the files, but as mentioned it's broken at the moment.
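One way to dig through the collection without the viewer is the internetarchive Python library; a naive sketch (iterating the whole archivebot collection is slow, and the substring match on file names is only a heuristic):

from internetarchive import search_items, get_item

def find_warcs(name_fragment, collection="archivebot"):
    """Yield (item identifier, file name) for WARCs whose names mention name_fragment."""
    for result in search_items(f"collection:{collection}"):
        item = get_item(result["identifier"])
        for f in item.files:
            name = f.get("name", "")
            if name_fragment in name and name.endswith(".warc.gz"):
                yield item.identifier, name

for identifier, name in find_warcs("tcrf.net"):
    print(identifier, name)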
13:28 *** deevious has quit IRC (Ping timeout: 252 seconds)
13:30 <HP_Archiv> @JAA: I'm seeing that, yeah. Huh, why weren't these automatically uploaded to IA with their corresponding website names?
13:31 <Sanqui> HP_Archiv: Are you also in contact with Griptonite/KnowWonder folks?
13:36 <HP_Archiv> @Sanqui, it's a mix, and it has not been easy. The former head of licensing in the same WB department, who oversaw these exact game titles, put us in touch with the VP of tech at WBIE, Warner Bros. Interactive Entertainment. We've been emailing for a few months; the latter pointed us in the direction of several people formerly of Foundation 9, some of whom were working at KnowWonder/Amaze during the dev time for these Potter games.
13:37 <Sanqui> HP_Archiv: I'm more than interested in any leads for the GB games, in particular HP1/2 GB and HP3 GBA.
13:38 <HP_Archiv> The latest contact, a former dev who worked directly on Sorcerer's Stone, gave me a list of possible leads, people who might've held onto a copy of the proto files. He at one point had the E3 2001 proto - basically a test map - but had it on CD-RWs, which were unreadable after a certain point a few years back.
13:39 <Sanqui> Image them anyway, some data could be recovered
13:39 <HP_Archiv> He has since destroyed the discs...
13:40 <Sanqui> good jorb.
13:41 <HP_Archiv> But the data he has, as far as we can tell, was not the actual HP 1 proto dev archive of files, which would look like this, tcrf.net/Proto:Harry_Potter_and_the_Chamber_of_Secrets_(Windows,_Mac_OS_Classic,_Mac_OS_X)
13:41 <HP_Archiv> We actually have Chamber of Secrets, HP 2's full prototype. A former developer who luckily held onto the entire directory gave it to the HP modding community a few years ago.
13:42 <HP_Archiv> Data he had*
13:42 <HP_Archiv> Also, we're not focused on other platforms. There's been work with the Gamecube versions of each game, but it's difficult to mod non-PC games, as other platforms had the games released in a more story-driven mode rather than open world
13:42 <Sanqui> Cool stuff, cool stuff.
13:43 <Sanqui> The GBC games were really cool western JRPGs, not platformers
13:44 <HP_Archiv> Ah okay, I never played the GBC versions
13:44 <Sanqui> I do recommend giving them a shot some day. But that's OT :P
13:45 <HP_Archiv> But anyway, yeah. It's a real headache trying to find the HP 1 proto. We don't even know if it still exists. The early 2000s was still a time when people used CD-Rs, and external drives were not common yet.
13:45 <betamax> HP_Archiv: just fyi, I didn't put those links into archivebot, as I thought you'd want to check over them first (in case you thought I had while I thought you would... etc)
13:45 <HP_Archiv> We have confirmed with EA Archives that they do have the final source code for the commercial/retail release of the game. But they denied having any proto/development files.
13:46 <Sanqui> I'm surprised they're so communicative
13:46 <HP_Archiv> I've had a small group of people helping me with this - we've been, I should say, unforgiving in our efforts to push forward and press for information :)
13:50 <HP_Archiv> @betamax. Thanks - I actually already submitted into AB. I believe the job is already done ^^
13:50 <HP_Archiv> But yeah, the former head of licensing who oversaw licensing for these games was actually quite interested in what we were trying to do. And both people from WB were surprised to hear that the Library of Congress was even interested in participating in these conversations. But I reached out to this guy, https://blogs.loc.gov/thesignal/2012/09/yes-the-library-of-congress-has-video-games-an-interview-with-david-gibson/, about a year a
13:52 <HP_Archiv> He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging
13:52 <HP_Archiv> It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start. Anyway, I've written a novel in here.
13:56 <HP_Archiv> Thank you all for your help :)
13:56 <Sanqui> Your IRC client cut off one of your messages, beginning with "about a year a[...]"
13:58 <Sanqui> still, cool stuff. lemme know if you hear anything about/from the gameboy team :D
13:58 <HP_Archiv> 'He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging. It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start.'
13:58 <HP_Archiv> Heh will do ^^
13:59 <Sanqui> Oh, that message came through, just not the "ago" part in "a year ago" I guess XD
13:59 <HP_Archiv> Odd, well no worries
14:00 <HP_Archiv> Again thanks everyone for the help/explanations ^^
14:00 <JAA> HP_Archiv: Uploading one item per job is actually not possible because items are size-limited. This has in fact caused problems before because some pipelines did (attempt to) upload per-job items.
14:01 <JAA> And yeah, the web chat thingy sucks. Messages in IRC have a length limit, and that web chat just cuts them off instead of splitting up into multiple messages as any sane client would do.
14:03 <HP_Archiv> @JAA I think the last ingest of URLs in the text file, 'https://transfer.notkiska.pw/PvcO6/ModDB_Potter_Downloads_URLs_11.2019.txt', was successful though?
14:05 <JAA> HP_Archiv: Seems like it, yes. I suggest you double-check though once it's in the Wayback Machine that it didn't get any "Download Link Expired" pages or similar.
14:06 <JAA> Apparently the download URLs are not dependent on the UA or IP, but they do expire periodically.
14:11 <HP_Archiv> Okay @JAA will d ^^
14:11 <HP_Archiv> will do*
14:12 *** odemgi has joined #archiveteam-bs
14:31 *** systwi_ is now known as systwi
14:44 *** deevious has joined #archiveteam-bs
15:14 *** manjaro-u has joined #archiveteam-bs
16:25 *** Sokar has quit IRC (Remote host closed the connection)
16:30 *** X-Scale has quit IRC (Ping timeout: 252 seconds)
16:31 *** [X-Scale] has joined #archiveteam-bs
16:31 *** [X-Scale] is now known as X-Scale
16:32 *** Video has quit IRC (Quit: Page closed)
16:32 *** deevious has quit IRC (Ping timeout: 252 seconds)
16:33 *** Video has joined #archiveteam-bs
16:36 *** manjaro-u has quit IRC (Konversation terminated!)
16:47 *** manjaro-u has joined #archiveteam-bs
17:10 *** schbirid has joined #archiveteam-bs
17:15 *** manjaro-u has quit IRC (Konversation terminated!)
17:17 *** Sokar has joined #archiveteam-bs
17:37 *** akierig has joined #archiveteam-bs
17:50 *** mike__ has joined #archiveteam-bs
17:51 <mike__> We were chatting here last night (PST) about gathering content from case.law. If anybody is interested in discussing that project, I'm over in #allthecases.
17:59 *** omglolba- has joined #archiveteam-bs
18:06 *** omglolbah has quit IRC (Ping timeout: 745 seconds)
18:11 *** tuluu_ has quit IRC (Read error: Connection refused)
18:12 *** tuluu has joined #archiveteam-bs
18:15 *** bluefoo has quit IRC (Ping timeout: 255 seconds)
18:23 *** Video has quit IRC (Quit: Page closed)
18:25 *** manjaro-u has joined #archiveteam-bs
18:39 *** omglolbah has joined #archiveteam-bs
18:39 *** DogsRNice has joined #archiveteam-bs
18:40 *** omglolba- has quit IRC (Read error: Operation timed out)
19:23 *** akierig has quit IRC (Quit: later_gator)
19:31 *** bluefoo has joined #archiveteam-bs
19:33 <HP_Archiv> Good morning guys. @JAA, if you're around, how would I go about searching for those ModDB links to see if they're already in WBM?
19:34 <HP_Archiv> Apologies if you explained this earlier
19:41 <HP_Archiv> Also, how does AB handle links to files hosted in a public Google Drive? eg: A site hosts a link to a Google Drive folder
19:45 <HP_Archiv> Or file*
19:53 <betamax> HP_Archiv: I think it should be as simple as trying to load the URL in the wayback machine
19:53 <betamax> if the file is in the WBM, then you'll see the file
19:54 <betamax> otherwise you'll get a message like "this page is available on the web, save it now"
19:56 <HP_Archiv> Oh okay, then none of the links you helped pull are on WBM yet and probably still queued.
19:57 <HP_Archiv> For Google Drive files - will AB pull down a copy of a file that's hosted with GDrive, or will it only archive the link?
19:57 <HP_Archiv> For example: https://hp-games.net/343
19:57 <HP_Archiv> On this page ^^ Game Mod files are hosted in two locations, one with Yandex, and the other in a Google Drive.
20:00 <HP_Archiv> And what I'd like to do with HP-Games.net is similar to ModDB - archive entire pages w/elements and also archive mod files that, while not hosted on the site directly, are linked from the site to online storage, e.g. Google Drive
20:10 <betamax> AB will probably only archive the link
20:11 <betamax> I think it archives all outgoing links from the page, but since the actual download link exists two levels deep (hp-games.net > gdrive info page > gdrive download) it won't get captured
20:11 <markedL> there's an API for wbm membership, if there's a lot to check
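markedL doesn't name the API, but the Wayback Machine availability endpoint is one such interface; a minimal sketch for checking a list of URLs (the local file name is assumed to be a saved copy of the list):

import requests

def in_wayback(url):
    """Return the closest snapshot URL if `url` has a capture, else None."""
    resp = requests.get("https://archive.org/wayback/available", params={"url": url})
    resp.raise_for_status()
    snap = resp.json().get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

with open("ModDB_Potter_Downloads_URLs_11.2019.txt") as fh:   # assumed local copy of the list
    for line in fh:
        url = line.strip()
        if url:
            print(url, in_wayback(url) or "NOT ARCHIVED")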
20:30 *** mike__ has quit IRC (Ping timeout: 260 seconds)
21:14 *** Pixi has quit IRC (Quit: Pixi)
21:36 *** BlueMax has joined #archiveteam-bs
22:04 *** Pixi has joined #archiveteam-bs
22:18 *** schbirid has quit IRC (Quit: Leaving)
22:42 *** Jon has quit IRC (Quit: ZNC - http://znc.in)
22:46 *** jmtd has joined #archiveteam-bs
23:37 *** dd33cc has joined #archiveteam-bs