Time |
Nickname |
Message |
00:02
🔗
|
HP_Archiv |
Hey guys, hope everyone's having a nice afternoon. Can someone archive the entirety of ModDB.com? |
00:03
🔗
|
schbirid |
i would love that |
00:03
🔗
|
schbirid |
if someone does please keep me in the loop |
00:03
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
00:05
🔗
|
HP_Archiv |
@schbirid, sure thing. |
00:06
🔗
|
HP_Archiv |
I was told by -Archivist that #archiveteam might have archived the individual/hosted user files (mod creations) in the past, but that nobody's actually archived the entire site before. Considering the nature of the site, can someone in the #archiveteam submit this for entire-site archiving? |
00:10
🔗
|
JAA |
Just in case: no, probably not a good idea to throw this into ArchiveBot. At least the downloads need to be handled differently. |
00:10
🔗
|
|
apache2 has joined #archiveteam-bs |
00:10
🔗
|
HP_Archiv |
@JAA, can you explain why? |
00:10
🔗
|
JAA |
HP_Archiv: ArchiveBot doesn't handle large files very well. |
00:12
🔗
|
HP_Archiv |
Hmm. Well the mod creations/user files are not that large. Using a map creation from the HP1 PC Game someone created, https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/night-map |
00:12
🔗
|
HP_Archiv |
As an example ^^ |
00:12
🔗
|
JAA |
I only looked at the last few uploads and saw several files over 1 GB there. |
00:13
🔗
|
JAA |
E.g. https://www.moddb.com/mods/doom-for-your-lazy-friends/downloads/doom-for-your-lazy-friends-part-1 |
00:13
🔗
|
JAA |
And I mean, the entire thing is 13 TB. That would easily be the largest AB job ever. |
00:14
🔗
|
HP_Archiv |
Ah, I wasn't aware that people upload files that big on site/I've only ever seen files less than 1GB (until now) |
00:14
🔗
|
HP_Archiv |
Hmm |
00:14
🔗
|
HP_Archiv |
Can I curate a selection of individual pages for archiving then, instead of the whole site? |
00:15
🔗
|
HP_Archiv |
I'd at least like to see all entries Potter-game related archived. But if not, that's fine/understandable. |
00:16
🔗
|
britmob |
Yes, you can use individual links |
00:17
🔗
|
JAA |
(In case anyone's wondering, the largest AB job to date to my knowledge was for NDTV at just over 8 TiB.) |
00:19
🔗
|
HP_Archiv |
I was under the impression it was Google+ Pages? News to me ^^ |
00:19
🔗
|
JAA |
I'm talking only about ArchiveBot. |
00:19
🔗
|
JAA |
Google+ was the largest distributed project of AT to date at 1.4 PiB. |
00:21
🔗
|
HP_Archiv |
Oh right, I keep forgetting that there are different, well, aspects to #archiveteam |
00:21
🔗
|
HP_Archiv |
Okay, well as it happens, I already have a spreadsheet of links ready for archiving in a spreadsheet ;) |
00:22
🔗
|
HP_Archiv |
Should I just paste them all here, or what? |
00:23
🔗
|
JAA |
We'd need a text file of URLs. |
00:23
🔗
|
JAA |
I'm not sure how this would work for downloads though. Are the download URLs constant? |
00:24
🔗
|
JAA |
(The link on that "Click to <filename> if it doesn't start automatically" page I mean.) |
00:25
🔗
|
HP_Archiv |
I believe so. I tested a few entries, and each ones prompts with the same thing - Click X File if it doesn't start automatically - and the offers the download as a .zip |
00:26
🔗
|
HP_Archiv |
Correction: Not all are .zip |
00:27
🔗
|
HP_Archiv |
How do I get you the text file? |
00:28
🔗
|
JAA |
https://transfer.notkiska.pw/ |
00:29
🔗
|
HP_Archiv |
Okay, thanks @JAA. Give me a few minutes and I'll send it over. Just want to make sure I'm not missing any links. |
00:34
🔗
|
JAA |
HP_Archiv: Take your time. I'm going to bed now anyway. Will look into it tomorrow or on the weekend. |
00:35
🔗
|
HP_Archiv |
Okay, sure thing. Yeah the only way to get links is to open entries one at a time. And for everything Potter-related on ModDB, there are about 10 pages of results that come up when you search 'Potter'. I have 3/4 of them, but need to make some adjustments to the list I have. Have a nice evening. |
00:37
🔗
|
|
robogoat_ has quit IRC (Ping timeout: 258 seconds) |
00:37
🔗
|
|
robogoat has joined #archiveteam-bs |
01:10
🔗
|
|
mike__ has joined #archiveteam-bs |
01:11
🔗
|
mike__ |
hi, I've got a project to gather data and I need folks help with it. Is there a recommended way to propose things? |
01:11
🔗
|
mike__ |
(Sorry if this is the wrong IRC room.) |
01:11
🔗
|
arkiver |
what project is it |
01:11
🔗
|
mike__ |
Getting data from case.law, a database of scanned legal opinions hosted by Harvard. |
01:12
🔗
|
mike__ |
It's behind a lock and key until 2024, but anybody can get 500 items/day, so we have tools to automate that. |
01:21
🔗
|
markedL |
I know that site. What have you done with it already? |
01:22
🔗
|
mike__ |
We've got a macOS app with some users and a docker image folks can install. Those will check our servers for assignments, then go get the items as requested and send them back to us to push to the Internet Archive. |
01:26
🔗
|
markedL |
is there data already pushed to IA ? |
01:27
🔗
|
mike__ |
Yeah, it's uploading daily. |
01:27
🔗
|
mike__ |
We're keeping a bit of a lid on this though until we're done. |
01:27
🔗
|
markedL |
can you point us to where to find it on IA? |
01:28
🔗
|
|
wm_ has joined #archiveteam-bs |
01:28
🔗
|
mike__ |
No, I'd rather not for the moment. |
01:29
🔗
|
mike__ |
Harvard has a sort of "we don't want to know about it" attitude, so we're only going to broadcast this once it's done. |
01:29
🔗
|
markedL |
What did you want us to help with, we have our own software for server and client side grabs, so it sounds like you already chose your own platform |
01:30
🔗
|
mike__ |
Yeah, I just saw that, shoot, but we need people to install the macOS client or run the docker image until the project is done. |
01:31
🔗
|
markedL |
Harvard is bound what I believe was a contract with their technology provider, so no they don't want to be involved. Their agreement for 2024 was already concession enough. |
01:32
🔗
|
mike__ |
The 500/day is in their contrat too. |
01:32
🔗
|
mike__ |
They made sure of that. |
01:32
🔗
|
arkiver |
hi sorry, what website? |
01:32
🔗
|
markedL |
case.law |
01:32
🔗
|
arkiver |
nice |
01:32
🔗
|
arkiver |
with login only? |
01:33
🔗
|
arkiver |
or actually publicly visitable URLs after some login |
01:33
🔗
|
arkiver |
findable through some login* |
01:33
🔗
|
Somebody2 |
britmob: If *you* want to reach out, feel free! Maybe they'll mail you a hard drive... |
01:33
🔗
|
mike__ |
yeah, you need to pop your API key into the macOS client or the docker image |
01:34
🔗
|
arkiver |
I see they have bulk data links |
01:34
🔗
|
mike__ |
Only for a couple jurisdictions. |
01:34
🔗
|
mike__ |
We gathered that already. |
01:34
🔗
|
markedL |
the bulk data links is by the rights on the jurisidiction |
01:34
🔗
|
markedL |
Harvard is very liberal, but is bound by their upstream process |
01:35
🔗
|
arkiver |
this channel is logged |
01:35
🔗
|
mike__ |
yeah, if the court checks a few boxes, Harvard can give it away, but only a few have (or probably will) |
01:35
🔗
|
arkiver |
make another channel maybe |
01:36
🔗
|
arkiver |
#allthecases |
01:36
🔗
|
arkiver |
mike__: ^ |
01:44
🔗
|
SketchCow |
TODAY I LEARNED: https://en.wikipedia.org/wiki/More_Product,_Less_Process |
01:49
🔗
|
astrid |
it's you |
02:04
🔗
|
|
tech234a has joined #archiveteam-bs |
02:20
🔗
|
|
mike__ has quit IRC (Ping timeout: 260 seconds) |
02:45
🔗
|
|
katocala has quit IRC () |
02:49
🔗
|
|
n00b161 has joined #archiveteam-bs |
02:49
🔗
|
|
n00b161 has quit IRC (Client Quit) |
02:52
🔗
|
|
katocala has joined #archiveteam-bs |
02:53
🔗
|
|
ShellyRol has quit IRC (Read error: Connection reset by peer) |
02:55
🔗
|
|
ShellyRol has joined #archiveteam-bs |
03:14
🔗
|
|
kiskabak has quit IRC (Ping timeout (120 seconds)) |
03:15
🔗
|
|
kiskabak has joined #archiveteam-bs |
03:15
🔗
|
|
Fusl__ sets mode: +o kiskabak |
03:15
🔗
|
|
Fusl sets mode: +o kiskabak |
03:15
🔗
|
|
Fusl_ sets mode: +o kiskabak |
03:39
🔗
|
|
m007a83 has joined #archiveteam-bs |
03:46
🔗
|
|
manjaro-u has quit IRC (Read error: Operation timed out) |
03:53
🔗
|
|
BlueMax has joined #archiveteam-bs |
04:14
🔗
|
|
tech234a has quit IRC (Quit: Connection closed for inactivity) |
04:33
🔗
|
|
odemgi_ has joined #archiveteam-bs |
04:39
🔗
|
|
odemgi has quit IRC (Read error: Operation timed out) |
04:39
🔗
|
|
manjaro-u has joined #archiveteam-bs |
04:40
🔗
|
|
qw3rty has joined #archiveteam-bs |
04:49
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
04:51
🔗
|
|
manjaro-u has quit IRC (Quit: Konversation terminated!) |
05:38
🔗
|
|
Zeryl has joined #archiveteam-bs |
05:38
🔗
|
|
manjaro-u has joined #archiveteam-bs |
05:41
🔗
|
Zeryl |
Hrm, is #urlteam private for a reason? Figured it'd be ok to join since it's a warrior project |
05:49
🔗
|
markedL |
it's not suppose to be private |
05:57
🔗
|
|
manjaro-u has quit IRC (Quit: Konversation terminated!) |
05:58
🔗
|
Zeryl |
I'm a dolt, tried to join on the wrong network >.> |
06:19
🔗
|
|
manjaro-u has joined #archiveteam-bs |
06:49
🔗
|
|
omglolbah has quit IRC (Quit: ZNC - https://znc.in) |
06:50
🔗
|
|
kiska has quit IRC (Remote host closed the connection) |
06:50
🔗
|
|
Flashfire has quit IRC (Remote host closed the connection) |
06:51
🔗
|
|
kiska has joined #archiveteam-bs |
06:51
🔗
|
|
Fusl__ sets mode: +o kiska |
06:51
🔗
|
|
Fusl sets mode: +o kiska |
06:51
🔗
|
|
Fusl_ sets mode: +o kiska |
06:51
🔗
|
|
Flashfire has joined #archiveteam-bs |
06:53
🔗
|
|
omglolbah has joined #archiveteam-bs |
06:55
🔗
|
godane |
SketchCow: so i got a old CNN tape called Best of Play of the Day from 1991 |
06:55
🔗
|
godane |
sponsored by The Athlete's Foot |
09:10
🔗
|
|
Raccoon has quit IRC (Ping timeout: 612 seconds) |
09:25
🔗
|
|
Raccoon has joined #archiveteam-bs |
09:31
🔗
|
odemgi_ |
well this is bullshit: https://twitter.com/textfiles/status/1192518085997137920 I didn't know that you'd been working on gfycat and had just scraped half a mil urls myself, I guess someone has a bigger list than I do? |
09:33
🔗
|
Kaz |
odemgi_: yeah we've got a fair few. pop into #deadcat over on hackint if you're interested |
09:43
🔗
|
|
odemgi_ has quit IRC (Quit: Leaving) |
09:48
🔗
|
|
manjaro-u has quit IRC (Quit: Konversation terminated!) |
10:00
🔗
|
|
manjaro-u has joined #archiveteam-bs |
10:06
🔗
|
|
HP_Archiv has quit IRC (Ping timeout: 263 seconds) |
10:14
🔗
|
|
manjaro-u has quit IRC (Quit: Konversation terminated!) |
10:55
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:14
🔗
|
|
IAmbience has quit IRC (Quit: Connection closed for inactivity) |
11:26
🔗
|
|
Tenebrae has quit IRC (Read error: Operation timed out) |
11:41
🔗
|
|
Tenebrae has joined #archiveteam-bs |
11:49
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
11:50
🔗
|
HP_Archiv |
#archiveteam-bs I have a text file ready of links from ModDB.com that are all entries related to Harry Potter game-related content and development. Uploaded the text file to https://transfer.notkiska.pw/ and the link is: |
11:50
🔗
|
HP_Archiv |
https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt |
11:50
🔗
|
HP_Archiv |
Can someone please ingest all of the URLs into Archivebot for archiving? |
11:56
🔗
|
betamax |
HP_Archiv: I can, but you're welcome to do it yourself (you don't need ops or voice to do a "!ao" command, which is what you'd use for ingesting from a list of URLs) |
11:56
🔗
|
betamax |
the command you'd want to do (in #archivebot) would be "!ao < https://transfer.notkiska.pw/yeOxD/ModDB_Potter_11.2019.txt" |
11:57
🔗
|
HP_Archiv |
Oh cool, okay. I didn't know anyone can self-submit content into archivebot |
11:57
🔗
|
HP_Archiv |
Let me try it out now |
11:58
🔗
|
betamax |
yup, you need voice / ops for recursive archiving ("!a") but for "!ao" you don't |
11:59
🔗
|
HP_Archiv |
I think I did it correctly ^^ Thank you |
11:59
🔗
|
betamax |
looks good to me (you can watch the progress at http://dashboard.at.ninjawedding.org/3 ) |
12:00
🔗
|
HP_Archiv |
Awesome. Thanks again :) |
12:04
🔗
|
HP_Archiv |
Hm, just watching it now @betamax, it appears to have hit an error? |
12:06
🔗
|
betamax |
if you mean it gets stuck at the end, I think that happens with all jobs |
12:06
🔗
|
|
tuluu_ has joined #archiveteam-bs |
12:06
🔗
|
|
tuluu has quit IRC (Read error: Connection reset by peer) |
12:07
🔗
|
betamax |
I can't see the log, unfortunately, since it's finished it doesn't show up on the dashboard page if I load it now |
12:07
🔗
|
betamax |
what was the error? |
12:07
🔗
|
HP_Archiv |
Oh, I think that's what it was because #archivebot is saying job finished |
12:08
🔗
|
HP_Archiv |
How do I view the results? |
12:08
🔗
|
betamax |
I think the download will now be sitting on a staging server, before it's uploaded into archive.org and then added into the wayback machine |
12:08
🔗
|
HP_Archiv |
What I'd like to do is make sure that it actually captured the direct download options on each mod pages/each hosted file |
12:10
🔗
|
betamax |
I'm not 100% sure there is a way to do that (although I'm not involved with the running of archivebot, just a happy user, so perhaps someone will know) |
12:11
🔗
|
HP_Archiv |
I actually just checked. It appears all of the links I gathered were just to the entries, and I think you're right. Archivebot only capture those page URLs, not the direct download URL's on each page, example: https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons/sorcerers-stone-custom-map-levunr |
12:12
🔗
|
HP_Archiv |
I'd have to go through the 100+ links I just submitted and pull the sub-download links from each page... |
12:13
🔗
|
betamax |
yeah, just looked at a URL myself. Since the "Download now" button isn't a direct link itself, but opens a JS popup, the direct links won't have been captured |
12:14
🔗
|
betamax |
however, there are probably easier ways than going through every page by hand |
12:14
🔗
|
HP_Archiv |
Do tell, 'cause that would save me a lot of time ^^ |
12:15
🔗
|
betamax |
gimme a few minutes with one of my scripts :) |
12:15
🔗
|
HP_Archiv |
Sure thing, take your time. Appreciate your help :) |
12:34
🔗
|
JAA |
HP_Archiv: I can assure you it didn't grab the downloads. |
12:35
🔗
|
JAA |
It will only have grabbed the pages in your text file plus images, stylesheets, etc. |
12:35
🔗
|
HP_Archiv |
@JAA, yeah I realized that after the fact. @betamax was kind enough to assist, hopefully I can get the exact URL download paths into #archivebot in an easy fashion |
12:36
🔗
|
JAA |
And yeah, the uploads go to an intermediate server and will show up on the Internet Archive sometime soonish probably. |
12:36
🔗
|
betamax |
huh, I've been distracted by the fact that moddb downloads can be discovered using numerically incrementing IDs... |
12:37
🔗
|
betamax |
so doing a grab of *ALL* content probably would be quite easy :) |
12:37
🔗
|
JAA |
Yes, it would. Just not with AB. |
12:38
🔗
|
betamax |
yeah, would probably be a warrior project. I'll try and make a quite wiki page and note for ModDB later |
12:38
🔗
|
betamax |
in case it's ever needed |
12:38
🔗
|
JAA |
Yeah, sounds good. |
12:38
🔗
|
JAA |
The site seems stable at the moment. |
12:38
🔗
|
HP_Archiv |
@betamax, that would great. Thank you ^^ |
12:42
🔗
|
HP_Archiv |
Has anyone given attention to the site, TCRF.net ? Example: https://tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X) |
12:43
🔗
|
HP_Archiv |
Bad link ^^ , Correct link: tcrf.net/Prerelease:Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X) |
12:43
🔗
|
HP_Archiv |
If not, I'd like to submit the site for archiving |
12:44
🔗
|
JAA |
Yeah, looks like it was archived with ArchiveBot in March. |
12:45
🔗
|
|
IAmbience has joined #archiveteam-bs |
12:46
🔗
|
HP_Archiv |
Okay good. And it capture all elements on a pages with hosted media? example: https://tcrf.net/Harry_Potter_and_the_Sorcerer%27s_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X)#Unused_Sounds |
12:46
🔗
|
HP_Archiv |
I'd check myself but not sure how to do that |
12:48
🔗
|
JAA |
All ArchiveBot crawls end up in the Wayback Machine (eventually). |
12:48
🔗
|
JAA |
This is the AB snapshot of that page: https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X) |
12:50
🔗
|
JAA |
To check for the individual audio files in this case, you need to copy their URL and edit it to e.g. https://web.archive.org/web/*/https://tcrf.net/images/5/5c/HPSSWin-bats_squeaking1.ogg . Then you see that AB did indeed capture that as well on 2019-03-02. |
12:53
🔗
|
|
hata has joined #archiveteam-bs |
12:54
🔗
|
HP_Archiv |
Awesome. Thank you for the explanation @JAA, appreciate it |
12:54
🔗
|
HP_Archiv |
I tested that myself and was able to pull up a different link from another Potter entry. Good to go |
12:56
🔗
|
JAA |
:-) |
12:57
🔗
|
JAA |
If you want a local copy, the data is somewhere in the ArchiveBot collection on the Internet Archive, but be warned that it'll be a pain to find those files since the viewer is broken currently. |
12:58
🔗
|
HP_Archiv |
Not sure I follow - I tried to right click save as on a random audio file hosted from another game entry. Wasn't hard to do find? |
12:58
🔗
|
HP_Archiv |
Hard to find* |
12:59
🔗
|
HP_Archiv |
Or do you mean a local copy of the entire capture? |
13:01
🔗
|
JAA |
Yeah, the entire thing. |
13:01
🔗
|
JAA |
And in the actual archival format (WARC) rather than plain files. |
13:01
🔗
|
betamax |
that list contained 58 URLs to pages that had a "download" button, the rest must be search results, category pages or images |
13:01
🔗
|
betamax |
https://transfer.notkiska.pw/iOZLe/hp.list |
13:03
🔗
|
betamax |
I've saved both the actual download link and the page that opens in the popup, as that *should* mean the popup with download link works in the wayback (but no guarentees as the wayback doesn't always get these things right) |
13:03
🔗
|
betamax |
I'll let you check it over and add it into archivebot |
13:03
🔗
|
HP_Archiv |
Hm, I've actually never tried to download in WARC before. So for example, how would I download this entire page as an archival file? https://web.archive.org/web/20190302045109/https://tcrf.net/Harry_Potter_and_the_Sorcerer's_Stone_(Windows,_Mac_OS_Classic,_Mac_OS_X) |
13:04
🔗
|
HP_Archiv |
Okay @betamax. Thank you very much. I'll have a look now |
13:06
🔗
|
JAA |
HP_Archiv: That's not possible as far as I know. You could save the Wayback Machine page as WARC, but that's not the same as the original data because links get rewritten to the WBM etc. There are partial ways around that, but you can't reproduce the original retrieval from the WBM perfectly. In this case, you'd have to download the WARC files ArchiveBot produced for the entire tcrf.net crawl. (Yes, |
13:06
🔗
|
JAA |
that will be large.) |
13:08
🔗
|
HP_Archiv |
Huh, so I see we still have a ways to go for perfect website preservation (or maybe there's no such thing due to the nature of hyperlinks?) |
13:09
🔗
|
HP_Archiv |
@betamax. Looks good, man. Thank you very much getting all of these links. Saved me a lot of time ^^ |
13:12
🔗
|
betamax |
no problem! it's a very simple python script which I'll add to the wiki when I have time |
13:12
🔗
|
betamax |
since I have no clue, how reliable is ModDB? Is it stable? Do things ever / regularly get deleted? (wondering if this would be a good archival candidate at some point) |
13:13
🔗
|
JAA |
Well, the data in those WARCs is pretty much the best you can get. There's still things that need to be improved, e.g. JavaScript handling (solvable by using automated browsers for crawling, but that's *very* slow in comparison due to all the rendering etc.), DNS preservation, and SSL/TLS certificates, but any individual URL capture is essentially perfect in these WARCs. |
13:15
🔗
|
HP_Archiv |
@Betamax. I'm not a gamer by any means. A quick search brings up the Wikipedia entry for ModDB, https://en.wikipedia.org/wiki/Mod_DB and it seems like it's frequently accessed by a lot of gamers/modders. |
13:16
🔗
|
HP_Archiv |
So perhaps it's stable - for now. Ingesting the Potter-game entries into archivebot is one small aspect of a much larger project I and others are working on, seeing to it that these early 00's Potter PC games are preserved. |
13:17
🔗
|
HP_Archiv |
I might have mentioned this in here yesterday, but I'm working with several former and current Warner Bros executives and one person out of the LoC's video game workflow to track down a prototype/dev source archive for HP 1, the first ever game |
13:19
🔗
|
HP_Archiv |
When I was much younger, I played these games as a kid. And if you go on YT for the gameplay, you'll find a lot of people - not even gamer, per se - are nostalgica for these particular games. So they had a fairly strong hold in the culture (obviously, it's HP) but apparently still do as there is an active HP Modding server on Discord. |
13:21
🔗
|
|
deevious has quit IRC (Read error: Connection reset by peer) |
13:22
🔗
|
HP_Archiv |
The prototype source code is like the holy grail for these games, because Sorcerer's Stone is the oldest, almost 20 years old, and as mentioned the first Potter game released. With that, the game can be rebuilt, ground up. And the person from the LoC I've been in talks with has said, 'they're very much interested in participating in conversations around acquiring digital assets/proto dev archives'. Surprisingly. |
13:22
🔗
|
|
deevious has joined #archiveteam-bs |
13:22
🔗
|
HP_Archiv |
Anyway, hope that answers your question in a round about sort of way Lol |
13:23
🔗
|
HP_Archiv |
@JAA, noted. How do I download a WARC file from WBM? |
13:26
🔗
|
JAA |
HP_Archiv: You don't. You download them from the Internet Archive instead. The WBM is essentially just an index of and interface to all the WARC data residing in IA. The AB data is in https://archive.org/details/archivebot , but as you will quickly realise, all the various ArchiveBot jobs are mixed together, so it's a mess to find the data of a particular job. That's why the AB viewer was written |
13:26
🔗
|
JAA |
many moons ago, to make it easier to find the files, but as mentioned it's broken at the moment. |
13:28
🔗
|
|
deevious has quit IRC (Ping timeout: 252 seconds) |
13:30
🔗
|
HP_Archiv |
@JAA: I'm seeing that, yeah. Huh, why weren't these automatically uploaded to IA with their corresponding website names? |
13:31
🔗
|
Sanqui |
HP_Archiv: Are you also in contact with Griptonite/KnowWonder folks? |
13:36
🔗
|
HP_Archiv |
@Sanqui, it's a mix, and it has not been easy. Former head of licensing in the same WB department, who oversaw this exact game titles, put us in touch with the vp of tech at WBIE, Warner B. Interactive Entertainment. We've been emailing for a few months, the latter pointed us in the direction of several people formerly of Foundation 9, some of who were working at KnowWonder/Amaze during the dev time for these Potter games. |
13:37
🔗
|
Sanqui |
HP_Archiv: I'm more than interested in any leads for the GB games, in particular HP1/2 GB and HP3 GBA. |
13:38
🔗
|
HP_Archiv |
The latest contact, a former dev who worked directly on Sorcerer's Stone, gave me a list of possible lead, people who might've held onto a copy of the proto files. He at one point had the E3 2001 proto - basically a test map - but had these on CD-RWs and were unreadable after a certain point a few years back. |
13:39
🔗
|
Sanqui |
Image them anyway, some data could be recovered |
13:39
🔗
|
HP_Archiv |
He has since destroyed the discs... |
13:39
🔗
|
Sanqui |
good jorb. |
13:40
🔗
|
HP_Archiv |
But the data he has, as far as we can tell, was not the actual HP 1 proto dev archive of files, which would look like this, tcrf.net/Proto:Harry_Potter_and_the_Chamber_of_Secrets_(Windows,_Mac_OS_Classic,_Mac_OS_X) |
13:41
🔗
|
HP_Archiv |
We actually have Chamber of Secrets, HP 2's full prototype. A former developer who luckily held onto the entire directory gave it to the HP modding community a few years ago. |
13:41
🔗
|
HP_Archiv |
Data he had* |
13:42
🔗
|
HP_Archiv |
Also, we're not focused on other platforms. There's been work with the Gamecube versions of each game, but it's difficult to mod non-pc games as other platforms had game released in a more story mode, rather than open world |
13:42
🔗
|
Sanqui |
Cool stuff, cool stuff. |
13:42
🔗
|
Sanqui |
The GBC games were really cool western JRPGs, not platformers |
13:43
🔗
|
HP_Archiv |
Ah okay, I never played the GBC versions |
13:44
🔗
|
Sanqui |
I do recommend giving them a shot some day. But that's OT :P |
13:44
🔗
|
HP_Archiv |
But anyway, yeah. It's a real headache trying to find HP 1 proto. We don't even know if it exists still. Early 2000s was still a time when people used CD-Rs and external drives were not common yet. |
13:45
🔗
|
betamax |
HP_Archiv: just fyi, I didn't put those links into archivebot, as I thought you'd want to check over them first (in case you thought I had while I thought you would... etc) |
13:45
🔗
|
HP_Archiv |
We have confirmed with EA Archives that they do have the final source code for the commercial/retail release of the game. But they declined having any proto/development files. |
13:45
🔗
|
Sanqui |
I'm surprised they're so communicative |
13:46
🔗
|
HP_Archiv |
I've had a small group of people helping me with this - we've been, I should say, unforgiving in our efforts to push forward and press for information :) |
13:46
🔗
|
HP_Archiv |
@betamax. Thanks - I actually already submitted into AB. I believe the job is already done ^^ |
13:50
🔗
|
HP_Archiv |
But yeah, the former head of licensing who oversaw licensing for these games was actually quite interested in what we were trying to do. And both people from WB were surprised to hear that the Library of Congress was even interested in participating in these conversations. But I reached out to this guy, https://blogs.loc.gov/thesignal/2012/09/yes-the-library-of-congress-has-video-games-an-interview-with-david-gibson/, about a year a |
13:50
🔗
|
HP_Archiv |
He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging |
13:52
🔗
|
HP_Archiv |
It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start. Anyway, I've written a novel in here. |
13:52
🔗
|
HP_Archiv |
Thank you all for your help :) |
13:56
🔗
|
Sanqui |
Your IRC client cut off one of your mesages, beginning with "about a year a[...]" |
13:56
🔗
|
Sanqui |
still, cool stuff. lemme know if you hear anything about/from the gameboy team :D |
13:58
🔗
|
HP_Archiv |
'He helped with the acquisition of physical copies of each of the Potter games into their collections and preservation workflow, which I believe includes ISO imaging. It's a small operation, which is located in their motion-picture film division (video games fall under 'moving images') but it's a start.' |
13:58
🔗
|
HP_Archiv |
Heh will do ^^ |
13:58
🔗
|
Sanqui |
Oh, that message came through, just not the "ago" part in "a year ago" I guess XD |
13:59
🔗
|
HP_Archiv |
Odd, well no worries |
14:00
🔗
|
HP_Archiv |
Again thanks everyone for the help/explanations ^^ |
14:00
🔗
|
JAA |
HP_Archiv: Uploading one item per job is actually not possible because items are size-limited. This has in fact caused problems before because some pipelines did (attempt to) upload per-job items. |
14:01
🔗
|
JAA |
And yeah, the web chat thingy sucks. Messages in IRC have a length limit, and that web chat just cuts them off instead of splitting up into multiple messages as any sane client would do. |
14:03
🔗
|
HP_Archiv |
@JAA I think the last ingest of URL's in the text file, 'https://transfer.notkiska.pw/PvcO6/ModDB_Potter_Downloads_URLs_11.2019.txt' was successful though? |
14:05
🔗
|
JAA |
HP_Archiv: Seems like it, yes. I suggest you double-check though once it's in the Wayback Machine that it didn't get any "Download Link Expired" pages or similar. |
14:06
🔗
|
JAA |
Apparently the download URLs are not dependent on the UA or IP, but they do expire periodically. |
14:11
🔗
|
HP_Archiv |
Okay @JAA will d ^^ |
14:11
🔗
|
HP_Archiv |
will do* |
14:12
🔗
|
|
odemgi has joined #archiveteam-bs |
14:31
🔗
|
|
systwi_ is now known as systwi |
14:44
🔗
|
|
deevious has joined #archiveteam-bs |
15:14
🔗
|
|
manjaro-u has joined #archiveteam-bs |
16:25
🔗
|
|
Sokar has quit IRC (Remote host closed the connection) |
16:30
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
16:31
🔗
|
|
[X-Scale] has joined #archiveteam-bs |
16:31
🔗
|
|
[X-Scale] is now known as X-Scale |
16:32
🔗
|
|
Video has quit IRC (Quit: Page closed) |
16:32
🔗
|
|
deevious has quit IRC (Ping timeout: 252 seconds) |
16:33
🔗
|
|
Video has joined #archiveteam-bs |
16:36
🔗
|
|
manjaro-u has quit IRC (Konversation terminated!) |
16:47
🔗
|
|
manjaro-u has joined #archiveteam-bs |
17:10
🔗
|
|
schbirid has joined #archiveteam-bs |
17:15
🔗
|
|
manjaro-u has quit IRC (Konversation terminated!) |
17:17
🔗
|
|
Sokar has joined #archiveteam-bs |
17:37
🔗
|
|
akierig has joined #archiveteam-bs |
17:50
🔗
|
|
mike__ has joined #archiveteam-bs |
17:51
🔗
|
mike__ |
We were chatting here last night (PST) about gathering content from case.law. If anybody is interested in discussing that project, I'm over in #allthecases. |
17:59
🔗
|
|
omglolba- has joined #archiveteam-bs |
18:06
🔗
|
|
omglolbah has quit IRC (Ping timeout: 745 seconds) |
18:11
🔗
|
|
tuluu_ has quit IRC (Read error: Connection refused) |
18:12
🔗
|
|
tuluu has joined #archiveteam-bs |
18:15
🔗
|
|
bluefoo has quit IRC (Ping timeout: 255 seconds) |
18:23
🔗
|
|
Video has quit IRC (Quit: Page closed) |
18:25
🔗
|
|
manjaro-u has joined #archiveteam-bs |
18:39
🔗
|
|
omglolbah has joined #archiveteam-bs |
18:39
🔗
|
|
DogsRNice has joined #archiveteam-bs |
18:40
🔗
|
|
omglolba- has quit IRC (Read error: Operation timed out) |
19:23
🔗
|
|
akierig has quit IRC (Quit: later_gator) |
19:31
🔗
|
|
bluefoo has joined #archiveteam-bs |
19:33
🔗
|
HP_Archiv |
Good morning guys. @JAA, if you're around, how would I go about searching for those ModDB links to see if they're already in WBM? |
19:34
🔗
|
HP_Archiv |
Apologies if you explained this earlier |
19:41
🔗
|
HP_Archiv |
Also, how does AB handle links to files hosted in a public Google Drive? eg: A site hosts a link to a Google Drive folder |
19:45
🔗
|
HP_Archiv |
Or file* |
19:53
🔗
|
betamax |
HP_Archiv: I think it should be as simple as trying to load the URL in the wayback machine |
19:53
🔗
|
betamax |
if the file is in the WBM, then you'll see the file |
19:54
🔗
|
betamax |
otherwise you'll get a message like "this page is available on the web, save it now" |
19:56
🔗
|
HP_Archiv |
Oh okay, then none of the links you helped pull are on WBM yet and probably still queued. |
19:57
🔗
|
HP_Archiv |
For Google Drive files - will AB create a copy, pull down, a copy of a file that's hosted with GDrive or will it only archive the link? |
19:57
🔗
|
HP_Archiv |
For example: https://hp-games.net/343 |
19:57
🔗
|
HP_Archiv |
On this page ^^ Game Mod files are hosted in two locations, one with Yandex, and the other in a Google Drive. |
20:00
🔗
|
HP_Archiv |
And what I'd like to do with HP-Games.net is similar to with ModB - archive entire pages w/elements and also archive mod files that, while not hosted on the site directly, are linked from the site to online storage eg: Google Drive |
20:10
🔗
|
betamax |
AB will probably only archive the link |
20:11
🔗
|
betamax |
I think it archives all outgoing links from the page, but since the actual download link exists two levels deep (hp-games.net > gdrive info page > gdrive download) it won't get captured |
20:11
🔗
|
markedL |
there's an API for wbm membership, if there's a lot to check |
20:30
🔗
|
|
mike__ has quit IRC (Ping timeout: 260 seconds) |
21:14
🔗
|
|
Pixi has quit IRC (Quit: Pixi) |
21:36
🔗
|
|
BlueMax has joined #archiveteam-bs |
22:04
🔗
|
|
Pixi has joined #archiveteam-bs |
22:18
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:42
🔗
|
|
Jon has quit IRC (Quit: ZNC - http://znc.in) |
22:46
🔗
|
|
jmtd has joined #archiveteam-bs |
23:37
🔗
|
|
dd33cc has joined #archiveteam-bs |