[00:07] *** fie has joined #archiveteam-bs
[00:41] *** fie has quit IRC (Ping timeout: 245 seconds)
[00:42] *** Ravenloft has joined #archiveteam-bs
[00:58] *** fie has joined #archiveteam-bs
[01:47] *** tklk has joined #archiveteam-bs
[01:51] *** icedice has quit IRC (Quit: Leaving)
[02:15] mls: turns out it wasn't even dcl, it just had the same header
[02:29] *** REiN^ has quit IRC (Max SendQ exceeded)
[02:29] *** REiN^ has joined #archiveteam-bs
[02:32] *** REiN^ has quit IRC (Max SendQ exceeded)
[02:38] *** REiN^ has joined #archiveteam-bs
[02:52] ndiddy: what was it then? did you find out?
[02:54] it actually got pretty interesting
[02:54] by the way is there anything to archive on an xmpp server? dukgo's xmpp server is shutting down on May 18th
[02:55] oooh tell more
[02:55] so, the backstory is i wanted to extract the script for this dos game "princess maker 2" because it's far better than the new translation that's on steam
[02:56] yes
[02:56] my first hurdle was that the game uses proprietary ".lbx" files for containing data, but there's enough documentation on this old forum page to read them http://nurikosprincessmakernexusboard.yuku.com/reply/2166/File-Crack-for-PM2
[02:56] I was here that day
[02:57] oh
[02:57] however, i eventually resorted to memory inspection using dosbox's debugger to try and figure out what was in the files if i couldn't extract them
[02:58] ah
[02:58] turns out the game executable is a proprietary custom BASIC interpreter and the files are all written in a BASIC derivative
[02:58] uh... what
[02:58] why
[03:00] idk man, but it was a dos game programmed by japanese people so maybe that's why
[03:00] here's an example if you're curious https://pastebin.com/raw/BnWCY2tX
[03:01] oh yay, "source code piracy"
[03:01] hehe
[03:02] better hope I don't get a DMCA warning for downloading that
[03:02] gainax is gonna come after you bro
[03:03] interesting note, according to https://archive.org/about/dmca.php, they only have an exception until 2006
[03:05] oh hey, apparently it was called "pm-basic"
[03:05] TT: The whole thing is written in a custom “BASIC-like” language called “PM-BASIC”, which is implemented in 8086 assembly language. The text was interspersed with the game logic in that. That is generally a no-no to have text and code together, because it means each language has to have different versions of the entire code. But in this case, since we were just doing English, it was fine. Crawling around in assembly commented i
[03:07] so like what I'd do when rushing to finish something
[03:08] yeah, my guess is that it's sort of like today with java
[03:08] where they'd have one guy doing the "virtual machine" work and then all the goons writing the actual game
[03:09] oh
[03:10] does anyone know if an archive of .ipa (iOS apps) files from the iOS 3.0 days is available?
[03:11] there's a "custom ios for old devices" thing that has something like that
[03:11] does it include apps that were common at that time?
[03:11] http://www.whited00r.com/forum/index?topic=7695.0
[03:13] it says it's exclusive to their ios fork but if you install it on a device from that period it looks like it's just a web view
[03:13] well I have 2 devices from that period
[03:13] so if you set your user agent to ios 3 it might let you download on your desktop
[03:13] nah, I'd be fine installing that
[03:14] I can't believe it
[03:14] the website is still up
[03:14] http://apptimemachine.com/
[03:17] aand it's sort of broken
[03:21] *** pizzaiolo has quit IRC (Ping timeout: 506 seconds)
[03:22] no, simple user agent changing didn't work
[03:48] *** Marcelo has joined #archiveteam-bs
[04:04] *** Marcelo has left
[04:22] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:41] *** _desu___ has joined #archiveteam-bs
[04:45] We need to start archiving yik yak posts they announced that they're shutting down
[05:02] *** ndiddy has quit IRC ()
[05:55] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[06:06] *** BartoCH has joined #archiveteam-bs
[06:44] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[07:54] *** Honno has joined #archiveteam-bs
[08:05] *** schbirid has joined #archiveteam-bs
[08:11] *** fie has quit IRC (Read error: Connection reset by peer)
[08:27] *** fie has joined #archiveteam-bs
[08:35] *** GE has joined #archiveteam-bs
[08:37] *** JAA has joined #archiveteam-bs
[08:41] Odd0002: While the images don't display anymore normally (e.g. when you access the website through the browser), there is a workaround. As far as I can tell, all images are still there. Details in #totheyard .
[08:42] But yes, we definitely need to do something about the ~2k stuck items.
[08:48] tammy_: Looks like an incorrectly formatted link; I had a few of those during my scrape of the Weather Underground user blogs (Wunderblogs). I worked around it by manually editing the database, but that's definitely not the most elegant solution. Stopping wpull (^C), adding an option --reject-regex 'www\.eso\.org/www\.eso\.org' to the wpull command, and restarting should take care of it.
[08:50] EpicSki phase 1 is at 1.05M of 2.08M after about 20 hours. Should be done by tomorrow morning.
[09:03] JAA: sweet. the archivebot job for the site is still ongoing, the one for the forum has finished, I wonder if it got everything though
[09:05] Actually, that number wasn't quite correct. 1.05M is the number of posts that were found; some additional 117k post IDs resulted in a 404 (probably deleted posts and member/staff-only forums).
[09:06] Sanqui: Already? That seems way too quick considering all the different views of each thread, post, etc.
[09:06] Did it only grab /forum perhaps?
[09:06] I did a recursive grab for http://www.epicski.com/f/, guess that didn't work out
[09:06] With --no-parent?
[09:06] I didn't do that
[09:06] Hm
[09:07] well, my method is testing random obscure sites on iA
[09:07] s/sites/pages/
[09:07] this thread was grabbed https://web-beta.archive.org/web/20170427231315/http://www.epicski.com/t/107630/can-we-go-a-little-easier-on-each-other
[09:08] so was page 3 of this thread, for the first time on apr 27 https://web-beta.archive.org/web/20170427232000/http://www.epicski.com/t/77938/so-long-farewell/60
[09:08] looks good to me
[09:10] Yeah
[09:10] ok, found a thread that didn't make it https://web-beta.archive.org/web/20170427142527/http://www.epicski.com/t/14267/3-kick-rule
[09:11] Here's another one: https://web-beta.archive.org/web/20170427141511/http://www.epicski.com/t/12746/posting-pictures
[09:12] But https://web-beta.archive.org/web/20170427141511/http://www.epicski.com/f/15/community-discussions-forum-news/1950 , where that thread is linked, was grabbed. Strange...
[09:14] Maybe IA is still indexing the WARCs, or their index is broken again. I had tons of issues a few months ago where it would randomly forget about pages I had archived manually using /save.
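A minimal Python sketch of how a doubled-host loop like the one at [08:48] could arise, and why the suggested --reject-regex stops it. It assumes, which the log does not confirm, that the malformed link was a scheme-less href such as `www.eso.org/` that a crawler resolves as a relative path. `BAD_HREF` and `crawl_chain` are hypothetical, and `re.search` only stands in for however wpull actually applies the pattern.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the log. Applying it with re.search is an assumption,
# not a statement about wpull's internals.
REJECT = re.compile(r"www\.eso\.org/www\.eso\.org")

# Hypothetical malformed href: no "http://", so it is treated as a relative
# path and appended to the current page's directory.
BAD_HREF = "www.eso.org/"

def crawl_chain(start, steps):
    """Follow BAD_HREF repeatedly and return the URLs a naive crawler would queue."""
    urls = []
    url = start
    for _ in range(steps):
        url = urljoin(url, BAD_HREF)  # each hop nests one more "www.eso.org/"
        urls.append(url)
    return urls

chain = crawl_chain("http://www.eso.org/", 3)
for u in chain:
    print("REJECT" if REJECT.search(u) else "fetch", u)
```

Each followed link nests one more `www.eso.org/` segment, so without a filter the chain never ends. Every URL in it matches the pattern, while an ordinary page such as `http://www.eso.org/public/` does not.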
[09:15] either way I changed the job that's left not to ignore the forum
[09:15] to increase coverage
[09:17] Mark from IA told me in early February that they had issues with their index which would be fixed within days, but some of the example pages I sent him still don't work. I wrote to him again a month ago but didn't get a reply.
[09:17] Yeah, I'll do my thing as well.
[09:27] *** odemg has quit IRC (Remote host closed the connection)
[09:29] MFW I'm on a train running at almost 200 km/h in the middle of nowhere, and the internet connection is better than it was at the hotel.
[09:30] A hotel in the middle of a huge city, I should add.
[09:36] I'll be on a train shortly too, but I already know the connection will be spotty.
[09:54] *** odemg has joined #archiveteam-bs
[10:01] Sanqui: I just thought a bit more about what you wrote above. If you didn't use --no-parent, why did the ArchiveBot job for the forum finish quicker than the one for the entire site? Every forum page includes a link back to "Home" near the top, so it should discover exactly the same pages as the full-page grab (just in a different order).
[10:02] It will discover the home page, but it won't follow any links from it that aren't in the /f/ directory. But the job probably didn't work properly because threads are /t/ anyway.
[10:03] Ah, so it basically treats everything outside the directory of the original link like an external link?
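A rough Python sketch of the directory-based scoping described at [10:02] and [10:03], similar in spirit to wget's --no-parent: a URL counts as in scope only if it is on the seed's host and its path starts with the seed's directory. Whether ArchiveBot applies exactly this rule is not confirmed in the log, and `seed_directory` and `in_scope` are hypothetical helpers. The URLs are the EpicSki ones mentioned above.

```python
from urllib.parse import urlsplit

def seed_directory(seed):
    """Directory prefix of the seed URL, e.g. '/f/' for http://www.epicski.com/f/."""
    path = urlsplit(seed).path or "/"
    # Keep everything up to and including the last slash.
    return path[: path.rfind("/") + 1]

def in_scope(url, seed):
    """Hypothetical scope rule: same host, and the path lies under the seed's directory."""
    u, s = urlsplit(url), urlsplit(seed)
    return u.netloc == s.netloc and (u.path or "/").startswith(seed_directory(seed))

SEED = "http://www.epicski.com/f/"
for url in [
    "http://www.epicski.com/f/15/community-discussions-forum-news/1950",
    "http://www.epicski.com/t/12746/posting-pictures",
    "http://www.epicski.com/",
]:
    print(in_scope(url, SEED), url)
```

Under that rule a job seeded at /f/ can still fetch the forum index pages, but it treats the home page and every /t/ thread page as out of scope, which fits the observation that threads live under /t/.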
[10:04] I believe so
[10:04] I see
[10:04] Don't quote me though
[10:04] Still doesn't explain why it didn't find those threads, of course
[10:06] *** GE has quit IRC (Remote host closed the connection)
[10:19] *** odemg has quit IRC (Remote host closed the connection)
[10:42] *** C4K3 has quit IRC (Remote host closed the connection)
[10:58] *** JAA has quit IRC (Quit: Page closed)
[11:00] *** C4K3 has joined #archiveteam-bs
[11:15] *** BartoCH has joined #archiveteam-bs
[11:40] *** fie has quit IRC (Ping timeout: 246 seconds)
[11:40] *** GE has joined #archiveteam-bs
[11:56] *** fie has joined #archiveteam-bs
[12:37] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:40] *** odemg has joined #archiveteam-bs
[12:52] *** Honno has quit IRC (Ping timeout: 370 seconds)
[13:03] *** odemg has quit IRC (Remote host closed the connection)
[13:16] *** odemg has joined #archiveteam-bs
[13:38] ndiddy: Oh right, hope you got it sorted though
[14:11] *** odemg has quit IRC (Remote host closed the connection)
[15:03] *** pizzaiolo has joined #archiveteam-bs
[15:04] *** pizzaiolo has quit IRC (Remote host closed the connection)
[15:33] *** pizzaiolo has joined #archiveteam-bs
[15:40] *** ndiddy has joined #archiveteam-bs
[15:53] *** pizzaiolo has quit IRC (Remote host closed the connection)
[15:59] *** ZexaronS has joined #archiveteam-bs
[16:17] *** odemg has joined #archiveteam-bs
[17:43] *** odemg has quit IRC (Remote host closed the connection)
[18:08] *** pizzaiolo has joined #archiveteam-bs
[18:21] *** fie has quit IRC (Ping timeout: 370 seconds)
[18:36] *** fie has joined #archiveteam-bs
[18:45] *** odemg has joined #archiveteam-bs
[18:50] *** GE_ has joined #archiveteam-bs
[18:51] *** GE has quit IRC (Ping timeout: 255 seconds)
[18:51] *** GE_ is now known as GE
[19:08] *** GE has quit IRC (Remote host closed the connection)
[19:15] *** icedice has joined #archiveteam-bs
[19:27] *** odemg has quit IRC (Remote host closed the connection)
[20:05] Over 1TB of catalog.data.gov will begin the slow march to the IA soon :)
[20:14] *** mhazinsk has quit IRC (Remote host closed the connection)
[20:14] *** mhazinsk has joined #archiveteam-bs
[20:26] *** odemg has joined #archiveteam-bs
[20:26] *** odemg has quit IRC (Connection closed)
[20:29] *** GE has joined #archiveteam-bs
[20:34] *** odemg has joined #archiveteam-bs
[20:34] *** odemg has quit IRC (Connection closed)
[20:43] *** JAA has joined #archiveteam-bs
[20:49] *** pizzaiolo has quit IRC (Remote host closed the connection)
[20:51] *** pizzaiolo has joined #archiveteam-bs
[20:54] *** midas3 has quit IRC (Remote host closed the connection)
[20:54] *** midas has quit IRC (Read error: Connection reset by peer)
[21:05] *** odemg has joined #archiveteam-bs
[21:27] *** schbirid has quit IRC (Quit: Leaving)
[21:36] JAA: it seems to have worked past that loop on it's own already
[21:36] but I'll add the regex switch if it does it again for the same spot.
[21:43] tammy_: Yeah, it might reappear at a later time. At least that was the case when I saw it with WunderBlogs. It's related to wpull's depth-first recursion.
[21:43] You'll likely see similar links being retrieved even after adding the switch as well, but the links on those pages will then be ignored, stopping the loop.
[21:45] cool cool. I'll keep this running in one of my monitor screens and keep my eye open for shenanigans
[22:00] *** pizzaiolo has quit IRC (Ping timeout: 260 seconds)
[22:02] *** pizzaiolo has joined #archiveteam-bs
[22:04] *** JAA has quit IRC (Quit: Page closed)
[22:15] *** Mayonaise has quit IRC (Remote host closed the connection)
[22:22] *** Mayonaise has joined #archiveteam-bs
[22:22] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[22:37] i pissed someone off by mentioning archiveteam :(
[22:37] *** _desu___ has quit IRC ()
[22:38] hook54321: how?
[22:39] it's the person that joined #archiveteam and then left.
[22:40] They run this site: http://volatile.ch/
[22:40] huh
[22:40] Don't join their irc with any important ip address
[22:40] or network
[22:41] I mentioned that the wayback machine was going to start ignoring robots.txt
[22:45] that guy has issues
[22:46] he's a troll and a spammer
[22:46] I wouldn't worry about it
[22:49] how is he a troll and a spammer?
[22:49] because I know who he is
[22:49] and that is what he does
[23:05] *** GE has quit IRC (Remote host closed the connection)
[23:19] *** hook54321 has quit IRC (Connection closed)
[23:21] *** hook54321 has joined #archiveteam-bs
[23:22] *** wp494_ has joined #archiveteam-bs
[23:23] *** hook54321 has quit IRC (Connection closed)
[23:23] *** wp494 has quit IRC (Ping timeout: 244 seconds)
[23:28] *** wp494_ is now known as wp494
[23:33] *** divingk has joined #archiveteam-bs
[23:33] https://tcrf.net/Miner_2049er_(Apple_II)
[23:36] Source code.
[23:39] Or rather...
[23:39] Source code fragments hidden on a disk.
[23:53] *** wp494 has quit IRC (Ping timeout: 244 seconds)
[23:55] *** hook54321 has joined #archiveteam-bs
[23:56] *** BlueMaxim has joined #archiveteam-bs
[23:58] *** wp494 has joined #archiveteam-bs