[00:01] *** DiscantX has joined #archiveteam
[00:16] *** DoomTay has joined #archiveteam
[00:17] *** namespace has joined #archiveteam
[00:20] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[00:34] *** WinterFox has joined #archiveteam
[00:38] *** rsanek has joined #archiveteam
[00:39] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[00:42] " Google Groups: "Gone within a year" (SketchCow, 2016-06-07). "
[00:42] Couldn't find anything with google.
[00:42] Source?
[00:42] rsanek: What is your quest with the wiki, friend?
[00:43] just wanted to edit a date, though I found the secret in an irc log
[00:43] ah okay :p
[00:44] i guess we're fine as long as spambots don't figure that one out
[00:44] ;p
[00:44] yeah let's hope
[00:44] *** rsanek has quit IRC (Quit: Page closed)
[00:44] yeah, bye
[00:44] *** philpem has quit IRC (Remote host closed the connection)
[00:46] *** Sue_ has quit IRC (Read error: Operation timed out)
[00:48] Do we have any crawlers that can do JS?
[00:48] Google Groups is pure JS slurry, at least to get the machine readable DOM part of it.
[00:49] *** philpem has joined #archiveteam
[00:50] we do, ArchiveBot does PhantomJS but I think putting a general purpose crawler onto something as big as that would be asking for trouble
[00:50] but it is possible, since that's what you're asking
[00:51] Noted.
[00:51] How would you handle a behemoth of that size then?
[00:52] Warrior job
[00:52] (I wanted to do this in high school, but I was technically incapable at the time.)
[00:52] (I can probably actually write up the warrior scripts now.)
[00:58] *** JesseW has joined #archiveteam
[01:01] *** BlueMaxim has joined #archiveteam
[01:01] *** SDr has quit IRC ()
[01:17] *** JesseW has quit IRC (Quit: Leaving.)
[01:17] *** JesseW has joined #archiveteam
[01:36] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[02:13] *** DiscantX has joined #archiveteam
[02:20] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[02:32] *** philpem has quit IRC (Ping timeout: 260 seconds)
[02:55] *** DiscantX has joined #archiveteam
[03:04] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[03:12] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[03:20] *** ravetcofx has joined #archiveteam
[03:22] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[03:30] *** Coderjoe has joined #archiveteam
[04:24] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[04:28] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[04:42] *** ravetcofx has joined #archiveteam
[04:49] *** Kitaru has joined #archiveteam
[04:54] *** ravetcofx has quit IRC (Read error: Operation timed out)
[04:55] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[05:00] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:03] ahahahhahaha
[05:03] It's called someone leaked the info to me
[05:04] *** metalcamp has joined #archiveteam
[05:06] *** ravetcofx has joined #archiveteam
[05:06] *** Sk1d has joined #archiveteam
[05:08] there's still a robots.txt bug preventing google groups from being viewable in wayback https://web.archive.org/web/20110514012530/http://groups.google.com/group/google.public.support.general/msg/d88f36fb3e2c0aac
[05:09] it does seem to be working better now on other sites though
[05:10] foxbox.tv seems to be "working"
[05:10] That is, it's not affected anymore, but it turns out that a good chunk of stuff is gone-gone
[05:19] *** metal_cam has joined #archiveteam
[05:20] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[05:21] *** ndiddy has quit IRC (Quit: Leaving)
[05:44] *** Jeroen52 has quit IRC (Ping timeout: 260 seconds)
[05:48] *** Jeroen52 has joined #archiveteam
[05:52] *** tomwsmf-a has joined #archiveteam
[06:00] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:38] *** DoomTay has quit IRC (Quit: Page closed)
[06:55] SketchCow: K.
[06:56] Also wow I don't know if you tried scouting the directory structure of Groups, but it's really bad. All the top level categories have random numbers (at least insofar as I can tell, they're random). Then each post inside of a group has a unique (random?) ID.
[06:56] Wondering if it's not random and actually just a hex string or something.
[06:57] namespace: We can use the JWT(?) API.
[06:58] I’ve seen scripts on GitHub, but I can’t find them anymore.
[06:59] *GWT
[07:00] *** anjacks0n has joined #archiveteam
[07:30] *** anjacks0n has quit IRC (anjacks0n)
[07:33] *** ravetcofx has quit IRC (Read error: Operation timed out)
[07:43] *** ravetcofx has joined #archiveteam
[07:53] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[07:54] *** anjacks0n has joined #archiveteam
[07:57] *** anjacks0n has quit IRC (anjacks0n)
[08:04] *** ravetcofx has quit IRC (Read error: Operation timed out)
[08:17] *** ravetcofx has joined #archiveteam
[08:36] *** ravetcofx has quit IRC (Remote host closed the connection)
[09:04] *** robink has quit IRC (Ping timeout: 633 seconds)
[09:13] *** robink has joined #archiveteam
[09:32] *** pfallenop has quit IRC (Ping timeout: 244 seconds)
[09:34] *** pfallenop has joined #archiveteam
[09:41] *** Emcy has quit IRC (Read error: Operation timed out)
[09:45] *** Emcy has joined #archiveteam
[10:20] *** Tomcat_ has joined #archiveteam
[10:38] *** Tomcat_ has quit IRC (Ping timeout: 258 seconds)
[10:40] *** philpem has joined #archiveteam
[10:48] *** kristian_ has joined #archiveteam
[11:00] PurpleSym: gggd actually only uses rss for updating existing crawls
[11:00] wrong chat
[11:17] *** Tomcat_ has joined #archiveteam
[11:35] *** Tomcat_ has quit IRC (Remote host closed the connection)
[11:57] *** signius has quit IRC (Ping timeout: 260 seconds)
[12:11] *** signius has joined #archiveteam
[12:56] *** anjacks0n has joined #archiveteam
[13:07] *** anjacks0n has quit IRC (anjacks0n)
[13:14] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[13:15] *** dashcloud has joined #archiveteam
[13:20] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[13:20] *** BartoCH has joined #archiveteam
[13:25] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[13:52] *** anjacks0n has joined #archiveteam
[13:57] *** anjacks0n has quit IRC (anjacks0n)
[14:07] *** VADemon has joined #archiveteam
[14:28] *** ndiddy has joined #archiveteam
[14:45] *** WinterFox has quit IRC (Read error: Operation timed out)
[14:53] *** anjacks0n has joined #archiveteam
[15:12] *** BartoCH has joined #archiveteam
[15:14] *** BlueMaxim has quit IRC (Quit: Leaving)
[15:17] *** JesseW has joined #archiveteam
[15:18] *** RichardG has joined #archiveteam
[15:52] *** ravetcofx has joined #archiveteam
[15:52] *** RichardG has quit IRC (Read error: Operation timed out)
[15:53] *** RichardG has joined #archiveteam
[16:00] *** anjacks0n has quit IRC (anjacks0n)
[16:01] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:04] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[16:22] *** BartoCH has joined #archiveteam
[16:25] *** anjacks0n has joined #archiveteam
[16:36] *** anjacks0n has quit IRC (anjacks0n)
[16:44] *** Kitaru has joined #archiveteam
[16:48] *** DoomTay has joined #archiveteam
[16:50] *** Medowar_ has joined #archiveteam
[16:51] *** Medowar_ has quit IRC (Remote host closed the connection)
[16:52] *** namespace has quit IRC (Read error: Operation timed out)
[17:00] *** banderas6 has joined #archiveteam
[17:04] *** VADemon has quit IRC (Quit: left4dead)
[17:05] *** kristian_ has quit IRC (Leaving)
[17:05] *** banderas6 has quit IRC (Ping timeout: 268 seconds)
[17:29] *** tomwsmf-a has joined #archiveteam
[17:29] *** schbirid has joined #archiveteam
[17:38] *** anjacks0n has joined #archiveteam
[17:52] *** anjacks0n has quit IRC (anjacks0n)
[17:58] *** db48x has quit IRC (Read error: Connection reset by peer)
[17:59] *** anjacks0n has joined #archiveteam
[18:30] *** db48x has joined #archiveteam
[18:36] *** VADemon has joined #archiveteam
[18:53] *** DiscantX has joined #archiveteam
[18:58] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[19:00] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[19:01] *** JesseW has joined #archiveteam
[19:11] *** Kitaru has joined #archiveteam
[19:14] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[19:16] *** DiscantX has joined #archiveteam
[19:28] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:33] *** dashcloud has joined #archiveteam
[19:36] *** REiN^ has quit IRC ()
[19:51] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[19:53] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[20:06] *** metal_cam has quit IRC (Ping timeout: 250 seconds)
[20:07] *** metalcamp has joined #archiveteam
[20:12] *** schbirid has quit IRC (Quit: Leaving)
[20:29] *** DiscantX has quit IRC (Ping timeout: 244 seconds)
[20:31] *** DoomTay has quit IRC (Quit: Page closed)
[20:39] *** xXx_ndidd has joined #archiveteam
[20:40] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[21:01] *** REiN^ has joined #archiveteam
[21:24] *** Kitaru has joined #archiveteam
[21:27] *** JesseW has joined #archiveteam
[21:27] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:35] *** Kitaru has quit IRC (Quit: This computer has gone to sleep)
[21:36] *** VADemon has quit IRC (Quit: left4dead)
[21:38] *** Kitaru has joined #archiveteam
[21:57] *** Start_ has joined #archiveteam
[21:57] *** Start has quit IRC (Read error: Connection reset by peer)
[22:30] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[22:32] *** dashcloud has joined #archiveteam
[23:07] Anyone happen to have a copy of wikipedia-logs-2001-08-17.7z (used to be at http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z six years ago)? IA search doesn't turn up a copy...
[23:15] JesseW: https://web.archive.org/web/20130501000000*/http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
[23:17] strange, when I looked I didn't find that
[23:22] *** DoomTay has joined #archiveteam
[23:27] *** divingk has joined #archiveteam
[23:28] Good god.
[23:28] Digging through a bunch of games and finding their source code.
[23:28] And I've only dug through games on three platforms, tops.
[23:30] divingk: say more?
[23:30] Well...it's interesting to say the least.
[23:30] What I've been doing is using Astrogrep over ROM collections.
[23:31] I knew I would find bits of code, but I wasn't aware of the potential scale behind this.
[23:31] what do you mean "finding their source code" -- where are you finding it? Included in the ROMs, or?
[23:31] Yes, source code accidentally included in ROMs.
[23:31] I can provide a lot of examples of this.
[23:31] neat!
[23:31] https://tcrf.net/Ometron
[23:31] That's very good, no?
[23:31] https://tcrf.net/Invasion_(ZX_Spectrum,_Bulldog_Software)
[23:32] More interesting data to examine and learn from.
[23:32] Good, but I can't help but think there are many games out there with this sort of thing.
[23:32] I haven't looked into the C64 library.
[23:32] No doubt it's one of the most interesting things to find in a game,
[23:32] that is depending on how long said fragments are.
[23:33] For instance, here's one case where most of the code was discovered: https://tcrf.net/Exodus_(ZX_Spectrum,_Firebird_Software)
[23:33] Whereas here, there's only a snippet: https://tcrf.net/Robotron:_2084_(ZX_Spectrum)
[23:34] Most of the ones found so far are on the ZX Spectrum.
[23:34] I've found some on the Amstrad CPC too, plus I wrote up one for the Supervision.
[23:35] https://tcrf.net/Arcade_Flight_Simulator_(ZX_Spectrum)
[23:35] A rare example of a Codemasters game with code sprawling about.
[23:36] Early Ocean games, like Hunchback or Eskimo Eddie, also have bits of code.
[23:36] https://tcrf.net/Hunchback_(ZX_Spectrum)
[23:36] But yeah, curious if anyone here knows about this...
[23:36] (you may want to move this to #archiveteam-bs, as this channel is generally reserved for quick announcements, rather than longer discussions)
[23:36] Oh.
[23:36] Mind if I copy and paste what I said here over to there?
[23:37] Better to just link it from the public log (which I'll do)
[23:38] *** RichardG has quit IRC (Read error: Connection reset by peer)
[23:38] *** RichardG has joined #archiveteam
[23:42] Hm, the wiki doesn't seem to have an entry for "The Cutting Room Floor" (video game history site: https://tcrf.net/ ) yet -- someone should add one.
[23:45] *** WinterFox has joined #archiveteam
[23:47] *** BlueMaxim has joined #archiveteam