[00:12] But yeah @JAA , some of those files that get updated are updated on a regular basis, one example I found via Twitter are the azure IP ranges https://www.microsoft.com/en-us/download/details.aspx?id=56519
[00:12] I assume the file that is associated with that is updated every so often
[00:13] mgrandi: Yep, I found a bunch of files that appear to be updated weekly.
[00:15] Sucks that they don't keep the old files around.
[00:38] *** step has quit IRC (Remote host closed the connection)
[01:14] *** BlueMax has joined #archiveteam-bs
[01:33] *** synm0nger has quit IRC (Read error: Connection reset by peer)
[01:33] *** SynMonger has joined #archiveteam-bs
[02:33] *** Clefable has quit IRC (Quit: ZNC: the superior metal to CBLT)
[03:29] *** Wingy has quit IRC (Read error: Operation timed out)
[03:36] *** qw3rty__ has joined #archiveteam-bs
[03:39] *** Wingy has joined #archiveteam-bs
[03:44] *** qw3rty_ has quit IRC (Read error: Operation timed out)
[04:02] *** Wingy has quit IRC (Read error: Operation timed out)
[04:41] *** bsmith093 has quit IRC (Read error: Operation timed out)
[04:44] *** bsmith093 has joined #archiveteam-bs
[04:56] *** bsmith093 has quit IRC (Ping timeout: 745 seconds)
[05:00] *** Wingy has joined #archiveteam-bs
[05:03] *** bsmith093 has joined #archiveteam-bs
[05:51] *** atphoenix has quit IRC (Ping timeout: 265 seconds)
[08:26] *** fuzzy802 has joined #archiveteam-bs
[08:32] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[08:36] *** atphoenix has joined #archiveteam-bs
[08:36] *** fuzzy802 is now known as fuzzy8021
[09:01] *** legoktm has quit IRC (Ping timeout: 610 seconds)
[09:04] *** legoktm has joined #archiveteam-bs
[10:37] *** OrIdow6^2 has joined #archiveteam-bs
[10:39] *** OrIdow6 has quit IRC (Ping timeout: 265 seconds)
[11:22] *** HP_Archiv has joined #archiveteam-bs
[11:33] *** OrIdow6^2 is now known as OrIdow6
[11:56] *** HP_Archiv has quit IRC (Quit: Leaving)
[12:01] *** HP_Archiv has joined #archiveteam-bs
[12:08] *** HP_Archiv has quit IRC (Quit: Leaving)
[12:42] *** coderobe has quit IRC (Quit: Ping timeout (120 seconds))
[12:42] *** coderobe has joined #archiveteam-bs
[14:30] *** BlueMax has quit IRC (Quit: Leaving)
[14:31] Gearogs, Discogs's audio equipment database, is shutting down on 2020-08-31. They will upload a data dump to IA themselves.
[14:31] https://support.discogslabs.com/hc/en-us/articles/360011681538-Gearogs-Closing-On-August-31-2020
[14:34] I'll take this as an opportunity to archive all of their dumps.
[14:34] They'll apparently only upload the last one anyway.
[14:38] All of the Discogs data archives are a bit under 500 GB. All of the other *ogs dumps are just over 2 GB. Cute.
[14:40] While I'm at it, I'll also attempt to make https://data.discogs.com/ and http://data.discogslabs.com/ work in the WBM.
[14:42] JAA: yes :D
[14:42] I agree
[14:43] :-)
[14:44] The Discogs dumps are already on IA in items, but there's virtually nothing in the WBM as far as I can see.
[14:45] yeah
[14:45] Couldn't find the other *ogs dumps on IA.
[14:45] and while dumps are better for handling the data
[14:45] wayback machine makes it easier findable for people without technical skills
[14:45] Yep
[14:45] which is like 99.9%
[14:45] Well, the stuff I'm grabbing is still the dumps though.
[14:46] those are probably best to get to a safe location first
[14:46] But there are links to those two pages I mentioned across the web, so if someone tries to find a file from there, they might only try the WBM, not the IA search (where you can't even search for filenames etc.).
[14:46] are you saving the dumps in wayback machine as well?
[14:47] Yeah
[14:47] also yes to what you said
[14:47] Basically, the idea is to get https://data.discogs.com/ fully working in the WBM.
[14:48] doesn't seem too extremely big
[14:48] can we put that in AB?
[14:48] Yep
[14:48] That's my plan.
[14:50] Assuming the WBM rewrites URLs that are JS strings like '//example.org', it should all work fine.
[14:51] And that does appear to be the case.
[14:51] Although the browsing doesn't work because it doesn't serve the S3 XHR data verbatim.
[14:51] See e.g. https://web.archive.org/web/20200420234832/https://data.discogs.com/ https://web.archive.org/web/20190418084744/http://discogs-data.s3-us-west-2.amazonaws.com/?delimiter=/&prefix=data/
[14:52] Inserts the WBM scripts etc. into the XML, which breaks parsing.
[15:07] Regarding archiving Gearogs the website itself: proper archival with previous revisions etc. requires logging in, as on all Discogs sites.
[15:10] The current revisions should probably work fine through AB, so I'll try that.
[15:14] arkiver: Could you forward the above to Mark or someone else from the WBM team? The WBM scripts shouldn't be inserted into XML data, only HTML/XHTML.
[15:18] Looks like AB doesn't pick up the full-size images. :-/
[15:21] Oof
[15:21] They also shut down Comicogs. It's already gone.
[15:21] Shut down on 31 July.
[15:22] "We will also be closing Gearogs, Filmogs, Bookogs, and Posterogs, but those will be closed about one month later while we make sure we haven’t overlooked anything. VinylHub will remain open."
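Editor's note on the XML parsing issue reported at [14:52]: a minimal Python sketch of why extra markup in an S3-style bucket listing stops a strict XML parser. The sample listing, the bucket name, the helper names `parse_listing` and `is_well_formed`, and the injected `<script>` line are all made up for illustration; this is not the actual Discogs listing or the markup the WBM really inserts, and the real page does its parsing in the browser rather than in Python.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Made-up listing in the shape of an S3 response for ?delimiter=/&prefix=data/
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Prefix>data/</Prefix>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <CommonPrefixes><Prefix>data/2019/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>data/2020/</Prefix></CommonPrefixes>
  <Contents><Key>data/README.txt</Key><Size>123</Size></Contents>
</ListBucketResult>"""

# Hypothetical injected markup placed in front of the listing (assumption:
# the real injected content differs; any second top-level element has the
# same effect on a strict parser).
BROKEN_LISTING = (
    '<script src="//example.org/hypothetical.js"></script>\n' + SAMPLE_LISTING
)


def parse_listing(xml_text):
    """Return (sub-prefixes, file keys) from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    prefixes = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [k.text for k in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def is_well_formed(xml_text):
    """True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parse_listing(SAMPLE_LISTING))
    print(is_well_formed(BROKEN_LISTING))
```

The verbatim listing parses into "directories" and files, while the modified copy is rejected outright because an XML document may only have one root element. That matches the point made in the log: a listing has to be served unmodified for a directory browser built on it to keep working.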
[15:30] *** step has joined #archiveteam-bs
[16:16] *** britmob has quit IRC (Ping timeout: 265 seconds)
[16:26] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[17:05] *** Wingy has quit IRC (The Lounge - https://thelounge.chat)
[17:11] *** Wingy has joined #archiveteam-bs
[17:35] *** asdf0101 has quit IRC (Remote host closed the connection)
[17:45] *** asdf0101 has joined #archiveteam-bs
[18:13] *** britmob has joined #archiveteam-bs
[18:25] *** britmob has quit IRC (Read error: Connection reset by peer)
[18:32] *** britmob has joined #archiveteam-bs
[18:50] *** ave_ has joined #archiveteam-bs
[18:58] *** VerifiedJ has joined #archiveteam-bs
[19:09] *** lunik14 has joined #archiveteam-bs
[19:10] *** lunik1 has quit IRC (Ping timeout: 265 seconds)
[19:10] *** lunik14 is now known as lunik1
[19:22] *** DLoader_ has joined #archiveteam-bs
[19:31] *** DLoader has quit IRC (Ping timeout: 745 seconds)
[19:32] *** DLoader_ is now known as DLoader
[19:39] *** VerifiedJ has quit IRC (Quit: Leaving)
[19:54] *** Clefable has joined #archiveteam-bs
[22:50] *** Arcorann has joined #archiveteam-bs
[22:51] *** Arcorann has quit IRC (Remote host closed the connection)
[22:52] *** Arcorann has joined #archiveteam-bs
[23:39] *** BlueMax has joined #archiveteam-bs