Time | Nickname | Message |
00:12 | mgrandi | But yeah @JAA, some of those files that get updated are updated on a regular basis; one example I found via Twitter is the Azure IP ranges: https://www.microsoft.com/en-us/download/details.aspx?id=56519 |
00:12 | mgrandi | I assume the file that is associated with that is updated every so often |
00:13 | JAA | mgrandi: Yep, I found a bunch of files that appear to be updated weekly. |
00:15 | JAA | Sucks that they don't keep the old files around. |
00:38 | | step has quit IRC (Remote host closed the connection) |
01:14 | | BlueMax has joined #archiveteam-bs |
01:33 | | synm0nger has quit IRC (Read error: Connection reset by peer) |
01:33 | | SynMonger has joined #archiveteam-bs |
02:33 | | Clefable has quit IRC (Quit: ZNC: the superior metal to CBLT) |
03:29 | | Wingy has quit IRC (Read error: Operation timed out) |
03:36 | | qw3rty__ has joined #archiveteam-bs |
03:39 | | Wingy has joined #archiveteam-bs |
03:44 | | qw3rty_ has quit IRC (Read error: Operation timed out) |
04:02 | | Wingy has quit IRC (Read error: Operation timed out) |
04:41 | | bsmith093 has quit IRC (Read error: Operation timed out) |
04:44 | | bsmith093 has joined #archiveteam-bs |
04:56 | | bsmith093 has quit IRC (Ping timeout: 745 seconds) |
05:00 | | Wingy has joined #archiveteam-bs |
05:03 | | bsmith093 has joined #archiveteam-bs |
05:51 | | atphoenix has quit IRC (Ping timeout: 265 seconds) |
08:26 | | fuzzy802 has joined #archiveteam-bs |
08:32 | | fuzzy8021 has quit IRC (Read error: Operation timed out) |
08:36 | | atphoenix has joined #archiveteam-bs |
08:36 | | fuzzy802 is now known as fuzzy8021 |
09:01 | | legoktm has quit IRC (Ping timeout: 610 seconds) |
09:04 | | legoktm has joined #archiveteam-bs |
10:37 | | OrIdow6^2 has joined #archiveteam-bs |
10:39 | | OrIdow6 has quit IRC (Ping timeout: 265 seconds) |
11:22 | | HP_Archiv has joined #archiveteam-bs |
11:33 | | OrIdow6^2 is now known as OrIdow6 |
11:56 | | HP_Archiv has quit IRC (Quit: Leaving) |
12:01 | | HP_Archiv has joined #archiveteam-bs |
12:08 | | HP_Archiv has quit IRC (Quit: Leaving) |
12:42 | | coderobe has quit IRC (Quit: Ping timeout (120 seconds)) |
12:42 | | coderobe has joined #archiveteam-bs |
14:30 | | BlueMax has quit IRC (Quit: Leaving) |
14:31 | JAA | Gearogs, Discogs's audio equipment database, is shutting down on 2020-08-31. They will upload a data dump to IA themselves. |
14:31 | JAA | https://support.discogslabs.com/hc/en-us/articles/360011681538-Gearogs-Closing-On-August-31-2020 |
14:34 | JAA | I'll take this as an opportunity to archive all of their dumps. |
14:34 | JAA | They'll apparently only upload the last one anyway. |
14:38 | JAA | All of the Discogs data archives are a bit under 500 GB. All of the other *ogs dumps are just over 2 GB. Cute. |
14:40 | JAA | While I'm at it, I'll also attempt to make https://data.discogs.com/ and http://data.discogslabs.com/ work in the WBM. |
14:42 | arkiver | JAA: yes :D |
14:42 | arkiver | I agree |
14:43 | JAA | :-) |
14:44 | JAA | The Discogs dumps are already on IA in items, but there's virtually nothing in the WBM as far as I can see. |
14:45 | arkiver | yeah |
14:45 | JAA | Couldn't find the other *ogs dumps on IA. |
14:45 | arkiver | and while dumps are better for handling the data |
14:45 | arkiver | wayback machine makes it more easily findable for people without technical skills |
14:45 | JAA | Yep |
14:45 | arkiver | which is like 99.9% |
14:45 | JAA | Well, the stuff I'm grabbing is still the dumps though. |
14:46 | arkiver | those are probably best to get to a safe location first |
14:46 | JAA | But there are links to those two pages I mentioned across the web, so if someone tries to find a file from there, they might only try the WBM, not the IA search (where you can't even search for filenames etc.). |
14:46 | arkiver | are you saving the dumps in wayback machine as well? |
14:47 | JAA | Yeah |
14:47 | arkiver | also yes to what you said |
14:47 | JAA | Basically, the idea is to get https://data.discogs.com/ fully working in the WBM. |
14:48 | arkiver | doesn't seem too extremely big |
14:48 | arkiver | can we put that in AB? |
14:48 | JAA | Yep |
14:48 | JAA | That's my plan. |
14:50 | JAA | Assuming the WBM rewrites URLs that are JS strings like '//example.org', it should all work fine. |
14:51 | JAA | And that does appear to be the case. |
14:51 | JAA | Although the browsing doesn't work because it doesn't serve the S3 XHR data verbatim. |
14:51 | JAA | See e.g. https://web.archive.org/web/20200420234832/https://data.discogs.com/ https://web.archive.org/web/20190418084744/http://discogs-data.s3-us-west-2.amazonaws.com/?delimiter=/&prefix=data/ |
14:52 | JAA | Inserts the WBM scripts etc. into the XML, which breaks parsing. |
15:07 | JAA | Regarding archiving Gearogs the website itself: proper archival with previous revisions etc. requires logging in, as on all Discogs sites. |
15:10 | JAA | The current revisions should probably work fine through AB, so I'll try that. |
15:14 | JAA | arkiver: Could you forward the above to Mark or someone else from the WBM team? The WBM scripts shouldn't be inserted into XML data, only HTML/XHTML. |
15:18 | JAA | Looks like AB doesn't pick up the full-size images. :-/ |
15:21 | JAA | Oof |
15:21 | JAA | They also shut down Comicogs. It's already gone. |
15:21 | JAA | Shut down on 31 July. |
15:22 | JAA | "We will also be closing Gearogs, Filmogs, Bookogs, and Posterogs, but those will be closed about one month later while we make sure we haven't overlooked anything. VinylHub will remain open." |
15:30 | | step has joined #archiveteam-bs |
16:16 | | britmob has quit IRC (Ping timeout: 265 seconds) |
16:26 | | Arcorann has quit IRC (Read error: Connection reset by peer) |
17:05 | | Wingy has quit IRC (The Lounge - https://thelounge.chat) |
17:11 | | Wingy has joined #archiveteam-bs |
17:35 | | asdf0101 has quit IRC (Remote host closed the connection) |
17:45 | | asdf0101 has joined #archiveteam-bs |
18:13 | | britmob has joined #archiveteam-bs |
18:25 | | britmob has quit IRC (Read error: Connection reset by peer) |
18:32 | | britmob has joined #archiveteam-bs |
18:50 | | ave_ has joined #archiveteam-bs |
18:58 | | VerifiedJ has joined #archiveteam-bs |
19:09 | | lunik14 has joined #archiveteam-bs |
19:10 | | lunik1 has quit IRC (Ping timeout: 265 seconds) |
19:10 | | lunik14 is now known as lunik1 |
19:22 | | DLoader_ has joined #archiveteam-bs |
19:31 | | DLoader has quit IRC (Ping timeout: 745 seconds) |
19:32 | | DLoader_ is now known as DLoader |
19:39 | | VerifiedJ has quit IRC (Quit: Leaving) |
19:54 | | Clefable has joined #archiveteam-bs |
22:50 | | Arcorann has joined #archiveteam-bs |
22:51 | | Arcorann has quit IRC (Remote host closed the connection) |
22:52 | | Arcorann has joined #archiveteam-bs |
23:39 | | BlueMax has joined #archiveteam-bs |