[00:10] Haven't seen that anywhere no [00:21] looks like it used to be possible to query the /download2/ URL to get the /file2/ URL [00:21] probably not possible anymore, getting 500 with the AppWorld/4.3 useragent [00:30] Hmm, watching wireshark while installing one app, there's about sixty GETs for /file2/{id} with the id incrementing each time, different md5s but the 10digits stay the same [00:30] can you post the URLs? [00:30] or at least 2 [00:31] I can send you the pcap if you want [00:32] but I'll copy out a few urls in any case [00:40] https://0bin.net/paste/JYra5a4R7RqTi18d#LphoM-7juSPysy9te6jPpnje4eN+G2s6xf/qKks6/bW [00:42] *** BlueMax has quit IRC (Read error: Connection reset by peer) [00:45] So I think it's downloading all the files that make up the java app separately? Those links don't check useragent, btw. [01:06] Heh, first thing in their sitemap is a dead link to something from Alicia Keys https://appworld.blackberry.com/webstore/sitemap-manual-index.xml [01:06] *** josey has joined #archiveteam-bs [01:09] *** Wingy has quit IRC (The Lounge - https://thelounge.chat) [01:10] *** Wingy has joined #archiveteam-bs [01:17] The 10 digits are a Unix timestamp a few hours in the future - perhaps the md5 is of that date, the file ID, and a secret [01:23] It will still accept it if you prefix the timestamp with 0s, but not the file ID - seems to suggest that the former, but not the latter, is converted to an int first [01:25] arkiver: how big do you expect it to be? Rsync target will come up when I get back home in 10hrs.... [01:29] Ahh, timestamp! I found what appears to be json for the entire appstore listings, 75751 items sound right? 
https://appworld.blackberry.com/cas/producttype/all?page=1&pagesize=100 [01:40] I'll grab all of those and make a list of ids [01:51] *** BlueMax has joined #archiveteam-bs [01:52] There https://paste.ubuntu.com/p/tbSDRBGGTp/ [02:03] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat) [02:03] *** markedL has quit IRC (Quit: The Lounge - https://thelounge.chat) [02:15] *** Wingy has quit IRC (The Lounge - https://thelounge.chat) [02:15] *** Wingy has joined #archiveteam-bs [02:22] IA s3 seems to be dead [02:23] and as I say that.. it comes back [02:35] *** X-Scale has quit IRC (Ping timeout: 745 seconds) [02:45] *** wyatt8740 has joined #archiveteam-bs [02:50] *** LowLevelM has quit IRC (Remote host closed the connection) [02:52] *** X-Scale has joined #archiveteam-bs [02:57] *** LowLevelM has joined #archiveteam-bs [03:06] *** katocala has joined #archiveteam-bs [03:06] *** katocala has left [03:14] I have a general bone to pick with the WBM, so if anyone wants to chime in, feel free. Why is it that when I manually submit a link into the WBM, that after completing, going to view the capture in the viewer never works right? [03:15] Example: I submitted this story of astrophysicist, Ron Mallett, that was done by CNN. And for whatever reason, it took multiple attempt at submitting the links into WBM. And then once it finally did capture it, I still can't get the WBM viewer to display the contents correctly [03:15] This article, https://www.cnn.com/travel/article/time-travel-ron-mallett-scn [03:19] are you using any extensions / browser settings that could be interfering [03:19] does archiv.is cap it fine? 
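A minimal sketch of the AppWorld listing crawl discussed at [01:29] to [01:40]: with 75751 items and pagesize=100, the listing should span 758 pages. The URL pattern is copied from the log. That `page` is numbered from 1 and that the last page is simply partial are assumptions, and the function names are hypothetical. Nothing here touches the network; it only builds the list of URLs to fetch.

```python
import math

# URL pattern as pasted in the channel; the query parameter behaviour is assumed.
BASE = "https://appworld.blackberry.com/cas/producttype/all"


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to cover total_items (last page may be partial)."""
    return math.ceil(total_items / page_size)


def listing_urls(total_items: int, page_size: int = 100):
    """Yield one listing URL per page, assuming pages are numbered from 1."""
    for page in range(1, page_count(total_items, page_size) + 1):
        yield f"{BASE}?page={page}&pagesize={page_size}"
```

Feeding the resulting URLs to a downloader and extracting the ids from each JSON response would give the id list mentioned at [01:40].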
[03:24] *** atphoenix has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** benjins has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** Nick-PC has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** SilSte has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** britmob has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** kiska3 has quit IRC (irc.efnet.nl efnet.deic.eu) [03:24] *** ctrl_ has quit IRC (irc.efnet.nl efnet.deic.eu) [03:25] *** britmob_ has joined #archiveteam-bs [03:25] *** HP_Archiv has joined #archiveteam-bs [03:26] *** MrRadar2 has quit IRC (Read error: Operation timed out) [03:27] I don't know about your capture problems, but turning off Javascript on the viewed capture works for me (it's one of those sites, evidently) [03:27] *** HP_Archiv has quit IRC (Read error: Connection reset by peer) [03:30] *** tsr has quit IRC (Read error: Operation timed out) [03:30] *** tsr has joined #archiveteam-bs [03:30] *** klg has quit IRC (Read error: Connection reset by peer) [03:30] *** klg has joined #archiveteam-bs [03:30] *** PurpleSym has quit IRC (Read error: Connection reset by peer) [03:30] *** PurpleSym has joined #archiveteam-bs [03:31] *** svchfoo1 sets mode: +o PurpleSym [03:31] *** svchfoo3 sets mode: +o PurpleSym [03:31] *** HP_Archiv has joined #archiveteam-bs [03:31] *** benjinsmi has joined #archiveteam-bs [03:31] *** MrRadar2 has joined #archiveteam-bs [03:32] Kyoon, no, I even tried submitting manually into the WBM's capture link field in a different browser without any extensions [03:32] Same results [03:32] *** Silvan has joined #archiveteam-bs [03:33] Trying to view the capture just now, I can see the article for a brief moment and after that this page displays, almost like an overlay or something [03:33] https://web.archive.org/web/20200103030641/https://www.cnn.com/travel/article/time-travel-ron-mallett-scn [03:33] *** benjinss has joined #archiveteam-bs [03:39] *** britmob_ has quit IRC (Remote host closed the connection) [03:40] 
Hp_Archiv / Nick-PC: the playback problem is just something that happens on some sites when you browse them in the WBM - it usually, as in this case, works just to temporarily turn off Javascript [03:40] *** atphoenix has joined #archiveteam-bs [03:41] Happens (or at least happened to me once) on captures of the new Reddit, too [03:41] *** krellstee has quit IRC (Read error: Connection reset by peer) [03:41] *** SoraUta has quit IRC (Read error: Connection reset by peer) [03:41] *** SoraUta has joined #archiveteam-bs [03:41] (Though that's just my experience) [03:42] *** benjinsmi has quit IRC (Read error: Operation timed out) [03:42] *** britmob has joined #archiveteam-bs [03:43] That's not exactly ideal though. At risk of overgeneralizing, what good is an archive if you have to fiddle with the settings to parse the information? [03:44] I recall someone in here saying that the viewer was broken or something. Not sure if that's the case (likely is) but if so, that's where the donated funds that IA receives should be going towards. You can archive everything for the hell of it. But if you can't read or access the archived information in an easy fashion, it's useless/impractical [03:44] blame cnn ;.> [03:44] *** krellstee has joined #archiveteam-bs [03:49] I understand that archiving websites is very site-specific. But there should be an easy way to view/parse what has been archived. Otherwise, what are we doing? eg: an awful lot of time is spent trying to capture as much as possible. And I know nothing will be perfect and seamless. But I've often wondered at what point will major companies and corporations start to realize that it's all of our best interest to work side by side [03:49] historians, archivists, preservationists, etc. [03:50] A lot of this work that's done by ArchiveTeam and other groups online are all rogue, non-formal activities. 
Which is a shame because it doesn't have to be that way [03:53] And I think the only reason it is that way is because it comes down to X group doesn't want XYZ available to other people, if not out of copyright, then out of ideological agenda. I'm thinking of Wikipedia. They've managed to setup an open-community where pretty much everyone uses and a sizable number of volunteers dedicate time to building further, adding sourced-information. There's the underground aspect of the informal web [03:53] archiving that I just view as unsustainable. End of rant, heh [03:54] Kind of took that off the deep end there, but it's something I've been thinking about for a while. I guess the WBM viewer issues teased that out a bit Lol [03:58] HP_Archiv, even Wikipedia needs archiving from time to time. Some of the bots over there have over-compressed/over-shrunk some of the images under the reason of 'non-free' image. Some are shrunk so much you can't even read the words anymore. The originals were not high quality to begin with. [03:58] ^ - :'( [03:58] Those bots seem to simply say 'resolution is greater than x pixels...must resize downwards' [03:59] making stamp collecting cool again. [03:59] it's more like image thumbnail collecting [03:59] but with thumbnails the size of pinkie nails [04:00] I was looking at some screenshots and old posters, and you can't read all the text in the image anymore. [04:00] atphonenix, good point. I was just thinking, here, that maybe I should remind myself and ask the question, "Has it always been like this?" e.g. long before the internet, long before computers, have archiving and 'saving stuff' endeavors always been risky insofar as people, archivists, pioneers in these activities of saving cultural heritage, have they always had to be somewhat low-key, shady, underground, not very vocal, etc? 
[04:00] thankfully someone had transcribed the image to text BEFORE the resize happened [04:00] also seen them de-res svg images by simplifying/reducing points and rounding corners [04:01] so they can't be blown up in the future [04:01] Raccoon, yes it's pathetic [04:02] it's not even really saving meaningful space. What's a few GB these days? [04:02] In other words, digital archivists here and elsewhere, pretty much have to keep quiet about their activities because of copyright or legal reasons. And I guess my question, or beef, is why does it have to be this way? And has it always been this way? [04:02] they just don't want to be accused by PepsiCo of being the source of trademark infringements [04:04] I feel like an open community, or an open way of doing all of this - having conversations with corporate types where something meaningful comes about - would be far more pro-active instead of trying to stealthily save stuff from the ether [04:05] HP_Archiv, I think archivists and culture-savers have long been fringe-y until their work is appreciated decades later when the originals have been burnt in media warehouse fires (https://en.wikipedia.org/wiki/2008_Universal_Studios_fire ) or intentionally destroyed (https://en.wikipedia.org/wiki/Doctor_Who#Missing_episodes ) [04:05] HP_Archiv: it wasn't always this way. back in the day, when the concept of copyright was first introduced and discussed, people scratched their heads and said "well, we're a people of story-tellers and song-singers. what do you mean we can't tell this story or sing that song? people can own these things now?" [04:05] and after much debate and reflection, people settled on "ok, in the interest of promoting the creative arts, we will allow a 7 year monetization period of any published work." "ok, and add another 7 years if they can show good reason why." [04:06] fast forward to today, after it was extended from 7 to 14 to 28 years, then 50 then 75 then 95 and 100 years. 
[04:06] they also may get lumped in with hoarders [04:06] mickey law [04:06] now it's 100 years after death in many instances. [04:06] mickey, but also Sonny Bono [04:06] *** ctrl_ has joined #archiveteam-bs [04:06] oddly enough [04:06] I wonder how to make my works public domain 1 second after death... is a will the route for that? [04:07] sad thing too, is that most of the world doesn't even recognize "public domain" [04:07] which is why creative commons had to come to exist [04:07] *** kiska3 has joined #archiveteam-bs [04:07] yeah CC0 or wtfpl [04:07] Exactly my point ^^ So it's only after time has its effect that academic-type institutions take notice and then over time the saved cultural heritage is revered, looked after, studied, etc. What an ass-backwards way of trying to learn about ourselves... [04:07] a license that asserts no bond [04:08] WTFPL FAQ: [04:08] Isn’t this license basically public domain? [04:08] Unfortunately, the definition of public domain varies with the jurisdictions, and it is in some places debatable whether someone who has not been dead for the last seventy years is entitled to put their own work in the public domain. [04:08] fact of the matter is, all intellectual property belongs to the people. Copyright law is very clear, that congress has a duty to assure that all intellectual works MUST ENTER the Public Domain within a reasonable period of time, the shortest period made possible. [04:09] It is the natural, resting disposition, of all published intellectual works that they belong to the cultural heritage and national conscience [04:09] it's crazy how little people know about copyright [04:10] default, everything is Public Domain, with a "brief monetization period." [04:10] I saw a post on... imgur I think it was? 
someone complaining that an online publication used their photo without permission or attribution [04:10] I take the high-level approach, the ideal, the romanticized notion of all of this, and that is the greater context of why we have anything saved at all. It wasn't intentional, maybe, but the outcome of our saved, curated, and preserved cultural knowledge is cultural evolution. And it's this and our amazing ability to adapt to changing environmental circumstances that has allowed us to flourish as a species. [04:10] and there were lots of comments saying "well you should have copyrighted it" [04:10] and like, that's not how this works [04:11] heh, indeed [04:11] since 1886 [04:11] well, since the late 90s [04:11] the copyright office hasn't been taking declarations of copyright [04:11] er late 1900s [04:11] somewhere in the 70's i think [04:12] the United States was particularly late to implementing the Berne Convention (1988) [04:13] the biggest problem I see in current copyright enforcement, is that publishers now make it a practice to wield Copyright as a weapon to suppress and make unavailable [04:13] Copyright law is (was?) very clear that copyright cannot be used this way. [04:13] I just look at all of this time, I see the notifications for this channel and #youtubearchive, how many times people submit things in any given day. How many countless hours spent dedicating time of everyone's personal life to try to save as much as we can. And I simply worry that 1. it's not going to be easily parsed in the future, and 2. it's an unsustainable activity without the support of other entities and the general [04:13] public. [04:14] Why publishers don't immediately and automatically forfeit their copyright claims upon minute-1 of a book going out of print, or a movie becoming unavailable (ie, Song of the South), is beyond me. 
[04:14] Bill Cosby used Copyright to prevent Fat Albert from ever being rebroadcast or sold on DVD [04:14] said it's too racist to ever be seen again [04:15] that should make it automatically public domain, stripping him of rights [04:15] or was it Michael Jackson [04:16] Example of something that might be reasonable (IMO): Default (free) is 5 years copyright from publishing date. Extendable indefinitely in 5-year increments BUT extension fee starts at say $1000 and quadruples each time it is renewed. [04:16] Year: Price [04:16] 0: free [04:16] 5: $1k [04:16] 10: $4k [04:16] 15: $16k [04:16] 20: $64k [04:16] 25: $256k [04:16] 30: $1M [04:16] 35: $4M [04:16] 40: $16M [04:17] 45: $64M [04:17] 50: $256M [04:17] 55: $1B [04:17] 60: $4B [04:17] 65: $16B [04:17] Raccoon: Song of the South is public domain i think in japan [04:17] something. [04:17] One last anecdote is that instead of subtly trying to archive as much of YT as we can, I think the IA should attempt to bring Sundar Pichai into the conversation. Google should be part of the conversation of web/internet preservation. Brewster should try to bring the leaders of these tech companies into the archives space and have a healthy conversation and/or debate. [04:17] eventually the renewal fee will be high enough that the value of paying renewal is not there. No more Mickey Mouse 100 years thing. [04:17] here's another terrible side effect. ever notice how hollywood cranks out really shitty remakes and sequels? [04:18] I think that's a more sustainable path IMO ^^ [04:18] that's a trademark hold [04:18] No incentive [04:18] and yes, out of print, no longer readily available content should automatically revert to PD [04:18] what about obsolete software? [04:18] they create a shit movie so they can prevent anyone else from using their marks [04:18] why the fuck are MS-DOS applications still under copyright protection? 
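The renewal-fee proposal at [04:16] can be tabulated with a short sketch. It is computed strictly from the rule as stated (free for the first 5 years, $1000 for the first renewal at year 5, quadrupling at each later renewal); the function names and the idea of summing the total paid are illustrative assumptions, not part of the proposal.

```python
def renewal_fee(year: int, base: int = 1000, factor: int = 4, step: int = 5) -> int:
    """Fee in dollars due at `year`; year 0 (the initial term) is free."""
    if year < 0 or year % step != 0:
        raise ValueError("year must be a non-negative multiple of step")
    if year == 0:
        return 0
    # First renewal (year == step) costs `base`; each later one multiplies by `factor`.
    return base * factor ** (year // step - 1)


def total_paid(through_year: int, step: int = 5) -> int:
    """Sum of all fees paid to keep a work protected up to and including `through_year`."""
    return sum(renewal_fee(y, step=step) for y in range(0, through_year + 1, step))


if __name__ == "__main__":
    for y in range(0, 70, 5):
        print(f"{y:>2}: ${renewal_fee(y):,}")
```

Under this rule the single fee passes $1 billion at year 55, which is the point of the proposal: long before a 95-year term, renewing stops being worth it for almost any work.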
[04:18] obsolete=out of print/no longer available [04:19] computer software should only have about 5 years of copyright [04:20] *** qw3rty2 has joined #archiveteam-bs [04:20] anything released is now just updated all the time [04:20] well, maybe 5 years since that software build was released. [04:20] about MS-DOS applications and other old software, I wish we would organize a group that proactively engages companies and coders to release their source code for the preservation of culture and technological achievement [04:20] Idk if the rest of you agree, but Google, for example, certainly would have the infrastructure and resources to dedicate to open-preservation. I can see it as an open-consortium of sorts, backed by non-profits like Gates Foundation, etc. [04:21] we literally need to write to Microsoft, and to Joe I Wrote A Game In 1994, asking for their source files [04:21] I guess this is what they attempted with the Google Books Project, but then were halted due to outdated copyright law. So maybe that was, indeed, their official attempt [04:21] effort needs buy-in from someone with clout in the tech billionaire class [04:21] That's just it ^^ [04:21] These guys should 'get this' [04:22] It's their bread and butter, they're in tech after all [04:22] Getting company to preserve doesn't make money, & there's nothing grandly special about preservation, vs. feeding the poor etc., that will bypass the profit incentive [04:22] did y'all see Software Heritage? [04:22] Yes - nicolas17, good step in the right direction. [04:22] Glad Piql partnered with Github [04:23] OrIdow6, that's shortsighted, I think. Both are equally important. [04:23] OrIdow6: and it's so much worse than that. Publishers actually find themselves in a situation where their back catalogue becomes a liability -- whether a legal liability, or a social justice rabble liability. 
[04:23] And releasing content from their back catalogue steals away dollars from the new releases and top 50 [04:24] Raccoon, does it though? We've seen a surge in retro video game re-releases as of late. I would think that would just reinforce loyal fans to buy the new titles as well? [04:24] On tech billionaire buy in making a difference: it's what happened with electric cars. Someone with money/tech/engineering background (Musk) got involved in Tesla early on. Saw the realistic potential. And said this can be a thing. [04:24] You can't dump The Hair Bear Bunch on Netflix for binge watching, without stealing away eyes from Whatever New Is On [04:25] atphoenix: I sort of agree. I am not convinced that Musk's visions have been fully realized, if they ever will at all. Still very early on. [04:25] HP_Archiv: What I mean is that companies by their nature focus on profit, instead of putting it into "good" causes; in the same way that Google gives money back to its shareholders, instead of giving it to the hungry, it gives money to its shareholders, instead of spending it on preservation [04:25] HP_Archiv: you mean shitty Atari Classic and Nintendo Classic consoles with games built in? They're just over-priced novelties that aren't likely to impinge on the sale of the latest Nintendo or XBox console [04:25] You still have to pay like $60 for them [04:26] it isn't necessary to fully realize visions to make a meaningful difference in the direction society is taking [04:26] OrIdow6, doesn't have to be that way though. And yeah, yeah, a lot of things don't 'have to be that way, but are', heh. I get that. But at some point it might make more sense to upend the incentives structure... [04:27] Raccoon, well I can't think of any one, specific retro title re-release that paid homage to its original counterpart, but I'm sure there have to be a few, no? [04:27] That was successful and not cheap/cheesy* [04:27] But books and movies take X number of eyeball-hours to read. 
[04:27] The 5 year renewal fee examples I suggested could go into a creative works funding pot. [04:28] the video media industry is all about eyeball-hours [04:28] even the book industry [04:28] Don't pay the fee? item is now PD. Not reversible [04:28] Item has no value to you? Don't pay the fee. [04:28] oh. i have a youtube video that will make you guys angry and also cry to sleep tonight. [04:28] it has to do with videogames and licenses [04:29] https://www.youtube.com/watch?v=RTkxzQDo0ng [04:29] *** qw3rty has quit IRC (Ping timeout: 745 seconds) [04:29] this isn't the only title. there are over 800 titles currently affected [04:29] atphoenix, remains to be seen with Tesla, as an example. The real game-changer is storing energy in the form of better battery technologies. That, or energy breakthroughs in a new form altogether. The 'I drive an electric car' is pretty short-sighted, but kind of necessary if you look at the long-perspective. eg. it'll take another 100 years to get off of cars entirely before we figure out another entirely different mode of [04:29] transportation [04:30] eg. it's the best we have until we get to where we're going [04:30] Almost like 2 steps forward, one step backward [04:31] We didn't get off vacuum tubes overnight. Nor off of memory cores. Or drum storage. [04:31] Exactly. Progress is incredibly, excruciatingly slow, heh [04:32] hint (video above): Game publishers are not able to publish their games indefinitely. Licenses for game content have a built-in time bomb. Music, trademark people names and logos, even 3d models of vehicle makes. the rights holders give the video game publisher, often, a max of 5 years and 5 years only, with no renewal option. [04:32] but if you don't push the tech...it also doesn't advance. Apollo program pushed tech *hard*. 
[04:32] This ^^ Good point [04:33] And also, Raccoon, great point too [04:33] since most games are download-only, or remote server host, the game publisher is required to delete the game after 1825 days. [04:33] day 1826 and they get sued for a million dollars [04:34] Games aren't the only victims. Eyes on the Prize was stuck in limbo for 20 years. https://en.wikipedia.org/wiki/Eyes_on_the_Screen [04:34] watch that video [04:35] Well anyway, thanks all for reading my rants, heh. The thoughts and ideas I've expressed here over the past half hour have been plaguing my mind for a while. Again, I just see all of this as a time-suck and worry that it hopefully won't all be for naught. One nice little anecdote, is that like Tesla and the electric car movement generally, rogue/fringe archiving of the web is the best we can do at the moment [04:35] there will be a gap in video gaming history starting with the online-dependent games [04:36] and Google Stadia won't help that [04:37] Raccoon, video is queued for me [04:38] atphoenix: good article. of related note, the famous MLK "I had a dream" public address belongs to a company in the UK and cannot be screened in classrooms without a hefty royalty. [04:38] wtf [04:39] the MLK family estate owns the audio, and the UK company owns the video [04:39] meanwhile in Argentina photocopying textbooks is the norm [04:39] don't sing happy birthday either [04:39] that was fixed 2 years ago~ [04:39] was it? [04:39] Happy Birthday has been unshackled [04:39] by the courts [04:40] you know what's still weird to me [04:40] game mods [04:40] turns out the rights holders lied and never held rights [04:40] they extorted people out of money for some 40 or 50 years [04:41] wow, https://en.wikipedia.org/wiki/Happy_Birthday_to_You#2013_lawsuit [04:41] some games have no license (as in EULA) saying what you can or can't do with it, players reverse-engineer them and make patches and mods and release them, and the original developers applaud them... 
yet they would be fully in their legal right to sue them if they happened not to like them [04:41] nicolas17: many have and do [04:41] Blizzard is notorious [04:41] I think Rockstar Games goes after everyone [04:42] they killed many-a-mod of Warcraft II and Starcraft [04:42] man, Starcraft (original) mods were fantastic [04:42] This is why I brought MODDB.com to the attention of the Ops in here... [04:42] someone asked for archival of https://ragepluginhook.wixsite.com/ragepluginbackup a few days ago [04:42] There was a Starwars-themed mod for Warcraft or Starcraft. And there was the StarDraft project. [04:42] I think they haven't come back since [04:42] Well, this is intersting too [04:43] George Lucas was very pro-fan and allowed all sorts of fan fiction, even encouraged it [04:43] Now Disney owns this shit, and Disney don't play dat [04:44] ModDB.com would be a huge crawl, I was told. But it should be archived, by right. Actually, if I remember, I think someone submitted this into AB when I requested a few months back... I forget [04:44] Good luck finding 'Sexy Slave Disney-Princess Leia' costumes anymore [04:45] probably banned from Comic Con [04:45] a mention of stardraft: https://www.reddit.com/r/broodwar/comments/67kd77/stardraft_starcraft_unit_editor/ [04:46] HP_Archiv: still don't understand one thing. If your friend owns it, then why can't he drop the database in your lap. Then it doesn't have to be web crawled. [04:46] years ago I think I attempted to save a copy of their forums before it died. I have many old drives I need to go through. [04:46] Who said I have a friend who owns ModDB? [04:46] You just spit a list of external links to a download queue. [04:46] HP_Archiv: I thought that was the story. for the harry potter site [04:46] I have no idea who the owner is, certainly not friends with him. [04:47] Oh, that's different. 
There are HP modds from various modders/gamers on ModDB [04:47] HP-Games.net was the site I wanted into AB that has outlinks to hosted files on Yandex and GDrive [04:48] Still have to get around to doing that... But I meant the entire Moddb.com site, if it hasn't already been archived [04:49] I assumed that "HP" in your name stood for "Hewlett-Packard"; "Harry Potter" makes more sense [04:49] Lol it's all good [04:50] I've made so many site requests in here I honestly forget if Moddb has been taken care of or not. I remember that someone, maybe JAA did this I forget, mentioned the mod files are hosted right there on the main site from their ftp address. I *think* they got it, don't remember [04:54] I just checked it's been archived - https://archive.fart.website/archivebot/viewer/domain/www.moddb.com [04:55] I'm working on a few archiving projects at once, juggling work, personal life, etc. So apologies if I seem scatter brained, heh [04:55] *** nicolas17 has quit IRC (Quit: Konversation terminated!) [04:56] *** odemgi has joined #archiveteam-bs [04:56] I'm finding that doing AT-things can easily be a full time job [04:56] we curate what we can because we must [04:57] I am so thankful Firefox can handle crazy amounts of tabs these days. 1300+ from all the various yahoo groups stuff and 8tracks stuff and other non-AT stuff... [04:57] weee [04:57] This was my point earlier ^^ That a lot of time is spent doing this, I hope, is not all for naught. I mean, eventually time has the final say. Even the best preserved physical documents and film will eventually disintegrate. [04:58] But that keeping one step ahead of that unstoppable process, we can move, or migrate information over millennia enough to where things are not lost for good :) [04:59] *** odemgi_ has quit IRC (Ping timeout: 276 seconds) [04:59] Actually, we can think of this time period as 'the great migration', e.g. 
digitization [05:00] In which there will be many more to come in the future [05:01] Find a probate attorney to draft up a Will you can use to make sure all your harddisks are boxed up and shipped to JAA or Jason Scott upon your death. [05:03] Let them deal with properly cataloging and sorting :p [05:03] Lol [05:04] I've shipped contents, a physical hard drive, to IA HQ in San Francisco before. Snowballing data in that way, bypassing the internet entirely is a smart, more efficient method [05:06] if it gets there. occasionally have to worry about Stasi intercepting your data and treating it as contraband/munitions. [05:06] *** odemg has quit IRC (Ping timeout: 745 seconds) [05:07] Well, in my case, this was film data, DPX files. I think I talked about this with you before, Raccoon :) [05:07] Raccoon: well, hopefully you don't send your only copy of it in the mail.... [05:07] :p [05:07] 1.7TBs uploading to the IA through through the browser was impractical. So contacted them and they were happy to accept a physical drive to snowball the data in that way [05:08] Also I don't think Jason wants to deal with some guy's box of hard drives [05:08] could be wrong, maybe that's exactly what he wants to deal with :p [05:09] Just label them "Scanned Magazines" [05:09] actually contains 8 TB of IRC logs [05:11] *** odemg has joined #archiveteam-bs [05:13] I bet Jason would *love* some old BBS hard drives. Full contents. [05:14] perfect for textfiles.com [05:14] 8 TB of logs of somebody playing door games, mostly pimp wars [05:15] what of Trade Wars 2002? [05:15] I bought the license to that for our local BBS [05:15] that'd be too interesting :) [05:15] Nice chatting with you guys, I'm off here for a while. Have a good evening/night ^^ [05:15] anyhow, if all else fails, there is one other big archive out there...in Utah... [05:15] night HP [05:16] *** HP_Archiv has quit IRC (Quit: Leaving) [05:16] (congress has the keys to that one) [05:20] the Hoover Natl Monument. 
[05:24] *** markedL has joined #archiveteam-bs [05:27] That is an interesting name for it. Haven't heard it called that. But maybe someday it will be. [05:27] double meaning too. J.E.Hoover and also vacuuming... [05:28] I just made it up :) [05:28] sucks up all data [05:28] Does anyone know if AT did a project to rescue podcasts from the now-defunct site/app called Bumper [05:29] A content creator asked if anyone might have backed up his podcasts @ https://youtu.be/1330RHvXQrU?t=197 [05:35] doesn't look like it. I don't see it listed in Deathwatch. Found this article https://medium.com/bumpers/shutting-down-bumpers-de62f4a9a0ee [05:44] i don't recall such a project [05:46] HP_Archiv (if you read logs) -- IA (which we aren't the same as), does have quite a bit of wide public and institutional support. [05:48] HP_Archiv usually doesn't read logs [05:48] And they have a whole department, Archive-It, devoted to providing web archiving as a service to institutions (more or less all of which goes into the WBM) [05:48] markedL: Ah well. [05:49] And there are various organizations doing web archiving totally independent of IA, although none as well known. [05:54] *** markedL has quit IRC (Quit: The Lounge - https://thelounge.chat) [05:55] *** marked1 has joined #archiveteam-bs [06:00] and as for stuff being hard to access -- that can actually be a *bonus* to preservation, in some cases, as it keeps people who object [06:00] ... to the stuff being available from either realizing it is, or finding it too accessible for their taste [06:00] OK, that's *my* rant over. [06:31] *** asdf0101 has joined #archiveteam-bs [07:06] *** marked1 has left The Lounge - https://thelounge.chat [07:12] Somebody2, yes, lack of accessibility (which often means not showing up in a common search engine) cuts both ways. Accessible stuff is more easily removed and possibly destroyed. 
Not unlike antiquities excavated from archaeological digs, then placed in accessible museums, and then are destroyed by those who feel threatened by history. [07:14] *** krellstee has quit IRC (Read error: Connection reset by peer) [07:14] *** krellstee has joined #archiveteam-bs [07:18] The stuff that really survives sometimes is that which is preserved by being out of reach (buried, unknown, remote). This also complicates relations between nations that are holding ancient relics that were taken from other nations. Should everything be given back to the original nation? Or are the offsite remote relic collections actually a good thing, especially in case of severe societal upset and political turmoil? At [07:18] least with digital we have the possibility of keeping perfect remote copies, so we don't have an issue about physical, uncloneable objects to contend with. [07:35] *** oxguy3 has joined #archiveteam-bs [07:36] hey could someone toss https://media.miamidolphins.com/ into archivebot? hasn't made it into WBM since the 2019 nfl season started; also seems to be poor WBM coverage for linked files [07:47] !archive https://media.miamidolphins.com/ --explain "For oxguy3; 'hasn't made it into WBM since the 2019 nfl season started'; 'poor WBM coverage for linked files'" --concurrency 1 --ignoreset blogs [07:48] ...Whoops [07:48] lol, thanks! [08:42] *** oxguy3 has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) [09:20] *** VoynichCr has quit IRC (Quit: leaving) [10:28] *** Mateon1 has quit IRC (Remote host closed the connection) [10:28] *** Mateon1 has joined #archiveteam-bs [10:56] *** BlueMax has quit IRC (Read error: Connection reset by peer) [10:57] *** BlueMax has joined #archiveteam-bs [10:59] *** dxrt_ has quit IRC (The Lounge - https://thelounge.chat) [11:04] *** oxguy3 has joined #archiveteam-bs [11:17] *** oxguy3 has quit IRC (Quit: My MacBook has gone to sleep. 
ZZZzzz…) [11:24] *** schbirid has joined #archiveteam-bs [11:30] *** SoraUta has quit IRC (Read error: Operation timed out) [11:39] *** dxrt_ has joined #archiveteam-bs [11:39] *** dxrt sets mode: +o dxrt_ [11:45] *** BlueMax has quit IRC (Read error: Connection reset by peer) [11:56] *** dxrt_ has quit IRC (The Lounge - https://thelounge.chat) [11:56] *** dxrt_ has joined #archiveteam-bs [11:56] *** dxrt sets mode: +o dxrt_ [12:31] *** SilSte has joined #archiveteam-bs [12:32] *** Silvan has quit IRC (Ping timeout: 745 seconds) [13:24] *** tuluu has quit IRC (Quit: No Ping reply in 180 seconds.) [13:26] *** tuluu has joined #archiveteam-bs [14:01] *** X-Scale` has joined #archiveteam-bs [14:12] *** X-Scale has quit IRC (Ping timeout: 745 seconds) [14:12] *** X-Scale` is now known as X-Scale [14:58] *** d5f4a3622 has quit IRC (Read error: Connection reset by peer) [14:58] *** af10b3e5e has joined #archiveteam-bs [14:58] *** af10b3e5e has quit IRC (Read error: Connection reset by peer) [14:59] *** af10b3e5e has joined #archiveteam-bs [15:31] *** limb has quit IRC (WeeChat 2.2) [15:34] *** LowLevelM has quit IRC (Read error: Operation timed out) [15:56] *** sirvy has quit IRC (Ping timeout: 615 seconds) [16:12] *** sirvy has joined #archiveteam-bs [16:23] *** marked1 has joined #archiveteam-bs [16:26] *** krellstee has quit IRC (Read error: Connection reset by peer) [16:27] *** krellstee has joined #archiveteam-bs [16:27] "< Midnight> I was gonna put Clementine on there." Hmm, I wonder why? Project seems active still despite no official releases since 2016 (but 1.4rc1 was tagged yesterday and there has been a three-year break between releases before). [16:39] atphoenix: Yes indeed! I agree with all that. [16:41] Let's find a billionaire that is willing to pay for a full IA mirror on Svalbard. :-) [16:42] Yes please! :-) [16:45] google says there's only 2,604 of those, should be easy [16:46] Thank you for volunteering.
[16:46] :-P [16:50] the billionaires I've met don't return my calls, better luck to cold call or write to their foundations [16:52] I don't think I've knowingly met a billionaire. [16:52] the comments about a billionaire tech leader being involved in tech archive efforts were less so about the money for running the archive (which I think is crowd sourceable), and more so about the power/influence that leader could have in creating a sea change in how tech handles these issues. E.g. having the founders of the likes of Apple/Microsoft/Google/Facebook onboard would be huge [16:58] You're either overestimating crowdsourcing or underestimating the cost of building an IA mirror on Svalbard (or in a similar location). [17:00] Bezos (or his team) has a public email address [17:00] But there's another issue with having billionaires on board: an archive has to be fully independent so its contents can't be manipulated. [17:00] Gates and Zuckerberg have active foundations [17:05] \]\ [17:05] oops [17:11] *** LowLevelM has joined #archiveteam-bs [17:16] *** qw3rty2 has quit IRC (Quit: Nettalk6 - www.ntalk.de) [17:17] *** qw3rty has joined #archiveteam-bs [17:42] *** underscor has quit IRC (Read error: Operation timed out) [17:43] *** underscor has joined #archiveteam-bs [17:44] *** qw3rty has quit IRC (Quit: Nettalk6 - www.ntalk.de) [17:47] *** qw3rty has joined #archiveteam-bs [17:50] I see some (not all) of the storage needed for archiving, especially for creating redundancy (but not for live access), as being crowd-sourceable. Large scale distributed archiving is what I had in mind for creating an IA mirror. Split IA into shards (such that each shard is meaningful even if others are somehow lost), distribute 10 copies of each shard to 10 distinct locations. [17:50] In torrent terms, at least 1 seed (IA), maybe a 2nd (IA Egypt), and the rest of the copies are distributed virtual copies.
The DrivePool and DriveBender software tools can do something sort of similar to this proposal across Windows filesystems (copies of files are maintained, redundancy is present, but unlike RAID, if more drives are lost than the configured redundancy level, the remaining drives are still readable, with [17:50] intact files.) Goal is that failure of the redundancy mechanism should not result in complete data loss. [17:50] On scale: 60 PB = 6000 10 TB drives. x10 for 10 copies = 60000 10 TB drives, distributed around the world. At 500 GB/month (say for participants with 1 TB data caps) that is a 20-month commitment to populate the drive (or retrieve a full drive). End user cost is the price of dedicating a drive and energy to run it and the computer it is connected to...maybe as basic as an RPi. Or an old laptop booted from a USB flash drive. [17:50] Are there 60,000 people across the world in the various archive/datahoarder/fan communities with $200 that they are willing to spend towards the effort? Chances go up if the barriers to entry are lowered by making participation easy. Higher still if the effort is made personal to them by designing the system to let them opt-in/out of certain archived data that is tagged by categories/characteristics. Even higher if they can [17:50] indicate specific data they want included in the public set (maybe identified by file hash?). [18:21] *** cerca has joined #archiveteam-bs [19:09] atphoenix: So basically, IA.BAK [19:13] those are basically thoughts that dovetail into IA.BAK + Valhalla [19:26] Jens, you posted about the inquirer last month, now it is on Deathwatch. Do you know if archivebot finished covering it?
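The scale estimate posted at [17:50] (60 PB, 10 TB drives, 10 copies, 500 GB/month per participant) can be re-derived with a few lines of arithmetic. This is only a rough sketch using the figures quoted in the chat; it assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted at [17:50].
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).

def drives_needed(archive_pb: int, drive_tb: int, copies: int) -> int:
    """Drives required to hold `copies` full copies of the archive."""
    archive_tb = archive_pb * 1000
    per_copy = -(-archive_tb // drive_tb)  # ceiling division
    return per_copy * copies


def months_to_fill(drive_tb: int, gb_per_month: int) -> float:
    """Months needed to transfer one full drive at a fixed monthly volume."""
    return drive_tb * 1000 / gb_per_month


if __name__ == "__main__":
    print("drives for one copy:", drives_needed(60, 10, 1))
    print("drives for ten copies:", drives_needed(60, 10, 10))
    print("months to fill one drive:", months_to_fill(10, 500))
```

Changing the drive size or monthly transfer volume shows how sensitive the participant count and time commitment are to those two inputs.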
[19:30] *** systwi_ has joined #archiveteam-bs [19:36] *** systwi has quit IRC (Ping timeout: 622 seconds) [19:37] *** qw3rty has quit IRC (Remote host closed the connection) [19:37] *** qw3rty has joined #archiveteam-bs [19:43] *** VoltZero has joined #archiveteam-bs [20:33] *** SoraUta has joined #archiveteam-bs [20:34] re: Blackberry. In addition to the 75,751 app ids listed by their website api I have json details for a further 49,880 from when I was downloading incrementally. These aren't publicly visible on the website. [20:40] Checking one, the icon+screenshots for it are still downloadable. I'll have a look at downloading images for all of this stuff sometime. [20:41] *** VoltZero has quit IRC (Quit: Going offline, see ya! (www.adiirc.com)) [21:00] *** Mateon1 has quit IRC (Remote host closed the connection) [21:00] *** Mateon1 has joined #archiveteam-bs [21:17] SketchCow: so Collider Videos youtube channel has 9278 video ids [21:18] you can make the id list with this : youtube-dl -j --flat-playlist https://www.youtube.com/user/ColliderVideos/videos | jq -r '.id' > list.txt [21:18] i'm going to see about grabbing the older videos first [21:39] *** qw3rty has quit IRC (Read error: Connection reset by peer) [21:39] *** qw3rty has joined #archiveteam-bs [22:30] *** nyany_ has quit IRC (Read error: Connection reset by peer) [22:34] *** nyany_ has joined #archiveteam-bs [22:53] *** BlueMax has joined #archiveteam-bs [23:01] *** icedice has joined #archiveteam-bs [23:16] *** schbirid has quit IRC (Quit: Leaving) [23:20] *** BartoCH has quit IRC (Ping timeout: 615 seconds) [23:21] *** BartoCH has joined #archiveteam-bs [23:40] *** qw3rty has quit IRC (Remote host closed the connection) [23:40] *** qw3rty has joined #archiveteam-bs
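The id-list command quoted at [21:18] pipes the JSON-lines output of `youtube-dl -j --flat-playlist` through `jq -r '.id'`. For anyone without jq, the same extraction could be done in Python along these lines. This is a minimal sketch: the function name and the sample ids are hypothetical, and unlike jq (which prints `null`) it skips entries that have no `id` field.

```python
import json
from typing import List


def extract_ids(json_lines: str) -> List[str]:
    """Return the "id" field of each JSON object, one object per line."""
    ids = []
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        entry = json.loads(line)
        if "id" in entry:
            ids.append(entry["id"])
    return ids


if __name__ == "__main__":
    # Hypothetical sample input, shaped like one JSON object per video.
    sample = '{"id": "aaaaaaaaaaa", "title": "x"}\n{"id": "bbbbbbbbbbb"}\n'
    for video_id in extract_ids(sample):
        print(video_id)
```

In practice the input would be the captured stdout of the youtube-dl command, and the printed ids would be redirected to list.txt as in the original one-liner.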