[00:00] *** IAmbience has quit IRC (Quit: Connection closed for inactivity) [00:02] -> #yahoosucks [00:04] JAA: betamax said it could be #warrior, here or #yahoosucks [00:06] Basically, it should always go into the most specific channel. This is the general chat channel, #warrior is for warrior stuff (as in, issues and development of the VM), #yahoosucks in this case is for this project. So #yahoosucks is where this should go. [00:59] *** BlueMax has joined #archiveteam-bs [01:21] *** n00b330 has joined #archiveteam-bs [01:22] *** n00b330 has left [01:24] *** qw3rty2 has joined #archiveteam-bs [01:30] *** qw3rty has quit IRC (Ping timeout: 745 seconds) [01:34] *** tech234a has joined #archiveteam-bs [02:30] *** Smiley has quit IRC (Ping timeout: 252 seconds) [02:40] *** Rotzer has joined #archiveteam-bs [02:42] Hello, could anyone download http://game299716.konggames.com/gamez/0029/9716/live and https://game306509.konggames.com/gamez/0030/6509/live/index.html with Chromebot? These are some Kongregate games that I want to play offline using pywb. The problem is that, since they are made in Javascript, they also load additional files when starting up, so it is impossible to download them in a conventional way. [02:42] I tried using crocoite but I couldn't make it work. Thank you very much in advance. [02:45] Sorry about that, sometimes I have trouble sending long messages with this program. 
I'll try again: [02:45] Hello, could anyone download https://game299716.konggames.com/gamez/0029/9716/live and https://game306509.konggames.com/gamez/0030/6509/live/index.html with Chromebot? These are some Kongregate games that I want to play offline using pywb. The problem is that, since they are made in Javascript, they also load additional files when starting up, so it is impossible to download them in a conventional way. [02:45] I tried using crocoite but I couldn't make it work. Thank you very much in advance. [03:01] Rotzer: I can throw them in chromebot, sure. But are you sure they'd work in pywb? [03:02] I guess it's an interesting test. [03:04] I don't know, I installed pywb recently but I think Chromebot should be more accurate at downloading this type of page. If you're interested, I can try it later and tell you how it turns out. [03:04] I have a feeling that all the assets won't be picked up just by Chromebot. I think you would have to play the game to expose all the assets. [03:05] That's my suspicion, anyway. [03:06] jodizzle, I was playing this with the network monitor open and it did not load anything else, apparently the games are fully loaded at the beginning. [03:08] *** underscor has quit IRC (Quit: No Ping reply in 180 seconds.) [03:08] Really? With the first link, just pressing "New Game" seems like it downloads at least one new .ogg file [03:08] *** underscor has joined #archiveteam-bs [03:10] Strange, I'm going to clear the cache and do another test. [03:12] Anyway, seems like chromebot saved the links. The results will be available as WARCs in about a day. [03:12] At this link: https://archive.org/details/archiveteam_chromebot?sort=-publicdate [03:19] Thanks, and that's right, the first game loads http://game299716.konggames.com/gamez/0029/9716/live/media/breath-done-jzg9hay7(1).ogg when you start a game. Anyway it doesn't bother me much to have it without music. 
[03:19] By the way, if one of these days you decide to archive http://kongregate.com/ you will also have to take this type of game into account. [03:23] *** qw3rty has joined #archiveteam-bs [03:28] *** Larsenv has quit IRC (Excess Flood) [03:30] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds) [03:34] Yeah, archiving kongregate would be a substantial task. But I also think there are groups who have been archiving games from those sorts of sites for a long time now. [03:39] The problem is that games like Labyrneath and similar could pose a challenge because of the way they are made, especially if we consider that Flash has one foot in the grave, so there will be more and more of these "HTML5" games. [03:39] *** zyphlar_ has joined #archiveteam-bs [03:41] *** Larsenv has joined #archiveteam-bs [03:53] *** godane has joined #archiveteam-bs [04:03] One question, are there plans to archive http://wattpad.com/ ? I know it's mostly used to write shitty fanfic and stuff like that but it turns out that it is excluded from archive.org, and it would be nice to have something to look at in case, for X or Y reason, someone decides to delete everything they had written. [04:34] *** odemgi has joined #archiveteam-bs [04:40] *** odemgi_ has quit IRC (Read error: Operation timed out) [04:40] *** qw3rty2 has joined #archiveteam-bs [04:44] *** tech234a has quit IRC (Quit: Connection closed for inactivity) [04:50] *** qw3rty has quit IRC (Ping timeout: 745 seconds) [04:59] *** killsushi has joined #archiveteam-bs [05:04] *** tonsofpcs has quit IRC (Read error: Operation timed out) [05:48] *** zyphlar_ has quit IRC (Quit: Connection closed for inactivity) [05:50] *** HP_Archiv has joined #archiveteam-bs [05:51] Hey all! -Archivist has instructed me to get the attention of one of you in here for entire site captures, some of which are at risk. 
He assured me that if I pinged someone in here, you would make sure the site would be ingested and archived properly [05:51] The sites are as follows: [05:52] http://thesciencenetwork.org/ http://redump.org/ https://www.barneys.com/ [05:52] For Barneys, see the recent news: https://www.nytimes.com/2019/11/01/business/barneys-bankruptcy-authentic-brands.html [05:53] They claim the online store will remain, but it will likely be shut down over time as well. [05:53] Can someone make sure these 3 sites and their associated hosted video/photos/data are all captured and archived properly? [05:54] HP_Archiv: Sure. Is there any particular motivation behind archiving the first two sites? [05:56] Hey @jodizzle, yes. The Science Network, though updated occasionally, is fairly old. They host all of their conference videos on the site, but Flash based I think. They're responsible for the early 2000's 'Beyond Belief' conferences, see example: [05:56] https://www.youtube.com/results?search_query=neil+degrasse+tyson+beyond+belief [05:56] Not sure if anyone has archived their site/all of the videos they've produced, but thought I would pass it along. [05:57] For Redump: -Archivist explained that the owner is fussy and will shut down the site without notice if he suspects someone is trying to scrape it. Not sure if you guys have tried previously, I'm sure you have, but thought I would request it anyway [05:59] Much appreciated if you guys could make sure these sites are captured. Thank you ^^ [06:00] Well, none of the sites have been grabbed by us before, so I will queue them up in archivebot. I doubt it will get the Flash video content though (at least not in a way that can be nicely played back), so that may require a more targeted grab at some point. [06:02] Also, someone threatening to shut down the site if there's scraping sounds pretty risky, but I guess I'll give it a shot. [06:03] I just tested one of the videos from a 2014 conference. 
The player is Flash, but it appears that the video is hosted on the site directly as an .mp4 (not sure if that makes a difference though) [06:03] that actually makes it a lot simpler [06:03] Ah, that does make a difference. If the .mp4 is linked in the HTML source somewhere, archivebot will fetch it. [06:03] Or in the Javascript source. [06:04] or if you can pluck out a list of urls and !ao< them, it'll work in wayback [06:04] (modulo flash) [06:04] Yep, that too. [06:05] Okay awesome. Yeah, for example, this great discussion - I watched this live online in 2011 - is hosted as a .mov file, with a download video option. But in order to watch on the site you need Flash. [06:05] http://thesciencenetwork.org/programs/the-great-debate-what-is-life/what-is-life-panel [06:06] There are many videos, not sure if I have the time to pluck individual links for all of them... But maybe archivebot can fetch, hopefully [06:07] And yeah, for Redump - it's a massive repository of CD-ROM/video/computer game/disc hashing data. -Archivist mentioned that people have tried to do small scrapes here and there, but without much success per the owner making threats [06:09] Ah, there's a "Download Video" link for thesciencenetwork.org, which links out to s3. [06:10] No guarantee that all videos will have one, but it's a start. [06:10] And sure, I'll try to be careful with Redump. [06:11] Thank you ^^ If I can be of any assistance, please let me know. Though I don't have too much free time, I'll help in any way I can. I'm new to the online archiving communities, and -Archivist has been great/patient with my requests. He directed me to you guys for entire site archiving. [06:12] I'd recommend joining #archivebot if you want to monitor the archiving. [06:14] Okay cool. Will do. 
For Redump, I'm actually working on a project for digital preservation of early 2000's Potter PC Games, and among the many entries on there is disc data from all the different Potter games, on different platforms, regions, etc. The small group of people helping me have been in touch with several former developers, a few people from Warner Brothers, and one person out of the LoC's film division for video game pres [06:15] We're looking for the HP 1 PC game's proto source archive. So that's really the motivation behind seeing Redump archived accordingly. [06:17] *** Ryz has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** Fusl has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** markedL has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** SketchCow has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** kyledrake has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** paul2520 has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** nyany__ has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** Fionera has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** Yurume has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** atomicthu has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] *** svchfoo3 has quit IRC (ircd.choopa.net irc.mzima.net) [06:17] The Science Network is self explanatory - those conferences/panels are academic/intellectual in nature, worthy of being saved. [06:18] Barneys has been an iconoclastic figure in the NYC landscape/history for many, many decades. Though I've never shopped there, thought it was worth capturing their online store before that goes away too. 
[06:18] Anyway, thank you :) [06:19] *** Ryz has joined #archiveteam-bs [06:19] *** Fusl has joined #archiveteam-bs [06:19] *** markedL has joined #archiveteam-bs [06:19] *** SketchCow has joined #archiveteam-bs [06:19] *** kyledrake has joined #archiveteam-bs [06:19] *** paul2520 has joined #archiveteam-bs [06:19] *** nyany__ has joined #archiveteam-bs [06:19] *** Fionera has joined #archiveteam-bs [06:19] *** Yurume has joined #archiveteam-bs [06:19] *** atomicthu has joined #archiveteam-bs [06:19] *** svchfoo3 has joined #archiveteam-bs [06:19] *** irc.mzima.net sets mode: +ooo Fusl SketchCow svchfoo3 [06:19] *** Fusl__ sets mode: +o Fusl [06:19] *** Fusl__ sets mode: +o SketchCow [06:19] *** Fusl__ sets mode: +o svchfoo3 [06:19] *** Fusl_ sets mode: +o Fusl [06:19] *** Fusl_ sets mode: +o SketchCow [06:19] *** Fusl_ sets mode: +o svchfoo3 [06:20] *** Proto_ has joined #archiveteam-bs [06:20] Ok so, Kd l [06:21] I’d like to archive a wiki. It’s called “Le Miiverse Resource”. It’s a wiki made for miiverse information [06:21] It is going to shut down tomorrow [06:22] we have some wiki archiving tools! [06:22] link me? i can tell you if our tooling works on it [06:22] Ok sure [06:22] *** Proto__ has joined #archiveteam-bs [06:23] https://le-miiverse-resource.fandom.com/wiki/Le_Miiverse_Resource_Wiki [06:23] Hello? [06:23] hello [06:23] Hey [06:23] Hey [06:23] *** tonsofpcs has joined #archiveteam-bs [06:25] Uhh, you there? [06:25] ok cool yeah that works with our tool! [06:25] i'm running it now [06:25] Nice! [06:25] Thank you! [06:25] might take a few hours [06:26] Sweet [06:26] I actually do have a general question - without being too nosey - what's the long-term plan for data that is captured by you guys? How is it stored, geographically? [06:26] would you like me to email you some info about the dump once i've done it? or would you like to stay around in here? [06:26] Email please! 
[06:26] i'm going to upload it to archive.org, which is physically in the state of california [06:26] pr0to13377331@gmail.com [06:26] okay :) [06:27] *** Proto_ has quit IRC (Ping timeout: 260 seconds) [06:27] Thank you soo much [06:27] sure thing :) [06:27] *** Fusl sets mode: +o kiskabak [06:27] *** Fusl sets mode: +o kiska [06:27] *** Fusl sets mode: +o kiska18 [06:27] *** Fusl sets mode: +o chfoo [06:27] *** Fusl sets mode: +o me [06:27] *** Fusl sets mode: +o Kenshin [06:27] *** Fusl sets mode: +o Fusl_ [06:27] *** Fusl sets mode: +o hook54321 [06:27] *** Fusl sets mode: +o Kaz [06:27] *** Fusl sets mode: +o HCross [06:27] *** Fusl sets mode: +o Fusl__ [06:27] *** Fusl sets mode: +o AlsoJAA [06:27] *** Fusl sets mode: +o arkiver [06:27] *** Fusl sets mode: +o jrwr [06:27] *** Fusl sets mode: +o astrid [06:27] *** Fusl sets mode: +o dxrt_ [06:27] *** Fusl sets mode: +o svchfoo1 [06:27] *** Fusl sets mode: +o dxrt [06:27] *** Fusl sets mode: +o PurpleSym [06:27] *** Fusl sets mode: +o ivan [06:27] *** Fusl sets mode: +o JAA [06:28] @astrid, okay cool. I'm actually located not far from IA HQs in San Francisco. Thanks again for the requests! [06:28] nice [06:29] you should go to one of their friday open-house sessions then :) [06:29] I just recently located from the North East into SoCal, so it's on my list of things to do :) [06:29] Anyone can go, right? I don't have an MLIS yet [06:30] yeah i think the main requirement is shoes and a shirt [06:30] Heh, something tells me that you're half joking considering it's Cali [06:31] *** Proto__ has quit IRC (Ping timeout: 260 seconds) [06:34] @astrid, so have a better understanding: For example, once Barneys has been completely archived and put on IA, will I be able to actually download a file and/or search through and view the site as it once was post-capture? [06:35] what is barneys? 
[06:35] no probably not search [06:35] https://en.wikipedia.org/wiki/Barneys_New_York [06:35] https://www.nytimes.com/2019/11/01/business/barneys-bankruptcy-authentic-brands.html [06:35] As mentioned a few minutes ago, they have been a staple and a part of NYC culture for many decades. [06:36] yea oh that barneys [06:36] naw basically nothing is ever perfect fidelity [06:36] *** wyatt8740 has quit IRC (Read error: Operation timed out) [06:36] especially fancy corporate things [06:37] Hm. So how can the capture be parsed? Other words, have can I view the archive job once it's on IA? [06:37] how can* [06:38] I'm just a bit confused on how post-archiving works (eg: what good is that capture if it's not easily readable/understandable) [06:39] https://web.archive.org [06:39] you can seek the .warc that is created from the capture, this has all the request and response headers and pages [06:39] you can browse it on web.archive.org, yes, you may be familiar with how archived web sites are ... sometimes not perfect [06:39] that's what i'm talking about [06:41] Okay, yeah makes sense. Like I said, I come to the table new at all of this. Thanks for your patience, heh [06:41] often you can see what a page looked like but functions on the original site like search won't run [06:42] ^^ Right, I didn't think that sort of capture was possible anyway. I guess so long as the look and feel of the site is captured, that's all that matters [07:24] *** jake_test has joined #archiveteam-bs [07:27] *** wyatt8740 has joined #archiveteam-bs [07:30] @astrid, one more link for you guys, https://www.moddb.com/ [07:30] ? [07:30] This site is dedicated to game modders, who upload custom maps/files, etc. Do you think you can archive it? 
[07:31] ¯\_(ツ)_/¯ [07:31] we can archive anything [07:32] i should just be clear, i havent started any jobs on your behalf, im not really in a state to safely drive archivebot tonight [07:32] Heh, well I'm wondering if the hosted files uploaded by individual users would be captured too? Example, custom maps on this: https://www.moddb.com/games/harry-potter-and-the-sorcerers-stone/addons [07:33] Ah okay, well, I did notice that someone did. I checked archivebot and all 3 earlier requests are being worked on [07:33] cool! [07:34] I would do all of this myself, but it seems #archiveteam functions in a by-request nature [07:34] Perhaps if someone else can submit https://www.moddb.com/ for archiving, that would be great [07:35] But no rush, enjoy your night, @astrid :) [07:59] *** Ivy has quit IRC (Quit: Connection closed for inactivity) [08:09] *** qwebirc63 has joined #archiveteam-bs [08:10] *** qwebirc63 has quit IRC (Client Quit) [08:10] *** DFJustin has quit IRC (Ping timeout: 745 seconds) [08:12] *** HP_Archiv has quit IRC (Ping timeout: 260 seconds) [08:13] *** HP_Archiv has joined #archiveteam-bs [08:51] *** BlueMax has quit IRC (Quit: Leaving) [09:04] *** markedL has quit IRC (Read error: Operation timed out) [09:05] *** asdf0101 has quit IRC (Read error: Operation timed out) [09:19] *** asdf0101 has joined #archiveteam-bs [09:19] *** markedL has joined #archiveteam-bs [09:35] *** DFJustin has joined #archiveteam-bs [10:08] *** Smiley has joined #archiveteam-bs [11:05] *** omglolbah has quit IRC (Quit: ZNC - https://znc.in) [11:07] *** omglolbah has joined #archiveteam-bs [11:11] *** qwebirc26 has joined #archiveteam-bs [11:21] *** qwebirc26 has quit IRC (Ping timeout: 264 seconds) [11:54] *** Hani111 has joined #archiveteam-bs [12:02] *** Hani has quit IRC (Ping timeout: 745 seconds) [12:02] *** Hani111 is now known as Hani [12:18] *** X-Scale` has joined #archiveteam-bs [12:19] *** X-Scale has quit IRC (Ping timeout: 252 seconds) [12:19] *** X-Scale` is now 
known as X-Scale [13:38] *** Rotzer has quit IRC (Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/) [13:52] *** katocala has joined #archiveteam-bs [14:23] *** Ivy has joined #archiveteam-bs [14:23] *** Ivy has quit IRC (Client Quit) [14:30] *** schbirid has joined #archiveteam-bs [14:37] *** scorche` has joined #archiveteam-bs [14:38] *** scorche has quit IRC (Read error: Operation timed out) [14:38] *** scorche` is now known as scorche [15:06] *** killsushi has quit IRC (Quit: Leaving) [15:08] *** Ivy has joined #archiveteam-bs [15:40] *** Dallas has quit IRC (Read error: Connection reset by peer) [15:49] *** Dallas has joined #archiveteam-bs [16:07] *** katocala has quit IRC (Ping timeout: 258 seconds) [16:12] *** katocala has joined #archiveteam-bs [16:30] *** X-Scale` has joined #archiveteam-bs [16:31] *** X-Scale has quit IRC (Ping timeout: 252 seconds) [16:31] *** X-Scale` is now known as X-Scale [16:42] *** akierig has joined #archiveteam-bs [16:48] *** Atom__ has joined #archiveteam-bs [16:55] *** Atom-- has quit IRC (Read error: Operation timed out) [17:29] *** akierig has quit IRC (Quit: later_gator) [17:56] *** Tenebrae has joined #archiveteam-bs [18:50] *** JH8813269 has quit IRC (Quit: Ping timeout (120 seconds)) [18:51] *** apache2 has quit IRC (Ping timeout: 745 seconds) [18:53] *** JH8813269 has joined #archiveteam-bs [18:58] *** manjaro-u has joined #archiveteam-bs [19:03] *** dxrt has quit IRC (Ping timeout: 246 seconds) [19:08] Where does one find, warc-tiny ? [19:09] *** Video has joined #archiveteam-bs [19:09] markedL: https://github.com/JustAnotherArchivist/little-things/blob/master/warc-tiny [19:12] thx, quite the treasure trove / war chest [19:27] *** Video has quit IRC (Quit: Page closed) [19:46] *** IAmbience has joined #archiveteam-bs [19:49] *** akierig has joined #archiveteam-bs [19:49] *** systwi_ has joined #archiveteam-bs [19:50] JAA: Obviously, don't mail Danny again. [19:52] Yup [19:53] Also, did you know we're a company now? 
[19:54] *** fnax has joined #archiveteam-bs [19:55] Do we go all fuck the police on them? [19:55] *** systwi has quit IRC (Read error: Operation timed out) [19:59] problem is [20:00] their site is *so shit* it falls over when we do that [20:00] that's the whole proble [20:00] m [20:00] Anyway, I'll let you know if he's suing [20:02] But I'm not giving ANY of your names [20:03] since the beginning of archiveteam i can count the number of times that a site's management cooperated ... the number is approximately three [20:04] gitorious is probably the greatest success here [20:08] *** Video has joined #archiveteam-bs [20:14] *** fnax has quit IRC (Quit: Page closed) [20:15] Hmm, I can think of at least four I was involved with, but to be fair, those were rather small sites, and those are more likely to cooperate I guess. [20:22] *** alembic has joined #archiveteam-bs [20:25] *** Video has quit IRC (Ping timeout: 260 seconds) [20:47] *** trc has joined #archiveteam-bs [21:26] *** akierig has quit IRC (Quit: later_gator) [21:28] So.. would something like IRC logs hold up in a court? [21:28] Let's not find out, ok? [21:29] Has anyone looked into the Microsoft documentation deletion thing? https://old.reddit.com/r/sysadmin/comments/dshfbh/psa_microsoft_is_deleting_legacy_ie_documentation/ [21:34] odemgi: ^ ? [21:34] *** Raccoon has joined #archiveteam-bs [21:34] Oh yeah I really don't want to know. But I am definitely curious. [21:35] *** trc has quit IRC (Remote host closed the connection) [21:38] https://imgur.com/a/gbyJX9r Let's not forget about this either. 
[22:18] *** katocala has quit IRC () [22:27] *** katocala has joined #archiveteam-bs [22:31] *** alembic has quit IRC (Quit: Connection closed for inactivity) [22:40] *** katocala has quit IRC () [22:48] *** katocala has joined #archiveteam-bs [23:04] *** Video has joined #archiveteam-bs [23:04] *** dxrt has joined #archiveteam-bs [23:04] *** Fusl__ sets mode: +o dxrt [23:04] *** Fusl sets mode: +o dxrt [23:04] *** Fusl_ sets mode: +o dxrt [23:05] *** svchfoo1 sets mode: +o dxrt