[00:00] *** flashfire has joined #archiveteam-bs
[00:00] https://globenewswire.com/news-release/2018/06/04/1516160/0/en/Corning-Closes-Acquisition-of-Substantially-All-of-3M-s-Communication-Markets-Division.html
[00:02] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[00:02] *** wp494 has joined #archiveteam-bs
[00:13] SketchCow: how many boxes are you senting?
[00:13] *sending
[00:23] *** dashcloud has joined #archiveteam-bs
[00:23] *** tstarling has joined #archiveteam-bs
[00:23] So yeah my archiving tends to be hit and miss. Sometimes I am at school and have to load it into Archive.is but this work around I have come up with allows me to use !ao
[00:24] I also forget how to use the Ignore functions and sometimes forget i set jobs
[00:25] Plus I tend to archive stuff nobody cares about sometimes simply because I liked the look of the site and didnt see a copy in the wayback machine
[00:25] the forgetting ignore patterns was what astrid may have sighed about or I may have interpreted it as that
[00:26] Also I am using mibbit with irc.underworld.no as a work around because my school is using smoothwall which blocks way to much to be usable
[00:26] The archive itself is blocked at my school
[00:26] my school blocked the archive too :( it's bogus
[00:27] but strangely enough they didnt block archive.is
[00:28] that is a little bit strange, yes
[00:28] also when i access http://dashboard.at.ninjawedding.org/3?showNicks=1 I get a frozen snapshot of what it was up to when the site initially loaded. realtime updates are blocked but not the site itself
[00:28] especially because archive.org is a state of california licensed library :)
[00:28] exactly
[00:28] anyway
[00:28] I mean I am in australia but its still an internationally recognised library
[00:29] https://finance.yahoo.com/news/match-group-can-get-away-acquiring-25-dating-sites-counting-151306438.html this seems suspicious
[00:31] 5 minutes before I have to go any passing comments?
[00:31] * astrid shrugs
[00:32] Apart from WATCH THE IGNORE SETS FLASH DAMN IT ITS NOT THAT HARD
[00:33] nah, i get it, it can be a pain to babysit these things
[00:34] Yeah I find it oddly therapeutic to watch them tick over sometimes
[00:34] it's more a "it'd be good if you knew when a job was going to be big and came by every day or so to check that it's not got stuck somewhere irrelevant"
[00:35] I usually only grab small sites. Its when I put an explain next to it that I had a reason more than oooh that site has pretty colours
[00:35] aye
[00:35] or if you google the URL you will find its being shut down. Or its a thrown together spam site designed to collect ad revenue
[00:36] bye
[00:36] *** flashfire has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[00:36] re archiving POST responses: there's nothing in the WARC spec that helps us out, as far as I can see
[00:36] the request headers will go into the archive just like any other request header/body content
[00:36] but, as i said, the wayback machine has no way to match on that and ensure that you get the right response
[00:37] i'm not sure how much more clear i can be about this
[00:37] yeah ok
[00:37] "For a target-URI of the ‘http’ or ‘https’ schemes, a ‘request’ record block should contain the full HTTP request sent over the network, including headers."
[00:38] so will archivebot actually archive POST requests right now?
[00:39] no
[00:39] you will have to make your own crawler
[00:40] ok...
[00:42] and when you said "create an index into it", is an index a concept that exists already?
[00:42] i'm thinking more of a html file with links or something
[00:42] worry about that once you have the content
[00:45] so if I generated this WARC file, would I be able to upload it somewhere for safe keeping?
[00:45] yep!
[00:45] https://archive.org/upload
[00:46] ok, and then IA would not be able to incorporate it into the wayback machine for now, but maybe in the future they could do that
[00:46] yep!
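The spec text quoted at 00:37 says a `request` record carries the full HTTP request, headers and body included. As a rough illustration only (not how ArchiveBot or any particular crawler does it), here is a stdlib-only Python sketch that wraps a raw POST request in a WARC/1.0 `request` record. The target URL, form body and the helper name `warc_request_record` are hypothetical placeholders.

```python
import uuid
from datetime import datetime, timezone


def warc_request_record(target_uri: str, http_request: bytes) -> bytes:
    """Wrap a raw HTTP request (headers + body) in a WARC/1.0 'request' record.

    Sketch only: a real crawler would also write the matching 'response'
    record and link the two with WARC-Concurrent-To.
    """
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: request",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=request",
        f"Content-Length: {len(http_request)}",  # length of the record block
    ]
    # Header fields, a blank line, the block, then two CRLFs close the record.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + http_request + b"\r\n\r\n"


# Hypothetical POST to a hypothetical search endpoint; the body is kept verbatim.
body = b"q=halo&page=2"
raw = (
    b"POST /search HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    + f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    + body
)
record = warc_request_record("http://example.com/search", raw)
```

The POST body is preserved in the WARC, but, as discussed in the channel, the Wayback Machine looks captures up by URL, so two different POST bodies sent to `/search` would not be distinguishable on replay.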
[00:47] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[00:47] sounds like a plan
[00:50] there's a revision history, which you're not able to view when logged out
[00:51] but it would be cool to have it
[00:52] account creation is not restricted, you just need any user account, but I guess the usual convention is not to archive content that's protected in that way?
[00:53] eh, if you can freely register an account then you should do that
[00:53] it starts getting less clear if you have to get manually approved
[00:54] you can freely register
[00:54] go for it
[03:07] *** archodg__ has joined #archiveteam-bs
[03:09] *** odemg has quit IRC (Read error: Operation timed out)
[03:12] *** archodg_ has quit IRC (Read error: Operation timed out)
[03:22] *** odemg has joined #archiveteam-bs
[03:37] *** flashfire has joined #archiveteam-bs
[03:59] *** Dimtree has quit IRC (Read error: Operation timed out)
[04:01] astrid why are you still online werent you going to have a nap?
[04:08] *** Meroje has quit IRC (Ping timeout: 260 seconds)
[04:23] *** Dimtree has joined #archiveteam-bs
[04:25] *** Meroje has joined #archiveteam-bs
[04:56] *** flashfire has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[05:21] *** wp494 has quit IRC (Read error: Operation timed out)
[05:25] *** wp494 has joined #archiveteam-bs
[06:24] *** Sk2d has joined #archiveteam-bs
[06:26] *** Sk1d has quit IRC (Read error: Operation timed out)
[06:26] *** Sk2d is now known as Sk1d
[06:39] *** schbirid has joined #archiveteam-bs
[07:11] *** godane has quit IRC (Read error: Operation timed out)
[07:21] The Bungie Halo forums grab finished around midnight UTC. 271 GiB in total now. It would be around 100 GiB less if I hadn't grabbed those threads thousands of times. Oh well...
[07:21] I'll look into grabbing the user profile pages later.
[07:24] JAA: chromebot dedups WARCs before uploading them. You might want to look into that. https://github.com/PromyLOPh/crocoite/blob/master/crocoite/tools.py#L29
[07:25] PurpleSym: Yeah, I'll look into that as well. Thanks.
[07:46] *** schbirid has quit IRC (Quit: Leaving)
[07:59] *** ta9le has joined #archiveteam-bs
[08:35] *** BlueMax has quit IRC (Leaving)
[08:43] *** junknickf has joined #archiveteam-bs
[08:48] *** junknickf has quit IRC (Quit: Page closed)
[08:55] *** horkermon has joined #archiveteam-bs
[09:06] *** m007a83_ has joined #archiveteam-bs
[09:08] *** m007a83__ has joined #archiveteam-bs
[09:09] *** m007a83 has quit IRC (Read error: Operation timed out)
[09:13] *** m007a83_ has quit IRC (Read error: Operation timed out)
[10:28] *** godane has joined #archiveteam-bs
[10:52] so i found something interesting: http://www2.boxoffice.com/the_vault/page_thumbnails?issue_id=2000-11-1
[10:53] bad news is there scans look like shit
[10:54] the text for the most part is readable but images just ugly: http://www2.boxoffice.com/the_vault/issue_page?issue_id=2000-11-1&page_no=5#page_start
[11:05] Looks like DJVU compression
[11:36] *** Valentine has quit IRC (Quit: Addio, adieu, adios, aloha, arrivederci, auf Wiedersehen, au revoir, bye, bye-bye, cheerio, cheers, farewell, good)
[11:43] how do you fix DJVU compression in those images?
[11:45] You can't, it's lossy
[11:59] so i found new website looking thur boxoffice magazine
[11:59] called yumpu.com
[11:59] i think its based in Germany but not 100% sure
[12:00] anyways copys of boxoffice magazines was there too but still with bad compressing
[12:11] *** Mateon1 has quit IRC (Read error: Operation timed out)
[12:16] *** Valentine has joined #archiveteam-bs
[12:48] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[12:52] *** m007a83_ has joined #archiveteam-bs
[12:56] *** m007a83__ has quit IRC (Ping timeout: 252 seconds)
[13:20] *** m007a83_ is now known as m007a83
[13:21] *** m007a83 has quit IRC (Quit: Leaving)
[13:21] *** m007a83 has joined #archiveteam-bs
[13:27] evening 2005 pdfs looks like shit: http://www2.boxoffice.com/the_vault/issue_pages?issue_id=2005-1-1
[13:27] *even
[13:27] i'm off the bed
[15:16] *** SilSte has quit IRC (Read error: Operation timed out)
[15:17] *** Sk2d has joined #archiveteam-bs
[15:17] *** Sk1d has quit IRC (Read error: Operation timed out)
[15:17] *** Sk2d is now known as Sk1d
[15:23] *** SilSte has joined #archiveteam-bs
[15:38] *** ta9le has joined #archiveteam-bs
[16:11] *** schbirid has joined #archiveteam-bs
[16:12] *** rbraun has quit IRC (Read error: Operation timed out)
[16:14] *** rbraun has joined #archiveteam-bs
[16:23] Muad-Dib: You were referring to Halo 2 stats yesterday, which were supposedly not grabbed as part of the project a few years ago. Where can I find those stats at all? The current halo.bungie.net website doesn't seem to have a section for them, and I haven't found any links on user profile pages either.
[16:25] By the way, I found 296k members during the thread retrieval. I'm setting up a grab for the profile pages and stats plus an extraction of groups currently, to be started later today.
[16:25] JAA: that's correct, the buttons are gone but theyre still there, hold on
[16:27] You have to get to them through the search function
[16:27] Oh yeah, when you access a user's Halo 3 stats page, you get a link to the Halo 2 stats.
[16:27] you look for a gamertag that can't be found for halo 3 or reach, then select Halo 2 through a dropdown menu
[16:28] JAA: that too
[16:28] I put some of that information on the wiki back in 14
[16:29] Yeah, I should read the wiki more often.
[16:30] the game id's seem to be sequential, following the halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid= pattern
[16:30] *** Darkstar has quit IRC (Ping timeout: 1212 seconds)
[16:30] Muad-Dib: Hmm, which page would that be? https://archiveteam.org/index.php?title=Halo has very little details.
[16:31] problem: the lowest number I've found was around 6060, the highest somewhere in the 803 million ...
[16:31] JAA: that's all, it wasn't much
[16:32] Ah ok. Yeah, 800 million requests aren't going to happen.
[16:33] yup, that's the problem
[16:34] maybe grab the first couple thousand or something
[16:34] they start 2 days before the release date http://halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid=6066
[16:34] first and last couple thousand
[16:36] because both those groups are interesting in their own way, birth/death of halo 2 multiplayer etc.
[16:37] the 800 million figure does seem plausible, considering bungie mentioned 500 million a few years before shutdown
[16:39] Damn, the numbers for Halo 3 are much bigger. I've seen over 1.9 billion already.
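Since the Halo 2 game IDs appear to be sequential (16:30), a "first and last couple thousand" grab reduces to generating two ranges of URLs. A minimal Python sketch, assuming the IDs really are contiguous integers (observed in the channel, not confirmed by Bungie). `FIRST_ID` is the 6066 from 16:34; `LAST_ID` is a hypothetical placeholder, because the log only says the highest ID is "somewhere in the 803 million". The helper `edge_urls` is likewise hypothetical.

```python
# Sketch: URLs for the first and last N Halo 2 game IDs.
BASE = "http://halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid="
FIRST_ID = 6066          # earliest game seen, two days before release
LAST_ID = 803_000_000    # hypothetical placeholder for the newest game ID


def edge_urls(first: int, last: int, n: int) -> list[str]:
    """Return URLs for the first n and last n IDs in [first, last], without duplicates."""
    if last - first + 1 <= 2 * n:
        # The two edges overlap, so just take the whole range once.
        ids = list(range(first, last + 1))
    else:
        ids = list(range(first, first + n)) + list(range(last - n + 1, last + 1))
    return [f"{BASE}{i}" for i in ids]


urls = edge_urls(FIRST_ID, LAST_ID, 2000)
print(len(urls), urls[0], urls[-1])
```

The resulting list (4,000 URLs for n = 2000) could be written to a text file and fed to whatever tool does the fetching. The overlap check keeps short ranges from producing duplicate URLs.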
[16:41] :/
[16:44] sparkle of hope: it seems all information contained in the tables of a game details screen is present without any async js bullshit needing to happen to retrieve it, the only thing the js on the page seems to mostly do is toggle visibility attribute of the tables
[16:44] toggle the*
[16:45] so at least stuff can be recovered from non-js-enabled grabs
[16:45] maybe I'll throw the first and last few thousand games into archivebot
[16:50] except for the "rich" game statistics/game viewer, that would require figuring out the game viewer http://halo.bungie.net/Stats/Halo2WebMaps/richgame.aspx?g=800000000 -- http://halo.bungie.net/Stats/Halo2WebMaps/halo2webmap.ashx?g=800000000&mn=0&mx=571&v=85&zs=1
[16:51] *** Meroje has quit IRC (Quit: bye!)
[16:52] *** Meroje has joined #archiveteam-bs
[16:54] *** Darkstar has joined #archiveteam-bs
[16:58] jesus christ, so I thought it would be manageable to grab the games from release to the end of 2004, since it was released in november... the last gameid from 2004 is 43103253
[16:58] over 43 million games within the first two months
[16:58] from 6066 to 43103253
[16:59] When are they shutting down exactly again?
[16:59] 28th
[16:59] looking up the PDT time now
[17:00] can't we get an insider to help us again?
[17:02] That would be nice, yeah.
[17:02] JAA: "These changes go into effect on June 28 at 10 a.m. PDT. If you need to save anything, get it done before then." https://www.bungie.net/en/Explore/Detail/News/46965
[17:03] they name someone in the news posting
[17:03] So that's 2018-06-28 17:00 UTC. Thanks.
[17:03] np
[17:03] "Resident Archivist Roger Wolfson will explain what's changing and how it may affect you."
[17:04] It looks like our 2014/15 project was only about the Halo 3 files, i.e. the stuff listed on http://halo.bungie.net/online/default.aspx. Is that correct?
[17:05] (That's based on looking at the halo-grab and halo-items repositories on GitHub.)
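Two numbers from this stretch are easy to double-check: the 17:03 conversion of the announced 10 a.m. PDT cutoff to UTC, and the size of the 2004 game ID range from 16:58. A small Python sketch; it uses a fixed UTC-7 offset for PDT, which is an assumption that holds for late June (daylight saving time in effect) and avoids depending on the system time zone database.

```python
from datetime import datetime, timedelta, timezone

# PDT = UTC-7 (fixed offset; valid for the June 28 date in question).
PDT = timezone(timedelta(hours=-7), "PDT")

shutdown = datetime(2018, 6, 28, 10, 0, tzinfo=PDT)
shutdown_utc = shutdown.astimezone(timezone.utc)
print(shutdown_utc.isoformat())

# Game IDs from the first Halo 2 game seen to the last one of 2004, inclusive.
first_id, last_2004_id = 6066, 43_103_253
ids_in_2004 = last_2004_id - first_id + 1
print(ids_in_2004)
```

This gives 2018-06-28 17:00 UTC, matching the figure in the log, and 43,097,188 IDs. That is the number of stats pages a complete release-to-end-of-2004 grab would have to request; the number of actual games could be lower if the sequence has gaps.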
[17:05] arkiver did the code, I believe
[17:06] Yes, looks like it.
[17:07] I pitched the idea about the project because I cared about the material there and also knew Jason enjoyed playing it ;)
[17:07] https://web.archive.org/web/20120606154827/https://www.bungie.net/News/content.aspx?cid=18094
[17:09] that Wolfson guy got hired as "Server Software Development Lead", I'll just assume he also has weight on the administration side of things nowadays
[17:10] If we want an insider, he'd probably be able to help us out
[17:11] question is, would he?
[17:11] (looking people up like this makes me feel like such a creep)
[17:12] anyway, dinner before monologues, brb
[17:12] Might be worth contacting him. I doubt it'd lead to an accelerated shutdown, so it couldn't hurt, right?
[17:18] What about Bungie Pro Video http://halo.bungie.net/Projects/BungiePro/default.aspx ? They'll also purge those videos hosted there. Are they publicly available, and do we know the URL format?
[17:27] *** Darkstar has quit IRC (Ping timeout: 633 seconds)
[17:39] *** Darkstar has joined #archiveteam-bs
[17:43] *** jschwart has joined #archiveteam-bs
[17:47] *** SilSte has quit IRC (Read error: Connection reset by peer)
[17:47] *** SilSte has joined #archiveteam-bs
[17:50] http://halo.bungie.net/News/content.aspx?type=topnews&cid=32028 "Bungie will preserve all existing historical Halo data on Bungie.net for as long as the Internet and Bungie's data storage systems remain functional."
[17:52] https://www.youtube.com/watch?v=Xr9Oubxw1gA
[17:57] That didn't age well.
[18:06] we could mention it to them ;)
[18:10] *** Darkstar has quit IRC (Ping timeout: 246 seconds)
[18:22] *** Darkstar has joined #archiveteam-bs
[18:44] *** Gfy_ is now known as Gfy
[18:51] I probably shouldn't write that e-mail though, you might've noticed it would become a rambling shitfest
[18:54] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[19:09] *** schbirid has quit IRC (Quit: Leaving)
[19:26] *** antomati_ has joined #archiveteam-bs
[19:28] *** antomatic has quit IRC (Read error: Operation timed out)
[19:29] *** schbirid has joined #archiveteam-bs
[19:37] *** Mateon1 has joined #archiveteam-bs
[21:00] *** Jens has quit IRC (Remote host closed the connection)
[21:01] *** Jens has joined #archiveteam-bs
[21:21] JAA: do you know someone who's tactful at writing those mails?
[21:23] I'm going to bed and have a pretty busy schedule the coming days, so I won't be able to help out as much as I want to
[21:24] I've got 600GB or so of stuff from the forums so far
[21:32] *** jschwart has quit IRC (Quit: Konversation terminated!)
[21:39] *** Darkstar has quit IRC (Remote host closed the connection)
[21:44] *** Darkstar has joined #archiveteam-bs
[21:48] *** horker has joined #archiveteam-bs
[21:51] *** horkermon has quit IRC (Read error: Operation timed out)
[23:08] *** BlueMax has joined #archiveteam-bs
[23:41] *** flashfire has joined #archiveteam-bs
[23:42] If you need storage for the next few months I have a google education account at the moment
[23:43] *** horker has quit IRC (Quit: Leaving)
[23:45] *** flashfire has quit IRC (Client Quit)
[23:53] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[23:59] *** Lord_Nigh has joined #archiveteam-bs