[00:11] *** ta9le has joined #archiveteam-bs
[00:16] *** wacky_ has quit IRC (Ping timeout: 260 seconds)
[00:45] *** Stilett0 has quit IRC (Ping timeout: 252 seconds)
[00:54] *** Darkstar has quit IRC (Ping timeout: 632 seconds)
[01:08] *** Darkstar has joined #archiveteam-bs
[01:13] *** C4K3 has quit IRC (Read error: Operation timed out)
[01:28] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[01:37] *** Dimtree has quit IRC (Read error: Connection reset by peer)
[01:41] *** archodg__ has quit IRC (Read error: Operation timed out)
[01:44] *** Dimtree has joined #archiveteam-bs
[02:22] *** wacky has joined #archiveteam-bs
[02:31] *** m007a83 has joined #archiveteam-bs
[02:45] *** C4K3 has joined #archiveteam-bs
[03:03] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[03:06] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[03:06] *** wp494 has joined #archiveteam-bs
[03:09] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:22] *** odemg has joined #archiveteam-bs
[03:34] *** flashfire has joined #archiveteam-bs
[04:08] *** K4k_ has quit IRC (Read error: Connection reset by peer)
[04:34] *** flashfire has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[04:38] *** BlueMax has quit IRC (Leaving)
[05:04] *** BlueMax has joined #archiveteam-bs
[05:17] *** Zebranky has quit IRC (Read error: Operation timed out)
[05:17] *** JAA has quit IRC (Read error: Operation timed out)
[05:18] *** tapedrive has quit IRC (Read error: Operation timed out)
[05:18] *** Dimtree has quit IRC (Read error: Operation timed out)
[05:18] *** squires has quit IRC (Read error: Operation timed out)
[05:18] *** beardicus has quit IRC (Read error: Operation timed out)
[05:18] *** BlueMaxim has joined #archiveteam-bs
[05:26] *** logchfoo3 starts logging #archiveteam-bs at Wed Jun 27 05:26:44 2018
[05:26] *** logchfoo3 has joined #archiveteam-bs
[05:27] *** C4K3 has quit IRC (Ping timeout: 601 seconds)
[05:27] *** Gfy has joined #archiveteam-bs
[05:28] *** REiN^ has quit IRC (Ping timeout: 602 seconds)
[05:30] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[05:33] *** Lord_Nigh has joined #archiveteam-bs
[05:58] *** Dimtree has joined #archiveteam-bs
[06:14] *** beardicus has joined #archiveteam-bs
[06:14] *** REiN^ has joined #archiveteam-bs
[06:15] *** rbraun has joined #archiveteam-bs
[06:17] *** squires has joined #archiveteam-bs
[06:19] *** tyzoid has joined #archiveteam-bs
[06:32] *** PotcFdk has joined #archiveteam-bs
[06:33] *** phillipsj has quit IRC (Ping timeout: 1212 seconds)
[06:41] *** ItsYoda has quit IRC (Ping timeout: 260 seconds)
[06:45] *** sep332 has joined #archiveteam-bs
[06:48] *** C4K3 has joined #archiveteam-bs
[06:50] *** ItsYoda has joined #archiveteam-bs
[07:23] *** Jusque has quit IRC (Ping timeout: 260 seconds)
[07:23] *** Jusque has joined #archiveteam-bs
[09:38] *** ta9le has joined #archiveteam-bs
[10:08] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:08] *** Mateon1 has joined #archiveteam-bs
[10:43] [22:41:14] I'm not sure if 6k0ssd31g4qrmlp05pc2kgdx5 will complete or if we want it to
[10:43] [22:41:29] It's got a lot of large files, most of which are TV shows, movies or software
[10:43] [22:41:44] If we were to archive them, it'd be better to have them made as individual items
[10:54] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:16] JAA: If someone were to be successful in getting the Halo data from Bungie, who will handle possible pre-IA storage?
[12:36] Any idea how much data we’re talking about, Muad-Dib ?
[12:36] (Spread some ops please)
[12:47] the statistics themselves are mostly just tables of numbers for each game, but there are about 800M games played for Halo 2 and from what I hear WAY more for Halo 3 and up
[12:47] PurpleSym: ^
[12:48] and we do get more multimedia content I'd push up the estimate to tens, if not hundreds of TB's
[12:49] I can't remember the total file size and scope of the Halo 3 multimedia grab we did, but that should be a good basis for an estimate
[12:51] *** Muad-Dib sets mode: +o arkiver
[12:51] *** Muad-Dib sets mode: +oo HCross SketchCow
[12:52] *** Muad-Dib sets mode: +o PurpleSym
[12:52] JAA: I still haven't been able to find contact information for Wolfson, only an earthlink email address from a very old personal website of his that I'll assume not to be alive anymore
[12:53] *** svchfoo1 has joined #archiveteam-bs
[12:53] *** svchfoo3 has joined #archiveteam-bs
[12:53] *** PurpleSym sets mode: +oo svchfoo1 svchfoo3
[12:59] PurpleSym: In the less ideal case, grabbing individual games' stats pages with Wpull/archivebot is currently giving me 4000 games per gigabyte (there's a "small" job for the last ~138k games running on archivebot right now)
[13:23] Maybe we can guess his bungie email address. first.last@bungie.net or something similar?
[13:28] I can provide a couple of TB of space to it, But it'd have to be WARCd and then I can upload to IA. But it'd be best as a warrior project
[13:28] With numerous sync targets to spread the love
[13:51] Igloo: there's probably no time to make it a Warrior project, the detailed stats are going tomorrow 17:00 UTC
[13:52] that's we're considering asking Bungie
[13:52] that's why we're*
[14:04] *** Aoede has quit IRC (Quit: ZNC - https://znc.in)
[14:07] *** Aoede has joined #archiveteam-bs
[14:23] *** icedice has joined #archiveteam-bs
[14:27] *** icedice has quit IRC (Client Quit)
[14:27] *** icedice has joined #archiveteam-bs
[14:57] *** K4k has joined #archiveteam-bs
[15:12] According to our wiki page, we archived 37 TB of data in 2014/15.
[15:13] The game pages should be smaller than that, but the problem is that we can't really grab 800 million (or more) pages in 26 hours.
[15:15] Muad-Dib: I believe the numbers displayed on the ArchiveBot dashboard are uncompressed, by the way.
[15:15] I.e. the amount of (download?) network traffic, not the size of the archives.
[15:21] SketchCow: i'm uploading some old tapes rips i have not uploaded yet
[15:21] one is NH Unsolved Mysteries from WMUR
[15:22] i can't tell if its from 1996 or 1997 so i put 199x in file name
[15:33] *** icedice has quit IRC (Ping timeout: 268 seconds)
[16:47] uh, trying to "$ gpg --verify archiveteam-warrior-v3-20171013.ova.asc archiveteam-warrior-v3-20171013.ova" i dont have the public key and cannot find where to get it. chfoo
[16:57] WHY HELLO
[16:58] JAA: if we need a project, let me know
[17:07] wasn't twitch like 2014/15?
[17:08] oh, context is key, ignore me
[17:10] *** schbirid has quit IRC (Quit: Leaving)
[17:17] looks like that salem lot incomplete tbs recording is from late October 1993
[17:17] it maybe about october 25 or 26
[17:17] cause there was a tbs news report in the commercial breaks
[17:38] *** archodg has joined #archiveteam-bs
[17:39] *** schbirid has joined #archiveteam-bs
[17:51] schbirid: the public key id is C718CE578A321F2D. it should be published on most public key servers.
[17:57] cheers
[17:57] gpg had suggested B251CF4887C1510C
[17:57] maybe add the id to the readme?: )
[19:04] *** jschwart has joined #archiveteam-bs
[19:23] *** icedice has joined #archiveteam-bs
[19:38] SketchCow: i noticed your post about BBC Computer Literacy Project
[19:39] i couldn't use youtube-dl to download urls directly but i made a script grab the m3u8 using youtube-dl
[19:39] Let's... hold off
[19:39] Like, maybe not reward the BBC for uploading a series by immediately mirroring it
[19:40] its for my personal collection
[19:40] its going to be a few months anyways before i upload it to you guys
[19:41] i'm most likely going to give a copy to myspleen people first
[19:41] Ok, but then you're calling me out
[19:41] And I come in to see if I'm needed
[19:41] And I'm not
[19:41] http://fos.textfiles.com/RECOGNIZER/
[19:41] I'm over in this page and related ones trying to sort collections
[19:41] It's hardish make-you-crazy work
[19:42] http://fos.textfiles.com/RECOGNIZER/type.html
[19:42] i'm just tell you cause it was not a simple youtube-dl $url
[19:43] but was a simple youtube-dl --hls-prefer-native --fixup never https://computer-literacy-project.pilots.bbcconnectedstudio.co.uk/asset/video/$(basename $url)/index.m3u8
[21:01] *** wp494 has quit IRC (Ping timeout: 260 seconds)
[21:01] *** wp494 has joined #archiveteam-bs
[21:08] *** schbirid has quit IRC (Quit: Leaving)
[21:23] *** verifiedj has joined #archiveteam-bs
[21:34] *** verifiedj has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[21:51] *** verifiedj has joined #archiveteam-bs
[22:04] atluxity: https://verifiedjoseph.com/archiveteam/mocpages.txt
[22:05] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:07] moc pages going down?
[22:07] *** icedice has quit IRC (Quit: Leaving)
[22:08] No, but its just come back online after 8 days of being offline with a database error, it has downtime like this on an all too frequent basis.
[22:14] And the guy who runs it (Sean Kenney) never tells the community what's going on, so saving it maybe a good idea.
[22:23] arkiver: Well, the deadline's in less than 19 hours. It would be great to do a warrior project, but I doubt we can do it that quickly and we probably won't get all of it anyway.
[22:25] It's worth mentioning that it looks like they won't delete all content. If I read the announcement correctly, they'll purge some data about the games (e.g. medals achieved within a particular game) but keep at least the basic stats.
[22:26] I've started a grab for some Halo 2 game pages, namely the oldest million (1-1M) and the newest million (802138050 to 803138049). Let's see how this goes.
[22:28] we can try to do a project
[22:28] I see it's just IDs?
[22:28] Yup, e.g. http://halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid=803138049
[22:28] (That's the last existing ID as far as I can see.)
[22:29] and that one image of the playing field is specific to the game?
[22:29] There are similar pages for the other Halo games, some with significantly more IDs (at least 1.9 billion for Halo 3 for example).
[22:29] As in we should grab that page and the image of the playing field
[22:29] Well we can do 1.9 billion :P, will just take a little bit of time
[22:31] I'm doing 23k requests per minute at the moment. Two months at that speed for 2 billion.
[22:31] Did you find any limits?
[22:31] I believe we can do a few hundred thousand per minute
[22:32] also nvm about that image, didn't look good
[22:32] so only the html page, nothing more
[22:32] Haven't come across any limits so far.
[22:33] 803138049 is latest?
[22:33] For Halo 2, yes.
[22:33] *** DragonMon has quit IRC (Read error: Connection reset by peer)
[22:33] we have to do ~700,000 URLs per minutes for this one
[22:34] I'm thinking 3000 URLs/item?
[22:34] or 4000 or something
[22:34] just enough so we will not be limited by the tracker handling not enough items per minute
[22:35] They seem to be using CloudFlare's CDN. So I'm sure they could handle it; the question is if CF lets us.
[22:36] awesome
[22:36] this will be fun
[22:36] let's get people :D
[22:36] odemg ^
[22:38] According to http://halo.bungie.net/images/News/Inline12/sunset/halo_mulitplayer_stats_sm.jpg, there were roughly 21 billion games in total across the different Halo games.
[22:38] are you going to try and archive every halo game ever played?
[22:38] that image doesn't say 800 million halo 2 games
[22:38] But that graphic also mentions a number of 5.4 billion games for Halo 2, which doesn't match the maximum ID of 803 million.
[22:39] in god's name, why
[22:39] while 803138049 is the latest (?)
[22:39] yah
[22:39] yeah*
[22:39] if you do that i'm quitting archiveteam
[22:41] i'm not kidding
[22:41] this is absurd and it's a sign that you all have lost your way
[22:42] total size shouldn't be too big
[22:47] kind of an overreaction there
[22:47] About 1.1 GiB per 100k games, it seems.
[22:49] it's not the space
[22:49] it's the mental energy
[22:49] i joined archiveteam because geocities was going away
[22:49] no fucking way is geocities on the same level as a list of all the halo games
[22:53] *** DragonMon has joined #archiveteam-bs
[22:53] I don't disagree with you there, and that's why I invested significantly more time and energy into archiving the Halo discussion forums. But I still think it's also worth preserving at least some information about the actual games for such an influential video game.
[23:00] * arkiver agrees with JAA
[23:08] astrid: ^ ?
[23:22] *** verifiedj has quit IRC (http://www.mibbit.com ajax IRC Client)
[23:46] *** BlueMax has joined #archiveteam-bs
[23:48] For the record, my grab of the Halo forums covered 3036432 threads. According to the homepage, there are 3037088 threads in total. So I should have pretty much everything; not sure where those 656 remaining ones are, or maybe the counters on the homepage are off, but that's only 0.02 %.
[23:51] The game grab has slowed down to about 10k per minute, and the WARCs are now about 1.6 GiB per 100k games. I didn't look into it, but I suspect that the old ones were "purged" and thus the pages are much smaller than for newer, not-yet-purged games. The slowdown/size increase occurred around the time it completed the 1 million oldest games and started with the 1 million most recent ones.
[23:54] JAA: yeah, that seemed to happen to my small archivebot runs as well
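
The crawl rates and sizes quoted in the log above can be sanity-checked with a little arithmetic. The following is a minimal Python sketch, not anything the channel actually ran. The inputs (23k requests per minute, roughly 19 hours to the deadline, a maximum Halo 2 game ID of 803138049, and 1.1 to 1.6 GiB of WARC per 100k games) are taken from the conversation; the function names are hypothetical, and the sketch assumes one request per game ID at a constant rate.

```python
# Back-of-envelope estimates for the Halo 2 game-page grab discussed in the log.
# All inputs are figures quoted in the conversation; the helper names are
# illustrative only and assume one request per game ID at a constant rate.

HALO2_MAX_ID = 803_138_049  # last existing Halo 2 game ID according to the log


def days_needed(total_pages: int, pages_per_minute: float) -> float:
    """Days required to fetch total_pages at a constant request rate."""
    return total_pages / pages_per_minute / (60 * 24)


def rate_needed(total_pages: int, hours_left: float) -> float:
    """Pages per minute required to finish within hours_left."""
    return total_pages / (hours_left * 60)


def warc_size_tib(total_pages: int, gib_per_100k: float) -> float:
    """Estimated WARC size in TiB, given GiB per 100k game pages."""
    return total_pages / 100_000 * gib_per_100k / 1024


if __name__ == "__main__":
    days = days_needed(2_000_000_000, 23_000)
    print(f"2 billion pages at 23k/min: {days:.0f} days")
    rate = rate_needed(HALO2_MAX_ID, 19)
    print(f"Rate for all Halo 2 IDs in 19 h: {rate:,.0f} pages/min")
    for gib in (1.1, 1.6):
        size = warc_size_tib(HALO2_MAX_ID, gib)
        print(f"All Halo 2 IDs at {gib} GiB per 100k: {size:.1f} TiB")
```

Under these assumptions the sketch agrees with the figures in the conversation: about two months for 2 billion pages at 23k per minute, and on the order of 700,000 pages per minute to cover every Halo 2 ID before the deadline.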