#archiveteam-bs 2018-06-28,Thu

Time Nickname Message
00:00 SketchCow Hi hello
00:02 SketchCow So, Halo games.
00:03 SketchCow How much disk space are we talking
00:05 JAA SketchCow: Between 8.6 and 12.5 TiB for all Halo 2 games, extrapolating from the numbers above and the total count of ~800 million games in the database.
00:05 JAA This is only the HTML page with some statistics and doesn't include the viewer thingy on the website, I think.
00:06 SketchCow Put it somewhere other than archive.org. Put some subset on archive.org, like, 50gb
00:06 JAA There's also the Bungie Pro Video service, which is hosting numerous video renders of game recordings or whatever. Not sure if we can even archive that at all though.
00:07 SketchCow Get some hard drives. Put it on them.
00:07 JAA I've grabbed the oldest million games and about half a million of the most recent ones. That's about 21 GiB.
00:08 JAA I might grab another million of random IDs inbetween or something like that.
00:08 SketchCow Astrid's right that it's not a great use of time, and additionally, anything over a million sample is quite enough unless you're insane.
00:08 SketchCow I realize in the future we'll have petabyte drives and I'll seem like a moron
00:08 SketchCow But really, at some point, let a specialized game archive take that over
00:09 SketchCow You'll be a hero, caloo calay
00:09 SketchCow Also, bear in mind you're talking to someone collaborating with folks to save thousands of WinAmp skins
00:09 SketchCow AND I just wrote a fucking screenshotter for winamp skins baby
00:09 JAA Yeah, and also this "1.6 GiB per 100k games" is a massive waste of storage as well. It could be stored much, much more compact in a database instead of those HTML pages.
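The 8.6 to 12.5 TiB estimate above is a straight linear extrapolation from the sampled pages. A minimal sketch of that arithmetic, assuming page sizes stay roughly constant across the ID range and using only numbers quoted in this log (about 21 GiB for the first ~1.5 million games, 1.6 GiB per 100k games, ~800 million games in total):

```python
# Linear storage extrapolation from sampled page sizes (assumes constant page size).
GIB_PER_TIB = 1024


def extrapolate_tib(gib_per_100k: float, total_games: int) -> float:
    """Scale an observed GiB-per-100k-games rate to the whole ID range, in TiB."""
    return gib_per_100k * (total_games / 100_000) / GIB_PER_TIB


# ~1.5 million games (oldest 1M + newest ~0.5M) came to about 21 GiB
observed_rate = 21 / (1_500_000 / 100_000)  # GiB per 100k games

for rate in (observed_rate, 1.6):
    print(f"{rate:.2f} GiB/100k games -> {extrapolate_tib(rate, 800_000_000):.1f} TiB")
```

At 1.6 GiB per 100k games the result is exactly 12.5 TiB, matching the upper bound; the 8.6 TiB lower bound corresponds to roughly 1.1 GiB per 100k games.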
00:10 SketchCow Like I said, take a little sample, 50gb, whatever you can shove it. Make it one object, describe heavily
00:10 JAA Yep, will do.
00:10 SketchCow The lizard race can muse about in 2402
00:10 SketchCow https://archive.org/details/winampskins?and%5B%5D=identifier%3Awinampskin_2*&sin=
00:10 SketchCow Behold
00:12 JAA I'm also setting up a grab for the user profile pages of people who posted in the forums. There are also "groups" (~ clans?) on the site, and I might grab those as well if there aren't too many and there's still time.
00:12 JAA It appears that there are slightly over 12 million accounts on the site, and a bit under 300k of them have posted in the forums.
00:23 Muad-Dib SketchCow: "unless you're insane"? Aren't we all at least *slightly* deranged for being here? ;)
00:26 Muad-Dib I'm gonna hit the bed. Keep being awesome, JAA
00:26 Muad-Dib :)
00:26 JAA I'll try. :-)
00:26 JAA Good night.
00:26 Muad-Dib nn
00:28 arkiver project paused again then
00:28 arkiver or we can start it and just get a little bit
00:28 arkiver all items are in, so it will be random packs of 4000 IDs
00:35 Darkstar has quit IRC (Ping timeout: 1212 seconds)
00:40 Darkstar has joined #archiveteam-bs
01:30 Darkstar has quit IRC (Ping timeout: 633 seconds)
01:41 Lord_Nigh has quit IRC (Ping timeout: 268 seconds)
01:41 Darkstar has joined #archiveteam-bs
02:05 ta9le has quit IRC (Quit: Connection closed for inactivity)
02:22 Darkstar has quit IRC (Ping timeout: 480 seconds)
02:39 Darkstar has joined #archiveteam-bs
03:02 K4k has quit IRC (Read error: Connection reset by peer)
03:05 archodg_ has joined #archiveteam-bs
03:08 Darkstar has quit IRC (Ping timeout: 268 seconds)
03:08 archodg has quit IRC (Ping timeout: 252 seconds)
03:08 odemg has quit IRC (Ping timeout: 260 seconds)
03:18 Darkstar has joined #archiveteam-bs
03:21 odemg has joined #archiveteam-bs
03:34 Lord_Nigh has joined #archiveteam-bs
03:53 Lord_Nigh has quit IRC (Ping timeout: 268 seconds)
03:59 Lord_Nigh has joined #archiveteam-bs
03:59 Darkstar has quit IRC (Ping timeout: 260 seconds)
04:09 Darkstar has joined #archiveteam-bs
04:51 Darkstar has quit IRC (Ping timeout: 480 seconds)
04:55 Darkstar has joined #archiveteam-bs
04:55 godane SketchCow: https://archive.org/details/disney-adventures-v7i4
04:55 godane SketchCow: https://archive.org/details/disney-adventures-v9i11
04:56 godane v6i7 issue is also being uploaded now
05:07 fenn has quit IRC (Ping timeout: 260 seconds)
05:15 fenn has joined #archiveteam-bs
05:43 Darkstar has quit IRC (Ping timeout: 480 seconds)
05:43 godane SketchCow: https://archive.org/details/disney-adventures-v6i7
05:45 Darkstar has joined #archiveteam-bs
05:46 swebb has quit IRC (Read error: Operation timed out)
06:28 Darkstar has quit IRC (Ping timeout: 480 seconds)
06:44 Darkstar has joined #archiveteam-bs
07:13 schbirid has joined #archiveteam-bs
07:56 schbirid has quit IRC (Quit: Leaving)
08:04 svchfoo3 has quit IRC (Read error: Operation timed out)
08:04 svchfoo3 has joined #archiveteam-bs
08:05 svchfoo1 sets mode: +o svchfoo3
08:10 Mateon1 has quit IRC (Ping timeout: 255 seconds)
08:10 Mateon1 has joined #archiveteam-bs
08:33 Aoede has quit IRC (Quit: ZNC - https://znc.in)
08:35 Aoede has joined #archiveteam-bs
09:14 Aoede has quit IRC (Ping timeout: 252 seconds)
09:17 Aoede has joined #archiveteam-bs
09:17 JAA My Bungie profile grab finished around 04:30 UTC and discovered some 15k groups. I'll look into those now.
09:21 JAA I think I'll just throw these into ArchiveBot.
09:28 JAA Done, job 2lefnzv589c6pyid0ik8lyd6i
09:34 Muad-Dib JAA: you grabbed the latest 500k, not 1M games?
09:35 JAA Muad-Dib: 1M oldest and 1M newest Halo 2 games
09:35 Muad-Dib ah, ok
09:35 JAA I'll set up another million randomly scattered through the remaining 801 million IDs, I think.
09:36 JAA 27.3 GiB for those 2 million games, by the way.
09:37 Muad-Dib the last 1M contains all games from 2010, which somehow sounds like a nice thing to have
09:37 JAA Indeed :-)
09:39 Muad-Dib It's cool to see the release-hype in the distribution of the amount of games played, less than 1% of the total amount of games in the last 4 months, compared to 5% (~43 million) within the first 2 months
09:40 Muad-Dib everything but surprising, but still cool to see
09:42 Muad-Dib those random clusters were also a good idea, btw
09:42 Muad-Dib s/were/are/
09:42 JAA I have a list of 731499 player names ("XBL gamer tag"), by the way, which could be used to grab the profile pages for the individual games. This is combined from the forum posters and the ones extracted from those 2 million games.
09:43 JAA Muad-Dib: You think I should do clusters rather than just individual random games? If so, what cluster size do you think would be best?
09:45 Muad-Dib Oh, I thought they were clustered for some reason, unclustered would also be just fine, I guess
10:03 Flashfire has joined #archiveteam-bs
10:04 Flashfire https://shop.velocityfrequentflyer.com/ Wanna grab as much as possible before a redesign?
10:06 Flashfire I can host some if we grab more of the Bungie stuff i have an education googledrive
10:06 Flashfire JAA I can help a bit with Bungie
10:07 JAA Storage isn't really the issue I think. Time is. They're taking it offline in about 7 hours.
10:08 Flashfire Then I say we gungho it and grab as much as possible. But then again thats my answer to a lot of things
10:08 JAA Have a look at the logs of this channel. There was a lot of discussion about it yesterday.
10:08 Flashfire Also can someone please queue the velocity frequent flyers store in the archivebot as I dont have voice and dont think im allowed to lol
10:08 Flashfire Only looked over todays will look now
10:13 Flashfire Wow ok
10:13 JAA Started another grab for 2 million random Halo 2 games from the ID range 1000001 to 802138049.
10:13 Muad-Dib JAA: I think we mostly just want to make sure the games are evenly distributed
10:13 Muad-Dib oh, you just started
10:14 Flashfire why not send it to the archive bot and it grabs as much as possible before the deadline?
10:14 Flashfire or is that just stupidity?
10:14 JAA Muad-Dib: It should be pretty evenly distributed. I used Python's random.randint to generate the IDs, which is supposed to use a uniform distribution.
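`random.randint` draws each ID independently, so 2 million calls over the 801,138,049 IDs from 1000001 to 802138049 are expected to produce roughly 2,500 repeated IDs (the birthday estimate n^2 / 2N). A minimal sketch of drawing without replacement instead; the helper name and seed are illustrative and not taken from JAA's script:

```python
import random
from typing import Optional


def sample_game_ids(lo: int, hi: int, k: int, seed: Optional[int] = None) -> list[int]:
    """Draw k distinct IDs uniformly from lo..hi (inclusive), sorted ascending."""
    rng = random.Random(seed)
    # sample() on a range never materialises the ~801M-element population and
    # never returns the same ID twice, unlike k separate randint() calls.
    return sorted(rng.sample(range(lo, hi + 1), k))


# Small demo; the grab started at 10:13 would correspond to k=2_000_000.
print(sample_game_ids(1_000_001, 802_138_049, 5, seed=2018))
```

The selection is still uniform; the only difference from repeated `randint` calls is that every requested ID is distinct, so 2 million requests yield 2 million different games.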
10:15 Muad-Dib Flashfire: I already did a small batch there, but it's slow
10:15 JAA Flashfire: ArchiveBot is slooooooow. I'm doing ~20k requests per minute.
10:15 Muad-Dib JAA's waay faster ;)
10:15 Muad-Dib that
10:15 Flashfire lol
10:15 Flashfire Ok also muad if you could queue that job in the archivebot I think its a legit grab to do
10:15 Muad-Dib archivebot does have the !yahoo command, but I think it should be renamed !yolo
10:16 Flashfire lol
10:16 JAA Yeah, but even that doesn't necessarily help. It's the HTML parsing which slows everything down.
10:16 Flashfire I mean yahoo is owned by i think its verizon now so there is that
10:16 Muad-Dib yeah, parsing's the hurdle
10:16 JAA I'm not doing any parsing, just string manipulation.
10:17 JAA Which is ugly but works fine for well-constrained projects (i.e. known and stable HTML structure).
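A sketch of the "string manipulation instead of parsing" approach: scan for fixed start and end markers with `str.find` and slice out whatever lies between them, without building a DOM. The marker strings and the sample HTML are hypothetical; JAA's script and the real Bungie markup are not shown in this log.

```python
def extract_between(html: str, start_marker: str, end_marker: str) -> list[str]:
    """Return every substring found between start_marker and the next end_marker.

    No HTML parsing happens here, which is why it is fast, and also why it breaks
    silently if the page layout changes.
    """
    results = []
    pos = 0
    while True:
        start = html.find(start_marker, pos)
        if start == -1:
            break
        start += len(start_marker)
        end = html.find(end_marker, start)
        if end == -1:
            break  # unterminated match: stop rather than guess
        results.append(html[start:end])
        pos = end + len(end_marker)
    return results


# Hypothetical snippet, not the actual Bungie stats markup
page = '<td class="player">Alpha</td><td class="score">12</td><td class="player">Bravo</td>'
print(extract_between(page, '<td class="player">', '</td>'))
```

This only holds up under the condition JAA names: a known and stable HTML structure.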
10:18 JAA Muad-Dib: Do you know how long your ArchiveBot job for the games took? (And how many IDs were that?)
10:18 Flashfire I am sad I dont think anyone will run my aussie job on archivebot
10:19 Muad-Dib JAA: they did more or less the amount of workers per second
10:20 JAA Ah, so something like 100 times slower than my grab.
10:20 ta9le has joined #archiveteam-bs
10:30 Muad-Dib JAA: and they were 6066-10000 and 803000000-803138049
10:35 Flashfire If you have time for any more let me know
10:38 JAA 500k of the 2M random IDs are done. I love how fast this is.
10:38 Flashfire Well if its that fast why not grab more?
10:38 JAA I might. We'll see.
10:38 eientei95 JAA: Is it fast enough for !yahoo?
10:39 JAA eientei95: This isn't ArchiveBot, it's a custom script which is about 100 times faster than ArchiveBot could ever be.
10:39 eientei95 Oo, nice
10:39 Flashfire something that could be incorporated into archivebot?
10:40 JAA Nope, the code is specific to the individual site I'm grabbing.
10:40 Flashfire ah ok
10:40 JAA It may be possible to use it in the warrior though, but it needs a lot more polishing for that.
10:41 Flashfire Am I the only one who finds it oddly theraputic to watch archivebot tick over via the web interface?
10:41 eientei95 What specific things does not do in order to be faster than archivebot?
10:43 JAA eientei95: HTML parsing is the most important one.
10:43 eientei95 Hm
10:44 JAA I'm also using a different network stack than wpull, and the DB is lighter as well (at the cost of some duplication in the archives if I were to grab resources that are shared between different pages).
10:44 Flashfire I GTG
10:44 Flashfire bye
10:44 Flashfire has quit IRC (Quit: Bye)
10:44 JAA And I can run multiple instances of the same thing against one DB, which wpull currently doesn't support.
10:45 JAA So parallelisation across CPU cores is possible.
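Running several copies of a crawler against one database mostly comes down to letting each process claim work atomically. JAA's tool is not shown here, so this is only a sketch of that general pattern using SQLite; the `items` table, its status values and the batch size are assumptions.

```python
import sqlite3


def open_queue(path: str) -> sqlite3.Connection:
    """Open (or create) a shared work queue; we issue BEGIN/COMMIT ourselves."""
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'todo')"  # todo -> claimed -> done
    )
    return conn


def claim(conn: sqlite3.Connection, batch: int = 100) -> list[int]:
    """Atomically move up to `batch` pending IDs to 'claimed' and return them."""
    # BEGIN IMMEDIATE takes SQLite's write lock before the SELECT, so two worker
    # processes sharing the same database file cannot both pick up the same rows.
    conn.execute("BEGIN IMMEDIATE")
    try:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM items WHERE status = 'todo' ORDER BY id LIMIT ?", (batch,))]
        conn.executemany("UPDATE items SET status = 'claimed' WHERE id = ?",
                         [(i,) for i in ids])
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids


def mark_done(conn: sqlite3.Connection, ids: list[int]) -> None:
    conn.executemany("UPDATE items SET status = 'done' WHERE id = ?", [(i,) for i in ids])


# Demo on an in-memory DB; real workers would all open the same file path.
conn = open_queue(":memory:")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 251)])
first, second, third = claim(conn), claim(conn), claim(conn)
mark_done(conn, first)
print(len(first), len(second), len(third))
```

Only the short claim step holds the write lock; the fetching itself happens outside any transaction, which is what lets the workers spread across CPU cores.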
10:45 eientei95 Nice
10:45 BlueMax has quit IRC (Read error: Connection reset by peer)
11:34 Flashfire has joined #archiveteam-bs
11:34 Flashfire How are we going with bungee? JAA
11:39 Flashfire has quit IRC (Ping timeout: 260 seconds)
12:11 JAA The 2 million random Halo 2 game IDs are almost done.
12:18 Muad-Dib JAA: it seems like the full year 2009 is about 6.4 million, would this be desirable?
12:19 Muad-Dib also: nice about the 2M
12:23 JAA Not sure. Jason suggested ~50 GB total yesterday, which is what I have now with those 4 million.
12:24 JAA Also, this is only Halo 2 so far. We should try to get some coverage of the other games as well.
12:27 JAA Can someone try to figure out what the maximum IDs for the other games are? I haven't tried at all, but here are some high IDs I've seen: Halo 3 http://halo.bungie.net/Stats/GameStatsHalo3.aspx?gameid=1910788443 and ODST http://halo.bungie.net/Stats/ODSTg.aspx?gameid=111197785
12:29 JAA Reach: http://halo.bungie.net/Stats/Reach/GameStats.aspx?gameid=972979787
12:34 Muad-Dib JAA: good point about the other games, I'll try to take a look between study breaks
12:37 JAA I discovered something: http://halo.bungie.net/api/odst/ODSTService.svc and http://halo.bungie.net/api/reach/reachapisoap.svc
12:38 JAA Presumably, there's something for Halo 2 and 3 as well, but I haven't found it yet.
12:40 Muad-Dib Highest I just found for halo 3 is http://halo.bungie.net/Stats/GameStatsHalo3.aspx?gameid=1917736471
12:41 Muad-Dib JAA: oh BOY http://halo.bungie.net/api/odst/ODSTService.svc?singleWsdl
12:41 JAA Yep :-)
12:42 Muad-Dib Too bad I really can't stomach any more C# after last semester :")
12:47 PurpleSym How does one call this API?
12:48 JAA Not sure, I've tried a few things but those didn't work. I've never used WSDL before either.
12:56 Muad-Dib I barely know anything about web-API's :/
12:57 Muad-Dib about working with*
12:57 PurpleSym Urgh, it’s SOAP. JAA: Do you have a valid game ID for that API?
12:58 JAA PurpleSym: 1000000 (one million) should work.
12:59 JAA The ArchiveBot job for the groups finished, by the way.
13:00 PurpleSym Nope, internal server error with soappy.
13:11 JAA Could that be related to the URLs in the WSDL being broken? They point to www.bungie.net instead of halo.bungie.net.
13:17 PurpleSym Yeah, could be. Changing the URLs I get SOAPpy.Types.faultType: <Fault a:ActionNotSupported: The message with Action 'http://halo.bungie.net/api/odst/ODSTService/GetGameDetail' cannot be processed at the receiver, due to a ContractFilter mismatch at the EndpointDispatcher. This may be because of either a contract mismatch (mismatched Actions between sender and receiver) or a binding/security mismatch between
13:17 PurpleSym the sender and the receiver. Check that sender and receiver have the same contract and the same binding (including security requirements, e.g. Message, Transport, None).>
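For reference, this is roughly what a hand-built SOAP 1.1 call to the ODST service would look like, with the endpoint host rewritten from www.bungie.net to halo.bungie.net as suggested at 13:11. It is a sketch, not a working client: only the operation name `GetGameDetail` comes from the fault message above, while the namespace, the `SOAPAction` value and the `gameId` parameter name are placeholders that would have to be copied from the `?singleWsdl` document. If the service actually used a SOAP 1.2 binding, a SOAP 1.1 request like this would likely be rejected as a binding mismatch regardless of the action.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit
from xml.sax.saxutils import escape

SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def fix_host(url: str, host: str = "halo.bungie.net") -> str:
    """Swap the host of an endpoint URL copied from the WSDL (www.bungie.net -> halo.bungie.net)."""
    return urlunsplit(urlsplit(url)._replace(netloc=host))


def build_soap11_request(endpoint: str, action: str, namespace: str,
                         operation: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a SOAP 1.1 POST for a single operation."""
    body = "".join(f"<{name}>{escape(str(value))}</{name}>" for name, value in params.items())
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<s:Envelope xmlns:s="{SOAP11_ENV}"><s:Body>'
        f'<{operation} xmlns="{namespace}">{body}</{operation}>'
        "</s:Body></s:Envelope>"
    )
    return urllib.request.Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # SOAP 1.1 carries the action in this header. If it matches no operation
            # in the service contract, WCF answers with an ActionNotSupported fault.
            "SOAPAction": f'"{action}"',
        },
        method="POST",
    )


# Placeholders in WCF's default style (tempuri.org); NOT the real contract values.
req = build_soap11_request(
    fix_host("http://www.bungie.net/api/odst/ODSTService.svc"),
    action="http://tempuri.org/IODSTService/GetGameDetail",
    namespace="http://tempuri.org/",
    operation="GetGameDetail",
    params={"gameId": 1000000},
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

In a WSDL with a SOAP 1.1 binding, the exact value to send is the `soapAction` attribute on each operation's `soap:operation` element.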
14:20 swebb has joined #archiveteam-bs
14:32 Muad-Dib JAA: you still want max gameid's for reach and odst?
14:33 JAA Yes, please.
14:33 antomati_ has quit IRC (Ping timeout: 268 seconds)
14:33 JAA I just started a grab of the 500k first, 500k random, and the 500k last games for Halo 3. Not enough time to grab more, unfortunately.
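The first/random/last split used for Halo 3 (and later for ODST and Reach) can be written as one small helper. A sketch, assuming Halo 3 IDs start at 1 (not stated in the log) and taking the maximum of 1917736471 from Muad-Dib's check at 12:40; the function name is hypothetical:

```python
import random
from typing import Optional


def id_plan(min_id: int, max_id: int, n: int, seed: Optional[int] = None) -> list[int]:
    """First n IDs, n distinct random IDs from the middle, then the last n IDs."""
    if max_id - min_id + 1 < 3 * n:
        raise ValueError("ID range too small for three disjoint blocks of n")
    first = range(min_id, min_id + n)
    last = range(max_id - n + 1, max_id + 1)
    middle = range(min_id + n, max_id - n + 1)  # excludes both end blocks
    rng = random.Random(seed)
    return list(first) + sorted(rng.sample(middle, n)) + list(last)


plan = id_plan(1, 1_917_736_471, 500_000, seed=3)
print(len(plan), plan[0], plan[-1])
```

Keeping the random block strictly between the two end blocks means the three parts never overlap, so no ID is fetched twice.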
14:33 JAA I can probably do the same for Reach and ODST if I start it soon.
14:34 JAA (T minus 2 hours 25 minutes 50 seconds)
14:35 Muad-Dib ODST: ID's 1 to 132830697
14:37 HCross JAA: I'm at a TB and a bit of the forums
14:40 JAA HCross: That's a recursive grab-site run, right? With offsite links?
14:41 HCross No offsite links
14:47 JAA Sounds good. I should have all content, but no images, stylesheets, etc. and also not all possible views (e.g. &viewreplies=1, which doesn't actually seem to do anything).
14:47 JAA So browsability of my archives would be... limited.
14:47 Muad-Dib JAA: For your reach link, http://halo.bungie.net/Stats/Reach/GameStats.aspx?gameid=974412276 seems to be the highest
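Finding the highest valid game ID by hand works, but a galloping search followed by bisection needs only a few dozen requests. A sketch, with `exists` a hypothetical predicate that would fetch a stats page and report whether it shows a game; it assumes IDs are dense (every ID below the maximum exists and none above it), which the log does not confirm:

```python
from typing import Callable


def highest_valid_id(exists: Callable[[int], bool], known_good: int) -> int:
    """Return the largest ID for which exists() is True, starting from a known-good ID."""
    lo = known_good  # invariant: exists(lo) is True
    step = 1
    while exists(lo + step):  # gallop upward, doubling the step, until we overshoot
        lo += step
        step *= 2
    hi = lo + step  # invariant: exists(hi) is False
    while hi - lo > 1:  # bisect the gap between the last hit and the first miss
        mid = (lo + hi) // 2
        if exists(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in predicate using the Reach maximum reported above, instead of live HTTP requests
print(highest_valid_id(lambda i: i <= 974_412_276, known_good=972_979_787))
```

Because real ID spaces can have gaps, it is worth spot-checking a handful of IDs just above the result before trusting it.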
14:48 antomatic has joined #archiveteam-bs
15:07 JAA Muad-Dib: Thanks. I've started crawls (500k first/random/last each) for ODST and Reach as well.
15:38 jschwart has joined #archiveteam-bs
15:40 antomatic has quit IRC (Ping timeout: 252 seconds)
15:42 antomatic has joined #archiveteam-bs
16:24 Muad-Dib JAA: great!
16:45 HCross JAA: if my maths is right... We have 15 minutes left
16:49 JAA HCross: Yep, that's correct.
16:59 Muad-Dib less than 1 minute
17:03 Darkstar has quit IRC (Ping timeout: 246 seconds)
17:04 JAA The forums are still online at least.
17:15 Stilett0 has joined #archiveteam-bs
17:15 Muad-Dib JAA: Aaaaaaaand... it's gone http://halo.bungie.net/forums/default.aspx
17:23 Darkstar has joined #archiveteam-bs
17:26 m007a83_ has joined #archiveteam-bs
17:26 JAA Games seem to be still there, at least the Halo 3 ones.
17:27 JAA Then again, I don't think they were planning on deleting them entirely anyway.
17:29 m007a83 has quit IRC (Ping timeout: 252 seconds)
17:39 DragonMon has quit IRC (Remote host closed the connection)
18:14 Darkstar has quit IRC (Ping timeout: 506 seconds)
18:22 JAA ODST game pages throw an error currently, but not sure whether that was also the case during my grab (which finished half an hour ago or so).
18:22 schbirid has joined #archiveteam-bs
18:23 JAA 3 and Reach seem to be fine.
18:23 JAA And my grabs of those are still running.
18:25 Darkstar has joined #archiveteam-bs
18:32 K4k has joined #archiveteam-bs
18:58 Darkstar has quit IRC (Ping timeout: 246 seconds)
19:08 Darkstar has joined #archiveteam-bs
19:22 Stilett0 has quit IRC (Read error: Operation timed out)
19:29 verifiedj has joined #archiveteam-bs
19:32 Muad-Dib JAA: no, they were going to delete the detailed information about those games, along with the forums
19:45 JAA Looks like they've purged the ODST games entirely. http://halo.bungie.net/Stats/ODSTg.aspx?gameid=1000000 worked before, for example.
19:45 JAA Muad-Dib: ^
19:46 Muad-Dib ODST seems to be gone from #1 up
19:48 Muad-Dib okay, that's more grave than they announced earlier
19:52 Darkstar has quit IRC (Ping timeout: 246 seconds)
20:01 archodg_ has quit IRC (Remote host closed the connection)
20:06 archodg_ has joined #archiveteam-bs
20:08 Darkstar has joined #archiveteam-bs
20:37 eientei95 Someone look at 4y59ewu7fohzirjmoplp5j0bn please
20:38 HCross JAA: starting my IA upload for the forums now
20:50 verifiedj has quit IRC (http://www.mibbit.com ajax IRC Client)
21:23 Stilett0 has joined #archiveteam-bs
21:26 JAA I'll have to run some deduplication on my archives first probably to avoid the thousands of copies of the pinned topics.
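One common way to deduplicate is by payload digest: hash each response body, keep the first copy, and record later identical copies as references to it (WARC `revisit` records backed by a `WARC-Payload-Digest` work this way). A minimal sketch of the detection step only; the record format and URLs are hypothetical, and it only catches byte-identical bodies:

```python
import hashlib
from typing import Iterable, Iterator, Optional, Tuple


def dedupe(records: Iterable[Tuple[str, bytes]]) -> Iterator[Tuple[str, Optional[str]]]:
    """For each (url, payload), yield (url, url_of_first_identical_payload or None)."""
    first_seen = {}  # SHA-1 hex digest -> URL of the first record with that payload
    for url, payload in records:
        digest = hashlib.sha1(payload).hexdigest()
        if digest in first_seen:
            # Duplicate: a WARC writer could emit a short revisit record here
            # pointing at first_seen[digest] instead of storing the bytes again.
            yield url, first_seen[digest]
        else:
            first_seen[digest] = url
            yield url, None


# Hypothetical records: the same pinned thread fetched twice, plus one other thread
records = [
    ("https://example.org/forums/thread/1", b"<html>pinned topic</html>"),
    ("https://example.org/forums/thread/1", b"<html>pinned topic</html>"),
    ("https://example.org/forums/thread/2", b"<html>other thread</html>"),
]
for url, duplicate_of in dedupe(records):
    print(url, duplicate_of)
```

Pages that embed the same pinned topics but differ elsewhere (timestamps, view counts) hash differently and would not be caught by this.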
21:30 JAA arkiver: What's the status on PureVolume? We have two days left until the shutdown.
21:41 JAA There will be general elections in Mexico this weekend. Might be a good idea to compile a list of campaign websites etc. and throw them into ArchiveBot.
21:49 Jens has quit IRC (Remote host closed the connection)
21:49 Jens has joined #archiveteam-bs
22:03 jschwart has quit IRC (Quit: Konversation terminated!)
22:20 astrid agree
22:24 Flashfire has joined #archiveteam-bs
22:25 Flashfire JAA great job with what you grabbed from Bungie
22:25 Flashfire Just wanted to hop on and say that before I got to school
22:25 Flashfire has left
22:32 schbirid has quit IRC (Remote host closed the connection)
22:35 Stilett0 has quit IRC (Read error: Operation timed out)
22:40 hook54321 Why isn't archivebot able to grab complete tumblr sites anymore?
22:48 flashfire has joined #archiveteam-bs
22:50 flashfire I can do a list of mexican campaign sites if someone is keeping an eye on whatever urls I dump in the archivebot chat
22:50 flashfire JAA astrid arkiver
22:51 astrid sure or i can give you voice to do it yourself :)
22:51 flashfire Lol I am not sure Purple is happy with me still but sure if you want I will queue a few
22:51 astrid eh go for it
22:52 flashfire So long as I only grab them lol
22:53 astrid if you do "!a http://whatever --no-offsite-links" then that'll keep the size down ... also it'll omit external pages, so you lose context
22:53 flashfire ok
22:53 astrid not sure if good or not :P
22:53 astrid i'd say just !a http://whateevr
22:53 astrid if they go on too long we can cut them off
22:53 astrid might want to add --explain "mexican election"
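When queueing a batch of campaign sites, the command lines can be generated from a URL list. A small sketch using only the `!a`, `--no-offsite-links` and `--explain` syntax astrid gives above; the URLs are placeholders and the function name is hypothetical:

```python
def archivebot_commands(urls: list[str], note: str, no_offsite_links: bool = False) -> list[str]:
    """Build one '!a' line per URL, all tagged with the same --explain note."""
    if '"' in note:
        # the note is wrapped in double quotes, so it must not contain any itself
        raise ValueError("note must not contain double quotes")
    flags = " --no-offsite-links" if no_offsite_links else ""
    return [f'!a {url}{flags} --explain "{note}"' for url in urls]


# Placeholder URLs, not actual campaign sites
for line in archivebot_commands(["https://example.org/", "https://example.com/"], "mexican election"):
    print(line)
```

`--no-offsite-links` is off by default here, following astrid's suggestion to keep external pages for context and cut jobs off if they run too long.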
22:54 flashfire ok
23:01 m007a83_ is now known as m007a83
23:02 m007a83 has quit IRC (Quit: Leaving)
23:02 m007a83 has joined #archiveteam-bs
23:18 flashfire JAA I took a look at PureVolume it seems stuck
23:18 flashfire 97mda7ux34dixidqjmgo0g7d1 isnt budging
