Time | Nickname | Message
00:11 | | ta9le has joined #archiveteam-bs
00:16 | | wacky_ has quit IRC (Ping timeout: 260 seconds)
00:45 | | Stilett0 has quit IRC (Ping timeout: 252 seconds)
00:54 | | Darkstar has quit IRC (Ping timeout: 632 seconds)
01:08 | | Darkstar has joined #archiveteam-bs
01:13 | | C4K3 has quit IRC (Read error: Operation timed out)
01:28 | | m007a83 has quit IRC (Read error: Connection reset by peer)
01:37 | | Dimtree has quit IRC (Read error: Connection reset by peer)
01:41 | | archodg__ has quit IRC (Read error: Operation timed out)
01:44 | | Dimtree has joined #archiveteam-bs
02:22 | | wacky has joined #archiveteam-bs
02:31 | | m007a83 has joined #archiveteam-bs
02:45 | | C4K3 has joined #archiveteam-bs
03:03 | | ta9le has quit IRC (Quit: Connection closed for inactivity)
03:06 | | wp494 has quit IRC (Ping timeout: 492 seconds)
03:06 | | wp494 has joined #archiveteam-bs
03:09 | | odemg has quit IRC (Ping timeout: 260 seconds)
03:22 | | odemg has joined #archiveteam-bs
03:34 | | flashfire has joined #archiveteam-bs
04:08 | | K4k_ has quit IRC (Read error: Connection reset by peer)
04:34 | | flashfire has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
04:38 | | BlueMax has quit IRC (Leaving)
05:04 | | BlueMax has joined #archiveteam-bs
05:17 | | Zebranky has quit IRC (Read error: Operation timed out)
05:17 | | JAA has quit IRC (Read error: Operation timed out)
05:18 | | tapedrive has quit IRC (Read error: Operation timed out)
05:18 | | Dimtree has quit IRC (Read error: Operation timed out)
05:18 | | squires has quit IRC (Read error: Operation timed out)
05:18 | | beardicus has quit IRC (Read error: Operation timed out)
05:18 | | BlueMaxim has joined #archiveteam-bs
05:26 | | logchfoo3 starts logging #archiveteam-bs at Wed Jun 27 05:26:44 2018
05:26 | | logchfoo3 has joined #archiveteam-bs
05:27 | | C4K3 has quit IRC (Ping timeout: 601 seconds)
05:27 | | Gfy has joined #archiveteam-bs
05:28 | | REiN^ has quit IRC (Ping timeout: 602 seconds)
05:30 | | Lord_Nigh has quit IRC (Read error: Operation timed out)
05:33 | | Lord_Nigh has joined #archiveteam-bs
05:58 | | Dimtree has joined #archiveteam-bs
06:14 | | beardicus has joined #archiveteam-bs
06:14 | | REiN^ has joined #archiveteam-bs
06:15 | | rbraun has joined #archiveteam-bs
06:17 | | squires has joined #archiveteam-bs
06:19 | | tyzoid has joined #archiveteam-bs
06:32 | | PotcFdk has joined #archiveteam-bs
06:33 | | phillipsj has quit IRC (Ping timeout: 1212 seconds)
06:41 | | ItsYoda has quit IRC (Ping timeout: 260 seconds)
06:45 | | sep332 has joined #archiveteam-bs
06:48 | | C4K3 has joined #archiveteam-bs
06:50 | | ItsYoda has joined #archiveteam-bs
07:23 | | Jusque has quit IRC (Ping timeout: 260 seconds)
07:23 | | Jusque has joined #archiveteam-bs
09:38 | | ta9le has joined #archiveteam-bs
10:08 | | Mateon1 has quit IRC (Read error: Operation timed out)
10:08 | | Mateon1 has joined #archiveteam-bs
10:43 | eientei95 | [22:41:14] <eientei95> I'm not sure if 6k0ssd31g4qrmlp05pc2kgdx5 will complete or if we want it to
10:43 | eientei95 | [22:41:29] <eientei95> It's got a lot of large files, most of which are TV shows, movies or software
10:43 | eientei95 | [22:41:44] <eientei95> If we were to archive them, it'd be better to have them made as individual items
10:54 | | BlueMaxim has quit IRC (Quit: Leaving)
12:16 | Muad-Dib | JAA: If someone were to be successful in getting the Halo data from Bungie, who will handle possible pre-IA storage?
12:36 | PurpleSym | Any idea how much data we're talking about, Muad-Dib?
12:36 | PurpleSym | (Spread some ops please)
12:47 | Muad-Dib | the statistics themselves are mostly just tables of numbers for each game, but there are about 800M games played for Halo 2 and from what I hear WAY more for Halo 3 and up
12:47 | Muad-Dib | PurpleSym: ^
12:48 | Muad-Dib | and if we do get more multimedia content, I'd push up the estimate to tens, if not hundreds, of TBs
12:49 | Muad-Dib | I can't remember the total file size and scope of the Halo 3 multimedia grab we did, but that should be a good basis for an estimate
12:51 | | Muad-Dib sets mode: +o arkiver
12:51 | | Muad-Dib sets mode: +oo HCross SketchCow
12:52 | | Muad-Dib sets mode: +o PurpleSym
12:52 | Muad-Dib | JAA: I still haven't been able to find contact information for Wolfson, only an earthlink email address from a very old personal website of his that I'll assume not to be alive anymore
12:53 | | svchfoo1 has joined #archiveteam-bs
12:53 | | svchfoo3 has joined #archiveteam-bs
12:53 | | PurpleSym sets mode: +oo svchfoo1 svchfoo3
12:59 | Muad-Dib | PurpleSym: In the less ideal case, grabbing individual games' stats pages with Wpull/archivebot is currently giving me 4000 games per gigabyte (there's a "small" job for the last ~138k games running on archivebot right now)
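Muad-Dib gives two figures: about 800 million Halo 2 games, and about 4,000 game pages per gigabyte from the ArchiveBot job. Together they yield a back-of-the-envelope size for the Halo 2 pages alone. The minimal Python sketch below assumes that rate holds across all games. `estimate_size_gb` is a hypothetical helper, not part of any Archive Team tool.

```python
def estimate_size_gb(num_games: int, games_per_gb: float) -> float:
    """Rough archive size in GB, assuming a constant games-per-GB rate."""
    return num_games / games_per_gb


# Figures quoted in the log: ~800M Halo 2 games, ~4,000 games per GB.
halo2_gb = estimate_size_gb(800_000_000, 4_000)
print(f"{halo2_gb:,.0f} GB (about {halo2_gb / 1000:,.0f} TB)")
```

This comes out at about 200 TB, in the "hundreds of TBs" range mentioned above. The rate is read off ArchiveBot, so it may not match the compressed WARC size.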
13:23 | PurpleSym | Maybe we can guess his bungie email address. first.last@bungie.net or something similar?
13:28 | Igloo | I can provide a couple of TB of space for it, but it'd have to be WARCed and then I can upload to IA. But it'd be best as a warrior project
13:28 | Igloo | With numerous sync targets to spread the love
13:51 | Muad-Dib | Igloo: there's probably no time to make it a Warrior project, the detailed stats are going away tomorrow at 17:00 UTC
13:52 | Muad-Dib | that's we're considering asking Bungie
13:52 | Muad-Dib | that's why we're*
14:04 | | Aoede has quit IRC (Quit: ZNC - https://znc.in)
14:07 | | Aoede has joined #archiveteam-bs
14:23 | | icedice has joined #archiveteam-bs
14:27 | | icedice has quit IRC (Client Quit)
14:27 | | icedice has joined #archiveteam-bs
14:57 | | K4k has joined #archiveteam-bs
15:12 | JAA | According to our wiki page, we archived 37 TB of data in 2014/15.
15:13 | JAA | The game pages should be smaller than that, but the problem is that we can't really grab 800 million (or more) pages in 26 hours.
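JAA's point about the 26-hour window can be made concrete by working out the request rate needed to fetch 800 million pages before the deadline. The minimal sketch below assumes one request per game page and a constant rate. `required_rate_per_minute` is a hypothetical helper.

```python
def required_rate_per_minute(num_pages: int, hours: float) -> float:
    """Requests per minute needed to fetch num_pages within `hours`."""
    return num_pages / (hours * 60)


rate = required_rate_per_minute(800_000_000, 26)
print(f"{rate:,.0f} requests/minute (~{rate / 60:,.0f} requests/second)")
```

That is roughly half a million requests per minute, sustained for more than a day against a single site.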
15:15 | JAA | Muad-Dib: I believe the numbers displayed on the ArchiveBot dashboard are uncompressed, by the way.
15:15 | JAA | I.e. the amount of (download?) network traffic, not the size of the archives.
15:21 | godane | SketchCow: i'm uploading some old tape rips i have not uploaded yet
15:21 | godane | one is NH Unsolved Mysteries from WMUR
15:22 | godane | i can't tell if it's from 1996 or 1997 so i put 199x in the file name
15:33 | | icedice has quit IRC (Ping timeout: 268 seconds)
16:47 | schbirid | uh, trying to "$ gpg --verify archiveteam-warrior-v3-20171013.ova.asc archiveteam-warrior-v3-20171013.ova" i don't have the public key and cannot find where to get it. chfoo
16:57 | SketchCow | WHY HELLO
16:58 | arkiver | JAA: if we need a project, let me know
17:07 | Kaz | wasn't twitch like 2014/15?
17:08 | Kaz | oh, context is key, ignore me
17:10 | | schbirid has quit IRC (Quit: Leaving)
17:17 | godane | looks like that salem lot incomplete tbs recording is from late October 1993
17:17 | godane | it may be from about october 25 or 26
17:17 | godane | cause there was a tbs news report in the commercial breaks
17:38 | | archodg has joined #archiveteam-bs
17:39 | | schbirid has joined #archiveteam-bs
17:51 | chfoo | schbirid: the public key id is C718CE578A321F2D. it should be published on most public key servers.
17:57 | schbirid | cheers
17:57 | schbirid | gpg had suggested B251CF4887C1510C
17:57 | schbirid | maybe add the id to the readme? :)
19:04 | | jschwart has joined #archiveteam-bs
19:23 | | icedice has joined #archiveteam-bs
19:38 | godane | SketchCow: i noticed your post about BBC Computer Literacy Project
19:39 | godane | i couldn't use youtube-dl to download urls directly but i made a script to grab the m3u8 using youtube-dl
19:39 | SketchCow | Let's... hold off
19:39 | SketchCow | Like, maybe not reward the BBC for uploading a series by immediately mirroring it
19:40 | godane | it's for my personal collection
19:40 | godane | it's going to be a few months anyway before i upload it to you guys
19:41 | godane | i'm most likely going to give a copy to myspleen people first
19:41 | SketchCow | Ok, but then you're calling me out
19:41 | SketchCow | And I come in to see if I'm needed
19:41 | SketchCow | And I'm not
19:41 | SketchCow | http://fos.textfiles.com/RECOGNIZER/
19:41 | SketchCow | I'm over in this page and related ones trying to sort collections
19:41 | SketchCow | It's hardish make-you-crazy work
19:42 | SketchCow | http://fos.textfiles.com/RECOGNIZER/type.html
19:42 | godane | i'm just telling you because it was not a simple youtube-dl $url
19:43 | godane | but it was a simple youtube-dl --hls-prefer-native --fixup never https://computer-literacy-project.pilots.bbcconnectedstudio.co.uk/asset/video/$(basename $url)/index.m3u8
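godane's one-liner takes the last path component of each page URL (`$(basename $url)`) and substitutes it into the HLS playlist path. youtube-dl then fetches that playlist with `--hls-prefer-native --fixup never`. The same rewrite is sketched below in Python. It assumes the last path segment of the page URL is the asset ID. Both the helper names and the example page URL are hypothetical.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

# Playlist pattern from godane's command; {asset_id} replaces $(basename $url).
M3U8_TEMPLATE = (
    "https://computer-literacy-project.pilots.bbcconnectedstudio.co.uk"
    "/asset/video/{asset_id}/index.m3u8"
)


def m3u8_url(page_url: str) -> str:
    """Build the playlist URL from the last path segment of page_url.

    Roughly mimics `basename $url`, except that urlparse drops any
    query string or fragment before the segment is taken.
    """
    path = urlparse(page_url).path.rstrip("/")
    return M3U8_TEMPLATE.format(asset_id=posixpath.basename(path))


def youtube_dl_args(page_url: str) -> List[str]:
    """Argument list matching godane's command, e.g. for subprocess.run()."""
    return ["youtube-dl", "--hls-prefer-native", "--fixup", "never",
            m3u8_url(page_url)]


# Hypothetical page URL, for illustration only.
print(m3u8_url("https://example.org/programmes/abc123"))
```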
21:01 | | wp494 has quit IRC (Ping timeout: 260 seconds)
21:01 | | wp494 has joined #archiveteam-bs
21:08 | | schbirid has quit IRC (Quit: Leaving)
21:23 | | verifiedj has joined #archiveteam-bs
21:34 | | verifiedj has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
21:51 | | verifiedj has joined #archiveteam-bs
22:04 | verifiedj | atluxity: https://verifiedjoseph.com/archiveteam/mocpages.txt
22:05 | | jschwart has quit IRC (Quit: Konversation terminated!)
22:07 | arkiver | moc pages going down?
22:07 | | icedice has quit IRC (Quit: Leaving)
22:08 | verifiedj | No, but it's just come back online after 8 days offline with a database error; it has downtime like this all too frequently.
22:14 | verifiedj | And the guy who runs it (Sean Kenney) never tells the community what's going on, so saving it may be a good idea.
22:23 | JAA | arkiver: Well, the deadline's in less than 19 hours. It would be great to do a warrior project, but I doubt we can do it that quickly and we probably won't get all of it anyway.
22:25 | JAA | It's worth mentioning that it looks like they won't delete all content. If I read the announcement correctly, they'll purge some data about the games (e.g. medals achieved within a particular game) but keep at least the basic stats.
22:26 | JAA | I've started a grab for some Halo 2 game pages, namely the oldest million (1-1M) and the newest million (802138050 to 803138049). Let's see how this goes.
22:28 | arkiver | we can try to do a project
22:28 | arkiver | I see it's just IDs?
22:28 | JAA | Yup, e.g. http://halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid=803138049
22:28 | JAA | (That's the last existing ID as far as I can see.)
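The game pages are keyed by a sequential `gameid`, so the URL list for any range can be generated rather than discovered. The minimal sketch below uses the URL pattern JAA posted. It assumes every integer in the range is a candidate ID, although some may not exist. `game_urls` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterator

# URL pattern taken from the example JAA posted.
HALO2_GAME_URL = "http://halo.bungie.net/Stats/GameStatsHalo2.aspx?gameid={}"


def game_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Lazily yield Halo 2 game-stats URLs for an inclusive ID range."""
    for game_id in range(first_id, last_id + 1):
        yield HALO2_GAME_URL.format(game_id)


# JAA's two test ranges: the oldest million and the newest million IDs.
OLDEST = (1, 1_000_000)
NEWEST = (802_138_050, 803_138_049)

for url in islice(game_urls(*NEWEST), 3):
    print(url)
```

A generator keeps memory use flat even for ranges in the hundreds of millions. Its output could be written to a file and handed to a crawler as a URL list.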
22:29 | arkiver | and that one image of the playing field is specific to the game?
22:29 | JAA | There are similar pages for the other Halo games, some with significantly more IDs (at least 1.9 billion for Halo 3 for example).
22:29 | arkiver | As in we should grab that page and the image of the playing field
22:29 | arkiver | Well we can do 1.9 billion :P, will just take a little bit of time
22:31 | JAA | I'm doing 23k requests per minute at the moment. Two months at that speed for 2 billion.
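JAA's "two months" follows from dividing the ID count by the request rate. The minimal sketch below assumes the 23k per minute rate stays constant and that there is one request per game ID. `days_needed` is a hypothetical helper.

```python
def days_needed(num_requests: int, per_minute: float) -> float:
    """Days to issue num_requests at a constant per-minute rate."""
    return num_requests / per_minute / (60 * 24)


# Halo 3's "at least 1.9 billion" IDs and JAA's round 2 billion.
for label, count in (("1.9 billion", 1_900_000_000), ("2 billion", 2_000_000_000)):
    print(f"{label} at 23k/min: {days_needed(count, 23_000):.1f} days")
```

Both totals come out at roughly 57 to 60 days, which matches the estimate above.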
22:31 | arkiver | Did you find any limits?
22:31 | arkiver | I believe we can do a few hundred thousand per minute
22:32 | arkiver | also nvm about that image, didn't look good
22:32 | arkiver | so only the html page, nothing more
22:32 | JAA | Haven't come across any limits so far.
22:33 | arkiver | 803138049 is latest?
22:33 | JAA | For Halo 2, yes.
22:33 | | DragonMon has quit IRC (Read error: Connection reset by peer)
22:33 | arkiver | we have to do ~700,000 URLs per minute for this one
22:34 | arkiver | I'm thinking 3000 URLs/item?
22:34 | arkiver | or 4000 or something
22:34 | arkiver | just enough that we won't be limited by the tracker not handling enough items per minute
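arkiver's item size trades off against tracker load: the more IDs per item, the fewer items the tracker has to hand out each minute. The minimal sketch below packs the Halo 2 ID range into fixed-size items. It then checks the items-per-minute load at ~700,000 URLs per minute. It assumes one URL per game ID (only the HTML page, as arkiver decided). `make_items` and the constants are hypothetical, not the tracker's actual API.

```python
import math
from typing import Iterator, Tuple

LAST_HALO2_ID = 803_138_049       # latest Halo 2 game ID according to the log
TARGET_URLS_PER_MINUTE = 700_000  # arkiver's estimate of the required rate


def make_items(first_id: int, last_id: int,
               ids_per_item: int) -> Iterator[Tuple[int, int]]:
    """Split an inclusive ID range into consecutive (start, end) items."""
    start = first_id
    while start <= last_id:
        end = min(start + ids_per_item - 1, last_id)
        yield (start, end)
        start = end + 1


for ids_per_item in (3_000, 4_000):
    n_items = math.ceil(LAST_HALO2_ID / ids_per_item)
    per_minute = TARGET_URLS_PER_MINUTE / ids_per_item
    print(f"{ids_per_item} IDs/item: {n_items:,} items, "
          f"~{per_minute:.0f} items/minute")
```

At 700,000 URLs per minute, the whole range takes about 1,147 minutes, or roughly 19 hours. That is about the time left before the deadline JAA mentioned.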
22:35 | JAA | They seem to be using CloudFlare's CDN. So I'm sure they could handle it; the question is if CF lets us.
22:36 | arkiver | awesome
22:36 | arkiver | this will be fun
22:36 | arkiver | let's get people :D
22:36 | arkiver | odemg ^
22:38 | JAA | According to http://halo.bungie.net/images/News/Inline12/sunset/halo_mulitplayer_stats_sm.jpg, there were roughly 21 billion games in total across the different Halo games.
22:38 | astrid | are you going to try and archive every halo game ever played?
22:38 | arkiver | that image doesn't say 800 million halo 2 games
22:38 | JAA | But that graphic also mentions a number of 5.4 billion games for Halo 2, which doesn't match the maximum ID of 803 million.
22:39 | astrid | in god's name, why
22:39 | arkiver | while 803138049 is the latest (?)
22:39 | arkiver | yah
22:39 | arkiver | yeah*
22:39 | astrid | if you do that i'm quitting archiveteam
22:41 | astrid | i'm not kidding
22:41 | astrid | this is absurd and it's a sign that you all have lost your way
22:42 | arkiver | total size shouldn't be too big
22:47 | arkiver | kind of an overreaction there
22:47 | JAA | About 1.1 GiB per 100k games, it seems.
22:49 | astrid | it's not the space
22:49 | astrid | it's the mental energy
22:49 | astrid | i joined archiveteam because geocities was going away
22:49 | astrid | no fucking way is geocities on the same level as a list of all the halo games
22:53 | | DragonMon has joined #archiveteam-bs
22:53 | JAA | I don't disagree with you there, and that's why I invested significantly more time and energy into archiving the Halo discussion forums. But I still think it's also worth preserving at least some information about the actual games for such an influential video game.
23:00 | * | arkiver agrees with JAA
23:08 | arkiver | astrid: ^ ?
23:22 | | verifiedj has quit IRC (http://www.mibbit.com ajax IRC Client)
23:46 | | BlueMax has joined #archiveteam-bs
23:48 | JAA | For the record, my grab of the Halo forums covered 3036432 threads. According to the homepage, there are 3037088 threads in total. So I should have pretty much everything; not sure where those 656 remaining ones are, or maybe the counters on the homepage are off, but that's only 0.02%.
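JAA's coverage figure can be checked against the two thread counts he quotes. The check below takes the homepage counter at face value, which JAA notes may itself be off.

```python
grabbed = 3_036_432  # threads covered by JAA's forum grab
total = 3_037_088    # thread count shown on the forum homepage

missing = total - grabbed
print(f"{missing} missing ({missing / total:.2%}), coverage {grabbed / total:.2%}")
```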
23:51 | JAA | The game grab has slowed down to about 10k per minute, and the WARCs are now about 1.6 GiB per 100k games. I didn't look into it, but I suspect that the old ones were "purged" and thus the pages are much smaller than for newer, not-yet-purged games. The slowdown/size increase occurred around the time it completed the 1 million oldest games and started with the 1 million most recent ones.
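The two per-100k sizes JAA observed bracket what a full Halo 2 grab might weigh. The minimal sketch below assumes every ID up to 803,138,049 is fetched. It also assumes the per-game size stays within the observed range of 1.1 to 1.6 GiB per 100k games. `total_gib` is a hypothetical helper.

```python
HALO2_GAMES = 803_138_049  # highest Halo 2 game ID mentioned in the log


def total_gib(num_games: int, gib_per_100k: float) -> float:
    """Projected WARC size in GiB at a constant size per 100k games."""
    return num_games / 100_000 * gib_per_100k


# 1.1 GiB/100k: oldest (possibly purged) games; 1.6 GiB/100k: newest games.
for gib_per_100k in (1.1, 1.6):
    gib = total_gib(HALO2_GAMES, gib_per_100k)
    print(f"{gib_per_100k} GiB per 100k games -> {gib:,.0f} GiB "
          f"({gib / 1024:.1f} TiB)")
```

That is roughly 8.6 to 12.5 TiB. This is far below what Muad-Dib's 4,000 games per gigabyte would imply, in line with JAA's remark that the ArchiveBot dashboard shows uncompressed traffic.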
23:54 | Muad-Dib | JAA: yeah, that seemed to happen to my small archivebot runs as well