[00:11] SketchCow: can you please send me fotolog-profile_lah_maripositahx-20160209-000355_data.txt from fotolog?
[00:11] It looks like some items are returned as 0 MB items, while they should contain more then that
[00:11] And I'm not able to recreate the problem, so I'd like to have a look at the WARC
[00:12] SketchCow: sorry, I mean file fotolog-profile_lah_maripositahx-20160209-000355.warc.gz
[00:18] One moment
[00:19] fos.textfiles.com/fotolog-profile_lah_maripositahx-20160209-000355.warc.gz
[00:19] yes! thanks!
[00:19] *** kyan has joined #archiveteam
[00:22] they're returning internal server errors as 200...
[00:24] hmm actually no
[00:24] some strange redirect problem
[00:37] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[00:41] *** kyan has joined #archiveteam
[00:53] gametrailers is closing: https://twitter.com/GameTrailers/status/696858020215586816
[00:54] Can anymore stuff shut down this month?
[00:55] http://www.gametrailers.com/
[00:55] FUCKING
[00:56] today??
[00:57] Wow that's a lot of lead time they're giving us
[00:58] They updated their site a few months ago
[00:59] with a bunch of stuff that basically says fuck history http://www.gametrailers.com/forums/read/668.885579-GameTrailers-com-New-Site-FAQ
[01:06] They're giving connection refused for my dedi and for at least one archivebot pipeline
[01:07] I dont think it's UA related
[01:08] IP?
[01:09] HCross2 my dedi's at 104.254.90.187
[01:09] The archivebot pipeline F_DE_2-10 also got it
[01:10] but aupipe-4 is fine
[01:11] it's running on that now
[01:27] The gametrailers youtube channel is still up
[01:27] might want to hit that up
[01:27] https://www.youtube.com/user/gametrailers
[01:28] One of the best things gametrailers ever did was their retrospectives on franchies like Star Wars, Metroid, Castlevania, Final Fantasy, and Metal Gear. Tons of research and very thorough.
[01:28] Those are older though, so you might have to find them sprinkled across other user accounts
[01:29] I believe there was a GTA one? and RE? and also a Mario Kart one as well.
[01:29] They stopped doing them after a certain point, though they still stand as some of the most impressive video retrospectives around.
[01:29] I would say they alone trump all other generic content (trailers, hype/marketing material) at GT
[01:31] I think they're on the gametrailers main youtube channel, though you have to hunt through the playlists to find them. Should we assume the content at youtube is going away too? There's a *lot* of stuff there. >.<
[01:36] I fear most of the retrospectives of old were not "HD" so thy might not be in the youtube channel
[01:41] Hmm yea I can't even find the Metroid Retrospective on youtube anywhere :/
[01:44] *** JesseW has joined #archiveteam
[01:46] Metroid retrospective from 2007 is on the website still, it looks like. And of course youtube-dl doesn't like it...
[01:50] Well, you can grab the .m3u8 once you extract it, I guess. Sigh.
[01:51] I cant access gametrailers.com :/ its hanging
[01:51] ah
[01:51] must be my dedicated server not being able to access it
[01:51] lame
[01:56] Good news is the .m3u8 filenames are relatively predictable, anyway.
[02:03] *** Smiley has quit IRC (Remote host closed the connection)
[02:04] Metroid and Fallout retrospectives both seem to be missing from Youtube. :/
[02:04] *** lunG has joined #archiveteam
[02:04] *** JesseW has quit IRC (Read error: Operation timed out)
[02:05] *** philpem has quit IRC (Ping timeout: 260 seconds)
[02:07] Not super clear at a quick glance, but the Resident Evil retrospective might also be at risk of disappearing.
[02:09] Also, the Zelda ones from 2006.
[02:09] *** lun has quit IRC (Read error: Operation timed out)
[02:10] *** Smiley has joined #archiveteam
[02:11] The content on gametrailers comes down in chunks from edgecastcdn
[02:11] *** lunG has quit IRC (Ping timeout: 250 seconds)
[02:13] Yeah, I'm hoping that if we can save the CDN URLs, we'll have a little extra breathing room after the site goes dark.
[02:15] Batman Arkham and Warcraft retrospectives don't seem to be mirrored either.
[02:15] If you plan to get the boards
[02:15] you won't get it automatically via crawl
[02:15] pagination in html only goes back to 20 pages
[02:15] need to manually increment to get a higher pagination count
[02:16] GT is going to be tricky to get, lots of dynamic JS loadnig for content
[02:18] ah i figured it out
[02:19] http://wpc.10016.edgecastcdn.net/0010016/gtcomstor/media/videos_root/shows/retrospectives/2012/metroid/part3/gt_retro_metroid_part_3-1200_kbps.mp4?1B608EE7AFCE3765E176F3C6FBB98002B3D18C64572F2307D769A970A072F1BEC602
[02:19] That downloads
[02:24] We have my permission to download as much of GT, immediately, as possible.
[02:25] As much as possible.
[02:25] They're blocking a lot of datacenter
[02:25] it's kind of hard because of that
[02:25] Unfortunately some of the oldest (ca. 2006) retrospectives are already AWOL - D&D and Dexter are two I've found so far.
[02:26] Er, Daxter.
[02:28] Well, you have my blessing. Do what it takes
[02:32] #UnhitchedTrailer perhaps?
[02:33] Holy hell, and I thought *Yahoo* was bad.
[02:33] *** Ungstein1 has joined #archiveteam
[02:35] *** schbirid has quit IRC (Read error: Operation timed out)
[02:38] *** JesseW has joined #archiveteam
[02:44] *** espes__ has joined #archiveteam
[02:50] *** schbirid has joined #archiveteam
[02:56] http://www.gametrailers.com/sitemap/sitemap.xml
[02:58] May be some sort of weird geo-permission thing going on with the CDN; some videos will download from a US server but not one in France, even though the France server can grab others. So weird. And annoying.
[02:58] https://itunes.apple.com/us/artist/gametrailers.com/id125973330?mt=2
[02:58] *** lukeman has joined #archiveteam
[03:02] *** JesseW has quit IRC (Quit: Leaving.)
[03:02] *** PuppyCock has joined #archiveteam
[03:02] gametrailers.com shut down...
[03:03] If anyone can successfully grab this one-episode retrospective from their location (404s for both of my servers): http://wpc.10016.edgecastcdn.net/0310016/gtcomstor/media/video/2/7/5/d/t_fallout_retro_gt-,900,_kbps.mp4.m3u8
[03:05] *** BubuAnabe has quit IRC (Ping timeout: 360 seconds)
[03:08] *** sevs44936 has joined #archiveteam
[03:17] do we know the diff between content they host on their CDN and things uploaded to youtube?
[03:18] *** dxrt- sets mode: +o dxrt
[03:24] *** mutoso has quit IRC (Ping timeout: 252 seconds)
[03:25] snape: that URL works in australia
[03:26] *** kyan has quit IRC (Leaving)
[03:26] Yay for yonder warm antipodes.
[03:28] *** einstein9 has joined #archiveteam
[03:28] huh, i can grab the m3u8 itself, but not the file
[03:30] *** BubuAnabe has joined #archiveteam
[03:30] same here. on the video's page someone posted that it was broken in mid january as well
[03:30] yeah, the URL the file points to is 404
[03:30] not just you, snape
[03:31] Damn. :/
[03:31] *** BubuAnabe has left
[03:32] it's an old video, so looking for alternative sources
[03:33] So how's it going
[03:33] *** mutoso has joined #archiveteam
[03:33] yes
[03:34] I think there are four missing/lost retrospectives - D&D and Daxter, Fallout, and possibly/probably Halo. (Halo is/was four parts, and two seem to be missing.)
[03:36] There could be copies on Youtube that don't credit gametrailers as the source, though. Hard to tell with nothing to compare to. :/
[03:38] GameTrailers videos generally had their bumper on the front, right? At least the AVGN videos produced when he was employed by them did
[03:38] Just remembered I had downloaded the gt_timeline_kingdomhearts_ith_site_part_1_960x540_2200_m31.mp4 parts.
[03:39] i'm looking through the older versions of the site for filename ideas and any other sleuthing https://web.archive.org/web/20090305081554/http://www.gametrailers.com/player/42026.html
[03:39] to think they used to just have download links
[03:41] *** plog99 has quit IRC ()
[03:43] Cool, also have the Legend of Zelda Timeline parts, just not in the best quality
[03:48] *** Coderjoe has quit IRC (Ping timeout: 260 seconds)
[03:52] *** JesseW has joined #archiveteam
[03:55] *** Coderjoe has joined #archiveteam
[03:56] GameTrailers appears to be blocking webrecorder.io
[04:00] If someone has the time space and motivation, the only copies of the Final Fantasy retrospective on youtube are from third-party uploaders, so it could be worthwhile to grab copies of the originals to be safe. I think there are... twelve episodes, or something like that.
[04:05] snape: Downloading them now via jdownloader
[04:09] 13 episodes
[04:10] Yup
[04:11] Yea there is lots of content at GameTrailers, but nothing beats these retrospectives. The Final Fantasy one clocks in at over 3.5 hours when all the parts are stitched together. They are incredibly detailed.
[04:11] Excellent. JesseW, maybe scrape their Twitter account with webrecorder, on the assumption that's going to disappear shortly too?
[04:12] webrecorder doesn't seem to work well for me...
[04:12] scrape the MP4 file links from the embeds
[04:12] snape: but we've got an archivebot job running already
[04:13] The boards will end up being incomplete without feeding it a more detailed stack of URL's
[04:14] The pagination is limited to 20 pages on any given board, but if you change the query string, posts go back to 2013
[04:15] JesseW, okay, I'm not in #archivebot and doing a bunch of stuff in other windows, so not on top of everything. Just coughing up hairb^H^H^H^H^Hideas...
[04:16] Cool I am just calling stuff out as I see it ;) Not trying to be critical at all
[04:16] ^H^H^H^H
[04:16] ideas are VERY WELCOME!
[04:16] i0npulse: same goes for your point about the forums
[04:16] thank you
[04:16] ill go head and download all the videos on edgecast
[04:16] All those FF Retrospectives came to ~1.5GB
[04:17] Are the GT videos on Escapist going to be covered?
[04:18] Are they youtube embeds, like the Tumblr?
[04:18] Nah
[04:19] One example given to me is www.escapistmagazine.com/videos/embed/116780?width=640
[04:20] HM. Uses the same servers as GT's main site
[04:20] Here is a Retrospectives URL that is static (non js based) pagination: http://www.gametrailers.com/videos/view/gt-retrospectives?page=6
[04:20] starting at page 6
[04:20] *** vitzli has joined #archiveteam
[04:20] so should have all of the links there to the retrospectives instead of having to grapple with google or GT's crappy onsite search engine
[04:22] are there any videos that aren't on embed.gametrailers.com/embed/* ?
[04:22] Just search the page for div class filmstrip_video and get the embedded href
[04:23] *** Coderjoe has quit IRC (Read error: Operation timed out)
[04:23] espes__: Are you thinking of bruteforcing the IDs and getting the URLs via that?
[04:23] yeah
[04:23] *** lbft_ has quit IRC (Read error: Operation timed out)
[04:24] like, 20% done
[04:24] Not a bad idea
[04:24] *** SmileyG has joined #archiveteam
[04:24] *** lbft has joined #archiveteam
[04:24] *** vegbrasil has quit IRC (Read error: Operation timed out)
[04:26] *** Frogging has quit IRC (Read error: Operation timed out)
[04:26] *** Frogging1 has joined #archiveteam
[04:28] *** aMunster has quit IRC (Read error: Operation timed out)
[04:29] i0npulse, thanks for the non-borked pagination link. The Google search works, but...
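The two scraping tips from 04:20 to 04:22 can be sketched with only the Python standard library. The first is static `?page=N` pagination on the retrospectives listing; the second is pulling the `href` out of each `div class filmstrip_video`. This is a minimal sketch, assuming the link is an `<a href>` nested somewhere inside that div and that pages are addressed as `?page=N`. The log confirms neither the exact markup nor the last page number, and `filmstrip_links` and `page_urls` are hypothetical helper names.

```python
from html.parser import HTMLParser


class FilmstripLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="filmstrip_video">.

    Assumption: the real GameTrailers markup may nest things differently.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a filmstrip_video div (counts nested divs)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1  # a nested div inside the filmstrip block
            elif "filmstrip_video" in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def page_urls(base, first, last):
    """Static (non-JS) pagination: base?page=first .. base?page=last inclusive."""
    return [f"{base}?page={n}" for n in range(first, last + 1)]


def filmstrip_links(html):
    """Return the hrefs found inside filmstrip_video divs of one listing page."""
    parser = FilmstripLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

Each URL from `page_urls("http://www.gametrailers.com/videos/view/gt-retrospectives", 6, N)` would be fetched and passed to `filmstrip_links`. The resulting list is the kind of URL stack the 04:13 message says ArchiveBot needs.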
[04:32] *** vegbrasil has joined #archiveteam
[04:32] *** edsu has quit IRC (Read error: Connection reset by peer)
[04:33] yea ;) np
[04:33] *** aMunster has joined #archiveteam
[04:33] did someone do youtube the youtube channel? can archivebot going to handle the ~4000 videos?
[04:35] *** mistym- has quit IRC (Ping timeout: 633 seconds)
[04:36] *** Smiley has quit IRC (Ping timeout: 864 seconds)
[04:36] *** dxrt has quit IRC (Ping timeout: 633 seconds)
[04:37] *** mistym has joined #archiveteam
[04:37] *** edsu has joined #archiveteam
[04:37] I'm grabbing the youtube channel but if someone can get it faster than 25-30MB/s then that would be good
[04:37] get all the ids and invoke youtube-dl in parallel?
[04:39] *** phuzion_ has quit IRC (Ping timeout: 633 seconds)
[04:39] that's a good idea actually
[04:40] note youtube-dl --get-id is stupid and hits the page of each video, so maybe there's another way
[04:41] *** phuzion has joined #archiveteam
[04:41] *** aMunster has quit IRC (Remote host closed the connection)
[04:41] *** dxrt has joined #archiveteam
[04:43] yea, when trying to scrap metadata, especially in different parts it can get a bit greedy in terms of how many page requests it makes
[04:43] *** dxrt- sets mode: +o dxrt
[04:44] I think you could do youtube-dl --playlist-items=1-500 https://www.youtube.com/user/gametrailers/videos and run a couple instances in parallel with different chunks of the playlist, but I've not actually tried this, note...
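The untested suggestion at 04:44 splits the channel's upload list into `--playlist-items` ranges and runs several youtube-dl processes at once. It might look like the sketch below. `chunk_specs`, `build_commands` and `run_all` are hypothetical helpers. The chunk size of 500 comes from the example command, and the total of 3893 is the `--flat-playlist` count reported at 04:56. The sketch assumes the playlist order does not shift while the chunks run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

CHANNEL = "https://www.youtube.com/user/gametrailers/videos"


def chunk_specs(total, size):
    """Split playlist positions 1..total into youtube-dl item ranges like '1-500'."""
    return [f"{start}-{min(start + size - 1, total)}"
            for start in range(1, total + 1, size)]


def build_commands(url, total, size):
    """One youtube-dl command line per chunk.

    --ignore-errors keeps a chunk going past failures such as the private
    video reported at 05:00.
    """
    return [["youtube-dl", "--ignore-errors", f"--playlist-items={spec}", url]
            for spec in chunk_specs(total, size)]


def run_all(commands, workers=4):
    """Run the chunks as independent processes and collect their exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

`run_all(build_commands(CHANNEL, 3893, 500))` would start eight processes, four at a time. The archiving flags recommended at 05:22 could be appended to each command list.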
[04:46] ~22900 videos i could find on embed
[04:46] getting https://www.youtube.com/user/gametrailers/videos?view=0&sort=da&live_view=500&flow=grit&spf=navigate returns a JSON
[04:46] *** lbft has quit IRC (Read error: Operation timed out)
[04:46] snape, I noticed that link with the retrospectives, they get a little wierd past 2010, dupes and inconsistancies
[04:47] *** lbft has joined #archiveteam
[04:47] Probobly replicated entries in some corporate CMS
[04:50] *** vegbrasil has quit IRC (Ping timeout: 633 seconds)
[04:51] *** Ghost_of_ has joined #archiveteam
[04:51] *** vegbrasil has joined #archiveteam
[04:51] the sitemap has 20,000 items. just pulled these out of the xml if it's helpful: https://gist.githubusercontent.com/anonymous/668e46c85483fa8d33d4/raw/f8d7f886eb695dea10399fd951fbbf3ab4159be0/gametrailers-urls.txt
[04:52] youtube-dl -j --flat-playlist https://www.youtube.com/user/gametrailers gives me presumably all id's json encoded
[04:53] a whole bunch of videos are dead
[04:53] http://www.gametrailers.com/videos/view/gametrailers-com/113515-Mr-Blonde-Interview-Part-2
[04:53] *** aMunster has joined #archiveteam
[04:53] Using that JSON, you can parse the ['body']['content'] for the links then send them to youtube-dl
[04:53] *** Coderjoe has joined #archiveteam
[04:55] *** brayden_ has joined #archiveteam
[04:55] it looks like you can just brute force http://www.gametrailers.com/videos/view/gametrailers-com/ with ids from 0 through 116780?
[04:56] sorry. 1 through 116780
[04:56] 3893 via einstein9's --flatlist
[04:57] lukeman: Easier to bruteforce via the embed URLs
[04:57] Also the gametrailers-com part is not needed
[04:57] ah. cool
[04:57] only got ~15k valid edgecast urls
[04:58] might be videos ourside the range i checked or not available through embed.
[04:58] *** brayden has quit IRC (Read error: Operation timed out)
[04:58] espes__: Got a document of the invalid URLs?
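The 04:51 URL list was pulled from the sitemap posted at 02:56. Below is a sketch of that extraction, assuming the file follows the standard sitemaps.org schema. The log does not say whether it is a plain `<urlset>` or a `<sitemapindex>` pointing at child files, so both are handled. Fetching is left out, and `sitemap_locs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(xml_text):
    """Return (is_index, urls) for one sitemap document.

    A <urlset> lists pages; a <sitemapindex> lists further sitemap files,
    which would each need to be fetched and parsed the same way.
    """
    root = ET.fromstring(xml_text)
    is_index = root.tag == f"{{{SITEMAP_NS}}}sitemapindex"
    urls = [loc.text.strip()
            for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
            if loc.text and loc.text.strip()]
    return is_index, urls
```

Whitespace inside `<loc>` is stripped, so the output can be written one URL per line and fed to ArchiveBot as-is.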
[04:59] where valid edgecast urls means non-empty video links
[04:59] urls for youtube gametrailers channel: https://gist.github.com/vitzli/4c19ae1c9bd0e38e29d2 full url, can provide raw ids too
[05:00] ERROR: AFQCcz-R1nY: YouTube said: Please sign in to view this video
[05:01] Thanks, gametrailers, thanks for nothing, lol. >.<
[05:01] age restricted?
[05:01] the 7k dead embed ids: https://gist.github.com/espes/b0b27bcf4ef8703862dd
[05:02] make sure it isn't growing
[05:02] Not age restricted, "private"...
[05:04] vitzli: does that include the GTReviews account?
[05:04] >Sorry about that.
[05:04] looks like they post reviews here: https://www.youtube.com/user/ClevverGames
[05:04] lukeman, no, only user/gametrailers/
[05:05] Another thing that'd need archiving is http://www.twitch.tv/GameTrailers
[05:06] http://www.twitch.tv/GameTrailers/profile for links to the past broadcasts
[05:07] just youtube-dl http://www.twitch.tv/GameTrailers/profile/past_broadcasts
[05:07] there's only 2 months of stuff because twitch deletes old stuff
[05:07] Right
[05:08] lukeman, here is the list for GTReviews/CleverGames youtube channel, full urls https://gist.github.com/vitzli/ea95a8da33015b3edba1
[05:08] 4322 videos on GTReviews/CleverGames
[05:09] it's looks like videos from Twitch might be on the GTReviews/ClevverGames channel
[05:09] Archivebot working on Twitch
[05:12] Fletcher, are you doing 'gametrailers' channel only?
[05:13] to start with yep
[05:17] is someone working on the GTReviews channel? i could point youtube-dl in it's direction, but im somewhat concerned i won't have enough space for all of it...
[05:18] sevs44936, I'm trying to estimate total size of it
[05:19] i guess we got the bad end on gametrailers eh
[05:19] between 2 systems i've got like 5tb free, unsure if thats enough
[05:19] it should be
[05:20] which setting should i use for youtube-dl
[05:21] ?
[05:21] just defaults or anything else?
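The brute-force approach discussed from 04:23 onward walks a range of embed IDs. It keeps the IDs whose page yields a non-empty video link and lists the rest as dead. That can be sketched as below. The embed URL pattern comes from the `embed.gametrailers.com/embed/*` mention. The regex that spots `.mp4`/`.m3u8` links, and the `fetch` callable, are assumptions: the log only says the video URLs sit in a script tag pointing at Edgecast. Passing `fetch` in keeps the scan testable without network access. `classify` and `scan` are hypothetical names.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumption: the embed page contains the video as a plain .mp4/.m3u8 URL
# somewhere in its HTML or script; the real markup was not checked.
MEDIA_RE = re.compile(r"https?://[^\s\"']+\.(?:mp4|m3u8)[^\s\"']*")
EMBED_URL = "http://embed.gametrailers.com/embed/{}"


def classify(video_id, fetch):
    """Return (id, media_urls); an empty list means the id counts as dead.

    `fetch(url)` must return the page text, or "" / None on any error.
    """
    html = fetch(EMBED_URL.format(video_id))
    return video_id, MEDIA_RE.findall(html or "")


def scan(ids, fetch, workers=16):
    """Check ids concurrently; returns ({id: [urls]}, [dead ids]) in input order."""
    valid, dead = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for vid, urls in pool.map(lambda i: classify(i, fetch), ids):
            if urls:
                valid[vid] = urls
            else:
                dead.append(vid)
    return valid, dead
```

For the range quoted at 20:58, `scan(range(2942000, 3008494), fetch)` with a real HTTP `fetch` would check 66,494 IDs. A `fetch` built on `urllib.request.urlopen` that catches `HTTPError` and returns `""` would work. The `dead` list corresponds to the "dead embed ids" gist.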
[05:22] http://www.archiveteam.org/index.php?title=YouTube is reccomended AFAIK
[05:22] youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio URL
[05:22] ahh, wonderful
[05:22] sevs44936, if you don't use youtube-dl very often check your version before starting
[05:23] Fletcher, i've got 2016.01.15
[05:23] perfect :)
[05:24] bah, only getting 1gbps from edgecast
[05:24] just haven't used it "professionally"
[05:25] *** kyan has joined #archiveteam
[05:31] arkiver: ack
[05:31] there she blows
[05:32] ok, 2gbps
[05:32] which is the throughput limit of gce ssds -_-
[05:33] Using tubeup.py on Twitch
[05:33] average video size is ~125 MiB on 16 random video sample size; I got one 1.2 GiB video, without it I got ~50 MiB/video
[05:34] with 125 MiB/video estimated size for GTReviews is 527 GiB
[05:34] The only archive for the Twitch broadcasts is the SomeGameNews YT channel
[05:35] im about to order a 6TB so you start server
[05:39] ...actually nvm, too expensive
[05:43] *** wutno has joined #archiveteam
[05:44] *** Sk1d has quit IRC (Ping timeout: 200 seconds)
[05:47] ~70 videos on the Twitch channel, would estimate a GB each
[05:49] *** Sk1d has joined #archiveteam
[05:56] *** WinterFox has joined #archiveteam
[06:02] ok, currently pulling the first 400
[06:03] i'm getting around 25MiB/s combined down
[06:05] Wonder what it'd be like if IGN called it quits
[06:06] around 115 MiB/video on 20 video sample on GTReviews channel
[06:06] *** megaminxw has joined #archiveteam
[06:06] *** megaminxw has left
[06:07] *** megaminxw has joined #archiveteam
[06:09] einstein9, I've been slowly working on IGN
[06:10] all 100k+ videos :(
[06:10] Huh, cool
[06:11] I know that gamespot's videos are quite troublesome as some didn't survive the move to the new layout/servers
[06:12] *** amm has joined #archiveteam
[06:13] *** wutno has quit IRC (Ping timeout: 252 seconds)
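The 05:34 estimate is video count times average size. Reproducing it confirms the arithmetic: 4322 × 125 MiB / 1024 gives the quoted 527 GiB. The later 115 MiB sample would put it near 485 GiB. The same numbers give a rough download time at the ~25 MiB/s reported at 06:03. This is only a back-of-the-envelope sketch. The averages are the small samples quoted in the log, not measured totals, and the function names are hypothetical.

```python
MIB_PER_GIB = 1024


def estimate_gib(video_count, avg_mib):
    """Estimated total size in GiB from a per-video average in MiB."""
    return video_count * avg_mib / MIB_PER_GIB


def hours_to_download(video_count, avg_mib, rate_mib_s):
    """Wall-clock hours to fetch everything at a sustained rate in MiB/s."""
    return video_count * avg_mib / rate_mib_s / 3600
```

With the log's numbers, `estimate_gib(4322, 125)` is about 527.6 GiB. `hours_to_download(4322, 125, 25)` is almost exactly 6 hours. Both fit comfortably within the 5 TB mentioned at 05:19.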
[06:14] *** wutno has joined #archiveteam
[06:14] einstein9: don't say that. lol
[06:15] gave me a heart attack
[06:15] I read what Fletcher said and was like "OH MY..."
[06:15] and then I saw you said "if"
[06:15] Hahaha
[06:15] :P
[06:16] hey all, archiveteam tradition is to move project-specific discussion to its own channel
[06:16] please come up with a suitable pun
[06:16] gamefailures
[06:16] shametrailers
[06:16] I think GlaDOS said it way way above
[06:16] UnhitchedTrailers
[06:17] #UnhitchedTrailer
[06:17] lol
[06:17] yah
[06:17] cool that works
[06:17] ok joining
[06:17] *** xmc sets mode: +o swebb
[06:17] *** swebb sets mode: +o brayden_
[06:17] *** swebb sets mode: +o edsu
[06:26] Entrailers, cause it's dying
[06:33] kyan, gameyentrails...
[06:33] +1000
[06:37] *** mismatch has quit IRC (Ping timeout: 250 seconds)
[06:56] ftr i think ive got all the videos that still worked on the site
[06:57] only ~14k, 1tb
[06:57] on gametrailers*
[06:58] *** wutno has quit IRC (Quit: ZNC - http://znc.in)
[07:02] well
[07:02] looks like I've got a job this summee
[07:02] summer *
[07:02] You know what that means
[07:03] moar hard drives :D
[07:03] espes__, ooh, awesome, congrats!
[07:04] all their old videos have been gone on the site for a while :(
[07:04] Really wish I had a gigabit adaptor, would make archiving that much easier
[07:05] Are the ones you could get going into IA?
[07:06] yeah maybe
[07:06] i feel like just doing it again with archivebot if there's a pipeline with free space
[07:06] this was just incase everything started vanishing right now
[07:06] Cool
[07:07] all it would need is a list of URLs to feed to the bot I think
[07:07] The main site's in the bot, but IDK if it's getting the videos
[07:09] the videos urls are in a script tag
[07:09] and off on edgecast
[07:11] I won't be able to do it though, time for bed for me :P
[07:11] should i select a particular pipeline though?
[07:19] *** megaminxw has quit IRC (Quit: Leaving.)
[07:19] *** megaminxw has joined #archiveteam
[07:45] *** RedType_ has quit IRC (Ping timeout: 252 seconds)
[07:53] *** RedType has joined #archiveteam
[08:02] *** tobbez has quit IRC (Ping timeout: 506 seconds)
[08:04] *** lukeman has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[08:15] *** kyan has quit IRC (Leaving)
[08:16] *** TheKiwi has joined #archiveteam
[08:18] *** JesseW has quit IRC (Quit: Leaving.)
[08:20] *** vitzli has quit IRC (Quit: Leaving)
[08:30] *** Ghost_of_ has quit IRC (Quit: Leaving)
[08:34] *** atomotic has joined #archiveteam
[08:51] espes__, aim for something with large disk http://archivebot.at.ninjawedding.org:4567/pipelines
[08:54] HCross: sorted, since apparently it's 5gb chunked
[09:18] *** Morbus has quit IRC (Ping timeout: 260 seconds)
[09:19] *** Morbus has joined #archiveteam
[09:29] *** tobbez has joined #archiveteam
[09:33] *** trs80 has quit IRC (Ping timeout: 190 seconds)
[09:55] *** sevs44936 has quit IRC (Ping timeout: 258 seconds)
[11:36] *** WinterFox has quit IRC (Remote host closed the connection)
[11:38] *** oldcad has joined #archiveteam
[11:40] gametrailers will shut down :(
[11:42] yup
[11:42] rip
[11:50] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:00] *** oldcad has quit IRC (Ping timeout: 250 seconds)
[12:07] *** tobbez has quit IRC (Ping timeout: 506 seconds)
[12:09] *** oldcad has joined #archiveteam
[12:15] didnt we grab parts of that already?
[12:23] Looks like fotolog.com isn't fast enough to get everything before the deadline
[12:23] I'll ignore URLs later today so we can get more then we do now
[12:24] I think fotolog only announced their shutdown 20 days before the shutdown
[12:24] Way to short to get everything :/
[12:24] Are the videos of gametrailers saved?
[12:24] I'm not sure how many videos they have, but writing the scripts for the grab wouldn't be too hard
[12:25] I can probably get that up later today if we need it
[12:26] arkiver: want me to add more concurretns to fotolog?
[12:27] You can do that, but the it looks like the isn't becoming faster if that happens
[12:27] or well, releasing more URLs
[12:27] right, then its pointsless
[12:27] take a look at the graph of fotolog http://tracker.archiveteam.org/fotolog/
[12:27] You see when you came in, the red line, BnAboyZ slowed down
[12:30] ah THATS why that is
[12:30] (the color is different for different people, but I understand)
[12:30] did BnAboyZ just discover the stand-alone pipeline?
[12:31] arkiver: fotolog needs more boxes or are we killing them?
[12:31] oh nevermind
[12:31] midas: we are not the bottleneck :P
[12:31] i noiced
[12:32] fotolog is for some reason sending infinite 301s when it feels like it
[12:32] Can't make the fix for that right now, but will later today
[12:32] and then some items will be requeued
[12:32] Probably not liking the loaf
[12:32] Load
[12:33] yeaH
[12:33] nice, im connected via dailup atm so on slownet, will lag
[12:33] dialup? does that still exist?
[12:34] yes
[12:34] wow
[12:36] I'm not that fast either http://www.speedtest.net/my-result/a/1729887563
[12:41] i wont click that because it will take ages to load ;-)
[12:41] also -bs
[12:43] *** atomotic has joined #archiveteam
[13:01] *** weles has joined #archiveteam
[13:09] *** megaminxw has quit IRC (Quit: Leaving.)
[13:15] *** oldcad has quit IRC (Ping timeout: 250 seconds)
[13:19] *** tobbez has joined #archiveteam
[13:26] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[13:27] *** oldcad has joined #archiveteam
[13:35] *** vitzli has joined #archiveteam
[13:58] *** trs80 has joined #archiveteam
[14:11] *** sevs44936 has joined #archiveteam
[14:12] *** oldcad has quit IRC (Ping timeout: 250 seconds)
[14:32] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:41] *** Ungstein1 has quit IRC (Quit: Leaving.)
[14:57] *** Start has quit IRC (Quit: Disconnected.)
[15:34] *** atomotic has joined #archiveteam
[15:45] *** Start has joined #archiveteam
[15:49] *** RichardG has joined #archiveteam
[15:59] *** Start has quit IRC (Ping timeout: 360 seconds)
[16:00] *** Digressin has joined #archiveteam
[16:00] *** Zei-Pii has joined #archiveteam
[16:00] *** Start has joined #archiveteam
[16:00] *** Digressin has quit IRC (Client Quit)
[16:00] *** Digressin has joined #archiveteam
[16:01] *** Digressin has quit IRC (Client Quit)
[16:01] *** DigresNSQ has joined #archiveteam
[16:04] *** vitzli has quit IRC (Quit: Leaving)
[16:09] I don't have the warrior installed on this computer, yet. How much of GameTrailers were we able to save?
[16:19] *** DigresNSQ has quit IRC (Quit: Page closed)
[16:34] *** scyther has joined #archiveteam
[17:04] *** JesseW has joined #archiveteam
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[17:22] *** JesseW has quit IRC (Quit: Leaving.)
[17:27] *** Tomcat_ has joined #archiveteam
[17:33] *** godane has quit IRC (Quit: Leaving.)
[17:34] *** godane has joined #archiveteam
[17:48] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[17:53] *** philpem has joined #archiveteam
[17:55] *** Stilett0 is now known as Stiletto
[18:00] *** lunG has joined #archiveteam
[18:01] Do we need a small project for gametrailers?
[18:02] If we do for the videos I now have the time to create one
[18:12] SketchCow: due to the slow speed and their late announcement of shutting down we won't get friendsreuited and fotolog fully saved
[18:15] I'm creating a small project for gametrailer to save the trailers (videos only)
[18:16] chfoo: SketchCow: can you please create a rsync target on FOS for it?
[18:16] It'll be quick project I think
[18:18] arkiver, I think most of the videos have been saved, either by individuals or archivebot...?
[18:18] Also as WARCs?
[18:19] I'm currently downloading the GTReviews yt channel
[18:19] ok!
[18:19] Not sure, TBH.
[18:19] I'll make sure videos are saved as WARCs
[18:19] if archivebot grabbed it we should be OK on the warc side
[18:19] just youtube-dl
[18:20] arkiver: ok, rsync directory for gametrailer made
[18:20] thanks!
[18:22] The announcement of their closing made it sound like we had maybe just a handful of hours left before the site went down, so people were grabbing what they could, how they could.
[18:23] yeah
[18:23] let's do one good full grab of the videos
[18:24] You going after the youtube videos or the ones on the site?
[18:24] the site
[18:25] OK.
[18:25] I really really wish fotolog and friendsreunited would have announced earlier
[18:25] we
[18:26] we're now losing years and years of history
[18:26] Friendsunited doesn't like we're doing this, at all.
[18:26] There's a local group claiming to do the work
[18:26] But we're not trusting them
[18:26] group of insiders doing the work?
[18:26] They connected with a library
[18:27] But the library knows me, they shouldn't be surprised.
[18:27] Probably some insiders then
[18:27] *** Tomcat_ has quit IRC (Read error: Operation timed out)
[18:27] There's a gametrailer slot for you
[18:27] I've no idea how else they can save everything
[18:27] yes
[18:27] Is it one of the UK's many local libraries that dont seem to communicate
[18:28] there's an active uk library? didnt know there was one
[18:28] Many. But they dont publicse
[18:28] or talk
[18:29] SketchCow, you talking to friends reunited then?
[18:32] *** ndiddy has joined #archiveteam
[18:35] Not anymore
[18:35] They told me to fuck off
[18:35] And stop
[18:35] wow
[18:35] haha
[18:36] fine, we'll just run the site hot till it goes down :P
[18:37] I was going to say, I dont live that far from their office. Happy to go round and take hard disks of data
[18:38] But I think Ill get a punch in the face, rather than disks of data
[18:38] or the police
[18:39] Me: "Ive come to archive your shit" Them: "Arent you the reason our site is so slow" Me: "Learn to shutdown prop...." BANG
[18:44] *** Start has joined #archiveteam
[18:44] yes
[18:44] "Hullo, I'm the county collection agent for the Portable Antiquities Scheme, I'm here to collect your computers under the auspices of the Historical Records Preservation Act (Digital) 2011..."
[18:44] Wait, you're in New Zealand?
[18:45] Nope, Friends Reunited are UK based
[18:45] Oh!
[18:46] We're going to check all 3 million video IDs rom gametrailers
[18:46] most olders IDs will probably be dead
[18:47] Wait, you're in New Zealand?
[18:47] I am tho :v
[18:48] Dead IDs, per someone last night, arkiver: https://gist.github.com/espes/b0b27bcf4ef8703862dd
[18:48] einstein9: Do you know Rob Isaac?
[18:49] Nope
[18:49] Opposite end of the country
[18:49] Bummer
[18:49] Well, not a bummer, you're still in the beautiful island
[18:51] i'm grabbing more of david bowie bootlegs
[18:52] from 1990
[18:52] Oh yeah, if anyone wants to archive these for the IA: http://paperspast.natlib.govt.nz/cgi-bin/paperspast
[18:57] einstein9: i will look at it
[18:58] Thanks
[18:59] The per-page png are probably better quality than the full pdf of the issues
[18:59] *** Arendo has joined #archiveteam
[19:00] The full pdfs have a ToC tho
[19:01] *** Arendo has quit IRC (Client Quit)
[19:02] *** reinhard has joined #archiveteam
[19:02] hey guys, just want to ask, is there something wrong with fgts.jp? cause I'm unable to connect to their image database....
[19:03] why would we know?
[19:04] oh well, perhaps it's not a global problem and it's just my connection issues
[19:04] just curious dude
[19:04] no like
[19:04] i'm curious what makes you think "fgts.jp ... archiveteam will know!"
[19:05] http://www.downforeveryoneorjustme.com/fgts.jp
[19:07] to be fair
[19:07] here is usually a good place to ask whether something is dead
[19:07] :P
[19:11] cause when I look at 4chan archive sites on your page, this IRC channel was listed on fgts.jp section, and I automatically assume perhaps this channel knows something about it
[19:11] not so hard to figure that
[19:15] *** Start has quit IRC (Quit: Disconnected.)
[19:17] *** reinhard has left
[19:18] is there a gametrailers channel? help needed?
[19:18] #gamefailers ?
[19:19] #UnhitchedTrailer
[19:28] http://www.embulk.org/docs/ dropping that there for people to look at
[19:28] xmc: Archive Team has eyes everywhere
[19:28] including all over 4chan i suppose
[19:28] #gamefailers wins
[19:29] *** Nertsy has quit IRC (Quit: Nertsy)
[19:29] #UnhitchedTrailer already exists and is full of people, i simply did not now
[19:30] *** atomotic has joined #archiveteam
[19:32] *** Nertsy has joined #archiveteam
[19:41] *** Tomcat_ has joined #archiveteam
[19:50] *** Start has joined #archiveteam
[19:58] *** megaminxw has joined #archiveteam
[19:59] *** Tomcat__ has joined #archiveteam
[20:00] *** Tomcat_ has quit IRC (Read error: Operation timed out)
[20:05] Transferring data across oceans is relatively slow
[20:05] *** Frogging1 is now known as Frogging
[20:05] i blame the bandwidth-delay product and tcp
[20:06] Bandwidth delay product?
[20:06] Is that relating latency to effective throughput? (just a guess, googling on a tablet is cumbersome)
[20:06] also, fgts is working for me
[20:06] yeah
[20:07] Guess you could try transfers with UDP and then verifying the data after :p
[20:07] *** Zei-Pii has quit IRC (Ping timeout: 250 seconds)
[20:07] Frogging: it's the amount of data that is in transit at any given time
[20:07] like, send everything, check it, and then send the bits that got lost
[20:07] -bs
[20:07] ^
[20:07] So you don't have to fuss with ACKS during the transfer
[20:08] we won't use udp
[20:08] woop woop woop off-topic siren
[20:08] I wonder how much faster that would be
[20:08] take it to #archiveteam-bs
[20:08] Lol sorry :p
[20:08] forgot about that channel
[20:33] arkiver: im pretty sure there's only the 14k videos i found still available on the site
[20:33] i got a copy and now archivebot is going through them
[20:33] since its only 1tb
[20:34] everything else i think was deleted in september
[20:35] Frogging: https://github.com/LabAdvComp/UDR
[20:37] yah I found that, I posted in #archiveteam-bs about it
[20:37] To avoid derailing this channel :p
[20:43] *** Start has quit IRC (Quit: Disconnected.)
[20:54] espes__: how did you get a copy?
[20:54] *** megaminxw has quit IRC (Quit: Leaving.)
[20:55] *** bauruine has quit IRC (Quit: ZNC - http://znc.in)
[20:55] in what format
[20:58] For archiving one of my own sites, is something like httrack a good option? Its running an outdated Joomla install so I don't really want to keep it online as-is anymore
[20:58] arkiver: scraping embed.gametrailers.com/embed/{2942000..3008493}
[20:58] archivebot w
[20:58] will,
[20:59] espes__: did you grab them in WARCs? what format of the video did you grab? and did you also grab the segmented videos?
[20:59] arkiver: and just the files, im assuming archivebot can handle the videos
[20:59] and no
[20:59] ok
[21:00] well the segmented videos are all available as files
[21:00] i think
[21:00] they are needed for playback of the video
[21:00] so future wayback machine can handle the videos
[21:01] do you have a list of IDs for me that you found working?
[21:01] like some early videos? I'd like to test if those work the same as the recent videos
[21:01] *** bauruine has joined #archiveteam
[21:01] there'll all recent, they were moved to edgecast in september
[21:02] HCross: Is it possible for me to get the HTML back from that? Thanks
[21:03] *** weles has quit IRC (Read error: Operation timed out)
[21:03] yes, just view the source and ignore the header
[21:03] view-source:https://web.archive.org/web/20000229183439/http:/www.cl.cam.ac.uk/coffee/coffee.html look at line 197
[21:03] Okay thanks
[21:04] arkiver: https://docs.google.com/uc?id=0B4uDFAm87U0jR3N2Q2VmVEpURE0&export=download
[21:11] *** Tomcat__ has quit IRC (Remote host closed the connection)
[21:16] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[21:18] *** amm has quit IRC (Quit: Leaving)
[21:34] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[21:36] *** kyan has joined #archiveteam
[21:39] *** bauruine has joined #archiveteam
[22:03] *** kyan has quit IRC (This computer has gone to sleep)
[22:07] *** kyan has joined #archiveteam
[22:20] *** scyther has quit IRC (Read error: Connection reset by peer)
[23:20] *** Mayonaise has quit IRC (Read error: Operation timed out)
[23:29] *** BnA-Robin has joined #archiveteam
[23:45] https://cve.fbi.gov - let's explain violent extremism to kids in 2016 with pictures of CRT monitors and floppy disks, yeah!
[23:45] Dammit, wrong channel. >.<
[23:50] *** Start has joined #archiveteam
[23:51] lol
[23:55] *** gter has joined #archiveteam
[23:55] is the Gametrailers archive project ongoing?
[23:56] Think so; check #UnhitchedTrailer, AFAIK
[23:59] woah, gametrailers is closing?
[23:59] Yeah
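The bandwidth-delay product exchange from 20:05 to 20:07 can be made concrete. The BDP is bandwidth × round-trip time, the amount of data that has to be in flight to keep the link busy. A single TCP connection cannot go faster than its window divided by the RTT, which is why transoceanic transfers feel slow unless the window is large. The 1 Gbit/s link and 150 ms RTT below are illustrative assumptions, not measurements from the log. 65,535 bytes is the largest TCP window without the window-scale option. The function names are hypothetical.

```python
def bdp_bytes(bandwidth_bit_s, rtt_s):
    """Bandwidth-delay product: bytes that must be unacknowledged in flight
    to keep a link of this bandwidth and round-trip time fully busy."""
    return bandwidth_bit_s * rtt_s / 8


def max_tcp_throughput_bit_s(window_bytes, rtt_s):
    """Upper bound for one TCP flow: at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s
```

For a 1 Gbit/s path with a 150 ms RTT, `bdp_bytes(1e9, 0.15)` is 18.75 MB. With an unscaled 65,535-byte window, `max_tcp_throughput_bit_s(65535, 0.15)` caps one connection at about 3.5 Mbit/s. Window scaling or many parallel connections close that gap.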