#archiveteam 2016-02-09,Tue

Time Nickname Message
00:11 🔗 arkiver SketchCow: can you please send me fotolog-profile_lah_maripositahx-20160209-000355_data.txt from fotolog?
00:11 🔗 arkiver It looks like some items are returned as 0 MB items, while they should contain more than that
00:11 🔗 arkiver And I'm not able to recreate the problem, so I'd like to have a look at the WARC
00:12 🔗 arkiver SketchCow: sorry, I mean file fotolog-profile_lah_maripositahx-20160209-000355.warc.gz
00:18 🔗 SketchCow One moment
00:19 🔗 SketchCow fos.textfiles.com/fotolog-profile_lah_maripositahx-20160209-000355.warc.gz
00:19 🔗 arkiver yes! thanks!
00:19 🔗 kyan has joined #archiveteam
00:22 🔗 arkiver they're returning internal server errors as 200...
00:24 🔗 arkiver hmm actually no
00:24 🔗 arkiver some strange redirect problem
00:37 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
00:41 🔗 kyan has joined #archiveteam
00:53 🔗 wp494 gametrailers is closing: https://twitter.com/GameTrailers/status/696858020215586816
00:54 🔗 HCross2 Can anymore stuff shut down this month?
00:55 🔗 SketchCow http://www.gametrailers.com/
00:55 🔗 SketchCow FUCKING
00:56 🔗 arkiver today??
00:57 🔗 kyan Wow that's a lot of lead time they're giving us
00:58 🔗 kyan They updated their site a few months ago
00:59 🔗 kyan with a bunch of stuff that basically says fuck history http://www.gametrailers.com/forums/read/668.885579-GameTrailers-com-New-Site-FAQ
01:06 🔗 kyan They're giving connection refused for my dedi and for at least one archivebot pipeline
01:07 🔗 kyan I dont think it's UA related
01:08 🔗 HCross2 IP?
01:09 🔗 kyan HCross2 my dedi's at 104.254.90.187
01:09 🔗 kyan The archivebot pipeline F_DE_2-10 also got it
01:10 🔗 kyan but aupipe-4 is fine
01:11 🔗 kyan it's running on that now
01:27 🔗 i0npulse The gametrailers youtube channel is still up
01:27 🔗 i0npulse might want to hit that up
01:27 🔗 i0npulse https://www.youtube.com/user/gametrailers
01:28 🔗 i0npulse One of the best things gametrailers ever did was their retrospectives on franchises like Star Wars, Metroid, Castlevania, Final Fantasy, and Metal Gear. Tons of research and very thorough.
01:28 🔗 i0npulse Those are older though, so you might have to find them sprinkled across other user accounts
01:29 🔗 i0npulse I believe there was a GTA one? and RE? and also a Mario Kart one as well.
01:29 🔗 i0npulse They stopped doing them after a certain point, though they still stand as some of the most impressive video retrospectives around.
01:29 🔗 i0npulse I would say they alone trump all other generic content (trailers, hype/marketing material) at GT
01:31 🔗 snape I think they're on the gametrailers main youtube channel, though you have to hunt through the playlists to find them. Should we assume the content at youtube is going away too? There's a *lot* of stuff there. >.<
01:36 🔗 i0npulse I fear most of the retrospectives of old were not "HD" so they might not be in the youtube channel
01:41 🔗 i0npulse Hmm yea I can't even find the Metroid Retrospective on youtube anywhere :/
01:44 🔗 JesseW has joined #archiveteam
01:46 🔗 snape Metroid retrospective from 2007 is on the website still, it looks like. And of course youtube-dl doesn't like it...
01:50 🔗 snape Well, you can grab the .m3u8 once you extract it, I guess. Sigh.
01:51 🔗 i0npulse I cant access gametrailers.com :/ its hanging
01:51 🔗 i0npulse ah
01:51 🔗 i0npulse must be my dedicated server not being able to access it
01:51 🔗 i0npulse lame
01:56 🔗 snape Good news is the .m3u8 filenames are relatively predictable, anyway.
02:03 🔗 Smiley has quit IRC (Remote host closed the connection)
02:04 🔗 snape Metroid and Fallout retrospectives both seem to be missing from Youtube. :/
02:04 🔗 lunG has joined #archiveteam
02:04 🔗 JesseW has quit IRC (Read error: Operation timed out)
02:05 🔗 philpem has quit IRC (Ping timeout: 260 seconds)
02:07 🔗 snape Not super clear at a quick glance, but the Resident Evil retrospective might also be at risk of disappearing.
02:09 🔗 snape Also, the Zelda ones from 2006.
02:09 🔗 lun has quit IRC (Read error: Operation timed out)
02:10 🔗 Smiley has joined #archiveteam
02:11 🔗 i0npulse The content on gametrailers comes down in chunks from edgecastcdn
02:11 🔗 lunG has quit IRC (Ping timeout: 250 seconds)
02:13 🔗 snape Yeah, I'm hoping that if we can save the CDN URLs, we'll have a little extra breathing room after the site goes dark.
02:15 🔗 snape Batman Arkham and Warcraft retrospectives don't seem to be mirrored either.
02:15 🔗 i0npulse If you plan to get the boards
02:15 🔗 i0npulse you won't get it automatically via crawl
02:15 🔗 i0npulse pagination in html only goes back to 20 pages
02:15 🔗 i0npulse need to manually increment to get a higher pagination count
02:16 🔗 i0npulse GT is going to be tricky to get, lots of dynamic JS loading for content
02:18 🔗 i0npulse ah i figured it out
02:19 🔗 i0npulse http://wpc.10016.edgecastcdn.net/0010016/gtcomstor/media/videos_root/shows/retrospectives/2012/metroid/part3/gt_retro_metroid_part_3-1200_kbps.mp4?1B608EE7AFCE3765E176F3C6FBB98002B3D18C64572F2307D769A970A072F1BEC602
02:19 🔗 i0npulse That downloads
02:24 🔗 SketchCow We have my permission to download as much of GT, immediately, as possible.
02:25 🔗 SketchCow As much as possible.
02:25 🔗 kyan They're blocking a lot of datacenter
02:25 🔗 kyan it's kind of hard because of that
02:25 🔗 snape Unfortunately some of the oldest (ca. 2006) retrospectives are already AWOL - D&D and Dexter are two I've found so far.
02:26 🔗 snape Er, Daxter.
02:28 🔗 SketchCow Well, you have my blessing. Do what it takes
02:32 🔗 GLaDOS #UnhitchedTrailer perhaps?
02:33 🔗 pikhq Holy hell, and I thought *Yahoo* was bad.
02:33 🔗 Ungstein1 has joined #archiveteam
02:35 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:38 🔗 JesseW has joined #archiveteam
02:44 🔗 espes__ has joined #archiveteam
02:50 🔗 schbirid has joined #archiveteam
02:56 🔗 JesseW http://www.gametrailers.com/sitemap/sitemap.xml
02:58 🔗 snape May be some sort of weird geo-permission thing going on with the CDN; some videos will download from a US server but not one in France, even though the France server can grab others. So weird. And annoying.
02:58 🔗 JesseW https://itunes.apple.com/us/artist/gametrailers.com/id125973330?mt=2
02:58 🔗 lukeman has joined #archiveteam
03:02 🔗 JesseW has quit IRC (Quit: Leaving.)
03:02 🔗 PuppyCock has joined #archiveteam
03:02 🔗 PuppyCock gametrailers.com shut down...
03:03 🔗 snape If anyone can successfully grab this one-episode retrospective from their location (404s for both of my servers): http://wpc.10016.edgecastcdn.net/0310016/gtcomstor/media/video/2/7/5/d/t_fallout_retro_gt-,900,_kbps.mp4.m3u8
03:05 🔗 BubuAnabe has quit IRC (Ping timeout: 360 seconds)
03:08 🔗 sevs44936 has joined #archiveteam
03:17 🔗 lukeman do we know the diff between content they host on their CDN and things uploaded to youtube?
03:18 🔗 dxrt- sets mode: +o dxrt
03:24 🔗 mutoso has quit IRC (Ping timeout: 252 seconds)
03:25 🔗 GLaDOS snape: that URL works in australia
03:26 🔗 kyan has quit IRC (Leaving)
03:26 🔗 snape Yay for yonder warm antipodes.
03:28 🔗 einstein9 has joined #archiveteam
03:28 🔗 GLaDOS huh, i can grab the m3u8 itself, but not the file
03:30 🔗 BubuAnabe has joined #archiveteam
03:30 🔗 lukeman same here. on the video's page someone posted that it was broken in mid january as well
03:30 🔗 GLaDOS yeah, the URL the file points to is 404
03:30 🔗 GLaDOS not just you, snape
03:31 🔗 snape Damn. :/
03:31 🔗 BubuAnabe has left
03:32 🔗 lukeman it's an old video, so looking for alternative sources
03:33 🔗 einstein9 So how's it going
03:33 🔗 mutoso has joined #archiveteam
03:33 🔗 GLaDOS yes
03:34 🔗 snape I think there are four missing/lost retrospectives - D&D and Daxter, Fallout, and possibly/probably Halo. (Halo is/was four parts, and two seem to be missing.)
03:36 🔗 snape There could be copies on Youtube that don't credit gametrailers as the source, though. Hard to tell with nothing to compare to. :/
03:38 🔗 MrRadar GameTrailers videos generally had their bumper on the front, right? At least the AVGN videos produced when he was employed by them did
03:38 🔗 einstein9 Just remembered I had downloaded the gt_timeline_kingdomhearts_ith_site_part_1_960x540_2200_m31.mp4 parts.
03:39 🔗 lukeman i'm looking through the older versions of the site for filename ideas and any other sleuthing https://web.archive.org/web/20090305081554/http://www.gametrailers.com/player/42026.html
03:39 🔗 lukeman to think they used to just have download links
03:41 🔗 plog99 has quit IRC ()
03:43 🔗 einstein9 Cool, also have the Legend of Zelda Timeline parts, just not in the best quality
03:48 🔗 Coderjoe has quit IRC (Ping timeout: 260 seconds)
03:52 🔗 JesseW has joined #archiveteam
03:55 🔗 Coderjoe has joined #archiveteam
03:56 🔗 JesseW GameTrailers appears to be blocking webrecorder.io
04:00 🔗 snape If someone has the time space and motivation, the only copies of the Final Fantasy retrospective on youtube are from third-party uploaders, so it could be worthwhile to grab copies of the originals to be safe. I think there are... twelve episodes, or something like that.
04:05 🔗 einstein9 snape: Downloading them now via jdownloader
04:09 🔗 i0npulse 13 episodes
04:10 🔗 einstein9 Yup
04:11 🔗 i0npulse Yea there is lots of content at GameTrailers, but nothing beats these retrospectives. The Final Fantasy one clocks in at over 3.5 hours when all the parts are stitched together. They are incredibly detailed.
04:11 🔗 snape Excellent. JesseW, maybe scrape their Twitter account with webrecorder, on the assumption that's going to disappear shortly too?
04:12 🔗 JesseW webrecorder doesn't seem to work well for me...
04:12 🔗 i0npulse scrape the MP4 file links from the embeds
04:12 🔗 JesseW snape: but we've got an archivebot job running already
04:13 🔗 i0npulse The boards will end up being incomplete without feeding it a more detailed stack of URL's
04:14 🔗 i0npulse The pagination is limited to 20 pages on any given board, but if you change the query string, posts go back to 2013
04:15 🔗 snape JesseW, okay, I'm not in #archivebot and doing a bunch of stuff in other windows, so not on top of everything. Just coughing up hairb^H^H^H^H^Hideas...
04:16 🔗 i0npulse Cool I am just calling stuff out as I see it ;) Not trying to be critical at all
04:16 🔗 einstein9 ^H^H^H^H
04:16 🔗 JesseW ideas are VERY WELCOME!
04:16 🔗 JesseW i0npulse: same goes for your point about the forums
04:16 🔗 JesseW thank you
04:16 🔗 espes__ ill go head and download all the videos on edgecast
04:16 🔗 einstein9 All those FF Retrospectives came to ~1.5GB
04:17 🔗 einstein9 Are the GT videos on Escapist going to be covered?
04:18 🔗 snape Are they youtube embeds, like the Tumblr?
04:18 🔗 einstein9 Nah
04:19 🔗 einstein9 One example given to me is www.escapistmagazine.com/videos/embed/116780?width=640
04:20 🔗 einstein9 HM. Uses the same servers as GT's main site
04:20 🔗 i0npulse Here is a Retrospectives URL that is static (non js based) pagination: http://www.gametrailers.com/videos/view/gt-retrospectives?page=6
04:20 🔗 i0npulse starting at page 6
04:20 🔗 vitzli has joined #archiveteam
04:20 🔗 i0npulse so should have all of the links there to the retrospectives instead of having to grapple with google or GT's crappy onsite search engine
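The static pagination i0npulse describes can be walked by incrementing the page query parameter. A minimal sketch, assuming the ?page=N pattern from the URL above holds for every page; the helper name page_urls and the page range used in the note below are hypothetical, and the real last page is not stated in the log.

```python
from urllib.parse import urlencode

# Base URL taken from the chat; only the "page" parameter is varied.
BASE = "http://www.gametrailers.com/videos/view/gt-retrospectives"


def page_urls(first, last):
    """Return listing URLs for pages first..last (inclusive)."""
    return [f"{BASE}?{urlencode({'page': n})}" for n in range(first, last + 1)]
```

For example, page_urls(6, 8) yields three URLs starting at ?page=6, which could be written to a text file and handed to a crawler as a seed list.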
04:22 🔗 espes__ are there any videos that aren't on embed.gametrailers.com/embed/* ?
04:22 🔗 einstein9 Just search the page for div class filmstrip_video and get the embedded href
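einstein9's suggestion could be sketched with the standard-library HTML parser. This assumes each video sits in a div whose class list includes filmstrip_video and that the link is an ordinary a href somewhere inside it; the actual page markup is not shown in the log, so the class FilmstripLinks and helper filmstrip_links are illustrative only.

```python
from html.parser import HTMLParser


class FilmstripLinks(HTMLParser):
    """Collect hrefs of <a> tags nested inside <div class="filmstrip_video">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <div>s deep we are inside a filmstrip_video div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at a filmstrip_video div; keep counting nested divs.
            if self.depth or "filmstrip_video" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def filmstrip_links(html):
    parser = FilmstripLinks()
    parser.feed(html)
    return parser.links
```

Links outside a filmstrip_video div are ignored, so navigation and footer links do not end up in the list.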
04:23 🔗 Coderjoe has quit IRC (Read error: Operation timed out)
04:23 🔗 einstein9 espes__: Are you thinking of bruteforcing the IDs and getting the URLs via that?
04:23 🔗 espes__ yeah
04:23 🔗 lbft_ has quit IRC (Read error: Operation timed out)
04:24 🔗 espes__ like, 20% done
04:24 🔗 einstein9 Not a bad idea
04:24 🔗 SmileyG has joined #archiveteam
04:24 🔗 lbft has joined #archiveteam
04:24 🔗 vegbrasil has quit IRC (Read error: Operation timed out)
04:26 🔗 Frogging has quit IRC (Read error: Operation timed out)
04:26 🔗 Frogging1 has joined #archiveteam
04:28 🔗 aMunster has quit IRC (Read error: Operation timed out)
04:29 🔗 snape i0npulse, thanks for the non-borked pagination link. The Google search works, but...
04:32 🔗 vegbrasil has joined #archiveteam
04:32 🔗 edsu has quit IRC (Read error: Connection reset by peer)
04:33 🔗 i0npulse yea ;) np
04:33 🔗 aMunster has joined #archiveteam
04:33 🔗 espes__ did someone do the youtube channel? is archivebot going to handle the ~4000 videos?
04:35 🔗 mistym- has quit IRC (Ping timeout: 633 seconds)
04:36 🔗 Smiley has quit IRC (Ping timeout: 864 seconds)
04:36 🔗 dxrt has quit IRC (Ping timeout: 633 seconds)
04:37 🔗 mistym has joined #archiveteam
04:37 🔗 edsu has joined #archiveteam
04:37 🔗 Fletcher I'm grabbing the youtube channel but if someone can get it faster than 25-30MB/s then that would be good
04:37 🔗 espes__ get all the ids and invoke youtube-dl in parallel?
04:39 🔗 phuzion_ has quit IRC (Ping timeout: 633 seconds)
04:39 🔗 Fletcher that's a good idea actually
04:40 🔗 espes__ note youtube-dl --get-id is stupid and hits the page of each video, so maybe there's another way
04:41 🔗 phuzion has joined #archiveteam
04:41 🔗 aMunster has quit IRC (Remote host closed the connection)
04:41 🔗 dxrt has joined #archiveteam
04:43 🔗 i0npulse yea, when trying to scrape metadata, especially in different parts it can get a bit greedy in terms of how many page requests it makes
04:43 🔗 dxrt- sets mode: +o dxrt
04:44 🔗 snape I think you could do youtube-dl --playlist-items=1-500 https://www.youtube.com/user/gametrailers/videos and run a couple instances in parallel with different chunks of the playlist, but I've not actually tried this, note...
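snape's idea of splitting the playlist across parallel youtube-dl instances comes down to producing non-overlapping --playlist-items ranges. A minimal sketch of just that step; the helper name playlist_chunks is hypothetical, and, as snape says, the approach itself is untested here.

```python
def playlist_chunks(total, size):
    """Split items 1..total into 'start-end' strings of at most `size` items each."""
    return [
        f"{start}-{min(start + size - 1, total)}"
        for start in range(1, total + 1, size)
    ]
```

Each returned string would be passed as the value of --playlist-items to one youtube-dl process; for instance, playlist_chunks(1200, 500) gives 1-500, 501-1000 and 1001-1200.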
04:46 🔗 espes__ ~22900 videos i could find on embed
04:46 🔗 einstein9 getting https://www.youtube.com/user/gametrailers/videos?view=0&sort=da&live_view=500&flow=grit&spf=navigate returns a JSON
04:46 🔗 lbft has quit IRC (Read error: Operation timed out)
04:46 🔗 i0npulse snape, I noticed that link with the retrospectives, they get a little weird past 2010, dupes and inconsistencies
04:47 🔗 lbft has joined #archiveteam
04:47 🔗 i0npulse Probably replicated entries in some corporate CMS
04:50 🔗 vegbrasil has quit IRC (Ping timeout: 633 seconds)
04:51 🔗 Ghost_of_ has joined #archiveteam
04:51 🔗 vegbrasil has joined #archiveteam
04:51 🔗 lukeman the sitemap has 20,000 items. just pulled these out of the xml if it's helpful: https://gist.githubusercontent.com/anonymous/668e46c85483fa8d33d4/raw/f8d7f886eb695dea10399fd951fbbf3ab4159be0/gametrailers-urls.txt
04:52 🔗 sevs44936 youtube-dl -j --flat-playlist https://www.youtube.com/user/gametrailers gives me presumably all id's json encoded
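The output sevs44936 describes is one JSON object per line. A small sketch of turning it into a list of watch URLs, assuming each object carries an "id" field holding the video ID; the function names here are hypothetical and the exact set of fields youtube-dl emits may differ by version.

```python
import json


def ids_from_flat_playlist(lines):
    """Extract video IDs from `youtube-dl -j --flat-playlist` output lines."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the captured output
        entry = json.loads(line)
        if "id" in entry:  # assumed field name
            ids.append(entry["id"])
    return ids


def watch_urls(ids):
    """Build full YouTube watch URLs from raw video IDs."""
    return [f"https://www.youtube.com/watch?v={video_id}" for video_id in ids]
```

Because the flat listing avoids fetching every video page, this is one way around the per-video requests espes__ complains about with --get-id.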
04:53 🔗 espes__ a whole bunch of videos are dead
04:53 🔗 espes__ http://www.gametrailers.com/videos/view/gametrailers-com/113515-Mr-Blonde-Interview-Part-2
04:53 🔗 aMunster has joined #archiveteam
04:53 🔗 einstein9 Using that JSON, you can parse the ['body']['content'] for the links then send them to youtube-dl
04:53 🔗 Coderjoe has joined #archiveteam
04:55 🔗 brayden_ has joined #archiveteam
04:55 🔗 lukeman it looks like you can just brute force http://www.gametrailers.com/videos/view/gametrailers-com/ with ids from 0 through 116780?
04:56 🔗 lukeman sorry. 1 through 116780
04:56 🔗 vitzli 3893 via einstein9's --flatlist
04:57 🔗 einstein9 lukeman: Easier to bruteforce via the embed URLs
04:57 🔗 einstein9 Also the gametrailers-com part is not needed
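The brute-force approach discussed here amounts to enumerating every ID in the range lukeman gives and requesting the matching embed page. A sketch of the enumeration only, assuming the embed.gametrailers.com/embed/ pattern espes__ mentioned with a plain http:// scheme; the scheme and the helper name embed_urls are assumptions.

```python
# Assumed URL shape, based on "embed.gametrailers.com/embed/*" from the chat.
EMBED = "http://embed.gametrailers.com/embed/{}"


def embed_urls(first=1, last=116780):
    """Yield one embed URL per candidate video ID in first..last (inclusive)."""
    for video_id in range(first, last + 1):
        yield EMBED.format(video_id)
```

A generator keeps memory flat, and the range can be split between workers by giving each a different first and last.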
04:57 🔗 lukeman ah. cool
04:57 🔗 espes__ only got ~15k valid edgecast urls
04:58 🔗 espes__ might be videos outside the range i checked or not available through embed.
04:58 🔗 brayden has quit IRC (Read error: Operation timed out)
04:58 🔗 einstein9 espes__: Got a document of the invalid URLs?
04:59 🔗 espes__ where valid edgecast urls means non-empty video links
04:59 🔗 vitzli urls for youtube gametrailers channel: https://gist.github.com/vitzli/4c19ae1c9bd0e38e29d2 full url, can provide raw ids too
05:00 🔗 snape ERROR: AFQCcz-R1nY: YouTube said: Please sign in to view this video
05:01 🔗 snape Thanks, gametrailers, thanks for nothing, lol. >.<
05:01 🔗 espes__ age restricted?
05:01 🔗 espes__ the 7k dead embed ids: https://gist.github.com/espes/b0b27bcf4ef8703862dd
05:02 🔗 GLaDOS make sure it isn't growing
05:02 🔗 snape Not age restricted, "private"...
05:04 🔗 lukeman vitzli: does that include the GTReviews account?
05:04 🔗 einstein9 >Sorry about that.
05:04 🔗 lukeman looks like they post reviews here: https://www.youtube.com/user/ClevverGames
05:04 🔗 vitzli lukeman, no, only user/gametrailers/
05:05 🔗 einstein9 Another thing that'd need archiving is http://www.twitch.tv/GameTrailers
05:06 🔗 einstein9 http://www.twitch.tv/GameTrailers/profile for links to the past broadcasts
05:07 🔗 espes__ just youtube-dl http://www.twitch.tv/GameTrailers/profile/past_broadcasts
05:07 🔗 espes__ there's only 2 months of stuff because twitch deletes old stuff
05:07 🔗 einstein9 Right
05:08 🔗 vitzli lukeman, here is the list for GTReviews/CleverGames youtube channel, full urls https://gist.github.com/vitzli/ea95a8da33015b3edba1
05:08 🔗 vitzli 4322 videos on GTReviews/CleverGames
05:09 🔗 lukeman it looks like videos from Twitch might be on the GTReviews/ClevverGames channel
05:09 🔗 einstein9 Archivebot working on Twitch
05:12 🔗 vitzli Fletcher, are you doing 'gametrailers' channel only?
05:13 🔗 Fletcher to start with yep
05:17 🔗 sevs44936 is someone working on the GTReviews channel? i could point youtube-dl in its direction, but i'm somewhat concerned i won't have enough space for all of it...
05:18 🔗 vitzli sevs44936, I'm trying to estimate total size of it
05:19 🔗 yipdw i guess we got the bad end on gametrailers eh
05:19 🔗 sevs44936 between 2 systems i've got like 5tb free, unsure if thats enough
05:19 🔗 vitzli it should be
05:20 🔗 sevs44936 which setting should i use for youtube-dl
05:21 🔗 sevs44936 ?
05:21 🔗 sevs44936 just defaults or anything else?
05:22 🔗 dxrt http://www.archiveteam.org/index.php?title=YouTube is recommended AFAIK
05:22 🔗 dxrt youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio URL
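For anyone scripting the command dxrt quotes, it can be kept as an argument list so that URLs are never passed through a shell. A minimal sketch that only builds the argument vector, using exactly the flags from the line above; the function name youtube_dl_argv is hypothetical.

```python
# Flags copied from the command quoted in the chat.
FLAGS = [
    "--title", "--continue", "--retries", "4",
    "--write-info-json", "--write-description",
    "--write-thumbnail", "--write-annotations",
    "--all-subs", "--ignore-errors",
    "-f", "bestvideo+bestaudio",
]


def youtube_dl_argv(url):
    """Return the full argument list for one youtube-dl invocation."""
    return ["youtube-dl", *FLAGS, url]
```

The list can be handed to subprocess.run as is, one call per URL or per playlist chunk.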
05:22 🔗 sevs44936 ahh, wonderful
05:22 🔗 Fletcher sevs44936, if you don't use youtube-dl very often check your version before starting
05:23 🔗 sevs44936 Fletcher, i've got 2016.01.15
05:23 🔗 Fletcher perfect :)
05:24 🔗 espes__ bah, only getting 1gbps from edgecast
05:24 🔗 sevs44936 just haven't used it "professionally"
05:25 🔗 kyan has joined #archiveteam
05:31 🔗 Atluxity arkiver: ack
05:31 🔗 Atluxity there she blows
05:32 🔗 espes__ ok, 2gbps
05:32 🔗 espes__ which is the throughput limit of gce ssds -_-
05:33 🔗 einstein9 Using tubeup.py on Twitch
05:33 🔗 vitzli average video size is ~125 MiB on 16 random video sample size; I got one 1.2 GiB video, without it I got ~50 MiB/video
05:34 🔗 vitzli with 125 MiB/video estimated size for GTReviews is 527 GiB
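vitzli's figure follows from multiplying the video count by the sampled average size and converting MiB to GiB. A quick sketch of that arithmetic, using the 4322-video count given earlier for GTReviews; the function name estimate_gib is hypothetical.

```python
def estimate_gib(videos, mib_per_video):
    """Estimated total size in GiB for `videos` files averaging `mib_per_video` MiB."""
    return videos * mib_per_video / 1024
```

estimate_gib(4322, 125) comes to roughly 527.6 GiB, matching the number above; with the lower 50 MiB average that excludes the one large outlier, the same call gives about 211 GiB.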
05:34 🔗 einstein9 The only archive for the Twitch broadcasts is the SomeGameNews YT channel
05:35 🔗 GLaDOS im about to order a 6TB So You Start server
05:39 🔗 GLaDOS ...actually nvm, too expensive
05:43 🔗 wutno has joined #archiveteam
05:44 🔗 Sk1d has quit IRC (Ping timeout: 200 seconds)
05:47 🔗 einstein9 ~70 videos on the Twitch channel, would estimate a GB each
05:49 🔗 Sk1d has joined #archiveteam
05:56 🔗 WinterFox has joined #archiveteam
06:02 🔗 sevs44936 ok, currently pulling the first 400
06:03 🔗 sevs44936 i'm getting around 25MiB/s combined down
06:05 🔗 einstein9 Wonder what it'd be like if IGN called it quits
06:06 🔗 vitzli around 115 MiB/video on 20 video sample on GTReviews channel
06:06 🔗 megaminxw has joined #archiveteam
06:06 🔗 megaminxw has left
06:07 🔗 megaminxw has joined #archiveteam
06:09 🔗 Fletcher einstein9, I've been slowly working on IGN
06:10 🔗 Fletcher all 100k+ videos :(
06:10 🔗 einstein9 Huh, cool
06:11 🔗 einstein9 I know that gamespot's videos are quite troublesome as some didn't survive the move to the new layout/servers
06:12 🔗 amm has joined #archiveteam
06:13 🔗 wutno has quit IRC (Ping timeout: 252 seconds)
06:14 🔗 wutno has joined #archiveteam
06:14 🔗 i0npulse einstein9: don't say that. lol
06:15 🔗 i0npulse gave me a heart attack
06:15 🔗 i0npulse I read what Fletcher said and was like "OH MY..."
06:15 🔗 i0npulse and then I saw you said "if"
06:15 🔗 einstein9 Hahaha
06:15 🔗 Fletcher :P
06:16 🔗 yipdw hey all, archiveteam tradition is to move project-specific discussion to its own channel
06:16 🔗 yipdw please come up with a suitable pun
06:16 🔗 einstein9 gamefailures
06:16 🔗 yipdw shametrailers
06:16 🔗 i0npulse I think GlaDOS said it way way above
06:16 🔗 i0npulse UnhitchedTrailers
06:17 🔗 GLaDOS #UnhitchedTrailer
06:17 🔗 einstein9 lol
06:17 🔗 i0npulse yah
06:17 🔗 yipdw cool that works
06:17 🔗 i0npulse ok joining
06:17 🔗 xmc sets mode: +o swebb
06:17 🔗 swebb sets mode: +o brayden_
06:17 🔗 swebb sets mode: +o edsu
06:26 🔗 kyan Entrailers, cause it's dying
06:33 🔗 snape kyan, gameyentrails...
06:33 🔗 kyan +1000
06:37 🔗 mismatch has quit IRC (Ping timeout: 250 seconds)
06:56 🔗 espes__ ftr i think ive got all the videos that still worked on the site
06:57 🔗 espes__ only ~14k, 1tb
06:57 🔗 espes__ on gametrailers*
06:58 🔗 wutno has quit IRC (Quit: ZNC - http://znc.in)
07:02 🔗 Frogging1 well
07:02 🔗 Frogging1 looks like I've got a job this summee
07:02 🔗 Frogging1 summer *
07:02 🔗 Frogging1 You know what that means
07:03 🔗 Frogging1 moar hard drives :D
07:03 🔗 kyan espes__, ooh, awesome, congrats!
07:04 🔗 espes__ all their old videos have been gone on the site for a while :(
07:04 🔗 einstein9 Really wish I had a gigabit adaptor, would make archiving that much easier
07:05 🔗 kyan Are the ones you could get going into IA?
07:06 🔗 espes__ yeah maybe
07:06 🔗 espes__ i feel like just doing it again with archivebot if there's a pipeline with free space
07:06 🔗 espes__ this was just incase everything started vanishing right now
07:06 🔗 kyan Cool
07:07 🔗 kyan all it would need is a list of URLs to feed to the bot I think
07:07 🔗 kyan The main site's in the bot, but IDK if it's getting the videos
07:09 🔗 espes__ the videos urls are in a script tag
07:09 🔗 espes__ and off on edgecast
07:11 🔗 kyan I won't be able to do it though, time for bed for me :P
07:11 🔗 espes__ should i select a particular pipeline though?
07:19 🔗 megaminxw has quit IRC (Quit: Leaving.)
07:19 🔗 megaminxw has joined #archiveteam
07:45 🔗 RedType_ has quit IRC (Ping timeout: 252 seconds)
07:53 🔗 RedType has joined #archiveteam
08:02 🔗 tobbez has quit IRC (Ping timeout: 506 seconds)
08:04 🔗 lukeman has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
08:15 🔗 kyan has quit IRC (Leaving)
08:16 🔗 TheKiwi has joined #archiveteam
08:18 🔗 JesseW has quit IRC (Quit: Leaving.)
08:20 🔗 vitzli has quit IRC (Quit: Leaving)
08:30 🔗 Ghost_of_ has quit IRC (Quit: Leaving)
08:34 🔗 atomotic has joined #archiveteam
08:51 🔗 HCross espes__, aim for something with large disk http://archivebot.at.ninjawedding.org:4567/pipelines
08:54 🔗 espes__ HCross: sorted, since apparently it's 5gb chunked
09:18 🔗 Morbus has quit IRC (Ping timeout: 260 seconds)
09:19 🔗 Morbus has joined #archiveteam
09:29 🔗 tobbez has joined #archiveteam
09:33 🔗 trs80 has quit IRC (Ping timeout: 190 seconds)
09:55 🔗 sevs44936 has quit IRC (Ping timeout: 258 seconds)
11:36 🔗 WinterFox has quit IRC (Remote host closed the connection)
11:38 🔗 oldcad has joined #archiveteam
11:40 🔗 oldcad gametrailers will shut down :(
11:42 🔗 TheKiwi yup
11:42 🔗 TheKiwi rip
11:50 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:00 🔗 oldcad has quit IRC (Ping timeout: 250 seconds)
12:07 🔗 tobbez has quit IRC (Ping timeout: 506 seconds)
12:09 🔗 oldcad has joined #archiveteam
12:15 🔗 midas didnt we grab parts of that already?
12:23 🔗 arkiver Looks like fotolog.com isn't fast enough to get everything before the deadline
12:23 🔗 arkiver I'll ignore URLs later today so we can get more than we do now
12:24 🔗 arkiver I think fotolog only announced their shutdown 20 days before the shutdown
12:24 🔗 arkiver Way too short to get everything :/
12:24 🔗 arkiver Are the videos of gametrailers saved?
12:24 🔗 arkiver I'm not sure how many videos they have, but writing the scripts for the grab wouldn't be too hard
12:25 🔗 arkiver I can probably get that up later today if we need it
12:26 🔗 Atluxity arkiver: want me to add more concurrents to fotolog?
12:27 🔗 arkiver You can do that, but it looks like it isn't becoming faster if that happens
12:27 🔗 arkiver or well, releasing more URLs
12:27 🔗 Atluxity right, then it's pointless
12:27 🔗 arkiver take a look at the graph of fotolog http://tracker.archiveteam.org/fotolog/
12:27 🔗 arkiver You see when you came in, the red line, BnAboyZ slowed down
12:30 🔗 Atluxity ah THATS why that is
12:30 🔗 Atluxity (the color is different for different people, but I understand)
12:30 🔗 Atluxity did BnAboyZ just discover the stand-alone pipeline?
12:31 🔗 midas arkiver: fotolog needs more boxes or are we killing them?
12:31 🔗 midas oh nevermind
12:31 🔗 Atluxity midas: we are not the bottleneck :P
12:31 🔗 midas i noticed
12:32 🔗 arkiver fotolog is for some reason sending infinite 301s when it feels like it
12:32 🔗 arkiver Can't make the fix for that right now, but will later today
12:32 🔗 arkiver and then some items will be requeued
12:32 🔗 HCross2 Probably not liking the loaf
12:32 🔗 HCross2 Load
12:33 🔗 arkiver yeaH
12:33 🔗 midas nice, im connected via dialup atm so on slownet, will lag
12:33 🔗 Atluxity dialup? does that still exist?
12:34 🔗 midas yes
12:34 🔗 Atluxity wow
12:36 🔗 HCross2 I'm not that fast either http://www.speedtest.net/my-result/a/1729887563
12:41 🔗 midas i wont click that because it will take ages to load ;-)
12:41 🔗 midas also -bs
12:43 🔗 atomotic has joined #archiveteam
13:01 🔗 weles has joined #archiveteam
13:09 🔗 megaminxw has quit IRC (Quit: Leaving.)
13:15 🔗 oldcad has quit IRC (Ping timeout: 250 seconds)
13:19 🔗 tobbez has joined #archiveteam
13:26 🔗 RichardG has quit IRC (Ping timeout: 499 seconds)
13:27 🔗 oldcad has joined #archiveteam
13:35 🔗 vitzli has joined #archiveteam
13:58 🔗 trs80 has joined #archiveteam
14:11 🔗 sevs44936 has joined #archiveteam
14:12 🔗 oldcad has quit IRC (Ping timeout: 250 seconds)
14:32 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:41 🔗 Ungstein1 has quit IRC (Quit: Leaving.)
14:57 🔗 Start has quit IRC (Quit: Disconnected.)
15:34 🔗 atomotic has joined #archiveteam
15:45 🔗 Start has joined #archiveteam
15:49 🔗 RichardG has joined #archiveteam
15:59 🔗 Start has quit IRC (Ping timeout: 360 seconds)
16:00 🔗 Digressin has joined #archiveteam
16:00 🔗 Zei-Pii has joined #archiveteam
16:00 🔗 Start has joined #archiveteam
16:00 🔗 Digressin has quit IRC (Client Quit)
16:00 🔗 Digressin has joined #archiveteam
16:01 🔗 Digressin has quit IRC (Client Quit)
16:01 🔗 DigresNSQ has joined #archiveteam
16:04 🔗 vitzli has quit IRC (Quit: Leaving)
16:09 🔗 DigresNSQ I don't have the warrior installed on this computer, yet. How much of GameTrailers were we able to save?
16:19 🔗 DigresNSQ has quit IRC (Quit: Page closed)
16:34 🔗 scyther has joined #archiveteam
17:04 🔗 JesseW has joined #archiveteam
17:07 🔗 Start has quit IRC (Quit: Disconnected.)
17:22 🔗 JesseW has quit IRC (Quit: Leaving.)
17:27 🔗 Tomcat_ has joined #archiveteam
17:33 🔗 godane has quit IRC (Quit: Leaving.)
17:34 🔗 godane has joined #archiveteam
17:48 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
17:53 🔗 philpem has joined #archiveteam
17:55 🔗 Stilett0 is now known as Stiletto
18:00 🔗 lunG has joined #archiveteam
18:01 🔗 arkiver Do we need a small project for gametrailers?
18:02 🔗 arkiver If we do for the videos I now have the time to create one
18:12 🔗 arkiver SketchCow: due to the slow speed and their late announcement of shutting down we won't get friendsreunited and fotolog fully saved
18:15 🔗 arkiver I'm creating a small project for gametrailer to save the trailers (videos only)
18:16 🔗 arkiver chfoo: SketchCow: can you please create a rsync target on FOS for it?
18:16 🔗 arkiver It'll be quick project I think
18:18 🔗 snape arkiver, I think most of the videos have been saved, either by individuals or archivebot...?
18:18 🔗 arkiver Also as WARCs?
18:19 🔗 sevs44936 I'm currently downloading the GTReviews yt channel
18:19 🔗 arkiver ok!
18:19 🔗 snape Not sure, TBH.
18:19 🔗 arkiver I'll make sure videos are saved as WARCs
18:19 🔗 midas if archivebot grabbed it we should be OK on the warc side
18:19 🔗 sevs44936 just youtube-dl
18:20 🔗 chfoo arkiver: ok, rsync directory for gametrailer made
18:20 🔗 arkiver thanks!
18:22 🔗 snape The announcement of their closing made it sound like we had maybe just a handful of hours left before the site went down, so people were grabbing what they could, how they could.
18:23 🔗 arkiver yeah
18:23 🔗 arkiver let's do one good full grab of the videos
18:24 🔗 snape You going after the youtube videos or the ones on the site?
18:24 🔗 arkiver the site
18:25 🔗 SketchCow OK.
18:25 🔗 arkiver I really really wish fotolog and friendsreunited would have announced earlier
18:25 🔗 arkiver we
18:26 🔗 arkiver we're now losing years and years of history
18:26 🔗 SketchCow Friends Reunited doesn't like that we're doing this, at all.
18:26 🔗 SketchCow There's a local group claiming to do the work
18:26 🔗 SketchCow But we're not trusting them
18:26 🔗 arkiver group of insiders doing the work?
18:26 🔗 SketchCow They connected with a library
18:27 🔗 SketchCow But the library knows me, they shouldn't be surprised.
18:27 🔗 arkiver Probably some insiders then
18:27 🔗 Tomcat_ has quit IRC (Read error: Operation timed out)
18:27 🔗 SketchCow There's a gametrailer slot for you
18:27 🔗 arkiver I've no idea how else they can save everything
18:27 🔗 arkiver yes
18:27 🔗 HCross Is it one of the UK's many local libraries that dont seem to communicate
18:28 🔗 SimpBrai1 there's an active uk library? didnt know there was one
18:28 🔗 HCross Many. But they don't publicise
18:28 🔗 HCross or talk
18:29 🔗 HCross SketchCow, you talking to friends reunited then?
18:32 🔗 ndiddy has joined #archiveteam
18:35 🔗 SketchCow Not anymore
18:35 🔗 SketchCow They told me to fuck off
18:35 🔗 SketchCow And stop
18:35 🔗 SimpBrai1 wow
18:35 🔗 arkiver haha
18:36 🔗 SimpBrai1 fine, we'll just run the site hot till it goes down :P
18:37 🔗 HCross I was going to say, I dont live that far from their office. Happy to go round and take hard disks of data
18:38 🔗 HCross But I think Ill get a punch in the face, rather than disks of data
18:38 🔗 SimpBrai1 or the police
18:39 🔗 HCross Me: "Ive come to archive your shit" Them: "Arent you the reason our site is so slow" Me: "Learn to shutdown prop...." BANG
18:44 🔗 Start has joined #archiveteam
18:44 🔗 arkiver yes
18:44 🔗 snape "Hullo, I'm the county collection agent for the Portable Antiquities Scheme, I'm here to collect your computers under the auspices of the Historical Records Preservation Act (Digital) 2011..."
18:44 🔗 SketchCow Wait, you're in New Zealand?
18:45 🔗 HCross Nope, Friends Reunited are UK based
18:45 🔗 SketchCow Oh!
18:46 🔗 arkiver We're going to check all 3 million video IDs from gametrailers
18:46 🔗 arkiver most older IDs will probably be dead
18:47 🔗 einstein9 <SketchCow> Wait, you're in New Zealand?
18:47 🔗 einstein9 I am tho :v
18:48 🔗 snape Dead IDs, per someone last night, arkiver: https://gist.github.com/espes/b0b27bcf4ef8703862dd
18:48 🔗 SketchCow einstein9: Do you know Rob Isaac?
18:49 🔗 einstein9 Nope
18:49 🔗 einstein9 Opposite end of the country
18:49 🔗 SketchCow Bummer
18:49 🔗 SketchCow Well, not a bummer, you're still in the beautiful island
18:51 🔗 godane i'm grabbing more of david bowie bootlegs
18:52 🔗 godane from 1990
18:52 🔗 einstein9 Oh yeah, if anyone wants to archive these for the IA: http://paperspast.natlib.govt.nz/cgi-bin/paperspast
18:57 🔗 godane einstein9: i will look at it
18:58 🔗 einstein9 Thanks
18:59 🔗 einstein9 The per-page png are probably better quality than the full pdf of the issues
18:59 🔗 Arendo has joined #archiveteam
19:00 🔗 einstein9 The full pdfs have a ToC tho
19:01 🔗 Arendo has quit IRC (Client Quit)
19:02 🔗 reinhard has joined #archiveteam
19:02 🔗 reinhard hey guys, just want to ask, is there something wrong with fgts.jp? cause I'm unable to connect to their image database....
19:03 🔗 xmc why would we know?
19:04 🔗 reinhard oh well, perhaps it's not a global problem and it's just my connection issues
19:04 🔗 reinhard just curious dude
19:04 🔗 xmc no like
19:04 🔗 xmc i'm curious what makes you think "fgts.jp ... archiveteam will know!"
19:05 🔗 snape http://www.downforeveryoneorjustme.com/fgts.jp
19:07 🔗 joepie91 to be fair
19:07 🔗 joepie91 here is usually a good place to ask whether something is dead
19:07 🔗 joepie91 :P
19:11 🔗 reinhard cause when I look at 4chan archive sites on your page, this IRC channel was listed on fgts.jp section, and I automatically assume perhaps this channel knows something about it
19:11 🔗 reinhard not so hard to figure that
19:15 🔗 Start has quit IRC (Quit: Disconnected.)
19:17 🔗 reinhard has left
19:18 🔗 schbirid is there a gametrailers channel? help needed?
19:18 🔗 HCross #gamefailers ?
19:19 🔗 snape #UnhitchedTrailer
19:28 🔗 SketchCow http://www.embulk.org/docs/ dropping that there for people to look at
19:28 🔗 SketchCow xmc: Archive Team has eyes everywhere
19:28 🔗 xmc including all over 4chan i suppose
19:28 🔗 SketchCow #gamefailers wins
19:29 🔗 Nertsy has quit IRC (Quit: Nertsy)
19:29 🔗 schbirid #UnhitchedTrailer already exists and is full of people, i simply did not know
19:30 🔗 atomotic has joined #archiveteam
19:32 🔗 Nertsy has joined #archiveteam
19:41 🔗 Tomcat_ has joined #archiveteam
19:50 🔗 Start has joined #archiveteam
19:58 🔗 megaminxw has joined #archiveteam
19:59 🔗 Tomcat__ has joined #archiveteam
20:00 🔗 Tomcat_ has quit IRC (Read error: Operation timed out)
20:05 🔗 Frogging1 Transferring data across oceans is relatively slow
20:05 🔗 Frogging1 is now known as Frogging
20:05 🔗 xmc i blame the bandwidth-delay product and tcp
20:06 🔗 Frogging Bandwidth delay product?
20:06 🔗 Frogging Is that relating latency to effective throughput? (just a guess, googling on a tablet is cumbersome)
20:06 🔗 arkiver also, fgts is working for me
20:06 🔗 xmc yeah
20:07 🔗 Frogging Guess you could try transfers with UDP and then verifying the data after :p
20:07 🔗 Zei-Pii has quit IRC (Ping timeout: 250 seconds)
20:07 🔗 xmc Frogging: it's the amount of data that is in transit at any given time
20:07 🔗 Frogging like, send everything, check it, and then send the bits that got lost
20:07 🔗 HCross -bs
20:07 🔗 xmc ^
20:07 🔗 Frogging So you don't have to fuss with ACKS during the transfer
20:08 🔗 arkiver we won't use udp
20:08 🔗 xmc woop woop woop off-topic siren
20:08 🔗 Frogging I wonder how much faster that would be
20:08 🔗 xmc take it to #archiveteam-bs
20:08 🔗 Frogging Lol sorry :p
20:08 🔗 Frogging forgot about that channel
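
The bandwidth-delay product xmc mentions above can be worked through with a small calculation. This is only a sketch: the 1 Gbit/s link speed and 150 ms round-trip time are assumed example values, not figures from the log, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes are unacknowledged."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link_bps = 1e9  # assumed: 1 Gbit/s link
    rtt = 0.150     # assumed: 150 ms transoceanic round trip
    print(f"BDP: {bdp_bytes(link_bps, rtt) / 1e6:.2f} MB in flight")
    # 65535 bytes is the largest TCP window without window scaling.
    print(f"Window cap: {window_limited_bps(65535, rtt) / 1e6:.2f} Mbit/s")
```

Under those assumptions, a sender limited to a 65535-byte window tops out at roughly 3.5 Mbit/s no matter how fast the link is, which is one way latency turns into low effective throughput.
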
20:33 🔗 espes__ arkiver: im pretty sure there's only the 14k videos i found still available on the site
20:33 🔗 espes__ i got a copy and now archivebot is going through them
20:33 🔗 espes__ since its only 1tb
20:34 🔗 espes__ everything else i think was deleted in september
20:35 🔗 espes__ Frogging: https://github.com/LabAdvComp/UDR
20:37 🔗 Frogging yah I found that, I posted in #archiveteam-bs about it
20:37 🔗 Frogging To avoid derailing this channel :p
20:43 🔗 Start has quit IRC (Quit: Disconnected.)
20:54 🔗 arkiver espes__: how did you get a copy?
20:54 🔗 megaminxw has quit IRC (Quit: Leaving.)
20:55 🔗 bauruine has quit IRC (Quit: ZNC - http://znc.in)
20:55 🔗 arkiver in what format
20:58 🔗 will For archiving one of my own sites, is something like httrack a good option? Its running an outdated Joomla install so I don't really want to keep it online as-is anymore
20:58 🔗 espes__ arkiver: scraping embed.gametrailers.com/embed/{2942000..3008493}
20:58 🔗 HCross archivebot w
20:58 🔗 HCross will,
20:59 🔗 arkiver espes__: did you grab them in WARCs? what format of the video did you grab? and did you also grab the segmented videos?
20:59 🔗 espes__ arkiver: and just the files, im assuming archivebot can handle the videos
20:59 🔗 espes__ and no
20:59 🔗 arkiver ok
21:00 🔗 espes__ well the segmented videos are all available as files
21:00 🔗 espes__ i think
21:00 🔗 arkiver they are needed for playback of the video
21:00 🔗 arkiver so future wayback machine can handle the videos
21:01 🔗 arkiver do you have a list of IDs for me that you found working?
21:01 🔗 arkiver like some early videos? I'd like to test if those work the same as the recent videos
21:01 🔗 bauruine has joined #archiveteam
21:01 🔗 espes__ they're all recent, they were moved to edgecast in september
21:02 🔗 will HCross: Is it possible for me to get the HTML back from that? Thanks
21:03 🔗 weles has quit IRC (Read error: Operation timed out)
21:03 🔗 HCross yes, just view the source and ignore the header
21:03 🔗 HCross view-source:https://web.archive.org/web/20000229183439/http:/www.cl.cam.ac.uk/coffee/coffee.html look at line 197
21:03 🔗 will Okay thanks
21:04 🔗 espes__ arkiver: https://docs.google.com/uc?id=0B4uDFAm87U0jR3N2Q2VmVEpURE0&export=download
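
The ID range espes__ gives above, embed.gametrailers.com/embed/{2942000..3008493}, can be expanded into a URL list roughly as follows. This is a minimal sketch, not the script espes__ actually used: the `http://` scheme and the function name are assumptions, and nothing is fetched here.

```python
from typing import Iterator

# Range endpoints copied from the log; shell brace expansion is inclusive.
FIRST_ID = 2942000
LAST_ID = 3008493


def embed_urls(first: int = FIRST_ID, last: int = LAST_ID) -> Iterator[str]:
    """Yield one embed URL per ID, inclusive of both endpoints."""
    for video_id in range(first, last + 1):
        # Assumption: plain http; the log gives only the host and path.
        yield f"http://embed.gametrailers.com/embed/{video_id}"


if __name__ == "__main__":
    # Write the candidate list to stdout, one URL per line.
    for url in embed_urls():
        print(url)
```

A list like this is only the set of candidate IDs; which of them still resolve would have to be checked separately.
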
21:11 🔗 Tomcat__ has quit IRC (Remote host closed the connection)
21:16 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
21:18 🔗 amm has quit IRC (Quit: Leaving)
21:34 🔗 bauruine has quit IRC (Ping timeout: 260 seconds)
21:36 🔗 kyan has joined #archiveteam
21:39 🔗 bauruine has joined #archiveteam
22:03 🔗 kyan has quit IRC (This computer has gone to sleep)
22:07 🔗 kyan has joined #archiveteam
22:20 🔗 scyther has quit IRC (Read error: Connection reset by peer)
23:20 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
23:29 🔗 BnA-Robin has joined #archiveteam
23:45 🔗 snape https://cve.fbi.gov - let's explain violent extremism to kids in 2016 with pictures of CRT monitors and floppy disks, yeah!
23:45 🔗 snape Dammit, wrong channel. >.<
23:50 🔗 Start has joined #archiveteam
23:51 🔗 ndiddy lol
23:55 🔗 gter has joined #archiveteam
23:55 🔗 gter is the Gametrailers archive project ongoing?
23:56 🔗 snape Think so; check #UnhitchedTrailer, AFAIK
23:59 🔗 ndiddy woah, gametrailers is closing?
23:59 🔗 gter Yeah
