#archiveteam 2014-06-14,Sat


Time Nickname Message
03:03 redhook Hello Archive Team. The website GameTrailers was sold by Viacom to a different media conglomerate today, Defy. GameTrailers has spent 12 years making superb video game review videos, as well as original video game-related shows. Several staff have been laid off already, and I fear their years of content may be in jeopardy to make way for a “reboot” or somesuch. No official announcement to that effect yet, but I wanted to bring
03:03 redhook this site to your attention.
03:38 garyrh hmm, it looks like gametrailers.com's videos are downloadable, even w/o being logged in
03:40 garyrh ...and youtube-dl can get the videos as well, so yeah.
03:43 redhook Yeah, most have direct download links. Not 100% though.
03:54 trs80 how many videos are there?
04:05 SN4T14 trs80, tons, it's a big site
04:14 redhook It looks like there’s 1,704 reviews, which I think are the most important. They’ve also produced some great retrospectives, covering the history of franchises like Zelda, Grand Theft Auto, etc. There’s talk shows too, which are less important. If you just go to their videos page (http://www.gametrailers.com/videos-trailers) and do the math (20 videos/page * 3520 pages) it’s about 70,000 gameplay, interview, trailer, etc.
04:14 redhook videos, which would be nice to have but not crucial.
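The page-count arithmetic in redhook's message can be checked in a couple of lines. This is only a minimal sketch: the 20-per-page and 3520-page figures come from the message, and the variable names are illustrative.

```python
# Figures quoted in redhook's message.
videos_per_page = 20
pages = 3520

# Treat this as an upper bound: the last page may hold fewer than 20 videos.
estimated_videos = videos_per_page * pages
print(estimated_videos)
```

The product is in line with the "about 70,000" figure quoted in the message.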
04:17 garyrh for user videos, it looks to be ~263,180 videos
04:20 garyrh looks like youtube-dl can get videos w/o a download button via rtmpdump
05:59 Nemo_bis the Ancestry.com agreement, by which Ancestry digitized records of genealogical interest to make available behind their subscription service (which is free to use at NARA facilities) and then transmitted the digital copies to NARA to put in the catalog after 5 or 10 years.
07:41 * db48x laughs at http://archiveteam.org/images/1/1b/Archiveteam_warrior_infrastructure.png
07:42 db48x chfoo gets a +1 for that
07:56 Nemo_bis Yes, it's pretty :)
08:01 godane so i got about 30mins of video about ritalin
08:01 godane from 2001
08:40 godane SketchCow: i really need some sort of back access to IA
08:41 godane nbc blocked modules folder i think cause of drupal
08:42 godane if i can get access to everything here i may have a better chance to get all nbc news clips: http://msnbc.com/modules/
08:45 godane fun fact: robots.txt didn't even exist in 2007 for msnbc.com: https://web.archive.org/web/20070326005247/http://www.msnbc.com/robots.txt
08:46 godane 2011 not blocked: https://web.archive.org/web/20110625001406/http://msnbc.com/robots.txt
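Whether a given robots.txt blocks a path such as /modules/ can be checked with Python's standard `urllib.robotparser`. The following is a minimal sketch: the robots.txt contents are a made-up example, not the actual msnbc.com file, and `is_blocked` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /modules/
"""

print(is_blocked(EXAMPLE_ROBOTS, "http://msnbc.com/modules/"))
print(is_blocked(EXAMPLE_ROBOTS, "http://msnbc.com/news/"))
```

The same helper could be fed the text of each archived robots.txt snapshot to compare what was blocked in different years.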
09:54 schbirid earbits news: we did not manage to get a file list off the half-open earbits s3 bucket with the music, but will grab the assets (images etc) off another one where we did.
09:55 schbirid if someone wants a real challenge, reverse engineer how their stream IDs are constructed. see http://archiveteam.org/index.php?title=Earbits
09:55 schbirid i think it is done in client side javascript, so it should be doable in a way
10:21 schbirid if you want to help with downloading images etc, come to #earbite
12:00 danneh_ So just to letchas know, I'm grabbing a bunch more from here: http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html
12:00 danneh_ looking into how their URLs and resources are addressed, fairly easy to get lists of every single product ID on there
12:01 danneh_ so I'll just go through and make some lists and set stuff to download, got HTML files, images, PDF files, all that sorta stuff should be alright to save
12:01 danneh_ will letchas know
12:46 danneh_ Grabbing the item JSON files now, after that I should be able to parse through those, extract all the PDF/jpg/html/etc links from that and set those to all download
12:47 danneh_ About 14k items to go through, so it might take a little bit to grab, but it should be alright
12:48 danneh_ Easier than trying to do it manually, just got a script generating all the links to grab at each step
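The "parse the item JSON, extract the PDF/jpg/html links" step described above can be sketched as follows. This is not danneh_'s actual script: the layout of the HP item JSON is not shown in the log, so the example document, `iter_strings`, and `extract_links` are all hypothetical, and the code simply walks whatever structure it is given looking for strings with the wanted extensions.

```python
import json
from typing import Any, Iterator

# File extensions mentioned in the chat; extend as needed.
WANTED_EXTENSIONS = (".pdf", ".jpg", ".html")


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def extract_links(item_json: str) -> list[str]:
    """Return the unique wanted links in one item's JSON, in first-seen order."""
    seen: dict[str, None] = {}
    for text in iter_strings(json.loads(item_json)):
        if text.lower().endswith(WANTED_EXTENSIONS):
            seen.setdefault(text)
    return list(seen)


# Hypothetical item document; the real HP JSON layout is an assumption.
EXAMPLE_ITEM = json.dumps({
    "id": "12345",
    "docs": [{"url": "/quickspecs/12345.pdf"}, {"url": "/quickspecs/12345.html"}],
    "image": "/img/12345.JPG",
    "related": {"url": "/quickspecs/12345.pdf"},
})

for link in extract_links(EXAMPLE_ITEM):
    print(link)
```

Walking the whole structure rather than reading fixed keys keeps the sketch independent of the real schema, at the cost of possibly picking up strings that are not links.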
13:28 dashcloud my angelfire.com grab is continuing along slowly
13:35 Nemo_bis aww memories
13:47 dashcloud tripod's still around if you want to try grabbing that
15:18 joepie91 dashcloud: for relative values of "around"
15:23 Nemo_bis "around the graveyard"
15:24 joepie91 the Dutch Tripod is obliterated as far as I can tell
16:45 dashcloud if I provided a list of URLs to wget in a file, can I append that file with new URLs and have wget pick them up, or does wget just read the file once at startup?
17:12 schbirid i am very sure it reads it just once :(
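If wget does read the list only once, as schbirid suspects, one workaround is a small wrapper that polls the list file and hands any newly appended URLs to the downloader. The sketch below shows only the polling part and is not a wget feature: `read_new_urls` is a hypothetical helper, the example URLs are placeholders, and the actual downloading is left out.

```python
import os
import tempfile


def read_new_urls(path: str, offset: int = 0) -> tuple[list[str], int]:
    """Read URLs appended to path since byte offset; return them and the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume complete lines, so a half-written URL is left for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    return urls, offset + end


with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "urls.txt")
    with open(list_path, "w") as f:
        f.write("http://example.com/a\n")
    first, pos = read_new_urls(list_path)

    # Simulate someone appending to the list while the grab is running.
    with open(list_path, "a") as f:
        f.write("http://example.com/b\nhttp://example.com/partial")
    second, pos = read_new_urls(list_path, pos)

    print(first, second)
```

A driver loop would call `read_new_urls` every few seconds and start one download per returned URL, keeping the returned offset between calls.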
17:15 SN4T14 He's gone. >.>
17:29 schbirid we now have about 50000 mp3s to download from earbits, join #earbite if you want to help
17:48 schbirid can you make aria2c download files with their server datetime like wget does?
17:49 schbirid --remote-time=true
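The idea behind `--remote-time=true` is to stamp the local file with the time the server reports for it. For scripts that download by other means, roughly the same effect can be had by applying the HTTP `Last-Modified` header by hand. This is a minimal sketch of that idea, not aria2c's implementation; `apply_remote_time` is a hypothetical helper and the header value is an example.

```python
import os
import tempfile
from email.utils import parsedate_to_datetime


def apply_remote_time(path: str, last_modified: str) -> float:
    """Set path's modification time from an HTTP Last-Modified header value."""
    timestamp = parsedate_to_datetime(last_modified).timestamp()
    # Keep the access time equal to the modification time for simplicity.
    os.utime(path, (timestamp, timestamp))
    return timestamp


# Demonstrate on a throwaway file with an example header value.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
try:
    stamp = apply_remote_time(demo_path, "Sat, 14 Jun 2014 17:49:00 GMT")
    print(os.path.getmtime(demo_path) == stamp)
finally:
    os.remove(demo_path)
```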
21:17 schbirid http://www.ikeahackers.net/2014/06/big-changes-coming-to-ikeahackers.html
21:36 danneh_ Alright, and downloading about 46k pdf/json/jpg files, should be done in about 12 hours hopefully
21:37 danneh_ And that should be pretty well 100% of the stuff on that HP website, from what I've seen
21:37 danneh_ As much as can be accessed through that interface, at least
22:58 db48x only one more justintv item left
