[05:12] Hi, I am trapped in a hotel room with a chatty young man.
[05:21] hey SketchCow
[05:23] the NEC drivers grab is going well
[06:10] Excellent
[09:16] A question here
[09:16] Is the IP address saved in the .ASPXAUTH cookie?
[18:59] http://techcrunch.com/2014/09/26/yahoo-to-shut-down-qwiki-yahoo-education-and-the-yahoo-directory/
[19:01] time to break out the sirens and those spinny red light things...
[19:02] I sometimes wonder why yahoo creates or buys anything since they shut stuff down real soon anyway
[19:03] the directory seems like pretty *big* news, given its historic importance to the company
[19:03] I bet it's announced in some FAQ somewhere ;-)
[19:05] dir.yahoo.com is down already, unless that's just me. :)
[19:05] primus: they are probably what the industry calls acquhires.
[19:06] oh, qwiki is not a wiki; /me relieved
[19:09] Why would Yahoo acquire something and then shut it down just a year later
[19:09] Do they have an infinite pool of money or something
[19:09] I wonder why directory is down.
[19:09] Maybe the internet suddenly piled on for a massive nostalgia rush when they heard the news.
[19:09] Influx in traffic?
[19:10] Perhaps they are moving it to new servers to withstand the arrival of ArchiveTeam.
[19:10] loads fine for me.
[19:10] hm.
[19:12] They are probably acquhires to get talent or they are hoping users transition from one service to another.
[19:14] Yahoo directory doesn't work here.
[19:16] http://education.yahoo.net/ may be a good candidate for the archivebot
[19:20] So something needs a project?
[19:20] I'm free all day tomorrow and can create and start a project if needed
[19:22] so
[19:22] -- Qwiki
[19:22] There are at least three public urls on the front page to shared videos with qwiki.
[19:22] Are all the other qwiki items also available on the website?
[19:23] Seems like it, qwiki gives a lot of results when searching on google
[19:27] Other sites are unavailable right now
[19:29] So I think education.yahoo.net and dir.yahoo.com can be done by archivebot
[19:29] For qwiki we might need a project and some clever way to get all the urls
[19:29] I haven't taken a good look at it, but we might be able to do a discovery crawl for qwiki first
[19:32] * arkiver is away for a few hours, will see about creating a project when I'm back
[20:22] I think I figured out why education.yahoo.net is closing, half the links go to www.university.net and I bet they are probably not renewing a contract with yahoo for referrals.
[20:22] err, university.com
[21:12] will take a good look at qwiki now
[21:15] could you stop quizilla aborting items for 500s first? :)
[21:26] so I think I'll be doing a discovery for qwiki to find out what urls exist and what not
[21:26] then will do the grab and download the urls that exist
[21:27] I just hope qwiki has the same capacity as the other yahoo sites
[21:37] Qwiki might be a storage issue if there are lots of videos, though...
[21:37] they reckoned 250,000 users not long after launch
[21:38] yeah, we'll do the discovery first
[21:38] working on it right now
[21:38] we will try to go through all the urls and test for working urls, we won't save the urls yet
[21:41] 218340105584896 possible video IDs, as far as I can see...
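Note on the figure above: 218340105584896 is exactly 62^8, which is what you would get if video IDs were 8 characters drawn from a-z, A-Z and 0-9. The log does not confirm that ID format, so treat it as an inference. A minimal Python sketch of the arithmetic follows, with a rough estimate of how long walking the whole space would take. The alphabet, the ID length, and the request rate of 10,000 per second are all assumptions chosen for illustration.

```python
import string

# Assumption (not stated in the log): IDs are 8 characters long and use
# the 62 alphanumeric characters a-z, A-Z, 0-9.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8


def id_space(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of the given length over the alphabet."""
    return alphabet_size ** length


def years_to_enumerate(total_ids: int, requests_per_second: float) -> float:
    """Years needed to request every ID once at a constant rate."""
    seconds = total_ids / requests_per_second
    return seconds / (365 * 24 * 3600)


total = id_space(len(ALPHABET), ID_LENGTH)
print(total)  # 218340105584896, the figure quoted in the log

# Hypothetical aggregate rate across all workers; not a measured value.
print(f"{years_to_enumerate(total, 10_000):.0f} years at 10,000 requests/s")
```

Even at that assumed rate the full walk comes out at several centuries, which is in line with the objection raised further down the log that the whole ID space cannot be brute-forced.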
[21:42] Google finds about 5,250 links
[21:45] one extract off of the qwiki blog
[21:45] $ curl -X HEAD -v http://d2japcd9yzs5kz.cloudfront.net/joshh/media/videos/9a0e62110e4beca36877002b885e90a2_640x360.webm 2>&1 | grep 'Content-Length'
[21:45] < Content-Length: 4483371
[21:46] 4 MB video is not that bad
[21:46] also if this is cloudfront it's likely it's feeding out of an S3 bucket somewhere
[21:46] they'd be stupid to do otherwise
[21:46] The videos don't seem long - I haven't seen one longer than 75 seconds so far
[21:47] so you probably have all the bandwidth in the world
[21:48] also is there some silicon valley committee for stupid-ass names
[21:50] yeah, so we can go all-in on speed for grab and discovery
[21:50] one request
[21:50] arkiver: promise me you're not going to try to brute-force 218 trillion possible IDs? :)
[21:50] please minimize copypasta from other projects, it makes it really hard to understand the grab script
[21:52] yipdw: is this understandable? https://github.com/Arkiver2/qwiki-discovery
[21:52] Google won't help much, a search for "site:qwiki.com" returns "About 6,090 results"
[21:52] arkiver: fine for now
[21:53] maybe that's all there is? :)
[21:53] I guess I'm asking for the code to be there because it's needed
[21:53] not because it was in some other project
[21:53] yipdw: I understand, will keep thinking about it when creating scripts
[21:53] boo
[21:53] http://www.qwiki.com/sitemap.xml
[21:54] also #quickie
[21:54] yeah, checked it all already
[21:54] if nobody else has an idea
[21:54] Google site:qwiki.com inurl:/v/ = 5250 results
[21:54] I'm a newbie at archiveteam, is emailing someone at Yahoo asking for an index of the available URLs a stupid idea?
[21:54] is fine
[21:54] site:youtube.com inurl:/watch = 329 million results
[21:54] dserodio: it's worth a shot, but they've never responded before
[21:54] antomatic: what are you trying to make clear
[21:54] we are going to do the full discovery
[21:55] My suggestion (for what it's worth) is to start with what you can easily find and know to exist
[21:55] antomatic: that's almost nothing, there are thousands more
[21:55] we need to do the discovery.
[21:55] You can't bruteforce 218 trillion URLs. Not even you can do that.
[21:56] 218 trillion is a lot.
[21:56] you're right about that probably :/
[21:56] * Kazzy checks the math
[21:56] yep, 218 trillion is a big number
[21:56] but we'll see once it's running
[21:56] if it's going fast and we have a lot of machines, why not?
[21:57] but going to work on grab script now
[21:57] nobody has enough bandwidth for that
[21:57] Yahoo won't put up with it, I'm pretty sure of that. They've felt the hand of AT before.
[21:57] there are a lot of problems and concerns here
[21:58] Our main priority is to start with the discovery for qwiki
[21:58] Don't let bruteforcing put you off archiving what you _know_ is there to be archived.
[21:58] Priorities, that's all.
[21:58] If Google lists 5,250 known videos, that's a great place to start.
[21:58] I sent an email to customersupport@yahoo, maybe we'll get lucky
[21:58] antomatic: There will be two at the same time
[21:58] phase 1 and phase 2
[21:59] I'll constantly add new items we know exist to the grab
[21:59] If it's on Google it's likely to be linked from other websites too, so those are pieces of the jigsaw that will be well-appreciated in the wayback machine.
[21:59] while the discovery is going
[21:59] * antomatic nods
[21:59] good.
[21:59] hahahahahaha customercare@yahoo.com bounces
[22:00] "This user doesn't have a yahoo.com account" - I bet it doesn't
[22:00] they've listed this address in the shutdown page themselves...
[22:01] I know someone who used to work as a sysadmin at Yahoo, let's see if he can get us something
[22:03] dserodio: please try! it would be awesome if we can get some help from inside yahoo
[22:27] ----------------------------------------------------
[22:27] -- Join #quickie
[22:27] -- Qwiki will be shutdown the 1st of November
[22:27] -- Yahoo! just killed Qwiki! http://www.qwiki.com/
[22:27] ----------------------------------------------------