#archiveteam 2014-09-26,Fri


Time Nickname Message
05:12 🔗 SketchCow Hi, I am trapped in a hotel room with a chatty young man.
05:21 🔗 godane hey SketchCow
05:23 🔗 godane the NEC drivers grab is going well
06:10 🔗 SketchCow Excellent
09:16 🔗 arkiver A question here
09:16 🔗 arkiver Is the IP address saved in the .ASPXAUTH cookie?
18:59 🔗 primus http://techcrunch.com/2014/09/26/yahoo-to-shut-down-qwiki-yahoo-education-and-the-yahoo-directory/
19:01 🔗 antomatic time to break out the sirens and those spinny red light things...
19:02 🔗 primus I sometimes wonder why yahoo creates or buys anything since they shut stuff down real soon anyway
19:03 🔗 antomatic the directory seems like pretty *big* news, given its historic importance to the company
19:03 🔗 primus I bet it's announced in some FAQ somewhere ;-)
19:05 🔗 antomatic dir.yahoo.com is down already, unless that's just me. :)
19:05 🔗 aaaaaaaaa primus: they are probably what the industry calls acqui-hires.
19:06 🔗 Nemo_bis oh, qwiki is not a wiki; /me relieved
19:09 🔗 will__ Why would Yahoo acquire something and then shut it down just a year later?
19:09 🔗 will__ Do they have an infinite pool of money or something?
19:09 🔗 antomatic I wonder why directory is down.
19:09 🔗 antomatic Maybe the internet suddenly piled on for a massive nostalgia rush when they heard the news.
19:09 🔗 will__ Influx of traffic?
19:10 🔗 antomatic Perhaps they are moving it to new servers to withstand the arrival of ArchiveTeam.
19:10 🔗 garyrh loads fine for me.
19:10 🔗 antomatic hm.
19:12 🔗 aaaaaaaaa They are probably acqui-hires to get talent, or they are hoping users transition from one service to another.
19:14 🔗 aaaaaaaaa Yahoo directory doesn't work here.
19:16 🔗 aaaaaaaaa http://education.yahoo.net/ may be a good candidate for the archivebot
19:20 🔗 arkiver So something needs a project?
19:20 🔗 arkiver I'm free all day tomorrow and can create and start a project if needed
19:22 🔗 arkiver so
19:22 🔗 arkiver -- Qwiki
19:22 🔗 arkiver There are at least three public URLs on the front page linking to videos shared with Qwiki.
19:22 🔗 arkiver Are all the other qwiki items also available on the website?
19:23 🔗 arkiver Seems like it, qwiki gives a lot of results when searching on google
19:27 🔗 arkiver Other sites are unavailable right now
19:29 🔗 arkiver So I think education.yahoo.net and dir.yahoo.com can be done by archivebot
19:29 🔗 arkiver For qwiki we might need a project and some clever way to get all the urls
19:29 🔗 arkiver I haven't taken a good look at it, but we might be able to do a discovery crawl for qwiki first
19:32 🔗 * arkiver is away for a few hours, will see about creating a project when I'm back
20:22 🔗 aaaaaaaaa I think I figured out why education.yahoo.net is closing, half the links go to www.university.net and I bet they are probably not renewing a contract with yahoo for referrals.
20:22 🔗 aaaaaaaaa err, university.com
21:12 🔗 arkiver will take a good look at qwiki now
21:15 🔗 antomatic could you stop quizilla aborting items for 500s first? :)
21:26 🔗 arkiver so I think I'll be doing a discovery for qwiki to find out which urls exist and which don't
21:26 🔗 arkiver then will do the grab and download the urls that exist
21:27 🔗 arkiver I just hope qwiki has the same capacity as the other yahoo sites
21:37 🔗 antomatic Qwiki might be a storage issue if there are lots of videos, though...
21:37 🔗 antomatic they reckoned 250,000 users not long after launch
21:38 🔗 arkiver yeah, we'll do the discovery first
21:38 🔗 arkiver working on it right now
21:38 🔗 arkiver we will try to go through all the urls and test for working urls, we won't save the urls yet
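The discovery pass described here (test each candidate URL, keep nothing but the fact that it exists) can be sketched as below. This is a minimal illustration, not the project's actual discovery script: the `/v/` URL pattern is an assumption taken from the `site:qwiki.com inurl:/v/` search later in the log, and `VIDEO_URL`, `http_status` and `discover` are hypothetical names. A HEAD request is used so that no video data is downloaded during discovery.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

# Hypothetical URL pattern, assumed from the "inurl:/v/" search in this log.
VIDEO_URL = "http://www.qwiki.com/v/{}"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request without downloading the body."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (404, 500, ...) arrive as HTTPError; report their code.
        return err.code


def discover(ids: Iterable[str], probe: Callable[[str], int] = http_status) -> List[str]:
    """Phase 1: record which IDs resolve to a live page; nothing is saved yet."""
    found = []
    for video_id in ids:
        if probe(VIDEO_URL.format(video_id)) == 200:
            found.append(video_id)
    return found
```

The `probe` parameter is there so the check can be swapped out (for example for a rate-limited or retrying version) without touching the discovery loop.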
21:41 🔗 antomatic 218340105584896 possible video IDs, as far as I can see...
21:42 🔗 antomatic Google finds about 5,250 links
21:45 🔗 yipdw one extract off of the qwiki blog
21:45 🔗 yipdw $ curl -X HEAD -v http://d2japcd9yzs5kz.cloudfront.net/joshh/media/videos/9a0e62110e4beca36877002b885e90a2_640x360.webm 2>&1 | grep 'Content-Length'
21:45 🔗 yipdw < Content-Length: 4483371
21:46 🔗 yipdw 4 MB video is not that bad
21:46 🔗 yipdw also if this is cloudfront it's likely it's feeding out of an S3 bucket somewhere
21:46 🔗 yipdw they'd be stupid to do otherwise
21:46 🔗 antomatic The videos don't seem long - I haven't seen one longer than 75 seconds so far
21:47 🔗 yipdw so you probably have all the bandwidth in the world
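yipdw's HEAD request gives one data point (4483371 bytes for a 640x360 .webm), and the Google counts in this log (about 5,250 and 6,090 results) give a rough collection size. The sketch below reads `Content-Length` the same way and extrapolates. It assumes, without evidence beyond that single sample, that the measured file is typical and that each video has only that one rendition; other resolutions or formats would add to the total. `content_length` and `estimate_total_gb` are hypothetical helper names.

```python
import urllib.request


def content_length(url: str, timeout: float = 10.0) -> int:
    """Size in bytes reported by a HEAD request (what `curl -I` shows), or -1 if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else -1


def estimate_total_gb(sample_bytes: int, video_count: int) -> float:
    """Extrapolate one sampled file size to a whole collection, in decimal GB."""
    return sample_bytes * video_count / 1e9


# 4483371 bytes is the single sample from the log; the counts are the Google estimates.
for count in (5_250, 6_090):
    print(f"{count} videos: about {estimate_total_gb(4_483_371, count):.1f} GB")
```

At roughly 23.5 to 27.3 GB for the videos Google knows about, storage is not the bottleneck; finding the URLs is.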
21:48 🔗 yipdw also is there some silicon valley committee for stupid-ass names
21:50 🔗 arkiver yeah, so we can go all-in on speed for grab and discovery
21:50 🔗 yipdw one request
21:50 🔗 antomatic arkiver: promise me you're not going to try to brute-force 218 trillion possible IDs? :)
21:50 🔗 yipdw please minimize copypasta from other projects, it makes it really hard to understand the grab script
21:52 🔗 arkiver yipdw: is this understandable? https://github.com/Arkiver2/qwiki-discovery
21:52 🔗 dserodio Google won't help much, a search for "site:qwiki.com" returns "About 6,090 results"
21:52 🔗 yipdw arkiver: fine for now
21:53 🔗 antomatic maybe that's all there is? :)
21:53 🔗 yipdw I guess I'm asking for the code to be there because it's needed
21:53 🔗 yipdw not because it was in some other project
21:53 🔗 arkiver yipdw: I understand, will keep it in mind when creating scripts
21:53 🔗 yipdw boo
21:53 🔗 yipdw http://www.qwiki.com/sitemap.xml
21:54 🔗 yipdw also #quickie
21:54 🔗 arkiver yeah, checked it all already
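What http://www.qwiki.com/sitemap.xml actually contained isn't recorded in the log. For any site that does publish one, a sitemap (or a sitemap index pointing at further sitemaps) is a cheap source of known URLs. The sketch below parses the standard sitemaps.org format with the Python standard library; the `/v/` filter is an assumption carried over from the Google search in this log, and `parse_sitemap` and `video_urls` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (page URLs, child sitemap URLs) from a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    return pages, children


def video_urls(pages: List[str]) -> List[str]:
    """Keep only video pages; the /v/ path is an assumption from the inurl:/v/ search."""
    return [url for url in pages if "/v/" in url]
```

Child sitemap URLs returned by `parse_sitemap` would be fetched and parsed the same way until no new children appear.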
21:54 🔗 yipdw if nobody else has an idea
21:54 🔗 antomatic Google site:qwiki.com inurl:/v/ = 5250 results
21:54 🔗 dserodio I'm a newbie at archiveteam, is emailing someone at Yahoo asking for an index of the available URLs a stupid idea?
21:54 🔗 arkiver is fine
21:54 🔗 antomatic site:youtube.com inurl:/watch = 329 million results
21:54 🔗 yipdw dserodio: it's worth a shot, but they've never responded before
21:54 🔗 arkiver antomatic: what are you trying to make clear?
21:54 🔗 arkiver we are going to do the full discovery
21:55 🔗 antomatic My suggestion (for what it's worth) is to start with what you can easily find and know to exist
21:55 🔗 arkiver antomatic: that's almost nothing, there are thousands more
21:55 🔗 arkiver we need to do the discovery.
21:55 🔗 antomatic You can't bruteforce 218 trillion URLs. Not even you can do that.
21:56 🔗 antomatic 218 trillion is a lot.
21:56 🔗 arkiver you're right about that probably :/
21:56 🔗 * Kazzy checks the math
21:56 🔗 Kazzy yep, 218 trillion is a big number
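For what it's worth, antomatic's figure of 218340105584896 is exactly 62^8, which would fit 8-character IDs drawn from digits plus upper- and lowercase letters. The log doesn't confirm that ID format, so treat it as an assumption. The sketch below reproduces the number and shows why brute force was dismissed; the request rates are illustrative assumptions, not anything measured.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols; assumed ID alphabet
ID_LENGTH = 8  # assumed ID length
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of a fixed length over an alphabet."""
    return alphabet_size ** length


def years_to_enumerate(total: int, requests_per_second: float) -> float:
    """Wall-clock years to try every ID at a sustained request rate."""
    return total / requests_per_second / SECONDS_PER_YEAR


total = keyspace(len(ALPHABET), ID_LENGTH)
print(total)  # 218340105584896, the figure quoted in the log
for rate in (1_000, 100_000):
    print(f"{rate} req/s: {years_to_enumerate(total, rate):,.0f} years")
```

Even a sustained 1,000 requests per second would take on the order of 6,900 years, which is the arithmetic behind "nobody has enough bandwidth for that".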
21:56 🔗 arkiver but we'll see once it's running
21:56 🔗 arkiver if it's going fast and we have a lot of machines, why not?
21:57 🔗 arkiver but going to work on grab script now
21:57 🔗 xmc nobody has enough bandwidth for that
21:57 🔗 antomatic Yahoo won't put up with it, I'm pretty sure of that. They've felt the hand of AT before.
21:57 🔗 arkiver there are a lot of problems and concerns here
21:58 🔗 arkiver Our main priority is to start with the discovery for qwiki
21:58 🔗 antomatic Don't let bruteforcing put you off archiving what you _know_ is there to be archived.
21:58 🔗 antomatic Priorities, that's all.
21:58 🔗 antomatic If Google lists 5,250 known videos, that's a great place to start.
21:58 🔗 dserodio I sent an email to customersupport@yahoo, maybe we'll get lucky
21:58 🔗 arkiver antomatic: There will be two at the same time
21:58 🔗 arkiver phase 1 and phase 2
21:59 🔗 arkiver I'll constantly add new items we know exist to the grab
21:59 🔗 antomatic If it's on Google it's likely to be linked from other websites too, so those are pieces of the jigsaw that will be well-appreciated in the wayback machine.
21:59 🔗 arkiver while the discovery is going
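The log doesn't say how phase 1 (discovery) and phase 2 (grab) were wired together, only that confirmed items would be added to the grab while discovery kept running. A single-process sketch of that hand-off uses a queue between the two phases; a real project would presumably distribute items across many machines instead. `run_phases`, `exists` and `grab` are hypothetical names, and the two callables stand in for the actual probe and download steps.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel telling the grab worker that discovery has finished


def run_phases(candidates: Iterable[str], exists: Callable[[str], bool],
               grab: Callable[[str], None]) -> List[str]:
    """Run discovery (phase 1) and grab (phase 2) concurrently.

    Every ID discovery confirms is queued for grabbing straight away,
    instead of waiting for the whole discovery pass to finish.
    """
    work: "queue.Queue[object]" = queue.Queue()
    grabbed: List[str] = []

    def grab_worker() -> None:
        while True:
            item = work.get()
            if item is _DONE:
                break
            grab(item)
            grabbed.append(item)

    worker = threading.Thread(target=grab_worker)
    worker.start()
    for item in candidates:  # phase 1: discovery
        if exists(item):
            work.put(item)  # hand off to phase 2 immediately
    work.put(_DONE)
    worker.join()
    return grabbed
```

Because the queue preserves order and there is a single grab worker, items are grabbed in the order discovery confirmed them; known URLs (from Google or the sitemap) could simply be put on the same queue ahead of the brute-force candidates.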
21:59 🔗 * antomatic nods
21:59 🔗 arkiver good.
21:59 🔗 dserodio hahahahahaha customercare@yahoo.com bounces
22:00 🔗 dserodio "This user doesn't have a yahoo.com account" - I bet it doesn't
22:00 🔗 dserodio they've listed this address in the shutdown page themselves...
22:01 🔗 dserodio I know someone who used to work as a sysadmin at Yahoo, let's see if he can get us something
22:03 🔗 arkiver dserodio: please try! it would be awesome if we can get some help from inside yahoo
22:27 🔗 arkiver ----------------------------------------------------
22:27 🔗 arkiver -- Join #quickie
22:27 🔗 arkiver -- Qwiki will be shut down on the 1st of November
22:27 🔗 arkiver -- Yahoo! just killed Qwiki! http://www.qwiki.com/
22:27 🔗 arkiver ----------------------------------------------------
