Time |
Nickname |
Message |
05:12
🔗
|
SketchCow |
Hi, I am trapped in a hotel room with a chatty young man. |
05:21
🔗
|
godane |
hey SketchCow |
05:23
🔗
|
godane |
the NEC drivers grab is going well |
06:10
🔗
|
SketchCow |
Excellent |
09:16
🔗
|
arkiver |
A question here |
09:16
🔗
|
arkiver |
Is the IP address saved in the .ASPXAUTH cookie? |
18:59
🔗
|
primus |
http://techcrunch.com/2014/09/26/yahoo-to-shut-down-qwiki-yahoo-education-and-the-yahoo-directory/ |
19:01
🔗
|
antomatic |
time to break out the sirens and those spinny red light things... |
19:02
🔗
|
primus |
I sometimes wonder why yahoo creates or buys anything since they shut stuff down real soon anyway |
19:03
🔗
|
antomatic |
the directory seems like pretty *big* news, given its historic importance to the company |
19:03
🔗
|
primus |
I bet it's announced in some FAQ somewhere ;-) |
19:05
🔗
|
antomatic |
dir.yahoo.com is down already, unless that's just me. :) |
19:05
🔗
|
aaaaaaaaa |
primus: they are probably what the industry calls acqui-hires. |
19:06
🔗
|
Nemo_bis |
oh, qwiki is not a wiki; /me relieved |
19:09
🔗
|
will__ |
Why would Yahoo acquire something and then shut it down just a year later? |
19:09
🔗
|
will__ |
Do they have an infinite pool of money or something? |
19:09
🔗
|
antomatic |
I wonder why directory is down. |
19:09
🔗
|
antomatic |
Maybe the internet suddenly piled on for a massive nostalgia rush when they heard the news. |
19:09
🔗
|
will__ |
Influx in traffic? |
19:10
🔗
|
antomatic |
Perhaps they are moving it to new servers to withstand the arrival of ArchiveTeam. |
19:10
🔗
|
garyrh |
loads fine for me. |
19:10
🔗
|
antomatic |
hm. |
19:12
🔗
|
aaaaaaaaa |
They are probably acqui-hires to get talent, or they are hoping users transition from one service to another. |
19:14
🔗
|
aaaaaaaaa |
Yahoo directory doesn't work here. |
19:16
🔗
|
aaaaaaaaa |
http://education.yahoo.net/ may be a good candidate for the archivebot |
19:20
🔗
|
arkiver |
So something needs a project? |
19:20
🔗
|
arkiver |
I'm free all day tomorrow and can create and start a project if needed |
19:22
🔗
|
arkiver |
so |
19:22
🔗
|
arkiver |
-- Qwiki |
19:22
🔗
|
arkiver |
There are at least three public urls on the front page to shared videos with qwiki. |
19:22
🔗
|
arkiver |
Are all the other qwiki items also available on the website? |
19:23
🔗
|
arkiver |
Seems like it, qwiki gives a lot of results when searching on google |
19:27
🔗
|
arkiver |
Other sites are unavailable right now |
19:29
🔗
|
arkiver |
So I think education.yahoo.net and dir.yahoo.com can be done by archivebot |
19:29
🔗
|
arkiver |
For qwiki we might need a project and some clever way to get all the urls |
19:29
🔗
|
arkiver |
I haven't taken a good look at it, but we might be able to do a discovery crawl for qwiki first |
19:32
🔗
|
* |
arkiver is away for a few hours, will see about creating a project when I'm back |
20:22
🔗
|
aaaaaaaaa |
I think I figured out why education.yahoo.net is closing, half the links go to www.university.net and I bet they are probably not renewing a contract with yahoo for referrals. |
20:22
🔗
|
aaaaaaaaa |
err, university.com |
21:12
🔗
|
arkiver |
will take a good look at qwiki now |
21:15
🔗
|
antomatic |
could you stop quizilla aborting items for 500s first? :) |
21:26
🔗
|
arkiver |
so I think I'll be doing a discovery for qwiki to find out what urls exist and what not |
21:26
🔗
|
arkiver |
then will do the grab and download the urls that exist |
21:27
🔗
|
arkiver |
I just hope qwiki has the same capacity as the other yahoo sites |
21:37
🔗
|
antomatic |
Qwiki might be a storage issue if there are lots of videos, though... |
21:37
🔗
|
antomatic |
they reckoned 250,000 users not long after launch |
21:38
🔗
|
arkiver |
yeah, we'll do the discovery first |
21:38
🔗
|
arkiver |
working on it right now |
21:38
🔗
|
arkiver |
we will try to go through all the urls and test for working urls, we won't save the urls yet |
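[Editor's sketch] The discovery phase described above (test candidate URLs for existence, save nothing yet) might look roughly like this. The `discover` function, the `exists` callback, and the example IDs are hypothetical; only the `/v/` URL pattern appears in the chat. In a real run `exists` would presumably issue a HEAD request; here it is a stand-in so the sketch runs offline.

```python
from typing import Callable, Iterable, List


def discover(candidates: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the candidate URLs for which exists(url) is true.

    Nothing is downloaded or stored here; the result is only a list of
    URLs to hand to a later grab phase.
    """
    return [url for url in candidates if exists(url)]


# Hypothetical stand-in for a real network check (e.g. a HEAD request).
known_live = {"http://www.qwiki.com/v/aaaaaaaa"}
candidates = [
    "http://www.qwiki.com/v/aaaaaaaa",  # made-up ID
    "http://www.qwiki.com/v/bbbbbbbb",  # made-up ID
]
found = discover(candidates, lambda url: url in known_live)
print(found)
```

Keeping the existence check as a parameter separates "which URLs do we try" from "how do we test one", so the same loop can be exercised without touching the site.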
21:41
🔗
|
antomatic |
218340105584896 possible video IDs, as far as I can see... |
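[Editor's sketch] The chat does not say how this figure was derived. One reading that reproduces it exactly, offered here as an assumption, is an 8-character ID drawn from a 62-character alphabet (a-z, A-Z, 0-9):

```python
import string

# Assumption: IDs use upper- and lower-case letters plus digits (62 symbols)
# and are exactly 8 characters long. Neither is stated in the chat.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8

total_ids = len(ALPHABET) ** ID_LENGTH
print(total_ids)  # 218340105584896
```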
21:42
🔗
|
antomatic |
Google finds about 5,250 links |
21:45
🔗
|
yipdw |
one extract off of the qwiki blog |
21:45
🔗
|
yipdw |
$ curl -X HEAD -v http://d2japcd9yzs5kz.cloudfront.net/joshh/media/videos/9a0e62110e4beca36877002b885e90a2_640x360.webm 2>&1 | grep 'Content-Length' |
21:45
🔗
|
yipdw |
< Content-Length: 4483371 |
21:46
🔗
|
yipdw |
4 MB video is not that bad |
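[Editor's sketch] The header line quoted above can be turned into a size figure as follows. The `content_length` helper is hypothetical, and scaling by the roughly 5,250 Google-indexed links mentioned earlier rests on a single sample, so the total is a very loose guess rather than a measurement.

```python
import re
from typing import Optional


def content_length(header_text: str) -> Optional[int]:
    """Extract the Content-Length value from raw or curl -v style headers."""
    match = re.search(
        r"^<?\s*Content-Length:\s*(\d+)\s*$",
        header_text,
        re.IGNORECASE | re.MULTILINE,
    )
    return int(match.group(1)) if match else None


sample = "< Content-Length: 4483371"  # line quoted in the chat
size_bytes = content_length(sample)
print(f"{size_bytes / 1_000_000:.1f} MB per video (one sample)")

# Loose extrapolation: assumes every indexed video is about this size.
indexed_videos = 5250
print(f"{size_bytes * indexed_videos / 1_000_000_000:.1f} GB if all were this size")
```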
21:46
🔗
|
yipdw |
also if this is cloudfront it's likely it's feeding out of an S3 bucket somewhere |
21:46
🔗
|
yipdw |
they'd be stupid to do otherwise |
21:46
🔗
|
antomatic |
The videos don't seem long - I haven't seen one longer than 75 seconds so far |
21:47
🔗
|
yipdw |
so you probably have all the bandwidth in the world |
21:48
🔗
|
yipdw |
also is there some silicon valley committee for stupid-ass names |
21:50
🔗
|
arkiver |
yeah, so we can go all-in on speed for grab and discovery |
21:50
🔗
|
yipdw |
one request |
21:50
🔗
|
antomatic |
arkiver: promise me you're not going to try to brute-force 218 trillion possible IDs? :) |
21:50
🔗
|
yipdw |
please minimize copypasta from other projects, it makes it really hard to understand the grab script |
21:52
🔗
|
arkiver |
yipdw: is this understandable? https://github.com/Arkiver2/qwiki-discovery |
21:52
🔗
|
dserodio |
Google won't help much, a search for "site:qwiki.com" returns "About 6,090 results" |
21:52
🔗
|
yipdw |
arkiver: fine for now |
21:53
🔗
|
antomatic |
maybe that's all there is? :) |
21:53
🔗
|
yipdw |
I guess I'm asking for the code to be there because it's needed |
21:53
🔗
|
yipdw |
not because it was in some other project |
21:53
🔗
|
arkiver |
yipdw: I understand, will keep thinking about it when creating scripts |
21:53
🔗
|
yipdw |
boo |
21:53
🔗
|
yipdw |
http://www.qwiki.com/sitemap.xml |
21:54
🔗
|
yipdw |
also #quickie |
21:54
🔗
|
arkiver |
yeah, checked it all already |
21:54
🔗
|
yipdw |
if nobody else has an idea |
21:54
🔗
|
antomatic |
Google site:qwiki.com inurl:/v/ = 5250 results |
21:54
🔗
|
dserodio |
I'm a newbie at archiveteam, is emailing someone at Yahoo asking for an index of the available URLs a stupid idea? |
21:54
🔗
|
arkiver |
is fine |
21:54
🔗
|
antomatic |
site:youtube.com inurl:/watch = 329 million results |
21:54
🔗
|
yipdw |
dserodio: it's worth a shot, but they've never responded before |
21:54
🔗
|
arkiver |
antomatic: what are you trying to make clear |
21:54
🔗
|
arkiver |
we are going to do the full discovery |
21:55
🔗
|
antomatic |
My suggestion (for what it's worth) is to start with what you can easily find and know to exist |
21:55
🔗
|
arkiver |
antomatic: that's almost nothing, there are thousands more |
21:55
🔗
|
arkiver |
we need to do the discovery. |
21:55
🔗
|
antomatic |
You can't bruteforce 218 trillion URLs. Not even you can do that. |
21:56
🔗
|
antomatic |
218 trillion is a lot. |
21:56
🔗
|
arkiver |
you're right about that probably :/ |
21:56
🔗
|
* |
Kazzy checks the math |
21:56
🔗
|
Kazzy |
yep, 218 trillion is a big number |
21:56
🔗
|
arkiver |
but we'll see once it's running |
21:56
🔗
|
arkiver |
if it's going fast and we have a lot of machines, why not? |
21:57
🔗
|
arkiver |
but going to work on grab script now |
21:57
🔗
|
xmc |
nobody has enough bandwidth for that |
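[Editor's sketch] A back-of-the-envelope check of the brute-force objection. It uses the ID count quoted in the chat; the request rates are hypothetical and chosen only to show the scale.

```python
TOTAL_IDS = 218_340_105_584_896  # figure quoted in the chat
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_enumerate(requests_per_second: float) -> float:
    """Years needed to test every ID once at a sustained request rate."""
    return TOTAL_IDS / requests_per_second / SECONDS_PER_YEAR


# Hypothetical aggregate rates across all participating machines.
for rate in (1_000, 100_000, 1_000_000):
    print(f"{rate:>9,} req/s -> {years_to_enumerate(rate):,.1f} years")
```

Even at a sustained one million requests per second, a full sweep takes several years, which is far beyond the weeks left before the announced shutdown.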
21:57
🔗
|
antomatic |
Yahoo won't put up with it, I'm pretty sure of that. They've felt the hand of AT before. |
21:57
🔗
|
arkiver |
there are a lot of problems and concerns here |
21:58
🔗
|
arkiver |
Our main priority is to start with the discovery for qwiki |
21:58
🔗
|
antomatic |
Don't let bruteforcing put you off archiving what you _know_ is there to be archived. |
21:58
🔗
|
antomatic |
Priorities, that's all. |
21:58
🔗
|
antomatic |
If Google lists 5,250 known videos, that's a great place to start. |
21:58
🔗
|
dserodio |
I sent an email to customersupport@yahoo, maybe we'll get lucky |
21:58
🔗
|
arkiver |
antomatic: There will be two at the same time |
21:58
🔗
|
arkiver |
phase 1 and phase 2 |
21:59
🔗
|
arkiver |
I'll constantly add new items we know exist to the grab |
21:59
🔗
|
antomatic |
If it's on Google it's likely to be linked from other websites too, so those are pieces of the jigsaw that will be well-appreciated in the wayback machine. |
21:59
🔗
|
arkiver |
while the discovery is going |
21:59
🔗
|
* |
antomatic nods |
21:59
🔗
|
arkiver |
good. |
21:59
🔗
|
dserodio |
hahahahahaha customercare@yahoo.com bounces |
22:00
🔗
|
dserodio |
"This user doesn't have a yahoo.com account" - I bet it doesn't |
22:00
🔗
|
dserodio |
they've listed this address in the shutdown page themselves... |
22:01
🔗
|
dserodio |
I know someone who used to work as a sysadmin at Yahoo, let's see if he can get us something |
22:03
🔗
|
arkiver |
dserodio: please try! it would be awesome if we can get some help from inside yahoo |
22:27
🔗
|
arkiver |
---------------------------------------------------- |
22:27
🔗
|
arkiver |
-- Join #quickie |
22:27
🔗
|
arkiver |
-- Qwiki will be shut down on the 1st of November |
22:27
🔗
|
arkiver |
-- Yahoo! just killed Qwiki! http://www.qwiki.com/ |
22:27
🔗
|
arkiver |
---------------------------------------------------- |