[04:43] m
[04:44] yes
[04:53] ivan`, is there any more of google reader left uploading?
[04:59] omf_: no, but I have some ugly data that can be manually processed
[04:59] are you looking for something in particular?
[05:00] I just want to browse it, find some interesting stuff
[05:08] omf_: you can grab all the .cdx files and build some kind of index
[16:56] is there a way to grab stuff google has indexed but isn't easily wget-able, as in they have directory listing turned off?
[16:58] Basically I want to wget all the results for a google search of site:blah.blah.com
[17:01] not really, you have to spider the whole web or beg for URLs from various people
[17:01] or run a lot of site: queries
[17:02] It's only one site, but they have indexing disabled and I don't want to manually "save page as" in Firefox.
[17:03] you can pass wget a list of URLs
[17:05] Jonimus: if you feel like signing up for a Microsoft Azure developer's account, get yourself 5000 free Bing API queries and use a script I wrote to get a list of URLs
[17:06] then use wget -i file
[17:07] Not sure if Bing has them crawled; they technically aren't supposed to be world-readable afaik.
[17:07] Jonimus: there's one way to find out!
[17:07] Though the company also doesn't really care; it's just API test scripts, I just want them for the example code.
[17:08] http://paste.archivingyoursh.it/fehelaqoyu.coffee
[17:09] python whatever.py "site:theirsite.com" | sort | uniq > dongs.txt
[17:09] wget --input-file=dongs.txt
[17:09] if bing has it indexed
[17:10] I just realized I can DownThemAll right from the Google search page, so I got it
[17:10] oh, neat
[17:11] I just need to uncheck the cached links, which is pretty easy.
[17:11] yes
[21:37] http://arstechnica.com/gadgets/2013/07/capptivate-a-site-capturing-apps-before-they-disappear-forever/
[21:38] might be worth having a look at this
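
For the .cdx suggestion at 05:08, a minimal sketch of what "some kind of index" could look like in Python. It assumes the common 11-field "CDX N b a m s k r M S V g" layout; real .cdx files declare their field order in the header line, so that should be checked rather than assumed:

    # Build a url -> (warc filename, compressed offset) index from .cdx files.
    # Assumes the 11-field "CDX N b a m s k r M S V g" layout, i.e.
    # fields[2] = original URL, fields[9] = offset, fields[10] = filename.
    import glob

    def build_index(pattern="*.cdx"):
        index = {}
        for path in glob.glob(pattern):
            with open(path) as f:
                f.readline()  # header line, e.g. " CDX N b a m s k r M S V g"
                for line in f:
                    fields = line.split()
                    if len(fields) < 11:
                        continue  # skip malformed lines
                    original_url = fields[2]
                    offset, warc = fields[9], fields[10]
                    index[original_url] = (warc, offset)
        return index

With that index in hand, any URL of interest can be looked up and its record pulled out of the corresponding WARC at the stored offset.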
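
The paste at 17:08 is a CoffeeScript file; since the invocation at 17:09 is `python whatever.py`, here is a hypothetical Python sketch of the same URL-listing step. The endpoint, the Basic-auth-with-account-key scheme, and the $top/$skip paging are assumptions about the Azure Datamarket Bing Search API of that era, not taken from the paste:

    # Hypothetical stand-in for the pasted script: page through Bing Web
    # search results for a "site:" query and print one URL per line.
    import base64
    import json
    import sys
    import urllib.parse
    import urllib.request

    API_KEY = "your-azure-account-key"  # placeholder, not a real key
    BASE = "https://api.datamarket.azure.com/Bing/Search/v1/Web"  # assumed endpoint

    def search(query, top=50, skip=0):
        url = "%s?Query=%%27%s%%27&$top=%d&$skip=%d&$format=json" % (
            BASE, urllib.parse.quote(query), top, skip)
        req = urllib.request.Request(url)
        # Datamarket reputedly used HTTP Basic auth with the account key
        # as the password and an empty username (again, an assumption).
        token = base64.b64encode((":" + API_KEY).encode()).decode()
        req.add_header("Authorization", "Basic " + token)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["d"]["results"]

    if __name__ == "__main__":
        skip = 0
        while True:
            results = search(sys.argv[1], skip=skip)
            if not results:
                break  # no more pages
            for r in results:
                print(r["Url"])
            skip += len(results)

The output would then be deduplicated and fed to wget exactly as shown at 17:09: pipe through sort | uniq into a file, then wget --input-file=that-file.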