Time |
Nickname |
Message |
04:43
|
JackWS |
m |
04:44
|
winr4r |
yes |
04:53
|
omf_ |
ivan`, is there any more of google reader left uploading? |
04:59
|
ivan` |
omf_: no, but I have some ugly data that can be manually processed |
04:59
|
ivan` |
are you looking for something in particular? |
05:00
|
omf_ |
I just want to browse it, find some interesting stuff |
05:08
|
ivan` |
omf_: you can grab all the .cdx files and build some kind of index |
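A rough sketch of what "some kind of index" over .cdx files could look like, in Python. It assumes each file starts with the usual legend line (for example ` CDX N b a m s k r M S V g`), where `a` is the original URL, `b` the capture timestamp, `V` the offset into the archive file and `g` the archive file name. The function name `build_cdx_index` and the sample rows are made up for illustration; real Google Reader .cdx files may use a different column set.

```python
import io


def build_cdx_index(lines):
    """Map each original URL to a list of (archive file, offset, timestamp)."""
    index = {}
    fields = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.lstrip().startswith("CDX"):
            # Legend line: the letters after "CDX" name the columns in order.
            fields = line.split()[1:]
            continue
        if fields is None:
            continue  # no legend seen yet, so the columns are unknown
        parts = line.split()
        if len(parts) != len(fields):
            continue  # skip rows that do not match the legend
        row = dict(zip(fields, parts))
        url = row.get("a")
        if url is None:
            continue
        entry = (row.get("g"), row.get("V"), row.get("b"))
        index.setdefault(url, []).append(entry)
    return index


# Hypothetical sample data, only to show the shape of the result.
sample = """ CDX N b a m s k r M S V g
example.com/ 20130701000000 http://example.com/ text/html 200 AAAA - - 512 0 sample-00000.warc.gz
example.com/feed 20130701000100 http://example.com/feed application/atom+xml 200 BBBB - - 2048 512 sample-00000.warc.gz
"""
index = build_cdx_index(io.StringIO(sample))
```

To index several files, the same function can be fed the lines of each .cdx in turn and the resulting dictionaries merged; for browsing, sorting the keys of `index` already gives a usable URL listing.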
16:56
|
Jonimus |
is there a way to grab stuff google has indexed but that isn't easily wget-able, as in they have directory listing turned off? |
16:58
|
Jonimus |
Basically I want to wget all the results for a google search of site:blah.blah.com |
17:01
|
ivan` |
not really, you have to spider the whole web or beg for URLs from various people |
17:01
|
ivan` |
or run a lot of site: queries |
17:02
|
Jonimus |
It's only one site, but they have indexing disabled and I don't want to manually "save page as" in firefox. |
17:03
|
ivan` |
you can pass wget a list of URLs |
17:05
|
winr4r |
Jonimus: if you feel like signing up for a microsoft azure developer's account, get yourself 5000 free bing API queries, use a script i wrote to get a list of URLs |
17:06
|
winr4r |
then use wget -i file |
17:07
|
Jonimus |
Not sure if Bing has them crawled, they technically aren't supposed to be world readable afaik. |
17:07
|
winr4r |
Jonimus: there's one way to find out! |
17:07
|
Jonimus |
Though the company also doesn't really care, it's just API test scripts, I just want them for the example code. |
17:08
|
winr4r |
http://paste.archivingyoursh.it/fehelaqoyu.coffee |
17:09
|
winr4r |
python whatever.py "site:theirsite.com" | sort | uniq > dongs.txt |
17:09
|
winr4r |
wget --input-file=dongs.txt |
17:09
|
winr4r |
if bing has it indexed |
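The `sort | uniq` and `wget --input-file` steps above could also be done from Python, roughly like this. It is only a sketch: `unique_urls`, `write_url_list`, `wget_command` and `fetch` are made-up helper names, the list file name is whatever you pass in, and `fetch` assumes `wget` is installed and on the PATH. It does not reproduce winr4r's Bing script, only the steps after it.

```python
import subprocess


def unique_urls(lines):
    """Strip, drop blanks, de-duplicate and sort (roughly `sort | uniq`)."""
    return sorted({line.strip() for line in lines if line.strip()})


def write_url_list(urls, path):
    """Write one URL per line, the format wget expects for --input-file."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")


def wget_command(list_path):
    """Build the wget invocation that reads URLs from a file."""
    return ["wget", "--input-file=" + list_path]


def fetch(lines, list_path):
    """Write the cleaned list, then hand it to wget (needs wget installed)."""
    write_url_list(unique_urls(lines), list_path)
    subprocess.run(wget_command(list_path), check=True)
```

For example, `fetch(sys.stdin, "urls.txt")` in a small script would let you pipe the URL-listing script's output straight into it (with `import sys` added).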
17:10
|
Jonimus |
I just realized I can downthemall right from the google search page, so I got it |
17:10
|
winr4r |
oh, neat |
17:11
|
Jonimus |
I just need to uncheck the cached links, which is pretty easy. |
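The same "uncheck the cached links" filtering can be sketched in Python for a saved results page: collect every `href`, then keep only links whose host is the target site, which drops cache links served from another host such as webcache.googleusercontent.com. `LinkCollector` and `links_on_host` are made-up names, the sample HTML is invented, and this assumes the page links to results directly rather than through a redirect URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the order seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_on_host(html, host):
    """Return unique links whose hostname is exactly `host`."""
    collector = LinkCollector()
    collector.feed(html)
    keep = []
    for link in collector.links:
        if urlparse(link).hostname == host and link not in keep:
            keep.append(link)
    return keep


# Invented sample page: one real result, its cached copy, one site-internal link.
page = (
    '<a href="http://blah.blah.com/test1.php">result</a>'
    '<a href="http://webcache.googleusercontent.com/search?q=cache:blah.blah.com/test1.php">Cached</a>'
    '<a href="/preferences">Settings</a>'
)
wanted = links_on_host(page, "blah.blah.com")
```

The resulting list can be written out one URL per line and passed to `wget -i`.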
17:11
|
winr4r |
yes |
21:37
|
joepie91 |
http://arstechnica.com/gadgets/2013/07/capptivate-a-site-capturing-apps-before-they-disappear-forever/ |
21:38
|
joepie91 |
might be worth having a look at this |