| Time |
Nickname |
Message |
|
04:43
|
JackWS |
m |
|
04:44
|
winr4r |
yes |
|
04:53
|
omf_ |
ivan`, is there any more of google reader left uploading? |
|
04:59
|
ivan` |
omf_: no, but I have some ugly data that can be manually processed |
|
04:59
|
ivan` |
are you looking for something in particular? |
|
05:00
|
omf_ |
I just want to browse it, find some interesting stuff |
|
05:08
|
ivan` |
omf_: you can grab all the .cdx files and build some kind of index |
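The "grab all the .cdx files and build some kind of index" idea might look roughly like the sketch below. It assumes each .cdx file starts with a header line such as ` CDX N b a m s k r M S V g` and that the `a` (original URL), `V` (offset) and `g` (archive file name) fields are present; the function name `build_index` and the whitespace splitting are illustrative choices, not anything ivan` described.

```python
from collections import defaultdict


def build_index(lines):
    """Map each original URL to the (archive file, offset) pairs that hold it.

    `lines` is any iterable of text lines from one .cdx file, for example
    an open file object. Assumes whitespace-delimited fields and a header
    line that names the columns with single letters.
    """
    lines = iter(lines)
    header = next(lines).split()
    if not header or header[0] != "CDX":
        raise ValueError("expected a CDX header line")

    # Column letters follow the "CDX" marker, in the order the fields appear.
    columns = {letter: i for i, letter in enumerate(header[1:])}
    url_col, offset_col, file_col = columns["a"], columns["V"], columns["g"]

    index = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip blank or malformed rows
        index[fields[url_col]].append((fields[file_col], int(fields[offset_col])))
    return dict(index)
```

Calling `build_index(open(path))` once per .cdx file and merging the resulting dictionaries would give a URL lookup table that can be browsed or searched before fetching any archive data.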
|
16:56
|
Jonimus |
is there a way to grab stuff google has indexed but that isn't easily wget-able, as in they have directory listing turned off? |
|
16:58
|
Jonimus |
Basically I want to wget all the results for a google search of site:blah.blah.com |
|
17:01
|
ivan` |
not really, you have to spider the whole web or beg for URLs from various people |
|
17:01
|
ivan` |
or run a lot of site: queries |
|
17:02
|
Jonimus |
It's only one site, but they have indexing disabled and I don't want to manually "save page as" in firefox. |
|
17:03
|
ivan` |
you can pass wget a list of URLs |
|
17:05
|
winr4r |
Jonimus: if you feel like signing up for a microsoft azure developer's account, get yourself 5000 free bing API queries, use a script i wrote to get a list of URLs |
|
17:06
|
winr4r |
then use wget -i file |
|
17:07
|
Jonimus |
Not sure if Bing has them crawled, they technically aren't supposed to be world readable afaik. |
|
17:07
|
winr4r |
Jonimus: there's one way to find out! |
|
17:07
|
Jonimus |
Though the company also doesn't really care, it's just API test scripts, I just want them for the example code. |
|
17:08
|
winr4r |
http://paste.archivingyoursh.it/fehelaqoyu.coffee |
|
17:09
|
winr4r |
python whatever.py "site:theirsite.com" | sort | uniq > dongs.txt |
|
17:09
|
winr4r |
wget --input-file=dongs.txt |
|
17:09
|
winr4r |
if bing has it indexed |
|
17:10
|
Jonimus |
I just realized I can downthemall right from the google search page, so I got it |
|
17:10
|
winr4r |
oh, neat |
|
17:11
|
Jonimus |
I just need to uncheck the cached links, which is pretty easy. |
|
17:11
|
winr4r |
yes |
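The two clean-up steps in this exchange, the `sort | uniq` pass over the scraped URLs and unchecking the cached links, could be combined in one small filter before handing the list to wget. This is only a sketch: the function name `clean_urls` is made up here, and treating `webcache.googleusercontent.com` as the cache host is an assumption about how the cached links appeared on the results page.

```python
from urllib.parse import urlsplit

# Assumption: cached copies are served from this host, not from the target site.
CACHE_HOSTS = {"webcache.googleusercontent.com"}


def clean_urls(raw_lines):
    """Return the unique, sorted URLs from raw_lines, minus cached copies.

    Blank lines and entries without a parseable host are dropped, which
    mirrors piping the scraper output through `sort | uniq`.
    """
    urls = set()
    for line in raw_lines:
        url = line.strip()
        if not url:
            continue
        host = urlsplit(url).hostname
        if host is None or host in CACHE_HOSTS:
            continue
        urls.add(url)
    return sorted(urls)
```

Writing the result to a file, one URL per line, gives something that `wget --input-file=` can consume directly, as winr4r suggests.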
|
21:37
|
joepie91 |
http://arstechnica.com/gadgets/2013/07/capptivate-a-site-capturing-apps-before-they-disappear-forever/ |
|
21:38
|
joepie91 |
might be worth having a look at this |