#archiveteam 2013-07-15,Mon


Time Nickname Message
04:43 🔗 JackWS m
04:44 🔗 winr4r yes
04:53 🔗 omf_ ivan`, is there any more of google reader left uploading?
04:59 🔗 ivan` omf_: no, but I have some ugly data that can be manually processed
04:59 🔗 ivan` are you looking for something in particular?
05:00 🔗 omf_ I just want to browse it, find some interesting stuff
05:08 🔗 ivan` omf_: you can grab all the .cdx files and build some kind of index
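A rough sketch of the kind of index ivan` describes, not anything posted in the channel. It assumes the .cdx files are whitespace-delimited and start with a header line such as ` CDX N b a m s k r M S V g`, where the letter `a` marks the original-URL column. The function names `parse_cdx_lines` and `build_index` are made up for illustration.

```python
import glob
from collections import defaultdict


def parse_cdx_lines(lines):
    """Yield one dict per CDX record, keyed by the header's field letters."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # Assumption: the header line looks like " CDX N b a m s k r M S V g".
        if line.lstrip().startswith("CDX"):
            fields = line.split()[1:]
            continue
        if fields is None:
            continue  # no header seen yet, so the columns are unknown
        values = line.split()
        if len(values) != len(fields):
            continue  # skip rows that do not match the header
        yield dict(zip(fields, values))


def build_index(paths):
    """Map each original URL (field 'a') to every record that mentions it."""
    index = defaultdict(list)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for record in parse_cdx_lines(handle):
                url = record.get("a")
                if url:
                    index[url].append(record)
    return index


if __name__ == "__main__":
    # Assumption: the downloaded .cdx files sit in the current directory.
    index = build_index(sorted(glob.glob("*.cdx")))
    print(len(index), "distinct URLs")
```

Each record keeps whatever columns the header declares, so file names and offsets are available for a later lookup if the files include them.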
16:56 🔗 Jonimus is there a way to grab stuff google has indexed but that isn't easily wget-able, as in they have directory listing turned off?
16:58 🔗 Jonimus Basically I want to wget all the results for a google search of site:blah.blah.com
17:01 🔗 ivan` not really, you have to spider the whole web or beg for URLs from various people
17:01 🔗 ivan` or run a lot of site: queries
17:02 🔗 Jonimus It's only one site, but they have indexing disabled and I don't want to manually "save page as" in firefox.
17:03 🔗 ivan` you can pass wget a list of URLs
17:05 🔗 winr4r Jonimus: if you feel like signing up for a microsoft azure developer's account, get yourself 5000 free bing API queries, use a script i wrote to get a list of URLs
17:06 🔗 winr4r then use wget -i file
17:07 🔗 Jonimus Not sure if Bing has them crawled, they technically aren't supposed to be world readable afaik.
17:07 🔗 winr4r Jonimus: there's one way to find out!
17:07 🔗 Jonimus Though the company also doesn't really care, it's just API test scripts, I just want them for the example code.
17:08 🔗 winr4r http://paste.archivingyoursh.it/fehelaqoyu.coffee
17:09 🔗 winr4r python whatever.py "site:theirsite.com" | sort | uniq > dongs.txt
17:09 🔗 winr4r wget --input-file=dongs.txt
17:09 🔗 winr4r if bing has it indexed
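The `sort | uniq` step before `wget --input-file` could also be done in Python. This is a minimal sketch and not winr4r's script: `unique_urls` is a hypothetical helper, and the optional host filter is an added assumption that only URLs from the searched site are wanted.

```python
from urllib.parse import urlsplit


def unique_urls(lines, host=None):
    """Return sorted, de-duplicated http(s) URLs, optionally limited to one host."""
    seen = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # drop anything wget should not be fetching here
        if host and parts.hostname != host:
            continue  # assumption: only the searched site is wanted
        seen.add(url)
    return sorted(seen)


if __name__ == "__main__":
    # Made-up sample input standing in for the search script's output.
    sample = [
        "http://theirsite.com/b\n",
        "http://theirsite.com/a\n",
        "http://theirsite.com/b\n",
    ]
    for url in unique_urls(sample, host="theirsite.com"):
        print(url)
```

Writing the result to a file, one URL per line, gives the list that `wget --input-file` (short form `wget -i`) reads.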
17:10 🔗 Jonimus I just realized I can downthemall right from the google search page, so I got it
17:10 🔗 winr4r oh, neat
17:11 🔗 Jonimus I just need to uncheck the cached links, which is pretty easy.
17:11 🔗 winr4r yes
21:37 🔗 joepie91 http://arstechnica.com/gadgets/2013/07/capptivate-a-site-capturing-apps-before-they-disappear-forever/
21:38 🔗 joepie91 might be worth having a look at this
