Ruby - Download a PDF file with a web crawler
I'm just starting to use the Ruby programming language. I have a Ruby script that crawls a page for PDF files with Anemone:

    require 'anemone'

    Anemone.crawl("http://example.com") do |anemone|
      anemone.on_pages_like(/\b.+.pdf/) do |page|
        puts page.url
      end
    end

I want to download page.url using a Ruby gem. Which gem can I use to download page.url?
You don't need a gem for that. Anemone has already fetched the page, so you can write page.body to a file. Try this:

    require 'anemone'

    Anemone.crawl("http://www.rubyinside.com/media/", :depth_limit => 1, :obey_robots_txt => true, :skip_query_strings => true) do |anemone|
      anemone.on_pages_like(/\b.+.pdf/) do |page|
        begin
          # Name the local file after the last segment of the URL path
          filename = File.basename(page.url.request_uri.to_s)
          # "wb" opens the file in binary mode so the PDF bytes are written unchanged
          File.open(filename, "wb") { |f| f.write(page.body) }
          puts "downloaded #{page.url}"
        rescue
          puts "error while downloading #{page.url}"
        end
      end
    end

This gives:

    downloaded http://www.rubyinside.com/media/poignant-guide.pdf

and the PDF is fine.
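
If you want to reuse the saving step outside the crawl block, it could be pulled into small helpers along these lines. This is only a sketch using the Ruby standard library: the names pdf_filename and save_pdf, the 'download.pdf' fallback, and the example URL are made up for illustration and are not part of Anemone.

```ruby
require 'uri'
require 'tmpdir'

# Hypothetical helper: derive a local file name from the URL's path,
# ignoring any query string. Falls back to an assumed default name
# when the path has no usable last segment.
def pdf_filename(url)
  name = File.basename(URI.parse(url.to_s).path)
  (name.empty? || name == '/') ? 'download.pdf' : name
end

# Hypothetical helper: write a response body to disk as raw bytes.
# "wb" opens the file in binary mode so the PDF is not altered.
# Returns the path of the file that was written.
def save_pdf(url, body, dir = Dir.pwd)
  path = File.join(dir, pdf_filename(url))
  File.open(path, 'wb') { |f| f.write(body) }
  path
end

# Usage with made-up data, written to a temporary directory
Dir.mktmpdir do |dir|
  path = save_pdf('http://example.com/media/guide.pdf?x=1', '%PDF-1.4 demo', dir)
  puts File.basename(path)
end
```

Inside the on_pages_like block you would then call save_pdf(page.url, page.body) in place of the File.open line. Taking the name from the URL path rather than the full request URI keeps query strings out of the file name.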