Ruby - Download a PDF file with a web crawler


I'm just beginning to use the Ruby programming language. I have a Ruby script that crawls a page for PDF files with Anemone:

    require 'anemone'

    Anemone.crawl("http://example.com") do |anemone|
      # Run the block for every crawled page whose URL matches the pattern.
      anemone.on_pages_like(/\b.+.pdf/) do |page|
        puts page.url
      end
    end

I want to download page.url using a Ruby gem. Which gem can I use to download page.url?

There is no need for an extra gem. Anemone has already fetched the page, so you can write page.body to disk. Try this:

    require 'anemone'

    Anemone.crawl("http://www.rubyinside.com/media/", :depth_limit => 1, :obey_robots_txt => true, :skip_query_strings => true) do |anemone|
      anemone.on_pages_like(/\b.+.pdf/) do |page|
        begin
          # Name the local file after the last segment of the URL path.
          filename = File.basename(page.url.request_uri.to_s)
          # page.body already holds the fetched bytes; write them in binary mode.
          File.open(filename, "wb") { |f| f.write(page.body) }
          puts "downloaded #{page.url}"
        rescue
          puts "error while downloading #{page.url}"
        end
      end
    end

This gives:

downloaded http://www.rubyinside.com/media/poignant-guide.pdf 

and the downloaded PDF opens fine.
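
One thing to watch: in the pattern /\b.+.pdf/ the dot before "pdf" is unescaped, so it also matches a URL such as http://example.com/xpdf. The following is a minimal sketch of a stricter pattern plus a small save helper, using only the Ruby standard library. The names PDF_URL, pdf_filename and save_pdf are hypothetical, and the sketch assumes the URL path ends in a file name.

```ruby
require 'uri'
require 'fileutils'

# Stricter than /\b.+.pdf/: the dot is escaped and the match is anchored
# to the end of the URL, optionally followed by a query string.
PDF_URL = /\.pdf(\?.*)?\z/i

# Hypothetical helper: derive a local file name from a URL (String or URI),
# using only the path so that any query string is ignored.
def pdf_filename(url)
  File.basename(URI(url.to_s).path)
end

# Hypothetical helper: write an already-fetched body into dir in binary
# mode and return the path that was written.
def save_pdf(body, url, dir = '.')
  FileUtils.mkdir_p(dir)
  path = File.join(dir, pdf_filename(url))
  File.open(path, 'wb') { |f| f.write(body) }
  path
end
```

Inside the crawl this could be used as anemone.on_pages_like(PDF_URL) { |page| save_pdf(page.body, page.url, 'pdfs') }, where 'pdfs' is an example target directory. Writing in binary mode matters on Windows, where text mode would translate line endings and corrupt the PDF.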

