I finally understand arrays!

Posted at 2015-07-09

Code

require 'mechanize'

class Scraping
  # Collect every movie link on the "now showing" page, then visit each one
  def self.movie_urls
    links = []
    agent1 = Mechanize.new
    main_page = agent1.get("http://eiga.com/now/")
    elements = main_page.search('.m_unit h3 a')
    elements.each do |ele|
      links << ele.get_attribute('href')
    end

    links.each do |link|
      get_product('http://eiga.com' + link)
    end
  end

  # link already arrives as a full URL, so fetch it as-is
  def self.get_product(link)
    agent2 = Mechanize.new
    sub_page = agent2.get(link)
    title = sub_page.at('.moveInfoBox h1')
    image = sub_page.at('.pictBox img')
    puts title
    puts image
  end
end

What I understood

Unlike method arguments, an array does not pass anything along by itself! It just holds values.

links = []
agent1 = Mechanize.new
main_page = agent1.get("http://eiga.com/now/")
elements = main_page.search('.m_unit h3 a')
elements.each do |ele|
  links << ele.get_attribute('href')
end

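First, this code fills the empty array links with the href attribute value of every link found on main_page. In other words, it builds something like a tank that collects all the link targets on the main page.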
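The "tank" idea can be tried without any network access. This is a minimal sketch, assuming a hypothetical FakeLink struct that imitates only the get_attribute call of a real Mechanize/Nokogiri node; the hrefs are made up, not real eiga.com paths:

```ruby
# Hypothetical stand-in for a parsed <a> node; only get_attribute is imitated
FakeLink = Struct.new(:href) do
  def get_attribute(name)
    name == 'href' ? href : nil
  end
end

# Made-up elements; in the real code they come from main_page.search('.m_unit h3 a')
elements = [FakeLink.new('/movie/100/'), FakeLink.new('/movie/200/')]

links = []                                  # the empty "tank"
elements.each do |ele|
  links << ele.get_attribute('href')        # << appends one href to the end
end
p links  # => ["/movie/100/", "/movie/200/"]

# Equivalent one-liner: map builds the array for you
links_via_map = elements.map { |ele| ele.get_attribute('href') }
p links_via_map == links  # => true
```

Using map instead of an empty array plus << is a matter of taste; both leave the same values in links.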

links.each do |link|
  get_product('http://eiga.com' + link)
end

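At this point links holds many link targets, so next I want to take them apart one by one. Before that, though, each link's URL is completed into a full address and passed to a method called get_product. In other words, each with a do block takes the stored links out one at a time as link and calls get_product('http://eiga.com' + link), so every link target gets processed.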
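To see that it is each and the method call, not the array, that hands the values over, here is a network-free sketch. The get_product below is a hypothetical stub that fetches nothing and only reports the URL it received, and the hrefs are made up:

```ruby
require 'uri'

# Hypothetical stub: the real get_product fetches the page with Mechanize
def get_product(link)
  "fetching #{link}"
end

links = ['/movie/100/', '/movie/200/']   # made-up hrefs

results = []
links.each do |link|
  # each yields the elements one at a time as link;
  # the method call is what actually passes the value on
  results << get_product('http://eiga.com' + link)
end
p results
# => ["fetching http://eiga.com/movie/100/", "fetching http://eiga.com/movie/200/"]

# Alternative to string concatenation: URI.join resolves the href against the site root
p URI.join('http://eiga.com', '/movie/100/').to_s
# => "http://eiga.com/movie/100/"
```

URI.join also copes with an href that is already a full URL, where plain concatenation would produce a broken address.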

  def self.get_product(link)
    agent2 = Mechanize.new
    page = agent2.get(link)
    title = page.at('.moveInfoBox h1').inner_text
    image_url = page.at('.pictBox img')[:src]
    # Save the scraped title and image URL as a Product record
    product = Product.new(:title => title, :image_url => image_url)
    product.save
  end
end

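Then get_product uses each link received this way (a single URL string, not an array) to fetch the title and image URL of every linked page, one by one, and saves them as a Product.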
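The "one link in, one record out" flow can be sketched without a database or network. This assumes a Struct as a hypothetical stand-in for the Product model (the original presumably uses a database-backed model with save), and the page data below is made up to play the role of what .moveInfoBox h1 and .pictBox img would return:

```ruby
# Hypothetical stand-in for the Product model; a Struct keeps the sketch runnable
Product = Struct.new(:title, :image_url, keyword_init: true)

# Made-up scraped data, keyed by the full URL each link would produce
pages = {
  'http://eiga.com/movie/100/' => { title: 'Movie A', image_url: 'http://example.com/a.jpg' },
  'http://eiga.com/movie/200/' => { title: 'Movie B', image_url: 'http://example.com/b.jpg' },
}

products = []
pages.each do |link, scraped|
  # one link (a single URL string) in, one Product out
  products << Product.new(title: scraped[:title], image_url: scraped[:image_url])
end

products.each { |product| puts "#{product.title}: #{product.image_url}" }
```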
