How I avoided 503 errors when scraping Amazon with Nokogiri in Rails

Posted on 2023-07-12

What I did

  • Prepared 10 different User-Agent strings
  • Prepared 3 different referrers
  • Prepared 5 different wait times (wait_time)

Each request then used a random combination of these.

With all of this in place, I was able to scrape without any problems.
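The random combination described above can be wrapped in a small helper. This is a minimal sketch, not the author's code: the method name `random_request_profile` and the injectable `rng` argument (which lets a fixed seed reproduce the same choice) are hypothetical additions, and the lists are shortened placeholders for the full lists used in the code below.

```ruby
# Hypothetical helper: picks one User-Agent, one referrer and one wait time
# at random, mirroring the "random combination" approach of the article.
# The lists are shortened placeholders, not the article's full lists.
USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
].freeze

REFERRERS = [
  'https://narou-osusume.com/osusumes/2',
  'https://narou-osusume.com/osusumes/10'
].freeze

WAIT_TIMES = [0.3, 0.5, 0.7, 1, 1.2].freeze

# Returns a Hash with the request headers and the number of seconds to sleep.
# Passing a seeded Random makes the choice reproducible (e.g. in tests).
def random_request_profile(rng = Random.new)
  {
    headers: {
      'User-Agent' => USER_AGENTS.sample(random: rng),
      'Referer' => REFERRERS.sample(random: rng)
    },
    wait_time: WAIT_TIMES.sample(random: rng)
  }
end

profile = random_request_profile(Random.new(42))
puts profile[:headers]['User-Agent']
puts profile[:wait_time]
```

Keeping the choice in one place means every request gets its headers and its delay from the same draw, and `Array#sample(random:)` lets the randomness be pinned down when debugging.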

  require 'open-uri'
  require 'nokogiri'

  # Candidate User-Agent strings to spoof, used to avoid Amazon's 503 errors
  user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 OPR/77.0.4054.254',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.55',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 OPR/77.0.4054.254',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.55',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 OPR/77.0.4054.254'
  ]

  referrers = [
    'https://narou-osusume.com/osusumes/2',
    'https://narou-osusume.com/osusumes/10',
    'https://narou-osusume.com/osusumes/15'
  ]

  # Pick a User-Agent and a referrer at random
  user_agent = user_agents.sample
  referrer = referrers.sample

  # Build the request headers with the spoofed User-Agent and the referrer
  headers = {
    'User-Agent' => user_agent,
    'Referer' => referrer
  }

  # Sleep for a random wait_time so the access pattern looks more human
  wait_times = [0.3, 0.5, 0.7, 1, 1.2]
  wait_time = wait_times.sample
  sleep(wait_time)

  # url is the Amazon page to scrape; open-uri sends the hash entries as HTTP headers
  html = URI.open(url, headers).read
  doc = Nokogiri::HTML(html)
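Even with randomized headers and waits, an occasional 503 may still get through. One possible extension, not part of the approach described in this article, is to catch the error, back off, and retry with a freshly randomized User-Agent and referrer. The sketch relies on open-uri raising `OpenURI::HTTPError` for error responses, with the status available as `error.io.status`; the method name `fetch_with_retry`, the `max_attempts` and `base_delay` parameters, and the injectable `fetcher`/`sleeper` lambdas are hypothetical, added so the retry logic can be exercised without network access.

```ruby
require 'open-uri'

# Hypothetical helper: fetch `url` and return the HTML string, retrying with
# exponential backoff when the server answers 503. Each attempt picks a new
# random User-Agent and referrer. Other HTTP errors are re-raised immediately.
def fetch_with_retry(url, user_agents:, referrers:, max_attempts: 3, base_delay: 1.0,
                     fetcher: ->(u, headers) { URI.open(u, headers).read },
                     sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    headers = { 'User-Agent' => user_agents.sample, 'Referer' => referrers.sample }
    fetcher.call(url, headers)
  rescue OpenURI::HTTPError => e
    # e.io.status is an array such as ["503", "Service Unavailable"]
    raise unless e.io.status[0] == '503' && attempt < max_attempts

    # Wait base_delay, then 2x, 4x, ... before the next attempt
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

Usage would then look like `doc = Nokogiri::HTML(fetch_with_retry(url, user_agents: user_agents, referrers: referrers))`, reusing the `user_agents` and `referrers` arrays from the code above. Doubling the delay after each 503 spaces the retries further apart instead of immediately hitting the server again.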