Let's scrape past winning numbers from Rakuten Takarakuji (Rakuten Lottery)!

Posted at 2023-04-30

Install the libraries

!pip install beautifulsoup4
!pip install requests

Fetch the HTML from the URL

import requests
from bs4 import BeautifulSoup

url = "https://takarakuji.rakuten.co.jp/backnumber/numbers3/202304/"

response = requests.get(url)
response.raise_for_status()

soup = BeautifulSoup(response.content, "html.parser")
print(soup)

Result

<!DOCTYPE html>

<!--[if IE 8 ]><html class="ie ie8" lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml"><![endif]-->
<!--[if IE 9 ]><html class="ie ie9" lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml"><![endif]-->
<!--[if !(IE)]><!-->
<html lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml">
<!--<![endif]-->
<head>
<meta charset="utf-8"/>

(remainder omitted)

Get the winning-number cells (find returns only the first match)

winning_numbers = soup.find("td", colspan="2")
print(winning_numbers)

Result

<td colspan="2">2023/04/03</td>

To get every matching element, use find_all

winning_numbers = soup.find_all("td", colspan="2")
print(winning_numbers)

Result

[<td colspan="2">2023/04/03</td>,
 <td colspan="2">009</td>,
 <td colspan="2">2023/04/04</td>,
 <td colspan="2">911</td>,
 <td colspan="2">2023/04/05</td>,

(remainder omitted)
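The difference between find and find_all can be checked offline with a small fragment. The following is a minimal sketch: `sample_html` and the variable names are hypothetical, and the markup is a simplified stand-in for the real page, which is more complex.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment; the real page's markup is more complex.
sample_html = """
<table>
  <tr><th>Date</th><td colspan="2">2023/04/03</td></tr>
  <tr><th>Number</th><td colspan="2">009</td></tr>
  <tr><th>Date</th><td colspan="2">2023/04/04</td></tr>
  <tr><th>Number</th><td colspan="2">911</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find returns a single Tag (or None when nothing matches).
first_cell = sample_soup.find("td", colspan="2")

# find_all returns a list-like ResultSet containing every match.
all_cells = sample_soup.find_all("td", colspan="2")

print(first_cell)      # <td colspan="2">2023/04/03</td>
print(len(all_cells))  # 4
```

Because find can return None, it is worth checking the result before using it when the page layout might change.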

Remove the tags

tmp = str(winning_numbers[0]).replace('</td>', '')
tmp = tmp.replace('<td colspan="2">', '')
print(tmp)

Result

`2023/04/03`
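As an alternative to string replacement, Beautiful Soup tags provide get_text(), which returns only the text inside an element. This is a sketch rather than the article's own method; `cell` is a hypothetical one-cell fragment standing in for `winning_numbers[0]`.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell fragment standing in for winning_numbers[0].
cell = BeautifulSoup('<td colspan="2">2023/04/03</td>', "html.parser").td

# get_text returns the text content only; strip=True trims surrounding whitespace.
text = cell.get_text(strip=True)
print(text)  # 2023/04/03
```

This approach does not depend on the exact attribute string in the tag, so it keeps working if the site changes `colspan` or adds a class.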

Strip the tags from every element in the list, replacing each element in place

# Replace each Tag in the list with its text by removing the surrounding <td> markup
for i in range(len(winning_numbers)):
    tmp = str(winning_numbers[i]).replace('</td>', '')
    tmp = tmp.replace('<td colspan="2">', '')
    winning_numbers[i] = tmp
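Once the tags are gone, the list alternates between a date and a number, as the find_all output suggests. One possible next step, sketched here under that assumption, is to pair them up. The `cleaned` list is hypothetical sample data limited to the values shown earlier, not the full scraped list.

```python
# Sample values in the alternating date/number order seen in the find_all output.
cleaned = ["2023/04/03", "009", "2023/04/04", "911"]

dates = cleaned[0::2]    # items at even indexes: the dates
numbers = cleaned[1::2]  # items at odd indexes: the winning numbers

# Keep the numbers as strings so leading zeros such as "009" survive.
results = dict(zip(dates, numbers))

for date, number in results.items():
    print(date, number)
```

If the page ever contains other `td` cells with `colspan="2"`, the alternation breaks, so checking that the list has an even length is a reasonable sanity test.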