Install the libraries
!pip install beautifulsoup4
!pip install requests
Fetch the HTML from the URL
import requests
from bs4 import BeautifulSoup
url = "https://takarakuji.rakuten.co.jp/backnumber/numbers3/202304/"
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")
print(soup)
Result
<!DOCTYPE html>
<!--[if IE 8 ]><html class="ie ie8" lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml"><![endif]-->
<!--[if IE 9 ]><html class="ie ie9" lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml"><![endif]-->
<!--[if !(IE)]><!-->
<html lang="ja" prefix="og: http://ogp.me/ns# fb: http://www.facebook.com/2008/fbml">
<!--<![endif]-->
<head>
<meta charset="utf-8"/>
(rest omitted)
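Get the winning numbers. Note that find returns only the first td with colspan="2", which on this page is a draw date rather than a number.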
winning_numbers = soup.find("td", colspan="2")
print(winning_numbers)
Result
<td colspan="2">2023/04/03</td>
To retrieve every matching element instead of only the first, use find_all
winning_numbers = soup.find_all("td", colspan="2")
print(winning_numbers)
Result
[<td colspan="2">2023/04/03</td>,
<td colspan="2">009</td>,
<td colspan="2">2023/04/04</td>,
<td colspan="2">911</td>,
<td colspan="2">2023/04/05</td>,
(rest omitted)
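The difference between find and find_all can be checked without hitting the site. The following is a minimal sketch: sample_html is a hypothetical fragment modeled on the output above (the real page contains far more markup), and sample_soup, first_cell, and all_cells are names made up for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the output above; not the real page source.
sample_html = """
<table>
  <tr><th>Date</th><td colspan="2">2023/04/03</td></tr>
  <tr><th>Number</th><td colspan="2">009</td></tr>
  <tr><th>Date</th><td colspan="2">2023/04/04</td></tr>
  <tr><th>Number</th><td colspan="2">911</td></tr>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find returns the first matching Tag (or None if nothing matches).
first_cell = sample_soup.find("td", colspan="2")

# find_all returns a list-like ResultSet of every matching Tag.
all_cells = sample_soup.find_all("td", colspan="2")

print(first_cell)
print(len(all_cells))
```

Because find returns None when nothing matches, checking the result before using it is a reasonable precaution if the page layout might change.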
Remove the tags
tmp = str(winning_numbers[0]).replace('</td>', '')
tmp = tmp.replace('<td colspan="2">', '')
print(tmp)
Result
`2023/04/03`
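String replacement works here, but it depends on the tag being written exactly as `<td colspan="2">`. As an alternative, BeautifulSoup's Tag objects have a get_text() method that returns only the text inside the element. A minimal sketch, using a one-element stand-in for the page (sample_soup, cell, and text are illustrative names):

```python
from bs4 import BeautifulSoup

# Stand-in for a single element from the page; not fetched from the site.
sample_soup = BeautifulSoup('<td colspan="2">2023/04/03</td>', "html.parser")
cell = sample_soup.find("td", colspan="2")

# get_text() returns the text content without the surrounding tags;
# strip=True also trims leading and trailing whitespace.
text = cell.get_text(strip=True)
print(text)
```

This approach keeps working even if the element later gains extra attributes such as a class.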
Strip the tags from every element in the list, replacing each element with its text
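for i in range(len(winning_numbers)):
    # Convert the Tag to a string and strip the closing and opening tags
    tmp = str(winning_numbers[i]).replace('</td>', '')
    tmp = tmp.replace('<td colspan="2">', '')
    # Overwrite the Tag with its plain-text content
    winning_numbers[i] = tmp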
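Once the tags are gone, the list alternates between a date and the number drawn on that date, at least in the portion of the output shown earlier. Assuming that pattern holds for the whole page, the two can be paired up with slicing and zip. This is a sketch only: cleaned is a hypothetical stand-in for the cleaned list, filled with the first four values from the output above.

```python
# Hypothetical stand-in for the cleaned list (first four values from the output above).
cleaned = ["2023/04/03", "009", "2023/04/04", "911"]

# Assumption: even indexes hold dates, odd indexes hold the matching numbers.
dates = cleaned[0::2]
numbers = cleaned[1::2]

# Keep the numbers as strings so leading zeros such as "009" are preserved.
results = dict(zip(dates, numbers))
print(results)
```

Converting the numbers to int would turn "009" into 9, so keeping them as strings is probably the safer choice for Numbers3 results.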