
Getting today's weather and temperature with web scraping

Last updated at 2018-06-13. Posted at 2018-06-13.

Introduction

Hello, I am a beginner at web scraping.

I used BeautifulSoup to get today's weather and temperature for Chofu City from tenki.jp.

Environment

Language

Python

Libraries

requests
BeautifulSoup4

Fetching and parsing the HTML source

Fetch the HTML source of the Chofu weather page with requests.get and parse it with BeautifulSoup.

import requests
from bs4 import BeautifulSoup


# URL of the tenki.jp page for the target area (Chofu City, Tokyo)
url = 'https://tenki.jp/forecast/3/16/4410/13208/'

# Send the HTTP request
r = requests.get(url)

# If you are behind a proxy, use the following instead
"""
proxies = {
    # Fill in the address of your own proxy
    "http": "http://proxy.xxx.xxx.xxx:8080",
    "https": "http://proxy.xxx.xxx.xxx:8080"
}
r = requests.get(url, proxies=proxies)
"""

# Parse the HTML
bsObj = BeautifulSoup(r.content, "html.parser")
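The request above sets no timeout and does not check the HTTP status code. As a minimal sketch, assuming you want failures to surface early, the fetch could be wrapped as follows. The helper name `fetch_html` and the 10-second timeout are illustrative choices and are not part of the original script.

```python
import requests


def fetch_html(url, timeout=10.0, proxies=None):
    """Fetch a page and return its raw bytes (hypothetical helper)."""
    # The timeout keeps the call from hanging forever on a stalled connection
    r = requests.get(url, timeout=timeout, proxies=proxies)
    # Raise requests.exceptions.HTTPError for 4xx/5xx responses
    r.raise_for_status()
    return r.content
```

The returned bytes can be passed to `BeautifulSoup(..., "html.parser")` in the same way as `r.content`.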

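Opening the page in Chrome and inspecting it with F12 (the developer tools) shows that today's weather information is inside a section tag with the class today-weather.

Get this part.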

today = bsObj.find(class_="today-weather")
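Note that `find` returns `None` when nothing matches, which would happen if tenki.jp changed its markup. The following is a small sketch of guarding against that case. It uses a shortened inline HTML string instead of the live page; `sample_html` and the `find_today` helper are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page (assumption for illustration)
sample_html = '<section class="today-weather"><p class="weather-telop">晴</p></section>'


def find_today(html):
    """Return the today-weather element, or raise if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    today = soup.find(class_="today-weather")
    if today is None:
        raise ValueError("today-weather not found; the page layout may have changed")
    return today


today = find_today(sample_html)
print(today.name)
```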

The contents of today look like this.

Contents of today

<section class="today-weather"><!-- 今日の天気 -->
<h3 class="left-style">今日 06月13日<span class="weekday">(水)</span><span class="roku-you">[先負]</span></h3>
<div class="weather-wrap clearfix">
<div class="weather-icon">
<img alt="晴" height="60" src="https://static.tenki.jp/images/icon/forecast-days-weather/01.png" title="晴" width="94"/>
<p class="weather-telop">晴</p> </div>
<div class="date-value-wrap"><div class="forecast-days-temp-telop"><span class="forecast-natsu">夏日</span></div> <dl class="date-value">
<dt class="high-temp sumarry">最高</dt>
<dd class="high-temp temp"><span class="value">27</span><span class="unit">℃</span></dd>
<dd class="high-temp tempdiff">[+1]</dd>
<dt class="low-temp sumarry">最低</dt>
<dd class="low-temp temp"><span class="value">20</span><span class="unit">℃</span></dd>
<dd class="low-temp tempdiff">[0]</dd>
</dl><!-- /.date_value -->
</div><!-- /.date-value-wrap -->
</div><!-- /.weather_wrap -->
<div class="precip-table">
<table>
<tr>
<th>時間</th>
<th>00-06</th>
<th>06-12</th>
<th>12-18</th>
<th>18-24</th>
</tr>
<tr class="rain-probability">
<th>降水確率</th>
<td><span class="grey">---</span></td>
<td>0<span class="unit">%</span></td>
<td>0<span class="unit">%</span></td>
<td>0<span class="unit">%</span></td>
</tr>
<tr class="wind-wave">
<th>風</th>
<td colspan="4">北の風</td>
</tr> </table>
</div><!-- /.precip_table -->
</section>

From here, we want to extract the weather and temperature information.

The weather is in the first p tag.

weather = today.p.string  # weather
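`today.p` picks the first p tag in the section, so it depends on the order of elements on the page. As a hedged alternative, the element can also be looked up by its `weather-telop` class, which appears in the HTML shown above. The inline `sample_html` below is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the today-weather section (assumption for illustration)
sample_html = """
<section class="today-weather">
<div class="weather-icon">
<p class="weather-telop">晴</p>
</div>
</section>
"""

today = BeautifulSoup(sample_html, "html.parser").find(class_="today-weather")

# Same approach as the article: the first <p> inside the section
weather_first_p = today.p.string

# Alternative: select by class and strip surrounding whitespace
weather_by_class = today.find("p", class_="weather-telop").get_text(strip=True)

print(weather_first_p, weather_by_class)
```

One difference to keep in mind: `.string` returns `None` when a tag has more than one child, while `get_text()` always returns a string.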

The temperature information is nested further down, in a div tag with the class date-value-wrap.

temp = today.div.find(class_="date-value-wrap")

Contents

<div class="date-value-wrap"><div class="forecast-days-temp-telop"><span class="forecast-natsu">夏日</span></div> <dl class="date-value">
<dt class="high-temp sumarry">最高</dt>
<dd class="high-temp temp"><span class="value">27</span><span class="unit">℃</span></dd>
<dd class="high-temp tempdiff">[+1]</dd>
<dt class="low-temp sumarry">最低</dt>
<dd class="low-temp temp"><span class="value">20</span><span class="unit">℃</span></dd>
<dd class="low-temp tempdiff">[0]</dd>
</dl><!-- /.date_value -->
</div>

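The actual values are in the dd tags, so extract each of them.

# Get all dd tags
temp = temp.find_all("dd")
# High temperature
temp_max = temp[0].span.string
# High temperature, difference from the previous day
temp_max_diff = temp[1].string
# Low temperature
temp_min = temp[2].span.string
# Low temperature, difference from the previous day
temp_min_diff = temp[3].string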
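Indexing into the list of dd tags relies on their order staying the same. As a sketch of an alternative, the same four values can be selected by class name with CSS selectors via `select_one`. The class names come from the HTML shown above; `sample_html` is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the date-value block (assumption for illustration)
sample_html = """
<dl class="date-value">
<dt class="high-temp sumarry">最高</dt>
<dd class="high-temp temp"><span class="value">27</span><span class="unit">℃</span></dd>
<dd class="high-temp tempdiff">[+1]</dd>
<dt class="low-temp sumarry">最低</dt>
<dd class="low-temp temp"><span class="value">20</span><span class="unit">℃</span></dd>
<dd class="low-temp tempdiff">[0]</dd>
</dl>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "dd.high-temp.temp" matches a dd that has both classes
temp_max = soup.select_one("dd.high-temp.temp span.value").get_text()
temp_max_diff = soup.select_one("dd.high-temp.tempdiff").get_text()
temp_min = soup.select_one("dd.low-temp.temp span.value").get_text()
temp_min_diff = soup.select_one("dd.low-temp.tempdiff").get_text()

print(temp_max, temp_max_diff, temp_min, temp_min_diff)
```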

Now let's look at what was retrieved.

print("Weather: {}".format(weather))
print("High temperature: {} {}".format(temp_max, temp_max_diff))
print("Low temperature: {} {}".format(temp_min, temp_min_diff))

Output (2018/6/13)

Weather: 晴
High temperature: 27 [+1]
Low temperature: 20 [0]

It worked.
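The retrieved values are strings. If numbers are needed for further processing, they can be converted as sketched below, assuming the values always look like the ones in the output above (a plain integer, and a bracketed signed integer for the day-over-day difference). `parse_diff` is a hypothetical helper name.

```python
def parse_diff(text):
    """Convert a day-over-day string such as '[+1]' to an int (hypothetical helper)."""
    # Remove the surrounding brackets; int() accepts a leading + or - sign
    return int(text.strip("[]"))


temp_max = int("27")
temp_max_diff = parse_diff("[+1]")
temp_min_diff = parse_diff("[0]")
print(temp_max, temp_max_diff, temp_min_diff)
```

If the page shows a placeholder instead of a number, `int()` raises `ValueError`, so real code would need to handle that case.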

The code below puts all of the above together.

weather_tenkijp.py

# Import libraries
import requests
from bs4 import BeautifulSoup


# URL of the tenki.jp page for the target area (here, Chofu City, Tokyo)
url = 'https://tenki.jp/forecast/3/16/4410/13208/'

# Send the HTTP request
r = requests.get(url)

# If you are behind a proxy, use the following instead
"""
proxies = {
    "http": "http://proxy.xxx.xxx.xxx:8080",
    "https": "http://proxy.xxx.xxx.xxx:8080"
}
r = requests.get(url, proxies=proxies)
"""

# Parse the HTML
bsObj = BeautifulSoup(r.content, "html.parser")

# Get today's weather
today = bsObj.find(class_="today-weather")
weather = today.p.string

# Block that holds the temperature information
temp = today.div.find(class_="date-value-wrap")

# Get the temperatures
temp = temp.find_all("dd")
temp_max = temp[0].span.string  # high temperature
temp_max_diff = temp[1].string  # high temperature, difference from the previous day
temp_min = temp[2].span.string  # low temperature
temp_min_diff = temp[3].string  # low temperature, difference from the previous day

# Print the results
print("Weather: {}".format(weather))
print("High temperature: {} {}".format(temp_max, temp_max_diff))
print("Low temperature: {} {}".format(temp_min, temp_min_diff))

Conclusion

My writing is rough, but thank you for reading.

Reference article

PythonとBeautiful Soupでスクレイピング (Scraping with Python and Beautiful Soup)
