More than 3 years have passed since last update.

Parsing XML with BeautifulSoup


I first tried this with ElementTree, but someone else's BeautifulSoup version looked cleaner, so I gave it a try.

data

Data from the Generator Output by Fuel Type Hourly Report, published by the Independent Electricity System Operator, Canada

Output aggregated hourly by generation type, covering up to the previous day

code

xml_read.py

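from bs4 import BeautifulSoup as bs4
import pandas as pd
import requests

url = 'http://reports.ieso.ca/public/GenOutputbyFuelHourly/PUB_GenOutputbyFuelHourly'

r = requests.get(url)

# 'lxml-xml' selects the XML parser (requires the lxml package)
soup = bs4(r.text, 'lxml-xml')
result = []
for i in soup("DailyData"):
    # one row per hour: [day, hour, output of each fuel type...]
    l = [[i.Day.text, v.Hour.text] + [int(k.text) for k in v("Output")] for v in i('HourlyData')]
    for idx, lst in enumerate(l):
        # append the sum over all fuel types as the last column
        tmp = lst[2:]
        l[idx].append(sum(tmp))
    result += l

# the column names assume the report lists the fuels in exactly this order
df = pd.DataFrame(result, columns=['Date', 'Hour', 'NUCLEAR', 'GAS', 'HYDRO', 'WIND', 'SOLAR', 'BIOFUEL', 'Total'])

# creation timestamp of the report, usable as a file name
filename = soup.CreatedAt.text.replace("T", "").replace(":", "")

# to csv
# df.to_csv(filename+".csv", index=False)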
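
As a side note, the per-row total could also be left to pandas instead of being appended in the loop. This is only a sketch: the rows below are made-up numbers in the same shape as `result` before the total is appended, and the `fuels` list is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows: [Date, Hour, then one output value per fuel type].
rows = [
    ["2020-01-01", "1", 100, 20, 30, 10, 0, 1],
    ["2020-01-01", "2", 110, 25, 35, 5, 0, 2],
]
fuels = ["NUCLEAR", "GAS", "HYDRO", "WIND", "SOLAR", "BIOFUEL"]

df = pd.DataFrame(rows, columns=["Date", "Hour"] + fuels)

# Sum across the fuel columns of each row (axis=1 means "along the columns").
df["Total"] = df[fuels].sum(axis=1)

totals = df["Total"].tolist()
print(df)
```

Either approach gives the same column; the pandas version just keeps the parsing loop free of bookkeeping.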

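Things I noticed

There are two ways to access a tag in BeautifulSoup (hereafter "BS"):

  1. soup.DailyData : bs4.element.Tag
  2. soup("DailyData") : bs4.element.ResultSet

Form 1 returns only the first matching tag, while form 2 (shorthand for soup.find_all("DailyData")) returns every match, so for loops and the like I had to use form 2.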
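
The difference between the two forms is easy to check on a small document. The XML below is a made-up, heavily trimmed sample that only borrows the tag names used in the script above; it is not the real report layout.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with two DailyData elements.
sample = """<Document>
  <DailyData><Day>2020-01-01</Day></DailyData>
  <DailyData><Day>2020-01-02</Day></DailyData>
</Document>"""

soup = BeautifulSoup(sample, "lxml-xml")

first = soup.DailyData      # attribute access: the first matching Tag only
every = soup("DailyData")   # calling the soup: same as soup.find_all("DailyData")

first_day = first.Day.text
all_days = [d.Day.text for d in every]

print(type(first).__name__, first_day)
print(type(every).__name__, all_days)
```

Iterating over a single Tag walks its children rather than its sibling elements, which is why a loop over all days needs the ResultSet.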

Summary

With xml.etree.ElementTree I had to use re to strip out the extra parts before the elements were easy to access, but with BS it was simple.
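
For comparison, here is a rough sketch of what that ElementTree friction can look like. It assumes the "extra parts" were a default XML namespace, which the post does not actually state; the sample document and the namespace URI are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI is made up.
sample = """<Document xmlns="http://example.com/ns">
  <DailyData><Day>2020-01-01</Day></DailyData>
</Document>"""

root = ET.fromstring(sample)
raw_tag = root.tag  # the namespace is folded into every tag name

# A bare tag name finds nothing, because the real name includes the namespace.
missing = root.find("DailyData")

# Option A: strip the default namespace declaration with re before parsing.
stripped = re.sub(r'\sxmlns="[^"]+"', "", sample, count=1)
day_a = ET.fromstring(stripped).find("DailyData/Day").text

# Option B: keep the namespace and pass a prefix map to find().
ns = {"r": "http://example.com/ns"}
day_b = root.find("r:DailyData/r:Day", ns).text

print(raw_tag, missing, day_a, day_b)
```

With the setup used in this post, the tags were reachable by their bare names, which is what made the BS version shorter.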
