
[Python] How to Use Altair, a Charting Library

Last updated at 2021-06-27, posted at 2020-10-18

Updated version

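Overview

This article draws a variety of charts with Altair, a charting library for Python. A distinguishing feature of Altair is that it takes its input data as a Pandas DataFrame.

Test data

This article uses the Titanic passenger dataset published on Kaggle. The data format is shown below.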

train.csv

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q

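The meaning of each column is as follows (reference).

PassengerId – unique passenger ID
Survived – survival flag (0 = died, 1 = survived)
Pclass – ticket class
Name – passenger name
Sex – sex (male, female)
Age – age
SibSp – number of siblings/spouses aboard the Titanic
Parch – number of parents/children aboard the Titanic
Ticket – ticket number
Fare – fare
Cabin – cabin number
Embarked – port of embarkation (the port where the passenger boarded the Titanic)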
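
As a quick check of the column descriptions above, the sample rows can be parsed directly with pandas. This is a minimal sketch that inlines the six data rows shown earlier instead of reading train.csv from disk; the names `sample_csv` and `sample_df` are chosen here for illustration only.

```python
import io
import pandas as pd

# The first rows of train.csv as shown above, inlined so the sketch is self-contained
sample_csv = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
'''

# PassengerId becomes the index, as in the scripts in this article
sample_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Empty fields (the Age of passenger 6, most Cabin values) are read as NaN
print(sample_df.dtypes)
print(sample_df.isna().sum())
```

Missing values such as these are the reason some of the later scripts call dropna before plotting.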

Installing Altair

Altair can be installed with pip.

Terminal

pip install altair

Environment

Python: ver3.8.4
Altair: ver4.1.0

Scatter plot

Altair is particularly well suited to drawing scatter plots. Even a numeric column is treated as categorical (ordinal) data when :O is appended to the field name.

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

scatter_plot = alt.Chart(df).mark_circle().encode(
    x=alt.X('Age'),
    y=alt.Y('Fare'),
    column=alt.Column('Survived:O'),
    color=alt.Color('Sex', sort=['male', 'female']),
    tooltip=['Age', 'Fare', 'Name'],
    size=alt.Size('Pclass:O')
).properties(
    width=600,
    height=500
).interactive()

scatter_plot.show()

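Linear regression line

To draw a line segment, put the coordinates of its start and end points in a DataFrame; Altair then connects them.
The intercept and slope of the linear regression are obtained with scikit-learn.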

altair_demo.py

import os
import altair as alt
import pandas as pd
from sklearn.linear_model import LinearRegression

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

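# Drop rows with missing values in Age or Fare

linear_df = df.dropna(subset=['Age', 'Fare'], how='any', axis=0)

# Fit a linear regression model (Fare as a function of Age)

linear = LinearRegression(
    ).fit(linear_df['Age'].values.reshape(-1,1), linear_df['Fare'].values.reshape(-1,1))

# Extract the slope and intercept as scalars
# (with a 2-D y, coef_ has shape (1, 1) and intercept_ has shape (1,))

a = linear.coef_[0, 0]
b = linear.intercept_[0]

# Range of x over which the line is drawn

x_min = df['Age'].min()
x_max = df['Age'].max()

# Build a DataFrame holding the two endpoints of the line

linear_points = pd.DataFrame({
    'Age': [x_min, x_max],
    'Fare': [a*x_min+b, a*x_max+b],
}).astype(float)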

linear_line = alt.Chart(linear_points).mark_line(color='steelblue').encode(
    x=alt.X('Age'),
    y=alt.Y('Fare')
    ).properties(
    width=500,
    height=500
    ).interactive()

linear_line.show()
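
The slope and intercept used for the two endpoints can be cross-checked with NumPy alone. The sketch below swaps scikit-learn for `numpy.polyfit` with degree 1, which performs an ordinary least-squares fit for a single predictor. The data are synthetic points lying exactly on a known line, not the Titanic values.

```python
import numpy as np
import pandas as pd

# Synthetic data on the line y = 2x + 1 (illustrative only)
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Two endpoints are enough to draw a straight line with mark_line()
endpoints = pd.DataFrame({
    'Age': [x.min(), x.max()],
    'Fare': [slope * x.min() + intercept, slope * x.max() + intercept],
})
print(endpoints)
```

The resulting `endpoints` frame has the same shape as `linear_points` in the script above and could be passed to alt.Chart in the same way.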

Layering charts

The regression line can also be overlaid on the scatter plot.

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

scatter_plot = alt.Chart(df).mark_circle(size=50).encode(
    x=alt.X('Age'),
    y=alt.Y('Fare'),
).properties(
    width=500,
    height=500
).interactive()

# linear_line: built exactly as in the previous section (omitted here)

(scatter_plot + linear_line).show()

Box plot

See here for details.

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

boxplot = alt.Chart(df.dropna(subset=['Embarked'], how='any', axis=0)).mark_boxplot(size=100,ticks=alt.MarkConfig(width=30), median=alt.MarkConfig(color='black',size=100)).encode(
    x=alt.X('Survived:O',axis=alt.Axis(labelFontSize=15, ticks=True, titleFontSize=18, title='Survive',labelAngle=0)),
    y=alt.Y('Fare', axis=alt.Axis(labelFontSize=15, ticks=True, titleFontSize=18, title='Fare',labelAngle=0)),
    column=alt.Column('Embarked', sort=['S','Q','C'], header=alt.Header(labelFontSize=15, labelAngle=0, titleFontSize=18)),
    row=alt.Row('Sex', sort=['male','female'], header=alt.Header(labelFontSize=15, labelAngle=-90, titleFontSize=18)),
    color=alt.Color('Sex', sort=['male', 'female'])
).properties(
    width=600,
    height=500
).interactive()

boxplot.show()

Histogram

Setting the Y axis to count() makes Altair count the elements in each bin. The bins are configured through alt.X().

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

histogram = alt.Chart(df).mark_bar().encode(
    x=alt.X("Age", bin=alt.Bin(step=10,extent=[0,90]),axis=alt.Axis(labelFontSize=15, ticks=True, titleFontSize=18, title='Age')),
    y=alt.Y('count()', axis=alt.Axis(labelFontSize=15, ticks=True, titleFontSize=18, title='Frequency')),
    column=alt.Column('Survived:O'),
    row=alt.Row('Sex', sort=['male','female']),
    color=alt.Color('Sex', sort=['male', 'female']),
    opacity=alt.Opacity('Sex', sort=['male', 'female'])
).properties(
    width=600,
    height=500
).interactive()

histogram.show()
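
What count() combined with alt.Bin(step=10, extent=[0,90]) computes can be approximated outside Altair with `numpy.histogram`. This is only a sketch of the idea on made-up ages; Vega-Lite performs its own binning, and its handling of the final bin edge may differ from NumPy's.

```python
import numpy as np

# Hypothetical ages (not the Titanic data)
ages = np.array([5, 12, 18, 25, 29, 31, 47, 62, 80])

# Bin edges 0, 10, ..., 90, mirroring step=10 and extent=[0, 90]
edges = np.arange(0, 91, 10)

# Number of values falling into each 10-year bin
counts, _ = np.histogram(ages, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left:>2}-{right:<2}: {n}')
```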

Heatmap

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

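# Mean of each numeric column per (Sex, Embarked) group;
# numeric_only=True skips text columns such as Name, which newer pandas cannot average
df = df.groupby(['Sex','Embarked']).mean(numeric_only=True).reset_index()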

heatmap = alt.Chart(df).mark_rect().encode(
    x=alt.X('Sex:O',axis=alt.Axis(labelFontSize=20, ticks=True, titleFontSize=20, title='Sex',labelAngle=0)),
    y=alt.Y('Embarked:O', axis=alt.Axis(labelFontSize=20, ticks=True, titleFontSize=20, title='Embarked',labelAngle=0)),
    color='Survived:Q',
    tooltip=['Sex', 'Embarked', 'Survived']
    ).properties(
    width=600,
    height=500
    ).interactive()

heatmap.show()
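
The aggregation that feeds the heatmap can be sketched on a small made-up table. Each (Sex, Embarked) cell receives the mean of Survived, and because Survived is 0 or 1, that mean is the survival rate of the group. `toy` and `rates` are illustrative names.

```python
import pandas as pd

# Hypothetical passengers (not the Titanic data)
toy = pd.DataFrame({
    'Sex': ['male', 'male', 'female', 'female', 'female'],
    'Embarked': ['S', 'S', 'S', 'C', 'C'],
    'Survived': [0, 1, 1, 1, 0],
})

# One row per (Sex, Embarked) pair, holding the mean of Survived
rates = toy.groupby(['Sex', 'Embarked'])['Survived'].mean().reset_index()
print(rates)
```

A frame of this shape, one row per cell, is what mark_rect() expects for the x, y, and color encodings.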

Line plot

Altair works with long-form data, so the DataFrame is reshaped with the pandas melt function before it is passed to the chart.

altair_demo.py

import os
import altair as alt
import pandas as pd

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

Embarked_order = ['S','C','Q']

line_df = df.groupby(['Embarked']).mean(numeric_only=True).reset_index()
line_df['order']=line_df['Embarked'].apply(lambda x: Embarked_order.index(x))
line_df = pd.melt(line_df,id_vars=['Age','Embarked','order'],var_name='index',value_name='values')
line_df['Age'] = line_df['Age'].round()
line_df['values'] = line_df['values'].round()

line = alt.Chart(line_df).mark_line().encode(
    x=alt.X('Age:O',axis=alt.Axis(labelFontSize=20, ticks=True, titleFontSize=20, title='Age',labelAngle=0)),
    y=alt.Y('values:O', axis=alt.Axis(labelFontSize=20, ticks=True, titleFontSize=20, title='values',labelAngle=0)),
    color='index',
    order='order'
    ).properties(
    width=600,
    height=500
    ).interactive()

line.show()
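
The reshaping done by melt can be seen on a small made-up table. In this sketch the wide frame has one column per measure; after melting, each row holds a single (Embarked, measure, value) triple, which is the form that lets Altair map the measure name to color. The numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical wide-form summary: one column per measure
wide = pd.DataFrame({
    'Embarked': ['S', 'C', 'Q'],
    'Survived': [0.3, 0.6, 0.4],
    'Fare': [27.0, 60.0, 13.0],
})

# Long form: the former column names go to 'index', their values to 'values'
long = pd.melt(wide, id_vars=['Embarked'], var_name='index', value_name='values')
print(long)
```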

Saving charts

HTML and Vega

Installing altair_saver makes it possible to save charts as HTML or Vega.

Terminal

pip install altair_saver

Calling the .interactive() method makes the chart freely pannable and zoomable. This behavior is preserved in the saved output.

altair_demo.py

import os
import altair as alt
import pandas as pd
from altair_saver import save

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

boxplot = alt.Chart(df.dropna(subset=['Embarked'], how='any', axis=0)).mark_boxplot().encode(
    x=alt.X('Survived:O'),
    y=alt.Y('Fare')).interactive()

save(boxplot,"fig.html") # HTML
save(boxplot,"fig.vl.json") # VEGA-Lite
save(boxplot,"fig.vg.json") # VEGA

Note, however, that saving as Vega requires the following packages to be installed with npm.

npm install vega-lite vega-cli canvas

The GitHub repository is also a useful reference.

PNG and SVG

Install chromedriver-binary, import it, and then save the chart. Its version should match that of the installed Chrome browser.

pip install chromedriver-binary

altair_demo.py

import os
import altair as alt
import pandas as pd
from altair_saver import save
import chromedriver_binary

cwd = os.getcwd()
path = ['train.csv']
file = os.path.join(cwd, *path)

df = pd.read_table(file, sep=',', index_col=0 ,header=0)

boxplot = alt.Chart(df.dropna(subset=['Embarked'], how='any', axis=0)).mark_boxplot().encode(
    x=alt.X('Survived:O'),
    y=alt.Y('Fare')).interactive()

boxplot.save('boxplot.png')
boxplot.save('boxplot.svg')

Application example

Combined with Streamlit, Altair can be used to build a variety of data analysis applications.
