
# Visualizing Kaggle data with plotly

Posted at 2021-04-06, last updated at 2021-04-07

## Introduction

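This article documents how I, a Python beginner, used plotly to visualize data in several different chart types.
I worked through it based on my own understanding, so if you spot any mistakes, I'd appreciate it if you pointed them out.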

## Loading the data

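```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import HTML, display  # not used below; handy in notebooks
from plotly.subplots import make_subplots

# Path inside a Kaggle notebook with the Video Game Sales dataset attached
data = pd.read_csv('../input/videogamesales/vgsales.csv')
```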

This article uses the Video Game Sales dataset from Kaggle.

```python
data.info()
```

`.info()` gives a quick overview of the data (column names, dtypes, and non-null counts), so I recommend it.

```python
data = data.dropna().reset_index(drop=True)
```

This drops every row that contains a NaN value and renumbers the index.
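To see what the two calls do, here is a sketch on a small made-up frame (`toy` and its values are hypothetical, not from the dataset). `dropna()` keeps the surviving rows' old labels, and `reset_index(drop=True)` renumbers them without adding the old index as a column:

```python
import numpy as np
import pandas as pd

# Made-up rows: row 1 has no Year, row 2 has no Publisher
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2008.0, 2009.0],
    "Publisher": ["Nintendo", "Sony", None, "Sega"],
})

# dropna() removes any row with at least one missing value; labels 0 and 3 survive
cleaned = toy.dropna()
print(cleaned.index.tolist())  # [0, 3]

# reset_index(drop=True) renumbers 0..n-1 and discards the old labels
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())  # [0, 1]
```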

```python
data.head(10)  # first 10 rows

data.describe()
```

`.describe()` shows summary statistics.

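```python
# Restrict to numeric columns: recent pandas versions raise an error
# when text columns such as Name or Platform are passed to corr()
data.select_dtypes(include='number').corr()
```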

`.corr()` shows the correlation coefficients. A common rule of thumb for reading them:

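- 0.7 to 1.0: strong positive correlation
- 0.4 to 0.7: positive correlation
- 0.2 to 0.4: weak positive correlation
- -0.2 to 0.2: almost no correlation
- -0.4 to -0.2: weak negative correlation
- -0.7 to -0.4: negative correlation
- -1.0 to -0.7: strong negative correlation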
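The rule of thumb above can be written as a small helper. This is a sketch: the name `describe_corr` is hypothetical, and putting a value that sits exactly on a boundary (0.2, 0.4, 0.7) into the stronger band is my own assumption, since the bands don't say.

```python
def describe_corr(r: float) -> str:
    """Label a correlation coefficient using the rule-of-thumb bands.

    Assumption: a value exactly on a boundary goes to the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    strength = abs(r)
    if strength < 0.2:
        return "almost no correlation"
    if strength >= 0.7:
        prefix = "strong "
    elif strength >= 0.4:
        prefix = ""
    else:
        prefix = "weak "
    direction = "positive" if r > 0 else "negative"
    return f"{prefix}{direction} correlation"


print(describe_corr(0.85))  # strong positive correlation
print(describe_corr(-0.5))  # negative correlation
```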

## Line chart

```python
data_n = data.query('Publisher=="Nintendo"', engine='python')
data_n.head()
```

This extracts only the rows whose Publisher is Nintendo.
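For a simple equality filter like this one, a boolean mask selects the same rows as `query`. A sketch on a made-up frame (`toy`, `via_query`, and `via_mask` are hypothetical names):

```python
import pandas as pd

# Made-up rows standing in for the Kaggle data
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Publisher": ["Nintendo", "Sega", "Nintendo"],
})

via_query = toy.query('Publisher=="Nintendo"', engine='python')
via_mask = toy[toy["Publisher"] == "Nintendo"]  # same filter as a boolean mask

print(via_query.equals(via_mask))  # True
```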

```python
new_data = data_n.groupby('Year')[['Publisher']].count().rename(columns={'Publisher': 'Nintendo'})
```

This groups the rows by Year, counts the Nintendo titles in each year, and renames the Publisher column to Nintendo.

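```python
# Wide-form input: the Year index becomes x and each column becomes one line
fig = px.line(new_data, height=400, width=800, title='Number of Nintendo releases by year')
fig.show()

# Normally you name the columns explicitly, e.g. px.line(df, x='...', y='...')
```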

## Pie chart

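```python
# Titles per publisher among the first 100 rows (assumes the file is ordered by Rank)
pub_100 = data[:100].groupby('Publisher')[['Rank']].count()

fig = px.pie(pub_100, names=pub_100.index, values='Rank', title='top100 publishers')
fig.update_traces(pull=[], textinfo="percent+label")
fig.show()
# Passing pull=[ , , , , ] to fig.update_traces pulls each slice away from the center by the given amount
```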

This pie chart shows each publisher's share of the top 100 best-selling games. Nintendo dominates.
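With `textinfo="percent+label"` each slice is labeled with its share of the total. The same percentages can be computed in pandas as a cross-check. A sketch on a made-up stand-in for the top-100 rows (`top`, `counts`, `share`, and the publishers `P1` to `P3` are hypothetical):

```python
import pandas as pd

# Made-up top-N rows
top = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Publisher": ["P1", "P2", "P1", "P3", "P1"],
})

# Titles per publisher, then each publisher's share in percent
counts = top["Publisher"].value_counts()
share = (counts / counts.sum() * 100).round(1)
print(share)
```

Since Rank has no missing values, `value_counts()` gives the same counts as grouping by Publisher and counting Rank.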

With `pull` specified, each of the first seven slices is pulled out from the center by 10% of the radius:

```python
pub_100 = data[:100].groupby('Publisher')[['Rank']].count()
fig = px.pie(pub_100, names=pub_100.index, values='Rank', title='top100 publishers')
fig.update_traces(pull=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], textinfo="percent+label")
fig.show()
```

## Histogram

```python
# Pass the whole DataFrame and name the column to bin with x
fig = px.histogram(data, x='Year', template='plotly_white', color_discrete_sequence=["rgb(127,232,186)"])
fig.show()
```

`color_discrete_sequence=["rgb( , , )"]` sets the bar color.

## Bar chart

```python
# Number of titles per platform, sorted in descending order
pub_rank = data.groupby('Platform')[['Rank']].count().rename(columns={'Rank': 'counts'}).sort_values('counts', ascending=False)

# Color each bar by its count on a green-to-blue continuous scale
fig = px.bar(pub_rank, x=pub_rank.index, y='counts', color='counts', color_continuous_scale=['rgba(17, 211, 122, 0.6)', 'rgba(17, 141, 171, 0.6)'])
fig.show()
```
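The groupby/count/rename/sort chain above counts the titles per platform. `value_counts()` produces the same counts already sorted in descending order; after `dropna()` both count the same rows. A sketch on a made-up frame (`toy` and `counts` are hypothetical names):

```python
import pandas as pd

# Made-up rows standing in for the Kaggle data
toy = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 6],
    "Platform": ["DS", "PS2", "DS", "Wii", "PS2", "DS"],
})

# The article's approach
pub_rank = (toy.groupby("Platform")[["Rank"]].count()
            .rename(columns={"Rank": "counts"})
            .sort_values("counts", ascending=False))

# Shorter equivalent: value_counts() sorts by count, descending
counts = toy["Platform"].value_counts()
```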

## Common parameters (arguments)

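```python
height=400  # height of the figure
width=800  # width of the figure
template=' '  # background theme, e.g. 'plotly_white'
color_discrete_sequence=["rgb( , , )"]  # color of the bars/lines
title=' '  # title of the figure
```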

## Bar chart + line chart

```python
# Number of titles and total global sales per year
yearcount = data.groupby('Year')[['Rank']].count().rename(columns={'Rank': 'counts'})
yearsales = data.groupby('Year')[['Global_Sales']].sum()

# One subplot with a second y axis on the right
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Bars: number of titles on the left (primary) y axis
fig.add_trace(
    go.Bar(x=yearcount.index, y=yearcount['counts'], name='counts'),
    secondary_y=False
)

# Line: global sales on the right (secondary) y axis
fig.add_trace(
    go.Scatter(x=yearsales.index, y=yearsales['Global_Sales'], name='Global_Sales'),
    secondary_y=True
)
fig.update_xaxes(title_text='Year')
fig.update_yaxes(title_text='Yearcounts', secondary_y=False)
fig.update_yaxes(title_text='Global_sales', secondary_y=True)

fig.show()
```

## Summary

This article covered only basic chart types. Next, I'd like to use summary statistics to visualize the data in more varied ways.
