(Personal memo) Sankey diagram

Posted at 2020-06-28

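Purpose of this article

The goal is to create a Sankey diagram like the one below.
{0,1} denotes two kinds of events. We assume that several events occur within a given period, for example 0 -> 1 -> 1, and the aim is to visualize the transitions between those events.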

image.png
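
To make "transition" concrete, here is a minimal sketch that splits one event sequence into its consecutive (from, to) pairs. The helper name `transitions` is hypothetical and is not part of the original code.

```python
def transitions(events):
    """Return the consecutive (from, to) pairs of an event sequence."""
    # zip pairs each event with its successor; a single event yields no pair
    return list(zip(events, events[1:]))


# The sequence 0 -> 1 -> 1 contains two transitions: 0->1 and 1->1
print(transitions([0, 1, 1]))  # [(0, 1), (1, 1)]
```

Each such pair becomes one unit of flow on a link of the Sankey diagram.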

Setup

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 100)
import random

import warnings
warnings.filterwarnings('ignore')

Preparing dummy data

n=100

agent = [random.choice([1,2,3,4,5,6,7,8,9,10]) for i in range(n)]
day = [random.choice([1,2,3,4,5]) for i in range(n)]
time = [random.random() for i in range(n)]
event = [random.choice([0,1]) for i in range(n)]


df = pd.DataFrame({"agent":agent,
                   "day":day,
                   "time":time,
                   "event":event})

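# Sort chronologically so that events appear in order of occurrence within each group
df = df.sort_values(["day","time"])

# For each (agent, day), concatenate the events into one string such as "011"
summary = df.groupby(["agent", "day"])["event"].apply(lambda x: [str(xi) for xi in x]).apply(''.join).reset_index()
summary.head()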

image.png

Each row of this data shows the sequence of events (event) that an agent (a person, a machine, and so on) generated during one day of the observation period.
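
Because the dummy data is random, the summary changes on every run. The following is a small deterministic sketch of the same sort-and-group step. The frame `toy` and its values are made up for illustration, and `agg` with a join is used here as a compact stand-in for the two `apply` calls above.

```python
import pandas as pd

# Hypothetical toy data: agent 1 has three events on day 1, agent 2 has two
toy = pd.DataFrame({
    "agent": [1, 1, 1, 2, 2],
    "day":   [1, 1, 1, 1, 1],
    "time":  [0.9, 0.1, 0.5, 0.3, 0.2],
    "event": [1, 0, 1, 0, 1],
})

toy_summary = (
    toy.sort_values(["day", "time"])      # chronological order
       .groupby(["agent", "day"])["event"]
       .agg(lambda s: "".join(str(e) for e in s))  # e.g. [0, 1, 1] -> "011"
       .reset_index()
)
print(toy_summary)
```

Agent 1's events, ordered by time, are 0, 1, 1, so its row reads "011"; agent 2's row reads "10".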

max_len_event = max([len(i) for i in summary["event"]])
max_len_event
>>5
# Right-pad every sequence with "-" up to the longest length (5 in this run)
summary["event"] = [i.ljust(max_len_event, "-") for i in summary["event"]]
summary.head()

image.png
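
The padding matters for the next step, which reads two characters at a time from each sequence. Below is a minimal sketch with made-up sequences (`events`, `padded`, and `pairs_by_step` are illustrative names only).

```python
# Hypothetical event strings of different lengths
events = ["011", "10", "1"]

# Pad every string with "-" to the length of the longest one
width = max(len(e) for e in events)
padded = [e.ljust(width, "-") for e in events]
print(padded)  # ['011', '10-', '1--']

# Two-character windows at each step; windows containing "-" are not
# real transitions and match none of "00", "01", "10", "11"
pairs_by_step = [[e[i:i + 2] for e in padded] for i in range(width - 1)]
print(pairs_by_step)  # [['01', '10', '1-'], ['11', '0-', '--']]
```

After padding, every sequence has the same length, so the same slice can be applied to all rows, and short sequences simply contribute nothing to later steps.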

value_list = []

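# For each pair of consecutive positions (i, i+1), count the four possible transitions.
# Pairs that contain the padding character "-" match none of them and are ignored.
for i in range(max_len_event-1):
    tmp = [ei[i:(i+2)] for ei in summary["event"]]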

    #0->0
    n = sum([1 if ei=="00" else 0 for ei in tmp])
    value_list.append(n)

    #0->1
    n = sum([1 if ei=="01" else 0 for ei in tmp])
    value_list.append(n)

    #1->0
    n = sum([1 if ei=="10" else 0 for ei in tmp])
    value_list.append(n)

    #1->1
    n = sum([1 if ei=="11" else 0 for ei in tmp])
    value_list.append(n)

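# Nodes are numbered 2*position + state: position p uses node 2p ("0") and node 2p+1 ("1").
# Sources for each step, in the same order as value_list (0->0, 0->1, 1->0, 1->1)
source_list = [[i, i] for i in range(max_len_event*2-2)]
source_list = np.array(source_list).flatten()

# Targets are the two nodes of the next position
target_list = [[i, i+1, i, i+1] for i in range(2, max_len_event*2, 2)]
target_list = np.array(target_list).flatten()

label_list = ["0", "1"] * max_len_event

import plotly.graph_objects as go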

fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label =  label_list,
      color = "blue"
    ),
    link = dict(
      source = source_list,
      target = target_list,
      value = value_list
  ))])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()

image.png
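
The source, target, and value lists above must line up element by element. As a cross-check, here is a compact sketch that builds all three in a single loop with `collections.Counter`. The function `build_links` and its `states` parameter are hypothetical and are not part of the original code; the sketch assumes every input string has already been padded to the same length.

```python
from collections import Counter


def build_links(padded_events, states=("0", "1")):
    """Return (source, target, value) lists in the order 0->0, 0->1, 1->0, 1->1 per step."""
    width = len(padded_events[0])
    k = len(states)
    source, target, value = [], [], []
    for step in range(width - 1):
        # Count every two-character window at this step
        counts = Counter(e[step:step + 2] for e in padded_events)
        for a, s_from in enumerate(states):
            for b, s_to in enumerate(states):
                source.append(step * k + a)        # node at this position
                target.append((step + 1) * k + b)  # node at the next position
                value.append(counts[s_from + s_to])  # 0 if the pair never occurs
    return source, target, value


src, tgt, val = build_links(["011", "10-", "1--"])
print(src)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(tgt)  # [2, 3, 2, 3, 4, 5, 4, 5]
print(val)  # [0, 1, 1, 0, 0, 0, 0, 1]
```

With two states, the node indices follow the same 2*position + state numbering as the lists above, so the three results can be passed to the `link` argument of `go.Sankey` in the same way.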

That's all!

References

Qiita: 【Python】4変数以上の可視化ってどうするの? ("[Python] How do you visualize four or more variables?")
Plotly: Sankey Diagram in Python
