
Trying out Google's visualization tool Facets, just to get a feel for it

Posted at 2018-08-11, last updated at 2018-08-11

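Overview

A little over a year ago a visualization tool called Facets came out,
but I had never used it seriously, so while giving it a quick try I am noting down the setup and usage.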

Note:

This is not the proper way to install it.
I am trying it with the attitude that if some feature turns out not to work, I will sort it out when it comes up.

TL; DR

It is not easy to use.
It only works in Chrome.
If you put these files in a directory that is on the path, you can use it from anywhere.
It is very heavy.

import pandas as pd
from IPython.display import display, HTML

train = pd.read_csv("train.csv").to_json(orient='records')

HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive sprite-image-width="{sprite_size}" sprite-image-height="{sprite_size}" id="elem" height="600"></facets-dive>
        <script>
          document.querySelector("#elem").data = {jsonstr};
        </script>"""
html = HTML_TEMPLATE.format(jsonstr=train, sprite_size=10)
display(HTML(html))

import pandas as pd
from IPython.display import display, HTML
from facets.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
import base64

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train},
                                  {'name': 'test', 'table': test}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")
HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html" >
        <facets-overview id="elem"></facets-overview>
        <script>
          document.querySelector("#elem").protoInput = "{protostr}";
        </script>"""
html = HTML_TEMPLATE.format(protostr=protostr)
display(HTML(html))

How to

First, git clone the repository (PAIR-code/facets).

Facets Overview
Facets Dive
seem to be the main features.
The former compares two datasets,
and the latter is for interactively poking around in a particular data source.

Facets Overview

The default code looked like this.

# Add the facets overview python code to the python path
import sys
sys.path.append('./python')

# Load UCI census train and test data into dataframes.
import pandas as pd
features = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
            "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
            "Hours per week", "Country", "Target"]
train_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    names=features,
    sep=r'\s*,\s*',
    engine='python',
    na_values="?")
test_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
    names=features,
    sep=r'\s*,\s*',
    skiprows=[0],
    engine='python',
    na_values="?")

# Calculate the feature statistics proto from the datasets and stringify it for use in facets overview
from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
import base64

gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train_data},
                                  {'name': 'test', 'table': test_data}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Display the facets overview visualization for this data
from IPython.display import display, HTML

HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html" >
        <facets-overview id="elem"></facets-overview>
        <script>
          document.querySelector("#elem").protoInput = "{protostr}";
        </script>"""
html = HTML_TEMPLATE.format(protostr=protostr)
display(HTML(html))

sys.path.append('./python')

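!!?
So it is not the style where you install the library with pip...
For now, so that it can be called from anywhere, I moved ./python to a location that is on Python's path.
I use pyenv, so specifically: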

.pyenv/versions/3.6.3/lib/python3.6
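If you are not sure which directories qualify on your own machine, listing sys.path is one way to check. This is only a minimal sketch; importable_dirs is a hypothetical helper name and is not part of Facets.

```python
import os
import sys


def importable_dirs():
    """Return the existing directories Python searches for imports, in order."""
    # sys.path can contain empty strings, zip files, or stale entries,
    # so keep only entries that are real directories.
    return [p for p in sys.path if p and os.path.isdir(p)]


if __name__ == "__main__":
    for directory in importable_dirs():
        print(directory)
```

Any directory printed here should work as a destination. A site-packages directory is usually a safer choice than the standard library directory itself.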

I also renamed the directory from python to facets.
And since the path is no longer appended, the import changes from

from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

to

from facets.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

and so on.
The files inside facets/ are rewritten the same way.
With that, it can now be called from anywhere, more or less.
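Rewriting the imports by hand gets tedious, so here is a rough sketch of how it could be scripted. prefix_local_imports is a hypothetical helper, not something Facets ships. It assumes the modules are plain .py files sitting directly in the copied directory, and it only handles simple `from X import ...` and `import X` statements at the start of a line. Back up the directory before running anything like this.

```python
import re
from pathlib import Path


def prefix_local_imports(package_dir, package_name="facets"):
    """Rewrite imports of sibling modules so they go through package_name.

    Returns the sorted names of the files that were modified.
    """
    package_dir = Path(package_dir)
    # Sibling modules are the .py files directly inside the directory.
    modules = sorted(
        p.stem for p in package_dir.glob("*.py") if p.stem != "__init__"
    )
    if not modules:
        return []
    names = "|".join(re.escape(m) for m in modules)
    # "from X import ..."  ->  "from <package>.X import ..."
    from_pat = re.compile(
        rf"^([ \t]*)from\s+({names})\s+import\b", re.MULTILINE
    )
    # "import X [as y]"    ->  "from <package> import X [as y]"
    import_pat = re.compile(rf"^([ \t]*)import\s+({names})\b", re.MULTILINE)

    changed = []
    for path in package_dir.glob("*.py"):
        text = path.read_text(encoding="utf-8")
        new = from_pat.sub(rf"\1from {package_name}.\2 import", text)
        new = import_pat.sub(rf"\1from {package_name} import \2", new)
        if new != text:
            path.write_text(new, encoding="utf-8")
            changed.append(path.name)
    return sorted(changed)
```

Calling it on the copied facets directory rewrites the files in place. Running it a second time should change nothing, because the rewritten lines no longer match the patterns.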

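Beyond that, it is just a matter of importing a few things and calling decode("utf-8") on the data you pass in,
so it was fairly easy to use.
~~I did not say it was pleasant to use~~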
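The decode("utf-8") step matters because base64.b64encode returns bytes, while the HTML template needs a plain string. The sketch below uses stand-in bytes rather than a real serialized proto, and to_embeddable_string is a hypothetical name used only for illustration.

```python
import base64


def to_embeddable_string(serialized: bytes) -> str:
    """Base64-encode raw bytes and return them as text for embedding in HTML."""
    # b64encode returns bytes; decode turns them into a str so that
    # str.format() inserts the text itself rather than a b'...' literal.
    return base64.b64encode(serialized).decode("utf-8")


# Stand-in payload, NOT a real feature statistics proto.
payload = b"\x08\x01\x12\x05train"
encoded = to_embeddable_string(payload)
print(type(encoded).__name__, encoded)
```

Without the decode, formatting the bytes into the template would embed the b'...' wrapper in the page along with the data.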

Facets Dive

# Load UCI census and convert to json for sending to the visualization
import pandas as pd
features = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
            "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
            "Hours per week", "Country", "Target"]

# Load dataframe from external CSV and add header information
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
    names=features, # name features for header row
    sep=r'\s*,\s*', # separator used in this dataset
    engine='python',
    skiprows=[0], # skip first row without data 
    na_values="?") # treat "?" as a missing value

# set the sprite_size based on the number of records in dataset,
# larger datasets can crash the browser if the size is too large (>50000)
sprite_size = 32 if len(df.index)>50000 else 64

jsonstr = df.to_json(orient='records')

# Display the Dive visualization for this data
from IPython.display import display, HTML

# Create Facets template  
HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive sprite-image-width="{sprite_size}" sprite-image-height="{sprite_size}" id="elem" height="600"></facets-dive>
        <script>
          document.querySelector("#elem").data = {jsonstr};
        </script>"""

# Load the json dataset and the sprite_size into the template
html = HTML_TEMPLATE.format(jsonstr=jsonstr, sprite_size=sprite_size)

# Display the template
display(HTML(html))

This one does not seem to need any additional library.
However, with

href="/nbextensions/facets-dist/facets-jupyter.html"

it works, but a 404 shows up...
So, as with Overview, I rewrote the href to https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html and... it does not work.
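Since the only thing that changes between these attempts is the href, one option is to make it a parameter so different locations are quick to try. This is just a sketch and does not fix the 404 itself. build_dive_html and DIVE_TEMPLATE are hypothetical names, and records is assumed to be a list of dicts, the same shape that to_json(orient='records') produces.

```python
import json

# Same markup as the Dive template above, with the href made a parameter.
DIVE_TEMPLATE = """<link rel="import" href="{href}">
<facets-dive sprite-image-width="{sprite_size}" sprite-image-height="{sprite_size}" id="elem" height="600"></facets-dive>
<script>
  document.querySelector("#elem").data = {jsonstr};
</script>"""


def build_dive_html(records, href, sprite_size=64):
    """Return the Dive HTML snippet for the given records and import location."""
    # Escape "</" so a string value containing "</script>" cannot end the
    # script tag early; "\/" is a valid escape in both JSON and JavaScript.
    jsonstr = json.dumps(records).replace("</", "<\\/")
    return DIVE_TEMPLATE.format(
        href=href, sprite_size=sprite_size, jsonstr=jsonstr
    )


html = build_dive_html(
    [{"Age": 39, "Sex": "Male"}],
    href="/nbextensions/facets-dist/facets-jupyter.html",
    sprite_size=10,
)
print(html)
```

In a notebook the result would be passed to display(HTML(html)), as in the snippets above.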

Summary

I want something that lets me spin the data around in 3D.
