
Trying out Facets, Google's visualization tool, just to get a feel for it

Posted at 2018-08-11

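Overview

The visualization tool Facets came out a little over a year ago, but I had never used it seriously.
While trying it out, I am noting down the setup and usage here.

Note:

This is not the proper installation method.
My stance is that if some feature turns out not to work, I will deal with it when it comes up.

TL; DR

  • It is not easy to use.
  • It only works in Chrome.
  • If you put the Facets Python files somewhere on the Python path, you can use it from anywhere.
  • It is very heavy.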
# Facets Dive
import pandas as pd
from IPython.display import display, HTML

train = pd.read_csv("train.csv").to_json(orient='records')

HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive sprite-image-width="{sprite_size}" sprite-image-height="{sprite_size}" id="elem" height="600"></facets-dive>
        <script>
          document.querySelector("#elem").data = {jsonstr};
        </script>"""
html = HTML_TEMPLATE.format(jsonstr=train, sprite_size=10)
display(HTML(html))

# Facets Overview
import pandas as pd
from IPython.display import display, HTML
from facets.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
import base64

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train},
                                  {'name': 'test', 'table': test}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")
HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html" >
        <facets-overview id="elem"></facets-overview>
        <script>
          document.querySelector("#elem").protoInput = "{protostr}";
        </script>"""
html = HTML_TEMPLATE.format(protostr=protostr)
display(HTML(html))
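
Because Dive is very heavy, one option is to send it fewer records. The following is a minimal sketch under that assumption; `sample_records` and its `max_records` default are hypothetical names and values chosen for illustration, not part of Facets.

```python
import random


def sample_records(records, max_records=5000, seed=0):
    """Return at most max_records items, sampled reproducibly.

    Hypothetical helper: the 5000 default is an arbitrary illustration,
    not a limit documented by Facets.
    """
    if len(records) <= max_records:
        return list(records)
    rng = random.Random(seed)  # fixed seed so repeated runs show the same rows
    return rng.sample(records, max_records)
```

With a DataFrame, `df.sample(n=5000, random_state=0)` before `to_json(orient='records')` would serve the same purpose.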

How to

First, git clone the repository.

  • Facets Overview
  • Facets Dive

These appear to be the two main features.
The former compares two datasets, and the latter is for interactively exploring a single data source.

Facets Overview

The default code looked like this.

# Add the facets overview python code to the python path
import sys
sys.path.append('./python')

# Load UCI census train and test data into dataframes.
import pandas as pd
features = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
            "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
            "Hours per week", "Country", "Target"]
train_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    names=features,
    sep=r'\s*,\s*',
    engine='python',
    na_values="?")
test_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
    names=features,
    sep=r'\s*,\s*',
    skiprows=[0],
    engine='python',
    na_values="?")

# Calculate the feature statistics proto from the datasets and stringify it for use in facets overview
from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
import base64

gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train_data},
                                  {'name': 'test', 'table': test_data}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

# Display the facets overview visualization for this data
from IPython.core.display import display, HTML

HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html" >
        <facets-overview id="elem"></facets-overview>
        <script>
          document.querySelector("#elem").protoInput = "{protostr}";
        </script>"""
html = HTML_TEMPLATE.format(protostr=protostr)
display(HTML(html))

sys.path.append('./python')

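!!?
So this is not the kind of library you install with pip and then import...
For now, to make it callable from anywhere, I moved ./python to a location that is on Python's path.
I use pyenv, so specifically:

.pyenv/versions/3.6.3/lib/python3.6

I also renamed the directory from python to facets.
Since sys.path is no longer being extended, the import has to change from

from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

to

from facets.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

The imports inside facets/ need to be rewritten the same way.
With that, it can now be called from anywhere, more or less.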
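
The copy-and-rewrite step above can be sketched as a small script. This is only an illustration of the manual procedure, assuming the directory holds flat sibling modules; `vendor_as_package` is a hypothetical helper, not something shipped with Facets.

```python
import re
import shutil
from pathlib import Path


def vendor_as_package(src_dir, dest_root, package_name="facets"):
    """Copy src_dir to dest_root/package_name and prefix sibling imports.

    Hypothetical helper: it only handles flat modules that import each
    other with `from x import ...` or `import x [as y]`.
    """
    dest = Path(dest_root) / package_name
    shutil.copytree(Path(src_dir), dest)
    (dest / "__init__.py").touch()  # make the directory importable as a package

    # Modules that sit side by side in the copied directory.
    siblings = sorted(p.stem for p in dest.glob("*.py") if p.stem != "__init__")
    if not siblings:
        return dest
    names = "|".join(re.escape(name) for name in siblings)
    from_pattern = re.compile(rf"^(\s*)from ({names}) import ", re.MULTILINE)
    import_pattern = re.compile(rf"^(\s*)import ({names})\b", re.MULTILINE)

    for path in dest.glob("*.py"):
        text = path.read_text(encoding="utf-8")
        # from x import y  ->  from facets.x import y
        text = from_pattern.sub(rf"\1from {package_name}.\2 import ", text)
        # import x as y    ->  from facets import x as y
        text = import_pattern.sub(rf"\1from {package_name} import \2", text)
        path.write_text(text, encoding="utf-8")
    return dest
```

Called as `vendor_as_package("./python", dest_root)` with `dest_root` set to a directory that is already on Python's path, this would reproduce the layout described above.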

Beyond that, the code just imports a few things and passes in the input data after decode("utf-8"),
so it was fairly easy to use.
(I did not say it was pleasant to use.)
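
To show why the decode("utf-8") step matters: `base64.b64encode` returns bytes, and formatting bytes straight into the template would embed a `b'...'` literal in the page. A minimal sketch of the template step as a function follows; `overview_html` is a hypothetical wrapper, and the URL and element markup are taken from the default code above.

```python
import base64

FACETS_JUPYTER_HTML = (
    "https://raw.githubusercontent.com/PAIR-code/facets/"
    "master/facets-dist/facets-jupyter.html"
)


def overview_html(serialized_proto, href=FACETS_JUPYTER_HTML, elem_id="elem"):
    """Build the Facets Overview HTML from already-serialized proto bytes.

    Hypothetical wrapper around the template used in this article.
    """
    # b64encode returns bytes; decode so the template gets a plain str.
    protostr = base64.b64encode(serialized_proto).decode("utf-8")
    return (
        f'<link rel="import" href="{href}">\n'
        f'<facets-overview id="{elem_id}"></facets-overview>\n'
        "<script>\n"
        f'  document.querySelector("#{elem_id}").protoInput = "{protostr}";\n'
        "</script>"
    )
```

In a notebook this would be used as `display(HTML(overview_html(proto.SerializeToString())))`.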

Facets Dive

# Load UCI census and convert to json for sending to the visualization
import pandas as pd
features = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Marital Status",
            "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
            "Hours per week", "Country", "Target"]

# Load dataframe from external CSV and add header information
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
    names=features, # name features for header row
    sep=r'\s*,\s*', # separator used in this dataset
    engine='python',
    skiprows=[0], # skip first row without data 
    na_values="?") # add ? where data is missing

# set the sprite_size based on the number of records in dataset,
# larger datasets can crash the browser if the size is too large (>50000)
sprite_size = 32 if len(df.index)>50000 else 64

jsonstr = df.to_json(orient='records')

# Display the Dive visualization for this data
from IPython.core.display import display, HTML

# Create Facets template  
HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive sprite-image-width="{sprite_size}" sprite-image-height="{sprite_size}" id="elem" height="600"></facets-dive>
        <script>
          document.querySelector("#elem").data = {jsonstr};
        </script>"""

# Load the json dataset and the sprite_size into the template
html = HTML_TEMPLATE.format(jsonstr=jsonstr, sprite_size=sprite_size)

# Display the template
display(HTML(html))

This one does not seem to need any additional libraries.
However, with

href="/nbextensions/facets-dist/facets-jupyter.html"

it works, but a 404 shows up...
So I changed the href to https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html, the same as in Overview, and... it does not work.
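
Since the href is the part being experimented with here, it may help to make it a parameter. The sketch below wraps the Dive template in a hypothetical `dive_html` function; the sprite-size rule is copied from the default code, and the `</` escaping is an extra precaution I am assuming is useful, so that a string value in the data cannot close the script tag early.

```python
import json


def dive_html(jsonstr, n_records,
              href="/nbextensions/facets-dist/facets-jupyter.html",
              elem_id="elem"):
    """Build the Facets Dive HTML for a JSON array of records.

    Hypothetical wrapper around the template used in this article.
    """
    # Same rule as the default code: smaller sprites for large datasets.
    sprite_size = 32 if n_records > 50000 else 64
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    safe_json = jsonstr.replace("</", "<\\/")
    return (
        f'<link rel="import" href="{href}">\n'
        f'<facets-dive sprite-image-width="{sprite_size}" '
        f'sprite-image-height="{sprite_size}" id="{elem_id}" height="600">'
        "</facets-dive>\n"
        "<script>\n"
        f'  document.querySelector("#{elem_id}").data = {safe_json};\n'
        "</script>"
    )


if __name__ == "__main__":
    records = [{"Age": 39, "Workclass": "State-gov"}]
    print(dive_html(json.dumps(records), len(records)))
```

With a DataFrame, `dive_html(df.to_json(orient='records'), len(df.index))` would give the same markup as the code above, and a different `href` can be passed in to try other locations for `facets-jupyter.html`.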

Summary

I want something that lets me spin the data around in 3D.
