

Functional Data Science #4: Aggregating Input Data (Part 2)


Last updated at 2018-09-29, posted at 2018-07-09

(This article is day 21 of the "fukuoka.ex x ザキ研 Advent Calendar 2017".)

Yesterday's article was @kobatako's "GraphQL for Elixir #3: Thinking About Middleware and Authentication".

I'm piacere, head of fukuoka.ex.
Thank you for reading again this time.

The previous articles in this series are:
  |> Functional Data Science #1: Inputting Various Kinds of Data
  |> Functional Data Science #2: Converting Input Data
  |> Functional Data Science #3: Aggregating Input Data (Part 1)

Continuing from the first half in the previous article, this time I'll cover the second half of "aggregating input data".

Elixir's Enum module does not provide statistical functions such as mean and median out of the box, so this article fills that gap.

The sample data and setup are the same as in the previous article.

Thanks: fukuoka.ex#11 on 6/22 was a great success

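fukuoka.ex#11, "Elixir Connecting to DB / Data Science", held to celebrate the first anniversary of fukuoka.ex, was our largest event yet with over 50 attendees. The content was extremely dense, and the event was a huge success.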

Data aggregation (continued)

This section covers the following:

Mean
Median
Percentile
Variance
Standard deviation
Mode

Installing the "Statistics" library

I was considering writing my own statistics library, but then found an existing one called "Statistics", so I'll install and use it instead.

Add it under "defp deps do" in mix.exs (on the line above the :phoenix entry).

While we're at it, also bump the version of "smallex", a small but handy utility library.

mix.exs

defmodule SampleAnalytics.Mixfile do
  use Mix.Project
…
  defp deps do
    [
      {:statistics, "~> 0.5.1"}, 
      {:smallex, "~> 0.1.8"}, 
      {:phoenix, "~> 1.3.0"},
      …
    ]
  end
…

Fetch the libraries (requires a network connection):

mix deps.get

Start the Phoenix server, which also compiles the project:

iex -S mix phx.server

Mean

The mean can be obtained by passing the data list to Statistics.mean().

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the mean
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.mean() }
  end )
…
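To make clear what Statistics.mean() computes, here is a minimal plain-Elixir sketch with no dependencies. The module name StatsSketch and the sample rows are hypothetical and made up for illustration (they are not this series' dataset), and the code is not the library's actual implementation. It reuses the same group_by / map pipeline shape as the template above.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule StatsSketch do
  # Arithmetic mean: sum of the values divided by their count.
  # Returns nil for an empty list instead of dividing by zero.
  def mean([]), do: nil
  def mean(list), do: Enum.sum(list) / length(list)
end

# Made-up rows shaped like the CSV records in this series (values are strings).
rows = [
  %{"Profession" => "Engineer", "ApplicantIncome" => "5000"},
  %{"Profession" => "Engineer", "ApplicantIncome" => "7000"},
  %{"Profession" => "Teacher", "ApplicantIncome" => "3000"}
]

# Group incomes by profession, convert them to integers, then average each group.
datas =
  rows
  |> Enum.group_by(& &1["Profession"], & &1["ApplicantIncome"])
  |> Enum.map(fn {profession, incomes} ->
    %{
      "Profession" => profession,
      "ApplicantIncome" => incomes |> Enum.map(&String.to_integer/1) |> StatsSketch.mean()
    }
  end)

IO.inspect(datas)
```

Because Elixir's / operator always returns a float, this sketch yields 6000.0 for the "Engineer" group rather than 6000, even though the inputs are integers.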

Median

The median can be obtained by passing the data list to Statistics.median().

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the median
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.median() }
  end )
…
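The median is less sensitive to outliers than the mean, which matters for skewed values such as income. Below is a minimal sketch of the usual definition (sort, take the middle value, and average the two middle values when the count is even), written as a hypothetical MedianSketch module rather than the library's source.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule MedianSketch do
  # Median: sort the values and take the middle one;
  # for an even count, average the two middle values.
  def median([]), do: nil

  def median(list) do
    sorted = Enum.sort(list)
    n = length(sorted)
    mid = div(n, 2)

    if rem(n, 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end

IO.inspect(MedianSketch.median([1, 2, 3, 1000]))
```

For example, MedianSketch.median([1, 2, 3, 1000]) returns 2.5, while the mean of the same list is 251.5.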

Percentile

Percentiles can be obtained by passing the data list to Statistics.percentile() along with the desired percentile.

For example, the 90th percentile is computed as follows:

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the 90th percentile
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.percentile( 90 ) }
  end )
…
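There are several conventions for computing percentiles. The sketch below uses one common choice, linear interpolation between the two nearest ranks on the sorted list, with the fractional rank p / 100 * (n - 1). Whether Statistics.percentile() uses exactly this definition is an assumption to check in its documentation, and PercentileSketch is a hypothetical name.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule PercentileSketch do
  # p-th percentile (0..100) with linear interpolation between the two nearest ranks.
  # The fractional rank is p / 100 * (n - 1), counted from 0 on the sorted list.
  def percentile([], _p), do: nil

  def percentile(list, p) when p >= 0 and p <= 100 do
    sorted = Enum.sort(list)
    rank = p / 100 * (length(sorted) - 1)
    lower_index = trunc(rank)
    lower = Enum.at(sorted, lower_index)
    # At the top end there is no upper neighbour, so fall back to the lower value.
    upper = Enum.at(sorted, lower_index + 1, lower)
    lower + (upper - lower) * (rank - lower_index)
  end
end

IO.inspect(PercentileSketch.percentile(Enum.to_list(1..10), 90))
```

In the template pipeline, PercentileSketch.percentile( 90 ) could stand in for Statistics.percentile( 90 ) when comparing results.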

Variance

The variance can be obtained by passing the data list to Statistics.variance().

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the variance
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.variance() }
  end )
…
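Variance has two common conventions: population variance (divide the sum of squared deviations by n) and sample variance (divide by n - 1). The sketch below computes both under the hypothetical module name VarianceSketch; which convention Statistics.variance() follows is an assumption you should confirm in the library's documentation.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule VarianceSketch do
  defp mean(list), do: Enum.sum(list) / length(list)

  # Population variance: the mean of squared distances from the mean (divide by n).
  def variance([]), do: nil

  def variance(list) do
    m = mean(list)
    list |> Enum.map(fn x -> (x - m) * (x - m) end) |> mean()
  end

  # Sample (unbiased) variance: divide the sum of squared distances by n - 1.
  def sample_variance(list) when length(list) < 2, do: nil

  def sample_variance(list) do
    m = mean(list)
    sum_sq = list |> Enum.map(fn x -> (x - m) * (x - m) end) |> Enum.sum()
    sum_sq / (length(list) - 1)
  end
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
IO.inspect({VarianceSketch.variance(data), VarianceSketch.sample_variance(data)})
```

For this list the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4.0 and the sample variance is 32 / 7.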

Standard deviation

The standard deviation can be obtained by passing the data list to Statistics.stdev().

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the standard deviation
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.stdev() }
  end )
…
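The standard deviation is the square root of the variance, which puts it back in the same unit as the data (here, income). A minimal sketch assuming the population variance, under the hypothetical module name StdevSketch:

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule StdevSketch do
  defp mean(list), do: Enum.sum(list) / length(list)

  # Population variance (divide by n); an assumption, see the variance section.
  defp variance(list) do
    m = mean(list)
    list |> Enum.map(fn x -> (x - m) * (x - m) end) |> mean()
  end

  # Standard deviation: square root of the variance, in the same unit as the data.
  def stdev([]), do: nil
  def stdev(list), do: :math.sqrt(variance(list))
end

IO.inspect(StdevSketch.stdev([2, 4, 4, 4, 5, 5, 7, 9]))
```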

Mode

The mode can be obtained by passing the data list to Statistics.mode().

lib/sample_db_web/templates/page/index.html.eex

…
datas = result
  |> Enum.group_by( &( &1[ "Profession" ] ), &( &1[ "ApplicantIncome" ] ) )
  |> Enum.map( fn { profession, incomes } ->
    # Convert this profession's income strings to integers, then take the mode
    %{ "Profession" => profession,
       "ApplicantIncome" => incomes |> Enum.map( &String.to_integer/1 ) |> Statistics.mode() }
  end )
…
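The mode is the most frequent value. A list can have several equally frequent values; this sketch returns all of them, sorted, which may differ from how Statistics.mode() handles ties (check its documentation). ModeSketch is a hypothetical name, and Enum.frequencies/1 requires Elixir 1.10 or later. Since continuous values such as income rarely repeat exactly, the mode tends to be most informative for discrete or categorical data.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule ModeSketch do
  # Mode: the most frequent value(s). Ties are all returned, in sorted order.
  def mode([]), do: nil

  def mode(list) do
    # Count occurrences of each value, e.g. %{1 => 1, 2 => 2, 3 => 1}
    freqs = Enum.frequencies(list)
    max_count = freqs |> Map.values() |> Enum.max()

    freqs
    |> Enum.filter(fn {_value, count} -> count == max_count end)
    |> Enum.map(fn {value, _count} -> value end)
    |> Enum.sort()
  end
end

IO.inspect(ModeSketch.mode([1, 1, 2, 2, 3]))
```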

Wrapping up

This time, as part of "aggregating input data", we performed aggregations that involve statistics.

The Statistics library saved us quite a bit of work.

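Besides the functions above, Statistics offers other handy functions such as Statistics.correlation() for the correlation coefficient, Statistics.harmonic_mean() for the harmonic mean, Statistics.kurtosis() for kurtosis, Statistics.skew() for skewness, and Statistics.zscore() for z-scores, so it should also be useful outside data science, for example in trading.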
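As a rough illustration of two of the functions listed above, here is a plain-Elixir sketch of the harmonic mean and z-scores under the hypothetical module name ExtraStatsSketch. It is not the library's code, and it assumes z-scores are computed with the population standard deviation.

```elixir
# Hypothetical helper module: a plain-Elixir sketch, not the Statistics library's source.
defmodule ExtraStatsSketch do
  defp mean(list), do: Enum.sum(list) / length(list)

  # Harmonic mean: n divided by the sum of reciprocals.
  # Raises ArithmeticError if any value is 0.
  def harmonic_mean(list) do
    length(list) / (list |> Enum.map(&(1 / &1)) |> Enum.sum())
  end

  # Z-scores: how many (population) standard deviations each value lies from the mean.
  # Raises ArithmeticError when all values are equal, because the deviation is 0.
  def zscore(list) do
    m = mean(list)
    sd = :math.sqrt(list |> Enum.map(fn x -> (x - m) * (x - m) end) |> mean())
    Enum.map(list, fn x -> (x - m) / sd end)
  end
end

IO.inspect(ExtraStatsSketch.harmonic_mean([1, 4, 4]))
IO.inspect(ExtraStatsSketch.zscore([2, 4, 4, 4, 5, 5, 7, 9]))
```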

Next time, I'll cover "reshaping input data".

Tomorrow's article is @yukq16's "Measuring Function Execution Speed in Elixir".

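p.s. Please give this article a "like"

If you enjoyed it, please click the buttons at the top left of the page.
As these numbers go up, I as the writer get a sense that the series is landing with readers, and that motivates me to keep evolving it. So if you'd like to see more Elixir topics, please help us liven things up!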
