[TIPS] Imputing missing values in a list of maps, in the style of Scikit-learn's imputer

This article is the Day 9 entry of Elixir Advent Calendar 2024, Series 9.


[This column takes 5 minutes to read and 2 minutes to try]

This is piacere. Thank you for reading :bow:

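Scikit-learn's imputer performs missing-value replacement, a common preprocessing step for AI and LLM work, by computing statistics from the data itself rather than substituting a fixed value the way fillna does.

If no statistic is specified, the mean of each column is applied.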

maplist = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3}, 
  %{"c1" => "v2", "c2" => nil, "c3" => 6}, 
  %{"c1" => "v3", "c2" => 5, "c3" => nil}
]

Result: [
  %{"c1" => "v1", "c2" => 2, "c3" => 3}, 
  %{"c1" => "v2", "c2" => 3.5, "c3" => 6}, 
  %{"c1" => "v3", "c2" => 5, "c3" => 4.5}
]
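
To make the arithmetic behind the result above concrete, here is a minimal sketch that computes the fill value for the "c2" column alone. It uses only the Elixir standard library, and the variable names c2_values and c2_mean are chosen here for illustration.

```elixir
maplist = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3},
  %{"c1" => "v2", "c2" => nil, "c3" => 6},
  %{"c1" => "v3", "c2" => 5, "c3" => nil}
]

# Collect the non-missing values of one column: [2, 5]
c2_values =
  maplist
  |> Enum.map(& &1["c2"])
  |> Enum.reject(&is_nil/1)

# Arithmetic mean with the standard library only: (2 + 5) / 2
c2_mean = Enum.sum(c2_values) / length(c2_values)

IO.inspect(c2_mean)
# 3.5
```

The same calculation over "c3" (values 3 and 6) gives 4.5, which matches the result shown above.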

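Implemented in Elixir, this ends up close to fillna-style code.

Note that in Scikit-learn, an imputer instance such as SimpleImputer is fitted to the data beforehand to obtain the statistics. Here, that step is replaced by computing the statistic on the spot with the Statistics library.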

Mix.install([{:statistics, "~> 0.6"}])

maplist = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3},
  %{"c1" => "v2", "c2" => nil, "c3" => 8},
  %{"c1" => "v3", "c2" => 1, "c3" => nil}
]

# Columns to impute
keys = ["c2", "c3"]

keys
|> Enum.reduce(maplist, fn key, acc ->
  # Fill value: the mean of the column's non-nil values
  fill =
    acc
    |> Enum.map(& &1[key])
    |> Enum.reject(&is_nil/1)
    |> Statistics.mean()

  # Replace nil in this column with the fill value
  Enum.map(acc, fn row ->
    Map.update!(row, key, fn v -> if is_nil(v), do: fill, else: v end)
  end)
end)

Result: [
  %{"c1" => "v1", "c2" => 2, "c3" => 3}, 
  %{"c1" => "v2", "c2" => 1.5, "c3" => 8}, 
  %{"c1" => "v3", "c2" => 1, "c3" => 5.5}
]
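
The code above computes the statistic and fills the gaps in a single pass. For comparison with Scikit-learn's approach of fitting first and applying later, the following is a rough sketch of that split using only the standard library. The module ImputerSketch, its functions fit/3 and transform/2, and the extra row "v4" are hypothetical names and data introduced here, not part of any library or of the original example.

```elixir
defmodule ImputerSketch do
  # Hypothetical helper for illustration; not part of any library.

  # "fit": compute one fill value per column from the given data.
  def fit(maplist, keys, stat) do
    Map.new(keys, fn key ->
      values =
        maplist
        |> Enum.map(& &1[key])
        |> Enum.reject(&is_nil/1)

      {key, stat.(values)}
    end)
  end

  # "transform": put the stored fill value wherever a fitted column is nil or absent.
  def transform(maplist, fills) do
    Enum.map(maplist, fn row ->
      Enum.reduce(fills, row, fn {key, fill}, acc ->
        if Map.get(acc, key) == nil, do: Map.put(acc, key, fill), else: acc
      end)
    end)
  end
end

# Standard-library mean; raises on an empty list (a column with no values)
mean = fn values -> Enum.sum(values) / length(values) end

train = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3},
  %{"c1" => "v2", "c2" => nil, "c3" => 8},
  %{"c1" => "v3", "c2" => 1, "c3" => nil}
]

# Per-column means of the training data
fills = ImputerSketch.fit(train, ["c2", "c3"], mean)

# A new row (made up for this sketch) filled with the stored statistics
new_rows = [%{"c1" => "v4", "c2" => nil, "c3" => nil}]
imputed = ImputerSketch.transform(new_rows, fills)
IO.inspect(imputed)
```

Keeping the fill values in a plain map means they can be reused on data that arrives later, which is the main thing the one-pass version gives up.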

Instead of the mean, you can switch to other statistics such as the maximum (Statistics.max), minimum (Statistics.min), mode (Statistics.mode), standard deviation (Statistics.stdev), or median (Statistics.median).

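Mix.install([{:statistics, "~> 0.6"}])

maplist = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3},
  %{"c1" => "v2", "c2" => nil, "c3" => 8},
  %{"c1" => "v3", "c2" => 1, "c3" => nil}
]

# Columns to impute
keys = ["c2", "c3"]

# Statistic used as the fill value; swap in another Statistics function as needed
stat = &Statistics.max/1

keys
|> Enum.reduce(maplist, fn key, acc ->
  # Fill value: the chosen statistic over the column's non-nil values
  fill =
    acc
    |> Enum.map(& &1[key])
    |> Enum.reject(&is_nil/1)
    |> stat.()

  # Replace nil in this column with the fill value
  Enum.map(acc, fn row ->
    Map.update!(row, key, fn v -> if is_nil(v), do: fill, else: v end)
  end)
end)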

Result: [
  %{"c1" => "v1", "c2" => 2, "c3" => 3}, 
  %{"c1" => "v2", "c2" => 2, "c3" => 8}, 
  %{"c1" => "v3", "c2" => 1, "c3" => 8}
]
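
Since the statistic is just a function value, one possible extension is to choose a different statistic per column. The sketch below assumes a hypothetical module MapListImpute with an impute/2 function that takes a map of column key to function. It uses a hand-written standard-library median and Enum.max/1 in place of the Statistics functions so that it runs without dependencies, and it adds a fourth row "v4" that is not in the original data.

```elixir
defmodule MapListImpute do
  # Hypothetical helper for illustration; not part of any library.
  # strategies maps each column key to a function from a list of values to a fill value.
  def impute(maplist, strategies) do
    Enum.reduce(strategies, maplist, fn {key, stat}, acc ->
      fill =
        acc
        |> Enum.map(& &1[key])
        |> Enum.reject(&is_nil/1)
        |> stat.()

      Enum.map(acc, fn row ->
        Map.update!(row, key, fn v -> if is_nil(v), do: fill, else: v end)
      end)
    end)
  end
end

# Standard-library median: middle value, or the mean of the two middle values
median = fn values ->
  sorted = Enum.sort(values)
  n = length(sorted)
  mid = div(n, 2)

  if rem(n, 2) == 1 do
    Enum.at(sorted, mid)
  else
    (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
  end
end

maplist = [
  %{"c1" => "v1", "c2" => 2, "c3" => 3},
  %{"c1" => "v2", "c2" => nil, "c3" => 8},
  %{"c1" => "v3", "c2" => 1, "c3" => nil},
  %{"c1" => "v4", "c2" => 9, "c3" => 4}
]

# Median for "c2", maximum for "c3"
result = MapListImpute.impute(maplist, %{"c2" => median, "c3" => &Enum.max/1})
IO.inspect(result)
```

Here the missing "c2" is filled with 2 (the median of 1, 2, 9) and the missing "c3" with 8 (the maximum of 3, 8, 4). As in the original code, Map.update!/3 raises if a row lacks the key entirely.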

P.S. If you found this column interesting or useful, your support would be much appreciated :bow:
