【Ruby】Clustering with Rumale - DBSCAN -

Posted at 2019-04-26 (last updated at 2019-04-26)

Rumale, a machine learning library for Ruby, also ships the DBSCAN clustering algorithm, so I gave it a try.
https://github.com/yoshoku/rumale

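The naming isn't great, but BEFORE-* is the original data and AFTER-* is the result after clustering (these are the PNG files written by the script below).
Looking at AFTER-1, the points appear to be split into three groups, but overall the clustering seems to have worked reasonably well.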
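Rumale's DBSCAN, like scikit-learn's, labels samples that belong to no cluster as noise with `-1`. Because `split_sample` in the script groups samples by every distinct label, noise shows up as an extra group, which is a likely explanation for the three groups in AFTER-1. The plain-Ruby sketch below uses a made-up label array (not actual Rumale output) to show the difference between counting groups and counting clusters.

```ruby
# Hypothetical labels for illustration only: clusters 0 and 1, noise marked -1.
labels = [0, 0, 1, 1, -1, 0, 1, -1]

# Group sample indices by label, the same idea as split_sample's eq(i).where.
groups = labels.each_index.group_by { |i| labels[i] }

n_groups = groups.size                         # every distinct label, noise included
n_clusters = groups.keys.count { |l| l != -1 } # real clusters only
noise_ids = groups.fetch(-1, [])               # indices of noise samples

puts "groups: #{n_groups}, clusters: #{n_clusters}, noise points: #{noise_ids.size}"
```

With these labels it reports 3 groups but only 2 clusters, because the two noise points form their own group.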

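```ruby
require 'numo/narray'
require 'numo/gnuplot'
require 'rumale'

DFloat = Numo::DFloat
NMath = Numo::NMath

def main
  # Fix the random seed so the toy data (and therefore the plots) are reproducible.
  Numo::NArray.srand(1)

  all_sample, s1, s2 = create_toydata_1
  run(all_sample, s1, s2, "1")

  all_sample, s1, s2 = create_toydata_2
  run(all_sample, s1, s2, "2")
end

# Plot the raw data, cluster it with DBSCAN, then plot each resulting label as its own group.
def run(sample, s1, s2, name)
  plot [s1, s2], "BEFORE-#{name}"

  # eps: neighborhood radius, min_samples: minimum neighborhood size for a core point
  analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)
  result = analyzer.fit_predict(sample) # one label per sample; -1 marks noise

  samples = split_sample(sample, result)
  # Counts every distinct label, so noise (-1) is included as a group if present.
  puts "Number of groups: #{samples.size}"
  plot samples, "AFTER-#{name}"
end

# Draw each (n x 2) array in samples as a separate series and save it as "<name>.png".
def plot(samples, name)
  Numo.gnuplot do
    reset
    set term: "pngcairo size 400,400"
    set output: "#{name}.png"
    set title: name
    set key: "right bottom"

    data_for_gnuplot = samples.map.with_index do |s, i|
      x = s[true, 0]
      y = s[true, 1]
      [x, y, pt: 6, lw: 2, t: i.to_s]
    end

    plot(*data_for_gnuplot) # parentheses avoid Ruby's ambiguous-splat warning
  end
end

# Stack coordinate vectors into two (n x 2) point sets plus their concatenation.
def create_sample(x1, y1, x2, y2)
  s1 = DFloat.vstack([x1, y1]).transpose
  s2 = DFloat.vstack([x2, y2]).transpose
  sample = s1.concatenate s2
  [sample, s1, s2]
end

# Split the samples into one array per distinct label returned by DBSCAN.
def split_sample(sample, result)
  clusters = result.to_a.uniq
  clusters.map do |i|
    sample[result.eq(i).where, true]
  end
end

# Two Gaussian blobs (sigma 0.5) centered at (-1, -1) and (1, 1), 100 points each.
def create_toydata_1
  x1 = DFloat.new(100).rand_norm(-1, 0.5)
  y1 = DFloat.new(100).rand_norm(-1, 0.5)
  x2 = DFloat.new(100).rand_norm(1, 0.5)
  y2 = DFloat.new(100).rand_norm(1, 0.5)
  create_sample(x1, y1, x2, y2)
end

# Two interleaving sine-shaped arcs with Gaussian noise, 100 points each.
def create_toydata_2
  x1 = DFloat.new(100).rand(0, Math::PI)
  y1 = NMath.sin(x1) + DFloat.new(100).rand_norm(0, 0.1)
  x1 -= Math::PI / 4.0
  x2 = DFloat.new(100).rand(-Math::PI, 0)
  y2 = NMath.sin(x2) + DFloat.new(100).rand_norm(0, 0.1)
  x2 += Math::PI / 4.0
  create_sample(x1, y1, x2, y2)
end

main
```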
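For reference, what `eps` and `min_samples` control can be sketched in plain Ruby without any gems. This is a minimal textbook DBSCAN, not Rumale's actual implementation; the method names (`dbscan`, `region_query`, `euclidean`) are hypothetical, and the sketch assumes the same label convention (`-1` for noise, `0, 1, ...` for clusters). It compares every pair of points, so it is only meant for small toy data.

```ruby
# Hypothetical helper: Euclidean distance between two points given as arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |u, v| (u - v)**2 })
end

# Indices of all points within eps of points[i], including i itself.
def region_query(points, i, eps)
  points.each_index.select { |j| euclidean(points[i], points[j]) <= eps }
end

# Minimal DBSCAN sketch: returns one label per point, -1 for noise.
def dbscan(points, eps:, min_samples:)
  labels = Array.new(points.size) # nil means "not visited yet"
  cluster_id = 0

  points.each_index do |i|
    next unless labels[i].nil?

    neighbors = region_query(points, i, eps)
    if neighbors.size < min_samples
      labels[i] = -1 # noise for now; may later become a border point
      next
    end

    # i is a core point: start a new cluster and expand it breadth-first.
    labels[i] = cluster_id
    queue = neighbors - [i]
    until queue.empty?
      j = queue.shift
      labels[j] = cluster_id if labels[j] == -1 # noise reached from a core point becomes a border point
      next unless labels[j].nil?

      labels[j] = cluster_id
      j_neighbors = region_query(points, j, eps)
      queue.concat(j_neighbors) if j_neighbors.size >= min_samples # j is also core: keep expanding
    end
    cluster_id += 1
  end

  labels
end

# Usage: two tight squares of points plus one far-away outlier.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
          [10.0, 0.0]]
labels = dbscan(points, eps: 0.5, min_samples: 3)
p labels # => [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A point counts as core when at least `min_samples` points (itself included) lie within `eps`; clusters grow outward from core points, a border point joins the first cluster that reaches it, and anything never reached stays `-1`.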

As tends to happen, the code that builds the sample data ends up longer than the code that actually calls Rumale.
