【Ruby】Clustering with Rumale - DBSCAN -

Posted at 2019-04-26

Rumale, a machine learning library for Ruby, also includes the DBSCAN clustering algorithm, so I gave it a try.
https://github.com/yoshoku/rumale

[Figure: BEFORE-1.png / AFTER-1.png]
[Figure: BEFORE-2.png / AFTER-2.png]

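The naming isn't great, but BEFORE is the original data and AFTER is the result after clustering.
In AFTER-1 the points seem to be split into three groups, but overall the clustering looks like it works reasonably well.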
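One plausible reading of the three groups in AFTER-1 is two clusters plus a group of noise points: Rumale's DBSCAN marks points that belong to no cluster with a negative label (-1), and `split_sample` in the code below groups rows by every distinct value in `result`, noise included. The following plain-Ruby sketch shows how clusters and noise could be counted separately. `cluster_summary` is a hypothetical helper and the label array is made up, not output from the article's data; with Rumale you would pass it `result.to_a`.

```ruby
# Summarize DBSCAN-style labels: non-negative values are cluster ids,
# negative values (-1) mark noise. cluster_summary is a hypothetical helper.
def cluster_summary(labels)
  groups = labels.uniq # the distinct labels, noise included (like split_sample)
  {
    groups: groups.size,
    clusters: groups.count { |l| l >= 0 },
    noise_points: labels.count { |l| l < 0 }
  }
end

# Hypothetical labels for 8 points, not taken from the article's data.
labels = [0, 0, 1, 1, -1, 0, 1, -1]
summary = cluster_summary(labels)
puts "groups: #{summary[:groups]}"             # prints "groups: 3"
puts "clusters: #{summary[:clusters]}"         # prints "clusters: 2"
puts "noise points: #{summary[:noise_points]}" # prints "noise points: 2"
```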

require 'numo/narray'
require 'numo/gnuplot'
require 'rumale'

DFloat = Numo::DFloat
NMath = Numo::NMath

def main
  all_sample, s1, s2 = create_toydata_1
  run(all_sample, s1, s2, "1")

  all_sample, s1, s2 = create_toydata_2
  run(all_sample, s1, s2, "2")
end

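def run(sample, s1, s2, name)
  # Plot the two generated groups before clustering.
  plot [s1, s2], "BEFORE-#{name}"

  # eps: neighborhood radius; min_samples: neighbor count needed for a core point.
  analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)
  result = analyzer.fit_predict(sample) # one cluster label per row

  # Group the rows by label; a noise group (negative label) is counted too if present.
  samples = split_sample(sample, result)
  p number_of_clusters: samples.size
  plot samples, "AFTER-#{name}"
end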

# Write a scatter plot to "<name>.png", one series per group of points.
def plot(samples, name)
  Numo.gnuplot do
    reset
    set term: "pngcairo size 400,400"
    set output: "#{name}.png"
    set title: name
    set key: "right bottom"

    data_for_gnuplot = samples.map.with_index do |s, i|
      x = s[true, 0]
      y = s[true, 1]
      [x, y, pt: 6, lw: 2, t: i.to_s]
    end

    plot(*data_for_gnuplot)
  end
end

# Turn each pair of coordinate vectors into an (n, 2) matrix and stack both groups.
def create_sample(x1, y1, x2, y2)
  s1 = DFloat.vstack([x1, y1]).transpose
  s2 = DFloat.vstack([x2, y2]).transpose
  sample = s1.concatenate s2
  [sample, s1, s2]
end

# Split the rows of sample into one matrix per distinct label in result.
def split_sample(sample, result)
  clusters = result.to_a.uniq
  clusters.map do |i|
    sample[result.eq(i).where, true]
  end
end

# Toy data 1: two Gaussian blobs (std 0.5) centered at (-1, -1) and (1, 1).
def create_toydata_1
  x1 = DFloat.new(100).rand_norm(-1, 0.5)
  y1 = DFloat.new(100).rand_norm(-1, 0.5)
  x2 = DFloat.new(100).rand_norm(1, 0.5)
  y2 = DFloat.new(100).rand_norm(1, 0.5)
  create_sample(x1, y1, x2, y2)
end

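# Toy data 2: two noisy sine arcs (upper and lower), shifted by pi/4 toward each other so they interlock.
def create_toydata_2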
  x1 = DFloat.new(100).rand(0, Math::PI)
  y1 = NMath.sin(x1) + DFloat.new(100).rand_norm(0, 0.1)
  x1 -= Math::PI / 4.0
  x2 = DFloat.new(100).rand(-Math::PI, 0)
  y2 = NMath.sin(x2) + DFloat.new(100).rand_norm(0, 0.1)
  x2 += Math::PI / 4.0
  create_sample(x1, y1, x2, y2)
end

main

The code that creates the sample data tends to be longer than the part that calls Rumale.
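For reference, here is a minimal plain-Ruby sketch of the DBSCAN algorithm itself, to show what `eps` and `min_samples` control in the `Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)` call. It is not Rumale's implementation. It assumes brute-force Euclidean neighbor search on `[x, y]` arrays, counts a point itself toward `min_samples` (as scikit-learn does; check Rumale's documentation for its exact convention), and labels noise -1. The names `dbscan`, `region_query` and `NOISE` are hypothetical.

```ruby
# Minimal DBSCAN sketch in plain Ruby (not Rumale's implementation).
# Points are [x, y] arrays; the result has one label per point, -1 = noise.
NOISE = -1

# Indices of all points within eps of points[i] (Euclidean), including i itself.
def region_query(points, i, eps)
  px, py = points[i]
  points.each_index.select do |j|
    qx, qy = points[j]
    Math.hypot(px - qx, py - qy) <= eps
  end
end

def dbscan(points, eps:, min_samples:)
  labels = Array.new(points.size) # nil = not visited yet
  cluster_id = 0

  points.each_index do |i|
    next unless labels[i].nil?

    neighbors = region_query(points, i, eps)
    if neighbors.size < min_samples
      labels[i] = NOISE # may be absorbed later as a border point
      next
    end

    # i is a core point: start a new cluster and expand it breadth-first.
    labels[i] = cluster_id
    queue = neighbors.dup
    until queue.empty?
      j = queue.shift
      labels[j] = cluster_id if labels[j] == NOISE # noise becomes a border point
      next unless labels[j].nil? # already assigned (or just claimed as border)

      labels[j] = cluster_id
      j_neighbors = region_query(points, j, eps)
      # Only core points extend the cluster further.
      queue.concat(j_neighbors) if j_neighbors.size >= min_samples
    end
    cluster_id += 1
  end

  labels
end

# Two tight groups and one isolated point (made-up data).
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
          [10.0, 0.0]]
p dbscan(points, eps: 0.5, min_samples: 3) # prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search costs O(n²) distance computations, which is fine for a couple of hundred points like the toy data here but would be slow on large datasets.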
