LoginSignup
4
4

More than 5 years have passed since last update.

DBSCANアルゴリズム(データクラスタリング)

Last updated at Posted at 2015-04-23

こんにちは。
DBSCANアルゴリズムとは1、データクラスタリングの一種で、近傍探索()と素集合データ構造法との組み合わせに基づいています。

自分でも、scikit-learnのソース2を読み、ほぼ引き写しで DBSCANアルゴリズム(depth-first型とbreadth-first型の二種)を書いてみました。

念のためscikit-learnによる結果と一致するかを検算しました。

$ ./dbscan.py
True : depth-first
True : breadth-first
dbscan.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function
import numpy as np
import sklearn.cluster
from scipy import spatial

def dbscandepth(neigh, minp):
    ilabel = 0
    labels = [-1 for _ in neigh]  # initialized as noise
    for i in range(len(neigh)):
        if labels[i] >= 0 or len(neigh[i]) < minp:  # visited or noise
            continue
        candid = [i]
        while len(candid) > 0:
            i = candid.pop()
            if len(neigh[i]) >= minp:  # core only
                for j in neigh[i]:
                    if labels[j] < 0:
                        candid.append(j)
                        labels[j] = ilabel
        ilabel += 1
    return labels

def dbscanbreadth(neigh, minp):
    def expandcandid(candid):
        candid = filter(lambda j: len(neigh[j]) >= minp, candid)  # core only
        if candid != []:
            candid = [neigh[j] for j in candid]
            candid = list(set.union(*map(set, candid)))
            candid = filter(lambda j: labels[j] < 0, candid)
        return candid
    ilabel = 0
    labels = [-1 for _ in neigh]  # initialized as noise
    for i, candid in enumerate(neigh):
        if labels[i] >= 0 or len(candid) < minp:  # visited or noise
            continue
        while len(candid) > 0:
            candid = expandcandid(candid)
            for j in candid:
                labels[j] = ilabel
        ilabel += 1
    return labels

def main():
    minp = 4
    radius = 0.1
    points = np.random.uniform(size=(100,2))
    labels_sklearn = list(sklearn.cluster.DBSCAN(radius, minp).fit(points).labels_  # scikit-learn

    tree = spatial.cKDTree(points)
    neigh = [tree.query_ball_point(p, radius) for p in points]  # 近傍探索

    labels = dbscandepth(neigh, minp)
    print(labels == labels_sklearn, ": depth first")
    labels = dbscanbreadth(neigh, minp)
    print(labels == labels_sklearn, ": breadth first")
    return 0

if __name__ == '__main__':
    main()


  1. ここにも記事があります:"DBSCAN Blues

  2. ソースは _dbscan_inner.pyxdbscan_.py 

4
4
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
4
4