More than 3 years have passed since last update.

DBSCANのJavaScriptによる実装

Last updated at 2020-11-08Posted at 2020-11-08

はじめに

「色々な機械学習処理をブラウザ上で試せるサイトを作った」中で実装したモデルの解説の五回目です。

今回はDBSCANの実装についてです。

デモはこちらから。（TaskをClusteringにして、ModelのDBSCANを選択）
実際のコードはdbscan.jsにあります。

なお、可視化部分については一切触れません。

概説

Wikipediaの擬似コードをそのままコードに起こしただけとなります。
なので、解説することがありません。

コード中のgetNeighborsが、上記の擬似コードのregionQueryとなります。

アルゴリズム

日本語で書いただけになります。

クラスタ番号$C$を$0$に初期化する
各データ間の距離を計算する
各データ$P$に対して
1. $P$にすでに訪問していた場合は、次のデータの処理に移る
2. $P$を訪問したものとする
3. $P$の近傍データ$P_{near}$を取得する
4. $P_{near}$のデータ数が一定値に満たない場合、$P$はノイズとし、次のデータの処理に移る
5. $P$のクラスタを$C$とする
6. $P_{near}$の各データ$P'$に対して
  1. $P'$に訪問していない場合
    1. $P'$を訪問したものとする
    2. $P'$の近傍データ$P_{near}'$を取得する
    3. $P_{near}'$のデータ数が一定値以上の場合は、$P_{near}$に$P_{near}'$を追加する
  2. $P'$のクラスタが決定していない場合は、$P'$のクラスタを$C$とする
7. $C$に$1$を加える

コード

ノイズはカテゴリ-1に対応します。

class DBSCAN {
	// https://ja.wikipedia.org/wiki/DBSCAN
	constructor(eps = 0.5, minPts = 5, metric = 'euclid') {
		this._eps = eps;
		this._minPts = minPts;

		this._metric = metric
		switch (this._metric) {
		case 'euclid':
			this._d = (a, b) => Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
			break
		case 'manhattan':
			this._d = (a, b) => a.reduce((s, v, i) => s + Math.abs(v - b[i]), 0)
			break
		case 'chebyshev':
			this._d = (a, b) => Math.max(...a.map((v, i) => Math.abs(v - b[i])))
			break;
		}
	}

	predict(datas) {
		let c = 0;
		const n = datas.length;
		const visited = Array(n).fill(false);
		const cluster = Array(n);
		const d = Array(n);
		for (let i = 0; i < n; d[i++] = Array(n));
		for (let i = 0; i < n; i++) {
			for (let j = 0; j < i; j++) {
				const v = this._d(datas[i], datas[j]);
				d[i][j] = d[j][i] = v;
			}
		}
		const getNeighbors = (i) => {
			const neighbors = [];
			for (let k = 0; k < n; k++) {
				if (d[i][k] < this._eps) neighbors.push(k);
			}
			return neighbors
		}
		for (let i = 0; i < n; i++) {
			if (visited[i]) continue;
			visited[i] = true;
			const neighbors = getNeighbors(i);
			if (neighbors.length < this._minPts) {
				cluster[i] = -1;
				continue;
			}
			const clst = c++;
			cluster[i] = clst;
			while(neighbors.length > 0) {
				const k = neighbors.pop();
				if (!visited[k]) {
					visited[k] = true
					const ns = getNeighbors(k);
					if (ns.length >= this._minPts) {
						neighbors.push(...ns);
					}
				}
				if (!cluster[k]) cluster[k] = clst;
			}
		}
		return cluster;
	}
}

考慮点

この記事を書いているときに、一か所うまく動かない可能性のある場所を見つけたので記載しておきます。

クラスタ番号clstが0から始まるため、最後の場所

				if (!cluster[k]) cluster[k] = clst;

が正常に動作しないと思われます。

修正方針としては、次のどちらか

if (!cluster[k])をif (cluster[k] === undefined)に変更する
clst = c++をclst = ++cに変更する

さいごに

配列について、Array(n)などとして初期容量を指定していますが、不要、あるいはむしろ効率が悪くなる、という考え方もあるようです。
特にこの程度のコードであれば、単に[]としたほうが分かりやすいのではないでしょうか。

また、fillも不要です。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up