More than 1 year has passed since last update.

PythonによるK-Center Greedyアルゴリズムの実装

Posted at 2024-04-13

大規模データセットから効率的に代表的なサンプルを選択する手法として、「K-Center Greedy」アルゴリズムがあります。このアルゴリズムは、クラスタリングや異常検出、データの要約に有効です。今回は、このアルゴリズムをPythonで実装し、その使用方法を紹介します。

1. 必要なライブラリのインポート

from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import pairwise_distances

sklearn.datasets: データ生成用のモジュール
numpy: 数値計算を効率的に行う
matplotlib.pyplot: データの視覚化
sklearn.metrics: 距離計算用の関数

KCenterGreedy クラスの定義

class KCenterGreedy:
    def __init__(self, features):
        self.features = features
        self.distances = pairwise_distances(features, features)

    def select_batch(self, initial_idx=None, N=10):
        if initial_idx is None:
            initial_idx = np.random.randint(len(self.features))
        selected = [initial_idx]
        min_distances = self.distances[initial_idx].copy()

        while len(selected) < N:
            next_idx = np.argmax(min_distances)
            selected.append(next_idx)
            min_distances = np.minimum(min_distances, self.distances[next_idx])

        return np.array(selected)

このクラスは、特徴量データを受け取り、そのペアワイズ距離を計算します。select_batchメソッドでは、貪欲法を用いてデータポイントを選択していきます。

3. データの生成とアルゴリズムの適用

X, _ = make_blobs(n_samples=3000, centers=100, n_features=30, random_state=42)
k_center_greedy = KCenterGreedy(X)
selected_indices = k_center_greedy.select_batch(N=300)
selected_samples = X[selected_indices]

make_blobs関数を使用して合成データを生成し、KCenterGreedyアルゴリズムを適用しています。

結果のプロット

figure = plt.figure()
ax = figure.add_subplot()
ax.scatter(X[:,0], X[:,1], c="blue", alpha=0.5, label="Original Data")
ax.scatter(selected_samples[:,0], selected_samples[:,1], c="red", edgecolors="k", label="Selected Samples")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.grid(True)
plt.show()

生成されたデータと選択されたサンプルをプロットして、アルゴリズムの結果を視覚的に確認できます。

まとめ

K-Center Greedyアルゴリズムは、大規模なデータセットから重要な情報を抽出するのに有用な手法です。このブログ記事で紹介した実装を参考にして、自分のデータ解析プロジェクトに活用してみてください。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up