
PCAUMAP: Easy analysis that chains the dimensionality reduction methods PCA and UMAP in series

Posted at 2021-03-24, last updated at 2021-03-26

Principal component analysis (PCA) is a well-known dimensionality reduction method, and UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) is one that has recently been attracting attention.

This time, I built PCAUMAP, a tool that chains the two in series so you can take a quick look at the structure of a dataset. The code below assumes that it is run on Google Colaboratory.

Installation

Install PCAUMAP from its GitHub repository. The code can be viewed at https://github.com/maskot1977/PCAUMAP.git.

!pip install git+https://github.com/maskot1977/PCAUMAP.git

Collecting git+https://github.com/maskot1977/PCAUMAP.git
  Cloning https://github.com/maskot1977/PCAUMAP.git to /tmp/pip-req-build-y2pfqm58
  Running command git clone -q https://github.com/maskot1977/PCAUMAP.git /tmp/pip-req-build-y2pfqm58
Building wheels for collected packages: pcaumap
  Building wheel for pcaumap (setup.py) ... done
  Created wheel for pcaumap: filename=pcaumap-0.1.0-cp37-none-any.whl size=4069 sha256=105a1af1e8b544dd1bcc83563ee0e333d9a1f48b873c6891075cac9ac45b215e
  Stored in directory: /tmp/pip-ephem-wheel-cache-539fnps9/wheels/38/75/23/32a1f509a49530de49787801ee6e723d02375f371a299eaf76
Successfully built pcaumap
Installing collected packages: pcaumap
Successfully installed pcaumap-0.1.0

Install UMAP. Note that the package name is "umap-learn", not "umap".

!pip install umap-learn

Requirement already satisfied: umap-learn in /usr/local/lib/python3.7/dist-packages (0.5.1)
Requirement already satisfied: scipy>=1.0 in /usr/local/lib/python3.7/dist-packages (from umap-learn) (1.4.1)
Requirement already satisfied: scikit-learn>=0.22 in /usr/local/lib/python3.7/dist-packages (from umap-learn) (0.22.2.post1)
Requirement already satisfied: pynndescent>=0.5 in /usr/local/lib/python3.7/dist-packages (from umap-learn) (0.5.2)
Requirement already satisfied: numba>=0.49 in /usr/local/lib/python3.7/dist-packages (from umap-learn) (0.51.2)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from umap-learn) (1.19.5)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.22->umap-learn) (1.0.1)
Requirement already satisfied: llvmlite>=0.30 in /usr/local/lib/python3.7/dist-packages (from pynndescent>=0.5->umap-learn) (0.34.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from numba>=0.49->umap-learn) (54.1.2)
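
The package is installed as "umap-learn" but imported as `umap`, which is easy to get wrong. The following is a minimal sketch for checking that the importable `umap` module really is umap-learn. The helper name `umap_learn_is_importable` is hypothetical and only used here for illustration; the check relies on the fact that umap-learn exposes a `UMAP` class.

```python
import importlib.util


def umap_learn_is_importable():
    """Return True if `import umap` works and the module provides a UMAP class."""
    if importlib.util.find_spec("umap") is None:
        return False  # nothing named `umap` is installed at all
    try:
        import umap
    except Exception:
        return False  # installed but broken, e.g. an incompatible dependency
    # umap-learn defines umap.UMAP; if another package named `umap` was
    # installed instead, this attribute is typically missing.
    return hasattr(umap, "UMAP")


print(umap_learn_is_importable())
```

If this prints `False` on Colab, rerunning `!pip install umap-learn` and restarting the runtime is usually the first thing to try.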

Basic usage (classification data)

The following code runs PCA and displays these basic views:

Principal component (score) plot
Loading plot
Cumulative contribution ratio

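from pcaumap import PCAUmap
import sklearn.datasets

# Breast cancer dataset: a binary classification problem
dataset = sklearn.datasets.load_breast_cancer()

pcau = PCAUmap()
# Fit on the feature matrix (PCA followed by UMAP)
pcau.fit(dataset.data)
# Show the PCA views, colored by class label
pcau.pca_summary(c=dataset.target)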
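
The quantities behind the three views listed above can also be computed directly with scikit-learn, which may help when reading the plots. This is a rough sketch and not PCAUMAP's own implementation: standardizing the features first is an assumption of this example, and `pca.components_` is used here as the loading directions (some texts additionally scale them by the square root of each eigenvalue).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

dataset = load_breast_cancer()

# Assumption of this sketch: standardize features before PCA.
# PCAUMAP may preprocess the data differently.
X_scaled = StandardScaler().fit_transform(dataset.data)

pca = PCA()  # keep all components
scores = pca.fit_transform(X_scaled)  # coordinates for the score plot
loadings = pca.components_.T          # rows: original features, columns: PCs
cumulative_ratio = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 90%
n_for_90 = int(np.searchsorted(cumulative_ratio, 0.90) + 1)
print(scores.shape, loadings.shape, n_for_90)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a score plot, the first two columns of `loadings` give a loading plot, and `cumulative_ratio` is the curve of the cumulative contribution ratio.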

Next, this code plots the result of applying UMAP to the PCA output.

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
# 2-D embedding held by the fitted PCAUmap, colored by class label
plt.scatter(pcau.embedding[:, 0], pcau.embedding[:, 1], alpha=0.5, c=dataset.target)
plt.grid()
plt.show()
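
For readers who want to see what "PCA followed by UMAP" amounts to without the wrapper, here is a minimal sketch built from scikit-learn and umap-learn. It is an illustration under stated assumptions, not PCAUMAP's actual code: the function name `pca_then_umap`, the standardization step, and the default of 10 principal components are all choices made for this example. The second stage is passed in as a parameter so that any object with `fit_transform` can be substituted.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_then_umap(X, n_pca_components=10, embedder=None):
    """Standardize X, reduce it with PCA, then embed the PCA scores.

    `embedder` is any object with fit_transform; it defaults to a 2-D UMAP.
    """
    if embedder is None:
        # Imported lazily: the module is `umap`, the package is umap-learn.
        import umap
        embedder = umap.UMAP(n_components=2)
    X_scaled = StandardScaler().fit_transform(X)
    # PCA cannot keep more components than min(n_samples, n_features).
    n_keep = min(n_pca_components, X_scaled.shape[0], X_scaled.shape[1])
    scores = PCA(n_components=n_keep).fit_transform(X_scaled)
    return embedder.fit_transform(scores)
```

Calling `pca_then_umap(dataset.data)` returns an array with one 2-D point per sample, which can be passed to `plt.scatter` in the same way as `pcau.embedding`.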

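As in the following code, you can apply a scikit-learn classification model and view the landscape of its classification results. This example uses a random forest with no parameter tuning at all, but you can use any model, tuned as heavily as you need.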

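from sklearn.ensemble import RandomForestClassifier

# Random forest with default hyperparameters, trained on the original features
model = RandomForestClassifier()
model.fit(dataset.data, dataset.target)

# Visualize the landscape of the model's predictions
pcau.map_predicted_values(model)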
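
As an example of using a tuned model instead of the default one, the sketch below runs a small grid search with scikit-learn's `GridSearchCV`. The grid values are arbitrary and chosen only to keep the example fast. The final call to `map_predicted_values` is left as a comment because it needs the fitted `pcau` object from the code above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

dataset = load_breast_cancer()

# Small, illustrative grid; these values are arbitrary choices for this sketch.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation
)
search.fit(dataset.data, dataset.target)

# best_estimator_ is refit on the full data because refit=True by default.
tuned_model = search.best_estimator_
print(search.best_params_)

# With a fitted PCAUmap instance `pcau` as in the article:
# pcau.map_predicted_values(tuned_model)
```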

Basic usage (regression data)

With regression data and a scikit-learn regression model, nearly identical code lets you view the landscape of the regression results, as shown below.

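from pcaumap import PCAUmap
import sklearn.datasets

# Diabetes dataset: a regression problem with a continuous target
dataset = sklearn.datasets.load_diabetes()

pcau = PCAUmap()
pcau.fit(dataset.data)
# Show the PCA views, colored by the target value
pcau.pca_summary(c=dataset.target)

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
# 2-D embedding held by the fitted PCAUmap, colored by the target value
plt.scatter(pcau.embedding[:, 0], pcau.embedding[:, 1], alpha=0.5, c=dataset.target)
plt.grid()
plt.show()

from sklearn.ensemble import RandomForestRegressor

# Random forest regressor with default hyperparameters
model = RandomForestRegressor()
model.fit(dataset.data, dataset.target)

# Visualize the landscape of the model's predictions
pcau.map_predicted_values(model)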
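
To give a feel for what a "landscape" of predicted values can mean, here is a simplified sketch that evaluates a model on a grid laid over a 2-D PCA plane. It is one plausible construction and not necessarily how `map_predicted_values` works internally; in particular, it skips the UMAP step and relies on `PCA.inverse_transform` to map grid points back to the original feature space.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

dataset = load_diabetes()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(dataset.data, dataset.target)

# Project to a 2-D PCA plane (no UMAP step in this simplified sketch).
pca = PCA(n_components=2)
scores = pca.fit_transform(dataset.data)

# Regular grid covering the observed range of the two principal components.
xs = np.linspace(scores[:, 0].min(), scores[:, 0].max(), 40)
ys = np.linspace(scores[:, 1].min(), scores[:, 1].max(), 40)
grid_x, grid_y = np.meshgrid(xs, ys)
grid_points = np.column_stack([grid_x.ravel(), grid_y.ravel()])

# Map each grid point back to the original feature space and predict there.
grid_features = pca.inverse_transform(grid_points)
landscape = model.predict(grid_features).reshape(grid_x.shape)
print(landscape.shape)
```

The result can be drawn with `plt.contourf(grid_x, grid_y, landscape)` and overlaid with `plt.scatter(scores[:, 0], scores[:, 1], c=dataset.target)` to compare predictions with the observed targets.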

Detailed settings

That concludes this brief introduction to PCAUMAP. Somewhat finer-grained settings are also available; for those, please refer to the code in the GitHub repository at https://github.com/maskot1977/PCAUMAP.git.
