
Unsupervised, Supervised, and Semi-Supervised Dimensionality Reduction with Ivis

Posted at 2022-11-22

I previously wrote an article on supervised hyperparameter tuning for UMAP, t-SNE, and Isomap. There is a library called Ivis that can do unsupervised, supervised, and semi-supervised dimensionality reduction, so I tried it out.

Installing Ivis

!pip install git+https://github.com/beringresearch/ivis.git
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/beringresearch/ivis.git
  Cloning https://github.com/beringresearch/ivis.git to /tmp/pip-req-build-fmiazz46
  Running command git clone -q https://github.com/beringresearch/ivis.git /tmp/pip-req-build-fmiazz46
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from ivis==2.0.8) (1.21.6)
Requirement already satisfied: scikit-learn>0.20.0 in /usr/local/lib/python3.7/dist-packages (from ivis==2.0.8) (1.0.2)
Collecting annoy>=1.15.2
  Downloading annoy-1.17.1.tar.gz (647 kB)
     |████████████████████████████████| 647 kB 5.1 MB/s
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from ivis==2.0.8) (4.64.1)
Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from ivis==2.0.8) (0.3.6)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>0.20.0->ivis==2.0.8) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>0.20.0->ivis==2.0.8) (3.1.0)
Requirement already satisfied: scipy>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>0.20.0->ivis==2.0.8) (1.7.3)
Building wheels for collected packages: ivis, annoy
  Building wheel for ivis (setup.py) ... done
  Created wheel for ivis: filename=ivis-2.0.8-py3-none-any.whl size=35462 sha256=2254db91d6b5d1e2b898313c211ff5fc15bc0b165301dbcd03e2cbda32254878
  Stored in directory: /tmp/pip-ephem-wheel-cache-xjhcw8qi/wheels/d1/d7/92/46dbb25fa631e6ed1b333d872b9898f8d2df3f5437452fb834
  Building wheel for annoy (setup.py) ... done
  Created wheel for annoy: filename=annoy-1.17.1-cp37-cp37m-linux_x86_64.whl size=395185 sha256=a0bf153dd01cd1bcb4a5062d921af7467759c698f8d5e15857512ec38fc2d6e8
  Stored in directory: /root/.cache/pip/wheels/81/94/bf/92cb0e4fef8770fe9c6df0ba588fca30ab7c306b6048ae8a54
Successfully built ivis annoy
Installing collected packages: annoy, ivis
Successfully installed annoy-1.17.1 ivis-2.0.8

Unsupervised on the diabetes data

As a test dataset, let's use the diabetes data available from scikit-learn, without supervision. First, training.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_diabetes()
mapper = Ivis()
mapper.fit(dataset.data)
Building KNN index


100%|██████████| 442/442 [00:00<00:00, 135181.74it/s]


Extracting KNN neighbours


100%|██████████| 442/442 [00:00<00:00, 869.80it/s] 


Training neural network
Epoch 1/1000
4/4 [==============================] - 2s 21ms/step - loss: 1.2240
Epoch 2/1000
4/4 [==============================] - 0s 14ms/step - loss: 1.1184
Epoch 3/1000
4/4 [==============================] - 0s 12ms/step - loss: 1.0714
Epoch 4/1000
4/4 [==============================] - 0s 10ms/step - loss: 1.1124
Epoch 5/1000
4/4 [==============================] - 0s 9ms/step - loss: 1.0280
(snip)
Epoch 168/1000
4/4 [==============================] - 0s 11ms/step - loss: 0.5203
Epoch 169/1000
4/4 [==============================] - 0s 10ms/step - loss: 0.5536

Once training is done, transform as follows.

embedding = mapper.transform(dataset.data)
4/4 [==============================] - 0s 7ms/step

Let's check the shape, just to be sure. The number of output dimensions defaults to 2, but of course you can set any other dimensionality you like.

embedding.shape
(442, 2)
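For example, to get a 3-dimensional embedding instead, the Ivis constructor takes an embedding_dims argument. A minimal sketch based on my reading of the Ivis API (the variable names are just for illustration):

from ivis import Ivis

# embedding_dims controls the dimensionality of the output space (default: 2)
mapper3d = Ivis(embedding_dims=3)
embedding3d = mapper3d.fit_transform(dataset.data)
embedding3d.shape  # expected: (442, 3)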

Let's plot the result of the dimensionality reduction.

plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()

[Figure: Ivis_for_Qiita_8_0.png]

Ivis does deep learning built on TensorFlow, and unless you fix the random seeds, the result differs from run to run. Here is the result of running the same computation again.

[Figure: Ivis_for_Qiita_9_7.png]
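If you need reproducible embeddings, one option is to seed the underlying libraries before fitting. This is only a sketch based on general TensorFlow practice; I haven't verified that it pins down every source of randomness in Ivis (the Annoy index build, for instance):

import random
import numpy as np
import tensorflow as tf
from ivis import Ivis

# Seed Python, NumPy, and TensorFlow before constructing and fitting the model
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

mapper = Ivis()
mapper.fit(dataset.data)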

Supervised (regression) on the diabetes data

This time, using the same diabetes data, we set the target variable and do supervised dimensionality reduction with a regression loss.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_diabetes()
mapper = Ivis(supervision_metric="mae")
mapper.fit(dataset.data, dataset.target)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 442/442 [00:00<00:00, 91131.22it/s]


Extracting KNN neighbours


100%|██████████| 442/442 [00:00<00:00, 1086.21it/s]


Training neural network
Epoch 1/1000
4/4 [==============================] - 2s 9ms/step - loss: 76.6442 - stacked_triplets_loss: 1.1391 - supervised_loss: 152.1492
Epoch 2/1000
4/4 [==============================] - 0s 10ms/step - loss: 76.5158 - stacked_triplets_loss: 1.0523 - supervised_loss: 151.9793
Epoch 3/1000
4/4 [==============================] - 0s 14ms/step - loss: 76.4499 - stacked_triplets_loss: 1.1045 - supervised_loss: 151.7953
Epoch 4/1000
4/4 [==============================] - 0s 10ms/step - loss: 76.3735 - stacked_triplets_loss: 1.0908 - supervised_loss: 151.6561
Epoch 5/1000
4/4 [==============================] - 0s 12ms/step - loss: 76.2924 - stacked_triplets_loss: 1.0760 - supervised_loss: 151.5088
(snip)
Epoch 149/1000
4/4 [==============================] - 0s 9ms/step - loss: 28.2583 - stacked_triplets_loss: 6.5393 - supervised_loss: 49.9772
Epoch 150/1000
4/4 [==============================] - 0s 12ms/step - loss: 28.7334 - stacked_triplets_loss: 6.4428 - supervised_loss: 51.0239
4/4 [==============================] - 0s 6ms/step

[Figure: Ivis_for_Qiita_10_5.png]

Here is the result of running the same computation again.

[Figure: Ivis_for_Qiita_11_7.png]

With supervision, the dimensionality reduction seems to reflect the magnitude of the target variable to some extent.
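Besides supervision_metric, the Ivis docs describe a supervision_weight parameter (a float between 0 and 1, default 0.5 if I read them correctly) that balances the triplet loss against the supervised loss. A sketch of weighting the supervision more heavily:

from ivis import Ivis

# supervision_weight: closer to 0 = mostly unsupervised, closer to 1 = supervision dominates
# (parameter name and default taken from the Ivis docs; treat as an assumption)
mapper = Ivis(supervision_metric="mae", supervision_weight=0.8)
embedding = mapper.fit_transform(dataset.data, dataset.target)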

Unsupervised on the breast cancer data

Next, let's use the breast cancer data, a classification dataset. First, dimensionality reduction without the target variable, i.e., unsupervised.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_breast_cancer()
mapper = Ivis()
mapper.fit(dataset.data)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 569/569 [00:00<00:00, 125173.55it/s]


Extracting KNN neighbours


100%|██████████| 569/569 [00:00<00:00, 1404.38it/s]


Training neural network
Epoch 1/1000
5/5 [==============================] - 1s 10ms/step - loss: 18.8974
Epoch 2/1000
5/5 [==============================] - 0s 10ms/step - loss: 11.1786
Epoch 3/1000
5/5 [==============================] - 0s 12ms/step - loss: 9.5566
Epoch 4/1000
5/5 [==============================] - 0s 12ms/step - loss: 8.4553
Epoch 5/1000
5/5 [==============================] - 0s 12ms/step - loss: 8.2069
(snip)
Epoch 82/1000
5/5 [==============================] - 0s 10ms/step - loss: 1.0038
Epoch 83/1000
5/5 [==============================] - 0s 10ms/step - loss: 0.8592
5/5 [==============================] - 0s 4ms/step

[Figure: Ivis_for_Qiita_13_5.png]

Here is the result of running the same computation again.

[Figure: Ivis_for_Qiita_14_5.png]

The two results look quite different (the result does not seem stable)...
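One possible culprit is feature scaling: the breast cancer features span very different ranges, and the examples in the Ivis docs generally scale the inputs first. A sketch of trying that (I haven't checked whether it actually stabilizes the result here):

from sklearn.preprocessing import MinMaxScaler

# Scale each feature to [0, 1] before the KNN index is built and the network is trained
X_scaled = MinMaxScaler().fit_transform(dataset.data)
mapper = Ivis()
embedding = mapper.fit_transform(X_scaled)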

Supervised (classification) on the breast cancer data

For the same data, this time let's give the classification target variable and do supervised dimensionality reduction.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_breast_cancer()
mapper = Ivis()
mapper.fit(dataset.data, dataset.target)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 569/569 [00:00<00:00, 53078.29it/s]


Extracting KNN neighbours


100%|██████████| 569/569 [00:00<00:00, 681.58it/s]

Training neural network
Epoch 1/1000
5/5 [==============================] - 2s 10ms/step - loss: 16.7840 - stacked_triplets_loss: 15.3962 - supervised_loss: 18.1718
Epoch 2/1000
5/5 [==============================] - 0s 10ms/step - loss: 11.9572 - stacked_triplets_loss: 13.6644 - supervised_loss: 10.2501
Epoch 3/1000
5/5 [==============================] - 0s 16ms/step - loss: 9.0844 - stacked_triplets_loss: 10.7357 - supervised_loss: 7.4332
Epoch 4/1000
5/5 [==============================] - 0s 11ms/step - loss: 8.2655 - stacked_triplets_loss: 8.6759 - supervised_loss: 7.8552
Epoch 5/1000
5/5 [==============================] - 0s 9ms/step - loss: 8.3221 - stacked_triplets_loss: 8.7451 - supervised_loss: 7.8990
(snip)
Epoch 210/1000
5/5 [==============================] - 0s 12ms/step - loss: 0.3639 - stacked_triplets_loss: 0.4976 - supervised_loss: 0.2302
Epoch 211/1000
5/5 [==============================] - 0s 11ms/step - loss: 0.3241 - stacked_triplets_loss: 0.4379 - supervised_loss: 0.2104
5/5 [==============================] - 0s 4ms/step

[Figure: Ivis_for_Qiita_15_7.png]

Here is the result of running the same computation again.

[Figure: Ivis_for_Qiita_16_5.png]

The dimensionality reduction seems to reflect the target classes somewhat (just a little?).
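Note that we did not set supervision_metric here; as far as I can tell from the Ivis docs, the default is a classification loss (sparse categorical cross-entropy), which is why integer class labels work directly. Being explicit would look like this (the metric name is taken from the docs; treat it as an assumption):

# Should be equivalent to the default behaviour for integer class labels
mapper = Ivis(supervision_metric="sparse_categorical_crossentropy")
mapper.fit(dataset.data, dataset.target)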

The wine data

We do the same unsupervised and supervised dimensionality reduction on the wine data as well. Unsupervised first.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_wine()
mapper = Ivis()
mapper.fit(dataset.data)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 178/178 [00:00<00:00, 180160.74it/s]


Extracting KNN neighbours


100%|██████████| 178/178 [00:00<00:00, 573.19it/s] 


Training neural network
Epoch 1/1000
2/2 [==============================] - 1s 14ms/step - loss: 19.4780
Epoch 2/1000
2/2 [==============================] - 0s 12ms/step - loss: 14.3163
Epoch 3/1000
2/2 [==============================] - 0s 13ms/step - loss: 11.7559
Epoch 4/1000
2/2 [==============================] - 0s 9ms/step - loss: 11.7646
Epoch 5/1000
2/2 [==============================] - 0s 10ms/step - loss: 16.4651
(snip)
Epoch 70/1000
2/2 [==============================] - 0s 17ms/step - loss: 1.4658
Epoch 71/1000
2/2 [==============================] - 0s 16ms/step - loss: 1.6900
2/2 [==============================] - 0s 6ms/step

[Figure: Ivis_for_Qiita_17_5.png]

Here is the result of running the same computation again.

[Figure: Ivis_for_Qiita_18_5.png]

Next, supervised dimensionality reduction, this time giving the target variable (for classification).

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_wine()
mapper = Ivis()
mapper.fit(dataset.data, dataset.target)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 178/178 [00:00<00:00, 167546.25it/s]


Extracting KNN neighbours


100%|██████████| 178/178 [00:00<00:00, 577.28it/s] 

Training neural network
Epoch 1/1000
2/2 [==============================] - 2s 16ms/step - loss: 56.4703 - stacked_triplets_loss: 19.4923 - supervised_loss: 93.4482
Epoch 2/1000
2/2 [==============================] - 0s 14ms/step - loss: 27.5077 - stacked_triplets_loss: 19.3239 - supervised_loss: 35.6914
Epoch 3/1000
2/2 [==============================] - 0s 17ms/step - loss: 30.7517 - stacked_triplets_loss: 11.3563 - supervised_loss: 50.1470
Epoch 4/1000
2/2 [==============================] - 0s 16ms/step - loss: 30.9872 - stacked_triplets_loss: 18.4747 - supervised_loss: 43.4996
Epoch 5/1000
2/2 [==============================] - 0s 15ms/step - loss: 19.9857 - stacked_triplets_loss: 13.8399 - supervised_loss: 26.1315
(snip)
Epoch 111/1000
2/2 [==============================] - 0s 20ms/step - loss: 1.0617 - stacked_triplets_loss: 0.9649 - supervised_loss: 1.1585
Epoch 112/1000
2/2 [==============================] - 0s 26ms/step - loss: 2.1483 - stacked_triplets_loss: 2.9829 - supervised_loss: 1.3138


WARNING:tensorflow:5 out of the last 15 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7fc88118b320> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.


2/2 [==============================] - 0s 9ms/step

Huh, some warning showed up... (ignoring it for now)

[Figure: Ivis_for_Qiita_19_9.png]

[Figure: Ivis_for_Qiita_20_7.png]

It doesn't look improved. Maybe I need to deal with the warning... (ignoring it for now)

The handwritten digits data

Finally, let's work with the handwritten digits dataset. First, unsupervised dimensionality reduction, without giving the target variable.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_digits()
mapper = Ivis()
mapper.fit(dataset.data)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 1797/1797 [00:00<00:00, 116449.04it/s]


Extracting KNN neighbours


100%|██████████| 1797/1797 [00:01<00:00, 1148.54it/s]

Training neural network
Epoch 1/1000
15/15 [==============================] - 1s 11ms/step - loss: 1.0162
Epoch 2/1000
15/15 [==============================] - 0s 12ms/step - loss: 0.8303
Epoch 3/1000
15/15 [==============================] - 0s 16ms/step - loss: 0.5445
Epoch 4/1000
15/15 [==============================] - 0s 12ms/step - loss: 0.5609
Epoch 5/1000
15/15 [==============================] - 0s 13ms/step - loss: 0.4943
(snip)
Epoch 132/1000
15/15 [==============================] - 0s 11ms/step - loss: 0.1893
Epoch 133/1000
15/15 [==============================] - 0s 11ms/step - loss: 0.1837
15/15 [==============================] - 0s 3ms/step

[Figure: Ivis_for_Qiita_21_7.png]

[Figure: Ivis_for_Qiita_22_7.png]

Next, supervised dimensionality reduction, giving the target variable.

from ivis import Ivis
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_digits()
mapper = Ivis()
mapper.fit(dataset.data, dataset.target)
embedding = mapper.transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()
Building KNN index


100%|██████████| 1797/1797 [00:00<00:00, 40648.71it/s]

Extracting KNN neighbours



100%|██████████| 1797/1797 [00:01<00:00, 1011.28it/s]


Training neural network
Epoch 1/1000
15/15 [==============================] - 2s 11ms/step - loss: 1.7909 - stacked_triplets_loss: 1.0847 - supervised_loss: 2.4971
Epoch 2/1000
15/15 [==============================] - 0s 11ms/step - loss: 1.3948 - stacked_triplets_loss: 0.8389 - supervised_loss: 1.9507
Epoch 3/1000
15/15 [==============================] - 0s 11ms/step - loss: 1.1755 - stacked_triplets_loss: 0.5271 - supervised_loss: 1.8240
Epoch 4/1000
15/15 [==============================] - 0s 11ms/step - loss: 1.1343 - stacked_triplets_loss: 0.5658 - supervised_loss: 1.7028
Epoch 5/1000
15/15 [==============================] - 0s 12ms/step - loss: 1.0922 - stacked_triplets_loss: 0.5646 - supervised_loss: 1.6198
(snip)
Epoch 87/1000
15/15 [==============================] - 0s 13ms/step - loss: 0.5124 - stacked_triplets_loss: 0.4945 - supervised_loss: 0.5302
Epoch 88/1000
15/15 [==============================] - 0s 12ms/step - loss: 0.4649 - stacked_triplets_loss: 0.4065 - supervised_loss: 0.5233
15/15 [==============================] - 0s 3ms/step

[Figure: Ivis_for_Qiita_23_5.png]

[Figure: Ivis_for_Qiita_24_5.png]

When the target variable is given, the structure seems to be captured quite well, but without it, the result seems inferior to UMAP and the like...
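For reference, a corresponding UMAP run on the same digits data would look like this (assuming the umap-learn package is installed; this comparison was not actually run here):

import umap
import sklearn.datasets
import matplotlib.pyplot as plt

dataset = sklearn.datasets.load_digits()
# umap-learn exposes the same sklearn-style fit_transform API as Ivis
embedding = umap.UMAP(random_state=42).fit_transform(dataset.data)
plt.scatter(embedding[:, 0], embedding[:, 1], c=dataset.target, alpha=0.5)
plt.colorbar()
plt.show()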

More details

This article only covered Ivis briefly; for details, see the official site at https://bering-ivis.readthedocs.io/en/latest/. Semi-supervised learning is also possible, and there are various other parameters, with detailed explanations of their effects.
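Since I didn't actually run the semi-supervised mode in this article, here is a sketch of how it appears to work from the docs: you pass the labels you have and mark unlabeled points with -1 (the -1 convention is from the Ivis docs; treat the details as an assumption):

import numpy as np
import sklearn.datasets
from ivis import Ivis

dataset = sklearn.datasets.load_digits()

# Pretend only the first 200 samples are labeled; mark the rest with -1
labels = np.full(len(dataset.target), -1)
labels[:200] = dataset.target[:200]

mapper = Ivis(supervision_metric="sparse_categorical_crossentropy")
embedding = mapper.fit_transform(dataset.data, labels)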

Impressions

The documentation says Ivis has strengths on high-dimensional sparse data, so it may be a good choice when you have to work with that kind of data. For ordinary data (?), though, I feel UMAP and the like are better (it may just be that I haven't mastered Ivis).
