
Computing cosine similarity quickly

Speeding up cosine similarity computation

With scikit-learn's cosine_similarity function, the cosine similarities for every pair of vectors can be computed in a single call.

Bad example

  • I was running a double loop and computing the similarities one pair at a time
vector_list1 = [[0.3423, 0.5123, 0.4232], [0.1412, 0.9634, 0.7292]]
vector_list2 = [[0.6461, 0.8734, 0.9854], [0.1412, 0.9425, 0.8392]]
for vector1 in vector_list1:
    for vector2 in vector_list2:
        # compute the similarity for this pair
        similarity = calc_cos_sim(vector1, vector2)
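
The helper calc_cos_sim is not shown in the post. A minimal sketch of what such a helper might look like, assuming NumPy is available, is given below. The function body, the zero-vector handling, and the name loop_similarities are assumptions made for illustration, not the author's code.

```python
import numpy as np


def calc_cos_sim(v1, v2):
    """Cosine similarity of two 1-D vectors (hypothetical stand-in for the helper above)."""
    a = np.asarray(v1, dtype=float)
    b = np.asarray(v2, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Assumption: similarity with a zero vector is reported as 0.0 instead of dividing by zero.
    return float(a @ b / denom) if denom else 0.0


vector_list1 = [[0.3423, 0.5123, 0.4232], [0.1412, 0.9634, 0.7292]]
vector_list2 = [[0.6461, 0.8734, 0.9854], [0.1412, 0.9425, 0.8392]]

# One Python-level call per pair: len(vector_list1) * len(vector_list2) calls in total.
loop_similarities = [
    [calc_cos_sim(v1, v2) for v2 in vector_list2]
    for v1 in vector_list1
]
```

The per-pair function call inside nested Python loops is what makes this approach slow as the lists grow.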

After the fix

  • A single function call does the whole job. The result is a 2D array.
from sklearn.metrics.pairwise import cosine_similarity

vector_list1 = [[0.3423, 0.5123, 0.4232], [0.1412, 0.9634, 0.7292]]
vector_list2 = [[0.6461, 0.8734, 0.9854], [0.1412, 0.9425, 0.8392]]

similarities = cosine_similarity(vector_list1, vector_list2)
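# similarities is a 2x2 array laid out like [[0.2, 0.5], [0.1, 0.8]]
# (placeholder values that show the shape, not the actual output for these inputs)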
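
To make it concrete what the single call returns, here is a rough sketch that compares it with a hand-written NumPy version: scale each row to unit length, then take all dot products with one matrix multiplication. The names a_unit, b_unit, and manual are introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

vector_list1 = [[0.3423, 0.5123, 0.4232], [0.1412, 0.9634, 0.7292]]
vector_list2 = [[0.6461, 0.8734, 0.9854], [0.1412, 0.9425, 0.8392]]

similarities = cosine_similarity(vector_list1, vector_list2)

# Manual equivalent: normalize every row, then compute all pairwise dot products at once.
a = np.asarray(vector_list1)
b = np.asarray(vector_list2)
a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
manual = a_unit @ b_unit.T

print(similarities.shape)                 # one row per vector in vector_list1
print(np.allclose(similarities, manual))  # whether the two approaches agree
```

Here similarities[i][j] is the similarity between vector_list1[i] and vector_list2[j].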
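  • Also, when the vectors contain many zeros, converting them to a sparse matrix apparently makes this even faster
    • word2vec vectors and the like contain no zeros, but it might be usable with bag-of-words vectors?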
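
The sparse-matrix idea above can be sketched as follows, assuming SciPy is installed. The bag-of-words counts are made-up illustrative data, and the variable names are hypothetical. cosine_similarity accepts SciPy sparse matrices directly, so only the input conversion changes.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical bag-of-words counts: rows are documents, columns are vocabulary terms.
bow1 = np.array([[2, 0, 0, 1, 0, 0], [0, 0, 3, 0, 0, 1]])
bow2 = np.array([[1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 2]])

# CSR format stores only the non-zero entries.
sparse1 = csr_matrix(bow1)
sparse2 = csr_matrix(bow2)

# With the default dense_output=True the result comes back as an ordinary NumPy array.
sparse_similarities = cosine_similarity(sparse1, sparse2)
dense_similarities = cosine_similarity(bow1, bow2)

print(np.allclose(sparse_similarities, dense_similarities))
```

Whether the sparse version is actually faster depends on how sparse and how large the data is, so it is worth timing both on real inputs before committing to one.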