Computing the Jaccard Coefficient in Python

Posted at 2019-08-27

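#What is the Jaccard coefficient?
The Jaccard coefficient is the size of the intersection of two sets (the numerator) divided by the size of their union (the denominator).
It measures how similar the two sets are, ranging from 0 (no shared elements) to 1 (identical sets). The formula is: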

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$
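
As a quick check of the formula, take two small sets chosen here purely for illustration (they are not from the original article), $A = \{1, 2, 3\}$ and $B = \{2, 3, 4\}$. Both forms of the denominator give the same value:

$$
A \cap B = \{2, 3\}, \quad A \cup B = \{1, 2, 3, 4\}, \quad J(A, B) = \frac{2}{4} = \frac{2}{3 + 3 - 2} = 0.5
$$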

Written in Python, the Jaccard coefficient looks like this:

jaccard.py

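def jaccard(data1, data2):
    """Return the Jaccard coefficient of two lists (assumes no duplicate elements)."""
    # Count the elements common to both lists: |A ∩ B|
    items = 0
    for item in data1:
        if item in data2:
            items += 1
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    return items / (len(data1) + len(data2) - items)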

data1 = ['Action', 'Comedy', 'Parody', 'Sci-Fi', 'Seinen', 'Super Power', 'Supernatural'] #One Punch Man
data2 = ['Action', 'Comedy', 'Historical', 'Parody', 'Samurai', 'Sci-Fi', 'Shounen'] #Gintama°

print(jaccard(data1, data2))  # 0.4
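
One caveat: the loop counts every element of the first list that appears in the second, so it matches the set-based formula only when neither list contains repeated elements. The sketch below illustrates this with made-up data; `jaccard_loop` repeats the logic above, and `jaccard_dedup` is a hypothetical helper (not from the original article) that removes repeats before counting.

```python
def jaccard_loop(data1, data2):
    # Same counting logic as the loop-based version above
    items = 0
    for item in data1:
        if item in data2:
            items += 1
    return items / (len(data1) + len(data2) - items)


def jaccard_dedup(data1, data2):
    # Hypothetical helper: drop repeated elements (keeping order), then count
    unique1 = list(dict.fromkeys(data1))
    unique2 = list(dict.fromkeys(data2))
    return jaccard_loop(unique1, unique2)


# Illustrative data with a repeated element
with_repeats = ['Action', 'Action', 'Comedy']
other = ['Action', 'Drama']

print(jaccard_loop(with_repeats, other))   # 2 / (3 + 2 - 2), overcounts 'Action'
print(jaccard_dedup(with_repeats, other))  # 1 / (2 + 2 - 1), matches the set definition
```

For lists without duplicates, such as the genre lists above, both functions return the same value.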

#Addendum
Using Python's set type makes the calculation easier to follow.

jaccard2.py
data1 = ['Action', 'Comedy', 'Parody', 'Sci-Fi', 'Seinen', 'Super Power', 'Supernatural'] #One Punch Man
data2 = ['Action', 'Comedy', 'Historical', 'Parody', 'Samurai', 'Sci-Fi', 'Shounen'] #Gintama°

set_data1 = set(data1)
set_data2 = set(data2)
numerator = len(set_data1 & set_data2) # {'Action', 'Comedy', 'Parody', 'Sci-Fi'}
denominator = len(set_data1 | set_data2) # {'Action', 'Comedy', ..., 'Super Power', 'Supernatural'}

print(numerator / denominator) # 4 / 10 --> 0.4
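
The set-based calculation can also be wrapped in a reusable function. This is a minimal sketch rather than a definitive implementation: the name `jaccard_set` is hypothetical, and returning 0.0 when both inputs are empty is an assumed convention, since the formula is undefined (0 / 0) in that case.

```python
def jaccard_set(data1, data2):
    # Convert to sets so repeated elements are counted once
    set1, set2 = set(data1), set(data2)
    union = set1 | set2
    if not union:
        # Both inputs are empty: the formula is 0 / 0, so return 0.0
        # (an assumed convention, not something the article specifies)
        return 0.0
    return len(set1 & set2) / len(union)


one_punch_man = ['Action', 'Comedy', 'Parody', 'Sci-Fi', 'Seinen', 'Super Power', 'Supernatural']
gintama = ['Action', 'Comedy', 'Historical', 'Parody', 'Samurai', 'Sci-Fi', 'Shounen']

print(jaccard_set(one_punch_man, gintama))  # 4 / 10 --> 0.4
print(jaccard_set([], []))                  # 0.0 by the convention above
```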

