Computing the Jaccard index in Python

Posted at 2025-01-09

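The Jaccard index (also called the Jaccard similarity coefficient) is a measure of the similarity between two data sets. It is the number of elements the two sets share divided by the number of distinct elements across both. Its value ranges from 0 to 1, and the closer it is to 1, the more similar the two sets are.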
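
Written as a formula, the definition looks roughly like the block below. The symbols A and B and the small numeric sets are illustrative assumptions and do not come from the original article.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1

% Worked example with A = \{1, 2, 3\} and B = \{2, 3, 4\}:
% A \cap B = \{2, 3\}, \quad A \cup B = \{1, 2, 3, 4\}
J(A, B) = \frac{2}{4} = 0.5
```

Identical non-empty sets give 1 and disjoint sets give 0. The ratio is undefined when both sets are empty.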

Computing it in Python:

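a = ["apple", "orange", "banana"]
b = ["orange", "strawberry", "apple", "banana"]

def jaccard_sim(a, b):
    """Return the Jaccard index of two iterables, treated as sets."""
    s1 = set(a)
    s2 = set(b)
    union = s1 | s2
    if not union:
        # Both inputs are empty, so the ratio is undefined.
        # Returning 0.0 is a convention chosen here.
        return 0.0
    # In Python 3, / is already true division, so no float() is needed.
    return len(s1 & s2) / len(union)

result = jaccard_sim(a, b)
print(result)  # Result: 0.75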
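
To see where 0.75 comes from, the intermediate sets can be printed. The sketch below is only an illustration: it reuses the article's two lists, and the helper name `jaccard` and the extra inputs ("kiwi" and so on) are hypothetical.

```python
a = ["apple", "orange", "banana"]
b = ["orange", "strawberry", "apple", "banana"]

s1, s2 = set(a), set(b)
common = s1 & s2    # elements present in both lists
combined = s1 | s2  # distinct elements across both lists

print(sorted(common))               # ['apple', 'banana', 'orange']
print(sorted(combined))             # ['apple', 'banana', 'orange', 'strawberry']
print(len(common) / len(combined))  # 3 / 4 = 0.75


def jaccard(x, y):
    """Hypothetical helper for this example; assumes at least one input is non-empty."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)


print(jaccard(a, a))                                    # identical sets: 1.0
print(jaccard(["apple"], ["kiwi"]))                     # disjoint sets: 0.0
print(jaccard(["apple", "apple", "kiwi"], ["kiwi", "apple"]))  # duplicates and order are ignored: 1.0
```

Because the inputs are converted to sets, repeated items and item order have no effect on the result.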

Computing it with scikit-learn:

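from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import jaccard_score

corpus = ["apple orange banana", "orange strawberry apple banana"]
# binary=True records presence (0/1) instead of counts, so a repeated word
# cannot produce a non-binary value that jaccard_score would reject.
X = CountVectorizer(binary=True).fit_transform(corpus)
arr = X.toarray()  # one row per document, one column per vocabulary word
result = jaccard_score(arr[0], arr[1])
print(result)  # Result: 0.75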
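
The scikit-learn version works on 0/1 vectors rather than on sets. The pure-Python sketch below shows roughly what those vectors look like and why the score is the same. It is a simplified stand-in, not scikit-learn's implementation: splitting on whitespace and sorting the vocabulary are assumptions made for illustration, and `binary_jaccard` is a hypothetical helper.

```python
corpus = ["apple orange banana", "orange strawberry apple banana"]

# Simplified tokenization (assumption): split each document on whitespace.
tokens = [set(doc.split()) for doc in corpus]

# One column per distinct word, in alphabetical order.
vocab = sorted(set().union(*tokens))

# One 0/1 vector per document: 1 if the word occurs in it, otherwise 0.
vectors = [[1 if word in doc else 0 for word in vocab] for doc in tokens]


def binary_jaccard(u, v):
    """Hypothetical helper: Jaccard index of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(u, v) if x and y)    # positions set in both
    either = sum(1 for x, y in zip(u, v) if x or y)   # positions set in at least one
    return both / either if either else 0.0


print(vocab)       # ['apple', 'banana', 'orange', 'strawberry']
print(vectors[0])  # [1, 1, 1, 0]
print(vectors[1])  # [1, 1, 1, 1]
print(binary_jaccard(vectors[0], vectors[1]))  # 3 / 4 = 0.75
```

Three positions are set in both vectors and four are set in at least one, which gives the same 0.75 as the set-based calculation.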