
Obtaining Distributed Word Representations with Word2Vec

Posted at 2019-09-25

I trained a Word2Vec model on text data from TravelBlog.

Preparation

>>> print(corpus)
[['This', 'is', 'a', 'pen', '.'], ['His', 'name', 'is', 'Bob', '.'], ...]

Prepare the corpus as a nested list like the one above: one inner list of tokens per sentence.
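The post does not show how the nested list was built, so the following is only a minimal sketch of one way to get that shape from raw text. The function name `build_corpus` and the regex-based splitting are assumptions made for illustration; a real pipeline would more likely use a proper tokenizer.

```python
import re

def build_corpus(text):
    """Split raw text into sentences, then each sentence into tokens.

    Hypothetical helper: a naive regex tokenizer, not the author's method.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # A token is either a run of word characters/apostrophes
    # or a single punctuation mark.
    return [re.findall(r"[\w']+|[^\w\s]", s) for s in sentences if s]

corpus = build_corpus("This is a pen. His name is Bob.")
print(corpus)
# [['This', 'is', 'a', 'pen', '.'], ['His', 'name', 'is', 'Bob', '.']]
```

The result is already in the form that `Word2Vec` expects: an iterable of sentences, each a list of string tokens.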

Training

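from gensim.models import word2vec
# 300-dimensional vectors, words seen fewer than 20 times dropped, context window of 10.
# gensim 4.x calls the dimension `vector_size`; gensim 3.x (current in 2019) used `size`.
model = word2vec.Word2Vec(corpus, vector_size=300, min_count=20, window=10)
model.save("TravelBlog2Vec.model")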
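To get a feel for what `min_count` and `window` control, here is a rough pure-Python sketch. The helpers `prune_vocab` and `context_pairs` are hypothetical names introduced for illustration, and the logic is a simplification of the idea, not gensim's actual implementation, which differs in details.

```python
from collections import Counter

def prune_vocab(corpus, min_count):
    """Keep only tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}

def context_pairs(sentence, vocab, window):
    """List (center, context) pairs within `window` positions of each other."""
    # Out-of-vocabulary tokens are dropped before windowing (simplifying assumption).
    kept = [t for t in sentence if t in vocab]
    pairs = []
    for i, center in enumerate(kept):
        lo, hi = max(0, i - window), min(len(kept), i + window + 1)
        pairs.extend((center, kept[j]) for j in range(lo, hi) if j != i)
    return pairs

toy_corpus = [["This", "is", "a", "pen", "."], ["His", "name", "is", "Bob", "."]]
vocab = prune_vocab(toy_corpus, min_count=2)
pairs = context_pairs(toy_corpus[0], vocab, window=1)
print(sorted(vocab))  # ['.', 'is']
print(pairs)          # [('is', '.'), ('.', 'is')]
```

With `min_count=20` as in the training call, a toy corpus like this would leave an empty vocabulary, which is why the threshold only makes sense on a large corpus.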

Verification

king - man + woman = ?

confirm.py
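from gensim.models import word2vec
# Load the model saved during training.
model = word2vec.Word2Vec.load("TravelBlog2Vec.model")
# king - man + woman: the 9 words nearest to the resulting vector.
similar_words = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=9)
print(similar_words)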
$ python3 confirm.py
[('queen', 0.6165680885314941), ('emperor', 0.6151667833328247), ('King', 0.6119736433029175), ('Pharaoh', 0.5968493223190308), ('kings', 0.5776773691177368), ('Emperor', 0.576242983341217), ('monarch', 0.5675389766693115), ('princess', 0.5627557039260864), ('pharaoh', 0.5518648624420166)]
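The arithmetic behind an analogy query like this can be illustrated without gensim. The sketch below uses made-up 2-D vectors (the values and the `analogy` helper are assumptions for illustration only) and ranks candidates by cosine similarity to `king - man + woman`, excluding the query words themselves. It shows the idea, not gensim's exact computation.

```python
import math

# Hypothetical 2-D embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(positive, negative, topn=1):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for w in positive:
        query = [q + v for q, v in zip(query, vectors[w])]
    for w in negative:
        query = [q - v for q, v in zip(query, vectors[w])]
    # The query words themselves are excluded from the candidates.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:topn]

result = analogy(["king", "woman"], ["man"])
print(result)  # 'queen' ranks first, with a similarity close to 1.0
```

In these toy vectors the match is nearly exact by construction; with real embeddings the similarities are much lower, as the scores around 0.6 in the output above show.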

Related posts

Obtaining Distributed Word Representations with fastText

Reference pages
