Python 3.5.2
calc_w2v.py
import better_exceptions
from gensim.models import word2vec
file_name = "./model.txt"
model = word2vec.Word2Vec.load(file_name)
...
When the Python code above tries to load the word2vec model file ./model.txt (saved with binary=False, i.e. in text format), it fails with the following error (shown with better_exceptions enabled).
Traceback (most recent call last):
File "calc_w2v.py", line 7, in <module>
model = word2vec.Word2Vec.load(file_name)
│ └ './model.txt'
└ <module 'gensim.models.word2vec' from '/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/models/wo...
File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1520, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
│ │ │ └ {}
│ │ └ ('./model.txt',)
│ └ <class 'gensim.models.word2vec.Word2Vec'>
└ <class 'gensim.models.word2vec.Word2Vec'>
File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/utils.py", line 280, in load
obj = unpickle(fname)
│ └ './model.txt'
└ <function unpickle at 0x10580e8c8>
File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/utils.py", line 933, in unpickle
return _pickle.load(f, encoding='latin1')
│ └ <_io.BufferedReader name='./model.txt'>
└ <module 'pickle' from '/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/pickle.py'>
_pickle.UnpicklingError: could not find MARK
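The error apparently occurs because the file generated by gensim is not in the format that the load function expects: Word2Vec.load unpickles a model written by save, whereas this file is in the word2vec text format. A different loading function is therefore needed. (Search the web for details.)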
So I changed the Python 3 code as follows.
calc_w2v.py
import better_exceptions
from gensim.models import word2vec
file_name = "./model.txt"
model = word2vec.KeyedVectors.load_word2vec_format(file_name)
...
With this change the model loaded without any problem.
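To check which kind of file you have before choosing a loader, a small standard-library sketch like the one below may help. It assumes that a word2vec text file starts with a header line of two integers (vocabulary size and vector size) and that a file written by save is a pickle. The helper names looks_like_word2vec_text and can_unpickle, and the toy files, are made up for this illustration and are not part of gensim.

```python
import os
import pickle
import tempfile


def looks_like_word2vec_text(path):
    """Return True if the first line is '<vocab_size> <vector_size>' (two integers)."""
    with open(path, "rb") as f:
        header = f.readline().split()
    return len(header) == 2 and all(part.isdigit() for part in header)


def can_unpickle(path):
    """Return True if pickle can read the file. Only use this on files you trust."""
    try:
        with open(path, "rb") as f:
            pickle.load(f)
        return True
    except Exception:
        # Text-format files typically end up here with an UnpicklingError.
        return False


# Build two hypothetical toy files: one in word2vec text layout, one pickled.
demo_dir = tempfile.mkdtemp()

text_path = os.path.join(demo_dir, "toy_model.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("2 3\n")              # header: 2 words, 3 dimensions
    f.write("foo 0.1 0.2 0.3\n")  # one word per line, followed by its vector
    f.write("bar 0.4 0.5 0.6\n")

pickle_path = os.path.join(demo_dir, "toy_model.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump({"foo": [0.1, 0.2, 0.3]}, f)

for path in (text_path, pickle_path):
    print(os.path.basename(path),
          "word2vec text header:", looks_like_word2vec_text(path),
          "unpicklable:", can_unpickle(path))
```

If the header check succeeds, load_word2vec_format is the likely match; if the file unpickles, load is the likely match.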
calc_w2v.py
import better_exceptions
from gensim.models import word2vec
import sys
import cython
file_name = "./model.txt"
# model = word2vec.Word2Vec.load(file_name) ### _pickle.UnpicklingError: could not find MARK
model = word2vec.KeyedVectors.load_word2vec_format(file_name)
def word_subtraction(pos, neg):
    pos_list = [pos]
    neg_list = [neg]
    result = model.most_similar(positive=pos_list, negative=neg_list, topn=10)
    for r in result:
        print(r)
pos = sys.argv[1]
neg = sys.argv[2]
print(model[pos])
print(model[neg])
print(pos, " - ", neg, " = ")
word_subtraction(pos, neg)
$ python3 calc_w2v.py アメリカ 日本
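For intuition about what the word subtraction computes, here is a rough pure-Python sketch of the idea behind most_similar with positive and negative words: add the unit vectors of the positive words, subtract those of the negative words, and rank the remaining words by cosine similarity to the result. The toy vectors and the function toy_most_similar are invented for this illustration. gensim's real implementation is vectorized and may differ in details.

```python
import math

# Hypothetical toy embeddings; real vectors would come from the loaded model.
TOY_VECTORS = {
    "tokyo": [0.9, 0.1, 0.8],
    "japan": [0.9, 0.1, 0.1],
    "washington": [0.1, 0.9, 0.8],
    "usa": [0.1, 0.9, 0.1],
    "sushi": [0.8, 0.0, 0.0],
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_most_similar(vectors, positive, negative, topn=10):
    """Rank words by cosine similarity to sum(unit(positive)) - sum(unit(negative))."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    signed_words = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed_words:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    query = unit(query)

    # The input words themselves are excluded from the ranking.
    inputs = set(positive) | set(negative)
    scored = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in inputs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


result = toy_most_similar(TOY_VECTORS, positive=["usa"], negative=["japan"], topn=3)
for word, score in result:
    print(word, round(score, 3))
```

In this toy data, "usa" minus "japan" points away from the Japan-related axis and toward the USA-related one, so "washington" ranks first.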