
More than 5 years have passed since last update.

What to do when loading a gensim word2vec model file raises _pickle.UnpicklingError


Python 3.5.2

calc_w2v.py
import better_exceptions
from gensim.models import word2vec

file_name = "./model.txt"
model = word2vec.Word2Vec.load(file_name)
...

Trying to load the word2vec model file ./model.txt (saved with binary=False) with the Python code above produces the following error (shown with better_exceptions enabled).

Traceback (most recent call last):
  File "calc_w2v.py", line 7, in <module>
    model = word2vec.Word2Vec.load(file_name)
            │                      └ './model.txt'
            └ <module 'gensim.models.word2vec' from '/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/models/wo...
  File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1520, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
                  │         │          │       └ {}
                  │         │          └ ('./model.txt',)
                  │         └ <class 'gensim.models.word2vec.Word2Vec'>
                  └ <class 'gensim.models.word2vec.Word2Vec'>
  File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/utils.py", line 280, in load
    obj = unpickle(fname)
          │        └ './model.txt'
          └ <function unpickle at 0x10580e8c8>
  File "/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/gensim/utils.py", line 933, in unpickle
    return _pickle.load(f, encoding='latin1')
           │            └ <_io.BufferedReader name='./model.txt'>
           └ <module 'pickle' from '/Users/user/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/pickle.py'>
_pickle.UnpicklingError: could not find MARK

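The file was generated with gensim, but it is apparently not in the format that load() expects, hence the error: as the traceback shows, Word2Vec.load() unpickles the file, which only works for a model written with save(), whereas ./model.txt is in the word2vec text format (binary=False). A different loading function is needed. (Search the web for the details.)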
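The mismatch can be reproduced with the standard library alone, without gensim. The following is a minimal sketch: the helper name `try_unpickle` and the one-word sample file are made up for illustration. It writes a tiny file in the word2vec text layout (a header line with vocabulary size and dimensions, then one word and its vector per line) and passes it to `pickle.load` with the same `encoding='latin1'` argument that appears in the traceback.

```python
import os
import pickle
import tempfile


def try_unpickle(path):
    """Return None if the file unpickles cleanly, else the error message."""
    try:
        with open(path, "rb") as f:
            pickle.load(f, encoding="latin1")
    except pickle.UnpicklingError as exc:
        return str(exc)
    return None


# Hypothetical one-word model in word2vec text format:
# "<vocab size> <dimensions>" header, then "<word> <v1> <v2> <v3>".
sample = "1 3\nhello 0.1 0.2 0.3\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    error = try_unpickle(path)

# A plain-text file is not a valid pickle stream, so an error message is expected.
print(error)
```

The unpickler interprets the leading bytes of the text file as pickle opcodes. The character `1` happens to be the `POP_MARK` opcode, which is probably why a header beginning with that digit yields a complaint about a missing MARK.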

So I changed the Python 3 code as follows.

calc_w2v.py
import better_exceptions
from gensim.models import word2vec

file_name = "./model.txt"
model = word2vec.KeyedVectors.load_word2vec_format(file_name)
...

The model now loads without problems.
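If it is unclear which kind of file is at hand, the first line can be inspected before choosing a loader. The sketch below is only an illustration and not part of gensim: `has_word2vec_header` is a hypothetical helper that checks whether the file starts with the two-integer header used by the word2vec format. That header is shared by the text and binary variants, so the check only separates word2vec-format files from pickled models.

```python
import os
import pickle
import tempfile


def has_word2vec_header(path):
    """Return True if the first line looks like '<vocab size> <dimensions>'."""
    with open(path, "rb") as f:
        first_line = f.readline(64)  # a real header is short; cap the read
    parts = first_line.split()
    return len(parts) == 2 and all(p.isdigit() for p in parts)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical word2vec text file with one 3-dimensional vector.
    text_path = os.path.join(tmp, "model.txt")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("1 3\nhello 0.1 0.2 0.3\n")

    # Stand-in for a pickled model: any pickled Python object will do here.
    pickle_path = os.path.join(tmp, "model.pkl")
    with open(pickle_path, "wb") as f:
        pickle.dump({"placeholder": 1}, f)

    text_result = has_word2vec_header(text_path)
    pickle_result = has_word2vec_header(pickle_path)

print(text_result, pickle_result)
```

A file that passes the check would go to `KeyedVectors.load_word2vec_format`, and anything else to `Word2Vec.load`.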


calc_w2v.py
import better_exceptions
from gensim.models import word2vec
import sys

file_name = "./model.txt"
# model = word2vec.Word2Vec.load(file_name) ### _pickle.UnpicklingError: could not find MARK
model = word2vec.KeyedVectors.load_word2vec_format(file_name)

def word_subtraction(pos, neg):
    # Print the 10 most similar words, counting pos positively and neg negatively.
    result = model.most_similar(positive=[pos], negative=[neg], topn=10)
    for r in result:
        print(r)

# Both words come from the command line.
pos = sys.argv[1]
neg = sys.argv[2]
# Show the raw vector of each word.
print(model[pos])
print(model[neg])

print(pos, " - ", neg, " = ")
word_subtraction(pos, neg)

$ python3 calc_w2v.py アメリカ 日本