
Errors encountered in Japanese document classification with BERT, and how to fix them

Last updated at 2021-11-09, posted at 2021-11-04

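Original article

I tried Japanese document classification by following the original article, but ran into several errors and got stuck, so here is a report of those errors and how I resolved them.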

Working environment

Item	Version
OS	macOS Mojave
python	3.8.2
pyenv	1.2.26
pandas	1.3.4
scikit-learn	1.0.1
transformers	4.12.2
torch	1.10.0
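
Because Error 4 below comes from a version difference with the original article, it can help to record the installed versions before debugging. The following is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8). The helper name `installed_versions` is hypothetical, and the package list simply mirrors the table above.

```python
from importlib import metadata

# Distribution names as they appear on PyPI (taken from the table above).
PACKAGES = ["pandas", "scikit-learn", "transformers", "torch"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}\t{version or 'not installed'}")
```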

Error 1 (pandas warning)

This one is related to pandas.

Error message

/Users/local/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)

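From what I found, this seems to appear when Python was installed through pyenv (the warning itself says the Python build is incomplete, which typically means the xz/lzma library was not available when Python was compiled).
It is a warning rather than an error, and everything runs fine with it showing, so I will skip the fix here.
(A fix is described on sites such as the one below.)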
https://zenn.dev/grahamian/articles/f292163325653dbe2c42
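
To confirm that the warning really comes from a missing `lzma` module, and to hide it if you decide to leave it alone, something like the following may work. This is only a sketch: the helper names `has_lzma` and `silence_lzma_warning` are hypothetical, and the filter has to be registered before `import pandas` runs.

```python
import warnings


def has_lzma() -> bool:
    """Return True if this Python build can import the lzma module."""
    try:
        import lzma  # noqa: F401
    except ImportError:
        return False
    return True


def silence_lzma_warning() -> None:
    """Ignore the pandas lzma UserWarning; call this before importing pandas."""
    warnings.filterwarnings(
        "ignore",
        message="Could not import the lzma module",  # matched against the start of the message
        category=UserWarning,
    )


if __name__ == "__main__":
    print("lzma available:", has_lzma())
    silence_lzma_warning()
```

Hiding the warning does not make lzma compression usable; reading or writing `.xz` files with pandas would still fail on such a build.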

Error 2 (fugashi missing)

The Japanese BERT model I was using complained that a required module was missing, so I had to install it.

Error message

ModuleNotFoundError: No module named 'fugashi'

Solution

Just install fugashi with pip. (Same goes for mecab and cabocha, but why do they keep naming these after food... lol)

$ pip install fugashi

Error 3 (ipadic missing)

The second missing module.

Error message

ModuleNotFoundError: The ipadic dictionary is not installed. See https://github.com/polm/ipadic-py for installation.

Solution

Again, just install the module.

$ pip install ipadic
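
Errors 2 and 3 only surface one at a time, when the tokenizer is loaded. A small check like the one below could report every missing module up front. It is a sketch under the assumption that fugashi and ipadic are the only extra modules needed (those are the two that failed here); the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the modules that were missing in this article.
REQUIRED = ("fugashi", "ipadic")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules, try: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```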

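Error 4 (wrong argument passed to dropout)

This error comes from a version difference with the original article.
It took quite some effort to track down...

Error message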

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Solution

import torch
from transformers import BertModel

# Definition of the BERT classification model
class BERTClass(torch.nn.Module):
  def __init__(self, pretrained, drop_rate, otuput_size):
    super().__init__()
    self.bert = BertModel.from_pretrained(pretrained)

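The code above is part of the BERTClass definition from the original article.
Since transformers v4, BertModel returns a dict-like output object by default, so code that unpacks the result as a tuple receives the string keys instead of tensors, and one of those strings ends up being passed to dropout. Passing return_dict=False restores the tuple return value.
Change the last line as follows.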

self.bert = BertModel.from_pretrained(pretrained, return_dict=False)
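
The mechanism behind the `must be Tensor, not str` message can be reproduced without downloading a model. The sketch below assumes the model's forward pass unpacks the BERT output as a two-element tuple (for example `_, out = self.bert(...)`); an `OrderedDict` and plain lists stand in for the real output object and tensors, and the helper name `unpack_pooled` is hypothetical.

```python
from collections import OrderedDict


def unpack_pooled(output):
    """Mimic `_, out = self.bert(...)`: keep the second item when unpacking."""
    _, out = output
    return out


# Stand-in for the dict-like return value (default behaviour in transformers v4).
# Assumption: plain lists replace tensors; the keys follow BertModel's output names.
dict_style = OrderedDict(
    last_hidden_state=[[0.1, 0.2]],
    pooler_output=[0.3, 0.4],
)
# Stand-in for the tuple return value (behaviour with return_dict=False).
tuple_style = tuple(dict_style.values())

from_dict = unpack_pooled(dict_style)    # iterating a dict yields its keys
from_tuple = unpack_pooled(tuple_style)  # iterating a tuple yields its values

print(type(from_dict).__name__)   # str
print(type(from_tuple).__name__)  # list
```

With the dict-like value, the unpacked variable is the key `"pooler_output"` (a string), which is what dropout then rejects. If you would rather keep the default return type, reading the `pooler_output` attribute of the returned object instead of unpacking it should also work.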

Summary

Several errors came up and it was a fair amount of work, but I managed to get everything running. If you are stuck on the same errors, I hope this helps.

References

Addendum

Apparently the command below installs everything needed, including fugashi and ipadic, in one go!

$ pip install "transformers[ja]"
