Style-Bert-VITS2 Error Fixes Summary

Last updated at 2026-04-18. Posted at 2026-04-18.

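These are the places where I got stuck when using Style-Bert-VITS2 to create a speech synthesis model.

This is what I did after first following the site linked below (↓) and finishing the setup.

Whenever an error came up, I handed the entire error log to Claude Code and had it fix things, so what follows is simply copied and pasted from that.
I don't really understand what is going on, but I'm leaving it here in the hope that someone who does can make use of it.

The training data was created with Voice-Design-Cloner, so I only used the automatic preprocessing in the "Training" (学習) tab. I have not tried "Speech synthesis" (音声合成) or "Dataset creation" (データセット作成).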

1. `onnxruntime` version conflict

Error:

ImportError: cannot import name 'OrtDeviceMemoryType' from 'onnxruntime.capi._pybind_state'

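Cause: onnxruntime, onnxruntime-directml, and onnxruntime-gpu were all installed at the same time and conflicted with each other.

Fix:

# Remove all of them
venv/Scripts/pip uninstall -y onnxruntime onnxruntime-directml onnxruntime-gpu

# Reinstall only the Windows (DirectML) build, pinned to a stable version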
venv/Scripts/pip install onnxruntime-directml==1.19.2
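
Before uninstalling, it may help to confirm which onnxruntime variants are actually present in the venv. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_onnxruntime_variants` is hypothetical and not part of Style-Bert-VITS2.

```python
from importlib import metadata

# Distribution names listed in the cause above.
ONNXRUNTIME_VARIANTS = ("onnxruntime", "onnxruntime-directml", "onnxruntime-gpu")


def installed_onnxruntime_variants(names=ONNXRUNTIME_VARIANTS):
    """Return {distribution name: version} for each variant that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this variant is not installed in the current environment
    return found


if __name__ == "__main__":
    variants = installed_onnxruntime_variants()
    print(variants)
    if len(variants) > 1:
        print("More than one onnxruntime variant is installed; they may conflict.")
```

Run it with the venv's own interpreter (for example venv/Scripts/python) so that it inspects the same environment the WebUI uses.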

2. `pkg_resources` not found

Error:

ModuleNotFoundError: No module named 'pkg_resources'

Cause: In setuptools 82.x, pkg_resources is no longer provided as a top-level module.

Fix:

venv/Scripts/pip install "setuptools<70"
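
To check whether the downgrade took effect, one option is to ask Python whether `pkg_resources` can be located at all. This is a small sketch using the standard-library `importlib.util.find_spec`; the function name `has_pkg_resources` is made up for illustration.

```python
import importlib.util


def has_pkg_resources() -> bool:
    """Return True if the pkg_resources module can be found in this environment."""
    return importlib.util.find_spec("pkg_resources") is not None


if __name__ == "__main__":
    print("pkg_resources available:", has_pkg_resources())
```

As with the previous check, it should be run with the venv's interpreter rather than the system Python.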

3. UnicodeDecodeError on subprocess output

Error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 3247

Cause: On Windows the subprocess output was encoded as cp932 (Shift-JIS), but the code was trying to read it as UTF-8.

Fixed file: style_bert_vits2/utils/subprocess.py

# Before
result = subprocess.run(
    [sys.executable] + cmd,
    stdout=SAFE_STDOUT,
    stderr=subprocess.PIPE,
    text=True,
    encoding="utf-8",
    check=False,
)

# After
result = subprocess.run(
    [sys.executable] + cmd,
    stdout=SAFE_STDOUT,
    stderr=subprocess.PIPE,
    text=True,
    encoding="cp932",
    errors="replace",
    check=False,
)
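
The fix above hard-codes cp932, which assumes a Japanese Windows locale. As an alternative sketch (not what the project does), the raw bytes could be decoded by trying UTF-8 first and falling back to cp932. This assumes the subprocess is run without `text=True` so that stderr arrives as bytes; `decode_output` is a hypothetical helper.

```python
def decode_output(raw: bytes, encodings=("utf-8", "cp932")) -> str:
    """Try each encoding strictly, then fall back to the last one with replacement."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode(encodings[-1], errors="replace")


if __name__ == "__main__":
    sjis_bytes = "あ".encode("cp932")  # begins with byte 0x82, as in the error above
    try:
        sjis_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 failed:", exc.reason)
    print(decode_output(sjis_bytes))
```

Byte 0x82 is a lead byte for hiragana in Shift-JIS but is not a valid start byte in UTF-8, which is why the original call raised.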

4. AssertionError caused by loading the wrong BERT tokenizer

Error:

AssertionError: 
# bert_feature.py line 70:
assert len(word2ph) == len(text) + 2, text

Cause:

tokenizer_config.json specifies "tokenizer_class": "BertJapaneseTokenizer", but bert_models.py was loading the tokenizer with AutoTokenizer.from_pretrained(..., use_fast=True)
The fast tokenizer (TokenizersBackend) treats any multi-character Japanese sequence as [UNK]
As a result, the word2ph that g2p.py computed per morpheme did not match the per-character token count that bert_feature.py expects

Fixed file: style_bert_vits2/nlp/bert_models.py

# Before (imports)
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DebertaV2Model,
    DebertaV2TokenizerFast,
    PreTrainedModel,
    PreTrainedTokenizer,
    PreTrainedTokenizerFast,
)

# After (imports)
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    BertJapaneseTokenizer,
    DebertaV2Model,
    DebertaV2TokenizerFast,
    PreTrainedModel,
    PreTrainedTokenizer,
    PreTrainedTokenizerFast,
)

# Before (tokenizer loading)
    else:
        __loaded_tokenizers[language] = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path,
            cache_dir=cache_dir,
            revision=revision,
            use_fast=True,
        )

# After
    elif language == Languages.JP:
        __loaded_tokenizers[language] = BertJapaneseTokenizer.from_pretrained(
            pretrained_model_name_or_path,
            cache_dir=cache_dir,
        )
    else:
        __loaded_tokenizers[language] = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path,
            cache_dir=cache_dir,
            revision=revision,
        )

Note: After this fix, you need to re-run Step 3 (text preprocessing) so that the word2ph values in train.list are regenerated.
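
The failing assertion boils down to a length check: `word2ph` must have one entry per character of the text plus two extra entries (presumably for the start and end special tokens, which is my assumption rather than something shown in the log). A stand-alone sketch of that invariant, with a hypothetical helper name and made-up values:

```python
def check_word2ph(text: str, word2ph: list[int]) -> bool:
    """Mirror the assertion from the traceback: len(word2ph) == len(text) + 2."""
    return len(word2ph) == len(text) + 2


if __name__ == "__main__":
    text = "こんにちは"               # 5 characters
    good = [1, 2, 1, 2, 2, 2, 1]     # 7 entries (illustrative values only)
    bad = [1, 3, 1]                  # too few entries: would trip the assertion
    print(check_word2ph(text, good))
    print(check_word2ph(text, bad))
```

When the tokenizer collapses several characters into a single [UNK], the two lengths drift apart, and the check fails in the same way as the `bad` case.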

5. `pyannote.audio` error from the removed `use_auth_token` argument

Error:

TypeError: hf_hub_download() got an unexpected keyword argument 'use_auth_token'

Cause: Newer versions of huggingface_hub removed the use_auth_token argument and replaced it with token.

Fixed file: venv/lib/site-packages/pyannote/audio/core/model.py

# Before (2 places: lines 618 and 655)
                    use_auth_token=use_auth_token,

# After (both places)
                    token=use_auth_token,
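
Editing a file inside site-packages ties the fix to one huggingface_hub version. A more version-tolerant idea, sketched here under assumptions, is to inspect the callee's signature and pass whichever keyword it accepts. `auth_kwargs` and the two stand-in download functions are hypothetical; they only imitate the old and new keyword names and are not the real `hf_hub_download`.

```python
import inspect


def auth_kwargs(func, token):
    """Pick the auth keyword that `func` accepts: `token` if present, else `use_auth_token`."""
    params = inspect.signature(func).parameters
    if "token" in params:
        return {"token": token}
    if "use_auth_token" in params:
        return {"use_auth_token": token}
    return {}  # the function accepts neither keyword


# Stand-ins for the old and new download signatures (for illustration only).
def old_download(repo_id, filename, use_auth_token=None):
    return ("old", use_auth_token)


def new_download(repo_id, filename, token=None):
    return ("new", token)


if __name__ == "__main__":
    for fn in (old_download, new_download):
        print(fn("some/repo", "config.yaml", **auth_kwargs(fn, "hf_xxx")))
```

Whether this works against the real library depends on how its functions are wrapped, so treat it as an idea to verify rather than a drop-in patch.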

6. `torch.load` error from the changed `weights_only` default

Error:

_pickle.UnpicklingError: Weights only load failed.
...
GLOBAL torch.torch_version.TorchVersion was not an allowed global by default.

Cause: Starting with PyTorch 2.6, the default value of weights_only in torch.load changed from False to True, so older pyannote checkpoints could no longer be loaded.

Fixed file: venv/lib/site-packages/lightning_fabric/utilities/cloud_io.py

# Before (around line 73)
    fs = get_filesystem(path_or_url)
    with fs.open(path_or_url, "rb") as f:
        return torch.load(
            f,
            map_location=map_location,
            weights_only=weights_only,
        )

# After
    fs = get_filesystem(path_or_url)
    with fs.open(path_or_url, "rb") as f:
        return torch.load(
            f,
            map_location=map_location,
            weights_only=False if weights_only is None else weights_only,
        )
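
The only behavioral change in the patch is how an unspecified `weights_only` is resolved. Isolated as a tiny function (the name `resolve_weights_only` is hypothetical), the logic looks like this:

```python
from typing import Optional


def resolve_weights_only(weights_only: Optional[bool]) -> bool:
    """Treat an unspecified value (None) as False; pass explicit choices through."""
    return False if weights_only is None else weights_only


if __name__ == "__main__":
    for value in (None, True, False):
        print(value, "->", resolve_weights_only(value))
```

Callers that explicitly pass True or False keep their choice; only the None case falls back to the pre-2.6 behavior. Since `weights_only=False` allows arbitrary pickled objects to be loaded, this fallback is only reasonable for checkpoints from a source you trust.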

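With all of the above, I was finally able to run training.
After training finished, I moved to the "Speech synthesis" (音声合成) tab and tried to play audio with the resulting model, which produced the following error.

7. fp16 / fp32 dtype mismatch error during inference

Error: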

RuntimeError: Input type (struct c10::Half) and bias type (float) should be the same

Cause: The embedding tensor output by BERT is fp16 (half precision), but the weights and bias of bert_proj (Conv1d) are fp32, so the dtypes do not match.

Fixed file: style_bert_vits2/models/models_jp_extra.py

# Before (line 418)
        bert_emb = self.bert_proj(bert).transpose(1, 2)

# After
        bert_emb = self.bert_proj(bert.to(self.bert_proj.weight.dtype)).transpose(1, 2)

Fixed file: style_bert_vits2/models/models.py

# Before (line 408)
        bert_emb = self.bert_proj(bert).transpose(1, 2)
        ja_bert_emb = self.ja_bert_proj(ja_bert).transpose(1, 2)
        en_bert_emb = self.en_bert_proj(en_bert).transpose(1, 2)

# After
        bert_emb = self.bert_proj(bert.to(self.bert_proj.weight.dtype)).transpose(1, 2)
        ja_bert_emb = self.ja_bert_proj(ja_bert.to(self.ja_bert_proj.weight.dtype)).transpose(1, 2)
        en_bert_emb = self.en_bert_proj(en_bert.to(self.en_bert_proj.weight.dtype)).transpose(1, 2)

List of modified files

File	Type
`style_bert_vits2/utils/subprocess.py`	Project file
`style_bert_vits2/nlp/bert_models.py`	Project file
`style_bert_vits2/models/models_jp_extra.py`	Project file
`style_bert_vits2/models/models.py`	Project file
`venv/lib/site-packages/pyannote/audio/core/model.py`	Package inside venv
`venv/lib/site-packages/lightning_fabric/utilities/cloud_io.py`	Package inside venv

I hope this reaches the Style-Bert-VITS2 author or someone close to the project.
