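Hugging Face: phonemizer does not work and raises an error
What I want to solve
I want to generate audio with a Hugging Face pipeline, but I get an error suggesting that phonemizer is not installed (?). I would appreciate help resolving it.
Model used (from Hugging Face)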
anhnct/audioldm2_gigaspeech
Environment (only the parts that seem relevant)
・OS: Linux x86_64 Ubuntu 22.04.3 LTS (on Google Colab), Windows 11
・python: 3.10
・cuda: 12.2
・espeak-phonemizer-windows: 1.0.4
・phonemizer: 3.3.0
・torch: 2.4.1+cu121
・torchaudio: 2.4.1+cu121
・torchsummary: 1.5.1
・torchvision: 0.19.1+cu121
・scipy: 1.13.1
・diffusers: 0.30.3
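The list above does not include the transformers version, even though the error is raised from inside transformers. As a minimal sketch for collecting versions from the running environment, something like the following could be used. The helper name `installed_versions` and the package list are my own choices for illustration, not part of any library.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the interpreter running this code.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical selection of packages relevant to this question.
    for name, version in installed_versions(
        ["phonemizer", "transformers", "diffusers", "torch"]
    ).items():
        print(f"{name}: {version}")
```

Running this in the same notebook as the pipeline shows what that interpreter actually sees, which may differ from what was installed on another machine.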
Code I ran (Google Colab)
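from diffusers import DiffusionPipeline

# Load the pipeline from the Hugging Face Hub.
pipe = DiffusionPipeline.from_pretrained("anhnct/audioldm2_gigaspeech")

# Text describing the voice, and the sentence to be spoken.
prompt = "An female actor say with angry voice"
transcript = "wish you have a good day, i hope you never forget me"
# Defined here but not passed to the pipeline call below.
negative_prompt = "low quality"

# This is the call that raises the ImportError shown below.
audio = pipe(prompt, transcript).audio[0]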
Problem / error encountered
Collecting espeak-phonemizer-windows
Downloading espeak_phonemizer_windows-1.0.4-py3-none-any.whl.metadata (2.5 kB)
Downloading espeak_phonemizer_windows-1.0.4-py3-none-any.whl (9.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.4/9.4 MB 2.7 MB/s eta 0:00:00
Installing collected packages: espeak-phonemizer-windows
Successfully installed espeak-phonemizer-windows-1.0.4
vae/diffusion_pytorch_model.safetensors not found
Loading pipeline components...: 100%
11/11 [00:03<00:00, 5.08it/s]
An error occurred while trying to fetch /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/unet: Error no file named diffusion_pytorch_model.safetensors found in directory /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/unet.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/projection_model: Error no file named diffusion_pytorch_model.safetensors found in directory /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/projection_model.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
An error occurred while trying to fetch /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/vae: Error no file named diffusion_pytorch_model.safetensors found in directory /root/.cache/huggingface/hub/models--anhnct--audioldm2_gigaspeech/snapshots/c812a7861f38a69441a8e0428438e782d9864614/vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-22-8b84e53bddb4> in <cell line: 15>()
13
14
---> 15 audio = pipe(prompt,transcript).audio[0]
9 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/vits/tokenization_vits.py in prepare_for_tokenization(self, text, is_split_into_words, normalize, **kwargs)
188 if self.phonemize:
189 if not is_phonemizer_available():
--> 190 raise ImportError("Please install the `phonemizer` Python package to use this tokenizer.")
191
192 filtered_text = phonemizer.phonemize(
ImportError: Please install the `phonemizer` Python package to use this tokenizer.
What I tried
!pip install datasets transformers
!pip install phonemizer
!apt-get install espeak
Reference page
!pip install espeak-phonemizer-windows
Reference page
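To narrow down whether the installs above are visible to the running notebook, a small diagnostic along these lines might help. It is only a sketch: the function name `check_phonemizer_setup` is hypothetical, and it assumes the espeak backend is exposed as an `espeak` or `espeak-ng` executable on PATH.

```python
import importlib.util
import shutil


def check_phonemizer_setup():
    """Report whether the phonemizer package and an espeak binary are visible."""
    return {
        # True if the current interpreter can locate the phonemizer package.
        "phonemizer_importable": importlib.util.find_spec("phonemizer") is not None,
        # Path to the espeak executable, or None if neither name is on PATH.
        "espeak_binary": shutil.which("espeak") or shutil.which("espeak-ng"),
    }


if __name__ == "__main__":
    print(check_phonemizer_setup())
```

If both entries look fine but the same ImportError still appears, restarting the runtime after installing and then re-running the cells may be worth trying, on the assumption that transformers evaluated its availability check before the package was installed.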
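Finally
I'm a beginner and still finding my way around, so I may not have provided enough information.
If anything here is unclear, or you need more information to diagnose the problem, please let me know in the comments. Thank you in advance.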