
Trying OpenAI's "Whisper" on an M1 Max

Posted at 2022-10-09

I'd like to try out "Whisper", the speech-to-text AI released by OpenAI.

My Mac environment:
M1 Max, 24 cores
Python 3.9
32 GB of memory

Setting up the whisper environment

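I'll install straight into my existing environment this time, but if you want to keep things separate, create a virtual environment with pyenv or a similar tool.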

whisper installs without any trouble using the pip command.

pip install git+https://github.com/openai/whisper.git

Next, install ffmpeg.
I'll skip how to use Homebrew here.

brew install ffmpeg
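
Before moving on, it can help to confirm that both pieces are actually visible from Python. The following is a minimal sketch, not part of the original setup: the function name `check_whisper_prerequisites` is made up for illustration, and it only checks that the `ffmpeg` binary is on PATH and that a package named `whisper` can be found.

```python
import importlib.util
import shutil


def check_whisper_prerequisites():
    """Return a dict saying whether each prerequisite was found (hypothetical helper)."""
    return {
        # whisper calls the ffmpeg binary to decode audio, so it must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec locates the package without importing it (and without loading torch).
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


for name, found in check_whisper_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If either line reports MISSING, rerunning the corresponding install command above is the first thing to try.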

If you'd rather git clone the repository, it's here:
OpenAI Whisper

git clone https://github.com/openai/whisper.git

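I made an audio file of a written sentence read aloud slowly:
「徳川埋蔵金は、江戸時代末期の1867年に江戸幕府が大政奉還に際し、密かに埋蔵したとされる幕府再興のための軍資金である。」
(In English: "The Tokugawa buried treasure is a war chest for restoring the shogunate, said to have been secretly buried by the Edo shogunate in 1867, at the end of the Edo period, when it returned political power to the emperor.")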

whisper tokugawa.wav --language Japanese

[00:00.000 --> 00:07.360] 徳川埋蔵金は、江戸時代末期の1867年に、江戸幕府が体制放寺に際し、
[00:07.360 --> 00:33.500] ひそかに埋蔵したとされる、幕府最高のための軍資金である。
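
The same transcription can presumably be driven from Python instead of the CLI. This is a rough sketch under a few assumptions: `format_timestamp`, `format_segment`, and `transcribe_file` are names invented here, the model defaults to "small" to match the CLI default shown later in this article, and `fp16=False` is passed because the run is on CPU. Only `whisper.load_model` and `model.transcribe` come from the whisper package itself.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm, the layout used in the output lines above."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(segment):
    """Render one segment dict (keys: start, end, text) as a single line."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} --> {end}] {segment['text'].strip()}"


def transcribe_file(path, model_name="small", language="ja"):
    """Transcribe an audio file on CPU and return formatted lines (hypothetical helper)."""
    import whisper  # imported lazily so the formatters work without whisper installed

    model = whisper.load_model(model_name, device="cpu")
    # fp16=False avoids the "FP16 is not supported on CPU" warning.
    result = model.transcribe(path, language=language, fp16=False)
    return [format_segment(segment) for segment in result["segments"]]
```

Calling `transcribe_file("tokugawa.wav")` should give lines in the same shape as the CLI output, though the exact text depends on the model.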

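幕府再興 (bakufu saikō, "restoration of the shogunate") came out as 幕府最高 ("the shogunate is the best"). The two are homophones, so this is an understandable mistake if you only hear the sound without knowing the surrounding text or background.
大政奉還 (taisei hōkan, the return of political power to the emperor) turning into 体制放寺 is a bit of a shame.
Maybe it handed back the 体制 (taisei) part first. Funny.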

Even on the M1 Max, CPU processing took about 34 seconds.
Not bad.
To run the computation on the GPU, it looks like the package itself will need a bit of tinkering.
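
To put a number on runs like this one, a small timer is enough. This is a generic sketch rather than how the 34-second figure was measured; `timed` is a hypothetical helper that wraps any callable, so it can be used to compare a CPU run against a GPU run later.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example usage (assumes whisper is installed and tokugawa.wav exists):
#   import whisper
#   model = whisper.load_model("small", device="cpu")
#   result, seconds = timed(model.transcribe, "tokugawa.wav", language="ja", fp16=False)
#   print(f"transcribed in {seconds:.1f} s")
```

Timing only the `transcribe` call leaves out model loading, which the CLI wall-clock time includes, so the two numbers will not match exactly.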

Next, I'll try it with --model large.

satoshi@SatoshinoMacBook-Pro whisper % whisper ./sound/tokugawa.wav --language Japanese --model large
100%|█████████████████████████████████████| 2.87G/2.87G [16:37<00:00, 3.10MiB/s]
/opt/homebrew/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:04.900] 徳川埋蔵金は、江戸時代末期の1867年に、
[00:04.900 --> 00:07.320] 江戸幕府が大政奉還に際し、
[00:07.320 --> 00:30.320] 密かに埋蔵したとされる、幕府最高のための軍資金である。

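大政奉還 came out correctly this time! Impressive! (幕府再興 is still transcribed as 幕府最高, though.)
It feels like you could drop a video in as-is and get a pretty decent transcription.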

Bonus: I want to use the M1 Max GPU

Let's specify mps with --device mps and use the GPU!

RuntimeError: don't know how to restore data location of torch.storage._UntypedStorage (tagged with mps)

That doesn't seem to work.
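
Before editing anything, it may be worth checking whether the installed PyTorch reports an MPS backend at all. A minimal sketch, assuming PyTorch 1.12 or later for `torch.backends.mps`; the helper name `pick_device` is made up, and it falls back to "cpu" when torch is missing or too old to have the attribute.

```python
def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: nothing but CPU to offer.
        return "cpu"

    # Older PyTorch builds have no torch.backends.mps, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```

Getting "mps" here only means the backend is available; it does not guarantee that every operation whisper needs is implemented on it.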

Let's try editing
/opt/homebrew/lib/python3.9/site-packages/whisper/transcribe.py

I'll change the following two places.

    if model.device == torch.device("cpu"):
        if torch.cuda.is_available():
            warnings.warn("Performing inference on CPU when CUDA is available")
        if dtype == torch.float16:
            warnings.warn("FP16 is not supported on CPU; using FP32 instead")
            dtype = torch.float32

def cli():
    from . import available_models

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe")
    parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use")
    parser.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu", help="device to use for PyTorch inference")
    parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs")

After the change

    if model.device == torch.device("cpu"):
        if torch.backends.mps.is_available():  # <- changed here
            warnings.warn("Performing inference on CPU when MPS is available")
        if dtype == torch.float16:
            warnings.warn("FP16 is not supported on CPU; using FP32 instead")
            dtype = torch.float32

def cli():
    from . import available_models

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("audio", nargs="+", type=str, help="audio file(s) to transcribe")
    parser.add_argument("--model", default="small", choices=available_models(), help="name of the Whisper model to use")
    parser.add_argument("--device", default="mps" if torch.backends.mps.is_available() else "cpu", help="device to use for PyTorch inference")  # <- changed here
    parser.add_argument("--output_dir", "-o", type=str, default=".", help="directory to save the outputs")

I'll also edit
/opt/homebrew/lib/python3.9/site-packages/whisper/__init__.py

After the change

    if device is None:
        device = "mps" if torch.backends.mps.is_available() else "cpu"

Now, once more:

% whisper ./sound/tokugawa.wav --language Japanese --model large --device mps
Traceback (most recent call last):
  File "/opt/homebrew/bin/whisper", line 8, in <module>
    sys.exit(cli())
  File "/opt/homebrew/lib/python3.9/site-packages/whisper/transcribe.py", line 297, in cli
    model = load_model(model_name, device=device)
  File "/opt/homebrew/lib/python3.9/site-packages/whisper/__init__.py", line 103, in load_model
    checkpoint = torch.load(fp, map_location=device)
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 970, in restore_location
    return default_restore_location(storage, map_location)
  File "/opt/homebrew/lib/python3.9/site-packages/torch/serialization.py", line 178, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage._UntypedStorage (tagged with mps)

It may be that torch itself needs to be modified. I'll look into it a bit more, but
for now, at least it runs on the M1 Max.
That's all for this time.
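
As one possible next step, the traceback shows the failure happening inside `torch.load(fp, map_location=device)`, that is, while restoring the checkpoint directly onto mps. An idea that has not been verified in this article is to load the checkpoint on CPU first and only then move the model. The sketch below is an assumption, not a confirmed fix: `load_on_cpu_then_move` is a hypothetical helper, and later steps may still fail if the MPS backend lacks operations whisper relies on.

```python
def load_on_cpu_then_move(load_fn, model_name, target_device):
    """Load a model on CPU, then move it to target_device (untested idea).

    load_fn is expected to behave like whisper.load_model(name, device=...).
    """
    # Restoring on CPU sidesteps the map_location="mps" path seen in the traceback.
    model = load_fn(model_name, device="cpu")
    # Module.to() moves the parameters and returns the module.
    return model.to(target_device)


# Example usage (assumes whisper is installed; not confirmed to work on mps):
#   import whisper
#   model = load_on_cpu_then_move(whisper.load_model, "large", "mps")
#   result = model.transcribe("./sound/tokugawa.wav", language="ja")
```

Passing the loader in as an argument keeps the helper independent of whisper, so the idea can be tried with any function that has the same shape.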

References
https://vivinko.com/inoue/blog/2022/09/22/231252.html
