Trying OpenAI Whisper in a Local Environment

Posted at 2024-04-06

Created: June 3, 2023
Updated: February 10, 2024 (added how to check that the GPU and CUDA are enabled in PyTorch)

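OpenAI Whisper is an AI-based system that automatically transcribes speech. Given audio such as recordings of meetings or lectures, it produces a transcript with high accuracy, which saves time and effort.
This article verifies that OpenAI Whisper runs locally in a Python environment on a Windows 11 PC with an NVIDIA GPU.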

Components and their roles

No	Component	Description	Role
1	Build Tools for Visual Studio 2022 (Ver. 17.6.2)	The command-line C/C++ compiler from Visual Studio 2022.	Host compiler that the CUDA Toolkit (nvcc) relies on under Windows.
2	NVIDIA driver
3	NVIDIA CUDA Toolkit 11.8 (*1)
4	ZLIB.DLL (zlibwapi.dll)	A library for the compression algorithm used by Zip and gzip.	Required for cuDNN to run.
5	cuDNN v8.9.2 (June 1st, 2023), for CUDA 11.x	The cuDNN library package for CUDA Toolkit 11.

(*1) According to the Support Matrix in the NVIDIA Docs, CUDA Toolkit 11.8 appears to give the best performance for cuDNN 8.9.2 on GPUs other than the H100:
"For best performance, the recommended configuration is cuDNN 8.9.2 on H100 with CUDA 12.0, and cuDNN 8.9.2 on all other GPUs with CUDA 11.8, because this is the configuration that was used for tuning heuristics."

CUDA Toolkit, TensorFlow, cuDNN, and Python versions

No	CUDA version	TensorFlow version	cuDNN version	Python version
1	12.1		8.9.2
2	11.8		8.9.2	3.10 (unconfirmed)
3	11.2	2.6	8.2	3.6-3.10
4	11.0	2.5	8.0	3.5-3.10
5	11.0	2.4	8.0	3.5-3.8
6	10.1	2.3	7.6	3.5-3.8

Installation

Build Tools for Visual Studio 2022

From Microsoft's Visual Studio download page, select "All Downloads", then download [Tools for Visual Studio] ⇒ [Build Tools for Visual Studio 2022] (vs_BuildTools.exe).

Run vs_BuildTools.exe with normal privileges (not as administrator) to start the installation. If you are asked whether to update, it is fine to do so; releases within Visual Studio 2022 are presumably compatible with each other.

NVIDIA CUDA Toolkit 11.8

As of June 3, 2023, the latest version was 12.1.1 (April 2023), but following the note in the Support Matrix in the NVIDIA Docs, I deliberately downloaded 11.8.0 (October 2022). Download it from the CUDA Toolkit Archive page.

The download is about 3.0 GB, and the file name was "cuda_11.8.0_522.06_windows.exe".

The options I selected under "Select Target Platform" on the download page are listed below.

No	Item	Selection	Notes
1	Operating System	Windows	For Windows.
2	Architecture	x86_64	64-bit is the only choice.
3	Version	11	Select "11" for Windows 11.
4	Installer Type	exe (local)

Run cuda_11.8.0_522.06_windows.exe with normal privileges (not as administrator) to start the installation.

The options I selected during installation are listed below.

No	Item	Selection	Notes
1	Extraction path	C:\Users\portf\AppData\Local\Temp\CUDA	Left at the default.
2	License agreement	Agree and continue
3	Installation options	Custom (Advanced)
4	Custom installation options	☑ CUDA / ☐ Other components / ☐ Driver components	Only CUDA is checked; Other components and Driver components are unchecked.
5	Select installation location	CUDA Documentation: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8 / CUDA Development: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8	Left at the default.
6	CUDA Visual Studio Integration	☑ I understand, and wish to continue the installation regardless.	Shown when Visual Studio Community (the full IDE) is not installed, even if Build Tools for Visual Studio 2022 is. It is safe to check the box and continue.

Note on item 4 (Driver components): as stated at https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html, CUDA Toolkit 11.8 only requires driver version 452.39 or later, so I unchecked Driver components. In my case the installer showed:
Current version: 531.68
New version: 522.06
The NVIDIA driver bundled with CUDA Toolkit 11.8 (522.06) was older than the one in my environment, so I kept the installed 531.68.

Note on item 6: after installation, everything is fine if Nsight Monitor is shown as Installed on the "Nsight Visual Studio Edition Summary" screen.

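The nvcc command provided by NVIDIA CUDA Toolkit 11.8 uses the Windows %TEMP% environment variable, and it does not work properly when the login user name contains Japanese characters.
If your login user name is in Japanese, change the %TEMP% environment variable to a path that contains no Japanese characters.
Go to [Start] ⇒ [Settings] ⇒ [System] ⇒ [About] ⇒ [Advanced system settings] ⇒ [Environment Variables] and set TEMP to a folder whose path contains no Japanese characters.

Before: TEMP C:\Users\鈴木太郎\AppData\Local\Temp
After: TEMP C:\TEMP

Note: create C:\TEMP beforehand.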
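
The TEMP check described above can be sketched in Python. This is only an illustrative helper, not part of the CUDA Toolkit: the function names are hypothetical, and checking TMP in addition to TEMP is my own assumption (the article only mentions TEMP).

```python
import os


def has_non_ascii(path: str) -> bool:
    """Return True if the path contains any non-ASCII character."""
    return not path.isascii()


def check_temp_dirs(env=None) -> dict:
    """Map TEMP/TMP to True when their value contains non-ASCII characters.

    Checking TMP as well is an assumption; the article only names TEMP.
    """
    env = os.environ if env is None else env
    return {name: has_non_ascii(env[name]) for name in ("TEMP", "TMP") if name in env}


if __name__ == "__main__":
    for name, bad in check_temp_dirs().items():
        status = "contains non-ASCII characters" if bad else "OK"
        print(f"{name}={os.environ[name]} -> {status}")
```

If either variable is reported as containing non-ASCII characters, point it at a plain path such as C:\TEMP as described above.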

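Checking the system environment variables

If the installation succeeded, the following system environment variables are set.
If they are not set, try signing out of Windows and signing in again.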

PATH
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
C:\Program Files\NVIDIA Corporation\Nsight Compute 2022.3.0
CUDA_PATH
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
CUDA_PATH_V11_8
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
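
As a rough way to confirm the variables listed above without opening the settings dialog, a small script along these lines could be used. It is a sketch under the assumption that the toolkit was installed to the default location shown in the listing; the helper names are hypothetical, and the Nsight Compute PATH entry is deliberately left out because its version number may differ.

```python
import os

# Values taken from the listing above; adjust if you installed elsewhere.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
EXPECTED_VARS = {"CUDA_PATH": CUDA_ROOT, "CUDA_PATH_V11_8": CUDA_ROOT}
EXPECTED_PATH_ENTRIES = [CUDA_ROOT + r"\bin", CUDA_ROOT + r"\libnvvp"]


def _norm(path: str) -> str:
    """Normalize a Windows path for comparison (case and trailing slash)."""
    return path.rstrip("\\/").lower()


def find_problems(env) -> list:
    """Return a list of human-readable problems; empty means all is well."""
    problems = []
    for name, expected in EXPECTED_VARS.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set")
        elif _norm(actual) != _norm(expected):
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    # PATH entries are separated by ';' on Windows.
    path_entries = {_norm(p) for p in env.get("PATH", "").split(";") if p}
    for entry in EXPECTED_PATH_ENTRIES:
        if _norm(entry) not in path_entries:
            problems.append(f"PATH is missing {entry}")
    return problems


if __name__ == "__main__":
    issues = find_problems(os.environ)
    print("\n".join(issues) if issues else "CUDA environment variables look fine.")
```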

Installing the ZLIB DLL

NVIDIA cuDNN needs ZLIB.DLL to run, so open Command Prompt as administrator and install it with the following commands.

cd "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"
curl -O http://www.winimage.com/zLibDll/zlib123dllx64.zip
call powershell -command "Expand-Archive -Path zlib123dllx64.zip"
copy zlib123dllx64\dll_x64\zlibwapi.dll .

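Joining the NVIDIA Developer Program

Downloading NVIDIA cuDNN requires membership in the NVIDIA Developer Program. Click "Join the NVIDIA Developer Program" on the NVIDIA Developer Program page to join the free program.

Creating an account requires an email address and a password for the NVIDIA Developer Program.

Your email address is verified during registration, so when "Verify Your Email" is displayed, check your mail and press the button that confirms your email address.
You are then asked about notification settings and whether you want information such as new SDK announcements; check what you need and click the register button.
When the personal profile screen appears, enter your name, organization, and so on.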

NVIDIA cuDNN

Click "Download cuDNN" on the NVIDIA cuDNN page to download cuDNN.
Example: cuDNN v8.9.2 (June 1st, 2023), for CUDA 11.x
Newer cuDNN versions will be released in the future, but choosing one marked "for CUDA 11.x" should be fine.
For the installer, download "Local Installer for Windows (Zip)".

Extract the downloaded cudnn-windows-x86_64-8.9.2.26_cuda11-archive.zip and copy its contents into
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8.

Add CUDNN_PATH to the system environment variables.

[Start] ⇒ [Settings] ⇒ [System] ⇒ [About] ⇒ [Advanced system settings] ⇒ [Environment Variables]

CUDNN_PATH
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
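
To double-check that the copy steps above put the DLLs where they are expected, something like the following sketch could help. The helper name is hypothetical, and the file name cudnn64_8.dll is an assumption about what a cuDNN 8.x archive contains; zlibwapi.dll is the file copied in the ZLIB step.

```python
import os
from pathlib import Path

# Assumed file names: zlibwapi.dll comes from the ZLIB step;
# cudnn64_8.dll is what a cuDNN 8.x archive is assumed to provide.
EXPECTED_DLLS = ("zlibwapi.dll", "cudnn64_8.dll")


def missing_dlls(cuda_root, names=EXPECTED_DLLS) -> list:
    """Return the expected DLL names that are not present in <cuda_root>/bin."""
    bin_dir = Path(cuda_root) / "bin"
    return [name for name in names if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    root = os.environ.get("CUDNN_PATH")
    if root is None:
        print("CUDNN_PATH is not set")
    else:
        missing = missing_dlls(root)
        print("Missing: " + ", ".join(missing) if missing else "All expected DLLs found.")
```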

Installing PyTorch

Install the CUDA-enabled (cu118) build of PyTorch here.
If you skip this step, you end up with the CPU-only build.

pip install -U pip
pip install -U torch torchvision torchaudio numpy --index-url https://download.pytorch.org/whl/cu118
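
One way to tell which build ended up installed is to look at the suffix of the PyTorch version string: wheels from the cu118 index carry a "+cu118" local tag (as in torch-2.2.2+cu118). The helper below is a hypothetical sketch of that check, not an official PyTorch API.

```python
import re


def build_tag(version: str):
    """Return the local build tag of a version string, or None if absent.

    For example "2.2.2+cu118" yields "cu118".
    """
    match = re.search(r"\+(\w+)$", version)
    return match.group(1) if match else None


if __name__ == "__main__":
    import torch  # imported here so build_tag() can be used without PyTorch

    tag = build_tag(str(torch.__version__))
    print(f"torch {torch.__version__}: build tag = {tag}")
    if tag != "cu118":
        print("This does not look like the cu118 build.")
```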

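Installing FFmpeg

FFmpeg is installed here through the ImageMagick installer, which offers it as an optional task.
Download ImageMagick from [Download] ⇒ [Windows Binary Release] on the ImageMagick website.

As of June 3, 2023, ImageMagick-7.1.1-11-Q16-x64-static.exe was the latest.
When the installer starts, confirm that "Install FFmpeg" is checked in the "Select Additional Tasks" section, then continue the installation.
("Install FFmpeg" is checked by default.)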
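
Whisper calls the ffmpeg command to decode audio, so it has to be reachable through PATH. A minimal sketch for checking that from Python, using only the standard library (the function name is hypothetical):

```python
import shutil
import subprocess


def find_ffmpeg(name: str = "ffmpeg"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    exe = find_ffmpeg()
    if exe is None:
        print("ffmpeg was not found on PATH")
    else:
        # "ffmpeg -version" prints version information and exits.
        out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
        lines = out.splitlines()
        print(lines[0] if lines else exe)
```

If ffmpeg is not found right after installation, opening a new Command Prompt so that the updated PATH is picked up may be enough.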

Installing Whisper

Install the Python whisper module

pip install -U git+https://github.com/openai/whisper.git

Cloning the Whisper repository

The Whisper source is obtained with the git command.
To place Whisper on the D drive, enter the following commands in Command Prompt.

D:
git clone --recursive https://github.com/openai/whisper.git

Running Whisper
As a test, transcribe the audio file of President John F. Kennedy's inaugural address of January 20, 1961 (jfk.flac), found in the tests folder used by pytest.

D:\whisper>whisper D:\whisper\tests\jfk.flac --model large --language English
[00:00.000 --> 00:11.000]  And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

The output is the famous line from the inaugural address, transcribed correctly.
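
The same transcription can also be run from Python instead of the whisper command. The sketch below uses whisper.load_model and model.transcribe from the openai-whisper package; the helper functions and the timestamp formatting are my own illustration that mimics the CLI output, not code from the Whisper repository.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm (hours are not split out in this sketch)."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segment(segment: dict) -> str:
    """Render one segment as "[start --> end] text"."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} --> {end}] {segment['text']}"


def transcribe(path: str, model_name: str = "large", language: str = "en") -> list:
    """Transcribe an audio file and return one formatted line per segment."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    return [format_segment(segment) for segment in result["segments"]]


if __name__ == "__main__":
    for line in transcribe(r"D:\whisper\tests\jfk.flac"):
        print(line)
```

The first call downloads the model weights if they are not cached yet, so it can take a while with the large model.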

Installing Whisper webui

Whisper webui is cloned from GitHub and installed manually.

D:
git clone https://github.com/AndreMarkert/whisper-webui.git

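Installation instructions are also given on the GitHub page https://github.com/AndreMarkert/whisper-webui, but if you follow them as written, pip installs a separate copy of Whisper, different from the one set up with cuDNN 8.9.2.

Instead, install the Python modules that webui needs by hand, referring to requirements.txt.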

python -m venv venv
venv\Scripts\activate.bat
pip install pytube==15.0.0
pip install moviepy==1.0.3
pip install gradio==3.5
pip install -U git+https://github.com/openai/whisper.git
pip install -U torch torchvision torchaudio numpy --index-url https://download.pytorch.org/whl/cu118
pip install httpx==0.24.1
pip install python-dotenv==1.0.1

After that, as the GitHub page explains, copy .env.example to .env and change the DEFAULT_* parameters as needed.
The DEFAULT_* parameters are the values selected by default right after webui starts.

I changed DEFAULT_MODEL from small to large.

# multilingual, english-only
DEFAULT_MODEL_LANGUAGE=multilingual
# tiny, base, small, medium, large
DEFAULT_MODEL=large
# auto, english, german, ... (see options.py for the full list)
DEFAULT_LANGUAGE=auto
# relative or full path
DOWNLOAD_PATH=./downloads
# relative or full path
RECORDING_PATH=./recordings
# relative or full path
OUTPUT_PATH=./output
# True/true/1 or False/false/0
YT_SAVE_AUDIO=False
# True/true/1 or False/false/0
YT_USE_CACHE=True
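
To illustrate how a file in this format is interpreted, here is a minimal stand-alone parser using only the standard library. It is a sketch, not the webui's own code: the webui presumably loads the file with python-dotenv (it appears in the install commands above), and the function names here are hypothetical. The boolean rule follows the "True/true/1 or False/false/0" comment in the file.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            values[key.strip()] = value.strip()
    return values


def as_bool(value: str) -> bool:
    """Interpret True/true/1 as True and anything else as False."""
    return value.strip().lower() in ("true", "1")


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as f:
        settings = parse_env(f.read())
    print("model:", settings.get("DEFAULT_MODEL"))
    print("use cache:", as_bool(settings.get("YT_USE_CACHE", "False")))
```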

Starting Whisper webui
Create a batch file such as webui-user.bat with the following contents and run it.

@echo off

REM Refer to https://github.com/AndreMarkert/whisper-webui
python app.py

If the following error occurs when starting webui, try "pip install httpx==0.24.1".

(venv) Q:\OneDrive\Python\whisper-webui>python app.py
Traceback (most recent call last):
  File "Q:\OneDrive\Python\whisper-webui\app.py", line 1, in <module>
    import gradio as gr
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio.components as components
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\gradio\components.py", line 39, in <module>
    from gradio import media_data, processing_utils, utils
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\gradio\processing_utils.py", line 20, in <module>
    from gradio import encryptor, utils
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\gradio\utils.py", line 395, in <module>
    class Request:
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\gradio\utils.py", line 415, in Request
    client = httpx.AsyncClient()
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\httpx\_client.py", line 1397, in __init__
    self._transport = self._init_transport(
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\httpx\_client.py", line 1445, in _init_transport
    return AsyncHTTPTransport(
  File "Q:\OneDrive\Python\whisper-webui\venv\lib\site-packages\httpx\_transports\default.py", line 275, in __init__
    self._pool = httpcore.AsyncConnectionPool(
TypeError: AsyncConnectionPool.__init__() got an unexpected keyword argument 'socket_options'
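
The traceback suggests a version mismatch between httpx and httpcore: httpx passes a socket_options argument that the installed httpcore does not accept. Before pinning httpx, it can help to see which versions are actually installed in the venv. A small sketch using the standard library's importlib.metadata (the helper name is hypothetical):

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("gradio", "httpx", "httpcore"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
```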

About NVIDIA cuDNN

The NVIDIA driver is required to use CUDA. CUDA is NVIDIA's programming platform for parallel computation on its GPUs.

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library of primitives for deep learning, and it speeds up both training and inference. It is one of several libraries that build on CUDA, and combining them makes fast deep learning processing possible.

According to NVIDIA's Support Matrix, cuDNN 8.9.2 performs best with CUDA 12.0 on the H100 and with CUDA 11.8 on all other GPUs. cuDNN can be downloaded after joining the NVIDIA Developer Program. See NVIDIA's official site for details.

As noted above, the NVIDIA driver is required to use CUDA, and the required driver version may differ depending on the CUDA and cuDNN versions. cuDNN also needs zlibwapi.dll to work properly.

Checking that the GPU and CUDA are available in PyTorch

To check that the GPU driver and CUDA are enabled and accessible from PyTorch, run the following code, which returns whether CUDA is available.

import torch
torch.cuda.is_available()

Run the following command from Command Prompt; if it prints True, the GPU and CUDA are enabled.

python -c "import torch;print(torch.cuda.is_available())"

Example run

(venv) Q:\OneDrive\Python\whisper-webui>python -c "import torch;print(torch.cuda.is_available())"
True

(venv) Q:\OneDrive\Python\whisper-webui>
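
When the one-liner prints False, a bit more detail can narrow down the cause. The sketch below collects a few values PyTorch exposes (torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.get_device_name); the cuda_report function itself is a hypothetical helper, and it returns early if PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Collect basic facts about the PyTorch/CUDA setup into a dict."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": str(torch.__version__),
        "cuda_available": torch.cuda.is_available(),
        # CUDA version PyTorch was built with; None on CPU-only builds.
        "cuda_build": torch.version.cuda,
        # cuDNN version PyTorch sees, or None if unavailable.
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A cuda_build of None would point to the CPU-only wheel, in which case reinstalling from the cu118 index is the likely fix.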

If the command above prints False even though cuDNN, the drivers, and so on are already installed, reinstall PyTorch.

(venv) Q:\OneDrive\Python\whisper-webui> python -c "import torch;print(torch.cuda.is_available())"
False
(venv) Q:\OneDrive\Python\whisper-webui> pip install -U pip
Requirement already satisfied: pip in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (24.0)
PS K:\whisper> pip install -U torch torchvision torchaudio numpy --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Requirement already satisfied: torch in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (2.0.0)
Collecting torch
  Downloading https://download.pytorch.org/whl/cu118/torch-2.2.2%2Bcu118-cp310-cp310-win_amd64.whl (2704.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.7/2.7 GB 1.7 MB/s eta 0:00:00
Requirement already satisfied: torchvision in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (0.15.1)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu118/torchvision-0.17.2%2Bcu118-cp310-cp310-win_amd64.whl (4.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 26.2 MB/s eta 0:00:00
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu118/torchaudio-2.2.2%2Bcu118-cp310-cp310-win_amd64.whl (4.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.0/4.0 MB 50.7 MB/s eta 0:00:00
Requirement already satisfied: numpy in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (1.26.3)
Requirement already satisfied: filelock in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (4.9.0)
Requirement already satisfied: sympy in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (1.12)
Requirement already satisfied: networkx in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (3.2.1)
Requirement already satisfied: jinja2 in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (3.1.3)
Requirement already satisfied: fsspec in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torch) (2023.12.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from torchvision) (10.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from jinja2->torch) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in c:\users\portf\appdata\local\programs\python\python310\lib\site-packages (from sympy->torch) (1.3.0)
Installing collected packages: torch, torchvision, torchaudio
  Attempting uninstall: torch
    Found existing installation: torch 2.0.0
    Uninstalling torch-2.0.0:
      Successfully uninstalled torch-2.0.0
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.15.1
    Uninstalling torchvision-0.15.1:
      Successfully uninstalled torchvision-0.15.1
Successfully installed torch-2.2.2+cu118 torchaudio-2.2.2+cu118 torchvision-0.17.2+cu118
(venv) Q:\OneDrive\Python\whisper-webui> python -c "import torch;print(torch.cuda.is_available())"
True
(venv) Q:\OneDrive\Python\whisper-webui>
