WhisperX Setup and Usage Guide (Mac / Apple Silicon)

Posted at 2026-03-29

Setup

Install Miniconda with Homebrew and initialize conda for zsh:

```shell
brew install miniconda
conda init zsh
```

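After restarting the terminal (so the changes made by `conda init` take effect):

```shell
# Disable automatic activation of the base environment (recommended)
conda config --set auto_activate_base false

conda create --name whisperx python=3.10 -y
conda activate whisperx
```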

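Then install PyTorch, ffmpeg, and WhisperX into the environment:

```shell
# PyTorch packages pinned to matching versions (torch 2.8.0 pairs with torchvision 0.23.0)
pip install torch==2.8.0 torchaudio==2.8.0
pip install torchvision==0.23.0
# WhisperX uses ffmpeg to decode audio files
brew install ffmpeg
pip install whisperx
```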
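
Before moving on, it can help to confirm that the tools installed above are reachable from the shell. The sketch below is a hypothetical helper (`check_commands` is not part of conda or WhisperX) that uses the POSIX `command -v` builtin to report each named command as found or missing.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns 0 only if all of them are found.
check_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'ok: %s\n' "$cmd"
    else
      printf 'missing: %s\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the activated environment, run `check_commands conda ffmpeg whisperx`; a nonzero exit status means at least one of them is missing. To confirm the pinned PyTorch version, `python -c "import torch; print(torch.__version__)"` should print `2.8.0`.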

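To use speaker diarization, you also need a Hugging Face access token:

1. Create a free account at huggingface.co (a personal address such as Gmail is fine).
2. Open each model page that speaker diarization depends on and click "Agree" to accept its terms of use.
3. Under Settings → Access Tokens, create a token:
   - Type: Read
   - Copy the generated string that starts with hf_ and save it somewhere safe.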
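
The diarization command passes the token through an environment variable, `HF_TOKEN`. A minimal sketch of a guard that stops early when that variable is empty or does not start with `hf_` might look like this; `require_hf_token` is a hypothetical name, and the prefix check only catches obvious copy-paste mistakes, not an invalid or revoked token or a model whose terms were not accepted.

```shell
# Hypothetical guard: succeed only if HF_TOKEN looks like a Hugging Face
# token (starts with "hf_" followed by at least one character).
# It cannot tell whether the token is actually valid.
require_hf_token() {
  case "${HF_TOKEN:-}" in
    hf_?*) return 0 ;;
    '')    echo "HF_TOKEN is not set" >&2; return 1 ;;
    *)     echo "HF_TOKEN does not look like a Hugging Face token (hf_...)" >&2; return 1 ;;
  esac
}
```

Run `require_hf_token` right before a `--diarize` command; it returns 1 and prints the reason to stderr when the variable is unusable.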

Usage

Activate the environment first:

```shell
conda activate whisperx
```

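Basic transcription (English audio with the `large-v2` model):

```shell
# On a Mac, WhisperX runs on the CPU, where the default float16 is not supported, so use float32
whisperx audio.wav \
  --model large-v2 \
  --language en \
  --compute_type float32
```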

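With speaker diarization (uses the Hugging Face token created above):

```shell
export HF_TOKEN=hf_xxxx...   # replace with your own token
whisperx audio.wav \
  --model large-v2 \
  --language en \
  --compute_type float32 \
  --diarize \
  --hf_token "${HF_TOKEN}"
```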
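
The two commands above differ only in the diarization flags. As a sketch, a small shell function (the name `transcribe` and the `DRY_RUN` variable are hypothetical, not WhisperX options) can build the argument list once and add `--diarize --hf_token` only when `HF_TOKEN` is set. With `DRY_RUN=1` it prints the command instead of running it, which is handy for checking flags. It builds arguments with `set --` rather than a bash array so it also works in zsh.

```shell
# Hypothetical wrapper for the two whisperx commands above.
# Usage: transcribe FILE [LANGUAGE]   (LANGUAGE defaults to "en")
# Adds --diarize and --hf_token only when HF_TOKEN is non-empty.
# With DRY_RUN=1 it prints the command instead of running it.
transcribe() {
  if [ $# -lt 1 ]; then
    echo "usage: transcribe FILE [LANGUAGE]" >&2
    return 2
  fi
  local file=$1 lang=${2:-en}

  # Base command: same flags as the plain transcription example
  set -- "$file" --model large-v2 --language "$lang" --compute_type float32

  # Diarization needs the Hugging Face token
  if [ -n "${HF_TOKEN:-}" ]; then
    set -- "$@" --diarize --hf_token "$HF_TOKEN"
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo whisperx "$@"
  else
    whisperx "$@"
  fi
}
```

For example, `transcribe audio.wav` handles English audio and `transcribe interview.wav ja` does the same for Japanese.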

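Output options (add them to either command):

```shell
# If --output_format is omitted, WhisperX writes every format (default: all)

# Plain text file
--output_format txt

# Subtitle file
--output_format srt

# Detailed output with timestamps and speaker labels
--output_format json

# Output folder (defaults to the current directory)
--output_dir ./output
```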
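
WhisperX names each output file after the input file's base name with the extension swapped, so `audio.wav` with `--output_format srt --output_dir ./output` should produce `./output/audio.srt`. The hypothetical helper below only computes that path, which can be useful in scripts that post-process the result; it assumes this naming scheme holds for your WhisperX version.

```shell
# Hypothetical helper: print the path whisperx is expected to write for
# INPUT in OUTPUT_DIR with FORMAT (assumes <dir>/<input base name>.<format>).
# Usage: output_file INPUT OUTPUT_DIR FORMAT
output_file() {
  if [ $# -ne 3 ]; then
    echo "usage: output_file INPUT OUTPUT_DIR FORMAT" >&2
    return 2
  fi
  local base
  base=$(basename "$1")
  # Strip the last extension: meeting.m4a -> meeting
  printf '%s/%s.%s\n' "$2" "${base%.*}" "$3"
}
```

On macOS, `open "$(output_file audio.wav ./output srt)"` opens the subtitle file in its default app.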

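Main options:

| Option | Description | Example |
|---|---|---|
| `--model` | Model size | `tiny` / `base` / `small` / `medium` / `large-v2` |
| `--language` | Language of the audio | `ja` (Japanese) / `en` (English) |
| `--compute_type` | Computation precision | `float32` (recommended on Mac) |
| `--diarize` | Enable speaker diarization | Flag only (no value) |
| `--hf_token` | Hugging Face token | `hf_xxxx...` |
| `--output_format` | Output format (default: `all`) | `txt` / `srt` / `json` |
| `--output_dir` | Output folder (default: current directory) | `./output` |
| `--min_speakers` | Minimum number of speakers | `2` |
| `--max_speakers` | Maximum number of speakers | `5` |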
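
The options in the table can be combined in a single command. As an illustration only, the hypothetical function below turns a Japanese recording into a speaker-labelled SRT file in `./output`, and rejects speaker counts that are not integers, or where the minimum exceeds the maximum, before anything runs. It assumes `HF_TOKEN` is already set; the defaults of 2 and 5 speakers simply reuse the table's examples, and `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical example combining options from the table:
# Japanese audio -> speaker-labelled SRT in ./output.
# Assumes HF_TOKEN holds your Hugging Face token.
# DRY_RUN=1 prints the whisperx command instead of running it.
meeting_to_srt() {
  if [ $# -lt 1 ]; then
    echo "usage: meeting_to_srt FILE [MIN_SPEAKERS] [MAX_SPEAKERS]" >&2
    return 2
  fi
  local file=$1 min=${2:-2} max=${3:-5} n

  # Both speaker counts must be plain non-negative integers
  for n in "$min" "$max"; do
    case "$n" in
      ''|*[!0-9]*) echo "speaker counts must be integers: '$n'" >&2; return 2 ;;
    esac
  done
  # ...and satisfy 1 <= min <= max
  if [ "$min" -lt 1 ] || [ "$min" -gt "$max" ]; then
    echo "need 1 <= min_speakers <= max_speakers (got $min, $max)" >&2
    return 2
  fi

  set -- "$file" --model large-v2 --language ja --compute_type float32 \
    --diarize --min_speakers "$min" --max_speakers "$max" \
    --hf_token "${HF_TOKEN:-}" \
    --output_format srt --output_dir ./output

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo whisperx "$@"
  else
    whisperx "$@"
  fi
}
```

For example, `meeting_to_srt meeting.wav 2 4` transcribes a recording expected to contain two to four speakers.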