Transcribing Video and Audio Files


Posted at 2025-01-19, last updated at 2025-01-22

Overview of the steps

Extract the audio track from a video file
Transcribe the audio into text

Prerequisites

Environment

Mac (Apple M3), macOS Sequoia
Python 3.9.9

Software used

ffmpeg
whisper
- a high-quality speech transcription tool released by OpenAI

Preparing an input file

Prepare any video or audio file you like.
A short one is a good choice for a first try.

Setup

My environment already had a lot installed, so you may need packages beyond the ones listed below.

cd ~/working_dir
pip install openai-whisper
brew install ffmpeg
brew install cmake
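You may need more than the packages above. A quick way to see which command-line tools are actually on your PATH is a small standard-library check like the one below. It is a sketch that uses only `shutil.which`. The tool list is an assumption based on the commands in this article, so extend it for your own setup.

```python
import shutil
from typing import Dict, Iterable, List

# Tools used in this article (assumed list; add anything else your setup needs).
REQUIRED_TOOLS = ("ffmpeg", "whisper", "cmake")


def check_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {name: shutil.which(name) is not None for name in tools}


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of tools that could not be found on PATH."""
    return [name for name, found in check_tools(tools).items() if not found]
```

Running `print(missing_tools())` after setup prints an empty list when everything was found.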

Transcription

Extracting the audio from a video

ffmpeg -i xxxxx.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 yyyyyy.wav
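You may want to run the same conversion from a script, for example over many files. A minimal Python sketch is to build the identical ffmpeg argument list and pass it to `subprocess.run`. The comments spell out what each flag in the command above does. The function names are hypothetical helpers, not part of ffmpeg or whisper.

```python
import subprocess
from typing import List


def ffmpeg_extract_audio_args(src: str, dst: str) -> List[str]:
    """Build the ffmpeg command above: 16-bit PCM, 16 kHz, mono WAV."""
    return [
        "ffmpeg",
        "-i", src,               # input video (or audio) file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to a single (mono) channel
        dst,                     # output WAV file
    ]


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_extract_audio_args(src, dst), check=True)
```

If the output file already exists, ffmpeg asks before overwriting it. Adding `-y` to the list skips that prompt, which matters when no one is watching the script.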

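The audio is converted to 16-bit WAV because of a constraint in the transcription tool (whisper.cpp's whisper-cli, used later in this article). Its documentation notes:

Note that the whisper-cli example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool.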
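You can check that a converted file really has the format produced by the ffmpeg command above (16-bit, mono, 16 kHz) before you hand it to the tool. Python's standard `wave` module can read the header. This is a sketch, and `wav_format_problems` is a hypothetical helper name.

```python
import wave
from typing import List


def wav_format_problems(path: str) -> List[str]:
    """Return a list of format problems; an empty list means 16-bit mono 16 kHz."""
    problems = []
    with wave.open(path, "rb") as wav:
        width_bits = wav.getsampwidth() * 8  # getsampwidth() is in bytes
        if width_bits != 16:
            problems.append(f"sample width is {width_bits} bits, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
    return problems
```

For files that are not plain PCM WAV (for example, 32-bit float WAV), `wave.open` raises `wave.Error` instead of returning a list. That also signals the file needs converting.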

Transcribing the audio file

whisper yyyyyy.wav --language ja --model turbo

The transcription is printed to stdout segment by segment as processing progresses.
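The openai-whisper package can also be used from Python through `whisper.load_model` and `model.transcribe`. This is handy when you want the segments as data rather than as stdout text. The sketch below uses the same `turbo` model and `ja` language as the command above. It reads the `start`, `end`, and `text` fields of each entry in `result["segments"]`. The formatting helpers are hypothetical additions, not part of whisper.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: List[Dict]) -> List[str]:
    """Turn whisper's result["segments"] into '[start --> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe(path: str, language: str = "ja", model_name: str = "turbo") -> List[str]:
    """Transcribe a file with openai-whisper (imported lazily, only when called)."""
    import whisper  # installed by `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    return format_segments(result["segments"])
```

Usage: `for line in transcribe("yyyyyy.wav"): print(line)`. The model is downloaded on first use, so the first call takes longer.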

whisper.cpp

The standard (Python) whisper runs very slowly, so install whisper.cpp, a C/C++ implementation of Whisper.

Install

git clone https://github.com/ggerganov/whisper.cpp.git


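# Download the model
cd whisper.cpp
sh ./models/download-ggml-model.sh large-v3-turbo

# Build the project
cmake -B build
cmake --build build --config Release

# Optional: per the whisper.cpp README, this downloads the model (if needed) and runs it on the bundled samples
make -j large-v3-turbo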

Transcribing a sample file

./build/bin/whisper-cli -m ./models/ggml-large-v3-turbo.bin -f samples/jfk.wav

The output lists each time range followed by the transcribed text for that segment.
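You might want to post-process whisper-cli's printed output, for example to build subtitles or search by time. A small parser can turn each segment line into numbers and text. This sketch assumes segment lines shaped like `[00:00:00.000 --> 00:00:11.000]   text`. Treat that format as an assumption and check it against your own output. Log lines that do not match are skipped by returning `None`.

```python
import re
from typing import Optional, Tuple

# Assumed segment line shape: "[HH:MM:SS.mmm --> HH:MM:SS.mmm]   text"
_SEGMENT_LINE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert timestamp parts to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_segment_line(line: str) -> Optional[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text), or None for non-segment lines."""
    match = _SEGMENT_LINE.match(line.strip())
    if match is None:
        return None
    g = match.groups()
    return _to_seconds(*g[0:4]), _to_seconds(*g[4:8]), g[8].strip()
```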

Using it with Japanese audio

./build/bin/whisper-cli -l ja -m ./models/ggml-large-v3-turbo.bin -f ~/sound_data/yyyyyy.wav
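To run whisper-cli from a Python script, one option is to build the same argument list as the command above and call it with `subprocess`. This sketch assumes you run it from the whisper.cpp directory (`repo_dir="."`). It also assumes the model was saved as `models/ggml-<name>.bin` by the download script, as in the commands above. `subprocess` does not go through a shell, so a `~` in the path is expanded explicitly. The function names are hypothetical.

```python
import os
import subprocess
from typing import List, Optional


def model_path(model: str = "large-v3-turbo", repo_dir: str = ".") -> str:
    """Model location used in this article: models/ggml-<name>.bin."""
    return f"{repo_dir}/models/ggml-{model}.bin"


def whisper_cli_args(wav: str, model: str = "large-v3-turbo",
                     language: Optional[str] = None, repo_dir: str = ".") -> List[str]:
    """Build the whisper-cli command line shown above."""
    args = [f"{repo_dir}/build/bin/whisper-cli"]
    if language is not None:
        args += ["-l", language]  # e.g. "ja" for Japanese
    args += ["-m", model_path(model, repo_dir), "-f", wav]
    return args


def run_whisper_cli(wav: str, model: str = "large-v3-turbo",
                    language: Optional[str] = None, repo_dir: str = ".") -> str:
    """Run whisper-cli and return its stdout; fail early if the model is missing."""
    path = model_path(model, repo_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"model not found: {path}")
    # No shell is involved, so expand "~" in the audio path ourselves.
    args = whisper_cli_args(os.path.expanduser(wav), model, language, repo_dir)
    completed = subprocess.run(args, check=True, capture_output=True, text=True)
    return completed.stdout
```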

Impressions

Since this is OpenAI's transcription tool, the accuracy seems fairly high.

If you need speed, installing whisper.cpp is a bit of a hassle, but the speedup is well worth it: about 50 minutes of audio was processed in roughly 10 minutes.
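Those figures correspond to a real-time factor (processing time divided by audio duration) of about 10 / 50 = 0.2. In other words, the run was roughly 5x faster than real time. The hypothetical helpers below turn that ratio into a rough estimate for other files. They assume the rate stays the same, which holds only approximately: it was measured with whisper.cpp on the M3 Mac described above and will vary with hardware, model, and audio.

```python
def real_time_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Processing time / audio duration; values below 1.0 are faster than real time."""
    return processing_minutes / audio_minutes


def estimate_processing_minutes(audio_minutes: float, rtf: float) -> float:
    """Estimate processing time for another file, assuming the same real-time factor."""
    return audio_minutes * rtf
```

For example, at a factor of 0.2, a 120-minute recording would take about 24 minutes.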
