
Trying out speech-to-text transcription on an M1 Mac

Posted at 2024-01-14

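At the end of last year (December 2023), Apple reportedly released mlx, a library for running machine learning on Apple silicon. I used it to try transcription with Whisper, and these are my notes.

Installation

Follow the Whisper README in ml-explore/mlx-examples to install it, then install the libraries used in this article.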

$ git clone https://github.com/ml-explore/mlx-examples.git
$ cd mlx-examples/whisper
$ pip install -r requirements.txt
$ brew install ImageMagick
$ pip install yt_dlp pysrt moviepy ImageMagick

ffmpeg is also required.

Download a sample video

download.py
from yt_dlp import YoutubeDL

# Download with yt-dlp's default options; the video ID is a placeholder.
with YoutubeDL() as ydl:
    ydl.download(['https://www.youtube.com/watch?v=xxx'])
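The later scripts read a file named sample.webm, while yt-dlp by default names the download after the video title. A minimal sketch of pinning the output name with yt-dlp's `outtmpl` option might look like this. The helper names `build_options` and `download` and the `sample` basename are illustrative and not from the original scripts, and the actual extension depends on the format yt-dlp selects.

```python
def build_options(basename="sample"):
    """Return yt-dlp options that save the download as <basename>.<ext>."""
    # 'outtmpl' is yt-dlp's output filename template; %(ext)s is filled in
    # with the extension of the format that actually gets downloaded.
    return {"outtmpl": f"{basename}.%(ext)s"}


def download(url, basename="sample"):
    """Download one video using the options above (hypothetical helper)."""
    # Imported here so build_options can be used without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_options(basename)) as ydl:
        ydl.download([url])
```

Calling `download('https://www.youtube.com/watch?v=xxx')` should then write sample.webm if WebM is the format that gets selected; with another format the extension would differ.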

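Transcribing the video

The default is the tiny model, so look for a model you like in mlx-community.
This time I tried large-v3 on an M1 Mac with 8 GB of memory, and it does seem to run (though it is very slow).
I write the subtitle text out in srt format first and burn it into the video later with separate code.
Also, passing fp16=True in transcribe's decode_options runs the model in half precision (slightly faster).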
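As a rough sanity check on why half precision matters on an 8 GB machine, the weight memory can be estimated from the parameter count. The figure of about 1.55 billion parameters for the large Whisper models comes from the upstream Whisper documentation rather than from this article, and the estimate ignores activations, audio buffers, and everything else running on the machine.

```python
# Approximate parameter count of the Whisper large models (assumption taken
# from upstream documentation, not measured here).
LARGE_PARAMS = 1.55e9


def weight_memory_gb(num_params, bytes_per_param):
    """Estimate the memory taken by model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weight_memory_gb(LARGE_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gb = weight_memory_gb(LARGE_PARAMS, 2)  # float16: 2 bytes per parameter
print(f"float32: {fp32_gb:.1f} GB, float16: {fp16_gb:.1f} GB")
```

Under these assumptions the weights take roughly 3 GB in half precision versus roughly 6 GB in single precision, which is at least consistent with large-v3 fitting, slowly, into 8 GB of unified memory.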

speech2text.py
import whisper  # the MLX port from mlx-examples/whisper, not the openai-whisper package
import pysrt

speech_file = "sample.webm"
result = whisper.transcribe(speech_file, verbose=True, language='ja',
                            path_or_hf_repo="mlx-community/whisper-large-v3-mlx", fp16=True)

# References: https://note.com/9256/n/nce2ddc5e006a
# Convert each Whisper segment (start/end in seconds) into a numbered SRT entry.
subs = pysrt.SubRipFile()

for sub_idx, segment in enumerate(result["segments"], start=1):
    sub = pysrt.SubRipItem(index=sub_idx,
                           start=pysrt.SubRipTime(seconds=segment["start"]),
                           end=pysrt.SubRipTime(seconds=segment["end"]),
                           text=segment["text"])
    subs.append(sub)

subs.save("sample.srt")
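For reference, the SRT format written here is simple enough to sketch by hand: each entry is an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, with entries separated by blank lines. The following is a stdlib-only illustration of that conversion from Whisper-style segments. The function names and the sample segments are made up for the example, and this is not how pysrt is implemented.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    entries = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        entries.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive entries.
    return "\n".join(entries)


# Hypothetical segments in the same shape as result["segments"].
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello"},
    {"start": 3661.25, "end": 3662.0, "text": " World"},
]
print(segments_to_srt(demo))
```

Going through pysrt, as the script above does, is still the more convenient route when the subtitles need to be loaded or shifted again later.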

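Burning the text into the video

Specifying size on TextClip also seems to take care of line wrapping.
Rendering takes quite a while, so depending on the use case it may be better to export as mkv instead.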

composite.py
# Convert the download to a 720p MP4 first:
# ffmpeg -i sample.webm -vf scale=-1:720 sample.mp4
# Written against the moviepy 1.x API (moviepy.editor, set_pos, fontsize).

from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip
from moviepy.video.tools.subtitles import SubtitlesClip

video = VideoFileClip("sample.mp4")

def generator(txt):
    # method='caption' together with size lets TextClip wrap long lines.
    return TextClip(
        txt, font='YuGothic-Medium', fontsize=48, color='white',
        stroke_width=10, method='caption', align='south', size=video.size)

subtitles = SubtitlesClip("sample.srt", generator)
result = CompositeVideoClip([video, subtitles.set_pos(('center','bottom'))])

result.write_videofile("out.mp4", fps=video.fps, temp_audiofile="temp-audio.m4a", remove_temp=True, codec="libx264", audio_codec="aac")
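If burned-in subtitles are not required, one way to read the mkv remark above is to mux the SRT file into a Matroska container as a soft subtitle track, which copies the existing streams instead of re-encoding every frame. A minimal sketch that drives the ffmpeg CLI from Python follows. The helper names and the output name out.mkv are illustrative, and whether this matches what was meant by exporting as mkv is an assumption.

```python
import subprocess


def build_mux_command(video_path, srt_path, out_path):
    """Build an ffmpeg command that adds an SRT file as a soft subtitle track."""
    return [
        "ffmpeg",
        "-i", video_path,  # input 0: video and audio
        "-i", srt_path,    # input 1: subtitles
        "-map", "0",       # keep all streams from the video file
        "-map", "1",       # add the subtitle stream
        "-c", "copy",      # copy video and audio without re-encoding
        "-c:s", "srt",     # store the subtitles as SubRip inside the MKV
        out_path,
    ]


def mux_subtitles(video_path, srt_path, out_path):
    """Run ffmpeg; check=True raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mux_command(video_path, srt_path, out_path), check=True)
```

With this approach, `mux_subtitles("sample.mp4", "sample.srt", "out.mkv")` would produce a file whose subtitles can be toggled in the player. The trade-off is that font and position are then decided by the player rather than fixed in the picture.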