Running Multiple VOICEVOX Engine Instances and Load Balancing Them with nginx

Posted at 2025-04-04

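Introduction

For certain reasons I want VOICEVOX to generate a large amount of synthesized speech every hour, but VOICEVOX Engine apparently does not process multiple synthesis jobs concurrently.

So why not run many VOICEVOX Engine instances and distribute the API requests across them?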

Test environment

6 virtual cores allocated by Proxmox running on a ProLiant TM200 (Intel(R) Xeon(R) CPU D-1518)
6 GB of main memory
VOICEVOX Linux CPU build v0.23.0

Running multiple VOICEVOX Engine instances

This kind of job is best handed off to systemd, so let's run the engines as daemons. For this test I prepared six Engine instances.

/etc/systemd/system/voicevoxN.service

[Unit]
Description=VOICEVOX Engine Service(N)

[Service]
Type=simple
User=user
PIDFile=/run/voicevox.pid
ExecStart=/home/user/bin/VOICEVOX/vv-engine/run --port 5002N --cpu_num_threads 2
TimeoutStopSec=20
PrivateTmp=true
Restart=always

[Install]
WantedBy=multi-user.target

Note: replace N with a sequential number (1 to 6 in this test), create one service file per instance, and start each service.
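Writing six nearly identical unit files by hand is error-prone, so the substitution of N can be scripted. The following is a minimal sketch, not part of the original setup: it assumes the unit template shown above, and the names `UNIT_TEMPLATE`, `render_unit`, and `write_units` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Same unit file as above, with N replaced by a {n} placeholder.
UNIT_TEMPLATE = """\
[Unit]
Description=VOICEVOX Engine Service({n})

[Service]
Type=simple
User=user
PIDFile=/run/voicevox.pid
ExecStart=/home/user/bin/VOICEVOX/vv-engine/run --port 5002{n} --cpu_num_threads 2
TimeoutStopSec=20
PrivateTmp=true
Restart=always

[Install]
WantedBy=multi-user.target
"""


def render_unit(n):
    """Return the unit file text for instance number n."""
    return UNIT_TEMPLATE.format(n=n)


def write_units(out_dir, count=6):
    """Write voicevox1.service .. voicevox<count>.service into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(1, count + 1):
        path = out / f"voicevox{n}.service"
        path.write_text(render_unit(n))
        paths.append(path)
    return paths
```

Calling `write_units("./units")` would produce the six files in a scratch directory. They would then be copied to /etc/systemd/system, followed by `systemctl daemon-reload` and enabling each service.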

Load balancing with nginx

The least-connected method (nginx's least_conn directive) is used for load balancing.

default

upstream voicevox {
        least_conn; 
        server localhost:50021;
        server localhost:50022;
        server localhost:50023;
        server localhost:50024;
        server localhost:50025;
        server localhost:50026;
}

server {
        listen 80 default_server;
        server_name 172.16.0.nn;
        location / {
                proxy_pass http://voicevox;
        }
}
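To make the effect of least_conn more concrete, here is a small simulation of the selection rule. It is a simplified sketch under stated assumptions: real nginx also takes server weights into account and breaks ties with weighted round-robin, whereas this hypothetical `pick_least_conn` helper simply picks the first backend with the fewest active connections.

```python
def pick_least_conn(active):
    """Return the backend with the fewest active connections.

    Ties go to the backend listed first (a simplification of nginx's behavior).
    """
    return min(active, key=active.get)


# The six upstream servers from the configuration above.
backends = [f"localhost:5002{n}" for n in range(1, 7)]
active = {backend: 0 for backend in backends}

# Six requests arrive while none of the earlier ones has finished yet.
assigned = []
for _ in range(6):
    target = pick_least_conn(active)
    active[target] += 1
    assigned.append(target)

for backend, count in active.items():
    print(backend, count)
```

Under this rule, a request is never sent to an engine that is already busy while another engine is idle, which is the property that matters when each engine handles one synthesis at a time.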

Test code

I wrote the following Python script for the test.

voicevox_multi.py

import sys
import time
import concurrent.futures

import requests

_ENGINE_HOST = "http://172.16.0.nn"
_VV_SPEAKER = 3
_SAMPLE_STR = "はじめましてなのだ。ボイスボックスずんだもんなのだ。"


def createVoice(text, filename):
    # Step 1: ask the engine to build an audio query for the text.
    audio_query_params = {'text': text, 'speaker': _VV_SPEAKER}
    audio_query_res = requests.post(_ENGINE_HOST + '/audio_query', params=audio_query_params)
    if audio_query_res.status_code != 200:
        return

    # Step 2: synthesize a WAV file from the audio query JSON.
    synth_params = {'speaker': _VV_SPEAKER}
    synth_res = requests.post(_ENGINE_HOST + '/synthesis', params=synth_params, data=audio_query_res.text.encode('utf-8'))
    if synth_res.status_code != 200:
        return

    with open(filename, mode='wb') as f:
        f.write(synth_res.content)


# The guard keeps worker processes from re-running the benchmark
# when the start method is spawn or forkserver.
if __name__ == '__main__':
    # The number of workers equals the maximum number of concurrent requests.
    max_workers = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    print("max_workers is set to %d." % max_workers)

    ts_start = time.perf_counter()

    # Leaving the with-block waits until all 100 tasks have finished.
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as voices_executor:
        for i in range(1, 101):
            filename = "%03d.wav" % i
            voices_executor.submit(createVoice, _SAMPLE_STR, filename)

    exec_sec = time.perf_counter() - ts_start

    print("Exec. time: %fsec." % exec_sec)
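One thing the script above does not show is how many of the 100 requests actually succeeded, since createVoice returns silently on a non-200 response. The following sketch shows one way to count outcomes. It rests on assumptions that are not in the original: the task is modified to return True on success, the hypothetical helper `run_all` is introduced for illustration, and a ThreadPoolExecutor is swapped in for the ProcessPoolExecutor because the workers spend most of their time waiting on HTTP responses.

```python
import concurrent.futures


def run_all(task, jobs, max_workers):
    """Run task(*job) for every job and return (succeeded, failed) counts.

    Assumes task returns a truthy value on success; an exception or a
    falsy return value is counted as a failure.
    """
    succeeded = 0
    failed = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, *job) for job in jobs]
        for future in concurrent.futures.as_completed(futures):
            try:
                ok = future.result()
            except Exception:
                ok = False
            if ok:
                succeeded += 1
            else:
                failed += 1
    return succeeded, failed
```

With a createVoice variant that returns True after writing the file, the benchmark loop could become `run_all(createVoice, [(_SAMPLE_STR, "%03d.wav" % i) for i in range(1, 101)], max_workers)`, and a non-zero failure count would indicate that the measured time does not cover 100 complete syntheses.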

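Execution time

I measured the execution time of the test code while varying the maximum number of workers, that is, the maximum number of simultaneous connections to the API server.

max_workers	Execution time (s)
1	420.56
2	241.99
3	230.38
4	238.90
5	246.49
6	244.64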

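The best result came with max_workers = 3, which was roughly 1.8 times faster than max_workers = 1 (230.38 s versus 420.56 s). Adding more than 3 workers did not improve the time further.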
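The speedup figures can be reproduced directly from the measured times. This short sketch only does arithmetic on the values in the table above; the variable names are chosen for illustration.

```python
# Measured execution times in seconds, keyed by max_workers (from the table).
results = {1: 420.56, 2: 241.99, 3: 230.38, 4: 238.90, 5: 246.49, 6: 244.64}

baseline = results[1]

# Speedup relative to a single worker: baseline time / measured time.
speedups = {workers: baseline / seconds for workers, seconds in results.items()}

# The configuration with the shortest execution time.
best = min(results, key=results.get)

for workers, speedup in speedups.items():
    print(f"max_workers={workers}: {speedup:.2f}x")
print(f"fastest: max_workers={best}")
```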
