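Introduction
I deployed VOICEVOX to Kubernetes (arm64) and used it from Node-RED.
The software describes itself as "medium quality," but to my ears it is on par with high-quality speech synthesis.
Prerequisites
I use my usual home Kubernetes cluster, plus an M1 Mac mini for building the arm64 container image.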
- Raspberry Pi 4B
- kubernetes 1.22.0
- Private Registry (http)
- Mac mini 2020(M1) + Docker Desktop
- Node-RED v2.2.1 on kubernetes
Building the VOICEVOX container
Build the container on the Mac mini 2020 (M1). (Note: I have not tried building on the Raspberry Pi 4B.)
[2022/3/23] Note: CORE and ENGINE were updated to 0.11.4.
# Stage 1: download the VOICEVOX core and the arm64 ONNX Runtime.
FROM ubuntu:focal AS build
RUN apt update && \
    DEBIAN_FRONTEND=noninteractive apt -y install wget unzip tar && \
    mkdir -p /voicevox_engine && \
    wget https://github.com/VOICEVOX/voicevox_core/releases/download/0.11.4/core.zip && \
    unzip core.zip && \
    mv core /voicevox_engine && \
    wget https://github.com/VOICEVOX/onnxruntime-builder/releases/download/1.10.0.1/onnxruntime-linux-arm64-cpu-v1.10.0.tgz && \
    tar xzvf onnxruntime-linux-arm64-cpu-v1.10.0.tgz && \
    mv onnxruntime-linux-arm64-cpu-v1.10.0 /voicevox_engine

# Stage 2: clone the engine and install its Python dependencies.
FROM ubuntu:focal
RUN apt update && \
    DEBIAN_FRONTEND=noninteractive apt -y install git pip python3 python3-dev python3-wheel cmake g++ libsndfile1 && \
    git clone -b 0.11.4 https://github.com/VOICEVOX/voicevox_engine.git && \
    cd voicevox_engine/ && \
    pip install -r requirements.txt -r requirements-test.txt
# Merge the downloaded core and runtime into the engine directory.
COPY --from=build /voicevox_engine /voicevox_engine
ENV VV_CPU_NUM_THREADS=4
CMD ["python3","/voicevox_engine/run.py","--voicelib_dir","/voicevox_engine/core","--runtime_dir","/voicevox_engine/onnxruntime-linux-arm64-cpu-v1.10.0/lib","--host","0.0.0.0"]
I have barely tested any combination of branch and file versions other than the one above (armhf included).
In my environment the build took about 120 seconds.
% docker build -t 10.0.0.1:30500/voicevox_engine:20220223_arm64 .
[+] Building 120.0s (8/8) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:focal 2.0s
=> CACHED [build 1/2] FROM docker.io/library/ubuntu:focal@sha256:669e010b58baf5beb2836b253c1fd5768333f0d1dbcb834f7c07a4dc93f474be 0.0s
=> [build 2/2] RUN apt update && apt -y install wget unzip tar && mkdir /voicevox_engine && wget https://github.com/VOICEVOX/ 17.1s
=> [stage-1 2/3] RUN apt update && apt -y install git pip python3 python3-dev python3-wheel cmake g++ libsndfile1 && git clon 115.7s
=> [stage-1 3/3] COPY --from=build /voicevox_engine /voicevox_engine 0.1s
=> exporting to image 2.0s
=> => exporting layers 2.0s
=> => writing image sha256:0835677442abf1d1800637a63294ef55e3563e5163505103d64c237a19790792 0.0s
=> => naming to 10.0.0.1:30500/voicevox_engine:20220223_arm64 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
The resulting image is large; if the size bothers you, you will have to slim it down yourself.
% docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
10.0.0.1:30500/voicevox_engine 20220223_arm64 0835677442ab 7 minutes ago 865MB
Even at this point you can start the image with docker and play with it.
% docker run --rm -it -p '50021:50021' 10.0.0.1:30500/voicevox_engine:20220223_arm64
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100%|███████████████████████████████████████████████████████████████████████████████████| 22.6M/22.6M [00:03<00:00, 6.95MB/s]
Extracting tar file /usr/local/lib/python3.8/dist-packages/pyopenjtalk/dic.tar.gz
Warning: cpu_num_threads is set to 0. ( The library leaves the decision to the synthesis runtime )
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:50021 (Press CTRL+C to quit)
With curl and sox installed on macOS, you can check that it works from the command line.
% echo "こんにちはなのだ。" > a.txt
% curl -s -X POST "localhost:50021/audio_query?speaker=3" --get --data-urlencode text@a.txt > query.json
% curl -s -H "Content-Type: application/json" -X POST -d @query.json "localhost:50021/synthesis?speaker=3" > audio.wav
% play audio.wav
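The same two-step call can be sketched in Python. This is a minimal illustration, assuming the engine started with docker run above is listening on localhost:50021. The function names build_url and synthesize are made up for this example; only the audio_query and synthesis endpoints and the speaker and text parameters come from the curl commands.

```python
import urllib.parse
import urllib.request


def build_url(base_url: str, path: str, **params) -> str:
    """Return base_url + path with params encoded as a query string."""
    return f"{base_url.rstrip('/')}/{path}?{urllib.parse.urlencode(params)}"


def synthesize(text: str, speaker: int = 3,
               base_url: str = "http://localhost:50021") -> bytes:
    """Run audio_query, then synthesis, and return the WAV bytes."""
    # Step 1: POST /audio_query with the text in the query string
    # and an empty request body.
    query_req = urllib.request.Request(
        build_url(base_url, "audio_query", speaker=speaker, text=text),
        data=b"", method="POST")
    with urllib.request.urlopen(query_req) as resp:
        query = resp.read()  # JSON bytes describing the utterance

    # Step 2: POST that JSON unchanged to /synthesis.
    synth_req = urllib.request.Request(
        build_url(base_url, "synthesis", speaker=speaker),
        data=query, headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(synth_req) as resp:
        return resp.read()


# Usage (requires a running engine):
# with open("audio.wav", "wb") as f:
#     f.write(synthesize("こんにちはなのだ。"))
```

The JSON returned by audio_query is passed to synthesis as-is, just as query.json is in the curl version.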
Pushing to the private registry
My home private registry is served over http, so I registered it under insecure-registries in the Docker Engine settings.
Then push the image.
% docker push 10.0.0.1:30500/voicevox_engine:20220223_arm64
The push refers to repository [10.0.0.1:30500/voicevox_engine]
fa548acaf8ff: Pushed
7b7b0f4b4cd6: Pushed
0c20a4bc193b: Layer already exists
20220223_arm64: digest: sha256:f8e22c0bf227f2e06f656f1d0f4c51ba6c636d51b5ca5bf42d8e122c97665833 size: 954
If your Kubernetes cluster has no registry, archive the image as a tar file, compress it as you see fit, copy it to the worker node you plan to deploy to, and load it there.
% docker save 10.0.0.1:30500/voicevox_engine:20220223_arm64 > voicevox.tar
<copy to the Kubernetes node>
$ docker load < voicevox.tar
Loaded image: 10.0.0.1:30500/voicevox_engine:20220223_arm64
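Deploying to Kubernetes
Here is an example Service manifest. In my setup I create a dedicated namespace and expose the service via NodePort, but follow whatever policy suits your own home cluster.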
apiVersion: v1
kind: Service
metadata:
  name: voicevox
  namespace: voicevox
  labels:
    app: voicevox
spec:
  type: NodePort
  ports:
  - name: http
    port: 50021
    targetPort: 50021
    nodePort: 30521
  selector:
    app: voicevox
Here is an example Pod manifest.
The private registry is also exposed via NodePort, so the container image is referenced as localhost rather than 10.0.0.1.
apiVersion: v1
kind: Pod
metadata:
  name: voicevox
  namespace: voicevox
  labels:
    app: voicevox
spec:
  containers:
  - name: voicevox
    image: localhost:30500/voicevox_engine:20220223_arm64
    resources:
      limits:
        memory: 1.5Gi
    ports:
    - name: http
      containerPort: 50021
Then deploy and check the logs.
$ kubectl apply -f voicevox-svc.yaml
service/voicevox created
$ kubectl apply -f voicevox-pod.yaml
pod/voicevox created
$ kubectl get all -n voicevox
NAME READY STATUS RESTARTS AGE
pod/voicevox 1/1 Running 0 3m7s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/voicevox NodePort 10.96.39.50 <none> 50021:30521/TCP 3m14s
$ kubectl logs voicevox -n voicevox
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100%|██████████| 22.6M/22.6M [00:02<00:00, 11.0MB/s]
Warning: cpu_num_threads is set to 0. ( The library leaves the decision to the synthesis runtime )
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:50021 (Press CTRL+C to quit)
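I was not skilled enough to get the thread count to change, but for now I am happy as long as it runs, so this is not a problem.
(If you want speed, you can just run it on the M1 Mac mini...)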
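Because the engine downloads a dictionary on first start, the Pod may not answer requests immediately after it reaches Running. Below is a rough Python sketch of a readiness check through the NodePort. It assumes the node is reachable at 10.0.0.1:30521 (the address used for the subflow URL in this article) and that the engine answers GET /version; that endpoint is an assumption about the engine's HTTP API, so substitute any endpoint you know responds. The helper names http_ok and wait_until_ready are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False


# Usage (assumed address and endpoint):
# ready = wait_until_ready(lambda: http_ok("http://10.0.0.1:30521/version"))
```

Passing the probe as a callable keeps the retry loop independent of the network, which makes it easy to try out without a cluster.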
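Creating the Node-RED flow
I made a VOICEVOX subflow that takes input text and outputs an audio file.
The processing is just two HTTP requests.
The subflow's environment variables and UI are set up as follows.
The audio file is played in the browser with the node from "node-red-contrib-play-audio", which I always rely on for testing.
Checking the debug timestamps for the time from inject to the start of playback shows that the audio file is generated in about 10 seconds.
The subflow and the test flow are pasted here.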
[
{
"id": "32238322.9dad8c",
"type": "subflow",
"name": "今何時?",
"info": "\n**msg.hour**:時を代入します。(数値)\n\n**msg.min**:分を代入します。(数値)\n\n**msg.year**:年を代入します。(数値)\n\n**msg.month**:月を代入します。(数値)\n\n**msg.day**:日を代入します。(数値)\n\n**msg.dow**:曜日を代入します。(英語文字列)",
"category": "",
"in": [
{
"x": 60,
"y": 60,
"wires": [
{
"id": "82d3f0aa.7008e"
}
]
}
],
"out": [
{
"x": 480,
"y": 60,
"wires": [
{
"id": "fb2dc2fd8e339d6c",
"port": 0
}
]
}
],
"env": [
{
"name": "ZONE",
"type": "str",
"value": "+0900",
"ui": {
"icon": "font-awesome/fa-clock-o",
"label": {
"en-US": "タイムゾーン"
},
"type": "select",
"opts": {
"opts": [
{
"l": {
"en-US": "Asia/Tokyo"
},
"v": "+0900"
}
]
}
}
}
],
"meta": {},
"color": "#C0DEED",
"icon": "font-awesome/fa-clock-o"
},
{
"id": "82d3f0aa.7008e",
"type": "change",
"z": "32238322.9dad8c",
"name": "",
"rules": [
{
"t": "set",
"p": "zone",
"pt": "msg",
"to": "ZONE",
"tot": "env"
},
{
"t": "set",
"p": "millis",
"pt": "msg",
"to": "$millis()",
"tot": "jsonata"
},
{
"t": "set",
"p": "hour",
"pt": "msg",
"to": "$number($fromMillis(millis,'[H01]',zone))",
"tot": "jsonata"
},
{
"t": "set",
"p": "min",
"pt": "msg",
"to": "$number($fromMillis(millis,'[m01]',zone))",
"tot": "jsonata"
},
{
"t": "set",
"p": "dow",
"pt": "msg",
"to": "$fromMillis(millis,'[F]',zone)",
"tot": "jsonata"
},
{
"t": "set",
"p": "year",
"pt": "msg",
"to": "$number($fromMillis(millis,'[Y]',zone))",
"tot": "jsonata"
},
{
"t": "set",
"p": "month",
"pt": "msg",
"to": "$number($fromMillis(millis,'[M]',zone))",
"tot": "jsonata"
},
{
"t": "set",
"p": "day",
"pt": "msg",
"to": "$number($fromMillis(millis,'[D]',zone))",
"tot": "jsonata"
}
],
"action": "",
"property": "",
"from": "",
"to": "",
"reg": false,
"x": 200,
"y": 60,
"wires": [
[
"fb2dc2fd8e339d6c"
]
]
},
{
"id": "fb2dc2fd8e339d6c",
"type": "function",
"z": "32238322.9dad8c",
"name": "",
"func": "\nif(msg.zone == \"+0900\"){\n switch(msg.dow){\n case \"monday\":\n msg.dow = \"月曜日\";\n break;\n case \"tuesday\":\n msg.dow = \"火曜日\";\n break;\n case \"wednesday\":\n msg.dow = \"水曜日\";\n break;\n case \"thursday\":\n msg.dow = \"木曜日\";\n break;\n case \"friday\":\n msg.dow = \"金曜日\";\n break;\n case \"saturday\":\n msg.dow = \"土曜日\";\n break;\n case \"sunday\":\n msg.dow = \"日曜日\";\n break;\n }\n}\n\nreturn msg;",
"outputs": 1,
"noerr": 0,
"initialize": "",
"finalize": "",
"libs": [],
"x": 370,
"y": 60,
"wires": [
[]
]
},
{
"id": "07838d665898207e",
"type": "subflow",
"name": "VOICEVOX",
"info": "",
"category": "",
"in": [
{
"x": 40,
"y": 80,
"wires": [
{
"id": "6034de0ad57d9569"
}
]
}
],
"out": [
{
"x": 850,
"y": 80,
"wires": [
{
"id": "4a20a945c8e116e0",
"port": 0
}
]
}
],
"env": [
{
"name": "URL",
"type": "str",
"value": "http://10.0.0.1:30521",
"ui": {
"label": {
"ja": "URL"
},
"type": "input",
"opts": {
"types": [
"str"
]
}
}
},
{
"name": "SPEAKER",
"type": "str",
"value": "3",
"ui": {
"label": {
"ja": "キャラクター"
},
"type": "select",
"opts": {
"opts": [
{
"l": {
"ja": "四国めたん(ノーマル)"
},
"v": "2"
},
{
"l": {
"ja": "四国めたん(あまあま)"
},
"v": "0"
},
{
"l": {
"ja": "四国めたん(ツンツン)"
},
"v": "6"
},
{
"l": {
"ja": "四国めたん(セクシー)"
},
"v": "4"
},
{
"l": {
"ja": "ずんだもん(ノーマル)"
},
"v": "3"
},
{
"l": {
"ja": "ずんだもん(あまあま)"
},
"v": "1"
},
{
"l": {
"ja": "ずんだもん(ツンツン)"
},
"v": "7"
},
{
"l": {
"ja": "ずんだもん(セクシー)"
},
"v": "5"
},
{
"l": {
"ja": "春日部つむぎ(ノーマル)"
},
"v": "8"
},
{
"l": {
"ja": "雨晴はう(ノーマル)"
},
"v": "10"
},
{
"l": {
"ja": "波音リツ(ノーマル)"
},
"v": "9"
}
]
}
}
}
],
"meta": {},
"color": "#DDAA99"
},
{
"id": "4c110c0d3cddaff5",
"type": "http request",
"z": "07838d665898207e",
"name": "",
"method": "use",
"ret": "txt",
"paytoqs": "ignore",
"url": "",
"tls": "",
"persist": false,
"proxy": "",
"authType": "",
"senderr": false,
"credentials": {},
"x": 370,
"y": 80,
"wires": [
[
"82e35d27c239aca1"
]
]
},
{
"id": "6034de0ad57d9569",
"type": "change",
"z": "07838d665898207e",
"name": "",
"rules": [
{
"t": "set",
"p": "method",
"pt": "msg",
"to": "POST",
"tot": "str"
},
{
"t": "set",
"p": "url",
"pt": "msg",
"to": "URL",
"tot": "env"
},
{
"t": "change",
"p": "url",
"pt": "msg",
"from": "$",
"fromt": "re",
"to": "/audio_query?speaker=",
"tot": "str"
},
{
"t": "change",
"p": "url",
"pt": "msg",
"from": "$",
"fromt": "re",
"to": "SPEAKER",
"tot": "env"
},
{
"t": "set",
"p": "payload",
"pt": "msg",
"to": "$encodeUrlComponent(text)",
"tot": "jsonata"
},
{
"t": "change",
"p": "payload",
"pt": "msg",
"from": "^",
"fromt": "re",
"to": "&text=",
"tot": "str"
},
{
"t": "change",
"p": "url",
"pt": "msg",
"from": "$",
"fromt": "re",
"to": "payload",
"tot": "msg"
},
{
"t": "delete",
"p": "payload",
"pt": "msg"
}
],
"action": "",
"property": "",
"from": "",
"to": "",
"reg": false,
"x": 180,
"y": 80,
"wires": [
[
"4c110c0d3cddaff5"
]
]
},
{
"id": "82e35d27c239aca1",
"type": "change",
"z": "07838d665898207e",
"name": "",
"rules": [
{
"t": "delete",
"p": "headers",
"pt": "msg"
},
{
"t": "set",
"p": "headers",
"pt": "msg",
"to": "{\"Content-Type\":\"application/json\"}",
"tot": "json"
},
{
"t": "set",
"p": "method",
"pt": "msg",
"to": "POST",
"tot": "str"
},
{
"t": "set",
"p": "url",
"pt": "msg",
"to": "URL",
"tot": "env"
},
{
"t": "change",
"p": "url",
"pt": "msg",
"from": "$",
"fromt": "re",
"to": "/synthesis?speaker=",
"tot": "str"
},
{
"t": "change",
"p": "url",
"pt": "msg",
"from": "$",
"fromt": "re",
"to": "SPEAKER",
"tot": "env"
}
],
"action": "",
"property": "",
"from": "",
"to": "",
"reg": false,
"x": 550,
"y": 80,
"wires": [
[
"4a20a945c8e116e0"
]
]
},
{
"id": "4a20a945c8e116e0",
"type": "http request",
"z": "07838d665898207e",
"name": "",
"method": "use",
"ret": "bin",
"paytoqs": "ignore",
"url": "",
"tls": "",
"persist": false,
"proxy": "",
"authType": "",
"senderr": false,
"credentials": {},
"x": 730,
"y": 80,
"wires": [
[]
]
},
{
"id": "a895e86ec7e60f1c",
"type": "subflow:07838d665898207e",
"z": "c33d685b.315298",
"name": "",
"x": 590,
"y": 980,
"wires": [
[
"3f21056bafb1d156",
"f725696724db18ea"
]
]
},
{
"id": "3f21056bafb1d156",
"type": "play audio",
"z": "c33d685b.315298",
"name": "",
"voice": "",
"x": 780,
"y": 980,
"wires": []
},
{
"id": "93615a8e104b6ad6",
"type": "inject",
"z": "c33d685b.315298",
"name": "",
"props": [],
"repeat": "",
"crontab": "",
"once": false,
"onceDelay": 0.1,
"topic": "",
"x": 90,
"y": 1000,
"wires": [
[
"a574c6ae1e9ce8f3",
"f725696724db18ea"
]
]
},
{
"id": "a574c6ae1e9ce8f3",
"type": "subflow:32238322.9dad8c",
"z": "c33d685b.315298",
"name": "",
"x": 240,
"y": 980,
"wires": [
[
"76cf1a5d7739a242"
]
]
},
{
"id": "76cf1a5d7739a242",
"type": "template",
"z": "c33d685b.315298",
"name": "時報メッセージ",
"field": "text",
"fieldType": "msg",
"format": "handlebars",
"syntax": "mustache",
"template": "{{hour}}時{{min}}分をお知らせなのだ。",
"output": "str",
"x": 410,
"y": 980,
"wires": [
[
"a895e86ec7e60f1c"
]
]
},
{
"id": "f725696724db18ea",
"type": "debug",
"z": "c33d685b.315298",
"name": "",
"active": true,
"tosidebar": true,
"console": false,
"tostatus": false,
"complete": "text",
"targetType": "msg",
"statusVal": "",
"statusType": "auto",
"x": 490,
"y": 1120,
"wires": []
}
]
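The first change node in the VOICEVOX subflow above builds the audio_query URL by appending strings one rule at a time (each regex "$" rule appends to msg.url, and the "^" rule prepends to msg.payload). As a reading aid, here is a rough Python equivalent of those rules. The function names are made up for illustration, and urllib.parse.quote is only an approximation of JSONata's $encodeUrlComponent: the two may differ on a few punctuation characters such as ! * ' ( ).

```python
from urllib.parse import quote


def audio_query_url(base_url: str, speaker: str, text: str) -> str:
    """Compose the audio_query URL in the same order as the change node."""
    url = base_url                      # set msg.url to env URL
    url += "/audio_query?speaker="      # append the endpoint and parameter
    url += speaker                      # append env SPEAKER
    payload = quote(text, safe="")      # percent-encode msg.text
    payload = "&text=" + payload        # prepend the parameter name
    return url + payload                # append msg.payload to msg.url


def synthesis_url(base_url: str, speaker: str) -> str:
    """Compose the synthesis URL as the second change node does."""
    return base_url + "/synthesis?speaker=" + speaker


print(audio_query_url("http://10.0.0.1:30521", "3", "こんにちは"))
```

The second change node only resets the headers to Content-Type: application/json and rebuilds the URL; msg.payload still holds the audio_query response and is sent on unchanged.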
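Conclusion
I personally see a bright future for VOICEVOX, and I hope the product keeps growing.
Also, to put it mildly, the voice of "ずんだもん" (Zundamon) is supremely cute, so I hope everyone gives it a listen!
When you post audio, there are rules about credit notation.
Please check the terms of use for each character.
For the software itself, please also check the VOICEVOX terms of use.
Please also check the license stated on GitHub.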