Trying medium-quality TTS with VOICEVOX on a Raspberry Pi (Node-RED)

Last updated at 2022-03-23, posted at 2022-02-23

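Introduction

I deployed VOICEVOX to kubernetes (arm64) and used it from Node-RED.
The software describes itself as "medium quality", but to my ears it is on par with high-quality TTS.

Environment

I use my usual "home kubernetes cluster", plus an M1 Mac mini for building the arm64 container image.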

Raspberry Pi 4B
kubernetes 1.22.0
Private Registry (http)
Mac mini 2020(M1) + Docker Desktop
Node-RED v2.2.1 on kubernetes

Building the VOICEVOX container

Build the container on the Mac mini 2020 (M1). (Note: I have not tried building on the Raspberry Pi 4B.)
[2022/3/23] Note: CORE and ENGINE have been updated to 0.11.4.

Dockerfile

FROM ubuntu:focal AS build
RUN apt update && \
 DEBIAN_FRONTEND=noninteractive apt -y install wget unzip tar && \
 mkdir -p /voicevox_engine && \
 wget https://github.com/VOICEVOX/voicevox_core/releases/download/0.11.4/core.zip && \
 unzip core.zip && \
 mv core /voicevox_engine && \
 wget https://github.com/VOICEVOX/onnxruntime-builder/releases/download/1.10.0.1/onnxruntime-linux-arm64-cpu-v1.10.0.tgz && \
 tar xzvf onnxruntime-linux-arm64-cpu-v1.10.0.tgz && \
 mv onnxruntime-linux-arm64-cpu-v1.10.0 /voicevox_engine

FROM ubuntu:focal
RUN apt update && \
 DEBIAN_FRONTEND=noninteractive apt -y install git pip python3 python3-dev python3-wheel cmake g++ libsndfile1 && \
 git clone -b 0.11.4 https://github.com/VOICEVOX/voicevox_engine.git && \
 cd voicevox_engine/ && \
 pip install -r requirements.txt -r requirements-test.txt
COPY --from=build /voicevox_engine /voicevox_engine
ENV VV_CPU_NUM_THREADS=4

CMD ["python3","/voicevox_engine/run.py","--voicelib_dir","/voicevox_engine/core","--runtime_dir","/voicevox_engine/onnxruntime-linux-arm64-cpu-v1.10.0/lib","--host","0.0.0.0"]

I have hardly tested any combination of branch and file versions other than the one above (armhf included).
In my environment the build took about 120 seconds.

% docker build -t 10.0.0.1:30500/voicevox_engine:20220223_arm64 .
[+] Building 120.0s (8/8) FINISHED                                                                                                       
 => [internal] load build definition from Dockerfile                                                                                0.0s
 => => transferring dockerfile: 37B                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                   0.0s
 => => transferring context: 2B                                                                                                     0.0s
 => [internal] load metadata for docker.io/library/ubuntu:focal                                                                     2.0s
 => CACHED [build 1/2] FROM docker.io/library/ubuntu:focal@sha256:669e010b58baf5beb2836b253c1fd5768333f0d1dbcb834f7c07a4dc93f474be  0.0s
 => [build 2/2] RUN apt update &&  apt -y install wget unzip tar &&  mkdir /voicevox_engine &&  wget https://github.com/VOICEVOX/  17.1s
 => [stage-1 2/3] RUN apt update &&  apt -y install git pip python3 python3-dev python3-wheel cmake g++ libsndfile1 &&  git clon  115.7s
 => [stage-1 3/3] COPY --from=build /voicevox_engine /voicevox_engine                                                               0.1s 
 => exporting to image                                                                                                              2.0s 
 => => exporting layers                                                                                                             2.0s 
 => => writing image sha256:0835677442abf1d1800637a63294ef55e3563e5163505103d64c237a19790792                                        0.0s 
 => => naming to 10.0.0.1:30500/voicevox_engine:20220223_arm64                                                                      0.0s 
                                                                                                                                         
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them

The resulting image is large. If size matters to you, you will need to slim it down yourself.

% docker images                                          
REPOSITORY                           TAG              IMAGE ID       CREATED         SIZE
10.0.0.1:30500/voicevox_engine       20220223_arm64   0835677442ab   7 minutes ago   865MB

Even at this stage you can start the image with docker and play with it.

% docker run --rm -it -p '50021:50021' 10.0.0.1:30500/voicevox_engine:20220223_arm64
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100%|███████████████████████████████████████████████████████████████████████████████████| 22.6M/22.6M [00:03<00:00, 6.95MB/s]
Extracting tar file /usr/local/lib/python3.8/dist-packages/pyopenjtalk/dic.tar.gz
Warning: cpu_num_threads is set to 0. ( The library leaves the decision to the synthesis runtime )
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:50021 (Press CTRL+C to quit)

With curl and sox installed on macOS, you can check that it works from the command line.

% echo "こんにちはなのだ。" > a.txt
% curl -s -X POST "localhost:50021/audio_query?speaker=3" --get --data-urlencode text@a.txt > query.json
% curl -s -H "Content-Type: application/json" -X POST -d @query.json "localhost:50021/synthesis?speaker=3" > audio.wav
% play audio.wav
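
The two curl calls above can also be sketched in Python. This is a minimal sketch using only the standard library, assuming the engine is reachable at the base URL you pass in; `audio_query_url` and `synthesize` are hypothetical helper names introduced for illustration.

```python
import urllib.parse
import urllib.request


def audio_query_url(base_url: str, speaker: int, text: str) -> str:
    """Build the /audio_query URL; the text travels as a URL-encoded query parameter."""
    query = urllib.parse.urlencode({"speaker": speaker, "text": text})
    return f"{base_url}/audio_query?{query}"


def synthesize(base_url: str, text: str, speaker: int = 3) -> bytes:
    """Run the two-step flow: POST /audio_query, then POST its JSON to /synthesis."""
    # Step 1: a POST with an empty body returns the synthesis parameters as JSON.
    request = urllib.request.Request(
        audio_query_url(base_url, speaker, text), data=b"", method="POST"
    )
    with urllib.request.urlopen(request) as response:
        query_json = response.read()

    # Step 2: send that JSON back unchanged and receive the WAV bytes.
    request = urllib.request.Request(
        f"{base_url}/synthesis?speaker={speaker}",
        data=query_json,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Calling `synthesize("http://localhost:50021", "こんにちはなのだ。")` and writing the returned bytes to `audio.wav` should give the same kind of file as the curl version.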

Pushing to the Private Registry

My home Private Registry is served over http, so I register it under insecure-registries in the Docker Engine settings.

Push the image.

% docker push 10.0.0.1:30500/voicevox_engine:20220223_arm64                         
The push refers to repository [10.0.0.1:30500/voicevox_engine]
fa548acaf8ff: Pushed 
7b7b0f4b4cd6: Pushed 
0c20a4bc193b: Layer already exists 
20220223_arm64: digest: sha256:f8e22c0bf227f2e06f656f1d0f4c51ba6c636d51b5ca5bf42d8e122c97665833 size: 954

If your kubernetes setup has no registry, archive the image with tar, compress it as you see fit, copy it to the worker node you plan to deploy to, and load it there.

% docker save 10.0.0.1:30500/voicevox_engine:20220223_arm64 > voicevox.tar
<copy to the kubernetes node>
$ docker load < voicevox.tar
Loaded image: 10.0.0.1:30500/voicevox_engine:20220223_arm64

Deploying to kubernetes

Here is an example Service manifest. At home I create a namespace and expose the service through a NodePort, but follow whatever policy suits your own setup.

voicevox-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: voicevox
  namespace: voicevox
  labels:
    app: voicevox
spec:
  type: NodePort
  ports:
    - name: http
      port: 50021
      targetPort: 50021
      nodePort: 30521
  selector:
    app: voicevox

Here is an example Pod manifest.
The Private Registry is also exposed through a NodePort, so the container image is referenced via localhost rather than 10.0.0.1.

voicevox-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: voicevox
  namespace: voicevox
  labels:
    app: voicevox
spec:
  containers:
  - name: voicevox
    image: localhost:30500/voicevox_engine:20220223_arm64
    resources:
      limits:
        memory: 1.5Gi
    ports:
    - name: http
      containerPort: 50021

Then deploy and check the logs.

$ kubectl apply -f voicevox-svc.yaml 
service/voicevox created
$ kubectl apply -f voicevox-pod.yaml 
pod/voicevox created
$ kubectl get all -n voicevox
NAME           READY   STATUS    RESTARTS   AGE
pod/voicevox   1/1     Running   0          3m7s

NAME               TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
service/voicevox   NodePort   10.96.39.50   <none>        50021:30521/TCP   3m14s
$ kubectl logs voicevox -n voicevox
Downloading: "https://github.com/r9y9/open_jtalk/releases/download/v1.11.1/open_jtalk_dic_utf_8-1.11.tar.gz"
dic.tar.gz: 100%|██████████| 22.6M/22.6M [00:02<00:00, 11.0MB/s]
Warning: cpu_num_threads is set to 0. ( The library leaves the decision to the synthesis runtime )
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:50021 (Press CTRL+C to quit)
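
The pod downloads a dictionary on startup, so the API is not available immediately. A rough sketch of polling until the engine answers, assuming it serves a GET /version endpoint on the NodePort; `fetch_version` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary.

```python
import time
import urllib.request
from typing import Callable


def fetch_version(base_url: str) -> str:
    """GET /version and return the raw body; raises OSError while unreachable."""
    with urllib.request.urlopen(f"{base_url}/version", timeout=5) as response:
        return response.read().decode("utf-8")


def wait_until_ready(probe: Callable[[], str], attempts: int = 30, delay: float = 2.0) -> str:
    """Call probe() until it succeeds, sleeping `delay` seconds after each failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return probe()
        except OSError as exc:  # connection refused, timeout, or HTTP error
            last_error = exc
            time.sleep(delay)
    raise TimeoutError(f"engine not ready after {attempts} attempts") from last_error
```

For the Service above, this would be used as `wait_until_ready(lambda: fetch_version("http://10.0.0.1:30521"))`.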

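I could not get the thread count change to work (the log above still reports cpu_num_threads as 0 despite VV_CPU_NUM_THREADS=4 in the Dockerfile), but for now I am happy as long as it runs, so this is not a problem.
(If speed is what you want, you can simply use the M1 Mac mini...)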

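Creating the Node-RED flow

I built a VOICEVOX subflow that takes input text and outputs an audio file.
All it does is issue two HTTP requests.

The subflow exposes two environment variables through its UI: URL, the base URL of the VOICEVOX service, and SPEAKER, the character to use.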
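
The SPEAKER choices in the flow are hard-coded character and style pairs. As a rough sketch, assuming the engine's GET /speakers endpoint returns a list of speakers that each carry a `styles` array of `name` and `id` pairs, the same labels could be derived from the running engine. `speaker_options` and `fetch_speakers` are hypothetical helper names, not part of the flow.

```python
import json
import urllib.request


def speaker_options(speakers: list) -> list:
    """Flatten a /speakers response into (label, style id) pairs sorted by id."""
    options = []
    for speaker in speakers:
        for style in speaker["styles"]:
            options.append((f"{speaker['name']}({style['name']})", style["id"]))
    return sorted(options, key=lambda option: option[1])


def fetch_speakers(base_url: str) -> list:
    """GET /speakers and decode the JSON body (assumed response shape, see above)."""
    with urllib.request.urlopen(f"{base_url}/speakers") as response:
        return json.load(response)
```

`speaker_options(fetch_speakers("http://10.0.0.1:30521"))` would then yield pairs such as a label for ずんだもん(ノーマル) alongside its numeric id, which is the value the SPEAKER variable expects.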

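The audio file is played in the browser with the node from "node-red-contrib-play-audio", which I always rely on for testing.

Comparing the debug timestamps from inject to the start of playback shows that the audio file is generated in about 10 seconds.

The subflow and the test flow are pasted here.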

[
  {
    "id": "32238322.9dad8c",
    "type": "subflow",
    "name": "今何時？",
    "info": "\n**msg.hour**：時を代入します。(数値)\n\n**msg.min**：分を代入します。(数値)\n\n**msg.year**：年を代入します。(数値)\n\n**msg.month**：月を代入します。(数値)\n\n**msg.day**：日を代入します。(数値)\n\n**msg.dow**：曜日を代入します。(英語文字列)",
    "category": "",
    "in": [
      {
        "x": 60,
        "y": 60,
        "wires": [
          {
            "id": "82d3f0aa.7008e"
          }
        ]
      }
    ],
    "out": [
      {
        "x": 480,
        "y": 60,
        "wires": [
          {
            "id": "fb2dc2fd8e339d6c",
            "port": 0
          }
        ]
      }
    ],
    "env": [
      {
        "name": "ZONE",
        "type": "str",
        "value": "+0900",
        "ui": {
          "icon": "font-awesome/fa-clock-o",
          "label": {
            "en-US": "タイムゾーン"
          },
          "type": "select",
          "opts": {
            "opts": [
              {
                "l": {
                  "en-US": "Asia/Tokyo"
                },
                "v": "+0900"
              }
            ]
          }
        }
      }
    ],
    "meta": {},
    "color": "#C0DEED",
    "icon": "font-awesome/fa-clock-o"
  },
  {
    "id": "82d3f0aa.7008e",
    "type": "change",
    "z": "32238322.9dad8c",
    "name": "",
    "rules": [
      {
        "t": "set",
        "p": "zone",
        "pt": "msg",
        "to": "ZONE",
        "tot": "env"
      },
      {
        "t": "set",
        "p": "millis",
        "pt": "msg",
        "to": "$millis()",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "hour",
        "pt": "msg",
        "to": "$number($fromMillis(millis,'[H01]',zone))",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "min",
        "pt": "msg",
        "to": "$number($fromMillis(millis,'[m01]',zone))",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "dow",
        "pt": "msg",
        "to": "$fromMillis(millis,'[F]',zone)",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "year",
        "pt": "msg",
        "to": "$number($fromMillis(millis,'[Y]',zone))",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "month",
        "pt": "msg",
        "to": "$number($fromMillis(millis,'[M]',zone))",
        "tot": "jsonata"
      },
      {
        "t": "set",
        "p": "day",
        "pt": "msg",
        "to": "$number($fromMillis(millis,'[D]',zone))",
        "tot": "jsonata"
      }
    ],
    "action": "",
    "property": "",
    "from": "",
    "to": "",
    "reg": false,
    "x": 200,
    "y": 60,
    "wires": [
      [
        "fb2dc2fd8e339d6c"
      ]
    ]
  },
  {
    "id": "fb2dc2fd8e339d6c",
    "type": "function",
    "z": "32238322.9dad8c",
    "name": "",
    "func": "\nif(msg.zone == \"+0900\"){\n    switch(msg.dow){\n        case \"monday\":\n            msg.dow = \"月曜日\";\n            break;\n        case \"tuesday\":\n            msg.dow = \"火曜日\";\n            break;\n        case \"wednesday\":\n            msg.dow = \"水曜日\";\n            break;\n        case \"thursday\":\n            msg.dow = \"木曜日\";\n            break;\n        case \"friday\":\n            msg.dow = \"金曜日\";\n            break;\n        case \"saturday\":\n            msg.dow = \"土曜日\";\n            break;\n        case \"sunday\":\n            msg.dow = \"日曜日\";\n            break;\n    }\n}\n\nreturn msg;",
    "outputs": 1,
    "noerr": 0,
    "initialize": "",
    "finalize": "",
    "libs": [],
    "x": 370,
    "y": 60,
    "wires": [
      []
    ]
  },
  {
    "id": "07838d665898207e",
    "type": "subflow",
    "name": "VOICEVOX",
    "info": "",
    "category": "",
    "in": [
      {
        "x": 40,
        "y": 80,
        "wires": [
          {
            "id": "6034de0ad57d9569"
          }
        ]
      }
    ],
    "out": [
      {
        "x": 850,
        "y": 80,
        "wires": [
          {
            "id": "4a20a945c8e116e0",
            "port": 0
          }
        ]
      }
    ],
    "env": [
      {
        "name": "URL",
        "type": "str",
        "value": "http://10.0.0.1:30521",
        "ui": {
          "label": {
            "ja": "URL"
          },
          "type": "input",
          "opts": {
            "types": [
              "str"
            ]
          }
        }
      },
      {
        "name": "SPEAKER",
        "type": "str",
        "value": "3",
        "ui": {
          "label": {
            "ja": "キャラクター"
          },
          "type": "select",
          "opts": {
            "opts": [
              {
                "l": {
                  "ja": "四国めたん(ノーマル)"
                },
                "v": "2"
              },
              {
                "l": {
                  "ja": "四国めたん(あまあま)"
                },
                "v": "0"
              },
              {
                "l": {
                  "ja": "四国めたん(ツンツン)"
                },
                "v": "6"
              },
              {
                "l": {
                  "ja": "四国めたん(セクシー)"
                },
                "v": "4"
              },
              {
                "l": {
                  "ja": "ずんだもん(ノーマル)"
                },
                "v": "3"
              },
              {
                "l": {
                  "ja": "ずんだもん(あまあま)"
                },
                "v": "1"
              },
              {
                "l": {
                  "ja": "ずんだもん(ツンツン)"
                },
                "v": "7"
              },
              {
                "l": {
                  "ja": "ずんだもん(セクシー)"
                },
                "v": "5"
              },
              {
                "l": {
                  "ja": "春日部つむぎ(ノーマル)"
                },
                "v": "8"
              },
              {
                "l": {
                  "ja": "雨晴はう(ノーマル)"
                },
                "v": "10"
              },
              {
                "l": {
                  "ja": "波音リツ(ノーマル)"
                },
                "v": "9"
              }
            ]
          }
        }
      }
    ],
    "meta": {},
    "color": "#DDAA99"
  },
  {
    "id": "4c110c0d3cddaff5",
    "type": "http request",
    "z": "07838d665898207e",
    "name": "",
    "method": "use",
    "ret": "txt",
    "paytoqs": "ignore",
    "url": "",
    "tls": "",
    "persist": false,
    "proxy": "",
    "authType": "",
    "senderr": false,
    "credentials": {},
    "x": 370,
    "y": 80,
    "wires": [
      [
        "82e35d27c239aca1"
      ]
    ]
  },
  {
    "id": "6034de0ad57d9569",
    "type": "change",
    "z": "07838d665898207e",
    "name": "",
    "rules": [
      {
        "t": "set",
        "p": "method",
        "pt": "msg",
        "to": "POST",
        "tot": "str"
      },
      {
        "t": "set",
        "p": "url",
        "pt": "msg",
        "to": "URL",
        "tot": "env"
      },
      {
        "t": "change",
        "p": "url",
        "pt": "msg",
        "from": "$",
        "fromt": "re",
        "to": "/audio_query?speaker=",
        "tot": "str"
      },
      {
        "t": "change",
        "p": "url",
        "pt": "msg",
        "from": "$",
        "fromt": "re",
        "to": "SPEAKER",
        "tot": "env"
      },
      {
        "t": "set",
        "p": "payload",
        "pt": "msg",
        "to": "$encodeUrlComponent(text)",
        "tot": "jsonata"
      },
      {
        "t": "change",
        "p": "payload",
        "pt": "msg",
        "from": "^",
        "fromt": "re",
        "to": "&text=",
        "tot": "str"
      },
      {
        "t": "change",
        "p": "url",
        "pt": "msg",
        "from": "$",
        "fromt": "re",
        "to": "payload",
        "tot": "msg"
      },
      {
        "t": "delete",
        "p": "payload",
        "pt": "msg"
      }
    ],
    "action": "",
    "property": "",
    "from": "",
    "to": "",
    "reg": false,
    "x": 180,
    "y": 80,
    "wires": [
      [
        "4c110c0d3cddaff5"
      ]
    ]
  },
  {
    "id": "82e35d27c239aca1",
    "type": "change",
    "z": "07838d665898207e",
    "name": "",
    "rules": [
      {
        "t": "delete",
        "p": "headers",
        "pt": "msg"
      },
      {
        "t": "set",
        "p": "headers",
        "pt": "msg",
        "to": "{\"Content-Type\":\"application/json\"}",
        "tot": "json"
      },
      {
        "t": "set",
        "p": "method",
        "pt": "msg",
        "to": "POST",
        "tot": "str"
      },
      {
        "t": "set",
        "p": "url",
        "pt": "msg",
        "to": "URL",
        "tot": "env"
      },
      {
        "t": "change",
        "p": "url",
        "pt": "msg",
        "from": "$",
        "fromt": "re",
        "to": "/synthesis?speaker=",
        "tot": "str"
      },
      {
        "t": "change",
        "p": "url",
        "pt": "msg",
        "from": "$",
        "fromt": "re",
        "to": "SPEAKER",
        "tot": "env"
      }
    ],
    "action": "",
    "property": "",
    "from": "",
    "to": "",
    "reg": false,
    "x": 550,
    "y": 80,
    "wires": [
      [
        "4a20a945c8e116e0"
      ]
    ]
  },
  {
    "id": "4a20a945c8e116e0",
    "type": "http request",
    "z": "07838d665898207e",
    "name": "",
    "method": "use",
    "ret": "bin",
    "paytoqs": "ignore",
    "url": "",
    "tls": "",
    "persist": false,
    "proxy": "",
    "authType": "",
    "senderr": false,
    "credentials": {},
    "x": 730,
    "y": 80,
    "wires": [
      []
    ]
  },
  {
    "id": "a895e86ec7e60f1c",
    "type": "subflow:07838d665898207e",
    "z": "c33d685b.315298",
    "name": "",
    "x": 590,
    "y": 980,
    "wires": [
      [
        "3f21056bafb1d156",
        "f725696724db18ea"
      ]
    ]
  },
  {
    "id": "3f21056bafb1d156",
    "type": "play audio",
    "z": "c33d685b.315298",
    "name": "",
    "voice": "",
    "x": 780,
    "y": 980,
    "wires": []
  },
  {
    "id": "93615a8e104b6ad6",
    "type": "inject",
    "z": "c33d685b.315298",
    "name": "",
    "props": [],
    "repeat": "",
    "crontab": "",
    "once": false,
    "onceDelay": 0.1,
    "topic": "",
    "x": 90,
    "y": 1000,
    "wires": [
      [
        "a574c6ae1e9ce8f3",
        "f725696724db18ea"
      ]
    ]
  },
  {
    "id": "a574c6ae1e9ce8f3",
    "type": "subflow:32238322.9dad8c",
    "z": "c33d685b.315298",
    "name": "",
    "x": 240,
    "y": 980,
    "wires": [
      [
        "76cf1a5d7739a242"
      ]
    ]
  },
  {
    "id": "76cf1a5d7739a242",
    "type": "template",
    "z": "c33d685b.315298",
    "name": "時報メッセージ",
    "field": "text",
    "fieldType": "msg",
    "format": "handlebars",
    "syntax": "mustache",
    "template": "{{hour}}時{{min}}分をお知らせなのだ。",
    "output": "str",
    "x": 410,
    "y": 980,
    "wires": [
      [
        "a895e86ec7e60f1c"
      ]
    ]
  },
  {
    "id": "f725696724db18ea",
    "type": "debug",
    "z": "c33d685b.315298",
    "name": "",
    "active": true,
    "tosidebar": true,
    "console": false,
    "tostatus": false,
    "complete": "text",
    "targetType": "msg",
    "statusVal": "",
    "statusType": "auto",
    "x": 490,
    "y": 1120,
    "wires": []
  }
]

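Conclusion

Personally I see a lot of potential in VOICEVOX, and I hope it keeps growing.
Also, to put it mildly, Zundamon's voice is supremely cute, so please give it a listen!

There are rules on credit notation when you post generated audio.
Please check the terms of use for each character.

For the software itself, please also check the VOICEVOX terms of use.

Please also check the license stated on GitHub.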