
Calling voicevox_core Directly from Unity

Posted at 2023-04-26

Introduction

I tried calling voicevox_core directly from Unity, without going through voicevox_engine.

Got it working by calling voicevox_core directly from Unity! pic.twitter.com/3Ju4qhipxd
— metaaa (@meta556) April 3, 2023

What is VOICEVOX?

VOICEVOX is free text-to-speech software.
For details, see the official VOICEVOX website.

What is voicevox_core?

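VOICEVOX is made up of three modules, each with its own role: the editor, the engine, and the core.

- Editor: GUI application
- Engine: web server
- Core: dynamic library

The engine exposes the core's speech synthesis over HTTP, so an application can also load the core library itself and skip the server entirely.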

For details, see the following document:
https://github.com/VOICEVOX/voicevox/blob/main/docs/%E5%85%A8%E4%BD%93%E6%A7%8B%E6%88%90.md

Why call voicevox_core directly?

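At the time of writing, the common approach seems to be running voicevox_engine and sending it requests from Unity.
However, voicevox_engine is dual-licensed: under LGPL v3, or under a separate license that does not require publishing your source code, which you have to request from the maintainer. Calling voicevox_core directly removes the dependency on the engine and its license; voicevox_core itself is released under the MIT License, although the voice models it loads have their own terms of use.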

Implementation

Environment

- Windows 11
- Unity 2021.3.17f1
- voicevox_core 0.14.3 (CPU mode)

Full source code

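Processing flow

After Initialize, a WAV file is generated either by calling tts in a single step, or by creating an AudioQuery and then passing it to Synthesis.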
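The second path (AudioQuery, then Synthesis) is useful when you want to adjust the query, such as speed or pitch, before synthesizing. Below is a minimal sketch of it, assuming the voicevox_core 0.14 C functions voicevox_audio_query, voicevox_synthesis, voicevox_audio_query_json_free and voicevox_wav_free, and a library named voicevox_core; check every signature against the core.h shipped with your release. The class and struct names are hypothetical and independent of the author's VoicevoxCoreApi wrapper. It also assumes voicevox_initialize has already succeeded and the speaker's model is loaded.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical mirrors of VoicevoxAudioQueryOptions / VoicevoxSynthesisOptions
// (assumption: field layout as in the voicevox_core 0.14 C header).
[StructLayout(LayoutKind.Sequential)]
public struct AudioQueryOptionsSketch
{
    // C `bool` is one byte; .NET marshals bool as four bytes unless told otherwise
    [MarshalAs(UnmanagedType.U1)] public bool kana;
}

[StructLayout(LayoutKind.Sequential)]
public struct SynthesisOptionsSketch
{
    [MarshalAs(UnmanagedType.U1)] public bool enable_interrogative_upspeak;
}

public static class VoicevoxTwoStepSketch
{
    const string Lib = "voicevox_core"; // assumed library name (voicevox_core.dll on Windows)
    const int ResultOk = 0;             // VOICEVOX_RESULT_OK

    [DllImport(Lib)]
    static extern int voicevox_audio_query(byte[] text, uint speakerId,
        AudioQueryOptionsSketch options, out IntPtr outputAudioQueryJson);

    [DllImport(Lib)]
    static extern int voicevox_synthesis(byte[] audioQueryJson, uint speakerId,
        SynthesisOptionsSketch options, out UIntPtr outputWavLength, out IntPtr outputWav);

    [DllImport(Lib)]
    static extern void voicevox_audio_query_json_free(IntPtr audioQueryJson);

    [DllImport(Lib)]
    static extern void voicevox_wav_free(IntPtr wav);

    // Step 1: text -> AudioQuery JSON. Step 2: (optionally edited) JSON -> WAV bytes.
    public static byte[] AudioQueryThenSynthesis(string text, uint speakerId,
        Func<string, string> editQuery = null)
    {
        // byte[] arguments are pinned by the marshaller; "\0" supplies the C terminator
        int rc = voicevox_audio_query(Encoding.UTF8.GetBytes(text + "\0"), speakerId,
            new AudioQueryOptionsSketch { kana = false }, out IntPtr jsonPtr);
        if (rc != ResultOk) throw new InvalidOperationException("voicevox_audio_query failed: " + rc);

        string json;
        try { json = Marshal.PtrToStringUTF8(jsonPtr); }
        finally { voicevox_audio_query_json_free(jsonPtr); } // the core owns this string

        // Speed, pitch, etc. could be changed here before synthesis
        if (editQuery != null) json = editQuery(json);

        // enable_interrogative_upspeak = true is assumed to match the library default
        rc = voicevox_synthesis(Encoding.UTF8.GetBytes(json + "\0"), speakerId,
            new SynthesisOptionsSketch { enable_interrogative_upspeak = true },
            out UIntPtr wavLength, out IntPtr wavPtr);
        if (rc != ResultOk) throw new InvalidOperationException("voicevox_synthesis failed: " + rc);

        try
        {
            var wav = new byte[checked((int)wavLength.ToUInt64())];
            Marshal.Copy(wavPtr, wav, 0, wav.Length);
            return wav;
        }
        finally { voicevox_wav_free(wavPtr); } // the core owns this buffer too
    }
}
```

Declaring the pointers as IntPtr/UIntPtr lets this path avoid unsafe code altogether, at the cost of diverging from the pointer-based style used in the rest of the article.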

Implementation details

The code is only a thin wrapper around the voicevox_core API, so there is not much to explain. Below is a text-to-speech (tts) example.

test.cs

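// Members of the MonoBehaviour in test.cs. The field types are assumptions:
// VoicevoxAccelerationMode is taken to be the wrapper's C# mirror of the C enum.
[SerializeField] VoicevoxAccelerationMode mode;
[SerializeField] string text;
[SerializeField] uint speakerId;

void Start()
{
    // Only synthesize if the core initialized successfully
    if (Initialize())
    {
        TextToSpeech();
    }
}

public bool Initialize()
{
    VoicevoxUtils.Init();
    var option = Voicevox.MakeDefaultInitializeOptions();
    option.acceleration_mode = mode;
    var result = Voicevox.Initialize(option);
    if (result != VoicevoxResultCode.VOICEVOX_RESULT_OK)
    {
        Debug.LogError("Initialize failed: " + result);
        return false;
    }
    return true;
}

public void TextToSpeech()
{
    var option = Voicevox.MakeDefaultTtsOptions();
    var result = Voicevox.TextToSpeech(text, speakerId, option);
    if (result != VoicevoxResultCode.VOICEVOX_RESULT_OK)
    {
        Debug.LogError("TextToSpeech failed: " + result);
    }
}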

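Calling the voicevox_core DLL requires passing raw pointers (C strings and output buffers), so the wrapper class is declared unsafe. Unity only compiles it when "Allow 'unsafe' Code" is enabled, either in Player Settings or in the assembly definition.
I had never written unsafe code before, so this is probably not the most idiomatic way to write it.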
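One detail matters every time a string crosses into the DLL: C parameters such as the text argument of voicevox_tts are `const char*`, meaning NUL-terminated UTF-8, and `Encoding.UTF8.GetBytes` does not append that terminator. A minimal helper for this, with the hypothetical name Utf8CString:

```csharp
using System;
using System.Text;

public static class Utf8CString
{
    // Encodes `s` as UTF-8 and appends the NUL terminator a C `const char*` expects.
    // Encoding.UTF8.GetBytes(s) on its own does not add this byte.
    public static byte[] Encode(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        int byteCount = Encoding.UTF8.GetByteCount(s);
        var buffer = new byte[byteCount + 1]; // new arrays are zeroed, so the last byte is NUL
        Encoding.UTF8.GetBytes(s, 0, s.Length, buffer, 0);
        return buffer;
    }
}
```

Inside an unsafe method the result can be pinned with `fixed (byte* p = Utf8CString.Encode(text))` and `p` passed as the `const char*` argument. Because the bytes live on the managed heap rather than the stack, long input text cannot overflow the stack the way a large `stackalloc` buffer can.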

Voicevox.cs

using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using UnityEngine;

// Excerpt: VoicevoxCoreApi (the P/Invoke declarations), IsModelLoaded and the
// MakeDefault*Options helpers are defined elsewhere in the full source.
public static unsafe class Voicevox
{
    public static VoicevoxResultCode Initialize(VoicevoxInitializeOptions options)
    {
        string dict = VoicevoxUtils.GetOpenJTalkDict();
        if (!Directory.Exists(dict))
        {
            Debug.LogError("OpenJTalk dictionary not found: " + dict);
            return VoicevoxResultCode.VOICEVOX_RESULT_UNINITIALIZED_STATUS_ERROR;
        }

        // C expects a NUL-terminated string; Encoding.UTF8.GetBytes does not append one
        byte[] dictBytes = Encoding.UTF8.GetBytes(dict + "\0");
        fixed (byte* pDictByte = dictBytes)
        {
            // pDictByte is only valid inside this block, which covers the whole call
            options.open_jtalk_dict_dir = pDictByte;
            return VoicevoxCoreApi.voicevox_initialize(options);
        }
    }

    public static VoicevoxResultCode LoadModel(uint speakerId)
    {
        return VoicevoxCoreApi.voicevox_load_model(speakerId);
    }

    public static VoicevoxResultCode TextToSpeech(string text, uint speakerId, VoicevoxTtsOptions options)
    {
        if (!IsModelLoaded(speakerId))
        {
            var loadResult = LoadModel(speakerId);
            if (loadResult != VoicevoxResultCode.VOICEVOX_RESULT_OK)
            {
                Debug.LogError("voicevox_load_model failed: " + loadResult);
                return loadResult;
            }
        }

        // NUL-terminated UTF-8 on the managed heap (a large stackalloc could overflow the stack)
        byte[] textBytes = Encoding.UTF8.GetBytes(text + "\0");
        byte[] outputWav;

        fixed (byte* textPtr = textBytes)
        {
            nuint outputWavLength = 0;
            byte* outputWavPtr = null;

            VoicevoxResultCode result = VoicevoxCoreApi.voicevox_tts(
                textPtr,
                speakerId,
                options,
                &outputWavLength,
                &outputWavPtr
            );

            // On failure no WAV buffer is returned, so there is nothing to copy or free
            if (result != VoicevoxResultCode.VOICEVOX_RESULT_OK)
            {
                Debug.LogError("voicevox_tts failed: " + result);
                return result;
            }

            try
            {
                outputWav = new byte[(int)outputWavLength];
                Marshal.Copy((IntPtr)outputWavPtr, outputWav, 0, outputWav.Length);
            }
            finally
            {
                // The buffer is owned by voicevox_core and must be released with voicevox_wav_free
                VoicevoxCoreApi.voicevox_wav_free(outputWavPtr);
            }
        }

        var path = VoicevoxUtils.BasePath + VoicevoxConfig.wavSavePath;
        if (!Directory.Exists(path))
        {
            VoicevoxUtils.InitializeFolder(VoicevoxConfig.wavSavePath);
        }

        File.WriteAllBytes(path + VoicevoxConfig.wavFileName, outputWav);
        return VoicevoxResultCode.VOICEVOX_RESULT_OK;
    }
}
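The WAV buffer that voicevox_tts returns through its output pointer is allocated by voicevox_core, so it has to be copied into managed memory and then handed back with voicevox_wav_free, ideally even if the copy throws. The small helper below (NativeBuffer.CopyAndFree is a hypothetical name) captures that copy-then-free pattern; it takes the release function as a delegate, so it can be exercised with Marshal.AllocHGlobal/FreeHGlobal standing in for the DLL.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeBuffer
{
    // Copies `length` bytes out of a buffer owned by native code, then always hands the
    // buffer back to its owner through `free` (for voicevox_core: voicevox_wav_free),
    // even if the allocation or the copy throws.
    public static byte[] CopyAndFree(IntPtr ptr, int length, Action<IntPtr> free)
    {
        if (ptr == IntPtr.Zero) throw new ArgumentException("Native buffer is null.", nameof(ptr));
        if (free == null) throw new ArgumentNullException(nameof(free));
        try
        {
            var managed = new byte[length];
            Marshal.Copy(ptr, managed, 0, length);
            return managed;
        }
        finally
        {
            free(ptr);
        }
    }
}
```

With the article's wrapper, `ptr` would be `(IntPtr)outputWavPtr`, `length` the checked int conversion of `outputWavLength`, and `free` a delegate that casts the pointer back to `byte*` and calls `VoicevoxCoreApi.voicevox_wav_free`. Copying first and releasing immediately also means the native buffer is never held across the file I/O that follows.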
