
Generating a lab file for lip sync from the VOICEVOX API

Posted at 2022-07-11, last updated at 2022-07-11

The GUI has this feature, but I could not find an equivalent through the API, so I wrote the following.

def get_labdata(jsondata):
    """Build lab text ("start end phoneme" per line, in 100 ns units) from an audio query."""
    labdata = ""
    now_length = 0  # elapsed time in seconds, before speedScale is applied
    # Seconds -> 100 ns units, compressed by speedScale.
    timescale = 10000000 / float(jsondata["speedScale"])
    for i in jsondata["accent_phrases"]:
        for j in i["moras"]:
            # Consonant part (skipped for moras that have no consonant).
            if j["consonant_length"] is not None:
                labdata += str(int(now_length * timescale)) + " "
                now_length += j["consonant_length"]
                labdata += str(int(now_length * timescale)) + " " + j["consonant"] + "\n"
            # Vowel part.
            labdata += str(int(now_length * timescale)) + " "
            now_length += j["vowel_length"]
            labdata += str(int(now_length * timescale)) + " " + j["vowel"] + "\n"
        # Pause after the accent phrase, if there is one.
        if i["pause_mora"] is not None:
            labdata += str(int(now_length * timescale)) + " "
            now_length += i["pause_mora"]["vowel_length"]
            labdata += str(int(now_length * timescale)) + " " + "pau\n"
    return labdata

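import json
import requests

# Create the audio query from a preset (replace the <...> placeholders with your own values).
query = requests.post(
    'http://<VOICEVOX server>/audio_query_from_preset',
    params={'preset_id': '<preset ID>', 'text': '<text to read aloud>'},
).json()

# Override the synthesis parameters (XXX stands for your own values).
query["pitchScale"] = XXX
query["speedScale"] = XXX
query["intonationScale"] = XXX

# Build the lab data from the modified query, then synthesize the audio.
labdata = get_labdata(query)
wavdata = requests.post(
    'http://<VOICEVOX server>/synthesis',
    params={'speaker': '<style ID>'},
    data=json.dumps(query, ensure_ascii=False).encode("utf-8"),
    headers={'Content-Type': 'application/json'},  # the body is JSON
)

# Save the wav and lab files under the same base name (filename is a placeholder variable).
with open('static/tmp/' + filename + '.wav', 'wb') as wavf:
    wavf.write(wavdata.content)
with open('static/tmp/' + filename + '.lab', 'wb') as labf:
    labf.write(labdata.encode('utf-8'))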

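The query returned by /audio_query_from_preset already contains consonant_length and vowel_length, but if you change speedScale afterward, the timing naturally no longer matches.
However, I could not find a way to retrieve consonant_length and vowel_length again after /synthesis, so I simply have the mouth move speedScale times faster.
With this, the lip sync moved in a reasonably convincing way.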
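To make the speedScale scaling concrete, here is a minimal sketch of the same arithmetic on hand-made data. The helper name `lab_lines` and the sample phoneme lengths are hypothetical (they do not come from the VOICEVOX API); the only thing carried over from get_labdata is the idea that each boundary in seconds is multiplied by 10000000 / speedScale to get lab timestamps in 100 ns units.

```python
def lab_lines(phonemes, speed_scale):
    """Convert (label, length_in_seconds) pairs into lab lines.

    Every boundary is converted to 100 ns units and divided by
    speed_scale, mirroring the timescale used in get_labdata.
    """
    timescale = 10_000_000 / float(speed_scale)
    elapsed = 0.0  # seconds, before speed_scale is applied
    lines = []
    for label, length in phonemes:
        start = int(elapsed * timescale)
        elapsed += length
        end = int(elapsed * timescale)
        lines.append(f"{start} {end} {label}")
    return lines


# Hypothetical lengths in seconds, chosen to be exact in binary floating point.
sample = [("k", 0.125), ("o", 0.25), ("pau", 0.5)]

print(lab_lines(sample, 1.0))
# ['0 1250000 k', '1250000 3750000 o', '3750000 8750000 pau']
print(lab_lines(sample, 2.0))
# ['0 625000 k', '625000 1875000 o', '1875000 4375000 pau']
```

With speedScale set to 2.0, every timestamp is halved, which is what "moving the mouth speedScale times faster" amounts to.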

I generated a lab file from the VOICEVOX API by forcibly converting the output of /audio_query_from_preset, but I wonder if there is a smarter way? https://t.co/jbn1rf0PyU pic.twitter.com/DlSlUzmrXg
lw05006 (@lw050061), July 11, 2022

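(Strictly speaking, the timing is probably still slightly off. When speedScale changes, perhaps the proper approach is to create a dedicated preset each time?)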
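One rough way to see how far off the timing is would be to compare the end of the lab timeline with the actual length of the synthesized audio. The sketch below uses only the Python standard library. The helper names are hypothetical, and `make_silent_wav` is just a stand-in for the bytes returned by /synthesis, so that the example runs without a server. Part of any difference may simply be leading or trailing silence in the audio, so treat the number as an indicator rather than a precise measurement.

```python
import io
import wave


def lab_end_seconds(labdata):
    """Return the end time of the last lab line in seconds (lab uses 100 ns units)."""
    last_line = labdata.strip().splitlines()[-1]
    end_units = int(last_line.split()[1])
    return end_units / 10_000_000


def wav_duration_seconds(wav_bytes):
    """Return the playback duration of in-memory WAV data in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds, framerate=24000):
    """Build a silent 16-bit mono WAV; only a stand-in for real synthesized audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(framerate)
        wav.writeframes(b"\x00\x00" * int(seconds * framerate))
    return buffer.getvalue()


# Hypothetical data: a lab timeline ending at 1.0 s and 1.5 s of audio.
demo_lab = "0 5000000 a\n5000000 10000000 pau\n"
demo_wav = make_silent_wav(1.5)

drift = wav_duration_seconds(demo_wav) - lab_end_seconds(demo_lab)
print(f"wav is {drift:.3f} s longer than the lab timeline")
# wav is 0.500 s longer than the lab timeline
```

In the script from this article, `labdata` and `wavdata.content` would take the place of `demo_lab` and `demo_wav`.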
