
Watson SpeechToText - Extracting only the transcript from the result JSON (Mac)

Posted at 2017-09-18

Command

To pull just the text out of a Watson STT result, run this in Terminal.

grep '"transcript"' speech_to_text_result.json \
 | sed -e 's/^[ ]*//g' \
 | sed -e 's/^"transcript": "//' \
 | sed -e 's/"$//' \
 | sed -e 's/D_[^ ]*//g' \
 | sed -e 's/ //g'

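Replace "speech_to_text_result.json" with the name of the JSON file where you saved the STT result.
(Note: as of 2017-09-18. This does not actually parse the JSON, so it may stop working if the output format changes.)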
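
Since the pipeline above only matches text, one way to make it less sensitive to formatting changes is to let a JSON parser pick out the field instead. The following is only a minimal sketch: it assumes the `jq` command is installed and that the result file has a `results[].alternatives[].transcript` layout. The sample file, its contents, and the function name `extract_transcript_jq` are made up for illustration.

```shell
# Hypothetical sample imitating an STT result file (contents are made up).
cat > sample_stt_parsed.json <<'EOF'
{
   "results": [
      {
         "alternatives": [
            {
               "confidence": 0.87,
               "transcript": "D_アノー 今日 は 晴れ です "
            }
         ],
         "final": true
      }
   ]
}
EOF

# Let jq read the transcript field (assumed layout), then reuse the
# same two cleanup rules: drop D_ tokens, then drop spaces.
extract_transcript_jq() {
  jq -r '.results[].alternatives[0].transcript' "$1" \
    | sed -e 's/D_[^ ]*//g' -e 's/ //g'
}

# Only run the demo when jq is actually available.
if command -v jq >/dev/null 2>&1; then
  extract_transcript_jq sample_stt_parsed.json
else
  echo "jq is not installed" >&2
fi
```

With this approach, changes in indentation or key order should no longer matter, although a change in the key names themselves would still break it.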

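Details

Watson Speech to Text is a speech recognition service that transcribes audio, such as conversations, into text.

This time I followed the article below and ran speech recognition on an mp3 file.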
https://www.ibm.com/think/jp-ja/watson/ai-transcription/

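What this solves

The result comes back as JSON, but I only want the text (I don't need things like the confidence scores)
Fillers(?) such as "D_アノー" get transcribed too, and I want to remove them
Spaces are inserted between words, and I want to remove them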
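
The three points above map onto the steps of the command roughly as follows. This is a small sketch run against made-up sample data: the file name `sample_stt_result.json`, its contents, and the function name `extract_transcript` are assumptions for illustration, not real STT output.

```shell
# Hypothetical sample imitating the layout of a saved STT result.
cat > sample_stt_result.json <<'EOF'
{
   "results": [
      {
         "alternatives": [
            {
               "confidence": 0.87,
               "transcript": "D_アノー 今日 は 晴れ です "
            }
         ],
         "final": true
      }
   ]
}
EOF

extract_transcript() {
  # 1. Keep only the transcript lines and strip the JSON wrapping
  #    (leading indent, the key, and the closing quote).
  # 2. Remove tokens that start with D_ (the fillers).
  # 3. Remove the spaces between words.
  grep '"transcript"' "$1" \
    | sed -e 's/^[ ]*//' \
          -e 's/^"transcript": "//' \
          -e 's/"$//' \
          -e 's/D_[^ ]*//g' \
          -e 's/ //g'
}

extract_transcript sample_stt_result.json
# prints: 今日は晴れです
```

The rules are the same as in the original command, just passed to a single sed call with several -e options.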

In closing

Sorry, this is mostly a memo for myself.
