Running Watson Speech to Text from the terminal

Posted at 2016-06-10 (last updated at 2016-06-10)

This post is a record of calling Watson Speech to Text with curl.

How to run it

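If you have not yet created a Speech to Text service on Bluemix, create one from the link below and note the username and password shown under Service Credentials. (Under the current pricing, the first 1,000 minutes of processing each month are free.)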
https://console.ng.bluemix.net/catalog/speech-to-text

Following the documentation, POST the audio file with curl as shown below, and the transcription comes back in the response.

curl -X POST -u service-username:service-password --header "Content-Type: audio/wav" --header "Transfer-Encoding: chunked" --data-binary @out000.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=ja-JP_BroadbandModel"

Where I got stuck

Files with the m4a extension are not supported

To test with m4a audio recorded in the iPhone Voice Memos app, first convert it to a format such as wav.

brew install faad2
faad sample.m4a -o sample.wav
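
When there are several recordings, the same conversion can be run in a loop. The following is a minimal sketch, assuming the .m4a files sit in the current directory; the helper name wav_name is hypothetical and only derives the output file name.

```shell
# Derive the .wav output name for a given .m4a input name.
wav_name() {
  printf '%s\n' "${1%.m4a}.wav"
}

# Convert every .m4a file in the current directory with faad.
for src in ./*.m4a; do
  [ -e "$src" ] || continue   # skip when the glob matched nothing
  faad -o "$(wav_name "$src")" "$src"
done
```

Each input keeps its base name, so sample.m4a becomes sample.wav next to the original.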

The curl POST times out

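When the audio file was too large, the POST never finished and the request failed with an error, so I split the file into smaller pieces. Smaller files also run faster, which makes experimenting easier.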

The following example splits the audio into 60-second files.

brew install ffmpeg
ffmpeg -i sample.wav -f segment -segment_time 60 -c copy out%03d.wav
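
Once the audio is split, each segment can be posted in turn. The sketch below assumes the credentials are supplied through the environment variables STT_USERNAME and STT_PASSWORD, and that each response is saved as a .json file next to its segment; those variable names and the helpers recognize_url and json_name are hypothetical, chosen for illustration.

```shell
# Assumed environment variables holding the Service Credentials values.
STT_USERNAME="${STT_USERNAME:-service-username}"
STT_PASSWORD="${STT_PASSWORD:-service-password}"

# Build the recognize URL for a given model name.
recognize_url() {
  printf '%s\n' "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=$1"
}

# Name of the file that receives the JSON response for a segment.
json_name() {
  printf '%s\n' "${1%.wav}.json"
}

# POST each segment in order and save the response next to it.
for wav in ./out[0-9][0-9][0-9].wav; do
  [ -e "$wav" ] || continue   # skip when no segments exist
  curl -X POST -u "$STT_USERNAME:$STT_PASSWORD" \
    --header "Content-Type: audio/wav" \
    --header "Transfer-Encoding: chunked" \
    --data-binary @"$wav" \
    "$(recognize_url ja-JP_BroadbandModel)" > "$(json_name "$wav")"
done
```

Because the glob expands in sorted order, the responses follow the order of the original recording (out000.json, out001.json, and so on).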

The transcription comes back in English

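By default, en-US_BroadbandModel is used as the recognition model. To get Japanese output, pass a Japanese model in the model query parameter, as in the curl command above.
Two models are currently available for Japanese: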

ja-JP_BroadbandModel
ja-JP_NarrowbandModel
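
As a rough guide to choosing between the two, Broadband models are generally intended for audio sampled at 16 kHz or higher and Narrowband models for 8 kHz audio such as telephone recordings. The sketch below works under that assumption: the 16000 Hz threshold and the helper names pick_model and sample_rate are illustrative, not taken from the service itself. It reads the sample rate with ffprobe, which is installed together with ffmpeg.

```shell
# Pick a Japanese model name from a sample rate given in Hz.
# Assumption: 16000 Hz and above maps to Broadband, anything lower to Narrowband.
pick_model() {
  if [ "$1" -ge 16000 ]; then
    printf '%s\n' "ja-JP_BroadbandModel"
  else
    printf '%s\n' "ja-JP_NarrowbandModel"
  fi
}

# Print the sample rate of the first audio stream in a file.
sample_rate() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Usage: pick_model "$(sample_rate out000.wav)"
```

The printed name can then be used as the value of the model query parameter in the recognize request.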
