
Running Watson Speech to Text from Python (autumn, October 2019)

Posted at 2019-10-12, last updated at 2019-10-12

Introduction

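Watson Speech to Text can be called from Python through its client library. I hope this helps anyone who got stuck working out how to use that library.
There is a good existing article, "[Python]WatsonのSpeech To Textを使うお話", but I found that it needs updating on the following two points.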

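The credentials passed to the SpeechToTextV1 constructor change from username and password to an API key.
The value returned by a SpeechToTextV1 call such as recognize() is not JSON but an instance of a Watson-specific class, so treating it as JSON raises an error. The code has to be written with that in mind.
In addition, because I was working with long mp3 files, the code below also includes a step that splits an mp3 into smaller pieces.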
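
The second point can be illustrated without calling the service. The sketch below is only an illustration: FakeResponse is a hypothetical stand-in for the object that recognize() returns, and extract_transcripts is a helper name made up for this example. The one thing it takes from the real library is that the parsed JSON sits in the .result attribute.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for the object returned by recognize()."""

    def __init__(self, result):
        self.result = result  # the parsed JSON body, as a plain dict


def extract_transcripts(response):
    """Return the top transcript of each recognized segment."""
    body = response.result  # unwrap first; the wrapper is not indexed like a dict here
    return [seg["alternatives"][0]["transcript"] for seg in body["results"]]


response = FakeResponse(
    {"results": [{"alternatives": [{"transcript": "hello world"}]}]}
)
transcripts = extract_transcripts(response)
print(transcripts)
print(json.dumps(response.result, indent=2))
```

Indexing the wrapper directly (response["results"]) fails with this stand-in, which mirrors the kind of error described above.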

Getting the api_key

Obtain an API key by following this site: https://blog.apar.jp/web/9036/
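
Once you have the key, one option is to keep it out of the source file. The following is a minimal sketch, not part of the original code: the environment variable name WATSON_STT_APIKEY and the helper load_api_key are assumptions chosen for this example.

```python
import os


def load_api_key(env=os.environ, name="WATSON_STT_APIKEY"):
    """Return the API key stored in the environment, or fail with a clear message."""
    # WATSON_STT_APIKEY is a made-up variable name; use whatever name you prefer
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your API key")
    return key


# Demo with an explicit dict so the example runs without touching the real environment
print(load_api_key({"WATSON_STT_APIKEY": "demo-key"}))
```

In the script below, api_key = load_api_key() could then replace the hard-coded string.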

Environment setup

Verified on Windows 10 with Anaconda.

$ conda create -n stt python=3.7 anaconda
$ conda activate stt
$ pip install watson_developer_cloud
(The following library is only used for splitting mp3 files, so it is optional.)
$ conda install -c conda-forge pydub

If you run into errors related to file splitting, see the following site.
https://algorithm.joho.info/programming/python/pydub-install/
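
pydub relies on an external ffmpeg (or avconv) executable to decode mp3, so a missing executable is one plausible cause of such errors. The sketch below checks for it, assuming that this is the problem; find_decoder is a helper name made up for this example.

```python
import shutil


def find_decoder(candidates=("ffmpeg", "avconv")):
    """Return the path of the first decoder found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


decoder = find_decoder()
print(decoder or "no ffmpeg/avconv on PATH; mp3 decoding will likely fail")
```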

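Splitting the audio data

I first split a 2-hour audio file into 15-minute chunks before sending them, but some chunks came back with errors. After switching to 10-minute chunks the errors went away.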
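
The chunk arithmetic can be checked on its own before any audio is loaded. This is a sketch under the assumption that positions are measured in milliseconds, as pydub does when slicing; chunk_bounds is a helper name introduced for this example.

```python
import math


def chunk_bounds(total_ms, unit_ms):
    """Return (start_ms, end_ms) pairs that cover total_ms in unit_ms steps."""
    count = math.ceil(total_ms / unit_ms)
    # the last chunk is clipped so it never runs past the end of the audio
    return [(i * unit_ms, min((i + 1) * unit_ms, total_ms)) for i in range(count)]


MINUTE_MS = 60 * 1000
two_hours = 120 * MINUTE_MS
print(len(chunk_bounds(two_hours, 15 * MINUTE_MS)))  # 8
print(len(chunk_bounds(two_hours, 10 * MINUTE_MS)))  # 12
```

A 2-hour file therefore produces 8 chunks at 15 minutes and 12 chunks at 10 minutes.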

Code

The points raised in the introduction correspond mainly to the functions stt() and split_data().

from watson_developer_cloud import SpeechToTextV1
from pydub import AudioSegment  # library used to split the audio
import math
import json
import glob
import traceback

def split_data(path):
    sound = AudioSegment.from_file(path, format='mp3')
    # split into 10-minute chunks (pydub measures time in milliseconds)
    unit = 1000*60*10
    for i in range(math.ceil((len(sound)/unit))):
        print(i)
        sound_tmp = sound[i*unit:(i+1)*unit]
        sound_tmp.export(f"split_{i}.mp3", format='mp3')

def stt():
    api_key = 'your api key'
    cont_type = "audio/mp3"
    lang = "ja-JP_BroadbandModel"
    audio_files = glob.glob('split_*.mp3')
    for i, audio_file_name in enumerate(audio_files):
        try:
            # watson connection
            print(audio_file_name)
            # Change 1: authenticate with iam_apikey instead of username/password
            service = SpeechToTextV1(iam_apikey=api_key)
            print('start')
            with open(audio_file_name, "rb") as audio_file:
                result_json = service.recognize(audio=audio_file, content_type=cont_type, model=lang)
            print('end')
            # Change 2: recognize() returns a Watson response object, not JSON;
            # the parsed JSON body is available as result_json.result
            # save the JSON to a file
            result = json.dumps(result_json.result, indent=2)
            with open(f"stt_{i}.json", "w") as f:
                f.write(result)

            # print
            results = result_json.result["results"]
            for res in results:
                print(res["alternatives"][0]["transcript"], end='\n')

        except Exception as e:
            traceback.print_exc()

def json2strings(watson_json):
    docs = []
    for trans in watson_json['results']:
        doc = trans["alternatives"][0]["transcript"]
        print(doc)
        docs += [doc]
    return docs

def strings2md(docs):
    f = open('result.md', 'w', encoding='utf-8')
    for x in docs:
        f.write(str(x) + "\n")
    f.close()

def json2md():
    # collect the transcripts from every saved JSON file into one list
    docs = []
    json_files = glob.glob('stt_*.json')
    for json_file in json_files:
        with open(json_file, 'r') as f:
            d = json.load(f)
        docs += json2strings(d)

    strings2md(docs)

if __name__=='__main__':
    split_data('sound_data.mp3')
    stt()
    json2md()
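
One thing to watch for in the code above: glob.glob() returns names in arbitrary order, and a plain alphabetical sort would place split_10.mp3 before split_2.mp3. If the transcript should follow the order of the recording, the files can be sorted by their numeric index. The following is a minimal sketch; chunk_index is a helper name introduced for this example, and it assumes file names of the form produced above.

```python
import re


def chunk_index(path):
    """Extract the number from a name such as 'split_12.mp3' (-1 if none)."""
    match = re.search(r"_(\d+)\.\w+$", path)
    return int(match.group(1)) if match else -1


names = ["split_10.mp3", "split_2.mp3", "split_0.mp3", "split_1.mp3"]
ordered = sorted(names, key=chunk_index)
print(ordered)
```

The same key could be used as sorted(glob.glob('split_*.mp3'), key=chunk_index) in stt() and for 'stt_*.json' in json2md().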

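Closing remarks

Pick out and use only the parts you need.
I think the transcription accuracy is good enough to get the general meaning across.
My impression is that the documentation is not very well maintained.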
