Creating stream data for the Google Cloud Speech-to-Text API with Node.js

Last updated at 2020-01-16, posted at 2019-01-27

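As of 2019/01/27, this did not work unless a setting inside the module was changed, which was a nuisance, so I am leaving a note here.
If you replace the part that sends data to the API with a file stream, you can also save the recording to a file.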

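1. Add the SoX binary to your PATH

SoX is the program that does the recording.
Download the binary from the link above and add its install directory to your PATH.
If the version is displayed as below, you are all set.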

> sox --version
sox:      SoX v14.4.2
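
The same check can be done from Node.js before starting a recording. The following is a minimal sketch, not part of the original setup: `checkCommand` is a hypothetical helper name, and it relies only on the standard `child_process.spawnSync` call.

```javascript
const { spawnSync } = require('child_process');

// Runs `cmd args...` and reports whether it could be started and exited with 0.
// checkCommand is a hypothetical helper, not part of any module used here.
function checkCommand(cmd, args) {
  const result = spawnSync(cmd, args, { encoding: 'utf8' });
  if (result.error) {
    // Typically ENOENT: the executable was not found on PATH.
    return { ok: false, output: result.error.message };
  }
  return {
    ok: result.status === 0,
    output: (result.stdout + result.stderr).trim(),
  };
}

const sox = checkCommand('sox', ['--version']);
console.log(sox.ok ? `SoX found: ${sox.output}` : `SoX not available: ${sox.output}`);
```

If the second message is printed, the install directory is probably not on PATH yet.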

2. Install the modules

Install two modules with npm: node-record-lpcm16, which lets Node.js use SoX, and @google-cloud/speech, which provides access to the Cloud Speech-to-Text API.

> npm i node-record-lpcm16
> npm i @google-cloud/speech

3. Patch node-record-lpcm16

node-record-lpcm16 v0.3.1 appears to have a problem with its settings, so edit node-record-lpcm16/index.js directly.

Change the case clause starting at line 27 as follows:

case 'sox':
      var cmd = 'sox';
      var cmdArgs = [
        '-q',                     // show no progress
        '-t', 'waveaudio',        // audio type
        '-d',                     // use default recording device
        '-r', options.sampleRate, // sample rate
        '-c', options.channels,   // channels
        '-e', 'signed-integer',   // sample encoding
        '-b', '16',               // precision (bits)
        '-t', 'wav',              // add this line
        '-',                      // pipe
        // end on silence
        'silence', '1', '0.1', options.thresholdStart || options.threshold + '%',
        '1', options.silence, options.thresholdEnd || options.threshold + '%'
      ];
      break
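
One way to confirm that the patch took effect is to check whether the recorder output now begins with a WAV (RIFF) header. The sketch below assumes this is a useful sanity check. `looksLikeWav` is a hypothetical helper, and the sample bytes are a stand-in for the first chunk from the recorder.

```javascript
// Returns true when the buffer starts with a RIFF/WAVE container header:
// bytes 0-3 are "RIFF" and bytes 8-11 are "WAVE".
function looksLikeWav(chunk) {
  return (
    chunk.length >= 12 &&
    chunk.toString('ascii', 0, 4) === 'RIFF' &&
    chunk.toString('ascii', 8, 12) === 'WAVE'
  );
}

// Stand-in for the first chunk emitted by the recorder (hypothetical bytes).
const sample = Buffer.concat([
  Buffer.from('RIFF'),
  Buffer.alloc(4), // chunk size field, zeroed for this illustration
  Buffer.from('WAVEfmt '),
]);
console.log(looksLikeWav(sample)); // prints true
```

With the real module, the same function could be applied to the first `data` event of the stream returned by `record.start(...)`, assuming that first chunk is at least 12 bytes long.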

4. Run the sample

Run the following sample:

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');
const record = require('node-record-lpcm16');

// Creates a client
const client = new speech.SpeechClient({
    projectId: 'YOUR_PROJECT_ID',
    keyFilename: 'path/to/service-account-key.json',
});

const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'ja-JP';

const request = {
    config: {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
    },
    interimResults: false, // If you want interim results, set this to true
};

// Create a recognize stream
const recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', data =>
        process.stdout.write(
            data.results[0] && data.results[0].alternatives[0]
                ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
                : `\n\nReached transcription time limit, press Ctrl+C\n`
        )
    );

// Start recording and send the microphone input to the Speech API
record
    .start({
        sampleRate: sampleRateHertz, // the module reads options.sampleRate
        // Other options, see https://www.npmjs.com/package/node-record-lpcm16#options
        verbose: false,
        recordProgram: 'sox', // Try also "arecord" or "rec"
        silence: '0.5',
    })
    .on('error', console.error)
    .pipe(recognizeStream);


console.log('Listening, press Ctrl+C to stop.');
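
The `data` handler in the sample can be pulled out into a small function, which makes it easier to handle `interimResults: true`. This is a sketch: `formatResponse` is a hypothetical helper, and the mock objects only mimic the fields the sample reads (`results`, `alternatives`, `transcript`) plus the `isFinal` flag of a streaming result.

```javascript
// Turns one streamingRecognize response into a printable line,
// or returns null when the response carries no transcript.
function formatResponse(data) {
  const result = data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  if (!alternative) return null;
  const label = result.isFinal ? 'Final' : 'Interim';
  return `${label}: ${alternative.transcript}`;
}

// Mock responses shaped like the ones the sample reads (assumed shape).
const interim = { results: [{ isFinal: false, alternatives: [{ transcript: 'hello' }] }] };
const empty = { results: [] };
console.log(formatResponse(interim)); // prints "Interim: hello"
console.log(formatResponse(empty));   // prints null
```

In the sample, the handler would then become something like `.on('data', data => console.log(formatResponse(data)))`.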

Example output

Listening, press Ctrl+C to stop.
Transcription: 音声認識のテストです
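
As noted at the top, pointing the recording at a file stream instead of the API saves the audio to disk. Below is a minimal sketch of that idea using only Node.js standard modules. `saveStream` is a hypothetical helper, the output path is arbitrary, and a fake readable stream stands in for the stream returned by `record.start(...)`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable, pipeline } = require('stream');

// Pipes any readable audio stream into a file and resolves with the file path.
function saveStream(source, filePath) {
  return new Promise((resolve, reject) => {
    pipeline(source, fs.createWriteStream(filePath), err =>
      err ? reject(err) : resolve(filePath)
    );
  });
}

// Stand-in source; with the real module this would be record.start({...}).
const fakeAudio = Readable.from([Buffer.from('RIFF'), Buffer.from('....')]);
const outPath = path.join(os.tmpdir(), 'recording-sketch.wav');

saveStream(fakeAudio, outPath).then(
  savedPath => console.log(`Saved to ${savedPath}`),
  console.error
);
```

In the sample above, the equivalent change would be replacing `.pipe(recognizeStream)` with a pipe into `fs.createWriteStream(...)`.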
