Building a Bot That Transcribes LINE Voice Messages, Part 2 #Node.js

Previous post

In this series, we build a bot that transcribes LINE voice messages, as in the GIF below. In the previous post, we registered with LINE Developers, configured the webhook, and started a simple bot server.

This time, I cover converting the audio buffer on the server and setting up the Google Cloud Speech API.

Loading the audio

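We continue writing server.js from the previous post.
Code from the previous post
Adding the following lets the bot read the audio buffer of a voice message it receives.
For debugging, the audio data buffer is printed to the log.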

server.js

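function handleEvent(event) {

    // Ignore anything that is not an audio message (follow, join, text, ...)
    if (event.type !== 'message' || event.message.type !== 'audio') {
        return Promise.resolve(null);
    }

    console.log("this is voice message");
    // Return the Promise so the caller can wait for (and catch errors from) the handler
    return fetchAudioMessage(event.message.id).then((audioData) => {
        // Debug: print the raw audio data buffer
        console.log("audio data buffer");
        console.log(audioData);
    });
}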

function fetchAudioMessage(messageId) {
    console.log('[START]getVoiceMessage');
    return new Promise((resolve, reject) => {
        lineClient.getMessageContent(messageId)
            .then((stream) => {
                const content = [];
                stream
                    .on('data', (chunk) => {
                        // Buffer.from() replaces the deprecated new Buffer()
                        content.push(Buffer.from(chunk));
                    })
                    .on('error', (err) => {
                        reject(err);
                    })
                    .on('end', () => {
                        console.log('[END  ]getVoiceMessage');
                        // Join all chunks into one Buffer holding the whole audio file
                        resolve(Buffer.concat(content));
                    });
            })
            // Also reject if the content request itself fails
            .catch(reject);
    });
}
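The core of fetchAudioMessage is the "collect a stream into one Buffer" pattern. As a self-contained sketch that uses only Node's standard library, the following isolates that step: `streamToBuffer` is a hypothetical helper name (not part of the LINE SDK), and a `Readable` built from in-memory chunks stands in for the stream that `getMessageContent` returns.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: collect every chunk of a readable stream into one Buffer.
function streamToBuffer(stream) {
    return new Promise((resolve, reject) => {
        const chunks = [];
        stream.on('data', (chunk) => {
            // Buffer.from() replaces the deprecated new Buffer(); it also accepts strings
            chunks.push(Buffer.from(chunk));
        });
        stream.on('error', reject);
        stream.on('end', () => resolve(Buffer.concat(chunks)));
    });
}

// Usage: a fake stream standing in for the LINE message content stream
const fake = Readable.from([Buffer.from([0x00, 0x01]), Buffer.from([0x02])]);
streamToBuffer(fake).then((buf) => {
    console.log(buf); // <Buffer 00 01 02>
});
```

Because the helper only depends on the `data`/`error`/`end` events, it can be tried locally without a LINE channel.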

Converting the audio file format

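With the code above, the bot can read the audio file it receives into a buffer.
It would be easy if we could send that straight to the Google Speech API, but annoyingly, the Speech API does not support the m4a format.
So we convert the m4a buffer to WAV. This time I used node-fdkaac, a Node wrapper library for ffmpeg.
Run the following command to install node-fdkaac.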

npm install --save node-fdkaac

node-fdkaac also requires ffmpeg, so install ffmpeg as well.
On a Debian-based OS, run the following.

sudo apt install ffmpeg

Add a convert function to server.js. It converts the m4a data (fdkaac, bitrate 192) into a WAV buffer. Because the Google Speech API takes its audio input as base64, the Promise returned by this function resolves with the buffer encoded as a base64 string.

server.js

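const Fdkaac = require('node-fdkaac').Fdkaac;

function convert(audioData) {
    return new Promise((resolve, reject) => {
        // Decode the m4a (AAC) buffer into a WAV buffer
        const decoder = new Fdkaac({
            "output": "buffer",
            "bitrate": 192
        }).setBuffer(audioData);

        decoder.decode()
            .then(() => {
                // Decoding finished
                console.log("Decoding finished");
                const buffer = decoder.getBuffer();
                // The Speech API request carries the audio content as a base64 string
                const audioBytes = buffer.toString('base64');
                resolve(audioBytes);
            })
            .catch((error) => {
                // Something went wrong: reject so the caller does not wait forever
                console.log("decode error: ", error);
                reject(error);
            });
    });
}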
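Because describe later declares `sampleRateHertz: 16000`, it can help to check what the decoded WAV actually contains. The sketch below is illustrative only: `buildWavHeader` and `readWavFormat` are hypothetical helpers, the header layout is the standard RIFF/WAVE PCM layout, and I have not confirmed what sample rate node-fdkaac's output uses. If it reports a rate other than 16000, either change `sampleRateHertz` or omit it, since for WAV input the Speech API can read the rate from the header.

```javascript
// Hypothetical helper: build a canonical 44-byte WAV header for 16-bit PCM (LINEAR16).
function buildWavHeader(dataLength, sampleRate, channels) {
    const bitsPerSample = 16;
    const blockAlign = channels * bitsPerSample / 8;
    const header = Buffer.alloc(44);
    header.write('RIFF', 0, 'ascii');
    header.writeUInt32LE(36 + dataLength, 4);          // RIFF chunk size
    header.write('WAVE', 8, 'ascii');
    header.write('fmt ', 12, 'ascii');
    header.writeUInt32LE(16, 16);                      // fmt chunk size for PCM
    header.writeUInt16LE(1, 20);                       // audio format 1 = PCM
    header.writeUInt16LE(channels, 22);
    header.writeUInt32LE(sampleRate, 24);
    header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
    header.writeUInt16LE(blockAlign, 32);
    header.writeUInt16LE(bitsPerSample, 34);
    header.write('data', 36, 'ascii');
    header.writeUInt32LE(dataLength, 40);
    return header;
}

// Hypothetical helper: read the format fields of a WAV buffer by walking its chunks.
function readWavFormat(buf) {
    if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
        throw new Error('not a RIFF/WAVE buffer');
    }
    let offset = 12;
    while (offset + 8 <= buf.length) {
        const id = buf.toString('ascii', offset, offset + 4);
        const size = buf.readUInt32LE(offset + 4);
        if (id === 'fmt ') {
            return {
                audioFormat: buf.readUInt16LE(offset + 8),
                channels: buf.readUInt16LE(offset + 10),
                sampleRate: buf.readUInt32LE(offset + 12),
                bitsPerSample: buf.readUInt16LE(offset + 22),
            };
        }
        // Chunks are padded to an even number of bytes
        offset += 8 + size + (size % 2);
    }
    throw new Error('fmt chunk not found');
}

// Usage: one second of silence at 16 kHz mono
const pcm = Buffer.alloc(16000 * 2);
const wav = Buffer.concat([buildWavHeader(pcm.length, 16000, 1), pcm]);
console.log(readWavFormat(wav).sampleRate); // 16000
// The base64 string a Speech API request would carry as audio.content
const audioBytes = wav.toString('base64');
```

In server.js, calling `readWavFormat(decoder.getBuffer())` inside convert would show whether the decoded audio matches the config used in describe.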

Google Speech API

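Next, set up the Google Speech API.
The Google Cloud setup involves many details, so I defer to the Google Cloud SDK documentation.
Follow it to set up a GCP Console project.
https://cloud.google.com/sdk/docs/

It is not covered in that documentation, but you also need to enable the Google Speech API in the GCP Console. How to enable it is explained in the following article.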

Place the JSON file containing the service account key, downloaded from the GCP Console, anywhere you like on the server.
Then set the environment variable GOOGLE_APPLICATION_CREDENTIALS to that file's path.

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/project-xxxxxxxxxx.json"
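A wrong path in GOOGLE_APPLICATION_CREDENTIALS usually only shows up later as an authentication error from the Speech API. A minimal startup check, using only Node's `fs` module, might look like this; `checkCredentials` is a hypothetical helper, and it only confirms that the file exists and parses as a service account key, not that the key is valid or that the Speech API is enabled.

```javascript
const fs = require('fs');

// Hypothetical helper: verify GOOGLE_APPLICATION_CREDENTIALS points at a readable key file.
function checkCredentials(env) {
    const path = env.GOOGLE_APPLICATION_CREDENTIALS;
    if (!path) {
        return { ok: false, reason: 'GOOGLE_APPLICATION_CREDENTIALS is not set' };
    }
    if (!fs.existsSync(path)) {
        return { ok: false, reason: `file not found: ${path}` };
    }
    try {
        const key = JSON.parse(fs.readFileSync(path, 'utf8'));
        // Service account key files carry a "type": "service_account" field
        return { ok: key.type === 'service_account', reason: `type: ${key.type}` };
    } catch (err) {
        return { ok: false, reason: `not valid JSON: ${err.message}` };
    }
}

// Usage at server startup, before new speech.SpeechClient()
const result = checkCredentials(process.env);
if (!result.ok) {
    console.error('Speech API credentials problem:', result.reason);
}
```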

Add the Speech API code to server.js.

server.js

const speech = require('@google-cloud/speech');
const speechClient = new speech.SpeechClient();

function describe(audioBytes) {
    return new Promise((resolve, reject) => {
        // The audio content (base64), plus its encoding, sample rate in hertz, and BCP-47 language code
        const audio = {
            content: audioBytes,
        };
        const config = {
            encoding: 'LINEAR16',
            // For WAV input this must match the sample rate in the WAV header (or be omitted)
            sampleRateHertz: 16000,
            languageCode: 'ja-JP',
        };
        const request = {
            audio: audio,
            config: config,
        };

        speechClient
            .recognize(request)
            .then(data => {
                const response = data[0];
                // Join the top alternative of each result into one transcript
                const transcription = response.results
                    .map(result => result.alternatives[0].transcript)
                    .join('\n');
                console.log(`Transcription: ${transcription}`);
                resolve(transcription);
            })
            .catch(err => {
                console.error('ERROR:', err);
                reject(err);
            });
    });
}
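The `response.results.map(...)` step in describe assumes every result has at least one alternative. A slightly more defensive version, written as a pure function so it can be tried without calling Google, might look like the sketch below; `extractTranscript` is a hypothetical name, and the sample object only mimics the `results[].alternatives[].transcript` shape that describe reads.

```javascript
// Hypothetical helper: join the top alternative of each recognition result.
// Results without alternatives, or with an empty transcript, are skipped.
function extractTranscript(response) {
    const results = (response && response.results) || [];
    return results
        .filter((result) => result.alternatives && result.alternatives.length > 0)
        .map((result) => result.alternatives[0].transcript)
        .filter((text) => text)
        .join('\n');
}

// Usage with a hand-written object shaped like the recognize() response
const sample = {
    results: [
        { alternatives: [{ transcript: 'こんにちは', confidence: 0.9 }] },
        { alternatives: [] },
        { alternatives: [{ transcript: '元気ですか' }] },
    ],
};
console.log(extractTranscript(sample)); // prints two lines: こんにちは / 元気ですか
```

An empty string from this function means nothing was recognized. A LINE text message with empty text is likely to be rejected, so the caller may want to substitute a fallback reply in that case.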

Next, change the handleEvent(event) function implemented earlier so that it calls convert and describe. When the Promise from describe resolves, the bot replies with the transcribed text.

server.js


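function handleEvent(event) {
    // Ignore anything that is not an audio message
    if (event.type !== 'message' || event.message.type !== 'audio') {
        return Promise.resolve(null);
    }

    console.log("this is voice message");
    return fetchAudioMessage(event.message.id)
        // m4a buffer -> base64-encoded WAV
        .then((audioData) => convert(audioData))
        // base64 WAV -> transcribed text
        .then((audioBytes) => describe(audioBytes))
        // Reply to the user with the transcribed text
        .then((text) => lineClient.replyMessage(event.replyToken, {
            type: 'text',
            text: text
        }))
        .catch((err) => {
            console.error('transcription failed:', err);
        });
}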
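The same pipeline can also be written with `async`/`await`. In the sketch below, the four steps are passed in as parameters so the flow can be exercised with stubs instead of the real LINE and Google clients; `makeHandleEvent` and the `deps` object are hypothetical, and the fallback reply text for an empty transcript is my own assumption, not part of the original bot.

```javascript
// Hypothetical factory: builds handleEvent from injected steps
// (fetchAudio, convert, describe, reply) so the flow can be tested with stubs.
function makeHandleEvent(deps) {
    return async function handleEvent(event) {
        if (event.type !== 'message' || event.message.type !== 'audio') {
            return null; // not an audio message: nothing to do
        }
        try {
            const audioData = await deps.fetchAudio(event.message.id); // m4a Buffer
            const audioBytes = await deps.convert(audioData);          // base64 WAV
            const text = await deps.describe(audioBytes);              // transcript
            // Assumption: reply with a fallback instead of an empty text message
            return await deps.reply(event.replyToken, {
                type: 'text',
                text: text || '(no speech recognized)',
            });
        } catch (err) {
            console.error('transcription failed:', err);
            return null;
        }
    };
}

// Usage with stubs standing in for fetchAudioMessage, convert, describe and lineClient.replyMessage
const replies = [];
const handleEvent = makeHandleEvent({
    fetchAudio: async (id) => Buffer.from(`audio-${id}`),
    convert: async (buf) => buf.toString('base64'),
    describe: async (b64) => `decoded ${Buffer.from(b64, 'base64').toString()}`,
    reply: async (token, message) => { replies.push({ token, message }); return 'sent'; },
});

handleEvent({ type: 'message', replyToken: 'token-1', message: { type: 'audio', id: '42' } })
    .then(() => console.log(replies[0].message.text)); // decoded audio-42
```

In server.js the real wiring would be `makeHandleEvent({ fetchAudio: fetchAudioMessage, convert, describe, reply: (token, msg) => lineClient.replyMessage(token, msg) })`. Injecting the steps keeps the control flow and error handling testable without network access.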

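That completes the whole flow: when a voice message is sent, the bot converts the buffer format, transcribes it with the Google Speech API, and sends the text back as its reply.
Run the following and confirm that the bot works correctly.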
node server.js

Summary

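Using the LINE Messaging API, we learned the basics of implementing a LINE bot. Focusing on voice messages, we implemented converting the received m4a voice message buffer to WAV. For transcription, we used the Google Speech API, sending it the WAV data and retrieving the transcribed text.
In the end, the bot can reply with a transcription of the voice message.

When I built this, there were few articles that helped with handling voice messages in a LINE bot, so I would be glad if this post helps anyone building a bot that works with voice messages.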