
Speaker diarization and transcription with the Google Cloud Speech API

Last updated at 2021-04-25, posted at 2021-01-27

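Introduction

This article summarizes how to transcribe a wav file with speaker diarization. Many articles cover this topic and I relied on them, but I ran into a lot of places where things did not work, so I am collecting what worked for me here.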

References

https://qiita.com/rxoxixyxd/items/ff0c44745db834123922
- For the overall flow
https://cloud.google.com/speech-to-text/docs/multiple-voices?hl=ja#speech_transcribe_diarization_gcs_beta-nodejs
- Google's guide
- The source code I ended up with differs from it only slightly.
- I used the "Node.js" code listed under "Using a Google Cloud Storage bucket".
https://cloud.google.com/speech-to-text/docs/async-recognize?hl=ja
- For transcribing wav files of one minute or longer
- I transcribe audio files that run for tens of minutes (30 minutes, for example), so I changed the code from the link above slightly so that it also handles long audio files.
https://stackoverflow.com/questions/60433604/syntaxerror-await-is-only-valid-in-async-function-in-google-cloud-speech-to-tex
- About the error "SyntaxError: await is only valid in async function"
https://cloud.google.com/speech-to-text/docs/transcription-model?hl=ja
- About the model field of config
- There appear to be about four transcription models. I wanted to use the 'phone_call' model, but 'phone_call' and 'video' did not seem to support 'ja-JP', so I ran with 'default' or 'command_and_search'.
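The two choices described above (which model to request, and whether to use the asynchronous method) can be sketched as small helpers. This is only a sketch: `chooseModel`, `chooseMethod`, and the `UNSUPPORTED_MODELS` table are hypothetical names, the table records only what this article observed rather than an official support matrix, and the 60-second threshold is taken from the "one minute or longer" wording above.

```javascript
// Assumption: models this article found unusable per language (may have changed since).
const UNSUPPORTED_MODELS = {
  'ja-JP': ['phone_call', 'video'],
};

// Returns the preferred model if it is usable for the language, otherwise 'default'.
function chooseModel(languageCode, preferredModel) {
  const blocked = UNSUPPORTED_MODELS[languageCode] || [];
  return blocked.includes(preferredModel) ? 'default' : preferredModel;
}

// Audio of one minute or longer goes through the asynchronous method.
function chooseMethod(durationSeconds) {
  return durationSeconds < 60 ? 'recognize' : 'longRunningRecognize';
}

console.log(chooseModel('ja-JP', 'phone_call')); // default
console.log(chooseMethod(30 * 60));              // longRunningRecognize
```

Keeping the table as plain data makes it easy to update if language support changes.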

Procedure

Install the library with the following command

npm install --save @google-cloud/speech

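Set the path to the JSON key file (the service account credentials) with the following export command
- You may need to run this every time, since export only applies to the current shell session.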

export GOOGLE_APPLICATION_CREDENTIALS="[absolute path to the JSON file]"
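Because the variable has to be set again in each new shell, it can help to check it before running the transcription script. The following is a minimal sketch; `checkCredentials` is a hypothetical helper, and it only checks that the variable is set and that the file exists, not that the key is valid.

```javascript
const fs = require('fs');

// Hypothetical helper: returns a description of the problem, or null if none was found.
function checkCredentials(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    return 'GOOGLE_APPLICATION_CREDENTIALS is not set in this shell';
  }
  if (!fs.existsSync(keyPath)) {
    return `Key file not found: ${keyPath}`;
  }
  return null;
}

const problem = checkCredentials();
console.log(problem === null ? 'Credentials variable looks fine' : problem);
```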

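If you do not know the absolute path of the JSON file, run the following command
- Replace filename.json with the name of your JSON file
- The result will probably look something like /home/[account name]/[JSON file name].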

find / -name filename.json -type f

Run the following command

nano diarization.js

Write the following code into diarization.js

diarization.js

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;
// Creates a client
const client = new speech.SpeechClient();
/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const uri = path to GCS audio file e.g. `gs://bucket/audio.wav`;
const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'ja-JP',
  enableSpeakerDiarization: true,
  diarizationSpeakerCount: 3,
  model: 'command_and_search',  // ideally this would be 'phone_call' or similar
};

const audio = {
  uri: 'gs://bucket/filename.wav',  // change this to match your own environment
};

const request = {
  config: config,
  audio: audio,
};

async function main() {
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  // const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
  console.log('Speaker Diarization:');
  const result = response.results[response.results.length - 1];
  const wordsInfo = result.alternatives[0].words;
  // Note: The transcript within each result is separate and sequential per result.
  // However, the words list within an alternative includes all the words
  // from all the results thus far. Thus, to get all the words with speaker
  // tags, you only have to take the words list from the last result:
  wordsInfo.forEach(a =>
    console.log(` word: ${a.word}, speakerTag: ${a.speakerTag}`)
  );
}
main().catch(console.error);
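The script above prints one line per word, which is hard to read for a long recording. As a possible follow-up step, consecutive words with the same speakerTag can be merged into utterances. This is a sketch, not part of Google's sample: `groupBySpeaker` is a hypothetical helper, the sample data is made up, and joining words with an empty separator is an assumption that suits Japanese text.

```javascript
// Merges consecutive words that share a speakerTag into one utterance.
function groupBySpeaker(wordsInfo, separator = '') {
  const utterances = [];
  for (const {word, speakerTag} of wordsInfo) {
    const last = utterances[utterances.length - 1];
    if (last && last.speakerTag === speakerTag) {
      last.text += separator + word;
    } else {
      utterances.push({speakerTag, text: word});
    }
  }
  return utterances;
}

// Made-up sample in the same shape as result.alternatives[0].words.
const sample = [
  {word: 'こんにちは', speakerTag: 1},
  {word: '皆さん', speakerTag: 1},
  {word: 'はい', speakerTag: 2},
  {word: 'どうぞ', speakerTag: 1},
];

for (const u of groupBySpeaker(sample)) {
  console.log(`Speaker ${u.speakerTag}: ${u.text}`);
}
```

Inside main(), the same helper could be called as groupBySpeaker(wordsInfo) in place of the forEach loop. For languages that separate words with spaces, pass ' ' as the separator.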

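Check the bit rate from the wav file's Properties -> Details
Use this site as a reference to check the bit depth and sample rate hertz
Convert the channel to mono through this site
- Bits: the bit depth checked in Properties
- Frequency: the sample rate hertz checked in Properties
- Channels: mono (1 channel)
Create a bucket
Upload the audio file
Edit the bucket's permissions -> Add member
- The service account name (the one named like []@[].iam.gserviceaccount.com)
- Role: Cloud Storage Legacy -> Storage Legacy Bucket Owner
Audio file -> Edit permissions -> Add entry
- Entity: User
- Name: the service account name (the one named like []@[].iam.gserviceaccount.com)
- Access: Owner
Run the following command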

node diarization.js

The transcribed text should be printed in the terminal.
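The bit depth, sample rate, and channel count checked in the steps above can also be read directly from the wav header, which is useful for confirming that sampleRateHertz in config matches the file. The following is a minimal sketch that assumes a canonical PCM header with the 'fmt ' chunk starting at byte 12; files with extra chunks before 'fmt ' would need a real parser. `readWavFormat` and `makeHeader` are hypothetical helpers.

```javascript
// Reads format fields from a canonical PCM wav header.
function readWavFormat(buffer) {
  if (buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE' ||
      buffer.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Not a canonical RIFF/WAVE header');
  }
  return {
    channels: buffer.readUInt16LE(22),
    sampleRateHertz: buffer.readUInt32LE(24),
    bitsPerSample: buffer.readUInt16LE(34),
  };
}

// Builds a 44-byte header in memory so the sketch runs without a real file.
function makeHeader(channels, sampleRate, bits) {
  const h = Buffer.alloc(44);
  h.write('RIFF', 0, 'ascii');
  h.writeUInt32LE(36, 4);                                 // file size minus 8 (no audio data)
  h.write('WAVE', 8, 'ascii');
  h.write('fmt ', 12, 'ascii');
  h.writeUInt32LE(16, 16);                                // fmt chunk size
  h.writeUInt16LE(1, 20);                                 // audio format 1 = PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * channels * bits / 8, 28);  // byte rate
  h.writeUInt16LE(channels * bits / 8, 32);               // block align
  h.writeUInt16LE(bits, 34);
  h.write('data', 36, 'ascii');
  h.writeUInt32LE(0, 40);                                 // data chunk size
  return h;
}

console.log(readWavFormat(makeHeader(1, 44100, 16)));
```

For a real file, the call would be readWavFormat(require('fs').readFileSync('filename.wav')). A mono, 16-bit result with a sample rate equal to sampleRateHertz is what the LINEAR16 config in diarization.js expects.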

Other things I tried and gave up on

Java

Why I gave up

At first I tried to do this in Java rather than Node.js, but I gave up after hitting the following error.

error: package com.google.cloud.speech.v1 does not exist

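How far I got before giving up

It seemed that rewriting the contents of a file called pom.xml would help, so I tried that, but in the end I could not get it to work. If you were hoping to do this in Java, I am sorry, but this article will probably not be of much use to you.

Procedure

Get the absolute path of the pom.xml file with the following command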

find / -name pom.xml -type f

Many lines starting with "find: ~" and ending with ": Permission denied" were printed, and the last line was the following.

/google/devshell/editor/language_servers/java/scripts/pom.xml

Move to that directory and edit pom.xml with the following commands

cd /google/devshell/editor/language_servers/java/scripts

nano pom.xml

I added the tags directly under the enclosing tags.
- I used this page as a reference.
The same error appeared.

IBM Cloud

IBM also offers a cloud service, so I tried it. Personally, I found the UI very well organized and extremely easy to use.
The audio in my wav file was mumbled and was not recorded with a proper microphone, so the accuracy was not very good.
https://speech-to-text-demo.ng.bluemix.net/

Conclusion

This is a write-up of what I did around August 2020. Some things may have changed since then; if so, I apologize.
If you work on audio transcription, please feel free to use this.
