[Google Cloud Speech-to-Text API (V2)] Trying the Speech-to-Text API from Java

Last updated at 2024-05-19, posted at 2024-05-12

I tried using the Google Cloud Speech-to-Text API (V2) from Java.
This article covers what you need to set up before you can call the API.

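What is the Cloud Speech-to-Text API?
It is an API from Google that uses machine learning to convert speech to text.

You can transcribe audio without deep machine learning knowledge,
and you can improve transcription accuracy by using the basic features (resources) below.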

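Term	Description
Recognizer	A reusable recognition configuration. If you transcribe audio frequently, using a recognizer simplifies requests and can also help reduce cost.
PhraseSet	One of the resources that implement model adaptation. It improves recognition accuracy for specific words and phrases.
CustomClass	One of the resources that implement model adaptation. It lets the service recognize proper names such as store names and product names.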

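Pricing is based on the amount of audio the service processes successfully, measured in one-second increments,
and charges are calculated monthly.
For details, see Speech-to-Text pricing (official).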

Contents
1. Preparing to use the Cloud Speech-to-Text API
    1.1. Creating a project
    1.2. Setting up the Cloud Speech-to-Text API
    1.3. Getting a key to use the API
        1.3.1. Service account
        1.3.2. OAuth 2.0 client ID
2. Trying the Cloud Speech-to-Text API
    2.1. Loading the Google libraries
    2.2. Getting a SpeechClient instance
        2.2.1. Using a service account
        2.2.2. Using an OAuth 2.0 client ID
    2.3. Running the program
3. Calling various APIs
    3.1. v2.projects.locations
    3.2. v2.projects.locations.config
    3.3. v2.projects.locations.customClasses
    3.4. v2.projects.locations.operations
    3.5. v2.projects.locations.phraseSets
    3.6. v2.projects.locations.recognizers

1. Preparing to use the Cloud Speech-to-Text API

Setup is done from the following console:
Google Cloud Platform

There are three things to do:
1. Create a GCP project
2. Enable the Cloud Speech-to-Text API
3. Get the key used by the program

1.1. Creating a project

First, create a project in GCP.
For details, see this article:
[Google Sheets API] Operating Google Sheets API v4 from Java (1.1. Creating a project)

1.2. Setting up the Cloud Speech-to-Text API

Enable the Cloud Speech-to-Text API so that it can be used.

No	Step
1	From the menu, select "APIs & Services" -> "Library".
2	Type speech-to and search for "Cloud Speech-to-Text API".
3	Click the "Enable" button to make the "Cloud Speech-to-Text API" available.

1.3. Getting a key to use the API

Next, prepare to call the API from a program.
You can authenticate with either a service account or OAuth 2.0.

1.3.1. Service account

The same steps are described in the article below, so please refer to it.
[Google Sheets API] Operating Google Sheets API v4 from Java (1.3. Creating a Google service account)

1.3.2. OAuth 2.0 client ID

The same steps are described in the article below, so please refer to it.
[Google Drive API v3] Using the Drive API from Java (1.3.2. OAuth 2.0 client ID)

2. Trying the Cloud Speech-to-Text API

2.1. Loading the Google libraries

Add the libraries needed to use the Cloud Speech-to-Text API.
In my environment, pom.xml specifies the following.
I am using Java 21.

<dependencies>
    <dependency>
        <groupId>com.google.api-client</groupId>
        <artifactId>google-api-client</artifactId>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-speech</artifactId>
    </dependency>
    <!-- Add the following if you use the API with OAuth 2.0. -->
    <dependency>
        <groupId>com.google.oauth-client</groupId>
        <artifactId>google-oauth-client-jetty</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>libraries-bom</artifactId>
            <version>26.38.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

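2.2. Getting a SpeechClient instance

2.2.1. Using a service account

For "JSON file path" in the code, specify the path of the JSON file obtained in
[Google Sheets API] Operating Google Sheets API v4 from Java (1.3. Creating a Google service account).

import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.speech.v2.SpeechClient;
import com.google.cloud.speech.v2.SpeechSettings;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

private static SpeechClient getSpeechClient() throws Exception {
    // Load the service account key; the stream is closed once the client is built.
    try (InputStream input = new FileInputStream("JSON file path")) {
        // "scope" is a placeholder; Cloud APIs usually accept
        // https://www.googleapis.com/auth/cloud-platform
        GoogleCredentials credentials = GoogleCredentials
                .fromStream(input)
                .createScoped(Arrays.asList("scope"));

        SpeechSettings.Builder builder = SpeechSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return SpeechClient.create(builder.build());
    }
}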

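2.2.2. Using an OAuth 2.0 client ID

For "JSON file path" in the code, specify the path of the JSON file obtained in
[Google Drive API v3] Using the Drive API from Java (1.3.2.1 Creating credentials).

For "folder path for storing credentials" in the code,
specify the folder where the credentials should be saved.
Running this code displays the OAuth consent screen.
Once the user consents, a file is generated in that folder.
If this file already exists, the OAuth consent screen is not shown on later runs.

import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.extensions.java6.auth.oauth2.AuthorizationCodeInstalledApp;
import com.google.api.client.extensions.jetty.auth.oauth2.LocalServerReceiver;
import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.client.util.store.FileDataStoreFactory;
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.UserCredentials;
import com.google.cloud.speech.v2.SpeechClient;
import com.google.cloud.speech.v2.SpeechSettings;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

private static SpeechClient getSpeechClient() throws Exception {
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    GsonFactory jsonFactory = GsonFactory.getDefaultInstance();

    try (Reader reader = new InputStreamReader(new FileInputStream("JSON file path"), StandardCharsets.UTF_8)) {
        GoogleClientSecrets clientSecrets = GoogleClientSecrets.load(jsonFactory, reader);

        GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                transport, jsonFactory, clientSecrets,
                Arrays.asList("scope"))
            .setDataStoreFactory(new FileDataStoreFactory(new java.io.File("folder path for storing credentials")))
            .setAccessType("offline") // ask for a refresh token, which is used below
            .build();

        // Opens the consent screen on the first run; later runs reuse the stored credential.
        LocalServerReceiver receiver = new LocalServerReceiver.Builder().build();
        Credential credential = new AuthorizationCodeInstalledApp(flow, receiver).authorize("user");

        // Convert the stored credential into one the gRPC-based client accepts.
        UserCredentials credentials = UserCredentials.newBuilder()
                .setClientId(clientSecrets.getDetails().getClientId())
                .setClientSecret(clientSecrets.getDetails().getClientSecret())
                .setRefreshToken(credential.getRefreshToken())
                .build();
        SpeechSettings.Builder builder = SpeechSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return SpeechClient.create(builder.build());
    }
}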
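
Whether the consent screen will appear can be estimated up front by looking for the stored credential file. The following is only a rough sketch: it assumes FileDataStoreFactory keeps the flow's default store in a file named StoredCredential inside the folder (please verify this for your library version), and the class and method names are hypothetical, not part of any Google library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ConsentCheck {
    // Assumed file name of the default credential store (verify for your library version).
    static final String STORE_FILE_NAME = "StoredCredential";

    private ConsentCheck() {}

    // Returns true when no store file exists yet, i.e. the consent screen is likely to appear.
    static boolean consentLikelyRequired(Path credentialFolder) {
        return !Files.isRegularFile(credentialFolder.resolve(STORE_FILE_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("oauth-store");
        System.out.println(consentLikelyRequired(folder)); // true: nothing stored yet

        Files.createFile(folder.resolve(STORE_FILE_NAME));
        System.out.println(consentLikelyRequired(folder)); // false: a store file exists
    }
}
```

The check only looks at the file's presence. It does not confirm that the file holds a valid credential for the "user" key, so treat the result as a hint.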

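2.3. Running the program

Now run the program.

The sample converts a short audio clip to text.

import com.google.cloud.speech.v2.AutoDetectDecodingConfig;
import com.google.cloud.speech.v2.RecognitionConfig;
import com.google.cloud.speech.v2.RecognizeRequest;
import com.google.cloud.speech.v2.RecognizeResponse;
import com.google.cloud.speech.v2.SpeechClient;
import com.google.cloud.speech.v2.SpeechRecognitionAlternative;
import com.google.cloud.speech.v2.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public static void main(String[] args) throws Exception {
    try (SpeechClient client = getSpeechClient()) {
        // Read the whole audio file into memory.
        byte[] data = Files.readAllBytes(Paths.get("audio file path"));
        ByteString audioBytes = ByteString.copyFrom(data);

        // Japanese audio, "short" model, and automatic detection of the audio encoding.
        RecognitionConfig.Builder config = RecognitionConfig.newBuilder();
        config.addLanguageCodes("ja-JP");
        config.setModel("short");
        AutoDetectDecodingConfig.Builder decoding = AutoDetectDecodingConfig.newBuilder();
        config.setAutoDecodingConfig(decoding.build());

        RecognizeRequest.Builder builder = RecognizeRequest.newBuilder();
        builder.setConfig(config.build());
        builder.setContent(audioBytes);
        builder.setRecognizer(String.format("projects/%s/locations/%s/recognizers/_", "project ID", "location ID"));

        // Print the first alternative of each result.
        RecognizeResponse response = client.recognize(builder.build());
        List<SpeechRecognitionResult> results = response.getResultsList();
        for (SpeechRecognitionResult result : results) {
            if (result.getAlternativesCount() > 0) {
                SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                System.out.printf("Transcription: %s%n", alternative.getTranscript());
            }
        }
    }
}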
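
The recognizer name passed to setRecognizer follows the pattern projects/{project}/locations/{location}/recognizers/{recognizer}, where `_` as the last segment selects the implicit default recognizer. A small helper can keep that formatting in one place. This is a minimal sketch; the class name, method names, and the validation rule are hypothetical and are not part of the client library.

```java
final class RecognizerNames {
    private RecognizerNames() {}

    // Builds a full recognizer resource name from its three segments.
    static String of(String projectId, String locationId, String recognizerId) {
        return String.format("projects/%s/locations/%s/recognizers/%s",
                segment("projectId", projectId),
                segment("locationId", locationId),
                segment("recognizerId", recognizerId));
    }

    // "_" addresses the implicit recognizer, as in the sample request.
    static String implicit(String projectId, String locationId) {
        return of(projectId, locationId, "_");
    }

    // Illustrative check only: reject values that would break the path structure.
    private static String segment(String label, String value) {
        if (value == null || value.isBlank() || value.contains("/")) {
            throw new IllegalArgumentException(label + " must be a non-empty path segment: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // prints "projects/my-project/locations/global/recognizers/_"
        System.out.println(implicit("my-project", "global"));
    }
}
```

With such a helper, the sample call could read builder.setRecognizer(RecognizerNames.implicit(projectId, locationId)), where projectId and locationId are your own values.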

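3. Calling various APIs

Let's call a variety of the API's methods.

3.1. v2.projects.locations

Getting information about locations

Method	Summary	Article
get	Gets information about a location	[Google Cloud Speech-to-Text API (V2)] Getting information about supported locations in Java
list	Gets a list of location information	[Google Cloud Speech-to-Text API (V2)] Getting information about supported locations in Java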
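
The location ID also tends to matter when the client is created. As far as I know, locations other than global are served from a regional endpoint of the form {location}-speech.googleapis.com, which would be passed to SpeechSettings.Builder#setEndpoint. The sketch below assumes that naming convention; the helper class is hypothetical, so please confirm the endpoint for your location in the official documentation.

```java
import java.util.Locale;

final class SpeechEndpoints {
    private SpeechEndpoints() {}

    // Assumed convention: "global" uses the default host, any other location a prefixed host.
    static String forLocation(String locationId) {
        if (locationId == null || locationId.isBlank()) {
            throw new IllegalArgumentException("locationId must not be empty");
        }
        String id = locationId.trim().toLowerCase(Locale.ROOT);
        return id.equals("global")
                ? "speech.googleapis.com:443"
                : id + "-speech.googleapis.com:443";
    }

    public static void main(String[] args) {
        System.out.println(forLocation("global"));      // speech.googleapis.com:443
        System.out.println(forLocation("us-central1")); // us-central1-speech.googleapis.com:443
    }
}
```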

3.2. v2.projects.locations.config

Operations on the configuration

Method	Summary	Article
get	Gets the configuration	[Google Cloud Speech-to-Text API (V2)] Getting the configuration in Java
update	Updates the configuration	[Google Cloud Speech-to-Text API (V2)] Updating the configuration in Java

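3.3. v2.projects.locations.customClasses

Operations on custom classes

Method	Summary	Article
create	Creates a custom class	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting custom classes that improve recognition accuracy in Java
delete	Deletes a custom class	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting custom classes that improve recognition accuracy in Java
get	Gets a custom class	[Google Cloud Speech-to-Text API (V2)] Getting custom classes that improve recognition accuracy in Java
list	Gets a list of custom classes	[Google Cloud Speech-to-Text API (V2)] Getting custom classes that improve recognition accuracy in Java
patch	Updates a custom class	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting custom classes that improve recognition accuracy in Java
undelete	Undeletes a custom class	[Google Cloud Speech-to-Text API (V2)] Undeleting custom classes that improve recognition accuracy in Java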

3.4. v2.projects.locations.operations

Operations on long-running operations

Method	Summary	Article
cancel	Cancels a long-running operation	[Google Cloud Speech-to-Text API (V2)] Canceling and deleting long-running operations in Java
delete	Deletes a long-running operation	[Google Cloud Speech-to-Text API (V2)] Canceling and deleting long-running operations in Java
get	Gets a long-running operation	[Google Cloud Speech-to-Text API (V2)] Getting long-running operations in Java
list	Gets a list of long-running operations	[Google Cloud Speech-to-Text API (V2)] Getting long-running operations in Java

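3.5. v2.projects.locations.phraseSets

Operations on phrase sets

Method	Summary	Article
create	Creates a phrase set	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting phrase sets that improve recognition accuracy in Java
delete	Deletes a phrase set	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting phrase sets that improve recognition accuracy in Java
get	Gets a phrase set	[Google Cloud Speech-to-Text API (V2)] Getting phrase sets that improve recognition accuracy in Java
list	Gets a list of phrase sets	[Google Cloud Speech-to-Text API (V2)] Getting phrase sets that improve recognition accuracy in Java
patch	Updates a phrase set	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting phrase sets that improve recognition accuracy in Java
undelete	Undeletes a phrase set	[Google Cloud Speech-to-Text API (V2)] Undeleting phrase sets that improve recognition accuracy in Java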

3.6. v2.projects.locations.recognizers

Speech-to-text conversion and operations on recognizers

Method	Summary	Article
batchRecognize	Asynchronously converts long audio to text	[Google Cloud Speech-to-Text API (V2)] Converting speech to text in Java
create	Creates a recognizer	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting recognizers in Java
delete	Deletes a recognizer	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting recognizers in Java
get	Gets a recognizer	[Google Cloud Speech-to-Text API (V2)] Getting recognizers in Java
list	Gets a list of recognizers	[Google Cloud Speech-to-Text API (V2)] Getting recognizers in Java
patch	Updates a recognizer	[Google Cloud Speech-to-Text API (V2)] Creating, updating, and deleting recognizers in Java
recognize	Synchronously converts short audio (up to 1 minute) to text	[Google Cloud Speech-to-Text API (V2)] Converting speech to text in Java
undelete	Undeletes a recognizer	[Google Cloud Speech-to-Text API (V2)] Undeleting recognizers in Java
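
The table distinguishes recognize (synchronous, audio up to 1 minute) from batchRecognize (asynchronous, long audio). If the audio length is known in advance, the choice can be made in code. The following is a minimal sketch based only on the 1-minute figure in the table; the class, enum, and method names are hypothetical, and other limits such as request size are not considered.

```java
import java.time.Duration;

final class RecognizeMethodChooser {
    // The two transcription methods listed in the table.
    enum Api { RECOGNIZE, BATCH_RECOGNIZE }

    // Upper bound for the synchronous method, taken from the table (up to 1 minute).
    static final Duration SYNC_LIMIT = Duration.ofMinutes(1);

    private RecognizeMethodChooser() {}

    // Picks the synchronous method for audio up to the limit, the batch method otherwise.
    static Api choose(Duration audioLength) {
        if (audioLength == null || audioLength.isNegative()) {
            throw new IllegalArgumentException("audioLength must be zero or positive");
        }
        return audioLength.compareTo(SYNC_LIMIT) <= 0 ? Api.RECOGNIZE : Api.BATCH_RECOGNIZE;
    }

    public static void main(String[] args) {
        System.out.println(choose(Duration.ofSeconds(20))); // RECOGNIZE
        System.out.println(choose(Duration.ofMinutes(5)));  // BATCH_RECOGNIZE
    }
}
```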

That's all.
