[Google Cloud Text-to-Speech API (V1)] Trying the Text-to-Speech API from Java

Last updated at 2024-05-20, posted at 2024-05-17

I tried using the Google Cloud Text-to-Speech API (V1) from Java.
This article covers what you need to prepare before you can call the API.

What is the Cloud Text-to-Speech API?
It is an API that converts plain text or Speech Synthesis Markup Language (SSML) input
into audio data (such as MP3 or LINEAR16).

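Usage is free up to a monthly free tier; charges apply once you exceed it.

Voice type	Monthly free tier	Price after the free tier is used up
Premium voices	0 to 1,000,000 bytes	USD 0.000016 per byte
Studio voices	0 to 1,000,000 bytes	USD 0.00016 per byte
Standard voices	0 to 1,000,000 characters	USD 0.000004 per character

For pricing details, see ⧉ Text-to-Speech pricing (official).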
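
To make the table above concrete, here is a minimal sketch of how a monthly charge could be estimated. The class `PricingSketch` and its methods are hypothetical names introduced for illustration. It assumes the rates listed in the table, that input size for byte-billed voices is measured in UTF-8, and that usage beyond the free tier is billed linearly. How the service actually meters input (for example, whether SSML tags count) is not covered here, so treat the official pricing page as authoritative.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

/** Hypothetical estimator; the rates passed in are taken from the table above. */
final class PricingSketch {
    private PricingSketch() {}

    /** Size of the text in UTF-8 bytes (assumed metering unit for byte-billed voices). */
    static long utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of Unicode code points (assumed metering unit for character-billed voices). */
    static long characters(String text) {
        return text.codePointCount(0, text.length());
    }

    /** cost = max(0, usedUnits - freeUnits) * unitPriceUsd */
    static BigDecimal monthlyCostUsd(long usedUnits, long freeUnits, String unitPriceUsd) {
        long billable = Math.max(0L, usedUnits - freeUnits);
        return new BigDecimal(unitPriceUsd).multiply(BigDecimal.valueOf(billable));
    }

    public static void main(String[] args) {
        String text = "こんにちは";
        System.out.println("bytes=" + utf8Bytes(text) + ", characters=" + characters(text));

        // Example: 1,500,000 bytes sent to a premium voice in one month.
        BigDecimal cost = monthlyCostUsd(1_500_000L, 1_000_000L, "0.000016");
        System.out.println("estimated USD: " + cost.toPlainString());
    }
}
```

Japanese kana and kanji usually take 3 bytes each in UTF-8, so under these assumptions a byte-billed voice consumes its free tier about three times faster than the character count alone suggests.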

No			Contents
1			Preparing to use the Cloud Text-to-Speech API
	1		Creating a project
	2		Enabling the Cloud Text-to-Speech API
	3		Obtaining a key for calling the API
		1	Service account
		2	OAuth 2.0 client ID
2			Using the Cloud Text-to-Speech API
	1		Loading the Google libraries
	2		Getting a TextToSpeechClient instance
		1	Using a service account
		2	Using an OAuth 2.0 client ID
	3		Getting a TextToSpeechLongAudioSynthesizeClient instance
		1	Using a service account
		2	Using an OAuth 2.0 client ID
	4		Running the program
3			Calling the API methods
	1		v1.operations
	2		v1.projects.locations
	3		v1.projects.locations.operations
	4		v1.text
	5		v1.voices

1. Preparing to use the Cloud Text-to-Speech API

All of the setup is done from the console below.
⧉Google Cloud Platform

There are three things to do.
1. Create a GCP project
2. Enable the Cloud Text-to-Speech API
3. Obtain the key used by the program

1.1. Creating a project

First, create a project in GCP.
For details, see the following article.
⧉[Google Sheets API] Operating Google Sheets API v4 from Java (1.1. Creating a project)

1.2. Enabling the Cloud Text-to-Speech API

Make the Cloud Text-to-Speech API available to the project.

No	Step
1	From the menu, select "APIs & Services" -> "Library".
2	Type "text to" and search for "Cloud Text-to-Speech API".
3	Press the "Enable" button to make "Cloud Text-to-Speech API" usable.

1.3. Obtaining a key for calling the API

Prepare the credentials the program needs to call the API.
There are two options: a service account or OAuth 2.0.

1.3.1. Service account

The same steps are covered in the article below, so please refer to it.
⧉[Google Sheets API] Operating Google Sheets API v4 from Java (1.3. Creating a Google service account)

1.3.2. OAuth 2.0 client ID

The same steps are covered in the article below, so please refer to it.
⧉[Google Drive API v3] Using the Drive API from Java (1.3.2. OAuth 2.0 client ID)

2. Using the Cloud Text-to-Speech API

2.1. Loading the Google libraries

Add the libraries required by the Cloud Text-to-Speech API to the build path.
In my environment, pom.xml declares the dependencies below.
I am using Java 21.

<dependencies>
    <dependency>
        <groupId>com.google.api-client</groupId>
        <artifactId>google-api-client</artifactId>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-texttospeech</artifactId>
    </dependency>
    <!-- Add the following dependency if you call the API with OAuth 2.0. -->
    <dependency>
        <groupId>com.google.oauth-client</groupId>
        <artifactId>google-oauth-client-jetty</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>libraries-bom</artifactId>
            <version>26.38.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

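2.2. Getting a TextToSpeechClient instance

This is the client instance used for every API call except the long-running operation APIs.

2.2.1. Using a service account

For the JSON file path placeholder in the code, specify the path of the JSON file obtained in
⧉[Google Sheets API] Operating Google Sheets API v4 from Java (1.3. Creating a Google service account)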

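import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.TextToSpeechSettings;
import java.io.FileInputStream;
import java.io.InputStream;

private static TextToSpeechClient getTextToSpeechClient() throws Exception {
    // Load the service account key (JSON) downloaded from the Cloud console.
    try (InputStream input = new FileInputStream("JSON file path")) {
        GoogleCredentials credentials = GoogleCredentials.fromStream(input);

        // Hand the credentials to the client settings explicitly.
        TextToSpeechSettings.Builder builder = TextToSpeechSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return TextToSpeechClient.create(builder.build());
    }
}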
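
Rather than hard-coding the key file path, one option is to resolve it at startup. The following is only a sketch: `KeyPathSketch` and its methods are hypothetical helpers, and the choice to look at the `GOOGLE_APPLICATION_CREDENTIALS` environment variable first and fall back to a supplied path is an assumption made for illustration, not something the code above requires.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper that decides which service account key file to open. */
final class KeyPathSketch {
    static final String ENV_NAME = "GOOGLE_APPLICATION_CREDENTIALS";

    private KeyPathSketch() {}

    /** Prefers the environment variable; falls back to the given path when it is unset or blank. */
    static Path resolveKeyPath(Map<String, String> env, String fallback) {
        String fromEnv = env.get(ENV_NAME);
        boolean missing = fromEnv == null || fromEnv.trim().isEmpty();
        return Paths.get(missing ? fallback : fromEnv);
    }

    /** Fails early with a readable message instead of a late FileNotFoundException. */
    static Path requireReadable(Path keyPath) {
        if (!Files.isReadable(keyPath)) {
            throw new IllegalArgumentException("Key file is not readable: " + keyPath);
        }
        return keyPath;
    }

    public static void main(String[] args) {
        // "service-account.json" is a placeholder file name.
        Path keyPath = resolveKeyPath(System.getenv(), "service-account.json");
        System.out.println("Would load credentials from: " + keyPath.toAbsolutePath());
    }
}
```

The resolved path could then be passed to `new FileInputStream(keyPath.toFile())` in place of the literal path string.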

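2.2.2. Using an OAuth 2.0 client ID

For the JSON file path placeholder in the code, specify the path of the JSON file obtained in
⧉[Google Drive API v3] Using the Drive API from Java (1.3.2.1 Creating credentials)

For the credential storage folder path placeholder in the code,
specify the folder in which the credentials should be saved.
Running this code opens the OAuth consent screen.
Once the user consents, a file is generated in the credential storage folder.
As long as that file exists, running the code again does not show the consent screen.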

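import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.extensions.java6.auth.oauth2.AuthorizationCodeInstalledApp;
import com.google.api.client.extensions.jetty.auth.oauth2.LocalServerReceiver;
import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.client.util.store.FileDataStoreFactory;
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.UserCredentials;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.TextToSpeechSettings;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;

private static TextToSpeechClient getTextToSpeechClient() throws Exception {
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    GsonFactory jsonFactory = GsonFactory.getDefaultInstance();

    // Load the OAuth 2.0 client ID (client secrets JSON).
    try (Reader reader = new InputStreamReader(new FileInputStream("JSON file path"))) {
        GoogleClientSecrets clientSecrets = GoogleClientSecrets.load(jsonFactory, reader);

        // The flow persists the granted tokens in the storage folder.
        GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                transport, jsonFactory, clientSecrets,
                Arrays.asList("https://www.googleapis.com/auth/cloud-platform"))
            .setDataStoreFactory(new FileDataStoreFactory(new java.io.File("credential storage folder path")))
            .build();

        // Opens the consent screen on first run; later runs reuse the stored tokens.
        LocalServerReceiver receiver = new LocalServerReceiver.Builder().build();
        Credential credential = new AuthorizationCodeInstalledApp(flow, receiver).authorize("user");

        // Convert the stored refresh token into credentials the Cloud client accepts.
        UserCredentials credentials = UserCredentials.newBuilder()
                .setClientId(clientSecrets.getDetails().getClientId())
                .setClientSecret(clientSecrets.getDetails().getClientSecret())
                .setRefreshToken(credential.getRefreshToken())
                .build();
        TextToSpeechSettings.Builder builder = TextToSpeechSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return TextToSpeechClient.create(builder.build());
    }
}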
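
The consent-caching behavior described above can be inspected from code. The sketch below is a hypothetical helper (`StoredCredentialSketch` is not part of any Google library). It assumes the file written into the storage folder is named `StoredCredential`, which is what I understand the default credential data store to use; check the actual folder contents after your first run before relying on it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for checking or resetting the cached OAuth consent. */
final class StoredCredentialSketch {
    // Assumed file name of the default credential data store.
    static final String STORE_FILE = "StoredCredential";

    private StoredCredentialSketch() {}

    /** True when a cached credential file exists, i.e. the consent screen would be skipped. */
    static boolean hasStoredCredential(Path storageFolder) {
        return Files.isRegularFile(storageFolder.resolve(STORE_FILE));
    }

    /** Deletes the cached file so the next run asks for consent again. Returns true if a file was deleted. */
    static boolean forgetStoredCredential(Path storageFolder) {
        return storageFolder.resolve(STORE_FILE).toFile().delete();
    }

    public static void main(String[] args) {
        // "tokens" is a placeholder for the credential storage folder.
        Path folder = Paths.get(args.length > 0 ? args[0] : "tokens");
        System.out.println(hasStoredCredential(folder)
                ? "Stored credential found; no consent screen expected."
                : "No stored credential; the consent screen should appear.");
    }
}
```

Deleting the file is a simple way to force the consent screen again, for example after changing the requested scopes.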

2.3. Getting a TextToSpeechLongAudioSynthesizeClient instance

This is the client instance used for the long-running operation APIs.

2.3.1. Using a service account

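import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.texttospeech.v1.TextToSpeechLongAudioSynthesizeClient;
import com.google.cloud.texttospeech.v1.TextToSpeechLongAudioSynthesizeSettings;
import java.io.FileInputStream;
import java.io.InputStream;

private static TextToSpeechLongAudioSynthesizeClient getTextToSpeechLongAudioSynthesizeClient() throws Exception {
    // Load the service account key (JSON) downloaded from the Cloud console.
    try (InputStream input = new FileInputStream("JSON file path")) {
        GoogleCredentials credentials = GoogleCredentials.fromStream(input);

        // Hand the credentials to the client settings explicitly.
        TextToSpeechLongAudioSynthesizeSettings.Builder builder = TextToSpeechLongAudioSynthesizeSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return TextToSpeechLongAudioSynthesizeClient.create(builder.build());
    }
}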

2.3.2. Using an OAuth 2.0 client ID

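import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.extensions.java6.auth.oauth2.AuthorizationCodeInstalledApp;
import com.google.api.client.extensions.jetty.auth.oauth2.LocalServerReceiver;
import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.client.util.store.FileDataStoreFactory;
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.UserCredentials;
import com.google.cloud.texttospeech.v1.TextToSpeechLongAudioSynthesizeClient;
import com.google.cloud.texttospeech.v1.TextToSpeechLongAudioSynthesizeSettings;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;

private static TextToSpeechLongAudioSynthesizeClient getTextToSpeechLongAudioSynthesizeClient2() throws Exception {
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    GsonFactory jsonFactory = GsonFactory.getDefaultInstance();

    // Load the OAuth 2.0 client ID (client secrets JSON).
    try (Reader reader = new InputStreamReader(new FileInputStream("JSON file path"))) {
        GoogleClientSecrets clientSecrets = GoogleClientSecrets.load(jsonFactory, reader);

        // The flow persists the granted tokens in the storage folder.
        GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                transport, jsonFactory, clientSecrets,
                Arrays.asList("https://www.googleapis.com/auth/cloud-platform"))
            .setDataStoreFactory(new FileDataStoreFactory(new java.io.File("credential storage folder path")))
            .build();

        // Opens the consent screen on first run; later runs reuse the stored tokens.
        LocalServerReceiver receiver = new LocalServerReceiver.Builder().build();
        Credential credential = new AuthorizationCodeInstalledApp(flow, receiver).authorize("user");

        // Convert the stored refresh token into credentials the Cloud client accepts.
        UserCredentials credentials = UserCredentials.newBuilder()
                .setClientId(clientSecrets.getDetails().getClientId())
                .setClientSecret(clientSecrets.getDetails().getClientSecret())
                .setRefreshToken(credential.getRefreshToken())
                .build();
        TextToSpeechLongAudioSynthesizeSettings.Builder builder = TextToSpeechLongAudioSynthesizeSettings.newBuilder();
        builder.setCredentialsProvider(FixedCredentialsProvider.create(credentials));

        return TextToSpeechLongAudioSynthesizeClient.create(builder.build());
    }
}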

2.4. Running the program

Now run the program.

The sample converts the string "こんにちは" ("hello") into an audio file.

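import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechRequest;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.OutputStream;

public static void main(String[] args) throws Exception {
    try (TextToSpeechClient client = getTextToSpeechClient()) {
        SynthesizeSpeechRequest.Builder builder = SynthesizeSpeechRequest.newBuilder();

        // The text to synthesize.
        SynthesisInput.Builder input = SynthesisInput.newBuilder();
        input.setText("こんにちは");
        builder.setInput(input.build());

        // Select the voice by language code (Japanese).
        VoiceSelectionParams.Builder voice = VoiceSelectionParams.newBuilder();
        voice.setLanguageCode("ja");
        builder.setVoice(voice.build());

        // Request MP3 output.
        AudioConfig.Builder audioConfig = AudioConfig.newBuilder();
        audioConfig.setAudioEncoding(AudioEncoding.MP3);
        builder.setAudioConfig(audioConfig.build());

        // Call the API synchronously and write the returned audio bytes to a file.
        SynthesizeSpeechResponse response = client.synthesizeSpeech(builder.build());
        ByteString bytes = response.getAudioContent();
        try (OutputStream out = new FileOutputStream("output.mp3")) {
            out.write(bytes.toByteArray());
        }
    }
}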
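
The sample above sends plain text, but the API also accepts SSML input, and `SynthesisInput.Builder` offers `setSsml` alongside `setText`. As a minimal sketch, the hypothetical helper below (`SsmlSketch` is not part of the client library) builds an SSML string with a pause between two phrases and escapes XML special characters so that arbitrary text cannot break the markup. Which SSML elements a given voice honors is not verified here.

```java
/** Hypothetical helper that wraps plain text in a small SSML document. */
final class SsmlSketch {
    private SsmlSketch() {}

    /** Escapes the five XML special characters so the text is safe inside SSML. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a speak element containing two phrases separated by a break of pauseMillis. */
    static String speakWithPause(String first, int pauseMillis, String second) {
        if (pauseMillis < 0) {
            throw new IllegalArgumentException("pauseMillis must not be negative");
        }
        return "<speak>" + escapeXml(first)
                + "<break time=\"" + pauseMillis + "ms\"/>"
                + escapeXml(second) + "</speak>";
    }

    public static void main(String[] args) {
        String ssml = speakWithPause("こんにちは", 500, "Tom & Jerry");
        System.out.println(ssml);
        // In the sample above, this string would replace the setText call:
        // input.setSsml(ssml);
    }
}
```

Escaping matters because an unescaped `&` or `<` in user-supplied text would make the SSML document malformed.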

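3. Calling the API methods

The tables below list the methods of each resource, together with the article that shows how to call them from Java.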

3.1. v1.operations

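Method	Summary	Article
cancel	Starts asynchronous cancellation of a long-running operation	⧉[Google Cloud Text-to-Speech API (V1)] Canceling and deleting long-running operations in Java
delete	Deletes a long-running operation	⧉[Google Cloud Text-to-Speech API (V1)] Canceling and deleting long-running operations in Java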

3.2. v1.projects.locations

Method	Summary	Article
synthesizeLongAudio	Synthesizes long-form text asynchronously	⧉[Google Cloud Text-to-Speech API (V1)] Converting text to speech in Java

3.3. v1.projects.locations.operations

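Method	Summary	Article
get	Gets the latest state of a long-running operation	⧉[Google Cloud Text-to-Speech API (V1)] Getting long-running operations in Java
list	Lists the operations that match the filter specified in the request	⧉[Google Cloud Text-to-Speech API (V1)] Getting long-running operations in Java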

3.4. v1.text

Method	Summary	Article
synthesize	Synthesizes speech synchronously	⧉[Google Cloud Text-to-Speech API (V1)] Converting text to speech in Java

3.5. v1.voices

Method	Summary	Article
list	Gets the list of voices supported for synthesis	⧉[Google Cloud Text-to-Speech API (V1)] Getting the list of voices supported for synthesis in Java

That's all.
