Building a TTS (Text To Speech) API with API Gateway and Lambda using AWS CloudFormation

Posted at 2023-04-04

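This article explains how to create an Amazon API Gateway REST API and an AWS Lambda function with a CloudFormation template. The template builds an API that converts the text given in the query string to speech with Amazon Polly and returns it in mp3 format.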

Template overview

The template creates the following AWS resources.

IAM Role
Lambda Function
API Gateway REST API

1. IAM Role

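The IAM role is needed so that the Lambda function can call Amazon Polly. The following definition creates the role and attaches an inline policy that allows the polly:SynthesizeSpeech action, along with the actions needed to write logs to CloudWatch Logs.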

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: TextToSpeechLambdaPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource: '*'
          - Effect: Allow
            Action:
            - polly:SynthesizeSpeech
            Resource: '*'

2. Lambda Function

The Lambda function handles requests from API Gateway and converts the text to speech with Amazon Polly. The following definition creates the function and associates the IAM role with it.

  TextToSpeechLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
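      Runtime: nodejs20.x
      Code:
        ZipFile: |
          // The nodejs18.x and later runtimes bundle AWS SDK for JavaScript v3, not v2.
          const { PollyClient, SynthesizeSpeechCommand } = require('@aws-sdk/client-polly');
          const polly = new PollyClient({});

          exports.handler = async (event) => {
              // queryStringParameters is null when the request has no query string.
              const query = event.queryStringParameters || {};
              const text = query.text;
              const languageCode = query.languageCode || 'en-US';

              if (!text) {
                  return {
                      statusCode: 400,
                      body: JSON.stringify({
                          message: 'The "text" query parameter is required'
                      })
                  };
              }

              const params = {
                  OutputFormat: 'mp3',
                  Text: text,
                  // Joanna for English, Mizuki (a Japanese voice) for anything else.
                  VoiceId: languageCode.startsWith('en') ? 'Joanna' : 'Mizuki',
                  LanguageCode: languageCode,
                  TextType: 'text'
              };

              try {
                  const response = await polly.send(new SynthesizeSpeechCommand(params));
                  // Collect the audio stream and Base64-encode it for API Gateway.
                  const audioBytes = await response.AudioStream.transformToByteArray();
                  const base64Encoded = Buffer.from(audioBytes).toString('base64');

                  return {
                      statusCode: 200,
                      headers: {
                          'Content-Type': 'audio/mpeg',
                          'Content-Disposition': 'attachment; filename="output.mp3"'
                      },
                      isBase64Encoded: true,
                      body: base64Encoded
                  };
              } catch (error) {
                  console.error(error);
                  return {
                      statusCode: 500,
                      body: JSON.stringify({
                          message: 'An error occurred while synthesizing speech'
                      })
                  };
              }
          };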
      Role: !GetAtt 'LambdaExecutionRole.Arn'
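
The query-string handling in the handler can be tried locally without deploying anything. The following is a minimal sketch rather than part of the template: parseSynthesisRequest is a hypothetical helper name, and the event objects are hand-written stand-ins for what API Gateway passes to a Lambda proxy integration. It assumes the same defaults as the handler (en-US when languageCode is omitted, Joanna for English, Mizuki otherwise).

```javascript
// Hypothetical helper for illustration; it is not part of the template.
// Normalizes the query string of an API Gateway proxy event.
function parseSynthesisRequest(event) {
    // API Gateway sets queryStringParameters to null when there is no query string.
    const query = event.queryStringParameters || {};
    const languageCode = query.languageCode || 'en-US';
    return {
        text: query.text,
        languageCode,
        // Mirrors the handler's rule: Joanna for English, Mizuki otherwise.
        voiceId: languageCode.startsWith('en') ? 'Joanna' : 'Mizuki'
    };
}

// Hand-written sample events (only the field the helper reads is included).
console.log(parseSynthesisRequest({ queryStringParameters: { text: 'Hello' } }));
console.log(parseSynthesisRequest({
    queryStringParameters: { text: 'こんにちは', languageCode: 'ja-JP' }
}));
console.log(parseSynthesisRequest({ queryStringParameters: null }));
```

The last call shows why the null check matters: without it, reading a property of queryStringParameters would throw before the function could return a response.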

3. API Gateway REST API

API Gateway forwards requests from clients to the Lambda function and returns its responses. The following definitions create the API Gateway REST API, its resource, method, and stage.

  TextToSpeechApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: TextToSpeechApi
      Description: An API to convert text to speech using Amazon Polly
      # The /synthesize resource and its GET method are created by the
      # TextToSpeechResource and TextToSpeechMethod resources below, so the
      # REST API itself only declares its binary media types.
      BinaryMediaTypes:
        - '*/*'

  TextToSpeechResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      ParentId: !GetAtt 'TextToSpeechApi.RootResourceId'
      PathPart: synthesize

  TextToSpeechMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      ResourceId: !Ref 'TextToSpeechResource'
      HttpMethod: GET
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: POST
        Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${TextToSpeechLambda.Arn}/invocations'
        PassthroughBehavior: WHEN_NO_MATCH
        TimeoutInMillis: 29000
      MethodResponses:
      - StatusCode: 200

  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref 'TextToSpeechLambda'
      Action: lambda:InvokeFunction
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${TextToSpeechApi}/*/*/*'

  TextToSpeechApiDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn: TextToSpeechMethod
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      Description: 'Deployment for the TextToSpeech API'

  TextToSpeechApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      StageName: v1
      RestApiId: !Ref 'TextToSpeechApi'
      DeploymentId: !Ref 'TextToSpeechApiDeployment'
      Description: 'v1 stage for the TextToSpeech API'
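
The binary media type setting of '*/*' on the REST API is what lets the mp3 reach the client as raw bytes. With a Lambda proxy integration the function returns the audio as a Base64 string with isBase64Encoded set to true, and API Gateway decodes it before responding. A rough local simulation of that round trip might look like the following; toProxyResponse and decodeProxyBody are hypothetical helper names, and a few placeholder bytes stand in for real audio.

```javascript
// Hypothetical helpers for illustration; neither is an AWS API.

// Shapes audio bytes the way the Lambda handler returns them.
function toProxyResponse(audioBuffer) {
    return {
        statusCode: 200,
        headers: { 'Content-Type': 'audio/mpeg' },
        isBase64Encoded: true,
        body: audioBuffer.toString('base64')
    };
}

// Simplified stand-in for the decoding API Gateway performs on the body.
function decodeProxyBody(response) {
    const encoding = response.isBase64Encoded ? 'base64' : 'utf8';
    return Buffer.from(response.body, encoding);
}

// Placeholder bytes ("ID3" plus one more byte), not a playable mp3.
const fakeAudio = Buffer.from([0x49, 0x44, 0x33, 0x04]);
const proxyResponse = toProxyResponse(fakeAudio);

console.log(proxyResponse.body);                                  // Base64 text
console.log(decodeProxyBody(proxyResponse).equals(fakeAudio));    // true
```

If the API did not treat the response as binary, the client would receive the Base64 text itself instead of a playable file.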

The complete CloudFormation template

Below is the full text of the template built in this article. The file name is text_to_speech.yaml.

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  TextToSpeechLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
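      Runtime: nodejs20.x
      Code:
        ZipFile: |
          // The nodejs18.x and later runtimes bundle AWS SDK for JavaScript v3, not v2.
          const { PollyClient, SynthesizeSpeechCommand } = require('@aws-sdk/client-polly');
          const polly = new PollyClient({});

          exports.handler = async (event) => {
              // queryStringParameters is null when the request has no query string.
              const query = event.queryStringParameters || {};
              const text = query.text;
              const languageCode = query.languageCode || 'en-US';

              if (!text) {
                  return {
                      statusCode: 400,
                      body: JSON.stringify({
                          message: 'The "text" query parameter is required'
                      })
                  };
              }

              const params = {
                  OutputFormat: 'mp3',
                  Text: text,
                  // Joanna for English, Mizuki (a Japanese voice) for anything else.
                  VoiceId: languageCode.startsWith('en') ? 'Joanna' : 'Mizuki',
                  LanguageCode: languageCode,
                  TextType: 'text'
              };

              try {
                  const response = await polly.send(new SynthesizeSpeechCommand(params));
                  // Collect the audio stream and Base64-encode it for API Gateway.
                  const audioBytes = await response.AudioStream.transformToByteArray();
                  const base64Encoded = Buffer.from(audioBytes).toString('base64');

                  return {
                      statusCode: 200,
                      headers: {
                          'Content-Type': 'audio/mpeg',
                          'Content-Disposition': 'attachment; filename="output.mp3"'
                      },
                      isBase64Encoded: true,
                      body: base64Encoded
                  };
              } catch (error) {
                  console.error(error);
                  return {
                      statusCode: 500,
                      body: JSON.stringify({
                          message: 'An error occurred while synthesizing speech'
                      })
                  };
              }
          };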
      Role: !GetAtt 'LambdaExecutionRole.Arn'

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: TextToSpeechLambdaPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource: '*'
          - Effect: Allow
            Action:
            - polly:SynthesizeSpeech
            Resource: '*'

  TextToSpeechApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: TextToSpeechApi
      Description: An API to convert text to speech using Amazon Polly
      # The /synthesize resource and its GET method are created by the
      # TextToSpeechResource and TextToSpeechMethod resources below, so the
      # REST API itself only declares its binary media types.
      BinaryMediaTypes:
        - '*/*'

  TextToSpeechResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      ParentId: !GetAtt 'TextToSpeechApi.RootResourceId'
      PathPart: synthesize

  TextToSpeechMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      ResourceId: !Ref 'TextToSpeechResource'
      HttpMethod: GET
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: POST
        Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${TextToSpeechLambda.Arn}/invocations'
        PassthroughBehavior: WHEN_NO_MATCH
        TimeoutInMillis: 29000
      MethodResponses:
      - StatusCode: 200

  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref 'TextToSpeechLambda'
      Action: lambda:InvokeFunction
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${TextToSpeechApi}/*/*/*'

  TextToSpeechApiDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn: TextToSpeechMethod
    Properties:
      RestApiId: !Ref 'TextToSpeechApi'
      Description: 'Deployment for the TextToSpeech API'

  TextToSpeechApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      StageName: v1
      RestApiId: !Ref 'TextToSpeechApi'
      DeploymentId: !Ref 'TextToSpeechApiDeployment'
      Description: 'v1 stage for the TextToSpeech API'

Outputs:
  ApiUrl:
    Description: The URL of the TextToSpeech API
    Value: !Sub 'https://${TextToSpeechApi}.execute-api.${AWS::Region}.amazonaws.com/v1'

Creating the stack

You can create a stack from this CloudFormation template with the following AWS CLI command. The command needs the path to the template file and a stack name.

aws cloudformation create-stack --stack-name TextToSpeechStack --template-body file://text_to_speech.yaml --capabilities CAPABILITY_NAMED_IAM

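In this command, the --stack-name option sets the stack name and the --template-body option points to the template file. The --capabilities option with CAPABILITY_NAMED_IAM acknowledges that the template creates IAM resources, here the Lambda execution role. Because the role has no custom name, CAPABILITY_IAM would also be sufficient.

When the request is accepted, the AWS CLI prints the ID of the new stack. Creating the stack itself can take a few minutes.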
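
Once the stack is complete (for example after `aws cloudformation wait stack-create-complete --stack-name TextToSpeechStack` returns), the ApiUrl output defined in the template can be read with `aws cloudformation describe-stacks --stack-name TextToSpeechStack`. As a small sketch of pulling that value out of the JSON the CLI prints, assuming the default JSON output format: findStackOutput is a hypothetical helper name, and the sample object below is hand-written with a made-up API ID.

```javascript
// Hypothetical helper: looks up one output value in describe-stacks JSON.
function findStackOutput(describeStacksResult, outputKey) {
    const stack = (describeStacksResult.Stacks || [])[0];
    const outputs = (stack && stack.Outputs) || [];
    const match = outputs.find((output) => output.OutputKey === outputKey);
    return match ? match.OutputValue : undefined;
}

// Hand-written sample in the shape of the CLI's JSON output; the API ID is made up.
const sampleResult = {
    Stacks: [{
        StackName: 'TextToSpeechStack',
        StackStatus: 'CREATE_COMPLETE',
        Outputs: [{
            OutputKey: 'ApiUrl',
            OutputValue: 'https://abc123.execute-api.us-east-1.amazonaws.com/v1',
            Description: 'The URL of the TextToSpeech API'
        }]
    }]
};

console.log(findStackOutput(sampleResult, 'ApiUrl'));
```

The helper returns undefined when the stack has no outputs yet, which is the case while creation is still in progress.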

Accessing the endpoint

Accessing a URL like the following downloads an audio file generated from the text.

English

https://<API Gateway endpoint>/v1/synthesize?text=This+is+a+test.

Japanese

https://<API Gateway endpoint>/v1/synthesize?text=これはテストです。&languageCode=ja-JP
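
Browsers percent-encode non-ASCII text such as the Japanese sample automatically, but a script calling the API has to do it itself. A minimal sketch using the standard URL class follows; buildSynthesizeUrl is a hypothetical helper name, and the host in the usage lines is a made-up placeholder for your own endpoint.

```javascript
// Hypothetical helper: builds a request URL for the /synthesize resource.
// apiUrl is the stage URL, for example the ApiUrl stack output.
function buildSynthesizeUrl(apiUrl, text, languageCode) {
    const url = new URL(apiUrl.replace(/\/+$/, '') + '/synthesize');
    // searchParams encodes spaces as "+" and non-ASCII characters as %XX.
    url.searchParams.set('text', text);
    if (languageCode) {
        url.searchParams.set('languageCode', languageCode);
    }
    return url.toString();
}

// Placeholder stage URL; replace it with the real one.
const stageUrl = 'https://abc123.execute-api.us-east-1.amazonaws.com/v1';

console.log(buildSynthesizeUrl(stageUrl, 'This is a test.'));
console.log(buildSynthesizeUrl(stageUrl, 'これはテストです。', 'ja-JP'));
```

The first call produces the same query string as the English example above, with spaces written as "+".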

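Summary

This article explained how to create an Amazon API Gateway REST API and an AWS Lambda function with a CloudFormation template. The template builds an API that converts the text given in the query string to speech with Amazon Polly and returns it in mp3 format. The stack can be created with a single AWS CLI command.

Using AWS CloudFormation makes resources easier to create and manage, and makes infrastructure changes traceable. Integrating API Gateway with Lambda gives you a serverless architecture that scales with demand and is easy to change.

Please try it out yourself. It should be fun.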
