
kintone gets lifelike speech from text via Amazon Polly

Last updated at 2016-12-04, posted at 2016-12-01

Amazon Polly

Yesterday at AWS re:Invent 2016, AWS announced “Amazon Polly”, a new service that turns text into lifelike speech. Amazon Polly lets us create applications that talk, enabling us to build entirely new categories of speech-enabled products (press release, official blog, official documentation for developers).

Simple use case of kintone & Amazon Polly

Here I try a simple use case: you enter the text you want to convert to speech in a kintone record, the text is converted to an MP3 voice file, and you can play that file on kintone.

Architecture

Here is the architecture.

Record view of kintone

Configuration of kintone & Amazon Polly

kintone app.

Create a kintone application

Create a kintone application whose form consists of the following fields.

Field label & code	Summary	Field type
Text	Text you want to convert to speech	SINGLE_LINE_TEXT
Attachment	MP3 voice file converted from the text	FILE
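For reference, the two field codes above show up in kintone REST API payloads roughly as follows. This is only a minimal sketch: the sample values and the helper names extract_text and build_attachment_update are made up for illustration and are not part of the Lambda function shown later.

```python
import json

# Hypothetical sample of a record-get response for this app.
# Only the field codes (Text, Attachment) come from the table above;
# the values are invented for illustration.
SAMPLE_RESPONSE = {
    "record": {
        "Text": {"type": "SINGLE_LINE_TEXT", "value": "Hello from kintone"},
        "Attachment": {"type": "FILE", "value": []},
    }
}


def extract_text(response):
    """Return the value of the Text field from a record-get response."""
    return response["record"]["Text"]["value"]


def build_attachment_update(app_id, record_id, file_key):
    """Build the request body that attaches an uploaded file to a record."""
    return {
        "app": app_id,
        "id": record_id,
        "record": {"Attachment": {"value": [{"fileKey": file_key}]}},
    }


if __name__ == "__main__":
    print(extract_text(SAMPLE_RESPONSE))
    print(json.dumps(build_attachment_update("1", "10", "dummy-file-key")))
```

The update body has the same shape as the one the Lambda function sends when it attaches the MP3 file to the record.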

(AWS) IAM

We create a new role that lets Lambda access Polly and CloudWatch.

Create New Role

Click "Create New Role".

Enter "lambda_polly_exec_role" as the role name, and click "Next Step".

Click "select" in the row of "AWS Lambda", and click "Next Step".

Check "AmazonPollyFullAccess" and "CloudWatchFullAccess", and click "Next Step".

Click "Create Role".
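If you prefer scripting over the console, the same role could be sketched with boto3 as below. This is an assumption-laden sketch, not the procedure used in this article: the policy ARNs are derived from the managed policy names above, and create_role requires credentials with IAM permissions.

```python
import json

ROLE_NAME = "lambda_polly_exec_role"

# Assumed ARNs of the two AWS managed policies checked in the console steps.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonPollyFullAccess",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
]


def lambda_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role():
    """Create the role and attach both managed policies."""
    import boto3  # imported lazily so the helper above works without boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=arn)


if __name__ == "__main__":
    # Print the trust policy only; call create_role() yourself when ready.
    print(json.dumps(lambda_trust_policy(), indent=2))
```

Choosing "AWS Lambda" as the role type in the console corresponds to the trust policy here, which names lambda.amazonaws.com as the principal.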

AWS Lambda

We create the Lambda function that accesses kintone and Polly.
First, we create a working directory named workspace like this.

$ mkdir workspace
$ cd workspace

Create Lambda function

Replace {kintone domain}, {app. id}, and {api token} with your own values (see also the kintone REST API).
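The code in this article hard-codes these three values. As an alternative, they could be read from environment variables so that the API token stays out of the source file. The sketch below assumes that approach; the variable names KINTONE_DOMAIN, KINTONE_APP_ID, and KINTONE_API_TOKEN and the helper load_kintone_config are hypothetical.

```python
import os


def load_kintone_config(env=None):
    """Read kintone settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    config = {
        "domain": env.get("KINTONE_DOMAIN", ""),
        "app_id": env.get("KINTONE_APP_ID", ""),
        "api_token": env.get("KINTONE_API_TOKEN", ""),
    }
    # Fail early with a readable message instead of sending a broken request.
    missing = sorted(key for key, value in config.items() if not value)
    if missing:
        raise ValueError("missing kintone settings: " + ", ".join(missing))
    config["base_url"] = "https://" + config["domain"]
    return config


if __name__ == "__main__":
    sample = {
        "KINTONE_DOMAIN": "example.invalid",
        "KINTONE_APP_ID": "1",
        "KINTONE_API_TOKEN": "dummy-token",
    }
    print(load_kintone_config(sample)["base_url"])
```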

lambda_function.py



from __future__ import print_function

import json
import boto3
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
from tempfile import gettempdir
import requests

# parameters of kintone
KINTONE_DOMAIN = "{kintone domain}"
BASE_URL = "https://"+ KINTONE_DOMAIN
APP_ID = "{app. id}"
API_TOKEN = "{api token}"

# parameters of AWS
REGION = 'us-east-1'

# file name attached to kintone
MP3_FILE_NAME = 'speech.mp3'

print('Loading function')

# get a kintone record
def getRecord(id):
    query = 'app='+APP_ID+'&id='+id
    url = BASE_URL + "/k/v1/record.json?" + query
    headers = {"X-Cybozu-API-Token": API_TOKEN}
    # http request with requests
    res = requests.get(url, headers=headers, data={})
    print(res.text, res.status_code)
    return {"res":json.loads(res.text), "code":res.status_code}

# upload the MP3 file to kintone
def uploadMP3File(file):
    fileUrl = BASE_URL + "/k/v1/file.json"
    fileHeaders = {"X-Cybozu-API-Token": API_TOKEN}
    # open the MP3 file in a with-block so the handle is closed after the upload
    with open(file, 'rb') as f:
        files = {'file': (MP3_FILE_NAME, f, 'audio/mp3', {'Expires': '0'})}
        res = requests.post(fileUrl, headers=fileHeaders, files=files)
    print(res.text, res.status_code)
    fileKey = json.loads(res.text)['fileKey']
    return fileKey

# update the kintone record by a fileKey
def updateRecordForFile(recordId, fileKey):
    fileKeys = [{"fileKey":fileKey}]
    record = {'Attachment':{'value':fileKeys}}
    request = {'app':APP_ID,'id': recordId,'record':record}
    requestJson = json.dumps(request)
    url = BASE_URL + "/k/v1/record.json"
    headers = {"X-Cybozu-API-Token": API_TOKEN, "Content-Type" : "application/json"}
    res = requests.put(url, headers=headers, data=requestJson)
    print(res.text, res.status_code)
    return {"res":json.loads(res.text), "code":res.status_code}

# build the MP3 file from text with Amazon Polly API
def buildMP3File(text):
    # Create a Polly client for REGION; on Lambda the credentials come from the execution role
    polly = boto3.client('polly', region_name=REGION)

    try:
        # Request speech synthesis
        response = polly.synthesize_speech(Text=text, OutputFormat="mp3",
                                            VoiceId="Joey")
    except (BotoCoreError, ClientError) as error:
        # The service returned an error, exit gracefully
        print(error)
        sys.exit(-1)

    # Access the audio stream from the response
    if "AudioStream" in response:
        # Note: Closing the stream is important as the service throttles on the
        # number of parallel connections. Here we are using contextlib.closing to
        # ensure the close method of the stream object will be called automatically
        # at the end of the with statement's scope.
        with closing(response["AudioStream"]) as stream:
            output = os.path.join(gettempdir(), MP3_FILE_NAME)
            try:
                # Open a file for writing the output as a binary stream
                with open(output, "wb") as file:
                    file.write(stream.read())
            except IOError as error:
                # Could not write to file, exit gracefully
                print(error)
                sys.exit(-1)

    else:
        # The response didn't contain audio data, exit gracefully
        print("Could not stream audio")
        sys.exit(-1)
    return output

# Lambda function
def lambda_handler(event, context):
    recordId = event['params']['querystring']['id'] # obtain the record id of kintone via Amazon API Gateway
    obj = getRecord(recordId)
    text = obj['res']['record']['Text']['value'] # obtain the text to convert
    if len(text) == 0:
        text = 'Text field is empty.'
    mp3 = buildMP3File(text) # convert text to MP3-voice
    fileKey = uploadMP3File(mp3) # upload the MP3 file to kintone and obtain the fileKey
    updateRecordForFile(recordId, fileKey) # update the kintone record with attachment of the MP3 file
    return 'Process is completed.'
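The handler above reads the record id from event['params']['querystring']['id']. That path only exists if API Gateway passes the query string through in that shape, which depends on the Integration Request mapping configured later. The sketch below shows the event shape I am assuming and a defensive way to read it; SAMPLE_EVENT and extract_record_id are illustrative names, not part of the function above.

```python
# Assumed shape of the event that API Gateway hands to the Lambda function.
# Only the params/querystring/id path is taken from lambda_handler above;
# the other keys are illustrative.
SAMPLE_EVENT = {
    "params": {
        "path": {},
        "querystring": {"id": "10"},
        "header": {},
    }
}


def extract_record_id(event):
    """Return the kintone record id from the event, or None if it is absent."""
    try:
        record_id = event["params"]["querystring"]["id"]
    except (KeyError, TypeError):
        return None
    # Treat an empty string the same as a missing id.
    return record_id or None


if __name__ == "__main__":
    print(extract_record_id(SAMPLE_EVENT))
    print(extract_record_id({}))
```

Passing SAMPLE_EVENT as a test event is a quick way to exercise the function before API Gateway is wired up.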

In workspace, save lambda_function.py, install the requests (http://docs.python-requests.org/en/master/) and boto3 libraries, and zip everything into a deployment package for the Lambda function.

$ pip install boto3 -t .
$ pip install requests -t .
$ zip -r upload.zip *
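The zip command is run inside workspace so that lambda_function.py ends up at the root of the archive, which is where Lambda looks for the handler module. If zip is not available, a roughly equivalent package can be built with Python's zipfile module. This is a sketch under that assumption; build_package and handler_at_root are hypothetical helper names.

```python
import os
import zipfile


def build_package(workspace, zip_path):
    """Zip everything under workspace with paths relative to its root."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_abs, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(workspace):
            for name in files:
                full = os.path.join(folder, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # do not add the archive to itself
                archive.write(full, os.path.relpath(full, workspace))
    return zip_abs


def handler_at_root(zip_path, handler_file="lambda_function.py"):
    """Check that the handler module sits at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return handler_file in archive.namelist()


if __name__ == "__main__":
    package = build_package(".", "upload.zip")
    print(package, handler_at_root(package))
```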

Choose "N. Virginia (us-east-1)" as the region, and click “Create a Lambda function”.

Choose “Blank Function” to configure a Lambda function.

Click "Next".

Configure the Lambda function as below.

Click "Create function".

Amazon API Gateway

We create an API that invokes the Lambda function and passes the kintone parameters to it via API Gateway.

Create API

Click "Create API".

Type "kintone-polly" as the API name, and click "Create API".

Choose "Create Method" from "Actions".

Choose "GET" and click the check mark.

Configure the GET method as below.

Click "OK".

To keep the setup minimal, we configure only "Integration Request" here, although all four sections should normally be set up.

Configure Integration Request as follows.

Choose "Deploy API" from "Actions".

Complete the API settings as below.

Copy the invoke URL so that we can call it from the kintone JavaScript customization later.
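Before wiring the invoke URL into kintone, it may help to call it once by hand. The kintone customization later appends the record id as the id query parameter, and the sketch below builds the same URL in Python. The base URL shown is a placeholder, and build_invoke_url and call_endpoint are hypothetical helper names.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_invoke_url(base_url, record_id):
    """Append the kintone record id as the `id` query parameter."""
    return base_url + "?" + urlencode({"id": record_id})


def call_endpoint(base_url, record_id):
    """Send the GET request; needs the requests library and network access."""
    import requests  # imported lazily so build_invoke_url works without it

    res = requests.get(build_invoke_url(base_url, record_id), timeout=60)
    return res.status_code, res.text


if __name__ == "__main__":
    # Placeholder URL: replace it with the invoke URL copied from API Gateway.
    print(build_invoke_url("https://example.invalid/prod", 10))
```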

kintone JavaScript/CSS Customization

Finally, we set up the kintone JavaScript/CSS customization that calls the API we configured.

Main file

We save this main file as "polly.js". Replace {api gw url} with the invoke URL of the API deployed above.

polly.js

jQuery.noConflict();
(function($) {
  'use strict';

  // API to convert text to speech, and to upload the converted MP3 file to kintone with Amazon Polly
  var POLLY_URL = '{api gw url}';

  // show spinner
  var showSpinner = function() {
    // initialization
    if ($('.kintone-spinner').length == 0) {
      // create elements for spinner and background
      var spin_div = $('<div id ="kintone-spin" class="kintone-spinner"></div>');
      var spin_bg_div = $('<div id ="kintone-spin-bg" class="kintone-spinner"></div>');

      // append spinner element to "body"
      $(document.body).append(spin_div, spin_bg_div);

      // style for spinner
      $(spin_div).css({
        'position': 'fixed',
        'top': '50%',
        'left': '50%',
        'z-index': '510',
        'background-color': '#fff',
        'padding': '26px',
        '-moz-border-radius': '4px',
        '-webkit-border-radius': '4px',
        'border-radius': '4px'
      });
      $(spin_bg_div).css({
        'position': 'absolute',
        'top': '0px',
        'z-index': '500',
        'width': '150%',
        'height': '150%',
        'background-color': '#000',
        'opacity': '0.5',
        'filter': 'alpha(opacity=50)',
        '-ms-filter': "alpha(opacity=50)"
      });

      // options for spinner
      var opts = {
        'color': '#000'
      };

      // invoke spinner
      new Spinner(opts).spin(document.getElementById('kintone-spin'));
    }

    // start(show) spinner
    $('.kintone-spinner').show();
  };

  // stop(hide) spinner
  var hideSpinner = function() {
    // hide spinner element
    $('.kintone-spinner').hide();
  };

  // download file from kintone
  var downloadFile = function(fileKey, callback, errback) {
    var url = kintone.api.url('/k/v1/file', false) + '?fileKey=' + fileKey;
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    xhr.responseType = 'blob';
    xhr.onload = function() {
      if (xhr.status === 200) {
        // success
        var blob = new Blob([xhr.response]);
        return callback(blob);
      } else {
        // error
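        // responseText is not readable when responseType is 'blob', so report the status instead
        var err = new Error('File download failed with status ' + xhr.status);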
        if (errback) {
          return errback(err);
        }
      }
    };
    xhr.onerror = function(err) {
      // error
      if (errback) {
        return errback(err);
      }
    };
    xhr.send();
  };

  // convert Blob to Base64
  var convertBlobToBase64 = function(blob, callback, errback) {
    var reader = new window.FileReader();
    reader.readAsDataURL(blob);
    reader.onload = function() {
      // success
      var base64data = reader.result;
      return callback(base64data.split(',', 2)[1]);
    };
    reader.onerror = function(err) {
      // error
      if (errback) {
        return errback(err);
      }
    };
  };

  // submit success events
  kintone.events.on([
    'app.record.create.submit.success',
    'app.record.edit.submit.success',
  ], function(event) {
    // show spinner
    showSpinner();
    // call API with Amazon Polly
    var id = event.recordId;
    var query = 'id=' + id;
    var url = POLLY_URL + '?' + query;
    return kintone.proxy(url, 'GET', {}, {}).then(function(r){
      // hide spinner
      hideSpinner();
      return event;
    }).catch(function(e){
      // hide spinner
      hideSpinner();
      event.error = 'Error occurred.';
      return event;
    });
  });

  // after showing record details
  kintone.events.on(['app.record.detail.show'], function(event) {
    // show spinner
    showSpinner();
    var record = event.record;
    new kintone.Promise(function(resolve, reject) {
      // download the MP3 file
      var file = record['Attachment'].value[0];
      var fileKey = file.fileKey;
      downloadFile(fileKey, function(r) {
        return resolve(r);
      }, function(e) {
        return reject(e);
      });
    }).then(function(blob) {
      // convert the MP3 blob file to Base64 style
      return new kintone.Promise(function(resolve, reject) {
        convertBlobToBase64(blob, function(r) {
          return resolve(r);
        }, function(e) {
          return reject(e);
        });
      });
    }).then(function(base64data) {
      // hide spinner
      hideSpinner();
      // append the Base64-encoded MP3 file to "audio" element
      var audioUrl = 'data:audio/mp3;base64,' + base64data;
      $(document.body).append(
        $('<audio>').prop({
          'id': "voice",
          'preload': "auto"
        }).append(
          $('<source>').prop({
            'src': audioUrl,
            'type': "audio/mp3"
          })
        )
      );

      // append the button that starts to play the MP3 file
      var elAttachment = kintone.app.record.getFieldElement('Attachment');
      $(elAttachment).append(
        $('<button>').prop({
          id: 'speak'
        }).addClass('kintoneplugin-button-dialog-ok').text('Speak')
      );

      // attach click event to the "#speak" button for playing
      $('#speak').click(function() {
        var voice = $('#voice');
        voice[0].currentTime = 0;
        voice[0].play();
      });
      return;
    }).catch(function(e) {
      hideSpinner();
      console.log(e);
    });
  });
})(jQuery);

Customizing an App with JavaScript & CSS

Set the links and files as follows in the JavaScript and CSS Customization view.

Upload JavaScript for PC
- https://js.cybozu.com/jquery/3.0.0/jquery.min.js [link]
- https://js.cybozu.com/spinjs/2.3.2/spin.min.js [link]
- polly.js [saved file]
Upload CSS for PC
- 51-current-default.css [downloaded file]

51-current-default.css can be downloaded from the "kintone Plug-in SDK".

Save text and convert

When you enter the text you want to convert to speech and save the record, the MP3 voice file is attached to the record and you can play it on kintone.

Information

Please stop by AWS re:Invent Booth #2534 to see what our team developed with it! Let's experience it!
