
I tried to get docomo's Speech Synthesis API to talk, but...

Posted at 2015-06-22

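In my earlier post "Talking with docomo's Chat Dialogue API" (ドコモの雑談対話APIとお話してみた), I chatted with the Chat Dialogue API and had its replies read aloud with AVSpeechSynthesizer.
The AVSpeechSynthesizer voice felt a bit flat, so this time I decided to try the Speech Synthesis API, which docomo also provides, but...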

Like the Chat Dialogue API and OAuth, the Speech Synthesis API comes with an iOS SDK.

ViewController.swift

    // SSML builder class
    var ssml: AiTalkSsml!
    // <voice> tag information
    var voice: AiTalkVoice!
    // AI speech synthesis request class
    var speech: AiTalkTextToSpeech!
    // Error information
    var sdkerror: SdkError!
    var voicedata = ""
    var resultdata: NSData!

Initialization

ViewController.swift

        ssml = AiTalkSsml()
        sdkerror = SdkError()
        speech = AiTalkTextToSpeech()
        // Choose the voice ("nozomi") and the text it should read
        voice = AiTalkVoice(voiceName: "nozomi")
        voice.addText("こんにちわ")
        ssml.addVoice(voice)
        // API key (left empty here; put your own key in)
        AuthApiKey.initializeAuth("")

Speech synthesis request

ViewController.swift

        // Generate the SSML (XML) text for the request
        voicedata = ssml.makeSsml()
        // Start the request
        speech.requestAiTalkSsmlToSound(voicedata, onComplete: { (resultdata) -> Void in
            self.playAudio(resultdata)
        }) { (sdkerror) -> Void in
            print("\(sdkerror)")
        }
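The SDK hides the actual request body behind AiTalkSsml, AiTalkVoice and makeSsml(). For anyone curious what that XML might look like, here is a hypothetical Swift sketch. It assumes an SSML 1.1-style document with a `<speak>` root and one `<voice name="…">` element per voice; `VoiceSegment`, `makeSsml(_:)` and `xmlEscape(_:)` are illustrative names, not the SDK's API, and the real SDK output may differ.

```swift
import Foundation

// Hypothetical sketch, NOT the docomo SDK's implementation.
// Assumes an SSML 1.1-style body: <speak> root, one <voice name="..."> per voice.

/// Escapes the five XML special characters so arbitrary text can sit inside an element.
func xmlEscape(_ text: String) -> String {
    var out = ""
    for ch in text {
        switch ch {
        case "&": out += "&amp;"
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        case "\"": out += "&quot;"
        case "'": out += "&apos;"
        default: out.append(ch)
        }
    }
    return out
}

/// Hypothetical stand-in for AiTalkVoice: a voice name plus the text it should read.
struct VoiceSegment {
    let voiceName: String
    var texts: [String] = []

    init(voiceName: String) {
        self.voiceName = voiceName
    }

    mutating func addText(_ text: String) {
        texts.append(text)
    }
}

/// Hypothetical stand-in for AiTalkSsml.makeSsml(): builds one SSML string.
func makeSsml(_ voices: [VoiceSegment]) -> String {
    var xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"
    xml += "<speak version=\"1.1\">"
    for v in voices {
        xml += "<voice name=\"\(xmlEscape(v.voiceName))\">"
        xml += v.texts.map(xmlEscape).joined()
        xml += "</voice>"
    }
    xml += "</speak>"
    return xml
}
```

With a `VoiceSegment(voiceName: "nozomi")` holding "こんにちわ", `makeSsml([segment])` returns one string ending in `<voice name="nozomi">こんにちわ</voice></speak>`. In this sketch the escaping matters once the text comes from the Chat Dialogue API, since a reply containing `&` or `<` would otherwise produce broken XML.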

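I got as far as sending the request and receiving linear PCM audio data in resultdata,
but I'm stuck on the code that plays it back (playAudio).
(That said, the Objective-C sample works fine, so it's really just that I haven't studied enough...)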

The returned linear PCM data (binary) has the following format:

Encoding: linear PCM
Channels: 1 (mono)
Sample rate: 16000 Hz
Bit depth: 16-bit (big-endian)
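Because the samples arrive big-endian while WAV (RIFF) files store 16-bit samples little-endian, each pair of bytes has to be swapped before playback; the Objective-C sample does this with `AiTalkTextToSpeech.convertByteOrder16`. A minimal pure-Swift sketch of that step might look like this. `swapByteOrder16` is a hypothetical name, and it assumes the data holds whole 16-bit samples (a trailing odd byte is simply dropped).

```swift
import Foundation

/// Swaps every pair of bytes, turning big-endian 16-bit PCM into little-endian.
/// Hypothetical helper; a trailing odd byte (if any) is dropped.
func swapByteOrder16(_ data: Data) -> Data {
    let bytes = [UInt8](data)
    var swapped = [UInt8]()
    swapped.reserveCapacity(bytes.count - bytes.count % 2)
    var i = 0
    while i + 1 < bytes.count {
        // Big-endian [hi, lo] becomes little-endian [lo, hi].
        swapped.append(bytes[i + 1])
        swapped.append(bytes[i])
        i += 2
    }
    return Data(swapped)
}
```

For example, the big-endian sample 0x1234 arrives as the bytes `12 34` and leaves as `34 12`.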

Looking at the sample, it seems to do the following.
I think this is the part that turns resultdata into playable (WAV) data...

ViewController.m


- (void) playAudio:(NSData *)data
{
    NSLog(@"playAudio data.length=%d",(int)data.length);
    // The API returns big-endian samples; WAV needs little-endian
    data = [AiTalkTextToSpeech convertByteOrder16:data];
    
    [[AiTalkAudioPlayer manager] playSound:[self addHeader:data]];
}

-(Byte *)setHeader:(long)dataLength
{
    static Byte header[44] = {0};
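    long longSampleRate = 16000;
    int channels = 1;
    // bytes per second = sample rate * channels * (16 bits / 8) = 32000
    long byteRate = longSampleRate * channels * 16 / 8;
    // RIFF chunk size = total file size - 8 = data length + 36
    long totalDataLen = dataLength + 36;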
    
    header[0] = 'R';
    header[1] = 'I';
    header[2] = 'F';
    header[3] = 'F';
    header[4] = (Byte) (totalDataLen & 0xff);
    header[5] = (Byte) ((totalDataLen >> 8) & 0xff);
    header[6] = (Byte) ((totalDataLen >> 16) & 0xff);
    header[7] = (Byte) ((totalDataLen >> 24) & 0xff);
    header[8] = 'W';
    header[9] = 'A';
    header[10] = 'V';
    header[11] = 'E';
    header[12] = 'f';
    header[13] = 'm';
    header[14] = 't';
    header[15] = ' ';
    header[16] = 16;  // fmt chunk size (16 for PCM)
    header[17] = 0;
    header[18] = 0;
    header[19] = 0;
    header[20] = 1;   // audio format: 1 = linear PCM
    header[21] = 0;
    header[22] = (Byte) channels;  // number of channels
    header[23] = 0;
    header[24] = (Byte) (longSampleRate & 0xff);  // sample rate
    header[25] = (Byte) ((longSampleRate >> 8) & 0xff);
    header[26] = (Byte) ((longSampleRate >> 16) & 0xff);
    header[27] = (Byte) ((longSampleRate >> 24) & 0xff);
    header[28] = (Byte) (byteRate & 0xff);  // byte rate
    header[29] = (Byte) ((byteRate >> 8) & 0xff);
    header[30] = (Byte) ((byteRate >> 16) & 0xff);
    header[31] = (Byte) ((byteRate >> 24) & 0xff);
    header[32] = (Byte) (channels * 16 / 8);  // block align: bytes per sample frame
    header[33] = 0;
    header[34] = 16;  // bits per sample
    header[35] = 0;
    header[36] = 'd';
    header[37] = 'a';
    header[38] = 't';
    header[39] = 'a';
    header[40] = (Byte) (dataLength & 0xff);
    header[41] = (Byte) ((dataLength >> 8) & 0xff);
    header[42] = (Byte) ((dataLength >> 16) & 0xff);
    header[43] = (Byte) ((dataLength >> 24) & 0xff);
    return header;
}
/**
 Prepends a 44-byte WAV header to raw PCM data.
 
 @param data little-endian 16-bit PCM audio data
 @return WAV file data, or nil if data is empty
 */
- (NSData*)addHeader:(NSData*)data
{
    NSMutableData * soundFileData=nil;
    if([data length]>0)
    {
        Byte *header = [self setHeader:data.length];
        soundFileData = [NSMutableData dataWithCapacity:44 + data.length];
        [soundFileData appendBytes:header length:44];
        [soundFileData appendData:data];
        [soundFileData appendData:data];
    }
    return soundFileData;
}
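A Swift port of setHeader/addHeader is probably the missing piece for playAudio. This sketch builds the same canonical 44-byte header with every multi-byte field little-endian; it derives the byte rate from the 16000 Hz sample rate (16000 × 1 channel × 2 bytes = 32000) and sets the RIFF chunk size to dataLength + 36, i.e. the file size minus 8 bytes. `wavHeader(dataLength:…)` and `addHeader(_:)` are hypothetical names, and the PCM passed in is assumed to be little-endian already.

```swift
import Foundation

/// Builds a canonical 44-byte WAV (RIFF) header for uncompressed PCM.
/// All multi-byte fields are written little-endian.
func wavHeader(dataLength: Int, sampleRate: Int = 16000,
               channels: Int = 1, bitsPerSample: Int = 16) -> Data {
    var header = Data()
    // Appends `value` as `byteCount` little-endian bytes.
    func appendLE(_ value: Int, _ byteCount: Int) {
        for i in 0..<byteCount {
            header.append(UInt8((value >> (8 * i)) & 0xff))
        }
    }
    let blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    let byteRate = sampleRate * blockAlign          // bytes per second

    header.append(contentsOf: Array("RIFF".utf8))
    appendLE(36 + dataLength, 4)     // RIFF chunk size = file size - 8
    header.append(contentsOf: Array("WAVE".utf8))
    header.append(contentsOf: Array("fmt ".utf8))
    appendLE(16, 4)                  // fmt chunk size for PCM
    appendLE(1, 2)                   // audio format 1 = linear PCM
    appendLE(channels, 2)
    appendLE(sampleRate, 4)
    appendLE(byteRate, 4)
    appendLE(blockAlign, 2)
    appendLE(bitsPerSample, 2)
    header.append(contentsOf: Array("data".utf8))
    appendLE(dataLength, 4)          // size of the PCM payload
    return header
}

/// Swift counterpart of the sample's addHeader: header followed by the PCM bytes.
/// Returns nil for empty input, like the Objective-C version.
func addHeader(_ pcm: Data) -> Data? {
    guard !pcm.isEmpty else { return nil }
    return wavHeader(dataLength: pcm.count) + pcm
}
```

The result of `addHeader` is what AVAudioPlayer's `init(data:)` expects: a complete in-memory WAV file rather than bare samples.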

AiTalkAudioPlayer.m

- (void)playSound:(NSData *)data
{
    NSError * error;
    AVAudioPlayer* player = [[AVAudioPlayer alloc] initWithData:data error:&error];
    if (!player) {
        NSLog(@"error %d %@",(int)error.code, error.localizedDescription);
        return;
    }
    [player setNumberOfLoops:0];
    player.volume = _soundVolume;
    player.delegate = (id)self;
    // Keep a strong reference so the player isn't deallocated mid-playback
    [soundArray insertObject:player atIndex:0];
    [player prepareToPlay];
    [player play];
}
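On the Swift side, the detail that `soundArray` handles here is object lifetime: an AVAudioPlayer held only in a local variable can be released when the method returns, cutting playback off. A minimal sketch that keeps the player in a stored property might look like this. `SpeechPlayer` is a hypothetical class name, and the `#if canImport(AVFoundation)` guard only exists so the snippet also compiles where AVFoundation is unavailable.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation

/// Plays in-memory WAV data and keeps a strong reference to the current player,
/// mirroring what the Objective-C sample does with soundArray.
final class SpeechPlayer {
    private var player: AVAudioPlayer?
    var volume: Float = 1.0

    /// True while the most recently started player is still playing.
    var isPlaying: Bool {
        return player?.isPlaying ?? false
    }

    /// Throws if AVAudioPlayer can't parse the data (for example, a missing WAV header).
    func play(wavData: Data) throws {
        let newPlayer = try AVAudioPlayer(data: wavData)
        newPlayer.numberOfLoops = 0
        newPlayer.volume = volume
        newPlayer.prepareToPlay()
        newPlayer.play()
        player = newPlayer   // stored property keeps it alive until the next play
    }
}
#endif
```

In the view controller, playAudio would then convert the byte order (the Objective-C sample uses `AiTalkTextToSpeech.convertByteOrder16`), prepend a 44-byte WAV header as addHeader does, and pass the result to `play(wavData:)`, with the `SpeechPlayer` instance itself kept in a property of the view controller.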

So it looks like it will take a bit more time before it can actually talk.

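By the way, which do you prefer, Objective-C or Swift? I've seen all kinds of opinions online, but Swift feels intuitive enough that even a programming beginner like me can write it. So I like Swift.
Now that Swift is going open source, I'd be happy to see it usable in many places beyond Apple products.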
