
Building a speech transcription web app

Last updated at 2020-03-29 / Posted at 2020-03-29

Conversion

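I wanted to transcribe audio recorded on an iPhone, so the first step was converting the m4a file to a wav file.
First, I ran

!pip install pydub

to install pydub, but when I tried to use it I got an error like this:

 [Errno 2] No such file or directory: 'ffprobe': 'ffprobe'

pydub calls ffmpeg and ffprobe to read formats such as m4a, and they were not installed, so I downloaded ffmpeg.exe from the site below.
Download FFmpeg
The download was a compressed file named ffmpeg-97026-gea46b45e9c.7z.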

Problems with AudioSegment.from_mp3

I then moved that file to a directory that is on the PATH.

printenv
or
echo $PATH

prints the current PATH, which is how I checked it.
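The same PATH check can also be done from Python, which is handy because the pydub error above comes from Python failing to find `ffprobe`. The following is a minimal sketch using the standard library's `shutil.which`; the helper name `find_missing_tools` is hypothetical and not part of the original setup.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both on PATH")
```

An empty list means both executables are visible to the Python process that will run the conversion.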

With ffmpeg available, I figured pydub was no longer necessary, so I ran

pip install ffmpeg-python

to install ffmpeg-python. The conversion then worked with the following code.

import ffmpeg
stream = ffmpeg.input("sample.m4a")
stream = ffmpeg.output(stream, 'output.wav')
ffmpeg.run(stream)
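The Speech-to-Text call in server.py further down needs the sample rate of the WAV file, so it can help to inspect the converted file's header right after the conversion. The following is a minimal sketch using only the standard library `wave` module. `describe_wav` and `write_silence` are hypothetical helper names, and the silent file exists only so that the example runs without ffmpeg or a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return sample rate, channel count, and sample width (bytes) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return {
            "sample_rate": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
        }


def write_silence(path, sample_rate=16000, seconds=1):
    """Write a mono 16-bit WAV of silence (stand-in for a converted recording)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "silence.wav")
        write_silence(path)
        print(describe_wav(path))
```

In practice `describe_wav("output.wav")` would be called on the file produced by the conversion above.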

Web app

server.py

from flask import Flask, render_template, request, make_response, jsonify
import os
import ffmpeg
import wave

app = Flask(__name__)

UPLOAD_DIR = './uploads'
ALLOWED_EXTENSIONS = set(['m4a','mp3','wav',])
app.config['UPLOAD_FOLDER'] = UPLOAD_DIR


@app.route('/')
def hello():
    return render_template('index.html')


def allwed_file(filename):
    # Check that the name contains a '.' and that the extension is allowed
    # Returns True if the file is acceptable, False otherwise
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS



def transcribe_file(speech_file, num):
    """Transcribe the given audio file; num is its sample rate in Hz."""
    # The enums/types modules are part of google-cloud-speech 1.x
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    with open(speech_file, 'rb') as audio_file:
        content = audio_file.read()

    # Pick the encoding from the file extension; fall back to LINEAR16
    ext = os.path.splitext(speech_file)[1].lower().lstrip('.')
    if ext == 'flac':
        encode = enums.RecognitionConfig.AudioEncoding.FLAC
    elif ext == 'wav':
        encode = enums.RecognitionConfig.AudioEncoding.LINEAR16
    elif ext == 'ogg':
        encode = enums.RecognitionConfig.AudioEncoding.OGG_OPUS
    elif ext == 'amr':
        encode = enums.RecognitionConfig.AudioEncoding.AMR
    elif ext == 'awb':
        encode = enums.RecognitionConfig.AudioEncoding.AMR_WB
    else:
        encode = enums.RecognitionConfig.AudioEncoding.LINEAR16

    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(
        encoding=encode,
        sample_rate_hertz=num,
        language_code='ja-JP')

    response = client.recognize(config, audio)

    # Keep the top alternative of each recognized segment
    result_list = []
    for result in response.results:
        result_list.append(result.alternatives[0].transcript)

    return result_list
    
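@app.route('/result', methods=['POST'])
def uploads_file():
    # Reject requests that carry no file part
    if 'file' not in request.files:
        return make_response(jsonify({'result': 'uploadFile is required.'}), 400)

    # Get the uploaded file
    file = request.files['file']

    # Reject an empty filename
    if file.filename == '':
        return make_response(jsonify({'result': 'filename must not be empty.'}), 400)

    # Check the extension
    if not allwed_file(file.filename):
        return make_response(jsonify({'result': 'unsupported file type.'}), 400)

    # Drop any directory components sent by the client
    filename = os.path.basename(file.filename)

    # Save the upload
    os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
    upload_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    file.save(upload_path)

    # Convert the upload to WAV, replacing any leftover output file
    stream = ffmpeg.input(upload_path)
    stream = ffmpeg.output(stream, 'output1.wav')
    ffmpeg.run(stream, overwrite_output=True)

    try:
        # Read the sample rate that the Speech API needs
        with wave.open('output1.wav', 'rb') as wfile:
            frame_rate = wfile.getframerate()
        print(frame_rate)
        result_list = transcribe_file('output1.wav', frame_rate)
    finally:
        # Remove the temporary WAV even if transcription fails
        os.remove('output1.wav')

    return render_template('result.html', result_list=result_list)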



if __name__ == "__main__":
    app.run(debug=True)
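Both the upload allow-list and the encoding choice in server.py depend on the file extension, so that logic can be tried on its own without Flask or Google Cloud. The following is a minimal sketch under that assumption. `extension_of`, `is_allowed`, `encoding_name_for`, and `ENCODING_BY_EXTENSION` are hypothetical names, and the table stores only the names of the `AudioEncoding` members that server.py refers to rather than the enum values themselves.

```python
import os

ALLOWED_EXTENSIONS = {"m4a", "mp3", "wav"}

# Extension -> name of the AudioEncoding member used in server.py
ENCODING_BY_EXTENSION = {
    "flac": "FLAC",
    "wav": "LINEAR16",
    "ogg": "OGG_OPUS",
    "amr": "AMR",
    "awb": "AMR_WB",
}


def extension_of(filename):
    """Return the lower-cased extension without the dot, or '' if there is none."""
    return os.path.splitext(filename)[1].lower().lstrip(".")


def is_allowed(filename):
    """True if the extension is on the upload allow-list."""
    return extension_of(filename) in ALLOWED_EXTENSIONS


def encoding_name_for(filename, default="LINEAR16"):
    """Look up the encoding name for a file, falling back to `default`."""
    return ENCODING_BY_EXTENSION.get(extension_of(filename), default)


if __name__ == "__main__":
    for name in ("memo.M4A", "memo.txt", "output1.wav", "voice.flac"):
        print(name, is_allowed(name), encoding_name_for(name))
```

A lookup table keeps the mapping in one place, and the resulting name could then be resolved against the enum with `getattr`. Because the app always converts uploads to WAV first, only the `LINEAR16` entry is exercised in the flow shown above.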

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
  </head>
  <body>
  <div class="index">
    <form method="post" action="/result" enctype="multipart/form-data" class="index">
      <div>Upload an audio file</div>
      <label>
      
      <div class="inputindex">Choose a file</div>
      <input type="file" name="file" size="30" class="index">
      </label>
     <div>
      <button type="submit" formmethod="POST" class="index">Submit</button>
      </div>
    </form>
    </div>
 </body>
</html>

result.html

<!DOCTYPE html>
<html lang="ja">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0">
    <link rel="stylesheet" href="{{url_for('static', filename='index.css')}}">
    
  </head>
  <body>
  
  <ul>
  {% for result in result_list %}
    <li>{{ result }}</li>
  {% endfor %}
  </ul>
</body>
</html>

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

↑ Replace [PATH] with the path to the JSON key file used for authentication.
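A missing or mistyped credentials path otherwise only surfaces when the Speech client is created, so a quick check before starting the server can save time. The following is a minimal sketch using only the standard library. The helper name `credentials_problem` is hypothetical, and it checks only that the variable is set and points to an existing file, not that the key itself is valid.

```python
import os


def credentials_problem(environ=os.environ):
    """Return a message describing the problem, or None if the setting looks fine."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path
    return None


if __name__ == "__main__":
    problem = credentials_problem()
    print(problem or "Credentials file found")
```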

References

Cloud Speech-to-Text
オリジナルの音声アシスタントを作ろう (3) - Cloud Speech API
Pythonで音声認識する方法を現役エンジニアが解説【初心者向け】

↓ I recommend this article.
Google Cloud Speech-to-Text APIをいろいろ調査してみる

【Python】Google Cloud Speech-to-Text APIの2種類のライブラリを使い比べてみた

A site with audio samples suited to testing
