YouTube, Deepspeech, with Google Colaboratory [testing_0002]

Posted at 2021-02-15

English

deepspeech-0.9.3-models

This version is not backwards compatible with earlier versions.

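This article is a test notebook for using deepspeech on Google Colaboratory: it takes the audio from a YouTube video and feeds it to deepspeech, a program that recognizes English speech and turns it into text (what is called speech recognition, or ASR). I tried it because it looked interesting.

I rewrote the article, which originally targeted deepspeech 0.6.1, to match deepspeech 0.9.3. The file layout and commands differ from earlier versions, and similar differences will probably appear in future versions, so check which version you are using.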

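About Google Colaboratory

Frequently Asked Questions
The Basics
What is Colaboratory?
Colaboratory, or "Colab" for short, is a service from Google Research. Colab lets anybody write and run Python in the browser, which makes it especially well suited to machine learning, data analysis, and education. More specifically, Colab is a hosted Jupyter Notebook service that requires no special setup and provides free access to computing resources such as GPUs.

Is it really free to use?
Yes. Colab is free to use.

Seems too good to be true. Are there any limitations?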
https://research.google.com/colaboratory/faq.html

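youtube-dl handles fetching the audio from the YouTube video, and deepspeech runs automatic speech recognition on the English speech and prints its best guess at the corresponding text. (The program below uses deepspeech-0.9.3-models / TensorFlow [1].)

So this is something you can try right away, but note that the deepspeech-0.9.3-models data is fairly large.
Unless you set things up to keep it, though, the data disappears together with the Google Colaboratory runtime when the runtime ends.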

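It may look like a wall of difficult words. Put simply, though, all you do is paste the python code below into googlecolab cells and run it, one cell at a time. [2]
In googlecolab, Control + Enter runs the code in a cell.
The biggest psychological hurdle is creating a google account; nothing here is harder than that. Then again, you do not need to run it at all. It is here, for the time being, as a sample for anyone who wants to know. Looking at it again, this is less a python program than a list of terminal commands. python is only assisting the shell here and is barely needed; I suspect the shell alone would do if you wanted to leave python out. The benefit lies in using googlecolab, and since googlecolab is IPython, the language happens to be python.

googlecolab's editor settings let you enable vim key bindings, so if you are faster in vim, you can paste with Shift + Insert.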

This.

Setting up Google Colaboratory

GoogleColaboratory
from google.colab import drive 
drive.mount('/content/drive')

Rf.
External data: Local Files, Drive, Sheets, and Cloud Storage
https://colab.research.google.com/notebooks/io.ipynb
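
Files in the Colab runtime are gone once the runtime ends, so one option is to keep a copy of large downloads on the mounted Drive and copy them back in a later session. The following is only a minimal sketch of that idea. The folder `/content/drive/MyDrive/deepspeech-cache` and the helper names `save_to_cache` and `restore_from_cache` are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import shutil

# Hypothetical cache folder on the mounted Drive (an assumption; adjust to taste).
CACHE_DIR = '/content/drive/MyDrive/deepspeech-cache'


def save_to_cache(filename, cache_dir=CACHE_DIR, work_dir='.'):
    """Copy work_dir/filename into cache_dir unless it is already cached.

    Returns False when the source file does not exist, True otherwise.
    """
    src = os.path.join(work_dir, filename)
    dst = os.path.join(cache_dir, filename)
    if not os.path.exists(src):
        return False
    os.makedirs(cache_dir, exist_ok=True)
    if not os.path.exists(dst):
        shutil.copy2(src, dst)
    return True


def restore_from_cache(filename, cache_dir=CACHE_DIR, work_dir='.'):
    """Copy cache_dir/filename into work_dir.

    Returns True if the file is present in work_dir afterwards.
    """
    src = os.path.join(cache_dir, filename)
    dst = os.path.join(work_dir, filename)
    if os.path.exists(dst):
        return True
    if os.path.exists(src):
        shutil.copy2(src, dst)
        return True
    return False
```

A later session could call `restore_from_cache('deepspeech-0.9.3-models.pbmm')` after mounting Drive and fall back to downloading only when it returns False. Keep in mind that the copies count against Drive storage.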

Speech Recognition with DeepSpeech

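Try searching for this phrase. All of the quotations below come from there. Versions differ, so keep the differences in mind and adapt things as needed. It is hard to get started without seeing a sample that actually runs, so I am grateful that the recipe notebooks are published and let me check how the program behaves.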

  • MozillaDeepSpeech.ipynb ... mozilla/DeepSpeech with LM on Youtube videos

Rf.
Erdene-Ochir Tuguldur
tugstugi
Берлиний Техникийн Их Сургууль
https://github.com/tugstugi/dl-colab-notebooks

This notebook uses an open source project mozilla/DeepSpeech to transcribe a given youtube video.

For other deep-learning Colab notebooks, visit tugstugi/dl-colab-notebooks.

Install DeepSpeech

GoogleColaboratory
import os
from os.path import exists
import wave

!pip install -q deepspeech-gpu==0.9.3 youtube-dl

if not exists('deepspeech-0.9.3-models.pbmm'):
  !wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm 
if not exists('deepspeech-0.9.3-models.scorer'):
  !wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
  #!tar xvfz deepspeech-0.9.3-models.tar.gz

from IPython.display import YouTubeVideo

pre-trained model ... presumably this means a model that has already been trained.

  • .pbmm ... the model read by the TensorFlow runtime

  • .tflite ... the model read by the TensorFlow Lite runtime. This one is for the lite build used on devices such as the Raspberry Pi.

This is what is written here:
https://deepspeech.readthedocs.io/en/v0.9.3/USING.html#getting-the-pre-trained-model

One more:

  • .scorer

This article uses two files:
deepspeech-0.9.3-models.pbmm
deepspeech-0.9.3-models.scorer
These are the model files that the deepspeech program reads.

log
--2021-02-15 15:40:31--  https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
Resolving github.com (github.com)... 52.69.186.44
Connecting to github.com (github.com)|52.69.186.44|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-releases.githubusercontent.com/60273704/8b25f180-3b0f-11eb-8fc1-de4f4ec3b5a3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210215T154032Z&X-Amz-Expires=300&X-Amz-Signature=de84c8f71f6fb0d61801e0e6eade089738aab5899a4bd80fdda9fed4e77735d6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.9.3-models.pbmm&response-content-type=application%2Foctet-stream [following]
--2021-02-15 15:40:32--  https://github-releases.githubusercontent.com/60273704/8b25f180-3b0f-11eb-8fc1-de4f4ec3b5a3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210215T154032Z&X-Amz-Expires=300&X-Amz-Signature=de84c8f71f6fb0d61801e0e6eade089738aab5899a4bd80fdda9fed4e77735d6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.9.3-models.pbmm&response-content-type=application%2Foctet-stream
Resolving github-releases.githubusercontent.com (github-releases.githubusercontent.com)... 185.199.110.154, 185.199.111.154, 185.199.108.154, ...
Connecting to github-releases.githubusercontent.com (github-releases.githubusercontent.com)|185.199.110.154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 188915987 (180M) [application/octet-stream]
Saving to: ‘deepspeech-0.9.3-models.pbmm’

deepspeech-0.9.3-mo 100%[===================>] 180.16M  20.4MB/s    in 9.1s    

2021-02-15 15:40:41 (19.9 MB/s) - ‘deepspeech-0.9.3-models.pbmm’ saved [188915987/188915987]

--2021-02-15 15:40:41--  https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
Resolving github.com (github.com)... 52.192.72.89
Connecting to github.com (github.com)|52.192.72.89|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-releases.githubusercontent.com/60273704/924cff80-3b0f-11eb-878c-cacaa2a0d946?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210215T154041Z&X-Amz-Expires=300&X-Amz-Signature=2a8ac24c6d349b794a20407523a3416878ee60c0f079d8c68c8eb6b59bc980af&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.9.3-models.scorer&response-content-type=application%2Foctet-stream [following]
--2021-02-15 15:40:42--  https://github-releases.githubusercontent.com/60273704/924cff80-3b0f-11eb-878c-cacaa2a0d946?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210215T154041Z&X-Amz-Expires=300&X-Amz-Signature=2a8ac24c6d349b794a20407523a3416878ee60c0f079d8c68c8eb6b59bc980af&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.9.3-models.scorer&response-content-type=application%2Foctet-stream
Resolving github-releases.githubusercontent.com (github-releases.githubusercontent.com)... 185.199.108.154, 185.199.109.154, 185.199.110.154, ...
Connecting to github-releases.githubusercontent.com (github-releases.githubusercontent.com)|185.199.108.154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 953363776 (909M) [application/octet-stream]
Saving to: ‘deepspeech-0.9.3-models.scorer’

deepspeech-0.9.3-mo 100%[===================>] 909.20M  25.9MB/s    in 40s     

2021-02-15 15:41:22 (22.6 MB/s) - ‘deepspeech-0.9.3-models.scorer’ saved [953363776/953363776]
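
The install cell only tests whether each file exists, so a download that was interrupted partway would still be treated as complete. A small size check can catch that. This is a sketch, not part of the original notebook: the byte counts are copied from the "Length:" lines of the wget log above, and the helper name `check_download` is made up for illustration.

```python
import os

# Byte counts taken from the "Length:" lines of the wget log above.
EXPECTED_SIZES = {
    'deepspeech-0.9.3-models.pbmm': 188915987,
    'deepspeech-0.9.3-models.scorer': 953363776,
}


def check_download(path, expected_size):
    """Return 'missing', 'incomplete', or 'ok' for one downloaded file."""
    if not os.path.exists(path):
        return 'missing'
    if os.path.getsize(path) != expected_size:
        return 'incomplete'
    return 'ok'


for name, size in EXPECTED_SIZES.items():
    print(name, check_download(name, size))
```

If a file is reported as incomplete, deleting it and running the install cell again should fetch it afresh.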

GoogleColaboratory
!apt-get install -qq sox

sox - The Python and Node.JS clients use SoX to resample files to 16kHz.

Extracting YouTube video_id from YouTube URL

GoogleColaboratory
from urllib.parse import urlparse, parse_qs

urltext ='https://www.youtube.com/watch?v=qviM_GnJbOM' 
args = [urltext]
video_id = ''


def extract_video_id(url):
    # Return the video ID for the common YouTube URL shapes, or None.
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com'}:
        # /watch?v=<id>; .get() avoids a KeyError when the "v" parameter is missing
        if query.path == '/watch': return parse_qs(query.query).get('v', [None])[0]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # unrecognized URL
    return None

for url in args:
    video_id = (extract_video_id(url))
    print('youtube video_id:',video_id)

Rf.
extracting youtube video id from youtube URL
https://qiita.com/dauuricus/private/9e70c4c25566fedb9c19

Transcribe Youtube Video

We are going to make speech recognition on the following youtube video

GoogleColaboratory

YouTubeVideo(video_id)

Download the above video, convert to a WAV file and do speech recognition

GoogleColaboratory
#!rm -rf *.wav
!youtube-dl --extract-audio --audio-format wav --output "extract.%(ext)s" {urltext}

The command youtube-dl --extract-audio --audio-format wav --output "extract.%(ext)s" extracts the audio from the video in wav format under the file name extract.wav. deepspeech apparently expects audio with a sampling rate of 16000 Hz.

[youtube] qviM_GnJbOM: Downloading webpage
[download] Destination: extract.m4a
[download] 100% of 2.05MiB in 00:00
[ffmpeg] Destination: extract.wav
Deleting original file extract.m4a (pass -k to keep)

Rf.
Download Audio from YouTube
https://gist.github.com/umidjons/8a15ba3813039626553929458e3ad1fc

This test case does not have to use YouTube audio, so if you have not installed youtube-dl, ffmpeg may not be installed either. If you need ffmpeg separately to convert audio, you can install it with this:

!apt install ffmpeg
GoogleColaboratory
!ffmpeg -i extract.wav -vn -acodec pcm_s16le -ac 1 -ar 16000 -f wav test.wav

deepspeech apparently handles 16000 Hz wav by default, so this converts the 44100 Hz audio file extract.wav to test.wav, PCM signed 16-bit little-endian at 16000 Hz.
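
Before running recognition it can be worth confirming that the converted file really has the intended format. The sketch below uses Python's standard `wave` module, which the install cell already imports. The helper names `wav_format` and `is_deepspeech_ready` are made up for illustration, and the target of 16000 Hz, mono, 16-bit simply mirrors the ffmpeg options used above.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) of a PCM WAV file."""
    with wave.open(path, 'rb') as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth() * 8


def is_deepspeech_ready(path):
    """True if the file is 16000 Hz, mono, 16-bit, as produced by the ffmpeg step."""
    return wav_format(path) == (16000, 1, 16)
```

In the notebook, `print(wav_format('test.wav'))` shows the actual values, and `is_deepspeech_ready('extract.wav')` should come back False for the unconverted 44100 Hz stereo file.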

ffmpeg_cheatsheet_audio
-codecs          # list codecs
-c:a             # audio codec (-acodec)
-fs SIZE         # limit file size (bytes)
-b:v 1M          # video bitrate (1M = 1Mbit/s)
-b:a 1M          # audio bitrate
-vn              # no video
-aq QUALITY      # audio quality (codec-specific)
-ar 16000        # audio sample rate (hz)
-ac 1            # audio channels (1=mono, 2=stereo)
-an              # no audio
-vol N           # volume (256=normal)

log
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'test.wav':
  Metadata:
    encoder         : Lavf57.83.100
  Duration: 00:02:48.86, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'test1.wav':
  Metadata:
    ISFT            : Lavf57.83.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc57.107.100 pcm_s16le
size=    5277kB time=00:02:48.85 bitrate= 256.0kbits/s speed=1.24e+03x    
video:0kB audio:5277kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001444%

Speech to text (STT) with deepspeech

GoogleColaboratory
!deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio test.wav

#!deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio test.wav ## old version 

The command is (probably) different in the latest version of deepspeech.
deepspeech-0.9.3-models:

deepspeech-0.9.3
usage: deepspeech [-h] --model MODEL [--scorer SCORER] --audio AUDIO
                  [--beam_width BEAM_WIDTH] [--lm_alpha LM_ALPHA]
                  [--lm_beta LM_BETA] [--version] [--extended] [--json]
                  [--candidate_transcripts CANDIDATE_TRANSCRIPTS]
                  [--hot_words HOT_WORDS]

Running DeepSpeech inference.

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Path to the model (protocol buffer binary file)
  --scorer SCORER       Path to the external scorer file
  --audio AUDIO         Path to the audio file to run (WAV format)
  --beam_width BEAM_WIDTH
                        Beam width for the CTC decoder
  --lm_alpha LM_ALPHA   Language model weight (lm_alpha). If not specified,
                        use default from the scorer package.
  --lm_beta LM_BETA     Word insertion bonus (lm_beta). If not specified, use
                        default from the scorer package.
  --version             Print version and exits
  --extended            Output string from extended metadata
  --json                Output json from metadata with timestamp of each word
  --candidate_transcripts CANDIDATE_TRANSCRIPTS
                        Number of candidate transcripts to include in JSON
                        output
  --hot_words HOT_WORDS
                        Hot-words and their boosts.
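
Since the flags change between versions, it can help to build the command line in one place. The sketch below assembles an argument list using only flags that appear in the usage text above. The function name `build_deepspeech_command` and its keyword arguments are made up for illustration and are not part of deepspeech.

```python
import shlex


def build_deepspeech_command(model, audio, scorer=None, json_output=False,
                             candidate_transcripts=None, beam_width=None):
    """Assemble a deepspeech argument list from flags listed in the usage text."""
    cmd = ['deepspeech', '--model', model, '--audio', audio]
    if scorer is not None:
        cmd += ['--scorer', scorer]
    if beam_width is not None:
        cmd += ['--beam_width', str(beam_width)]
    if json_output:
        cmd.append('--json')
    if candidate_transcripts is not None:
        cmd += ['--candidate_transcripts', str(candidate_transcripts)]
    return cmd


cmd = build_deepspeech_command('deepspeech-0.9.3-models.pbmm', 'test.wav',
                               scorer='deepspeech-0.9.3-models.scorer',
                               json_output=True)
# Print the command in a form that could be pasted after "!" in a cell.
print(' '.join(shlex.quote(part) for part in cmd))
```

The returned list can also be passed to `subprocess.run(cmd)` directly, which avoids shell quoting problems with unusual file names.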

log
2021-02-15 16:02:27.698878: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Loading model from file deepspeech-0.9.3-models.pbmm
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
2021-02-15 16:02:27.891101: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-15 16:02:27.892196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-02-15 16:02:27.898478: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:27.899231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2021-02-15 16:02:27.899265: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 16:02:27.904517: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-15 16:02:27.907846: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-02-15 16:02:27.908329: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-02-15 16:02:27.911375: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-02-15 16:02:27.912475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-02-15 16:02:27.917975: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-15 16:02:27.918097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:27.918905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:27.919609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-15 16:02:28.101511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-15 16:02:28.101591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-02-15 16:02:28.101610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-02-15 16:02:28.101755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:28.102641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:28.103507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 16:02:28.104252: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-02-15 16:02:28.104298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10597 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Loaded model in 0.228s.
Loading scorer from files deepspeech-0.9.3-models.scorer
Loaded scorer in 0.000237s.
Running inference.
2021-02-15 16:02:28.162010: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10

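As for the result of deepspeech's automatic speech recognition (ASR),

please check it for yourself. I still do not know many things, such as how to make it faster, and you end up waiting quite a while, so it may be a good idea to send the output to a text file with the shell's redirection operator '>'. [3]

GoogleColaboratory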
!deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio test.wav > test.txt
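
The same redirection can be done from Python when the transcript is needed both in a file and in a variable. This is a minimal sketch, assuming nothing about deepspeech itself: the helper name `run_and_save` is made up for illustration, and the demo runs a harmless stand-in command instead of the recognizer so that it works anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_and_save(cmd, out_path):
    """Run cmd, write its standard output to out_path, and return that text.

    Standard error is left alone, so log lines still appear in the cell output,
    just as they do with the shell's '>' operator.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write(result.stdout)
    return result.stdout


# Stand-in command for the demo; in the notebook the list would hold the
# deepspeech invocation shown above, and out_path would be 'test.txt'.
demo_path = os.path.join(tempfile.gettempdir(), 'demo.txt')
text = run_and_save([sys.executable, '-c', 'print("hello")'], demo_path)
```

With `check=True` a failing command raises `subprocess.CalledProcessError` instead of silently leaving an empty file behind.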

Comparison: here are the YouTube captions.

youtube-caption
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
YouTube captions
- - - - - - - - - - - - - - - - - - -  YouTube  - - - - - - - - - - - - - - - - - - -


1    you may write me down in history with
2    your bitter twisted lies
3    you may tribe me in the very dirt but
4    still like dust a lie does my sassiness
5    upset you
6    why are you beset with gloom just
7    because I walked as if I have oil wells
8    pumping in my living room just like
9    moons and like Suns with the certainty
10    of tides just like hope springing high
11    still I rise did you want to see me
12    broken bowed head and lowered eyes
13    shoulders falling down like teardrops we
14    can buy my soul who cries does my
15    sassiness upset you don't take it too
16    hard just cuz I laugh as if I have gold
17    mines digging in my own backyard you can
18    shoot me with your words you can cut me
19    with your lies you can kill me with your
20    hatefulness but just like life arise
21    just my sexiness offend you oh does it
22    come as a surprise that I dance as if I
23    have diamonds at the meeting of my
24    thighs
25    out of the huts of history's shame I
26    rise up from a past rooted in pain I
27    rise a black ocean leaping and wide
28    Welling and swelling and bearing in the
29    time leaving behind nights of terror and
30    fear I rise into a daybreak miraculously
31    clear I rise bringing the gifts that my
32    ancestors gave I am the hope and the
33    dream of the slave and so there go


************************************************************************************
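
To put a rough number on the comparison, the captions and the deepspeech output can be compared word by word. The sketch below uses `difflib.SequenceMatcher` from the standard library. It yields a similarity ratio between 0 and 1, which is not the same thing as a proper word error rate. The helper names are made up for illustration, and `asr_text` is an invented sample string, not actual deepspeech output.

```python
import difflib
import re


def caption_words(caption_text):
    """Lowercase words from a caption listing, dropping a leading line number per line."""
    words = []
    for line in caption_text.splitlines():
        line = re.sub(r'^\s*\d+\s+', '', line)
        words += re.findall(r"[a-z']+", line.lower())
    return words


def word_similarity(reference_words, hypothesis_words):
    """Similarity ratio in [0, 1] over two word lists (not a true WER)."""
    matcher = difflib.SequenceMatcher(None, reference_words, hypothesis_words,
                                      autojunk=False)
    return matcher.ratio()


# First two caption lines from the listing above.
captions = "1    you may write me down in history with\n2    your bitter twisted lies"
# Invented sample standing in for the contents of test.txt.
asr_text = "you may write me down in history with your bitter twisted lies"
print(word_similarity(caption_words(captions), asr_text.lower().split()))
```

In the notebook, `asr_text` would be read from test.txt and `captions` from the full caption listing.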

Rf.
@dauuricus
updated at 2021-02-07
Editing YouTube subtitles (captions). 001

Cf. Still I Rise by MAYA ANGELOU

Notes

I first used deepspeech 0.6.1 for the test run, and later rewrote the article from the install part onward to match deepspeech 0.9.3, which is close to the latest version.

Cf.
Speech to Text
The IBM Watson Speech to Text service uses speech recognition capabilities to convert Arabic, English, Spanish, French, Brazilian Portuguese, Japanese, Korean, German, and Mandarin speech into text.
https://speech-to-text-demo.ng.bluemix.net/

VOSK
https://qiita.com/dauuricus/items/7da2e5f14c965da18106

Mandarin Speech to Text ; deepspeech test
https://qiita.com/dauuricus/items/eda0cf1d4710f583e7f4

Eg.
deepspeech.ipynb:Testing English case & Chinese case
https://colab.research.google.com/drive/1I6IgMp5qWA7xRsc1sKJowY3OaW94QwEg?usp=sharing
