Trying Out Whisper with ROS 2

Posted at 2023-06-30 (last updated at 2023-06-30)

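Background

I wanted to build speech recognition into a robot, and while looking around I found the whisper_ros repository (https://github.com/mgonzs13/whisper_ros).

It reportedly lets you run speech recognition with Whisper on ROS 2.

Whisper is a speech recognition model that OpenAI released in September 2022.

I decided to give it a try, but ran into more errors than I expected, so I am leaving these notes.

Note that this article is aimed at readers who already have basic knowledge of ROS 2. For ROS 2 installation and tutorials, please see the official docs!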

Environment

PC
- OS: Ubuntu 20.04
- CPU: 12th Gen Intel® Core™ i5-12400F × 12
- GPU: NVIDIA Corporation Device 2507 [GeForce RTX 3050]
Software
- ROS 2: Galactic (sorry, it is an old release)
- CUDA: 11.7

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 30%   32C    P8     7W / 130W |    336MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1161      G   /usr/lib/xorg/Xorg                 71MiB |
|    0   N/A  N/A      2361      G   /usr/lib/xorg/Xorg                124MiB |
|    0   N/A  N/A      2498      G   /usr/bin/gnome-shell               42MiB |
|    0   N/A  N/A      3711      G   ...483796245270011409,262144       63MiB |
|    0   N/A  N/A     12118      G   ...RendererForSitePerProcess       20MiB |
|    0   N/A  N/A     13576      G   gnome-control-center                1MiB |
+-----------------------------------------------------------------------------+

This article does not cover installing the GPU driver stack. Sorry about that.

Installation

First, clone the repository into your ROS 2 workspace:

$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/whisper_ros.git

Next, build it:

$ cd ../
$ colcon build --symlink-install

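At this point the build succeeds.

Trying it out

From here on I will walk through the steps together with the errors I actually ran into!

Errors during installation

First, installing as described in README.md produced the following error: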

$ cd ~/ros2_ws/src/whisper_ros
$ pip3 install -r requirements.txt
# ERROR
ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmpo3uzuytm build_wheel /tmp/tmp97r6mkp8
       cwd: /tmp/pip-install-xuyh05o2/pyaudio
  Complete output (18 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-38
  creating build/lib.linux-x86_64-cpython-38/pyaudio
  copying src/pyaudio/__init__.py -> build/lib.linux-x86_64-cpython-38/pyaudio
  running build_ext
  building 'pyaudio._portaudio' extension
  creating build/temp.linux-x86_64-cpython-38
  creating build/temp.linux-x86_64-cpython-38/src
  creating build/temp.linux-x86_64-cpython-38/src/pyaudio
  x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/local/include -I/usr/include -I/usr/include/python3.8 -c src/pyaudio/device_api.c -o build/temp.linux-x86_64-cpython-38/src/pyaudio/device_api.o
  src/pyaudio/device_api.c:9:10: fatal error: portaudio.h: No such file or directory
      9 | #include "portaudio.h"
        |          ^~~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for pyaudio

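The pyaudio build fails because the PortAudio header (portaudio.h) is missing.
Following a reference I found, I installed the PortAudio development package and pyaudio as follows: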

$ sudo apt-get install portaudio19-dev
$ pip install pyaudio
# Install the requirements once more
$ pip3 install -r requirements.txt

This time it went through without any trouble!
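
Since the node later listens to a microphone through pyaudio, it can be worth confirming that pyaudio actually sees an input device. The following is a minimal sketch, not part of whisper_ros: the helper name `input_devices` is made up for illustration, and it assumes the device-info dictionaries returned by PyAudio carry the `index`, `name`, and `maxInputChannels` keys.

```python
def input_devices(device_infos):
    """Keep only devices that expose at least one input channel."""
    return [info for info in device_infos if info.get("maxInputChannels", 0) > 0]


def main():
    try:
        import pyaudio  # installed above via pip
    except ImportError:
        print("pyaudio is not installed")
        return

    pa = pyaudio.PyAudio()
    try:
        # Collect the info dictionary of every device PortAudio reports.
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()

    for info in input_devices(infos):
        print(info["index"], info["name"])


if __name__ == "__main__":
    main()
```

If nothing is printed, no capture device is visible to PortAudio, and the node would have nothing to listen to.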

Errors at runtime

Now for the real test!

$ cd ~/ros2_ws
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch whisper_bringup whisper.launch.py

It then stops with the following error...

[whisper_node-1] ImportError: /home/kosei/.local/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKS0_PSo

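I did not really understand it, so I searched for the error message and arrived at the following page:
https://github.com/pytorch/audio/issues/62#issuecomment-1166196925
I decided to just try what it suggests: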

$ pip install -U torch torchaudio --no-cache-dir

This kicks off a fairly large download and install, so be patient!
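
Because this fix upgrades torch and torchaudio together, it can help to note which versions are installed before and after. Here is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "torchaudio", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```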

Once the install finishes, try again:

$ ros2 launch whisper_bringup whisper.launch.py

This time an error from the whisper module appeared...

# Error
[whisper_node-1] AttributeError: module 'whisper' has no attribute 'load_model'

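I did not understand this one either, so I searched for the error and found a suggested fix. The message suggests that the `whisper` module being imported is not OpenAI's package.

I will do as it says and install Whisper straight from the OpenAI repository!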

$ pip install git+https://github.com/openai/whisper.git
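
To confirm which `whisper` module Python actually picks up, a quick check like the one below may help. It is only a sketch: the function name `inspect_module` is invented for this example, and it simply reports where a module is loaded from and whether it exposes a given attribute (here `load_model`, the one named in the error).

```python
import importlib


def inspect_module(module_name, attribute):
    """Report whether a module imports, where it lives, and if it has the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"importable": False, "path": None, "has_attribute": False}
    return {
        "importable": True,
        "path": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }


if __name__ == "__main__":
    print(inspect_module("whisper", "load_model"))
```

If `has_attribute` comes back as `False`, the printed `path` shows which installed package is shadowing the expected one.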

Run it again:

$ ros2 launch whisper_bringup whisper.launch.py

This time the error is about Python's NumPy module:

[whisper_node-1] ImportError: Numba needs NumPy 1.21 or greater. Got NumPy 1.17.

As the message says, I changed the NumPy version:

$ pip install numpy==1.21.0
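
The Numba message boils down to a minimum-version comparison (NumPy 1.21 or greater, but 1.17 was found). A rough sketch of that check is shown below; `version_tuple` and `meets_minimum` are hypothetical helpers that only handle simple dotted versions, not the full Python versioning rules.

```python
from importlib import metadata


def version_tuple(text):
    """Turn '1.21.0' into (1, 21, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        current = metadata.version("numpy")
        print(f"numpy {current}: ok for Numba = {meets_minimum(current, '1.21')}")
    except metadata.PackageNotFoundError:
        print("numpy is not installed")
```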

Try once more!

$ ros2 launch whisper_bringup whisper.launch.py

At last the node starts! Once you see output like the following...

[whisper_node-1] ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
[whisper_node-1] [INFO] [1688126941.120392044] [whisper_node]: Listening

...connect a microphone to the PC, open a new terminal in the workspace (~/ros2_ws), and run the following!

$ source install/setup.bash
$ ros2 topic echo /whisper_text

When you speak into the microphone, you get output like this!

data: 見てくれてありがとう!
---
data: ' 안녕' # I wonder how this is read
---
data: ' Hello.'
---
data: ' My name is Konsei.'
---
data: ' I''m from Japan'
---
data: ありがとう
---
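
To use the recognized text from your own code rather than `ros2 topic echo`, a subscriber along these lines could be a starting point. This is a sketch under assumptions: the `data:` field in the output above suggests a `std_msgs/msg/String` message, but I have not confirmed the topic type, and the node name `whisper_text_listener` and helper `clean_text` are made up for this example.

```python
def clean_text(raw):
    """Strip the leading/trailing whitespace seen in the echoed strings."""
    return raw.strip()


def main():
    try:
        import rclpy
        from std_msgs.msg import String  # assumed message type of /whisper_text
    except ImportError:
        print("rclpy is not importable; source the ROS 2 setup file first.")
        return

    rclpy.init()
    node = rclpy.create_node("whisper_text_listener")

    def on_text(msg):
        text = clean_text(msg.data)
        if text:
            node.get_logger().info(f"Heard: {text}")

    # Queue depth 10 is an arbitrary choice for this sketch.
    subscription = node.create_subscription(String, "/whisper_text", on_text, 10)
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()


if __name__ == "__main__":
    main()
```

Run it in a terminal where `install/setup.bash` has been sourced, while the whisper node is running in another.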

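Here is what it looks like on video.

Japanese was recognized quite accurately too! That is the power of OpenAI.
There was some delay between speaking and the text appearing (about 1 second?), so some parameter tuning may be needed!

Summary

This time I tried out speech recognition using Whisper and ROS 2!
Below is a summary of the additional install commands that were needed in my environment: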

$ sudo apt-get install portaudio19-dev
$ pip install pyaudio
$ cd ~/ros2_ws/src/whisper_ros
$ pip3 install -r requirements.txt
$ pip install -U torch torchaudio --no-cache-dir  # takes a while
$ pip install git+https://github.com/openai/whisper.git
$ pip install numpy==1.21.0

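This looks useful for voice-driven features on a robot, for example something like a "Hi, robot" trigger phrase.

Whisper is released under the MIT License, so it can be used fairly freely.

Why not give it a try yourself?

That is all for this time!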
