
A summary of ways to manipulate and output audio in Python

Last updated at 2019-03-01 / Posted at 2019-03-01

Purpose

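I want to convert voices with deep learning, so as a first step I looked up how to manipulate audio in Python. This is a summary of what I found, mainly for my own reference.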

Topics covered

Converting formats and saving
Mel-frequency cepstral coefficients (MFCC)
Playing back recorded audio
Real-time input/output

Converting formats and saving

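import pydub

# Non-WAV formats are decoded with ffmpeg, which must be installed
sound = pydub.AudioSegment.from_file('file', format='extension')  # e.g. 'input.mp3', format='mp3'
sound.export('file.wav', format='wav')


# Below: check some basic properties of the audio
import numpy as np
from matplotlib import pyplot as plt

channel_count = sound.channels
sample_rate = sound.frame_rate
duration = sound.duration_seconds

print(channel_count)    # number of channels (1: mono, 2: stereo)
print(sample_rate)      # sample rate (Hz)
print(duration)         # duration (seconds)

# Plot the waveform. For stereo the samples are interleaved (L, R, L, R, ...),
# so reshape to (frames, channels); plt.plot draws one line per channel
samples = np.array(sound.get_array_of_samples()).reshape(-1, channel_count)
plt.plot(samples)
plt.show()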
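pydub's `duration_seconds` is the number of frames divided by the frame rate, and for stereo audio `get_array_of_samples()` returns the left and right samples interleaved (L, R, L, R, ...), so plotting it as-is mixes the two channels. The sketch below checks both points using only Python's standard `wave` and `array` modules on a hypothetical in-memory test tone; the helper names `make_stereo_wav` and `inspect_wav` and the 440 Hz / 8000 Hz parameters are assumptions for illustration, not part of pydub.

```python
import array
import io
import math
import wave

def make_stereo_wav(sample_rate=8000, seconds=0.5, freq=440.0):
    """Hypothetical test tone: 16-bit stereo WAV in memory.
    Left channel is a sine wave, right channel is silence.
    (array('h') uses native byte order; WAV expects little-endian,
    which matches common x86/ARM machines.)"""
    n_frames = int(sample_rate * seconds)
    samples = array.array('h')
    for i in range(n_frames):
        samples.append(int(10000 * math.sin(2 * math.pi * freq * i / sample_rate)))  # L
        samples.append(0)                                                            # R
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(samples.tobytes())
    buf.seek(0)
    return buf

def inspect_wav(wav_file):
    with wave.open(wav_file, 'rb') as r:
        channels = r.getnchannels()
        rate = r.getframerate()
        n_frames = r.getnframes()
        raw = r.readframes(n_frames)
    duration = n_frames / rate       # same definition as pydub's duration_seconds
    samples = array.array('h', raw)  # interleaved: L, R, L, R, ...
    # De-interleave: channel c is every `channels`-th sample starting at index c
    per_channel = [samples[c::channels] for c in range(channels)]
    return channels, rate, duration, per_channel

channels, rate, duration, per_channel = inspect_wav(make_stereo_wav())
print(channels, rate, duration)  # 2 8000 0.5
```

The slicing `samples[c::channels]` is the same operation that `reshape(-1, channels)` performs on a NumPy array.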

Mel-frequency cepstral coefficients (MFCC)

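import librosa

# librosa.load resamples to 22050 Hz by default; pass sr=None to keep the original rate
x, sampling_rate = librosa.load('file.wav')
mfcc = librosa.feature.mfcc(y=x, sr=sampling_rate)  # librosa 0.10+ requires the keyword y=


# Visualization
from librosa import display
from matplotlib import pyplot as plt
import numpy as np

display.specshow(mfcc, sr=sampling_rate, x_axis='time')
plt.colorbar()
plt.show()

# mfcc is a NumPy array of shape (n_mfcc, n_frames), n_mfcc=20 by default,
# so it can be processed directly with NumPy
print(mfcc.shape)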
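`librosa.feature.mfcc` hides several steps: split the signal into overlapping frames, take the power spectrum of each windowed frame, pool it with triangular filters spaced evenly on the mel scale, take the log, and apply a DCT over the filter outputs, keeping the first few coefficients. The NumPy sketch below walks through those steps. It is a simplified illustration, not librosa's implementation: it uses the HTK mel formula m = 2595 log10(1 + f/700), whereas librosa by default uses the Slaney mel scale with filter normalization, centered (padded) frames, and a dB conversion, so the numbers will not match librosa's output. The function names, frame length, hop size, and filter count are arbitrary assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (librosa's default is the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)  # frequency of each rfft bin
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii_ortho(x, n_out):
    """Orthonormal DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis @ x

def simple_mfcc(y, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    # 1. Slice the signal into overlapping frames (no padding, for simplicity)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] for t in range(n_frames)], axis=1)
    # 2. Window each frame and take its power spectrum: (n_fft // 2 + 1, n_frames)
    window = np.hanning(n_fft)[:, None]
    power = np.abs(np.fft.rfft(frames * window, axis=0)) ** 2
    # 3. Pool the spectrum with the mel filters, then compress with a log
    mel_energy = mel_filterbank(sr, n_fft, n_mels) @ power
    log_mel = np.log(mel_energy + 1e-10)
    # 4. DCT over the mel axis; the low-order coefficients are the MFCCs
    return dct_ii_ortho(log_mel, n_mfcc)

sr = 16000
t = np.arange(sr) / sr                 # 1 second of a hypothetical 440 Hz test tone
y = 0.5 * np.sin(2 * np.pi * 440 * t)
coeffs = simple_mfcc(y, sr)
print(coeffs.shape)                    # (13, 61)
```

As with librosa, the result has one row per coefficient and one column per frame.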

Playing back recorded audio

import pydub
from pydub.playback import play

sound = pydub.AudioSegment.from_wav('file.wav')
play(sound)  # requires simpleaudio, PyAudio, or ffplay to be installed
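`play()` hands the whole segment to whichever backend pydub finds (simpleaudio, then PyAudio, then ffplay). If you would rather push a WAV file to PyAudio yourself, chunk by chunk in the same way as the real-time loop in the next section, you can read it incrementally with the standard `wave` module. This is a sketch under assumptions: `iter_wav_chunks` and `play_wav_with_pyaudio` are hypothetical helper names, and the playback function assumes PyAudio is installed and an output device is available.

```python
import wave

def iter_wav_chunks(path_or_file, chunk=1024):
    """Yield raw PCM bytes from a WAV file, `chunk` frames at a time."""
    with wave.open(path_or_file, 'rb') as wf:
        while True:
            data = wf.readframes(chunk)
            if not data:  # readframes returns b'' at end of file
                break
            yield data

def play_wav_with_pyaudio(path, chunk=1024):
    import pyaudio  # imported here so iter_wav_chunks works without PyAudio
    with wave.open(path, 'rb') as wf:
        channels, width, rate = wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pa.get_format_from_width(width),
                     channels=channels, rate=rate, output=True)
    try:
        for data in iter_wav_chunks(path, chunk):
            stream.write(data)  # blocks until the chunk has been queued
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

# play_wav_with_pyaudio('file.wav')
```

Reading in chunks keeps memory use constant, at the cost of managing the stream yourself.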

Real-time input/output

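import pyaudio

chunk = 1024  # * 2  adjust depending on how heavy the processing is
sr = 48000    # lowering this degrades audio quality
audio = pyaudio.PyAudio()

stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=sr,
    frames_per_buffer=chunk,
    input=True,
    output=True
)

try:
    while stream.is_active():
        # Don't raise if the input buffer overflows while we are busy processing
        I = stream.read(chunk, exception_on_overflow=False)
        # I = some_processing(I)  when adding processing, increase chunk somewhat
        stream.write(I)
except KeyboardInterrupt:  # stop with Ctrl+C
    pass
finally:
    # Always release the audio device, even after an interrupt
    stream.stop_stream()
    stream.close()
    audio.terminate()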
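With `format=pyaudio.paInt16` and `channels=1`, each `stream.read(chunk)` in the loop above returns `chunk` 16-bit samples packed into `bytes`, so a processing step has to decode them, transform them, and re-encode them. The sketch below shows a hypothetical volume-change function (`change_volume` is an illustrative name, not a PyAudio API) built on the standard `array` module, with clipping to the int16 range. It also shows the delay one chunk adds: a chunk covers chunk / sr seconds, so 1024 frames at 48000 Hz is about 21.3 ms, and doubling the chunk doubles that.

```python
import array

INT16_MIN, INT16_MAX = -32768, 32767

def change_volume(data: bytes, gain: float) -> bytes:
    """Scale paInt16 mono PCM bytes by `gain`, clipping to the int16 range."""
    samples = array.array('h')
    samples.frombytes(data)  # native byte order, which is what PortAudio delivers
    for i, s in enumerate(samples):
        samples[i] = max(INT16_MIN, min(INT16_MAX, int(s * gain)))
    return samples.tobytes()

chunk, sr = 1024, 48000
print(f"latency per chunk: {chunk / sr * 1000:.1f} ms")  # latency per chunk: 21.3 ms

# Inside the loop: I = change_volume(stream.read(chunk), 0.5)
```

A pure-Python loop over every sample is slow; for heavier processing, converting the bytes with NumPy (`np.frombuffer(data, dtype=np.int16)`) is the usual choice.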

That's all for now. I'll keep adding to this as I learn more.
