Python Audio Processing & Visualization Cheat Sheet

Posted at 2024-01-12

Overview

A cheat sheet for audio processing in Python. It is still a work in progress and will be updated from time to time.

Loading audio

import scipy.io.wavfile
_, x = scipy.io.wavfile.read("read_audio.wav")  # returns (sample_rate, data); the rate is discarded here
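scipy.io.wavfile.read returns the samples in the file's native type (int16 for a typical 16-bit PCM WAV), while the librosa calls later in this sheet expect floating-point audio. A minimal sketch of the conversion, assuming 16-bit PCM input; `pcm16_to_float` is a hypothetical helper name, not part of scipy or librosa:

```python
import numpy as np


def pcm16_to_float(x):
    """Scale int16 PCM samples to float32 in [-1.0, 1.0). Hypothetical helper."""
    if x.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {x.dtype}")
    # 32768 = 2**15, the magnitude of the most negative int16 value
    return x.astype(np.float32) / 32768.0


# Stand-in for the array returned by scipy.io.wavfile.read
pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
audio = pcm16_to_float(pcm)
```

Files stored in another format (for example 32-bit float) would need a different scale factor or none at all.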

Writing audio

scipy.io.wavfile.write(
    filename="write_audio.wav",
    rate=16000,
    data=x,
)
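scipy.io.wavfile.write stores the array in whatever dtype it receives, so float audio (for example the output of Griffin-Lim) ends up as a floating-point WAV. If a 16-bit PCM file is wanted instead, the samples can be clipped and rescaled first. A sketch under that assumption; `float_to_pcm16` is a hypothetical helper name:

```python
import numpy as np


def float_to_pcm16(x):
    """Convert float audio in [-1.0, 1.0] to int16 PCM. Hypothetical helper."""
    # Clip first so out-of-range samples saturate instead of wrapping around
    clipped = np.clip(x, -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)


audio = np.array([0.0, 0.5, 1.0, -1.0, 1.7], dtype=np.float32)
pcm = float_to_pcm16(audio)
```

The resulting array can be passed as `data=` to the write call above.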

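Visualizing audio waveforms

A single waveform

import librosa.display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 2))
librosa.display.waveshow(audio00, sr=16000, ax=ax)  # draw on the axes created above

ax.set_title("audio00")
ax.set_ylabel("Amplitude")

plt.tight_layout()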
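The plotting and playback snippets in this sheet use `audio00` and `audio01` without defining them. To try the snippets without recordings at hand, two sine tones can stand in. A sketch assuming 1-second mono signals at 16 kHz; the frequencies and the `make_tone` helper are arbitrary choices for illustration:

```python
import numpy as np

SR = 16000  # sampling rate used throughout this sheet


def make_tone(freq_hz, duration_sec=1.0, sr=SR, amplitude=0.5):
    """Return a float32 sine tone. Hypothetical helper for test data."""
    t = np.arange(int(sr * duration_sec)) / sr  # time axis in seconds
    return (amplitude * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)


audio00 = make_tone(440.0)
audio01 = make_tone(880.0)
```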

Stacking multiple waveforms vertically

row = 2
fig, ax = plt.subplots(row, 1, figsize=(8,2*row))
librosa.display.waveshow(audio00, sr=16000, ax=ax[0])
librosa.display.waveshow(audio01, sr=16000, ax=ax[1])

ax[0].set_title("audio00")
ax[1].set_title("audio01")

for a in ax:
    a.set_ylabel("Amplitude")
plt.tight_layout()

Embedding audio

print("audio00")
IPython.display.display(IPython.display.Audio(audio00, rate=16000))

print("audio01")
IPython.display.display(IPython.display.Audio(audio01, rate=16000))

Log-amplitude spectrogram and the Griffin-Lim algorithm

Simple usage

Using a class

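import librosa
import numpy as np


class Converter:
    def __init__(self, ref=1.0):
        # Reference amplitude shared by the dB conversion and its inverse
        self.ref = ref

    def logamplitudespectrum(self, audio):
        frame_shift = int(16000 * 0.005)  # 5 ms hop at 16 kHz = 80 samples
        n_fft = 2048

        # center=False: the first frame starts at sample 0, with no padding
        X = librosa.stft(
            audio,
            n_fft=n_fft,
            win_length=n_fft,
            hop_length=frame_shift,
            window="hann",
            center=False,
        )
        spec = np.abs(X)  # magnitude only; the phase is discarded
        logspec = librosa.amplitude_to_db(spec, ref=self.ref)
        return logspec

    def griffinlim_and_pad(self, logspec, audio_size):
        frame_shift = int(16000 * 0.005)  # must match the hop used for the STFT

        spec = librosa.db_to_amplitude(logspec, ref=self.ref)
        # Estimate the missing phase iteratively and resynthesize a waveform
        audio = librosa.griffinlim(spec, hop_length=frame_shift, n_iter=200)

        if audio.shape[0] < audio_size:
            # Too short: zero-pad as evenly as possible on both sides
            num_pad = audio_size - audio.shape[0]
            num_pad_top = num_pad // 2
            num_pad_bottom = num_pad - num_pad_top
            audio = np.concatenate(
                [
                    np.zeros(num_pad_top, dtype=np.float32),
                    audio,
                    np.zeros(num_pad_bottom, dtype=np.float32),
                ]
            )
        else:
            # Long enough: keep only the first audio_size samples
            audio = audio[:audio_size]

        return audio

converter = Converter()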

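Note on the Griffin-Lim step: left as is, the reconstructed signal comes back left-aligned and shorter than the original, so it needs appropriate padding (here, zeros split evenly between both ends).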
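The padding amounts follow from frame counting. A sketch of the arithmetic, assuming that an STFT with center=False yields 1 + (N - n_fft) // hop frames and that Griffin-Lim with its default center=True returns hop * (frames - 1) samples; `reconstruction_lengths` is a hypothetical helper, and the defaults mirror the values used in Converter:

```python
def reconstruction_lengths(n_samples, n_fft=2048, hop=80):
    """Predict frame count, Griffin-Lim output length and the padding split."""
    n_frames = 1 + (n_samples - n_fft) // hop  # STFT with center=False
    recon_len = hop * (n_frames - 1)           # assumed Griffin-Lim output length
    num_pad = n_samples - recon_len            # samples missing after reconstruction
    pad_top = num_pad // 2
    pad_bottom = num_pad - pad_top
    return n_frames, recon_len, pad_top, pad_bottom


# One second of audio at 16 kHz
n_frames, recon_len, pad_top, pad_bottom = reconstruction_lengths(16000)
print(n_frames, recon_len, pad_top, pad_bottom)  # 175 13920 1040 1040
```

Under these assumptions the shortfall is roughly n_fft samples, so about n_fft / 2 zeros end up on each side.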

Log-amplitude spectrogram

logspec = converter.logamplitudespectrum(potential_adv)
fig, ax = plt.subplots(1, 1, figsize=(8, 4), sharex=True)
img = librosa.display.specshow(
    logspec,
    hop_length=int(16000 * 0.005),
    sr=16000,
    x_axis="time",
    y_axis="hz",
    ax=ax,
)
fig.colorbar(img, ax=ax, format="%+2.f dB")
ax.set_xlabel("Time [sec]")
ax.set_ylabel("Frequency [Hz]")
fig.savefig("log_amplitude_spectrogram.png")
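For reference, the dB values on the colorbar come from `librosa.amplitude_to_db`. A NumPy sketch of what that conversion computes, assuming the library's default `amin=1e-5` and `top_db=80.0`; `amplitude_to_db_sketch` is a hypothetical re-implementation for illustration, not the library code:

```python
import numpy as np


def amplitude_to_db_sketch(spec, ref=1.0, amin=1e-5, top_db=80.0):
    """Approximate amplitude -> dB conversion with a floor below the peak."""
    magnitude = np.abs(spec)
    # 20 * log10(S / ref), with tiny values floored at amin to avoid log(0)
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Nothing is allowed to fall more than top_db below the loudest bin
        db = np.maximum(db, db.max() - top_db)
    return db


spec = np.array([1.0, 0.1, 1e-6])
db = amplitude_to_db_sketch(spec)
```

Because of the `amin` floor and the `top_db` clipping, converting back with `db_to_amplitude` does not recover very quiet bins exactly.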

Griffin-Lim algorithm

# Pass the length of the signal that logspec was computed from
audio = converter.griffinlim_and_pad(logspec, potential_adv.shape[0])

Audio frequency conversion
