
Analyzing Audio Files

Posted at 2024-04-10

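Introduction

In this post I try out a few kinds of analysis on a music file using the following libraries.

Category	Term	Description
Library	pydub	Python library for manipulating and analyzing audio files
Library	librosa	Python library for music and audio signal processing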

Preparation

First, set up Google Colab so that it can read the audio file.


from google.colab import drive
drive.mount('/content/drive')

%cd "/content/drive/My Drive/Colab Notebooks/file"

!pip install pydub
!pip install librosa

Loading the file


# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
import IPython.display
import librosa
import librosa.display

# Choose the file
file_name = '/content/drive/My Drive/Colab Notebooks/file/XXXX.m4a'
y, sr = librosa.load(file_name, sr=44100)

# Plot the amplitude envelope of the waveform
# (waveplot was removed in librosa 0.10; waveshow is its replacement)
plt.figure(figsize=(15, 5))
librosa.display.waveshow(y, sr=sr)

# Render the figure
plt.show()
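
To make "amplitude envelope" concrete, the sketch below reduces a signal to one peak value per block of samples. It uses only the standard library and a synthetic fading tone in place of the real `y`. The function name `amplitude_envelope`, the 8000 Hz sample rate, and the block size are assumptions chosen for illustration; librosa's own plotting may downsample differently.

```python
import math

def amplitude_envelope(samples, block_size):
    """Return the peak absolute amplitude of each consecutive block of samples."""
    return [
        max(abs(s) for s in samples[start:start + block_size])
        for start in range(0, len(samples), block_size)
    ]

# Synthetic stand-in for `y`: one second of a 440 Hz tone that fades out linearly.
sr = 8000  # sample rate in Hz (hypothetical, kept small for the example)
y_demo = [
    (1.0 - n / sr) * math.sin(2 * math.pi * 440 * n / sr)
    for n in range(sr)
]

envelope = amplitude_envelope(y_demo, block_size=1000)
duration_sec = len(y_demo) / sr  # number of samples divided by the sample rate

print(f"duration: {duration_sec:.2f} s")
print("envelope:", [round(v, 2) for v in envelope])
```

The same relation, `len(y) / sr`, gives the duration in seconds of the array returned by `librosa.load`.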

Analyzing frequency and phase (and how they change over time)


# Short-time Fourier transform (STFT)
# Analyzes the frequency and phase (and their changes) of time-varying signals such as audio

stft_result = librosa.stft(y)
abs_result = np.abs(stft_result)  # magnitude only; the phase is discarded here
power_spec = librosa.amplitude_to_db(abs_result, ref=np.max)

plt.figure(figsize=(20, 5))
librosa.display.specshow(power_spec, y_axis='log', x_axis='time', sr=sr)
plt.title('Power Spectrogram')
plt.colorbar(format='%+2.0f dB')

plt.tight_layout()
plt.show()
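
The steps in the cell above (windowed frames, a Fourier transform per frame, magnitude, then decibels relative to the peak) can be sketched with the standard library alone. This is a minimal illustration, not librosa's implementation: the helper names, the 8000 Hz sample rate, and the small frame size are assumptions, the signal is not padded, and the decibel floor is a simplification of librosa's clipping.

```python
import cmath
import math

def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def stft_magnitude(signal, n_fft, hop_length):
    """Magnitude STFT: one list of n_fft // 2 + 1 bin magnitudes per frame.

    Unlike librosa.stft, this sketch does not pad the signal, so the first
    frame starts at sample 0 instead of being centered on it.
    """
    window = hann_window(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        windowed = [signal[start + i] * window[i] for i in range(n_fft)]
        magnitudes = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT for bin k; abs() keeps the magnitude and drops the phase.
            acc = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

def to_db_relative_to_peak(frames, floor=1e-5):
    """Convert magnitudes to dB relative to the largest value (peak = 0 dB)."""
    peak = max(max(frame) for frame in frames)
    return [
        [20 * math.log10(max(m, floor) / peak) for m in frame]
        for frame in frames
    ]

sr = 8000        # hypothetical sample rate
n_fft = 256      # frame length; bin spacing is sr / n_fft = 31.25 Hz
hop_length = 64  # samples between the starts of consecutive frames
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(512)]

mags = stft_magnitude(tone, n_fft, hop_length)
db = to_db_relative_to_peak(mags)

peak_bin = max(range(len(mags[0])), key=lambda k: mags[0][k])
peak_hz = peak_bin * sr / n_fft
print(len(mags), "frames,", len(mags[0]), "bins per frame")
print("strongest bin:", peak_bin, "->", peak_hz, "Hz")
```

Each frame is one column of the spectrogram; the strongest bin of the 1000 Hz test tone maps back to its frequency through `bin * sr / n_fft`.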

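Visualizing the mel spectrogram

Next, we display a mel spectrogram: a spectrogram whose frequency axis uses the mel scale, a frequency scale based on human auditory perception.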


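# Mel spectrogram: distribution of energy across mel-frequency bands
# (recent librosa versions require y to be passed as a keyword argument)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# melspectrogram returns power by default, so convert with power_to_db
log_S = librosa.power_to_db(S, ref=np.max)

plt.figure(figsize=(20, 5))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()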
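
To show what the mel scale does to the frequency axis, here is a small sketch using the HTK-style formula m = 2595 * log10(1 + f / 700). Note the assumption: librosa defaults to a different (Slaney-style) mel definition unless `htk=True` is passed, so the numbers below illustrate the idea rather than reproduce librosa's band edges. The helper names are hypothetical.

```python
import math

def hz_to_mel_htk(freq_hz):
    """HTK-style mel formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Frequencies (Hz) of n_bands + 1 edges spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz_htk(m_min + i * step) for i in range(n_bands + 1)]

edges = mel_band_edges(0.0, 8000.0, n_bands=8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:8.1f} Hz to {hi:8.1f} Hz  (width {hi - lo:7.1f} Hz)")
```

Bands that are equally wide in mel become wider in Hz as frequency rises, which is why the mel spectrogram devotes more of its vertical axis to low frequencies.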

Separating harmonic and percussive components


# Separate the harmonic part from the percussive part
y_harmonic, y_percussive = librosa.effects.hpss(y)

# melspectrogram returns power by default, so convert with power_to_db
H = librosa.feature.melspectrogram(y=y_harmonic, sr=sr, n_mels=128)
log_H = librosa.power_to_db(H, ref=np.max)

# Plot the harmonic part
plt.figure(figsize=(20, 5))
librosa.display.specshow(log_H, sr=sr, x_axis='time', y_axis='mel')
plt.title('harmonic db spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()

P = librosa.feature.melspectrogram(y=y_percussive, sr=sr, n_mels=128)
log_P = librosa.power_to_db(P, ref=np.max)

# Plot the percussive part
plt.figure(figsize=(20, 5))
librosa.display.specshow(log_P, sr=sr, x_axis='time', y_axis='mel')
plt.title('percussive db spectrogram')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()
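
The intuition behind harmonic-percussive separation can be sketched with median filtering on a toy spectrogram: sustained tones form horizontal lines (smooth along time), while drum hits form vertical lines (smooth along frequency). The code below is a simplified illustration under that assumption; `hpss_sketch` and its helpers are hypothetical names, the 3-cell kernel is chosen for the tiny example, and librosa's actual filter widths and masking options differ.

```python
from statistics import median

def median_filter_rows(matrix, kernel):
    """Median-filter each row with a window of `kernel` cells (truncated at edges)."""
    half = kernel // 2
    return [
        [median(row[max(0, t - half):t + half + 1]) for t in range(len(row))]
        for row in matrix
    ]

def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]

def hpss_sketch(spec, kernel=3, eps=1e-10):
    """Split a magnitude spectrogram (rows = frequency, columns = time)."""
    # Sustained tones are smooth along time, so filtering along time keeps them.
    harm = median_filter_rows(spec, kernel)
    # Clicks are smooth along frequency, so filter along frequency instead.
    perc = transpose(median_filter_rows(transpose(spec), kernel))
    spec_h, spec_p = [], []
    for s_row, h_row, p_row in zip(spec, harm, perc):
        # Soft masks share each cell in proportion to the two filtered values.
        spec_h.append([s * h / (h + p + eps) for s, h, p in zip(s_row, h_row, p_row)])
        spec_p.append([s * p / (h + p + eps) for s, h, p in zip(s_row, h_row, p_row)])
    return spec_h, spec_p

# Toy spectrogram: a steady tone in frequency row 3 and a click in time column 3.
size = 7
spec = [[1.0 if (f == 3 or t == 3) else 0.0 for t in range(size)] for f in range(size)]
spec_h, spec_p = hpss_sketch(spec)

print("harmonic row 3:   ", [round(v, 2) for v in spec_h[3]])
print("percussive col 3: ", [round(row[3], 2) for row in spec_p])
```

In the toy result the horizontal line ends up in `spec_h`, the vertical line in `spec_p`, and the cell where they cross is split evenly between the two.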

Extracting chroma (chords)


# Extract chroma (energy per pitch class, a cue for chords) from the harmonic part
C = librosa.feature.chroma_cqt(y=y_harmonic, sr=sr)

# Visualize
plt.figure(figsize=(20, 5))
librosa.display.specshow(C, sr=sr, x_axis='time', y_axis='chroma', vmin=0, vmax=1)
plt.title('Chromagram')
plt.colorbar()
plt.tight_layout()
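
A chromagram folds all octaves into 12 pitch classes. The sketch below shows that folding for a handful of (frequency, magnitude) pairs, assuming equal temperament with A4 = 440 Hz. The helper names and the input values are made up for illustration; `chroma_cqt` works from a constant-Q transform of the audio rather than from a list like this.

```python
import math

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def hz_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index (0 = C ... 11 = B)."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)  # MIDI note number, A4 = 69
    return int(round(midi)) % 12

def chroma_vector(partials):
    """Fold (frequency, magnitude) pairs into 12 bins, normalized so the max is 1."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in partials:
        chroma[hz_to_pitch_class(freq_hz)] += magnitude
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > 0 else chroma

# Hypothetical input: a C major triad (C4, E4, G4) plus C5 one octave up.
partials = [(261.63, 1.0), (329.63, 0.8), (392.00, 0.6), (523.25, 0.5)]
chroma = chroma_vector(partials)
active = [PITCH_CLASSES[i] for i, c in enumerate(chroma) if c > 0]

print("active pitch classes:", active)
print("chroma:", [round(c, 2) for c in chroma])
```

C4 and C5 land in the same bin, which is why a chromagram is convenient for reading chords regardless of octave. Normalizing to a maximum of 1 matches the `vmin=0, vmax=1` range used in the plot.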

Deriving onsets (accented points)


# Compute the onset (accent) strength envelope from the audio; returned as an array
# (recent librosa versions require y to be passed as a keyword argument)
onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)

# Estimate the tempo (BPM) and the positions of beat events (array of frame indices)
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)

# Plot
hop_length = 512
plt.figure(figsize=(20, 5))
times = librosa.times_like(onset_env, sr=sr, hop_length=hop_length)
plt.plot(times, librosa.util.normalize(onset_env), label='Onset strength')
plt.vlines(times[beats], 0, 1, alpha=0.5, color='r', linestyle='--', label='Beats')

plt.legend(frameon=True, framealpha=0.75)
# Limit the plot to a 45-second window
#plt.xlim(0, 45)
plt.gca().xaxis.set_major_formatter(librosa.display.TimeFormatter())
plt.tight_layout()
plt.show()
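
The beat positions returned above are frame indices, so reading them as times requires the hop length and sample rate. The sketch below shows that conversion and a rough tempo check from the spacing between beats. The beat frames are invented for the example and the helper names are hypothetical; `beat_track` estimates its tempo from the onset envelope, not from this median-interval shortcut.

```python
from statistics import median

def frames_to_seconds(frames, sr, hop_length):
    """Convert frame indices to seconds: t = frame * hop_length / sr."""
    return [frame * hop_length / sr for frame in frames]

def tempo_from_beat_times(beat_times):
    """Rough BPM estimate: 60 divided by the median gap between beats."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)

sr = 44100
hop_length = 512
beat_frames = [0, 43, 86, 129, 172]  # hypothetical beat positions in frames

beat_times = frames_to_seconds(beat_frames, sr, hop_length)
bpm = tempo_from_beat_times(beat_times)

print("beat times (s):", [round(t, 3) for t in beat_times])
print(f"approx. tempo: {bpm:.1f} BPM")
```

With a hop of 512 samples at 44100 Hz each frame covers about 11.6 ms, so beats 43 frames apart are roughly half a second apart, or about 120 BPM.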

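Glossary of related terms

Category	Term	Description
Library	PyWORLD	Python wrapper for WORLD, a speech analysis and synthesis system. Useful for speech synthesis and voice conversion.
Library	pyreaper	Can estimate F0 and detect voiced segments, among other things.
Library	soundfile	Library for reading and writing audio files.
Library	pysptk	Python library specialized in speech signal processing.
Library	nnmnkwii	Library providing utilities for speech synthesis and voice conversion.
Audio processing	Spectrogram	The spectral content of audio displayed along the time axis.
Audio processing	Mel spectrogram	A spectrogram displayed on the mel scale; closer to human auditory perception.
Audio processing	MFCC	Short for Mel Frequency Cepstral Coefficients. One of the parameters used to describe audio features.
Audio processing	F0	The fundamental frequency of the voice; determines its pitch.
Audio processing	Spectral envelope	A smoothed representation of the frequency characteristics of speech.
Audio processing	Aperiodicity	A measure of the aperiodic components of a speech signal.
Audio processing	MCEPs	Short for Mel Cepstral Coefficients. The spectral envelope of a speech signal compressed on the mel scale.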
