
Splitting Music by Instrument and Separating / Recombining Audio: How to Use the Spleeter Tool

Posted at 2023-08-09, last updated at 2023-08-09

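0. Background

Playing the drums on DrumMania at the arcade is great fun. Like many players, I have often wished I could play some other song in the game.
So I looked for a way to customize songs myself, and an OSS project called Spleeter caught my eye.
The tool can split a track into its instruments, but the separation is not perfect. In passages where instruments overlap, the drum sound sometimes disappeared or was buried under the other instruments.

So, drawing on advice from ChatGPT, I tried to extract a cleaner sound, and then to recombine the stems that separated well.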

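1. Overview of the Processing Flow

This section describes the overall flow. The steps are:

0. Upload the target file
1. Split the music by instrument
2. Extract the drums and the other stems
3. Recombine only the clearly audible stems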

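2. Installing Spleeter

This section installs Spleeter, the tool used to split music by instrument. Spleeter is a library that uses machine learning to separate a piece of music into its constituent instrument sounds.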

3. Downloading a Sample Music File

Here we download a sample music file for Spleeter to split. The file is fetched directly from the internet with a wget command.

4. Setting the File Path

Set the path of the downloaded music file. This path is used in the later steps.

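5. Splitting the Music by Instrument with Spleeter

Next, Spleeter splits the music file by instrument. The notebook uses the 5stems setting, which produces five stems: vocals, piano, bass, drums, and other.

This notebook is meant to support experiments in understanding the structure of a piece of music and pulling out only particular instruments. For example, you could extract only the drums and use them to build a new track.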

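6. Recombining the Separated Data

From the audio that Spleeter separated, keep only the stems you need and mix them together.
Because audio is a waveform, mixing is just sample-by-sample addition (+).

The processing runs in this order:

-- 0. Upload the target file
-- 1. Split the music by instrument
-- 2. Extract the drums and the other stems
-- 3. Recombine only the clearly audible stems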

!pip install spleeter

Collecting spleeter
  Downloading spleeter-2.4.0-py3-none-any.whl (49 kB)
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fiona 1.9.4.post1 requires click~=8.0, but you have click 7.1.2 which is incompatible.
flask 2.2.5 requires click>=8.0, but you have click 7.1.2 which is incompatible.
pip-tools 6.13.0 requires click>=8, but you have click 7.1.2 which is incompatible.
tensorflow-datasets 4.9.2 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.
Successfully installed click-7.1.2 ffmpeg-python-0.2.0 flatbuffers-1.12 google-auth-oauthlib-0.4.6 h11-0.12.0 h2-4.1.0 hpack-4.0.0 httpcore-0.13.7 httpx-0.19.0 hyperframe-6.0.1 keras-2.9.0 keras-preprocessing-1.1.2 norbert-0.2.1 protobuf-3.19.6 rfc3986-1.5.0 spleeter-2.4.0 tensorboard-2.9.1 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.9.3 tensorflow-estimator-2.9.0 typer-0.3.2

!wget https://github.com/deezer/spleeter/raw/master/audio_example.mp3

--2023-08-09 05:28:55--  https://github.com/deezer/spleeter/raw/master/audio_example.mp3
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/deezer/spleeter/master/audio_example.mp3 [following]
--2023-08-09 05:28:56--  https://raw.githubusercontent.com/deezer/spleeter/master/audio_example.mp3
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262867 (257K) [audio/mpeg]
Saving to: ‘audio_example.mp3’

audio_example.mp3   100%[===================>] 256.71K  --.-KB/s    in 0.02s   

2023-08-09 05:28:56 (11.4 MB/s) - ‘audio_example.mp3’ saved [262867/262867]

your_file_path = "./audio_example.mp3"

!spleeter separate -p spleeter:5stems -o output $your_file_path

INFO:spleeter:Downloading model archive https://github.com/deezer/spleeter/releases/download/v1.4.0/5stems.tar.gz
INFO:spleeter:Validating archive checksum
INFO:spleeter:Extracting downloaded 5stems archive
INFO:spleeter:5stems model file(s) extracted
INFO:spleeter:File output/audio_example/vocals.wav written succesfully
INFO:spleeter:File output/audio_example/piano.wav written succesfully
INFO:spleeter:File output/audio_example/bass.wav written succesfully
INFO:spleeter:File output/audio_example/drums.wav written succesfully
INFO:spleeter:File output/audio_example/other.wav written succesfully

Spleeter creates a folder named after the input file under output and writes the separated audio files there.
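Outside a notebook, the same separation can be launched from a Python script. The following is a minimal sketch, assuming the spleeter command is on the PATH; build_separate_command and run_separation are hypothetical helper names, and the arguments simply mirror the command used above.

```python
import subprocess


def build_separate_command(audio_path, output_dir="output", model="spleeter:5stems"):
    """Build the argument list for the same `spleeter separate` call used in this notebook."""
    return ["spleeter", "separate", "-p", model, "-o", output_dir, audio_path]


def run_separation(audio_path, output_dir="output"):
    """Run Spleeter as a subprocess; raises CalledProcessError if the command fails."""
    subprocess.run(build_separate_command(audio_path, output_dir), check=True)


# Only build and show the command here; call run_separation(...) to execute it.
print(build_separate_command("./audio_example.mp3"))
```

Passing the arguments as a list avoids shell quoting problems when a file name contains spaces.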

music_file_name = your_file_path.split('/')[-1].split('.')[0]
print(f"music_file_name({music_file_name})")

music_file_name(audio_example)
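Splitting on '.' works for audio_example.mp3 but truncates file names that contain extra dots. A hedged alternative is pathlib's Path.stem, sketched below on the assumption that Spleeter names its output folder after the file name minus its final extension (worth confirming against your installed version). The second file name is a made-up example.

```python
from pathlib import Path

your_file_path = "./audio_example.mp3"

# Path.stem drops only the final extension, so dots inside the name survive.
music_file_name = Path(your_file_path).stem
print(f"music_file_name({music_file_name})")

# Hypothetical file name with extra dots; split('.')[0] would return just "live".
dotted_name = Path("./live.2023.mp3").stem
print(dotted_name)
```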

# Select the stem files you want to keep
inputs = [
    # f"./output/{music_file_name}/other.wav",
    f"./output/{music_file_name}/drums.wav",
    # f"./output/{music_file_name}/bass.wav",
    f"./output/{music_file_name}/piano.wav",
    f"./output/{music_file_name}/vocals.wav",
]
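Instead of commenting lines in and out, the list can be built from stem names. This is only a sketch: stem_paths and FIVE_STEMS are hypothetical names, and the stem names are the ones that appear in the 5stems log output above.

```python
from pathlib import Path

# Stem names taken from the 5stems log output above.
FIVE_STEMS = ("vocals", "piano", "bass", "drums", "other")


def stem_paths(music_file_name, wanted, output_dir="output"):
    """Return the .wav path for each wanted stem, rejecting unknown names."""
    unknown = [name for name in wanted if name not in FIVE_STEMS]
    if unknown:
        raise ValueError(f"unknown stem(s): {unknown}")
    return [str(Path(output_dir) / music_file_name / f"{name}.wav") for name in wanted]


inputs = stem_paths("audio_example", ["drums", "piano", "vocals"])
print(inputs)
```

Validating the names up front turns a typo such as "drum" into an immediate error rather than a missing-file failure later in the mixing step.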

# moviepy and librosa.display were imported in the original notebook but are not used below.
import numpy as np
import librosa
import scipy.signal

import soundfile as sf

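# Mix the selected stems by adding their waveforms sample by sample.
# librosa.load returns mono float audio by default; sr=None keeps the file's own sample rate.
y_all = None
for input_file_path in inputs:
    y, sr = librosa.load(input_file_path, sr=None)
    if y_all is None:
        y_all = y
    else:
        y_all += y

sf.write(file=f"mixed_sounds_{music_file_name}.wav", data=y_all, samplerate=sr)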
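Plain addition can push samples outside the -1 to 1 range, which may clip when the mix is written to an integer PCM WAV file. The sketch below shows one way to guard against that; mix_tracks is a hypothetical helper, not part of the notebook, and the zero-padding is only a precaution, since stems separated from the same file should already have equal length.

```python
import numpy as np


def mix_tracks(tracks, peak=0.99):
    """Sum 1-D waveforms sample by sample, zero-padding shorter ones,
    then scale the result down only if it would exceed `peak`."""
    length = max(len(t) for t in tracks)
    mixed = np.zeros(length, dtype=np.float32)
    for t in tracks:
        mixed[: len(t)] += np.asarray(t, dtype=np.float32)
    max_abs = float(np.max(np.abs(mixed)))
    if max_abs > peak:
        mixed *= peak / max_abs
    return mixed


# Two toy "stems" of different lengths; their raw sum would peak at 1.5.
a = np.array([0.5, -0.75, 0.25], dtype=np.float32)
b = np.array([0.5, -0.75], dtype=np.float32)
mixed = mix_tracks([a, b])
print(mixed)
```

Scaling the whole mix by one factor keeps the balance between stems intact, unlike clipping, which distorts only the loudest samples.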

Using the file mixed from the separated stems, apply a separate extraction step (a high-pass filter).

audio_path = f"mixed_sounds_{music_file_name}.wav"

# Load the audio file
y, sr = librosa.load(audio_path, sr=None)

# Design a 4th-order Butterworth high-pass filter with a 300 Hz cutoff (normalized by the Nyquist frequency)
nyquist = 0.5 * sr
low = 300 / nyquist
b, a = scipy.signal.butter(4, low, btype='high')

# Apply the filter to the audio signal
filtered_audio = scipy.signal.filtfilt(b, a, y)

# Save the filtered audio
filtered_audio_path = audio_path.replace('.wav', '_filter.wav')
print(f"filtered_audio_path({filtered_audio_path})")

filtered_audio_path(mixed_sounds_audio_example_filter.wav)

sf.write(file=filtered_audio_path, data=filtered_audio, samplerate=sr)
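The value passed to the filter design above is the cutoff divided by the Nyquist frequency (half the sample rate), so it must fall strictly between 0 and 1. The arithmetic can be sketched as below; normalized_cutoff is a hypothetical helper, and 44100 Hz is an assumed sample rate for illustration only, since the notebook reads the real one from the file.

```python
def normalized_cutoff(cutoff_hz, sample_rate):
    """Convert a cutoff in Hz to the 0-1 scale where 1.0 is the Nyquist frequency."""
    nyquist = 0.5 * sample_rate
    if not 0 < cutoff_hz < nyquist:
        raise ValueError(f"cutoff must lie between 0 and {nyquist} Hz")
    return cutoff_hz / nyquist


# Assumed sample rate of 44100 Hz: Nyquist is 22050 Hz, so 300 Hz maps to 300 / 22050.
print(normalized_cutoff(300, 44100))
```

Raising the cutoff removes more of the low end (bass and kick drum energy), so it is worth trying several values and listening to each result.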

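For verification

The cell below plays the audio inline.
You can adjust the stems and filter settings for your own audio and check the result by ear.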

import IPython.display
from IPython.display import display

display(IPython.display.Audio(filtered_audio, rate=sr))
