More than 1 year has passed since last update.

β-divergence規範のSSNMFを用いた単一チャネル音源分離

Last updated at 2023-11-13Posted at 2020-05-10

SSNMFによるシングルチャネル音声からの音源分離について。
(原論文ではSNMFと表記されていますが、SNMFについては他にも手法が存在するようで、区別するために本記事ではSSNMFで表記させていただきます。)

はじめに

研究で必要となり、単一のマイクで取得した音声からある成分のみを抽出する手法を探していたところ、非負値行列因子分解（NMF: Non-negative Matrix Factorization)による手法を見つけました。
その中でも、抽出したい対象が決まっている場合、その基底周波数なるものを事前に学習しておくことで、それに対応する音声を分離する手法として半教師ありNMF (Semi-Supervised NMF : SSNMF) というものがあります。
DNNによる音源分離など、深層学習を用いたものも最近は多い印象ですが、NMFによる手法も一般的な手法として用いられています。
以下の（個人的にいつも助けられている）slideshare¹にもわかりやすい記載があります。

このSSNMFに関しては、現状使用できるpythonライブラリは公開されておらず、ネット上に落ちているコードもユークリッド距離規準(EUC)のSSNMFアルゴリズムのみとなります。
一方、SSNMFを用いた文献を見ていくと、ユークリッド距離規準のものよりも、Itakura-Saito(IS) divergenceおよびneralized Kullback-Leibler(I) divergenceによるSSNMF(もしくはNMF)アルゴリズムの方が、より良い分離性能を示すことが多いようです。

そこで、本記事では、この二つのアルゴリズムについて実装を行うこと、また、原論文²³内で述べられている直交化罰則条件付きについても実装を行うことを目指します。(こちらのコード⁴をもとに処理を追加してます。)

SSNMFとは

まずは、NMFについて説明します。
NMFは、与えられたデータ行列$X=[\boldsymbol{x}_1,\cdots,\boldsymbol{x}_N]\in \mathbb{R^{}}^{M \times N}$に対して、指定した基底数Kのもとで非負の基底行列$F=[\boldsymbol{f}_1,\cdots,\boldsymbol{f}_K]\in\mathbb{R^{}}^{M \times K}$と非負の結合係数行列$G=[\boldsymbol{g}_1,\cdots,\boldsymbol{g}_N]\in\mathbb{R^{}}^{K \times N}$により

    X \simeq FG

という行列分解を行うものです。
これに対し、SSNMFでは、
観測した雑音重畳音声の振幅スペクトログラムを$Y_{\textit{train}}=[\boldsymbol{y}_1,\cdots,\boldsymbol{y}_N]\in \mathbb{R}^{M \times N}$としたときに、通常のNMFで学習済みの分離対象の非負の基底行列$F_{train}=[\boldsymbol{f}_1,\cdots,\boldsymbol{f}_{K_{\textit{t}}}]\in\mathbb{R^{}}^{M \times {K_{\textit{t}}}}$を用いることで、

    Y \simeq HU + F_{train}G

という分解を行います。
このとき、$H=[\boldsymbol{h}_1,\cdots,\boldsymbol{h}_{K_{\textit{n}}}]\in\mathbb{R}^{M \times K_{\textit{n}}}$と$U=[\boldsymbol{u}_1,\cdots,\boldsymbol{u}_N]\in\mathbb{R}^{K_{\textit{n}} \times N}$は分離対象音声以外の音声（つまり雑音）に対する基底行列およびそれに対応する結合係数行列です。
つまり、$F_{train}G$のみを抽出し、スペクトログラムから音声へ逆変換を行えば、目的音声を分離できるということです。

この行列分解のための目的関数としては，$Y$と$HU+FG$に対して$\beta$-divergenceによる距離をあてがったものがよく用いられます。
$y$の$x$に対する$\beta$-divergence規準の擬距離$D_\beta\mathit{(y|x)}$は，次のように定式化されます。

    D_{\beta}\mathit{(y|x)} = \frac{y^{\beta}}{\beta(\beta-1)} + \frac{x^{\beta}}{\beta} - \frac{yx^{\beta-1}}{\beta-1}

このβについて、β=0がIS-divergence, β=1がI-divergence, β=2がEUC-distanceに、それぞれ当てはまります。

そして、いろいろやった結果(²³)、目的関数を最小化するための更新式は以下のようになります。

    H_{\omega,k_{\textit{n}}}^{[n+1]} = H_{\omega,k_{\textit{n}}}^{[n]}\Biggl(\frac{\sum_{t}(\sum_{k_{\textit{n}}}H_{\omega,k_{\textit{n}}}U_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}F_{\omega,k_{\textit{t}}}G_{k_{\textit{t}},t})^{\beta-2}Y_{\omega,t}U_{k_{\textit{n}},t}}{\sum_{t}(\sum_{k_{\textit{n}}}H_{\omega,k_{\textit{n}}}U_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}F_{\omega,k_{\textit{t}}}G_{k_{\textit{t}},t})^{\beta-1}U_{k_{\textit{n}},t}}\Biggr)

    {U}_{k_{\textit{n}},t}^{[n+1]} = {U}_{k_{\textit{n}},t}^{[n]}\Biggl(\frac{\sum_{\omega}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-2}Y_{\omega,t}{H}_{\omega,k_{\textit{n}}}}{\sum_{\omega}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-1}{H}_{\omega,k_{\textit{n}}}}\Biggr)

    {G}_{k_{\textit{t}},t}^{[n+1]} = {G}_{k_{\textit{t}},t}^{[n]}\Biggl(\frac{\sum_{\omega}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-2}Y_{\omega,t}{F}_{\omega,k_{\textit{t}}}}{\sum_{\omega}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-1}{F}_{\omega,k_{\textit{t}}}}\Biggr)

また、直交化罰則条件付きSSNMFについては、目的関数に対し以下の制約を付けます。(μは重み係数です)

    J_\beta^{*}(\theta) =\sum_{\omega,t}D_\beta\Bigl(Y_{\omega,t} \Bigl|\sum_{k_{\textit{n}}}({H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t})+\sum_{k_{\textit{t}}}({F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})\Bigr) + \mu\sum_{\omega,k_{\textit{n}},k_{\textit{t}}}\Bigl({F}_{\omega,k_{\textit{t}}}{H}_{\omega,k_{\textit{n}}}\Bigr)^2

この場合の更新式は、$H$のみ変わります。

    {H}_{\omega,k_{\textit{n}}}^{[n+1]} = {H}_{\omega,k_{\textit{n}}}^{[n]}\Biggl(\frac{\sum_{t}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-2}Y_{\omega,t}{U}_{k_{\textit{n}},t}}{\sum_{t}(\sum_{k_{\textit{n}}}{H}_{\omega,k_{\textit{n}}}{U}_{k_{\textit{n}},t} + \sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}{G}_{k_{\textit{t}},t})^{\beta-1}{U}_{k_{\textit{n}},t}+\mu{H}_{\omega,k_{\textit{n}}}\sum_{k_{\textit{t}}}{F}_{\omega,k_{\textit{t}}}^2}\Biggr)

実装

まずは定義部分（こちらのコードを参考にさせていただいています。）

import numpy as np
from sklearn.decomposition import NMF as nmf

def for_loss_data(Y, Yh):
    Yh_data = Yh.ravel()
    Y_data = Y.ravel()
    eps = np.spacing(1)
    indices = Y_data > eps
    Yh_data = Yh_data[indices]
    Y_data = Y_data[indices]
    Yh_data[Yh_data == 0] = eps
    return Y_data,Yh_data

def KL_divergence(Y, Yh, H, U, F=[], G=[]):
    Y_data,Yh_data =for_loss_data(Y,Yh)
    # fast and memory efficient computation of np.sum(np.dot(W, H))
    if len(F) == 0:
        sum_Yh = np.sum(np.dot(H,U))
    else:
        sum_Yh = np.sum(np.dot(H,U)+np.dot(F,G))
    # computes np.sum(X * log(X / WH)) only where X is nonzero
    div = Y_data / Yh_data
    res = np.dot(Y_data, np.log(div))
    # add full np.sum(np.dot(W, H)) - np.sum(X)
    res += sum_Yh - Y_data.sum()
    return res

def IS_divergence(Y, Yh):
    Y_data, Yh_data =for_loss_data(Y, Yh)
    div = Y_data / Yh_data
    res = np.sum(div) - np.product(Y.shape) - np.sum(np.log(div))
    return res

def euclid_divergence(Y, Yh):
    d = 1 / 2 * (Y ** 2 + Yh ** 2 - 2 * Y * Yh).sum()
    return d

def sigma_2matrix(P, S):
    sum_ = 0
    for i in range(len(P)):
        sum_ += P[i] * S[i]
    return sum_

def sigma_3matrix(P, S, U):
    sum_ = 0
    for i in range(len(P)):
        sum_ += P[i] * S[i] * U[i]
    return sum_

def SSNMF(Y, R=3, n_iter=500, F=[], init_G=[], init_H=[], init_U=[], beta=0, p = False, verbose=False):
    eps = np.spacing(1)
    pena = 1.0 * 10**20

    # size of input spectrogram
    M = Y.shape[0];
    N = Y.shape[1];
    X = F.shape[1]

    # initialization
    if len(init_G):
        G = init_G
        X = init_G.shape[1]
    else:
        G = np.random.rand(X, N)

    if len(init_U):
        U = init_U
        R = init_U.shape[0]
    else:
        U = np.random.rand(R, N)

    if len(init_H):
        H = init_H;
        R = init_H.shape[1]
    else:
        H = np.random.rand(M, R)

    # array to save the value of the euclid divergence
    cost = np.zeros(n_iter)
    # computation of Lambda (estimate of Y)
    Lambda = np.dot(F, G) + np.dot(H, U)
    # Set the basis punitive factor

IS-divergence(β=0)に対する更新式

    # IS-NMF
    if beta == 0:
    # iterative computation
        for it in range(n_iter):
            if verbose == True:
                print("Iter number : {}, Cost : {}".format(it, cost[it]))

            pow_2 = np.power(np.dot(H, U) + np.dot(F, G) + eps, -2)
            pow_1 = np.power(np.dot(H, U) + np.dot(F, G) + eps, -1)
            H_copy = H.copy()
            U_copy = U.copy()
            G_copy = G.copy()

            # update of H
            if p == False:
                H = H_copy * ((np.dot(pow_2 * Y, U_copy.T) + eps) / (np.dot(pow_1, U_copy.T) + eps))
            elif p == True:
                F_pow_2 = np.ones(H_copy.shape)
                for w in range(F_pow_2.shape[0]):
                    F_pow_2[w][:] = np.power(F[w], 2).sum()
                H = H_copy * ((np.dot(pow_2 * Y, U_copy.T) + eps) / ((np.dot(pow_1, U_copy.T) + eps) + (pena * H_copy) * F_pow_2))
            else :
                print("error")

            # update of U
            U = U_copy * (np.dot(H_copy.T, pow_2 * Y) / (np.dot(H_copy.T, pow_1) + eps))

            # update of G
            G = G_copy * (np.dot(F.T, pow_2 * Y) / (np.dot(F.T, pow_1) + eps))

            # recomputation of Lambda (estimate of V)
            Lambda = np.dot(H, U) + np.dot(F, G)

            # compute IS divergence
            cost[it] = IS_divergence(Y, Lambda)

        return [F, G, H, U, cost]

続いてI-divergence(β=1)に対する更新式

    # KL-NMF
    if beta == 1:
    # iterative computation
        for it in range(n_iter):
            if verbose == True:
                print("Iter number : {}, Cost : {}".format(it, cost[it]))

            pow_1 = np.power(np.dot(H, U) + np.dot(F, G) + eps, -1)
            pow_0 = np.ones(pow_1.shape)
            H_copy = H.copy()
            U_copy = U.copy()
            G_copy = G.copy()

            # update of H
            if p == False:
                H = H_copy * ((np.dot(pow_1 * Y, U_copy.T) + eps) / (np.dot(pow_0, U_copy.T) + eps))
            elif p == True:
                F_pow_2 = np.ones(H_copy.shape)
                for w in range(F_pow_2.shape[0]):
                    F_pow_2[w][:] = np.power(F[w], 2).sum()
                H = H_copy * ((np.dot(pow_1 * Y, U_copy.T) + eps) / ((np.dot(pow_0, U_copy.T) + eps) + (pena * H_copy) * F_pow_2))
            else :
                print("error")

            # update of U
            U = U_copy * (np.dot(H_copy.T, pow_1 * Y) / (np.dot(H_copy.T, pow_0) + eps))

            # update of G
            G = G_copy * (np.dot(F.T, pow_1 * Y) / (np.dot(F.T, pow_0) + eps))

            # recomputation of Lambda (estimate of V)
            Lambda = np.dot(H, U) + np.dot(F, G)

            # compute KL divergence
            cost[it] = KL_divergence(Y, Lambda, H, U, F=F, G=G)

        return [F, G, H, U, cost]

最後にEUC-distance(β=2)に対する更新式

    # Euclid-NMF
    if beta == 2:
        # iterative computation
            for it in range(n_iter):
                if verbose == True:
                    print("Iter number : {}, Cost : {}".format(it, cost[it]))

                pow_plus1 = np.power(np.dot(H, U) + np.dot(F, G) + eps, 1)
                pow_0 = np.ones(pow_plus1.shape)
                H_copy = H.copy()
                U_copy = U.copy()
                G_copy = G.copy()

                # update of H
                if p == False:
                    H = H_copy * ((np.dot(pow_0 * Y, U_copy.T) + eps) / (np.dot(pow_plus1, U_copy.T) + eps))
                elif p == True:
                    F_pow_2 = np.ones(H_copy.shape)
                    for w in range(F_pow_2.shape[0]):
                        F_pow_2[w][:] = np.power(F[w], 2).sum()
                    H = H_copy * ((np.dot(pow_0 * Y, U_copy.T) + eps) / ((np.dot(pow_plus1, U_copy.T) + eps) + (pena * H_copy) * F_pow_2))
                else :
                    print("error")

                # update of U
                U = U_copy * (np.dot(H_copy.T, pow_0 * Y) / (np.dot(H_copy.T, pow_plus1) + eps))

                # update of G
                G = G_copy * (np.dot(F.T, pow_0 * Y) / (np.dot(F.T, pow_plus1) + eps))

                # recomputation of Lambda (estimate of V)
                Lambda = np.dot(H, U) + np.dot(F, G)

                # compute euclid divergence
                cost[it] = euclid_divergence(Y, Lambda)

            return [F, G, H, U, cost]

それぞれp=Trueに設定すると直行化罰則条件付きのSSNMFに切り替わるようになっています。

確認

一応確認してみます。（こちらから引用）

from nmf import NMF, SSNMF
import numpy as np

np.random.seed(1)
comps = np.array(((1,0), (0,1), (1,1)))
activs = np.array(((0,0,1,0,1,5,0,7,9,6,5,0), (2,1,0,1,1,2,1,0,0,0,6,0)))
Y = np.dot(comps, activs)

print('original data\n---------------')
print('components:\n', comps)
print('activations:\n', activs)
print('Y:\n', Y)

#betaの値を0or1に設定
computed = SSNMF(Y, F=np.array(((1,0,1),)).T, R=2, beta=0)

print('\ndecomposed\n---------------')
print('F:\n', computed[0])
print('G:\n', computed[1])
print('H:\n', computed[2])
print('U:\n', computed[3])
print('FG + HU:\n', np.dot(computed[0], computed[1]) + np.dot(computed[2], computed[3]))

β=0の場合

original data
---------------
components:
 [[1 0]
 [0 1]
 [1 1]]
activations:
 [[0 0 1 0 1 5 0 7 9 6 5 0]
 [2 1 0 1 1 2 1 0 0 0 6 0]]
Y:
 [[ 0  0  1  0  1  5  0  7  9  6  5  0]
 [ 2  1  0  1  1  2  1  0  0  0  6  0]
 [ 2  1  1  1  2  7  1  7  9  6 11  0]]
SSNMF extract

decomposed
---------------
F:
 [[1]
 [0]
 [1]]
G:
 [[ 0.00  0.00  0.00  0.00  0.15  0.56  0.00  4.70  3.37  5.43  2.01  0.00]]
H:
 [[ 1.72  0.00]
 [ 0.00  2.35]
 [ 1.72  2.35]]
U:
 [[ 0.00  0.00  0.58  0.00  0.50  2.59  0.00  1.34  3.28  0.33  1.74  0.00]
 [ 0.85  0.43  0.00  0.43  0.43  0.85  0.43  0.00  0.00  0.00  2.55  0.00]]
FG + HU:
 [[ 0.00  0.00  1.00  0.00  1.00  5.00  0.00  7.00  9.00  6.00  5.00  0.00]
 [ 2.00  1.00  0.00  1.00  1.00  2.00  1.00  0.00  0.00  0.00  6.00  0.00]
 [ 2.00  1.00  1.00  1.00  2.00  7.00  1.00  7.00  9.00  6.00  11.00
   0.00]]

β=1の場合

original data
---------------
components:
 [[1 0]
 [0 1]
 [1 1]]
activations:
 [[0 0 1 0 1 5 0 7 9 6 5 0]
 [2 1 0 1 1 2 1 0 0 0 6 0]]
Y:
 [[ 0  0  1  0  1  5  0  7  9  6  5  0]
 [ 2  1  0  1  1  2  1  0  0  0  6  0]
 [ 2  1  1  1  2  7  1  7  9  6 11  0]]
SSNMF extract

decomposed
---------------
F:
 [[1]
 [0]
 [1]]
G:
 [[ 0.00  0.00  0.00  0.00  0.13  0.41  0.00  3.68  2.24  1.63  1.72  0.00]]
H:
 [[ 1.87  0.00]
 [ 0.00  1.66]
 [ 1.87  1.66]]
U:
 [[ 0.00  0.00  0.53  0.00  0.47  2.46  0.00  1.78  3.62  2.34  1.76  0.00]
 [ 1.21  0.60  0.00  0.60  0.60  1.21  0.60  0.00  0.00  0.00  3.63  0.00]]
FG + HU:
 [[ 0.00  0.00  1.00  0.00  1.00  5.00  0.00  7.00  9.00  6.00  5.00  0.00]
 [ 2.00  1.00  0.00  1.00  1.00  2.00  1.00  0.00  0.00  0.00  6.00  0.00]
 [ 2.00  1.00  1.00  1.00  2.00  7.00  1.00  7.00  9.00  6.00  11.00
   0.00]]

とりあえずはちゃんと更新できていることがわかりました。

https://www.slideshare.net/DaichiKitamura/acoustic-modeling-in-audio-source-separation ↩
β-divergence 規範による罰則条件付き教師あり非負値行列因子分解を用いた目的楽器音抽出, https://library.naist.jp/mylimedio/search/book.do?target=local&bibid=59439&lang=ja&charset=utf8&dimode=on ↩ ↩²
Music signal separation by orthogonality and maximum-distance constrained nonnegative matrix factorization with target signal information, https://library.naist.jp/dspace/handle/10061/9213 ↩ ↩²
https://keik.github.io/battlefield-acoustics/bss-with-ssnmf.html ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up