
[Unity × C# × Caress.Unity (RNNoise)] Applying real-time noise reduction to audio data captured from a microphone

Posted at 2023-12-24

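Overview

This is a demo that applies noise reduction (suppression of background noise) in real time to audio data captured from a microphone in Unity.
The noise reduction is done with Caress.Unity, a wrapper library around RNNoise.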

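What is Caress.Unity?

Caress.Unity is a golang wrapper library that makes encoding, decoding, and noise reduction available in Unity (C#).
For the noise reduction part, it uses RNNoise, a noise suppression library based on a recurrent neural network.

One caveat when using RNNoise: it works properly on RAW 16-bit mono PCM data sampled at 48 kHz. I have not tried it myself, but when the input does not meet these conditions it can apparently have the opposite effect, such as adding noise-like artifacts to the audio.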
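As an illustration of the input conditions above, here is a minimal sketch of two conversions that can come up when preparing audio for RNNoise: averaging interleaved multi-channel samples down to mono, and mapping a float sample in [-1, 1] to a 16-bit value. The class and method names (PcmFormat, DownmixToMono, ToInt16) are hypothetical and are not part of Caress.Unity; the demo in this article records mono directly and passes float samples to the library, so it does not need this code.

```csharp
using System;

// Hypothetical helper, not part of Caress.Unity or RNNoise.
public static class PcmFormat
{
    // Averages interleaved multi-channel samples (L, R, L, R, ...) into a mono signal.
    public static float[] DownmixToMono(float[] interleaved, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (interleaved.Length % channels != 0)
            throw new ArgumentException("Length must be a multiple of the channel count.");

        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
            {
                sum += interleaved[f * channels + c];
            }
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Clamps a float sample to [-1, 1] and scales it to the signed 16-bit range.
    public static short ToInt16(float sample)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, sample));
        return (short)Math.Round(clamped * short.MaxValue);
    }
}
```

Resampling to 48 kHz is a separate problem and is not covered here; in this article the microphone is simply opened at 48 kHz.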

Development environment

Windows 10
Unity 2019.4.31f1
Api Compatibility Level .NET Standard 2.0

Packages used

Caress.Unity

Download the unitypackage from the link below and import it into your project beforehand.
https://github.com/tkmn0/Caress.Unity

UniTask

Download the unitypackage from the link below and import it into your project beforehand.
https://github.com/Cysharp/UniTask

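Implementation

There was almost no information available on how to implement this and I was not sure where to start, but "_Examples" contains a sample Scene and scripts, so I used that code as a reference.
You can check that it works by adding the following Scene file to the Unity Hierarchy.
Assets/Caress/_Examples/_Scenes/1_NoiseReducerExample.unity

Several models are available. I used the one called "Speech" because it is by far the best at removing background noise. Some noise remains while you are speaking, but the rest of the time it removes sounds such as keyboard typing almost completely.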

NoiseReducerHandler.cs

This uses the code from the sample almost as is.

using Caress;

public class NoiseReducerHandler
{
    private const SampleRate SampleRate = Caress.SampleRate._48000;
    private const NumChannels NumChannels = Caress.NumChannels.Mono;
    private NoiseReducer _noiseReducer;
    public NoiseReducer NoiseReducer => _noiseReducer;

    public void SetConfig(NoiseReducerConfig config)
    {
        if (_noiseReducer != null)
        {
            _noiseReducer.Destroy();
        }

        _noiseReducer = new NoiseReducer(config);
    }

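    // Note: this is a plain C# class, not a MonoBehaviour, so Unity does not call
    // OnEnable/OnDisable automatically. The demo below configures it through SetConfig.
    private void OnEnable()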
    {
        _noiseReducer = new NoiseReducer(new NoiseReducerConfig()
        {
            SampleRate = (int)SampleRate,
            NumChannels = (int)NumChannels,
            Attenuation = 20,
            Model = RnNoiseModel.Voice
        });
    }

    private void OnDisable()
    {
        _noiseReducer.Destroy();
        _noiseReducer = null;
    }

    public void ProcessPcm(float[] pcm)
    {
        _noiseReducer?.ReduceNoiseFloat(pcm, 0); // Only channel 0 seems to be supported (Unity crashes with any other value).
    }
}

WavUtility.cs (only the slightly modified FromAudioClip method)

To save the audio as WAV files, I used WavUtility with a few parts rewritten.

Note: the script below is only an excerpt. It does not work on its own.

    // Commented out entirely
    // public static byte[] FromAudioClip(AudioClip audioClip)
    // {
    // 	string file;
    // 	return FromAudioClip(audioClip, out file, false);
    // }

    // filepath does not need to be an out parameter, so "out" was removed
    public static byte[] FromAudioClip(AudioClip audioClip, string filepath, bool saveAsFile = true, string dirname = "recordings")
    {
        MemoryStream stream = new MemoryStream();

        const int headerSize = 44;

        // get bit depth
        UInt16 bitDepth = 16; //BitDepth (audioClip);

        // NB: Only supports 16 bit
        //Debug.AssertFormat (bitDepth == 16, "Only converting 16 bit is currently supported. The audio clip data is {0} bit.", bitDepth);

        // total file size = 44 bytes for header format and audioClip.samples * factor due to float to Int16 / sbyte conversion
        int fileSize = audioClip.samples * BlockSize_16Bit + headerSize; // BlockSize (bitDepth)

        // chunk descriptor (riff)
        WriteFileHeader(ref stream, fileSize);
        // file header (fmt)
        WriteFileFormat(ref stream, audioClip.channels, audioClip.frequency, bitDepth);
        // data chunks (data)
        WriteFileData(ref stream, audioClip, bitDepth);

        byte[] bytes = stream.ToArray();

        // Validate total bytes
        Debug.AssertFormat(bytes.Length == fileSize, "Unexpected AudioClip to wav format byte count: {0} == {1}", bytes.Length, fileSize);

        // Save file to persistent storage location
        if (saveAsFile)
        {
            // filepath is now passed in as an argument, so this line is commented out
            // filepath = string.Format ("{0}/{1}/{2}.{3}", Application.persistentDataPath, dirname, DateTime.UtcNow.ToString ("yyMMdd-HHmmss-fff"), "wav");
            Directory.CreateDirectory(Path.GetDirectoryName(filepath));
            File.WriteAllBytes(filepath, bytes);
            //Debug.Log ("Auto-saved .wav file: " + filepath);
        }
        else
        {
            // Commented out
            // filepath = null;
        }

        stream.Dispose();

        return bytes;
    }
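The size check in the excerpt above can be reproduced by hand. The following is a small sketch of the arithmetic for the mono clip used in this demo, assuming that BlockSize_16Bit in WavUtility is 2 bytes per sample (that constant is not shown in the excerpt). The class and member names are hypothetical.

```csharp
// Hypothetical helper that mirrors the fileSize calculation for a mono, 16-bit clip.
public static class WavSize
{
    public const int HeaderSize = 44;          // RIFF + fmt + data headers, as in the excerpt
    public const int BytesPerSample16Bit = 2;  // assumption: the value of BlockSize_16Bit

    // Expected byte count of a mono 16-bit WAV file with the given number of samples.
    public static int FileSizeMono(int sampleCount)
    {
        return sampleCount * BytesPerSample16Bit + HeaderSize;
    }
}
```

For a 10-second recording at 48000 Hz, the sample count is 10 * 48000 = 480000, so WavSize.FileSizeMono(480000) gives 480000 * 2 + 44 = 960044 bytes.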

DemoRealTimeNoiseReduction.cs

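This was the hardest part of actually using Caress.Unity.
This script was also written by trial and error, using the sample code as a reference.
Keep the following points in mind.

Microphone recording is processed asynchronously from the main process.
Processing the entire AudioClip on every Update call would be far too wasteful to keep up with, so the data is split into 480-sample blocks and only the newly recorded portion is processed incrementally.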
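The block-wise approach described above can be sketched without any Unity dependency. At 48 kHz, a 480-sample block corresponds to 10 ms of audio. The class and method names below (RingBufferBlocks, ReadableLength, CopyBlock, Advance) are hypothetical and only illustrate the idea: track a read position, work out how many samples have been written since then, and copy one block at a time while handling the wrap at the end of the buffer. Unlike the GetDataLength method in the script below, this sketch treats "read position equals write position" as "nothing new to read".

```csharp
using System;

// Hypothetical helper illustrating block-wise reads from a circular buffer.
public static class RingBufferBlocks
{
    public const int FrameSize = 480; // 10 ms at 48 kHz

    // Number of unread samples between head (next sample to read) and tail (write position).
    public static int ReadableLength(int bufferLength, int head, int tail)
    {
        return head <= tail ? tail - head : bufferLength - head + tail;
    }

    // Copies block.Length samples starting at head, wrapping to the start of the ring if needed.
    public static void CopyBlock(float[] ring, int head, float[] block)
    {
        int remain = ring.Length - head;
        if (remain < block.Length)
        {
            Array.Copy(ring, head, block, 0, remain);
            Array.Copy(ring, 0, block, remain, block.Length - remain);
        }
        else
        {
            Array.Copy(ring, head, block, 0, block.Length);
        }
    }

    // Moves the read position forward by count samples, wrapping at the end of the ring.
    public static int Advance(int bufferLength, int head, int count)
    {
        return (head + count) % bufferLength;
    }
}
```

A caller would loop while ReadableLength is at least FrameSize, call CopyBlock, process the block, and then call Advance, so each Update only touches the samples recorded since the previous one.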

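This demo reuses a script I wrote earlier and saves the audio before processing (originalAudio) and after processing (NoiseReducedAudio) to separate WAV files so that the two can be compared afterwards.

Because it is based on the script from my previous article, it saves to WAV files, but if you want to work with the audio data directly, everything can be done inside the Update method.
Previous article:
https://qiita.com/Gamo_2683/items/8a00dc15efd064e5d6b8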

using System.Collections;
using System.Collections.Generic;
using System;
using System.Linq;
using UnityEngine;
using Cysharp.Threading.Tasks;
using Caress;

public class DemoRealTimeNoiseReduction : MonoBehaviour
{
    // Start is called before the first frame update
    private string microphone;
    private AudioClip microphoneInput;
    private const int RECORD_LENGTH_SEC = 10;
    private const int SAMPLE_RATE = 48000; // Matches the RNNoise requirement
    private NoiseReducerHandler noiseReducerHandler; // Noise reduction handler
    private int _clipHead = 0; // Current processing position for noise reduction
    [SerializeField, Range(0f, 100.0f)] int CaressAttenuation = 20; // Strength of the noise reduction.
    private readonly float[] _processBuffer = new float[480]; // Working buffer for noise reduction (one 480-sample block)
    private readonly float[] _microphoneBuffer = new float[RECORD_LENGTH_SEC * SAMPLE_RATE];
    private readonly float[] _noiseReducedBuffer = new float[RECORD_LENGTH_SEC * SAMPLE_RATE];

    void Start()
    {
        // Detect an available microphone
        microphone = Microphone.devices.FirstOrDefault();
        Debug.Log("microphone: " + microphone);
        if (microphone == null)
        {
            Debug.LogError("No microphone found");
            return;
        }

        // Passing true as the second argument makes the clip a circular buffer that keeps looping
        microphoneInput = Microphone.Start(microphone, false, RECORD_LENGTH_SEC, SAMPLE_RATE);
        Debug.Log("Recording started. Please talk for 10 seconds while making background noise such as typing.");

        // Initialize the noise reduction handler
        noiseReducerHandler = new NoiseReducerHandler();
        noiseReducerHandler.SetConfig(new NoiseReducerConfig
        {
            SampleRate = SAMPLE_RATE,
            NumChannels = microphoneInput.channels, // Use the channel count of the microphone clip (RNNoise expects mono)
            Attenuation = CaressAttenuation,
            Model = RnNoiseModel.Speech
        });

        // Save the AudioClip data to WAV files after 10 seconds
        SaveOriginalAudioToWavFile().Forget(); // Forget() suppresses the warning about the call not being awaited
        SaveNoiseReducedAudioToWavFile().Forget();
    }

    // Update is called once per frame
    void Update()
    {
        var currentPosition = Microphone.GetPosition(microphone);
        if (currentPosition < 0 || _clipHead == currentPosition)
        {
            return;
        }

        // if (currentPosition % 480 == 0)
        // {
        //     Debug.Log("currentPosition: " + currentPosition);
        //     Debug.Log("_clipHead: " + _clipHead);
        // }

        // Get the audio data from the AudioClip
        microphoneInput.GetData(_microphoneBuffer, 0);

        // Split the data into 480-sample blocks for processing. The 480-sample block size is an RNNoise requirement
        while (GetDataLength(_microphoneBuffer.Length, _clipHead, currentPosition) > _processBuffer.Length)
        {
            var remain = _microphoneBuffer.Length - _clipHead;
            if (remain < _processBuffer.Length)
            {
                // The block wraps past the end of the buffer: copy the tail first, then continue from the start
                Array.Copy(_microphoneBuffer, _clipHead, _processBuffer, 0, remain);
                Array.Copy(_microphoneBuffer, 0, _processBuffer, remain, _processBuffer.Length - remain);
            }
            else
            {
                Array.Copy(_microphoneBuffer, _clipHead, _processBuffer, 0, _processBuffer.Length);
            }

            // Noise reduction
            // The result is written back into _processBuffer
            noiseReducerHandler.ProcessPcm(_processBuffer);

            AppendToNoiseReducedBuffer(_processBuffer);

            _clipHead += _processBuffer.Length; // Advance by 480 samples
            if (_clipHead >= _microphoneBuffer.Length) _clipHead -= _microphoneBuffer.Length;
        }
    }

    /// <summary>
    /// Waits 10 seconds, then saves the original recording (the comparison baseline) to a WAV file
    /// </summary>
    private async UniTask SaveOriginalAudioToWavFile()
    {
        // Wait for the recording length (10 seconds)
        Debug.Log("Waiting 10 seconds");
        await UniTask.Delay(TimeSpan.FromSeconds(RECORD_LENGTH_SEC));
        Debug.Log("Recording finished. Saving the original recording to a WAV file.");

        // Set the destination file
        var filePath = string.Format("{0}/{1}/{2}", UnityEngine.Application.persistentDataPath, "recordings", "originalAudio.wav");
        Debug.Log("filePath: " + filePath);

        // Create the WAV file from the AudioClip
        SaveWavFile(filePath, microphoneInput);
    }

    /// <summary>
    /// Waits 10 seconds, then saves the noise-reduced recording to a WAV file
    /// </summary>
    private async UniTask SaveNoiseReducedAudioToWavFile()
    {
        // Wait for the recording length (10 seconds)
        Debug.Log("Waiting 10 seconds");
        await UniTask.Delay(TimeSpan.FromSeconds(RECORD_LENGTH_SEC));
        Debug.Log("Recording finished. Saving the noise-reduced recording to a WAV file.");

        // Create a new AudioClip for the contents of _noiseReducedBuffer
        AudioClip noiseReducedClip = AudioClip.Create("NoiseReducedClip", _noiseReducedBuffer.Length, 1, SAMPLE_RATE, false);

        // Copy the data into the new AudioClip
        noiseReducedClip.SetData(_noiseReducedBuffer, 0);

        // Set the destination file
        var filePath = string.Format("{0}/{1}/{2}", UnityEngine.Application.persistentDataPath, "recordings", "noiseReducedAudio.wav");
        Debug.Log("filePath: " + filePath);

        // Create the WAV file from the AudioClip
        SaveWavFile(filePath, noiseReducedClip);
    }

    /// <summary>
    /// Creates a WAV file from an AudioClip
    /// WavUtility comes from the repository below, with part of the FromAudioClip method modified
    /// https://github.com/deadlyfingers/UnityWav/tree/master
    /// </summary>
    private void SaveWavFile(string filepath, AudioClip clip)
    {
        // Create the WAV file from the AudioClip
        byte[] wavBytes = WavUtility.FromAudioClip(clip, filepath, true);
    }

    private int GetDataLength(int bufferLength, int head, int tail)
    {
        return head < tail ? tail - head : bufferLength - head + tail;
    }

    /// <summary>
    /// Writes the extracted audio block back at its original position in the array
    /// </summary>
    /// <param name="buffer">Audio block of 480 samples</param>
    private void AppendToNoiseReducedBuffer(float[] buffer)
    {
        var remain = _microphoneBuffer.Length - _clipHead;
        if (remain < buffer.Length)
        {
            // The block wraps past the end: write the first part at the end, the rest at the start
            Array.Copy(buffer, 0, _noiseReducedBuffer, _clipHead, remain);
            Array.Copy(buffer, remain, _noiseReducedBuffer, 0, buffer.Length - remain);
        }
        else
        {
            Array.Copy(buffer, 0, _noiseReducedBuffer, _clipHead, buffer.Length);
        }
    }
}

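Usage

Start Unity.
Prepare a GameObject such as a Cube.
Add DemoRealTimeNoiseReduction.cs as a component of that GameObject.
Open the Console window and enter Play mode.
Follow the instructions shown in the Console and speak into the microphone.
Deliberately making continuous noise such as typing while you speak makes the difference easier to hear.
When recording has finished, open the folder shown in filePath and confirm that originalAudio.wav and noiseReducedAudio.wav have been saved.
Play both files. If the background noise has been removed in noiseReducedAudio.wav, it worked.
Sounds at the level of typing disappear almost completely. However, noise that occurs while you are speaking is hardly removed at all.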
