Audio similarity search with the Azure AI Search vectorizer and CLAP

Posted at 2024-10-09

CLAP makes it possible to run vector search over audio data.

In this post, I vectorize audio files with CLAP and then use Azure AI Search and its vectorizer feature to run vector search over that audio.

Vectorizing audio with CLAP and storing it in Azure AI Search

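CLAP (Contrastive Language-Audio Pretraining) is a multimodal model for audio and text that maps both into a shared embedding space. With CLAP you can apparently, for example, run similarity search between audio clips, infer text that matches the content of an audio file, or find audio related to a given text. This post uses LAION-AI CLAP.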

First, run CLAP on Google Colab to vectorize the audio data.
I used the LAION-CLAP repository and this article as references.

python

pip install audioset_download
pip install laion-clap

python

# Download sample audio data (this takes a while)
from audioset_download import Downloader
d = Downloader(root_path='/content/sample_data/wavfiles/', labels=None, n_jobs=16, download_type='eval', copy_and_replicate=False)
d.download(format = 'wav')

Next, vectorize the downloaded audio data.

python

import os
import numpy as np
import librosa
import torch
import laion_clap
import glob
from tqdm import tqdm

def int16_to_float32(x):
    return (x / 32767.0).astype(np.float32)

def float32_to_int16(x):
    x = np.clip(x, a_min=-1., a_max=1.)
    return (x * 32767.).astype(np.int16)

model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()

input_dir = '/content/sample_data/wavfiles/**/*.wav'
file_list = [p for p in glob.glob(input_dir, recursive=True) if os.path.isfile(p)]

print('file num:', len(file_list))

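def get_class_name(file):
    # NOTE: for an absolute path such as '/content/sample_data/...', index 1 of
    # the split is always 'content'. os.path.basename(dir_name) would return the
    # label directory (e.g. 'Squish') instead. The 'class' value is dropped
    # before upload, so this does not affect the search index.
    dir_name = os.path.dirname(file)
    return dir_name.split('/')[1]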

info_list = []
for file in tqdm(file_list):
    try:
        audio_embed = model.get_audio_embedding_from_filelist(x=[file], use_tensor=False)

        info = {}
        info['file'] = file
        info['class'] = get_class_name(file)
        info['embed'] = audio_embed[0]
        info_list.append(info)

    except Exception as e:
        print('error:', e)
        continue

print(info_list[0])
print(len(info_list[0]['embed']))

Vectorization finished, and the output shows that each embedding has 512 dimensions.

{'file': '/content/sample_data/wavfiles/Squish/2jxgoUnZ8AI_30.0-40.0.wav', 'class': 'content', 'embed': array([ 4.53657936e-03, -5.38767911e-02, -4.80033755e-02,  7.13726738e-04,
       -3.35424058e-02, -2.60807499e-02, -2.56251041e-02, -1.69892162e-02,
        1.48186069e-02,  2.20915899e-02, -3.35830115e-02, -5.59791923e-02,
       -4.06089425e-02,  8.79613683e-02, -3.83913331e-02, -6.43652007e-02,
       -2.44782995e-02, -7.09670335e-02,  7.64381513e-02, -3.25911455e-02,
       -1.96495093e-02, -1.05264066e-02,  2.06979737e-02, -1.10696666e-02,
        6.93265572e-02,  7.91375563e-02, -6.17211163e-02,  3.33113447e-02,
  ...
# 512 dimensions
512

Next, set up Azure AI Search as the vector database.

The Free pricing tier is sufficient. Leave the other settings at their defaults and deploy.

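Once the deployment completes, open the AI Search resource and select [Indexes] > [Add index].
Then use [Add field] to add two fields, file and embed. The embed field stores the vector array; set its dimensions to 512. The documents uploaded later also carry an id value, so the index's key field is assumed to be named id.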

After adding the fields, finish creating the index.
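For reference, the same index can also be written down as a JSON definition rather than configured in the portal. The sketch below builds such a definition as a plain Python dict in the shape used by the Azure AI Search REST API. The helper build_index_definition, the index name, and the algorithm and profile names are hypothetical, and the id key field is an assumption based on the id column uploaded later in this post.

```python
import json


def build_index_definition(index_name: str, dimensions: int = 512) -> dict:
    """Return an index definition with a key, a file name and a vector field."""
    return {
        "name": index_name,
        "fields": [
            # Key field (assumed name: id).
            {"name": "id", "type": "Edm.String", "key": True},
            # Audio file name, returned in search results.
            {"name": "file", "type": "Edm.String", "retrievable": True},
            # Vector field holding the CLAP embedding.
            {
                "name": "embed",
                "type": "Collection(Edm.Single)",
                "searchable": True,
                "dimensions": dimensions,
                "vectorSearchProfile": "vector-profile",
            },
        ],
        "vectorSearch": {
            # Hypothetical names; the portal generates its own.
            "algorithms": [{"name": "hnsw-config", "kind": "hnsw"}],
            "profiles": [{"name": "vector-profile", "algorithm": "hnsw-config"}],
        },
    }


index_definition = build_index_definition("index-audio")
print(json.dumps(index_definition, indent=2))
```

The point to keep in sync is the dimensions value: it has to match the length of the vectors produced by the embedding model, which is 512 here.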

Back in Google Colab, store the audio data vectorized earlier in Azure AI Search.

pip install azure-core
pip install azure-search-documents
pip install azure-identity

python

import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

AZURE_SEARCH_SERVICE_URL = '<ai-search-endpoint>'
AZURE_SEARCH_SERVICE_KEY =  '<ai-search-api-key>'

credential = AzureKeyCredential(AZURE_SEARCH_SERVICE_KEY)

# Create a client for the existing search index
index_name = "<my-index-name>"
search_client = SearchClient(AZURE_SEARCH_SERVICE_URL, index_name, credential=credential)

# Shape info_list (the data vectorized earlier) with pandas
df = pd.DataFrame(info_list)
df = df.drop(columns=['class'])
df['file'] = df['file'].apply(lambda x: x.split('/')[-1])
df['embed'] = df['embed'].apply(lambda x: x.tolist() if isinstance(x, np.ndarray) else x)
df['embed'].apply(type).unique()
df['id'] = df.index.astype(str)
df

  file                      embed
0	2jxgoUnZ8AI_30.0-40.0.wav	[0.0045365794, -0.05387679, -0.048003376, 0.00...
1	58zENxTDTjk_30.0-40.0.wav	[-0.019631093, -0.023695994, -0.012843471, 0.1...
2	03IZHwOh2eQ_240.0-250.0.wav	[-0.0011360282, -0.055622336, -0.038938936, 0....
3	1ISlTAGBMhs_10.0-20.0.wav	[-0.056855004, -0.06744653, 0.090874344, 0.053...
4	-ji1W2uf7iM_30.0-40.0.wav	[-0.097820416, 0.02645195, -0.12841497, -0.040...
...

Upload the data prepared above to AI Search.

python

doc = df.to_dict(orient='records')
search_client.upload_documents(doc)
print(f"Uploaded {len(doc)} documents")

---
Uploaded 583 documents
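The 583 documents here fit in a single upload call, but Azure AI Search limits how many documents one indexing request may contain (1,000 per batch, to my knowledge), so a larger dataset would need to be split. The following is a minimal sketch of such a split; the chunked helper, the batch size of 500, and the stand-in documents are all hypothetical.

```python
from typing import Iterator


def chunked(records: list, size: int = 500) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Stand-in documents shaped like the records above (values are made up).
docs = [
    {"id": str(i), "file": f"clip_{i}.wav", "embed": [0.0] * 512}
    for i in range(1234)
]
batch_sizes = [len(batch) for batch in chunked(docs)]
print(batch_sizes)  # [500, 500, 234]

# With a real client, each batch would be uploaded separately:
# for batch in chunked(doc):
#     search_client.upload_documents(batch)
```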

Custom Web API vectorizer

In Azure AI Search, the vectorizer feature lets the service connect to an embedding model at query time, for example through Azure OpenAI or the Azure AI Studio model catalog. It can also call an API that you build yourself.

Here, I expose CLAP's embedding step as an API and plug it into AI Search.

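Conceptually, the flow looks like this: at query time, AI Search sends the query text to the custom Web API, the API embeds it with CLAP, and AI Search compares the returned vector with the audio vectors stored in the index.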

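Using the sample code in the referenced repository as a starting point, I built the vectorizer in Flask and deployed it to Azure Container Apps. The sample uses Azure Functions, but given the resources that model initialization needs and the risk of timeouts, I chose Azure Container Apps this time.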

Flask code and Dockerfile

app.py

from flask import Flask, request, jsonify
import jsonschema
import logging
import laion_clap

app = Flask(__name__)
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()

@app.route('/', methods=['GET'])
def home():
    return "Hello!"

@app.route('/api', methods=['POST'])
def generate_text_embedding():
    logging.info('Python HTTP API processed a request.')
    request_data = request.get_json()

    try:
        jsonschema.validate(request_data, schema=get_request_schema())
    except jsonschema.exceptions.ValidationError as e:
        return jsonify({"error": str(e)}), 400

    query = request_data['values'][0]['data']['text']
    text_embed = model.get_text_embedding([query])
    return_embed = text_embed[0].tolist()

    if return_embed:
        response_body = {
            "values": [
                {
                    "recordId": request_data['values'][0]['recordId'],
                    "data": {
                        "vector": return_embed
                    },
                    "errors": None,
                    "warnings": None
                }
            ]
        }
        return jsonify(response_body), 200
    else:
        return "", 204

def get_request_schema():
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {
            "values": {
                "type": "array",
                "minItems": 1,
                "items": {
                    "type": "object",
                    "properties": {
                        "recordId": {"type": "string"},
                        "data": {
                            "type": "object",
                            "properties": {
                                "text": {"type": "string", "minLength": 1}
                            },
                            "required": ["text"],
                        },
                    },
                    "required": ["recordId", "data"],
                },
            }
        },
        "required": ["values"],
    }

if __name__ == '__main__':
    app.run()

Dockerfile

FROM python:3.11

WORKDIR /code

COPY requirements.txt .

RUN pip3 install -r requirements.txt

COPY . .

EXPOSE 5050

ENTRYPOINT ["gunicorn", "--config", "gunicorn.conf.py", "app:app"]
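The Dockerfile refers to gunicorn.conf.py and requirements.txt, neither of which is shown in this post. Below is a minimal sketch of what the gunicorn config could look like. All values are assumptions: the port simply matches EXPOSE 5050 above, and the worker count and timeout are guesses chosen for a model-heavy app. Judging from the imports in app.py, requirements.txt would presumably need at least flask, jsonschema, laion-clap and gunicorn.

```python
# gunicorn.conf.py (hypothetical; the original file is not shown in this post)

# Listen on all interfaces on the port exposed in the Dockerfile.
bind = "0.0.0.0:5050"

# Each worker process loads its own copy of the CLAP model,
# so keep the count low to limit memory use.
workers = 1

# Workers silent for longer than this many seconds are killed and restarted.
# The gunicorn default of 30 may be tight for a large model.
timeout = 120
```

gunicorn reads this file as ordinary Python, so bind, workers and timeout are just module-level variables named after its settings.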

Deploy to Azure Container Apps with the CLI.

bash

az login
RG=<myResourceGroupName>
LOCATION=japaneast
NAME=<myAppName>

az containerapp up --resource-group $RG --location $LOCATION --name $NAME --ingress external --target-port 5050 --source .

# container app update
az containerapp update --name $NAME --resource-group $RG --cpu 2.0 --memory 4.0Gi --min-replicas 1

Once it is deployed, call it with curl. It looks fine.

bash

 curl -X POST https://<azure-container-app-endpoint>/api \
 -H "Content-Type: application/json" \
 -d '{"values":[{"recordId":"0","data":{"text":"Quiet and gentle music"}}]}'
 
{"values":[{"data":{"vector":[0.004272684454917908,-0.025229282677173615...-0.07266686856746674]},"errors":null,"recordId":"0","warnings":null}]}
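The request and response shapes in the curl exchange above can also be handled from Python. The sketch below only builds the request body and reads the vector out of a response; the helper names build_vectorizer_request and extract_vector are hypothetical, and the response used here is a made-up stand-in rather than real API output.

```python
import json


def build_vectorizer_request(text: str, record_id: str = "0") -> dict:
    """Build a request body in the shape accepted by the /api endpoint."""
    return {"values": [{"recordId": record_id, "data": {"text": text}}]}


def extract_vector(response_body: dict) -> list:
    """Return the embedding of the first record, or raise if it has errors."""
    first = response_body["values"][0]
    if first.get("errors"):
        raise ValueError(f"vectorizer returned errors: {first['errors']}")
    return first["data"]["vector"]


payload = build_vectorizer_request("Quiet and gentle music")
print(json.dumps(payload))

# Stand-in response with the same keys as the curl output (values are made up).
fake_response = {
    "values": [
        {
            "recordId": "0",
            "data": {"vector": [0.1, -0.2, 0.3]},
            "errors": None,
            "warnings": None,
        }
    ]
}
vector = extract_vector(fake_response)
print(len(vector))  # 3
```

Against the real endpoint, the length of the returned vector should be 512, matching the dimensions configured on the embed field.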

Updating the vector profile and running similarity search

Finally, update the vector profile of the index created earlier in Azure AI Search.

Select the target index and choose [Edit JSON].

Modify the relevant section as follows.

    ...
    "profiles": [
      {
        "name": "vector-profile",
        "algorithm": "vector-config-×××××",
        "vectorizer": "<VectorizerName>", // same name as the vectorizer defined below
        "compression": null
      }
    ],
    "vectorizers": [
      {
        "name": "<VectorizerName>", // any vectorizer name
        "kind": "customWebApi", // call an external API
        "azureOpenAIParameters": null,
        "customWebApiParameters": {
          "httpMethod": "POST",
          "uri": "https://<azure-container-app-endpoint>/api", // API endpoint
          "timeout": "PT30S", // timeout (ISO 8601 duration, 30 seconds)
          "authResourceId": null,
          "httpHeaders": {},
          "authIdentity": null
        },
        "aiServicesVisionParameters": null,
        "amlParameters": null
      }
    ],
    ...

Back in Google Colab, try a natural-language search.

python

from azure.search.documents.models import VectorizableTextQuery

index_name = "index-audio"
# Search for "quiet and gentle music"
query = "Quiet and gentle music"

search_client = SearchClient(AZURE_SEARCH_SERVICE_URL, index_name, credential=credential)
vector_query = VectorizableTextQuery(text=query, fields="embed")
results = search_client.search(
    vector_queries=[vector_query],
    select=["file"],
    top=3
)
for item in results:
  print(item)

The search returned the top 3 most similar results as expected.

{'file': '-f6s6kQEHFY_0.0-10.0.wav', '@search.score': 0.6710902, ...}
{'file': '1PDg2ENa7jk_22.0-32.0.wav', '@search.score': 0.6642707, ...}
{'file': '-0Gj8-vB1q4_30.0-40.0.wav', '@search.score': 0.6578776, ...}
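The @search.score values above are not raw cosine similarities. Assuming the index uses the cosine metric (an assumption, since the metric is not shown in this post), Azure AI Search documents the score as 1 / (1 + cosine distance), where cosine distance is 1 minus cosine similarity. The sketch below converts in both directions; the function names are hypothetical.

```python
def score_to_cosine(score: float) -> float:
    """Convert an @search.score back to cosine similarity (cosine metric only)."""
    cosine_distance = (1.0 - score) / score
    return 1.0 - cosine_distance


def cosine_to_score(cosine: float) -> float:
    """Convert cosine similarity to the score form 1 / (1 + cosine distance)."""
    return 1.0 / (1.0 + (1.0 - cosine))


# Scores taken from the search results above.
for score in (0.6710902, 0.6642707, 0.6578776):
    print(score, round(score_to_cosine(score), 4))
```

Under that assumption, a score of about 0.67 corresponds to a cosine similarity of roughly 0.5 between the text query and the audio clip.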

Listening to the top 3 audio files, all of them did sound "quiet and gentle."
That said, impressions vary from person to person, so give it a try if you are curious.

Closing thoughts

CLAP makes similarity search over audio possible. I skipped it here, but audio-to-audio similarity search is also possible.
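As a rough illustration of audio-to-audio search, the sketch below ranks stored embeddings by cosine similarity to a query embedding, entirely in local Python. It is not the method used in this post: the helper names are hypothetical and the 3-dimensional vectors are toy stand-ins for 512-dimensional CLAP audio embeddings. Against Azure AI Search, the equivalent would be to send the query clip's embedding as a raw vector query instead of text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_embed, items, top=3):
    """Return the `top` (file, similarity) pairs, most similar first."""
    scored = [
        (item["file"], cosine_similarity(query_embed, item["embed"]))
        for item in items
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy vectors standing in for CLAP audio embeddings (values are made up).
toy_items = [
    {"file": "a.wav", "embed": [1.0, 0.0, 0.0]},
    {"file": "b.wav", "embed": [0.9, 0.1, 0.0]},
    {"file": "c.wav", "embed": [0.0, 1.0, 0.0]},
]
ranking = most_similar([1.0, 0.0, 0.0], toy_items, top=2)
print(ranking)
```

With real data, items would be records shaped like info_list from earlier in this post, and the query embedding would come from running the same CLAP model on the query clip.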

I also found that combining Azure AI Search vectorizers and custom skills looks promising for building a much wider variety of search systems.
That is all for this post. Thank you for reading to the end.
