Isn't it surprisingly hard to get an input device ID for the Azure Speech SDK for Python?

Posted at 2025-08-06

While developing a speech recognition application with the Azure Speech SDK for Python, I hit a wall: the device_name you set to select a specific audio input device has to be a "device ID", a format that is not easy to obtain.

import azure.cognitiveservices.speech as speechsdk

# I know the device name, but not its ID...
audio_config = speechsdk.audio.AudioConfig(device_name="my-microphone")
# ↑ In my environment this did not select the microphone; only the default device was usable

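It turned out that the Azure Speech SDK needs a device ID to select a specific audio input device; a device name does not work. This happens because the Azure Speech SDK talks directly to the OS's low-level audio APIs. Common Python audio libraries (PyAudio, sounddevice, and so on) can give you human-readable device names, but they do not provide IDs in the format the Azure Speech SDK requires.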

import azure.cognitiveservices.speech as speechsdk

# The ideal usage (but it does not actually work)
audio_config = speechsdk.audio.AudioConfig(device_name="USB Microphone")

# The format that is actually required (Windows example)
audio_config = speechsdk.audio.AudioConfig(
    device_name="{0.0.0.00000000}.{12345678-1234-5678-9abc-def012345678}"
)

Hmm...
Well, I'll look into a good solution as a learning exercise.

Solution

The device ID that the Azure Speech SDK requires has an OS-specific format.

Windows: WASAPI device endpoint ID
macOS: CoreAudio device UID
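
Before passing a string to the SDK on Windows, it can help to check whether it even has the shape of an endpoint ID. The following is only a heuristic sketch: the pattern is an assumption inferred from the example IDs shown in this article, not a documented grammar, and `looks_like_wasapi_endpoint_id` is a hypothetical helper name that is not part of any SDK.

```python
import re

# Assumed shape, inferred from the example IDs in this article:
# {n.n.n.8 hex digits}.{GUID}
_ENDPOINT_ID_RE = re.compile(
    r"^\{\d+\.\d+\.\d+\.[0-9a-fA-F]{8}\}\."
    r"\{[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}$"
)


def looks_like_wasapi_endpoint_id(value: str) -> bool:
    """Return True if value has the assumed endpoint-ID shape (heuristic only)."""
    return _ENDPOINT_ID_RE.match(value) is not None


# A friendly name fails the check, the ID-shaped string passes.
print(looks_like_wasapi_endpoint_id("USB Microphone"))
print(looks_like_wasapi_endpoint_id(
    "{0.0.0.00000000}.{12345678-1234-5678-9abc-def012345678}"
))
```

A check like this only catches the "I passed a friendly name by mistake" case; it says nothing about whether the device actually exists.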

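The sample code provided by Microsoft shows how to obtain these IDs by calling each OS's native API directly. However, the samples are written in C++ and Objective-C, which makes them hard for Python developers to use as they are.

Windows (C++) implementation

_audio_devices.cpp uses EnumAudioEndpoints() on the MMDeviceEnumerator.

// Input devices (capture)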
CComPtr<IMMDeviceCollection> pInputColl;
hr = pEnum->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pInputColl);

Device name: USB Microphone
Device ID: {0.0.0.00000000}.{a1b2c3d4-e5f6-7890-abcd-ef1234567890}

macOS (Objective-C) implementation

_audio_devices.m uses the CoreAudio API.

// Get the list of audio devices
AudioObjectPropertyAddress prop = {
    kAudioHardwarePropertyDevices,
    kAudioObjectPropertyScopeGlobal,
    kAudioObjectPropertyElementMaster
};

Device name: MacBook Air built-in microphone
Device UID: BuiltInMicrophoneDevice

Device name: AirPods Pro
Device UID: AAAA0000-BBBB-CCCC-DDDD-EEEEEEEEEEEE

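Python wrapper: from native code to Python

What ways are there to expose native code to Python?

Architecture

Native APIs: use each OS's official audio API directly
Python extension module: make the C/C++ and Objective-C code callable from Python
Cross-platform support: pick the right implementation through platform detection
Modern packaging: efficient build and distribution with pyproject.toml + cibuildwheel

Native code -> Python

To call the native code from Python, I implemented a bridge layer with the Python C API. This lets the complicated OS-specific API calls be offered as simple Python functions.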

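#include <Python.h>  // required: Python C API

// Function called from Python.
// Return value: PyObject* (a Python object)
// Arguments: self (usually unused), args (the Python arguments)
static PyObject* list_device_uids(PyObject* self, PyObject* args) {
    PyObject* device_list = PyList_New(0);
    if (device_list == NULL) return NULL;

    // implementation...
    // PyList_Append does not steal the reference, so release ours afterwards
    PyObject* item = PyUnicode_FromString(device_info);
    if (item == NULL) { Py_DECREF(device_list); return NULL; }
    PyList_Append(device_list, item);
    Py_DECREF(item);

    return device_list; // always return a PyObject* (or NULL on error)
}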

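Details of data type conversion

Concrete examples of converting native C/C++ data into Python objects.

// Basic data type conversions
PyUnicode_FromString("Hello World")           // char* -> str
PyLong_FromUnsignedLong(42)                   // unsigned long -> int
PyFloat_FromDouble(3.14159)                   // double -> float
PyBool_FromLong(1)                            // long -> bool

// Building a more complex data structure
PyObject *device_dict = PyDict_New();                    // create an empty dict
// PyDict_SetItemString does not steal the value reference, so release it after each call
PyObject *py_name = PyUnicode_FromString(name);
PyDict_SetItemString(device_dict, "name", py_name);          // set a string value
Py_DECREF(py_name);
PyObject *py_channels = PyLong_FromLong(channels);
PyDict_SetItemString(device_dict, "channels", py_channels);  // set a numeric value
Py_DECREF(py_channels);

// Appending an element to a list
PyList_Append(device_list, device_dict);     // the list takes its own reference to the dict
Py_DECREF(device_dict);                      // release our reference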
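
The basic conversion functions above can also be tried without compiling anything, by calling them through `ctypes.pythonapi`. This is a small experiment for getting a feel for the C API, not how the extension module itself is built, and it assumes CPython (other interpreters may not expose `ctypes.pythonapi`).

```python
import ctypes

api = ctypes.pythonapi  # the C API of the running CPython interpreter

# Declare the C signatures so ctypes converts arguments and results correctly.
# restype = py_object means "the function returns a new PyObject* reference".
api.PyUnicode_FromString.argtypes = [ctypes.c_char_p]
api.PyUnicode_FromString.restype = ctypes.py_object
api.PyLong_FromUnsignedLong.argtypes = [ctypes.c_ulong]
api.PyLong_FromUnsignedLong.restype = ctypes.py_object
api.PyFloat_FromDouble.argtypes = [ctypes.c_double]
api.PyFloat_FromDouble.restype = ctypes.py_object
api.PyBool_FromLong.argtypes = [ctypes.c_long]
api.PyBool_FromLong.restype = ctypes.py_object

text = api.PyUnicode_FromString(b"Hello World")  # char* -> str
number = api.PyLong_FromUnsignedLong(42)         # unsigned long -> int
ratio = api.PyFloat_FromDouble(3.14159)          # double -> float
flag = api.PyBool_FromLong(1)                    # long -> bool

for value in (text, number, ratio, flag):
    print(type(value).__name__, repr(value))
```

Inside a real extension module the same functions are called from C, where the reference counting shown earlier has to be handled by hand; here ctypes takes care of it.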

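Defining the Python extension module

A Python extension module is the mechanism that makes a program written in native code such as C/C++ directly callable from Python. As shown below, both platforms share the same approach: define the functions to expose in an array and register them when the module is initialized.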

static PyMethodDef methods[] = {
    {"list_device_uids", list_device_uids, METH_NOARGS, "List audio devices"},
    {"list_device_details", list_device_details, METH_NOARGS, "Detailed device info"},
    {"list_devices_by_type", list_devices_by_type, METH_VARARGS, "Filter by device type"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef moddef = {
    PyModuleDef_HEAD_INIT, "_audio_devices", NULL, -1, methods
};

PyMODINIT_FUNC PyInit__audio_devices(void) {
    return PyModule_Create(&moddef);
}

Module registration

__init__.py detects the platform and loads the matching native extension module.

import sys

if sys.platform == "darwin":
    # Objective-C extension for macOS
    from ._audio_devices import list_device_uids
elif sys.platform == "win32":
    # C++ extension for Windows
    from ._audio_devices import list_device_uids
else:
    # Any other OS
    def list_device_uids():
        raise NotImplementedError(f"Audio device listing not supported on {sys.platform}")

Steps for publishing the package to PyPI

1. Project layout

Modern Python packaging with pyproject.toml:

[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "audio_devices"
version = "0.2.0"
description = "Cross-platform audio device enumeration library"
requires-python = ">=3.8"
license = {text = "MIT"}

[tool.setuptools.packages.find]
where = ["src"]

2. Build configuration for the extension module

Platform-specific extension settings in setup.py:

from setuptools import setup, Extension
import sys

ext_modules = []
if sys.platform == "darwin":
    ext_modules.append(Extension(
        "audio_devices._audio_devices",
        sources=["src/audio_devices/_audio_devices.m"],
        extra_link_args=["-framework", "CoreAudio", "-framework", "AudioToolbox"],
        language="objc",
    ))
elif sys.platform == "win32":
    ext_modules.append(Extension(
        "audio_devices._audio_devices",
        sources=["src/audio_devices/_audio_devices.cpp"],
        libraries=["ole32", "uuid"],
    ))

setup(ext_modules=ext_modules, zip_safe=False)

Integration with CI/CD

Automated builds run on GitHub Actions, with cibuildwheel covering multiple environments.

Windows: win32, win_amd64
macOS: Intel (x86_64) + Apple Silicon (arm64)
Python: 3.8 to 3.11
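
To get a feel for how many wheels this matrix implies, the combinations can be enumerated in a few lines. This is only an illustrative sketch: the identifier strings follow the `<python tag>-<platform tag>` style that cibuildwheel uses for build selection, the matching below is a simplification done with `fnmatch`, and the list of what actually gets built is decided by cibuildwheel itself.

```python
from fnmatch import fnmatch
from itertools import product

# The environments listed above, written as build-identifier fragments.
python_tags = ["cp38", "cp39", "cp310", "cp311"]
platform_tags = ["win32", "win_amd64", "macosx_x86_64", "macosx_arm64"]

build_ids = [f"{py}-{plat}" for py, plat in product(python_tags, platform_tags)]

# Same glob patterns as the CIBW_BUILD value used in the workflow.
cibw_build = "cp38-* cp39-* cp310-* cp311-*"
patterns = cibw_build.split()

selected = [b for b in build_ids if any(fnmatch(b, p) for p in patterns)]
for build_id in selected:
    print(build_id)
print(len(selected), "wheels expected")
```

Four Python versions times four platform targets gives 16 candidate wheels, which is a useful number to compare against the contents of the wheelhouse directory after a CI run.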

name: Build Wheels with cibuildwheel

on:
  push:
    branches: [ main, develop ]
  release:
    types: [ published ]
  # Manual trigger; the Test PyPI upload job runs only on workflow_dispatch
  workflow_dispatch:

jobs:
  build_wheels:
    name: Build wheel on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [windows-latest, macos-latest]

    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install cibuildwheel
      run: python -m pip install cibuildwheel==2.16.2

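    - name: Build wheels
      run: python -m cibuildwheel --output-dir wheelhouse
      env:
        CIBW_BUILD: "cp38-* cp39-* cp310-* cp311-*"
        CIBW_ARCHS_MACOS: "x86_64 arm64"
        CIBW_TEST_COMMAND: "python -c \"import audio_devices; print('Import successful')\""

    # Upload the wheels so the publish jobs can fetch them with download-artifact
    - name: Upload wheels
      uses: actions/upload-artifact@v4
      with:
        name: wheels-${{ matrix.os }}
        path: wheelhouse/*.whl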

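Automatically uploading the package to PyPI

Test environment

PyPI offers a test environment (TestPyPI) that behaves like production but lives at a different URL, so you can try an upload there first. Setting repository-url: https://test.pypi.org/legacy/ sends the upload to the test environment.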

  upload_test_pypi:
    name: Upload to Test PyPI
    needs: [build_wheels]
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch'
    
    steps:
    - name: Download all artifacts
      uses: actions/download-artifact@v4
      with:
        path: dist
        merge-multiple: true

    - name: Publish to Test PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
        repository-url: https://test.pypi.org/legacy/
        verbose: true

Production environment

  upload_pypi:
    name: Upload to PyPI
    needs: [build_wheels]
    runs-on: ubuntu-latest
    if: github.event_name == 'release' && github.event.action == 'published'
    
    steps:
    - name: Download all artifacts
      uses: actions/download-artifact@v4
      with:
        path: dist
        merge-multiple: true

    - name: Publish to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        password: ${{ secrets.PYPI_API_TOKEN }}

pip install

With this, a speech recognition application built on the Azure Speech SDK can easily enumerate and select audio devices. Including the PyPI release, it was a good learning experience.

pip install audio_devices

PyPI

Usage

import audio_devices

# Basic device listing (name: uid format)
devices = audio_devices.list_device_uids()
for device in devices:
    print(device)

# Detailed device information
device_details = audio_devices.list_device_details()
for device in device_details:
    print(f"Name: {device['name']}")
    print(f"UID: {device['uid']}")
    print(f"Type: {device['device_type']}")
    print(f"Input channels: {device['input_channels']}")
    print(f"Output channels: {device['output_channels']}")
    print(f"Manufacturer: {device['manufacturer']}")
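
To close the loop with the original problem, the detailed listing can be used to look up a UID by friendly name and hand it to the Azure Speech SDK. The following is a minimal sketch, not part of the audio_devices package: `find_input_device_uid` and `make_audio_config` are hypothetical helper names, the dictionary keys are the ones printed above, and the sample data is made up so that the lookup can be run without any audio hardware.

```python
from typing import Iterable, Mapping, Optional


def find_input_device_uid(devices: Iterable[Mapping], name_part: str) -> Optional[str]:
    """Return the UID of the first input-capable device whose name contains name_part."""
    wanted = name_part.casefold()
    for device in devices:
        has_input = device.get("input_channels", 0) > 0
        if has_input and wanted in device["name"].casefold():
            return device["uid"]
    return None


def make_audio_config(device_uid: str):
    """Build an AudioConfig for a device ID (needs azure-cognitiveservices-speech)."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily on purpose
    return speechsdk.audio.AudioConfig(device_name=device_uid)


# Made-up sample data shaped like the dictionaries from list_device_details().
sample_devices = [
    {"name": "Built-in Output", "uid": "BuiltInSpeakerDevice", "input_channels": 0},
    {"name": "USB Microphone", "uid": "example-usb-mic-uid", "input_channels": 1},
]
print(find_input_device_uid(sample_devices, "usb mic"))
```

In a real application the sample list would be replaced by `audio_devices.list_device_details()`, and the returned UID passed to `make_audio_config()`; when no device matches, the helper returns None so the caller can fall back to the default microphone.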

GitHub
