Understanding VRM Lip Sync by Running It (three-vrm and the Web Audio API)

Posted at 2025-11-15

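You've seen avatars on YouTube and elsewhere whose mouths move along with what they're saying. Don't you want to build one of those?
This article explains which technologies make a VRM model move its mouth in time with audio (lip sync). It uses the actual demo code (vrm_lipsync_viewer.html) as a reference.
The VRM model was made for me by a creator.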

1. Overview of how it works

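The attached demo uses three input files to drive the VRM model and achieve lip sync:

VRM file (.vrm): The 3D model itself. It also contains the BlendShape definitions used to change the shape of the mouth.
Audio file (.mp3 or .wav): The voice audio the model will "speak".
Lipsync JSON file (.json): Data that records which mouth shape (vowel, etc.) to show at which point in the audio.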

The viewer loads these files and looks up the JSON data according to the current audio playback time. It then updates the VRM model's BlendShapes in real time, which produces the lip sync.

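2. Key technical components

The main technologies behind the demo are as follows.

A. 3D rendering and VRM control

Three.js: A library for rendering 3D graphics in a web browser. It handles displaying the model, the camera, lighting, and so on.
three-vrm: A library for handling VRM files on top of Three.js. It wraps the complex parts of the VRM specification (such as BlendShape control) in a simple API. The demo loads Three.js r137 (0.137.0) and three-vrm 0.6.11, so it uses the three-vrm 0.x API (THREE.VRM.from(), vrm.blendShapeProxy).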

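B. Audio playback and timing

Web Audio API: The browser API for processing audio.
- AudioContext: The central object of audio processing. It is used to play audio and to read the current time.
- decodeAudioData: Decodes the loaded audio file data (mp3, etc.) into an AudioBuffer of raw samples that the browser can play.
- currentTime: The AudioContext's clock in seconds. It starts at 0 when the context is created (not when playback starts) and does not advance while the context is suspended. Subtracting the value recorded at playback start therefore gives the elapsed playback time.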

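3. Concrete steps of the lip sync process

Step 1: Loading the files

When the user selects the three files, each one is processed as follows.

VRM file: Read as an ArrayBuffer with FileReader, parsed with GLTFLoader, converted into a VRM with THREE.VRM.from(), and added to the 3D scene.
Audio file: Decoded with the Web Audio API's AudioContext and decodeAudioData, and kept in memory as an AudioBuffer.
JSON file: Parsed with JSON.parse() and kept as a JavaScript object.
Once all three have loaded, the Play button is enabled.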

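Step 2: Starting playback and syncing the time

When the Play button is pressed, the following happens together. If the AudioContext is suspended (for example, because of the browser's autoplay policy), it is resumed first.

AudioBufferSourceNode.start(0) starts audio playback.
audioContext.currentTime is saved as startTime to remember when playback began.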
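Here is a minimal sketch of Step 2, assuming an `AudioContext`-like object is passed in. It wires an `AudioBufferSourceNode` to the destination, starts it, and records `currentTime` as the start time.

`startPlayback` and its returned `elapsed()` helper are hypothetical. A real `AudioContext` only exists in browsers, so the usage example substitutes a fake context with the same members.

```javascript
// Hypothetical helper for Step 2. `ctx` is an AudioContext (or a stand-in with the same members).
function startPlayback(ctx, buffer, onEnded) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.onended = onEnded;
  source.start(0); // 0 (or any time already in the past) means "start now"
  // currentTime counts from context creation, so remember where playback began
  const startTime = ctx.currentTime;
  return {
    source,
    startTime,
    // Seconds since playback started, measured on the same clock the audio uses
    elapsed: () => ctx.currentTime - startTime,
  };
}

// Usage with a fake context so the sketch runs outside a browser
const fakeCtx = {
  currentTime: 12.5,
  destination: {},
  createBufferSource() {
    return { connect() {}, start(when) { this.startedAt = when; } };
  },
};
const playback = startPlayback(fakeCtx, { duration: 3 }, () => {});
fakeCtx.currentTime = 13.25; // the context clock advances while audio plays
console.log(playback.elapsed()); // 0.75
```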

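Step 3: Moving the mouth in real time (controlling BlendShapes)

Inside the animation loop (the animate() function), the following runs every frame:

Computing the current playback time: currentTime = audioContext.currentTime - startTime gives the number of seconds since the audio started playing.
Looking up the JSON data: In the mouthCues array of the JSON file, find the cue whose range contains the current playback time (c.start <= currentTime < c.end) and take its **mouth shape (cue.value)**.
Applying the VRM BlendShapes:
- Map the JSON mouth-shape symbol (e.g. A, B, C) to a VRM BlendShape name (e.g. a, i, u) using the mouthShapeMap object.
- Reset all mouth BlendShapes to 0, then use vrm.blendShapeProxy.setValue() to set the matching VRM BlendShape to 1.0 (the maximum).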
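The demo finds the current cue with `mouthCues.find()`, which scans the array from the beginning on every frame. That works well for short clips. For long ones, a binary search is one possible alternative; note that this is a swapped-in technique, not what the demo does.

This sketch assumes the cues are sorted by `start` and do not overlap. It uses the same half-open test as the demo (`start <= t < end`). `findCueAt` is a hypothetical name.

```javascript
// Hypothetical alternative to mouthCues.find(): binary search over cues sorted by start.
// Returns the cue with start <= t < end, or undefined if t falls inside no cue.
function findCueAt(cues, t) {
  let lo = 0;
  let hi = cues.length - 1;
  let candidate = -1;
  // Find the last cue whose start is <= t
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].start <= t) {
      candidate = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (candidate === -1) return undefined;
  const cue = cues[candidate];
  // The candidate may already have ended (a gap between cues, or past the last cue)
  return t < cue.end ? cue : undefined;
}

const cues = [
  { start: 0.0, end: 0.2, value: 'X' },
  { start: 0.2, end: 0.35, value: 'A' },
  { start: 0.35, end: 0.5, value: 'C' },
];
console.log(findCueAt(cues, 0.3).value); // A
console.log(findCueAt(cues, 0.9)); // undefined
```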

As a result, the VRM model's mouth changes into shapes such as "a", "i", and "u" in step with audio playback, updated once per rendered frame, and lip sync is achieved.

4. Concrete BlendShape mapping example

The demo code maps the symbols in the JSON data to the mouth-shape names that VRM recognizes.

| JSON symbol (`cue.value`) | Meaning (guessed) | VRM BlendShape name (`vrmShape`) |
| --- | --- | --- |
| `X` | Silence / neutral | `neutral` |
| `A`, `H` | "a"-like | `a` |
| `B`, `G` | "i"-like | `i` |
| `C`, `F` | "u"-like | `u` |
| `D` | "e"-like | `e` |
| `E` | "o"-like | `o` |

Through this mapping, the simple symbols produced by audio analysis are expressed as mouth shapes on the VRM model.
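Taken together, the mapping and the demo's reset-then-set logic choose one weight per mouth preset: 1.0 for the mapped shape and 0 for the rest. The sketch below expresses that as a pure function over the demo's `mouthShapeMap`, which is copied from the demo code. `cueToWeights` and `MOUTH_PRESETS` are hypothetical names.

An unknown symbol produces all zeros. This matches the demo, where an unmapped value leaves every shape at 0 after the reset.

```javascript
// mouthShapeMap as defined in the demo code
const mouthShapeMap = {
  X: 'neutral', A: 'a', B: 'i', C: 'u', D: 'e', E: 'o', F: 'u', G: 'i', H: 'a',
};
// Mouth-related preset names the demo resets every frame (hypothetical constant name)
const MOUTH_PRESETS = ['a', 'i', 'u', 'e', 'o', 'neutral'];

// Hypothetical helper: cue symbol -> weight for every mouth preset.
function cueToWeights(value, map = mouthShapeMap) {
  const weights = {};
  for (const name of MOUTH_PRESETS) weights[name] = 0; // same effect as resetMouth()
  const target = map[value];
  if (target) weights[target] = 1.0;
  return weights;
}

console.log(cueToWeights('B')); // { a: 0, i: 1, u: 0, e: 0, o: 0, neutral: 0 }
```

In the browser, with three-vrm 0.x, each entry could then be applied with `vrm.blendShapeProxy.setValue(name, weight)`.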

The full source of the demo viewer (vrm_lipsync_viewer.html) follows.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="Content-Security-Policy" content="default-src 'self' 'unsafe-eval' 'unsafe-inline' https://unpkg.com https://cdn.jsdelivr.net data: blob:;">
    <title>VRM Lipsync Viewer</title>
    <style>
        body {
            margin: 0;
            overflow: hidden;
            font-family: Arial, sans-serif;
        }
        #canvas {
            width: 100%;
            height: 100vh;
            display: block;
        }
        #controls {
            position: absolute;
            top: 20px;
            left: 20px;
            background: rgba(255, 255, 255, 0.9);
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 2px 10px rgba(0,0,0,0.3);
            max-width: 300px;
        }
        .file-input {
            margin: 10px 0;
        }
        label {
            display: block;
            margin-bottom: 5px;
            font-weight: bold;
            font-size: 14px;
        }
        input[type="file"] {
            margin-bottom: 10px;
            font-size: 12px;
        }
        button {
            background: #4CAF50;
            color: white;
            border: none;
            padding: 10px 20px;
            border-radius: 4px;
            cursor: pointer;
            font-size: 16px;
            margin-top: 10px;
            width: 100%;
        }
        button:hover {
            background: #45a049;
        }
        button:disabled {
            background: #cccccc;
            cursor: not-allowed;
        }
        #status {
            margin-top: 15px;
            padding: 10px;
            background: #f0f0f0;
            border-radius: 4px;
            font-size: 13px;
            line-height: 1.4;
        }
    </style>
</head>
<body>
    <canvas id="canvas"></canvas>
    
    <div id="controls">
        <h2 style="margin-top:0;">VRM Lipsync</h2>
        
        <div class="file-input">
            <label>VRM file:</label>
            <input type="file" id="vrmFile" accept=".vrm">
        </div>
        
        <div class="file-input">
            <label>Audio file:</label>
            <input type="file" id="audioFile" accept=".mp3,.wav">
        </div>
        
        <div class="file-input">
            <label>Lipsync JSON:</label>
            <input type="file" id="jsonFile" accept=".json">
        </div>
        
        <button id="playButton" disabled>Play</button>
        
        <div id="status">Select the files to load</div>
    </div>

    <script src="https://unpkg.com/three@0.137.0/build/three.js"></script>
    <script src="https://unpkg.com/three@0.137.0/examples/js/loaders/GLTFLoader.js"></script>
    <script src="https://unpkg.com/@pixiv/three-vrm@0.6.11/lib/three-vrm.js"></script>

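    <script>
        let scene, camera, renderer, vrm;
        let audioContext, audioSource, audioBuffer;
        let lipsyncData;
        let isPlaying = false;
        let startTime = 0; // audioContext.currentTime at the moment playback started

        // Map lip-sync JSON symbols (cue.value) to VRM blend shape preset names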
        const mouthShapeMap = {
            'X': 'neutral',
            'A': 'a',
            'B': 'i',
            'C': 'u',
            'D': 'e',
            'E': 'o',
            'F': 'u',
            'G': 'i',
            'H': 'a'
        };

        function init() {
            scene = new THREE.Scene();
            scene.background = new THREE.Color(0x212121);

            camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
            camera.position.set(0, 1.4, 2.5);
            camera.lookAt(0, 1.4, 0);

            renderer = new THREE.WebGLRenderer({ 
                canvas: document.getElementById('canvas'), 
                antialias: true 
            });
            renderer.setSize(window.innerWidth, window.innerHeight);
            renderer.setPixelRatio(window.devicePixelRatio);

            const light = new THREE.DirectionalLight(0xffffff, 1);
            light.position.set(1, 1, 1);
            scene.add(light);
            scene.add(new THREE.AmbientLight(0xffffff, 0.5));

            window.addEventListener('resize', onWindowResize);

            document.getElementById('vrmFile').addEventListener('change', loadVRM);
            document.getElementById('audioFile').addEventListener('change', loadAudio);
            document.getElementById('jsonFile').addEventListener('change', loadJSON);
            document.getElementById('playButton').addEventListener('click', play);

            animate();
        }

        function loadVRM(e) {
            const file = e.target.files[0];
            if (!file) return;

            updateStatus('Loading VRM...');

            const reader = new FileReader();
            reader.onload = function(event) {
                const loader = new THREE.GLTFLoader();
                
                loader.parse(event.target.result, '', function(gltf) {
                    THREE.VRM.from(gltf).then(function(vrmInstance) {
                        if (vrm) {
                            scene.remove(vrm.scene);
                        }
                        
                        vrm = vrmInstance;
                        
                        // Rotate the avatar 180 degrees so that it faces the camera
                        vrm.scene.rotation.y = Math.PI;
                        
                        scene.add(vrm.scene);
                        
                        // Debug output; _blendShapeGroups is an internal field of three-vrm 0.x
                        console.log('VRM loaded');
                        console.log('BlendShapes:', vrm.blendShapeProxy ? 
                            Object.keys(vrm.blendShapeProxy._blendShapeGroups) : 'none');
                        
                        updateStatus('VRM loaded');
                        checkReady();
                    }).catch(function(error) {
                        console.error('VRM error:', error);
                        updateStatus('VRM error: ' + error.message);
                    });
                }, function(error) {
                    console.error('GLTF error:', error);
                    updateStatus('GLTF error');
                });
            };
            
            reader.onerror = function() {
                updateStatus('File read error');
            };
            
            reader.readAsArrayBuffer(file);
        }

        async function loadAudio(e) {
            const file = e.target.files[0];
            if (!file) return;

            try {
                if (!audioContext) {
                    audioContext = new (window.AudioContext || window.webkitAudioContext)();
                }

                const arrayBuffer = await file.arrayBuffer();
                audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
                
                updateStatus('Audio loaded');
                checkReady();
            } catch (error) {
                console.error('Audio error:', error);
                updateStatus('Audio error: ' + error.message);
            }
        }

        async function loadJSON(e) {
            const file = e.target.files[0];
            if (!file) return;

            try {
                const text = await file.text();
                lipsyncData = JSON.parse(text);
                console.log('Lipsync data:', lipsyncData);
                updateStatus('JSON loaded');
                checkReady();
            } catch (error) {
                console.error('JSON error:', error);
                updateStatus('JSON error: ' + error.message);
            }
        }

        function checkReady() {
            const hasVrm = !!vrm;
            const hasAudio = !!audioBuffer;
            const hasLipsync = !!lipsyncData;
            
            console.log('Check ready:', { hasVrm, hasAudio, hasLipsync });
            
            if (hasVrm && hasAudio && hasLipsync) {
                document.getElementById('playButton').disabled = false;
                updateStatus('Ready! Press the Play button');
            } else {
                const missing = [];
                if (!hasVrm) missing.push('VRM');
                if (!hasAudio) missing.push('audio');
                if (!hasLipsync) missing.push('JSON');
                updateStatus('Loaded - still missing: ' + missing.join(', '));
            }
        }

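        async function play() {
            if (isPlaying) return;

            // Browsers may create the AudioContext in the "suspended" state (autoplay policy);
            // wait for resume() to finish before scheduling playback
            if (audioContext.state === 'suspended') {
                await audioContext.resume();
            }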

            audioSource = audioContext.createBufferSource();
            audioSource.buffer = audioBuffer;
            audioSource.connect(audioContext.destination);
            audioSource.start(0);

            startTime = audioContext.currentTime;
            isPlaying = true;

            audioSource.onended = function() {
                isPlaying = false;
                resetMouth();
                updateStatus('Playback finished');
            };

            updateStatus('Playing...');
        }

        function updateLipsync() {
            if (!isPlaying || !vrm || !lipsyncData || !vrm.blendShapeProxy) return;

            const currentTime = audioContext.currentTime - startTime;
            const cue = lipsyncData.mouthCues.find(function(c) {
                return currentTime >= c.start && currentTime < c.end;
            });

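            // Clear all mouth shapes, then raise only the shape for the current cue;
            // between cues every mouth shape stays at 0
            resetMouth();
            if (cue) {
                const vrmShape = mouthShapeMap[cue.value];
                if (vrmShape) {
                    try {
                        // The BlendShapePresetName enum values are the lowercase strings
                        // 'a', 'i', 'u', 'e', 'o', 'neutral', so the mapped name can be passed directly
                        vrm.blendShapeProxy.setValue(vrmShape, 1.0);
                    } catch (e) {
                        // Ignore shapes the model does not define
                    }
                }
            }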
        }

        function resetMouth() {
            if (!vrm || !vrm.blendShapeProxy) return;
            
            // Lowercase preset values; the enum keys are 'A', 'I', 'U', 'E', 'O', 'Neutral'
            ['a', 'i', 'u', 'e', 'o', 'neutral'].forEach(function(shape) {
                try {
                    vrm.blendShapeProxy.setValue(shape, 0);
                } catch (e) {}
            });
        }

        function updateStatus(message) {
            document.getElementById('status').textContent = message;
        }

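        // Created on the first frame; measures the real time between frames
        let clock = null;

        function animate() {
            requestAnimationFrame(animate);

            if (!clock) clock = new THREE.Clock();
            const delta = clock.getDelta(); // seconds since the previous frame

            if (vrm) {
                updateLipsync();
                // Applies the blend shape weights and advances spring bones by the real frame time
                vrm.update(delta);
            }

            renderer.render(scene, camera);
        }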

        function onWindowResize() {
            camera.aspect = window.innerWidth / window.innerHeight;
            camera.updateProjectionMatrix();
            renderer.setSize(window.innerWidth, window.innerHeight);
        }

        window.addEventListener('DOMContentLoaded', function waitForThree() {
            // Retry every 100 ms until three.js has been loaded
            if (typeof THREE === 'undefined') {
                setTimeout(waitForThree, 100);
                return;
            }
            init();
        });
    </script>
</body>
</html>

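Summary

This demo achieves real-time lip sync in the browser through a single pipeline. The Web Audio API provides an accurate playback time, and that time is used as the key to look up the mouth shape in the JSON data. three-vrm then applies the shape to the VRM model's BlendShapes.

Combining VRM with web technologies is an area where many more applications can be expected. Do try it in your own projects!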
