Sentiment analysis with Comprehend using data transcribed by AWS Transcribe

Posted at 2025-07-17, last updated at 2025-07-17

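This article summarizes the steps for passing AWS Transcribe transcription results to Amazon Comprehend to run sentiment analysis on Japanese audio, along with the key points for reading the JSON involved.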

Overall flow (sequence diagram)

Transcribe JSON: the difference between channel_label and speaker_label

AWS Transcribe has two modes for distinguishing speakers.

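| Mode | JSON path | Example | Characteristics |
| --- | --- | --- | --- |
| Multi-channel | `results.items[].channel_label` | `"ch_1"` | Typically for call-center recordings and similar data where the left and right channels are separate tracks. |
| Speaker diarization | `results.speaker_labels.segments[].speaker_label` | `"spk_0"` | Automatically infers speakers from a single-track recording. |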

1‑1. channel_label sample

{
  "results": {
    "items": [
      {
        "start_time": "0.25",
        "end_time": "0.74",
        "alternatives": [{ "content": "こんにちは" }],
        "type": "pronunciation",
        "channel_label": "ch_0"   // ★
      }
    ]
  }
}

Because this mode has no speaker estimation, sort items by start_time and start a new segment wherever channel_label changes.
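The sort-and-split step can be sketched roughly as follows. This is a minimal illustration, not the article's actual implementation: the `ChannelItem` and `Segment` types and the `segmentByChannel` function are hypothetical names, the segment fields borrow the `speaker` / `startTimeSec` / `text` names that appear later for meta.json, and punctuation items (which carry no start_time) are simply dropped here.

```typescript
// Minimal shape of a Transcribe item in multi-channel mode (assumed for this sketch).
interface ChannelItem {
  start_time?: string; // punctuation items have no timestamps
  end_time?: string;
  type: 'pronunciation' | 'punctuation';
  channel_label: string;
  alternatives: { content: string }[];
}

// Hypothetical segment shape; field names follow the meta.json fields mentioned later.
interface Segment {
  speaker: string;
  startTimeSec: number;
  text: string;
}

function segmentByChannel(items: ChannelItem[]): Segment[] {
  // Keep only timed words, then order them chronologically.
  const spoken = items
    .filter((it) => it.type === 'pronunciation' && it.start_time !== undefined)
    .sort((a, b) => Number(a.start_time) - Number(b.start_time));

  const segments: Segment[] = [];
  for (const it of spoken) {
    const word = it.alternatives[0]?.content ?? '';
    const last = segments[segments.length - 1];
    if (last && last.speaker === it.channel_label) {
      // Same channel as the previous word: extend the current segment.
      // Japanese text is joined without spaces.
      last.text += word;
    } else {
      // Channel changed (or first word): open a new segment.
      segments.push({
        speaker: it.channel_label,
        startTimeSec: Number(it.start_time),
        text: word,
      });
    }
  }
  return segments;
}
```

If punctuation matters for the sentiment result, it would need to be attached to the preceding word before sorting, which this sketch does not attempt.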

1‑2. speaker_label sample

{
  "results": {
    "speaker_labels": {
      "segments": [
        {
          "speaker_label": "spk_1",   // ★
          "start_time": "3.12",
          "end_time": "4.58",
          "items": [{ "start_time": "3.12" }, { "start_time": "3.48" }]
        }
      ]
    },
    "items": [
      { "start_time": "3.12", "alternatives": [{ "content": "お世話に" }], "type": "pronunciation" },
      { "start_time": "3.48", "alternatives": [{ "content": "なります" }], "type": "pronunciation" }
    ]
  }
}

In this mode the segments are already built, so mapping them to text is straightforward.
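One way to do that mapping is to look up each segment's item references in `results.items` by start_time. The sketch below assumes, as in the sample above, that the start_time strings match exactly between the two arrays; the type names and `utterancesFromSpeakerSegments` are hypothetical.

```typescript
interface TranscriptItem {
  start_time?: string; // absent on punctuation
  type: string;
  alternatives: { content: string }[];
}

interface SpeakerSegment {
  speaker_label: string;
  start_time: string;
  end_time: string;
  items: { start_time: string }[];
}

interface Utterance {
  speaker: string;
  startTimeSec: number;
  text: string;
}

function utterancesFromSpeakerSegments(
  segments: SpeakerSegment[],
  items: TranscriptItem[],
): Utterance[] {
  // Index every timed word by its start_time string (the join key in this sketch).
  const wordByStart = new Map<string, string>();
  for (const it of items) {
    if (it.start_time !== undefined) {
      wordByStart.set(it.start_time, it.alternatives[0]?.content ?? '');
    }
  }

  // Each speaker segment becomes one utterance; unmatched references contribute nothing.
  return segments.map((seg) => ({
    speaker: seg.speaker_label,
    startTimeSec: Number(seg.start_time),
    text: seg.items.map((ref) => wordByStart.get(ref.start_time) ?? '').join(''),
  }));
}
```

Applied to the sample JSON, this yields a single utterance for spk_1 with the two words concatenated.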

MP3 upload & Transcribe job creation

The frontend sends the MP3 to /api/upload-audio as multipart/form-data; the backend uploads it to S3 and then calls StartTranscriptionJob.

try {
  await this.client.send(
    new StartTranscriptionJobCommand({
      TranscriptionJobName: jobName,
      Media: { MediaFileUri: mediaUri },
      OutputBucketName: outputBucket,
      OutputKey: outputKey,
      LanguageCode: 'ja-JP',
      MediaFormat: 'mp3',
      Settings: forceMono
        ? { ShowSpeakerLabels: true, MaxSpeakerLabels: 2 }
        : { ChannelIdentification: true },
    }),
  );
} catch (e: any) {
  logger.error('Failed to start transcription job: %s', e.message);
  throw e;
}
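The snippet above does not show how `jobName` and `outputKey` are built. A minimal sketch, assuming the `transcripts/YYYYMMDD/uuid.json` layout that the webhook step refers to; the `buildTranscribeJobNames` helper and the `transcribe-` prefix are hypothetical. Transcribe job names are limited to 200 characters drawn from letters, digits, `.`, `_` and `-`, so the helper checks for that.

```typescript
// Hypothetical helper: derives a job name and output key from a caller-supplied id.
function buildTranscribeJobNames(
  id: string,
  now: Date = new Date(),
): { jobName: string; outputKey: string } {
  // YYYYMMDD in UTC, e.g. "20250717".
  const yyyymmdd = now.toISOString().slice(0, 10).replace(/-/g, '');
  const jobName = `transcribe-${yyyymmdd}-${id}`;

  // Reject names Transcribe would not accept.
  if (jobName.length > 200 || !/^[0-9a-zA-Z._-]+$/.test(jobName)) {
    throw new Error(`Invalid transcription job name: ${jobName}`);
  }

  return { jobName, outputKey: `transcripts/${yyyymmdd}/${id}.json` };
}
```

Passing the id in (for example a UUID generated by the caller) keeps the helper deterministic and easy to test.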

Webhook reception & sentiment analysis use case (`AnalyzeSentimentAsyncUseCase`)

The use case processes the transcripts/YYYYMMDD/uuid.json file reported by the webhook and performs the following steps.

Load the transcription JSON
Extract segments, automatically detecting channel or speaker mode
Upload the .txt and .meta.json files to S3
Issue StartSentimentDetectionJob and return the jobId
Poll to monitor job completion
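The mode detection and the .txt / .meta.json preparation might look roughly like this. It is a sketch under stated assumptions: the mode is inferred from whether `results.speaker_labels` is present, and `MetaEntry`, `detectMode`, and `buildComprehendInput` are hypothetical names. The important property is that line i of the .txt corresponds to entry i of the meta array, so newlines inside an utterance are flattened and empty utterances are dropped before numbering.

```typescript
interface MetaEntry {
  id: number;
  speaker: string;
  startTimeSec: number;
  text: string;
}

type Mode = 'speaker' | 'channel';

// Assumption: diarization output carries results.speaker_labels; otherwise treat as multi-channel.
function detectMode(results: Record<string, unknown>): Mode {
  return results['speaker_labels'] !== undefined ? 'speaker' : 'channel';
}

function buildComprehendInput(
  segments: Omit<MetaEntry, 'id'>[],
): { txt: string; meta: MetaEntry[] } {
  const meta = segments
    // An utterance must occupy exactly one line, so collapse embedded newlines.
    .map((s) => ({ ...s, text: s.text.replace(/\s*\r?\n\s*/g, ' ').trim() }))
    // Drop empty utterances so no blank line appears in the .txt.
    .filter((s) => s.text.length > 0)
    // Number the survivors; id equals the line index in the .txt.
    .map((s, i) => ({ id: i, ...s }));

  return { txt: meta.map((m) => m.text).join('\n'), meta };
}
```

The `txt` string would be uploaded as the job input and `meta` serialized as .meta.json next to it.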

For the detailed logic, see the official documentation: "Async analysis for Amazon Comprehend insights".
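As a rough sketch of the last two steps: the asynchronous Comprehend operation for sentiment is StartSentimentDetectionJob, and its status can be read with DescribeSentimentDetectionJob. To keep the example free of SDK dependencies, the code below only builds the request object (intended to be passed to `StartSentimentDetectionJobCommand` from `@aws-sdk/client-comprehend`) and polls through a caller-supplied status function. `buildSentimentJobInput`, `pollUntilFinished`, and the interval and attempt defaults are hypothetical choices, not values from the article.

```typescript
// Request shape for StartSentimentDetectionJob with one document per line.
function buildSentimentJobInput(opts: {
  jobName: string;
  inputS3Uri: string;  // the uploaded .txt
  outputS3Uri: string; // prefix where Comprehend writes output.tar.gz
  roleArn: string;     // IAM role Comprehend assumes to read and write S3
}) {
  return {
    JobName: opts.jobName,
    LanguageCode: 'ja' as const,
    DataAccessRoleArn: opts.roleArn,
    InputDataConfig: { S3Uri: opts.inputS3Uri, InputFormat: 'ONE_DOC_PER_LINE' as const },
    OutputDataConfig: { S3Uri: opts.outputS3Uri },
  };
}

type JobStatus =
  | 'SUBMITTED' | 'IN_PROGRESS' | 'COMPLETED'
  | 'FAILED' | 'STOP_REQUESTED' | 'STOPPED';

const TERMINAL: JobStatus[] = ['COMPLETED', 'FAILED', 'STOPPED'];

// getStatus would typically wrap DescribeSentimentDetectionJob and return
// SentimentDetectionJobProperties.JobStatus.
async function pollUntilFinished(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 30000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (TERMINAL.includes(status)) return status;
    // Not finished yet: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Sentiment job did not finish within the polling budget');
}
```

Callers should treat any terminal status other than COMPLETED as a failure before attempting to download the output.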

Parsing the Comprehend output `output.tar.gz`

sentiments/YYYYMMDD/uuid/
├── output/output.tar.gz  ← NDJSON inside a single extensionless file
└── uuid.txt               ← input

Line order is not guaranteed

Always sort by the Line field:

lines.sort((a, b) => a.Line - b.Line);

Merging with `meta.json`

meta.json holds id / speaker / startTimeSec / text and similar fields. After sorting by line number:

const segments = meta.map((m, i) => ({
  ...m,
  sentiment: lines[i]?.Sentiment ?? 'UNKNOWN',
  score:     lines[i]?.SentimentScore ?? defaultScore,
}));
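The index-based merge above relies on every input line having a result at the same array position. A slightly more defensive variant keys the results by `Line` instead, so a missing or failed line does not shift the ones after it. This is a sketch only: `mergeByLine` and the type names are hypothetical, `defaultScore` is given an all-zero value purely for illustration, and `Line` is assumed to be the zero-based index of the utterance in the .txt.

```typescript
interface SentimentScore {
  Positive: number;
  Negative: number;
  Neutral: number;
  Mixed: number;
}

interface SentimentLine {
  Line: number;
  Sentiment?: string;              // absent on error records
  SentimentScore?: SentimentScore; // absent on error records
}

interface MetaEntry {
  id: number;
  speaker: string;
  startTimeSec: number;
  text: string;
}

// Illustrative fallback; the article's actual defaultScore is not shown.
const defaultScore: SentimentScore = { Positive: 0, Negative: 0, Neutral: 0, Mixed: 0 };

function mergeByLine(meta: MetaEntry[], lines: SentimentLine[]) {
  // Look results up by their Line number rather than by array position.
  const byLine = new Map<number, SentimentLine>();
  for (const l of lines) byLine.set(l.Line, l);

  return meta.map((m, i) => {
    const hit = byLine.get(i); // assumption: Line is the 0-based line index
    return {
      ...m,
      sentiment: hit?.Sentiment ?? 'UNKNOWN',
      score: hit?.SentimentScore ?? defaultScore,
    };
  });
}
```

With this approach the explicit sort is no longer required for the merge itself, although sorting remains useful when the lines are consumed directly.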

Implementation example

import { fetch } from 'undici';
import * as zlib from 'zlib';
import * as tar from 'tar-stream';
import { logger } from '~/infrastructure/logger/logger';

async function fetchAndParseSentimentOutput(uri: string) {
  logger.debug('Fetching sentiment output from: %s', uri);

  // 1. Download
  const res = await fetch(uri);
  if (!res.ok) throw new Error(`Failed to fetch output.tar.gz: ${res.status}`);
  const arrayBuffer = await res.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);

  // 2. Decompress and extract the tar archive
  const extract = tar.extract();
  const sentiments: Array<{ Line: number; Sentiment: string; SentimentScore: any }> = [];

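  extract.on('entry', (header, stream, next) => {
    // The results live in a single extensionless file, so match on entry type rather than a file extension
    if (header.type === 'file') {
      const chunks: Buffer[] = [];
      stream.on('data', (buf: Buffer) => chunks.push(buf));
      stream.on('end', () => {
        // Decode once at the end so multi-byte characters split across chunks stay intact
        const body = Buffer.concat(chunks).toString('utf8');
        for (const line of body.split('\n').filter((l) => l.trim())) {
          try {
            sentiments.push(JSON.parse(line));
          } catch {
            // Skip lines that are not valid JSON, but leave a trace
            logger.debug('Skipping unparsable line in %s', header.name);
          }
        }
        next();
      });
    } else {
      // Skip directories and other non-file entries
      stream.resume();
      stream.on('end', next);
    }
  });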

  // gunzip with zlib, then extract with tar-stream
  await new Promise<void>((resolve, reject) => {
    extract.on('finish', resolve);
    extract.on('error', reject);
    zlib.gunzip(buffer, (err, result) => {
      if (err) return reject(err);
      extract.end(result);
    });
  });

  // 3. Sort by the Line field
  sentiments.sort((a, b) => a.Line - b.Line);
  logger.debug('Parsed %d sentiment records', sentiments.length);

  return sentiments;
}

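Summary

Keep in mind that the Transcribe JSON structure differs between channel_label (ch_1) and speaker_label (spk_1).
Before passing data to Comprehend, shape it into two files: a text file with one utterance per line, plus the metadata.
output.tar.gz contains a single NDJSON file; restore the line order by sorting.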
