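AI Solutions Advent Calendar 2025

Learning Agent-to-Agent Communication and MCP Through Multi-Agent System Development - 05: Anticipated Production Operations

Last updated at 2025-12-18 / Posted at 2025-12-18

Building an Enterprise-Grade Multi-Agent Environment with Azure Container Apps

Introduction

The previous articles covered Agent-to-Agent (A2A) communication and the Model Context Protocol (MCP), from theory through implementation to verifying actual behavior. This final installment explains how to deploy the system to Azure Container Apps with production operation in mind.

Rather than a plain list of deployment steps, it walks through how to build an architecture that meets the security, scalability, and operational maintainability requirements of an enterprise environment.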

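🏗️ Azure Architecture Design

Overall System Architecture

Roles of the Main Components

Component	Role	Characteristics
Container Apps	Runtime environment for the multi-agent application	Autoscaling, zero-downtime deployment
Azure OpenAI	GPT-4o-based AI inference engine (GPT-5.1 support planned)	Enterprise grade, advanced reasoning
Reasoning Coordinator	Coordination system for multi-agent reasoning	Dynamic task selection, AI-driven decisions
Cosmos DB	Agent coordination data and customer data	Global distribution, low latency
Storage Account	Files, logs, and backups	High availability, cost efficiency
Redis Cache	Real-time data cache	Fast in-memory access
Key Vault	Secret management	Zero Trust ready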

🚀 Infrastructure as Code Implementation

🎯 Learning point: enterprise-grade IaC design

A Detailed Look at the Bicep Templates

1. Main Template Structure

// infra/main.bicep
@description('Application name')
param applicationName string = 'multiagent'

@description('Environment name (dev/test/prod)')
param environment string

@description('Region for Azure OpenAI')
param openAiLocation string = 'eastus'

@description('Region to deploy to')
param location string = resourceGroup().location

@description('Deployment date; utcNow() is only allowed as a parameter default value')
param deploymentDate string = utcNow('yyyy-MM-dd')

// Generate resource names according to the naming convention
var resourceToken = toLower(uniqueString(subscription().id, resourceGroup().id, location))
var prefix = '${applicationName}-${environment}'

// Centralized tag management
var tags = {
  'azd-env-name': environment
  application: applicationName
  environment: environment
  'deployment-method': 'bicep'
  'last-updated': deploymentDate
}

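📚 Understanding the design patterns:

// 🏗️ Why the naming convention matters
// ❌ Bad: hand-written name
var badName = 'myapp-container-1'

// ✅ Good: systematic name
var goodName = '${applicationName}-${environment}-${resourceToken}'
// → Result: e.g. multiagent-prod-abc123 (deterministic, unique per resource group)

// 🏷️ Tagging strategy
// Business value:
// - Cost management: track spend per environment and per application
// - Operations: let automation scripts identify target resources
// - Audit: deployment history and governance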
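The naming pattern above can be prototyped outside Bicep to catch over-long names before a deployment fails. The following is a minimal Python sketch under stated assumptions: `resource_token` and `resource_name` are hypothetical helpers, the hash is SHA-256 rather than the algorithm behind Bicep's `uniqueString()`, and the limits in `MAX_LENGTHS` (24 characters for Key Vault, 32 for Container Apps) should be checked against the current Azure naming rules.

```python
import hashlib

# Assumed name-length limits per resource kind; verify against current Azure naming rules.
MAX_LENGTHS = {"kv": 24, "app": 32}


def resource_token(*parts: str, length: int = 13) -> str:
    """Return a deterministic short token (not the same algorithm as Bicep's uniqueString())."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:length]


def resource_name(app: str, env: str, kind: str, token: str) -> str:
    """Build '<app>-<env>-<kind>-<token>' and fail early if it exceeds the assumed limit."""
    name = f"{app}-{env}-{kind}-{token}".lower()
    limit = MAX_LENGTHS.get(kind)
    if limit is not None and len(name) > limit:
        raise ValueError(f"{name!r} is {len(name)} characters; '{kind}' allows {limit}")
    return name


if __name__ == "__main__":
    token = resource_token("subscription-id", "resource-group-id", "japaneast")
    print(resource_name("multiagent", "prod", "cosmos", token))
```

With a 13-character token, `multiagent-prod-kv-<token>` comes to 32 characters, so the check raises for the Key Vault kind; shortening the prefix or the token avoids that.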

2. Container Apps Environment

// Create the Container Apps environment
resource containerAppsEnvironment 'Microsoft.App/managedEnvironments@2023-05-01' = {
  name: '${prefix}-env-${resourceToken}'
  location: location
  tags: tags
  properties: {
    appLogsConfiguration: {
      destination: 'log-analytics'
      logAnalyticsConfiguration: {
        customerId: logAnalyticsWorkspace.properties.customerId
        sharedKey: logAnalyticsWorkspace.listKeys().primarySharedKey
      }
    }
    // VNet integration (recommended for production)
    vnetConfiguration: {
      infrastructureSubnetId: virtualNetwork.properties.subnets[0].id
      internal: false  // allow external access
    }
    // Custom domain configuration (DNS suffix)
    customDomainConfiguration: {
      dnsSuffix: '${prefix}.${location}.azurecontainer.io'
    }
  }
}

3. Data Storage Design

// Cosmos DB: agent coordination data
resource cosmosAccount 'Microsoft.DocumentDB/databaseAccounts@2023-04-15' = {
  name: '${prefix}-cosmos-${resourceToken}'
  location: location
  tags: tags
  kind: 'GlobalDocumentDB'
  properties: {
    databaseAccountOfferType: 'Standard'
    consistencyPolicy: {
      defaultConsistencyLevel: 'Session'  // balances consistency and performance
    }
    locations: [
      {
        locationName: location
        failoverPriority: 0
        isZoneRedundant: true  // high availability
      }
    ]
    // Security settings
    publicNetworkAccess: 'Enabled'
    networkAclBypass: 'AzureServices'
    disableKeyBasedMetadataWriteAccess: true
    
    // Backup settings
    backupPolicy: {
      type: 'Periodic'
      periodicModeProperties: {
        backupIntervalInMinutes: 240
        backupRetentionIntervalInHours: 720  // 30 days, the maximum for periodic backups
      }
    }
  }
}

// Cosmos DB database and containers
resource cosmosDatabase 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases@2023-04-15' = {
  parent: cosmosAccount
  name: 'AgentCoordination'
  properties: {
    resource: {
      id: 'AgentCoordination'
    }
    options: {
      throughput: 400  // low-cost setting for the initial rollout
    }
  }
}

// Container for agent communication logs
resource agentLogsContainer 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2023-04-15' = {
  parent: cosmosDatabase
  name: 'AgentLogs'
  properties: {
    resource: {
      id: 'AgentLogs'
      partitionKey: {
        paths: ['/agent_id']
        kind: 'Hash'
      }
      indexingPolicy: {
        indexingMode: 'consistent'
        includedPaths: [
          { path: '/timestamp/?' }
          { path: '/workflow_id/?' }
          { path: '/query_type/?' }
        ]
      }
      // Data lifecycle management
      defaultTtl: 2592000  // delete automatically after 30 days
    }
  }
}
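To make the container definition concrete, here is a minimal Python sketch of a log document shaped to match it: `agent_id` is the partition key, `timestamp`, `workflow_id`, and `query_type` are the indexed paths, and an optional per-item `ttl` overrides the 30-day container default. The helper name `build_agent_log` and the `message` field are assumptions for illustration; the actual write through the Cosmos DB SDK is left out.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# 30 days in seconds, matching the container's defaultTtl of 2592000
THIRTY_DAYS = 30 * 24 * 60 * 60


def build_agent_log(
    agent_id: str,
    workflow_id: str,
    query_type: str,
    message: str,
    ttl: Optional[int] = None,
) -> dict:
    """Build a log document that fits the AgentLogs container definition."""
    doc = {
        "id": str(uuid.uuid4()),
        "agent_id": agent_id,          # partition key (/agent_id)
        "workflow_id": workflow_id,    # indexed path
        "query_type": query_type,      # indexed path
        "timestamp": datetime.now(timezone.utc).isoformat(),  # indexed path
        "message": message,            # hypothetical payload field
    }
    if ttl is not None:
        doc["ttl"] = ttl  # per-item override of the container default, in seconds
    return doc


if __name__ == "__main__":
    print(build_agent_log("coordinator", "wf-001", "health", "system test"))
```

Choosing `agent_id` as the partition key keeps each agent's logs together, which suits per-agent queries but can concentrate writes if one agent is far busier than the others.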

4. AI Service Integration

// Azure OpenAI Service
resource openAiService 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: '${prefix}-openai-${resourceToken}'
  location: openAiLocation
  tags: tags
  kind: 'OpenAI'
  sku: {
    name: 'S0'  // pay-as-you-go
  }
  properties: {
    customSubDomainName: '${prefix}-openai-${resourceToken}'
    publicNetworkAccess: 'Enabled'
    networkAcls: {
      defaultAction: 'Allow'
      ipRules: []
      virtualNetworkRules: []
    }
  }
}

// GPT-4o deployment
resource gpt4Deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAiService
  name: 'gpt-4o'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-05-13'
    }
    raiPolicyName: 'Microsoft.Default'
    scaleSettings: {
      scaleType: 'Standard'
      capacity: 10  // low capacity for the initial rollout
    }
  }
}

// Text embedding model
resource embeddingDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAiService
  name: 'text-embedding-3-small'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'text-embedding-3-small'
      version: '1'
    }
    scaleSettings: {
      scaleType: 'Standard'
      capacity: 5
    }
  }
  dependsOn: [gpt4Deployment]  // deploy sequentially
}

5. Detailed Container App Configuration

// Main application Container App
resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: take('${prefix}-app-${resourceToken}', 32)  // Container App names are limited to 32 characters
  location: location
  tags: tags
  properties: {
    managedEnvironmentId: containerAppsEnvironment.id
    
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
        allowInsecure: false  // enforce HTTPS
        traffic: [
          {
            weight: 100
            latestRevision: true
          }
        ]
        corsPolicy: {
          allowedOrigins: ['*']  // restrict to known origins in production
          allowedMethods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS']
          allowedHeaders: ['*']
          allowCredentials: true
        }
      }
      
      // Registry settings
      registries: [
        {
          server: containerRegistry.properties.loginServer
          identity: userAssignedIdentity.id
        }
      ]
      
      // Secret management
      secrets: [
        {
          name: 'azure-openai-key'
          keyVaultUrl: '${keyVault.properties.vaultUri}secrets/azure-openai-key'
          identity: userAssignedIdentity.id
        }
        {
          name: 'cosmos-connection-string'
          keyVaultUrl: '${keyVault.properties.vaultUri}secrets/cosmos-connection-string'
          identity: userAssignedIdentity.id
        }
      ]
    }
    
    template: {
      containers: [
        {
          image: '${containerRegistry.properties.loginServer}/multiagent:latest'
          name: 'multiagent'
          resources: {
            cpu: json('1.0')    // 1 vCPU
            memory: '2Gi'       // 2GB RAM
          }
          
          // Environment variables
          env: [
            {
              name: 'AZURE_OPENAI_ENDPOINT'
              value: openAiService.properties.endpoint
            }
            {
              name: 'AZURE_OPENAI_API_KEY'
              secretRef: 'azure-openai-key'
            }
            {
              name: 'COSMOS_CONNECTION_STRING'
              secretRef: 'cosmos-connection-string'
            }
            {
              name: 'ENVIRONMENT'
              value: environment
            }
            {
              name: 'DEBUG'
              value: environment == 'dev' ? 'true' : 'false'
            }
          ]
          
          // Health probe settings
          probes: [
            {
              type: 'Liveness'
              httpGet: {
                path: '/health'
                port: 8000
                scheme: 'HTTP'
              }
              initialDelaySeconds: 30
              periodSeconds: 30
              timeoutSeconds: 10
              failureThreshold: 3
            }
            {
              type: 'Readiness'
              httpGet: {
                path: '/health'
                port: 8000
                scheme: 'HTTP'
              }
              initialDelaySeconds: 5
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
            }
          ]
        }
      ]
      
      // Autoscale settings
      scale: {
        minReplicas: environment == 'prod' ? 2 : 1
        maxReplicas: environment == 'prod' ? 10 : 3
        rules: [
          {
            name: 'http-scaling'
            http: {
              metadata: {
                concurrentRequests: '100'
              }
            }
          }
          {
            name: 'cpu-scaling'
            custom: {
              type: 'cpu'
              metadata: {
                type: 'Utilization'
                value: '70'
              }
            }
          }
        ]
      }
    }
  }
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${userAssignedIdentity.id}': {}
    }
  }
}

6. Security and Access Management

// User-assigned managed identity
resource userAssignedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: '${prefix}-identity-${resourceToken}'
  location: location
  tags: tags
}

// Key Vault
resource keyVault 'Microsoft.KeyVault/vaults@2023-02-01' = {
  name: take('kv-${environment}-${resourceToken}', 24)  // Key Vault names are limited to 24 characters
  location: location
  tags: tags
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: tenant().tenantId
    
    // Access policies
    accessPolicies: [
      {
        tenantId: tenant().tenantId
        objectId: userAssignedIdentity.properties.principalId
        permissions: {
          secrets: ['get', 'list']
          certificates: ['get', 'list']
        }
      }
    ]
    
    // Security settings
    enabledForDeployment: false
    enabledForTemplateDeployment: true
    enabledForDiskEncryption: false
    enableRbacAuthorization: false
    enableSoftDelete: true
    softDeleteRetentionInDays: 90
    enablePurgeProtection: true
    
    // Network access control
    networkAcls: {
      defaultAction: 'Allow'  // 'Deny' is recommended for production
      bypass: 'AzureServices'
    }
  }
}

// Store secrets
resource openAiKeySecret 'Microsoft.KeyVault/vaults/secrets@2023-02-01' = {
  parent: keyVault
  name: 'azure-openai-key'
  properties: {
    value: openAiService.listKeys().key1
    contentType: 'text/plain'
    attributes: {
      enabled: true
    }
  }
}

resource cosmosConnectionSecret 'Microsoft.KeyVault/vaults/secrets@2023-02-01' = {
  parent: keyVault
  name: 'cosmos-connection-string'
  properties: {
    value: cosmosAccount.listConnectionStrings().connectionStrings[0].connectionString
    contentType: 'text/plain'
    attributes: {
      enabled: true
    }
  }
}

Parameter File Settings

// infra/main.parameters.json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "applicationName": {
      "value": "multiagent"
    },
    "environment": {
      "value": "prod"
    },
    "location": {
      "value": "japaneast"
    },
    "openAiLocation": {
      "value": "eastus"
    }
  }
}

🔧 Deployment with the Azure Developer CLI

1. azd Configuration File

# azure.yaml
name: multiagent-system
metadata:
  template: multiagent-system@0.0.1-beta
  
services:
  multiagent:
    project: .
    language: python
    host: containerapp
    
# Hook settings
hooks:
  preprovision:
    posix:
      shell: sh
      run: echo "Starting resource provisioning..."
      continueOnError: false
      
  postprovision:
    posix:
      shell: sh
      run: |
        echo "Running additional configuration..."
        # Run custom configuration scripts here
        
  predeploy:
    posix:
      shell: sh
      run: |
        echo "Pre-deploy steps..."
        python -m pytest tests/ --maxfail=1
        
  postdeploy:
    posix:
      shell: sh
      run: |
        echo "Post-deploy verification..."
        python scripts/health_check.py

2. Docker Image Optimization

# Dockerfile - optimized for production
FROM python:3.12-slim AS builder

# Security updates
RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Production image
FROM python:3.12-slim

# Create a non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

# Security updates
RUN apt-get update && apt-get upgrade -y && \
    rm -rf /var/lib/apt/lists/*

# Copy Python dependencies
COPY --from=builder /root/.local /home/appuser/.local

# Application code
WORKDIR /app
COPY src/ ./src/
COPY *.py ./

# Set ownership
RUN chown -R appuser:appuser /app
USER appuser

# PATH setup
ENV PATH=/home/appuser/.local/bin:$PATH

# Expose port
EXPOSE 8000

# Health check (urlopen raises on connection errors and HTTP error statuses, so failures exit non-zero)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"

# Start the application
CMD ["python", "src/main.py"]

3. Deployment Script

🎯 Learning point: automating enterprise deployments

#!/bin/bash
# deploy.sh - production deployment script

set -e

echo "🚀 Multi-Agent System - starting production deployment"

# Check the environment
if [ -z "$AZURE_ENV_NAME" ]; then
    echo "Set the environment name: export AZURE_ENV_NAME=prod"
    exit 1
fi

# Check Azure authentication
if ! az account show &> /dev/null; then
    echo "Log in to Azure: az login"
    exit 1
fi

# Initialize the azd environment
echo "📋 Checking environment settings..."
azd env select "$AZURE_ENV_NAME" 2>/dev/null || azd env new "$AZURE_ENV_NAME"

# Provision resources
echo "🏗️  Provisioning Azure resources..."
azd provision --no-prompt

# Build and deploy the application
echo "📦 Deploying the application..."
azd deploy --no-prompt

# Post-deploy verification
echo "✅ Post-deploy verification..."
ENDPOINT=$(azd env get-values | grep "SERVICE_MULTIAGENT_ENDPOINT_URL" | cut -d'=' -f2 | tr -d '"')

if [ -n "$ENDPOINT" ]; then
    echo "🌐 Endpoint: $ENDPOINT"
    
    # Health check
    echo "💊 Running health check..."
    if curl -sf "$ENDPOINT/health" > /dev/null; then
        echo "✅ Application is running normally"
    else
        echo "❌ Health check failed"
        exit 1
    fi
    
    # Basic functional test
    echo "🧪 Basic functional test..."
    response=$(curl -s -X POST "$ENDPOINT/coordinator/orchestrate" \
        -H "Content-Type: application/json" \
        -d '{"query": "system test", "query_type": "health"}')
    
    if echo "$response" | grep -q "workflow_id"; then
        echo "✅ Agent coordination verified"
    else
        echo "❌ Functional test failed"
        echo "$response"
        exit 1
    fi
    
else
    echo "❌ Failed to obtain endpoint information"
    exit 1
fi

echo "🎉 Deployment complete!"
echo "📊 Azure Portal: https://portal.azure.com"
echo "📖 API docs: $ENDPOINT/docs"

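📚 Understanding the deployment process step by step:

# 🔐 Step 1: Authentication and environment check
# Why it matters → prevents deploying to the wrong environment
az account show    # confirm the active subscription
azd env list       # list available environments

# 🏗️ Step 2: Infrastructure provisioning
azd provision --no-prompt
# What happens internally:
# 1. Bicep template compilation → converted to ARM JSON
# 2. Dependency resolution → resources created in the right order
# 3. Parameter validation → values checked for validity
# 4. Incremental deployment → resources created step by step

# 📦 Step 3: Application deployment
azd deploy --no-prompt
# What happens internally:
# 1. Docker image build → optimized multi-stage build
# 2. Push to Container Registry → image stored securely
# 3. Container Apps update → a new revision is created and traffic moves to it once it is ready
# 4. Configuration injection → secrets resolved from Key Vault automatically

# ✅ Step 4: Verification and health check
curl "$ENDPOINT/health"
# What is verified:
# - HTTP responsiveness → the web server is running
# - Agent initialization state → all four agents are ready
# - External connectivity → Azure OpenAI and Cosmos DB connections
# - Memory and CPU usage → resource consumption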
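The verification items listed for Step 4 suggest what the `/health` endpoint has to report. The following is a minimal sketch of how such a response could be assembled, assuming a hypothetical helper `build_health_response` that takes the result of each dependency check; the article's actual handler is not shown in this section. It returns HTTP 200 with `"status": "healthy"` only when every check passes, which is what both the container probes and the verification script key on.

```python
from typing import Dict, Tuple


def build_health_response(checks: Dict[str, bool]) -> Tuple[int, dict]:
    """Combine individual dependency checks into an HTTP status code and JSON body."""
    healthy = all(checks.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "checks": {name: ("ok" if ok else "failed") for name, ok in checks.items()},
    }
    # 503 makes liveness/readiness probes and `curl -f` treat the instance as failing
    return (200 if healthy else 503), body


if __name__ == "__main__":
    status_code, payload = build_health_response(
        {"agents_initialized": True, "azure_openai": True, "cosmos_db": False}
    )
    print(status_code, payload)
```

Returning a non-2xx status on failure matters more than the body: probes and `curl -sf` look only at the status code.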

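🔧 Troubleshooting approach:

# Diagnostic commands when a deployment fails
azd env get-values                    # check environment values
az containerapp logs show --name <app-name> --resource-group <resource-group> --follow    # stream logs in real time
az monitor app-insights events show --app <app-insights-name> --resource-group <resource-group> --type exceptions   # Application Insights events
az resource list --tag azd-env-name=prod  # list related resources

# Common problems and fixes:
# ❌ Insufficient permission to create resources → check roles with az role assignment list --assignee <principal-id>
# ❌ Region restrictions → change the openAiLocation parameter
# ❌ Name conflicts → regenerate resourceToken (delete and recreate the resource group)
# ❌ Insufficient capacity → adjust the SKU size (development → standard)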

📊 Operations and Monitoring Setup

1. Application Insights Integration

# src/shared/telemetry.py
from contextlib import contextmanager
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
import os

class TelemetryManager:
    def __init__(self):
        self.connection_string = os.getenv('APPLICATIONINSIGHTS_CONNECTION_STRING')
        self.tracer = None
        
    def setup_telemetry(self, app):
        """Configure Azure Monitor telemetry."""
        if self.connection_string:
            # Azure Monitor configuration
            configure_azure_monitor(
                connection_string=self.connection_string,
                disable_logging=False,
                disable_metrics=False,
                disable_tracing=False
            )
            
            # Automatic instrumentation
            FastAPIInstrumentor.instrument_app(app)
            RequestsInstrumentor().instrument()
            
            # Custom tracer
            self.tracer = trace.get_tracer(__name__)
            
    @contextmanager
    def trace_agent_communication(self, agent_name: str, operation: str):
        """Trace agent-to-agent communication; use it in a with-statement."""
        if self.tracer is None:
            yield None  # telemetry disabled: run the block without a span
            return
        # The span stays open for the duration of the caller's with-block
        with self.tracer.start_as_current_span(f"agent_{operation}") as span:
            span.set_attribute("agent.name", agent_name)
            span.set_attribute("operation.type", operation)
            yield span

# Usage example
telemetry = TelemetryManager()

# Setup in src/main.py
from src.shared.telemetry import telemetry

@app.on_event("startup")
async def startup_event():
    telemetry.setup_telemetry(app)
    # ... other initialization

2. Custom Metric Definitions

# src/shared/metrics.py
from azure.monitor.opentelemetry.exporter import AzureMonitorMetricExporter
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
import os  # needed for os.getenv below

class AgentMetrics:
    def __init__(self):
        # Azure Monitor metrics setup
        exporter = AzureMonitorMetricExporter(
            connection_string=os.getenv('APPLICATIONINSIGHTS_CONNECTION_STRING')
        )
        reader = PeriodicExportingMetricReader(exporter, export_interval_millis=60000)
        metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
        
        meter = metrics.get_meter(__name__)
        
        # Custom metric definitions
        self.agent_requests = meter.create_counter(
            name="agent_requests_total",
            description="Total number of requests to agents",
            unit="request"
        )
        
        self.agent_response_time = meter.create_histogram(
            name="agent_response_time",
            description="Agent response time",
            unit="ms"
        )
        
        self.agent_errors = meter.create_counter(
            name="agent_errors_total", 
            description="Total number of agent errors",
            unit="error"
        )
        
        self.active_workflows = meter.create_up_down_counter(
            name="active_workflows",
            description="Number of workflows in progress",
            unit="workflow"
        )
    
    def record_request(self, agent_name: str, query_type: str):
        """Record a request."""
        self.agent_requests.add(1, {
            "agent": agent_name,
            "query_type": query_type
        })
    
    def record_response_time(self, agent_name: str, duration_ms: float):
        """Record a response time."""
        self.agent_response_time.record(duration_ms, {
            "agent": agent_name
        })
    
    def record_error(self, agent_name: str, error_type: str):
        """Record an error."""
        self.agent_errors.add(1, {
            "agent": agent_name,
            "error_type": error_type
        })
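One way to feed these three recording methods consistently is to wrap each agent call in a small timing helper. The sketch below is an assumption about how that could look, not code from the original project: `timed_agent_call` is a hypothetical context manager that accepts any object exposing `record_request`, `record_response_time`, and `record_error` with the signatures shown above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_agent_call(metrics, agent_name: str, query_type: str):
    """Count the request, time the wrapped block, and record errors if it raises."""
    metrics.record_request(agent_name, query_type)
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Record the exception class name as the error type, then re-raise
        metrics.record_error(agent_name, type(exc).__name__)
        raise
    finally:
        # Runs on success and on failure, so every call gets a duration
        duration_ms = (time.perf_counter() - start) * 1000.0
        metrics.record_response_time(agent_name, duration_ms)
```

A call site would then read `with timed_agent_call(agent_metrics, "coordinator", "health"): ...`, which keeps the metric labels identical across agents.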

3. Alert Configuration

// monitoring/alerts.json - Azure Monitor alert definitions
{
  "alerts": [
    {
      "name": "High Error Rate",
      "description": "Agent error rate exceeded the threshold",
      "condition": {
        "metric": "agent_errors_total",
        "threshold": 10,
        "timeWindow": "PT5M",
        "operator": "GreaterThan"
      },
      "actions": [
        {
          "type": "email",
          "recipients": ["admin@company.com"]
        },
        {
          "type": "webhook",
          "url": "https://hooks.slack.com/services/..."
        }
      ]
    },
    {
      "name": "High Response Time",
      "description": "Response time exceeded the threshold",
      "condition": {
        "metric": "agent_response_time",
        "threshold": 5000,
        "timeWindow": "PT10M",
        "operator": "GreaterThan"
      }
    },
    {
      "name": "Container App Down",
      "description": "Container App is down",
      "condition": {
        "metric": "RevisionReadyReplicas",
        "threshold": 1,
        "timeWindow": "PT1M",
        "operator": "LessThan"
      }
    }
  ]
}
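To show how a `condition` object in this file is meant to be read, here is a small Python sketch that evaluates one against an observed value and converts the ISO 8601 `timeWindow` into seconds. The function names are hypothetical, and Azure Monitor performs this evaluation itself; the sketch is only useful for checking the definitions locally, for example in a unit test.

```python
import operator
import re

# Comparison operators used in the alert definitions
OPERATORS = {"GreaterThan": operator.gt, "LessThan": operator.lt}

# Matches simple ISO 8601 time-only durations such as PT5M, PT1H30M, PT45S
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def window_seconds(time_window: str) -> int:
    """Convert a duration like 'PT5M' into seconds."""
    match = _DURATION.match(time_window)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {time_window!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def is_alert_firing(condition: dict, observed: float) -> bool:
    """Return True when the observed value violates the alert condition."""
    compare = OPERATORS[condition["operator"]]
    return compare(observed, condition["threshold"])


if __name__ == "__main__":
    high_error_rate = {
        "metric": "agent_errors_total",
        "threshold": 10,
        "timeWindow": "PT5M",
        "operator": "GreaterThan",
    }
    print(window_seconds(high_error_rate["timeWindow"]), is_alert_firing(high_error_rate, 12))
```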

🔐 Security Settings

1. Network Security

// Network isolation through VNet integration
resource virtualNetwork 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: '${prefix}-vnet-${resourceToken}'
  location: location
  tags: tags
  properties: {
    addressSpace: {
      addressPrefixes: ['10.0.0.0/16']
    }
    subnets: [
      {
        name: 'container-subnet'
        properties: {
          addressPrefix: '10.0.1.0/24'
          delegations: [
            {
              name: 'Microsoft.App/environments'
              properties: {
                serviceName: 'Microsoft.App/environments'
              }
            }
          ]
        }
      }
      {
        name: 'private-endpoints-subnet'
        properties: {
          addressPrefix: '10.0.2.0/24'
          privateEndpointNetworkPolicies: 'Disabled'
        }
      }
    ]
  }
}

// Private Endpoint for Cosmos DB
resource cosmosPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: '${prefix}-cosmos-pe-${resourceToken}'
  location: location
  tags: tags
  properties: {
    subnet: {
      id: virtualNetwork.properties.subnets[1].id
    }
    privateLinkServiceConnections: [
      {
        name: 'cosmos-private-link'
        properties: {
          privateLinkServiceId: cosmosAccount.id
          groupIds: ['Sql']
        }
      }
    ]
  }
}
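Address plans like the one above are easy to get wrong when subnets are added later. As a sanity check, the following Python sketch uses the standard `ipaddress` module to confirm that each subnet lies inside the VNet address space and that no two subnets overlap. The helper name `validate_subnets` is an assumption for illustration.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List


def validate_subnets(address_space: str, subnets: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    vnet = ipaddress.ip_network(address_space)
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, net in networks.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


if __name__ == "__main__":
    print(validate_subnets(
        "10.0.0.0/16",
        {"container-subnet": "10.0.1.0/24", "private-endpoints-subnet": "10.0.2.0/24"},
    ))
```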

2. Access Control

# src/shared/security.py
from fastapi import HTTPException, Depends, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
import jwt
import os

security = HTTPBearer()

class SecurityManager:
    def __init__(self):
        self.kv_url = os.getenv('KEY_VAULT_URL')
        self.credential = DefaultAzureCredential()
        self.secret_client = SecretClient(
            vault_url=self.kv_url,
            credential=self.credential
        ) if self.kv_url else None
        # Signing keys published for the tenant; used to verify token signatures
        tenant_id = os.getenv('AZURE_TENANT_ID')
        self.jwks_client = jwt.PyJWKClient(
            f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"
        )
    
    async def verify_token(
        self, 
        credentials: HTTPAuthorizationCredentials = Depends(security)
    ):
        """Verify the JWT bearer token."""
        try:
            token = credentials.credentials
            
            # Verify the Azure AD token: signature, expiry, and audience.
            # Skipping signature verification would let anyone forge a token.
            signing_key = self.jwks_client.get_signing_key_from_jwt(token)
            payload = jwt.decode(
                token,
                signing_key.key,
                algorithms=["RS256"],
                audience=os.getenv('AZURE_CLIENT_ID')
            )
            
            return payload
            
        except jwt.ExpiredSignatureError:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="Token has expired"
            )
        except jwt.PyJWTError:  # PyJWT's base exception class
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="Invalid token"
            )
    
    async def check_admin_role(
        self,
        credentials: HTTPAuthorizationCredentials = Depends(security)
    ):
        """Require the Admin role."""
        # Call verify_token on the instance so FastAPI never sees `self` as a request parameter
        user_info = await self.verify_token(credentials)
        roles = user_info.get('roles', [])
        if 'Admin' not in roles:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail="Admin role required"
            )
        return user_info

# Usage example (assumes `app = FastAPI()` is defined in this module)
from datetime import datetime

security_manager = SecurityManager()

@app.post("/admin/reset-system")
async def reset_system(
    user_info = Depends(security_manager.check_admin_role)
):
    """System reset available to administrators only."""
    try:
        # Check the system state
        system_status = await check_system_health()
        
        if system_status.get("active_operations", 0) > 0:
            return {
                "error": "Cannot reset while operations are active",
                "active_operations": system_status["active_operations"]
            }
        
        # Ask each agent to shut down safely
        agents = ["customer", "city", "enterprise", "coordinator"]
        shutdown_results = {}
        
        for agent in agents:
            try:
                result = await request_agent_shutdown(agent)
                shutdown_results[agent] = result
            except Exception as e:
                shutdown_results[agent] = {"error": str(e)}
        
        # Clear the system state
        await clear_system_state()
        
        # Record the audit log
        await log_admin_action(
            user_id=user_info.get("user_id"),
            action="system_reset",
            timestamp=datetime.now().isoformat(),
            details=shutdown_results
        )
        
        return {
            "status": "success",
            "message": "System reset completed",
            "shutdown_results": shutdown_results,
            "reset_by": user_info.get("user_id")
        }
        
    except Exception as e:
        # Record the error log
        await log_error("system_reset_failed", str(e))
        return {
            "error": f"System reset failed: {str(e)}",
            "timestamp": datetime.now().isoformat()
        }

📈 Scaling Strategy

1. Autoscaling Settings

# Scaling settings for high load
scale:
  minReplicas: 2  # at least 2 instances (high availability)
  maxReplicas: 20  # at most 20 instances
  rules:
    # Based on HTTP load
    - name: http-scaling
      http:
        metadata:
          concurrentRequests: '50'  # 50 requests per instance
          
    # Based on CPU utilization
    - name: cpu-scaling
      custom:
        type: 'cpu'
        metadata:
          type: 'Utilization'
          value: '70'  # scale out at 70% CPU
          
    # Based on memory utilization
    - name: memory-scaling
      custom:
        type: 'memory'
        metadata:
          type: 'Utilization'  
          value: '80'  # scale out at 80% memory
          
    # Based on a custom metric
    - name: queue-scaling
      custom:
        type: 'azure-servicebus'
        metadata:
          queueName: 'agent-queue'
          messageCount: '10'  # scale out at 10 messages in the queue
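When several rules are defined, the rule asking for the most replicas determines the target, bounded by `minReplicas` and `maxReplicas`. The Python sketch below illustrates that arithmetic for the HTTP, CPU, and memory rules above. It is a simplification based on the usual horizontal-autoscaler formula and is not the platform's actual implementation: it ignores cooldown periods, stabilization windows, and the queue rule, and `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(
    current: int,
    concurrent_requests: int,
    cpu_percent: float,
    memory_percent: float,
    *,
    min_replicas: int = 2,
    max_replicas: int = 20,
) -> int:
    """Estimate the replica count requested by the HTTP, CPU, and memory rules."""
    by_http = math.ceil(concurrent_requests / 50)         # 50 concurrent requests per replica
    by_cpu = math.ceil(current * cpu_percent / 70)        # target: 70% average CPU
    by_memory = math.ceil(current * memory_percent / 80)  # target: 80% average memory
    wanted = max(by_http, by_cpu, by_memory)              # the most demanding rule wins
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 400 concurrent requests on 2 replicas at moderate CPU and memory
    print(desired_replicas(2, 400, 35, 40))
```

For the sample call, the HTTP rule asks for 8 replicas while the CPU and memory rules ask for 1 each, so the estimate is 8.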

2. Database Scaling

// Cosmos DB autoscale settings
resource cosmosDatabase 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases@2023-04-15' = {
  parent: cosmosAccount
  name: 'AgentCoordination'
  properties: {
    resource: {
      id: 'AgentCoordination'
    }
    options: {
      autoscaleSettings: {
        maxThroughput: 4000  // up to 4000 RU/s
      }
    }
  }
}
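With autoscale, Cosmos DB moves throughput between 10% of `maxThroughput` and the maximum itself, so the setting above scales between 400 and 4000 RU/s. The short Python sketch below makes that range explicit. `autoscale_range` is a hypothetical helper, and the validation assumes the maximum must be at least 1000 RU/s in steps of 1000; confirm both against the current Cosmos DB documentation.

```python
from typing import Tuple


def autoscale_range(max_throughput: int) -> Tuple[int, int]:
    """Return the (minimum, maximum) RU/s for a given autoscale maxThroughput."""
    # Assumed constraint: at least 1000 RU/s, in increments of 1000
    if max_throughput < 1000 or max_throughput % 1000 != 0:
        raise ValueError("maxThroughput must be a multiple of 1000, minimum 1000")
    return max_throughput // 10, max_throughput


if __name__ == "__main__":
    print(autoscale_range(4000))
```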

🔄 CI/CD Pipeline

1. GitHub Actions Workflow

# .github/workflows/deploy-production.yml
name: Deploy to Production

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
  AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
          
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest pytest-asyncio pytest-cov
          
      - name: Run tests
        run: |
          pytest tests/ --cov=src --cov-report=xml
          
      - name: Code quality check
        run: |
          pip install black flake8 mypy
          black --check src/
          flake8 src/
          mypy src/

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Security vulnerability scan
        run: |
          pip install safety bandit
          safety check -r requirements.txt
          bandit -r src/

  deploy:
    needs: [test, security-scan]
    runs-on: ubuntu-latest
    environment: production
    permissions:
      id-token: write   # required for OIDC login with azure/login
      contents: read
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Azure CLI
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          
      - name: Setup Azure Developer CLI
        uses: Azure/setup-azd@v0.1.0
        
      - name: Deploy to Azure
        run: |
          azd env select production
          azd deploy --no-prompt
          
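      - name: Post-deployment verification
        env:
          # Assumption: the endpoint URL is stored as a repository variable named DEPLOYMENT_ENDPOINT
          DEPLOYMENT_ENDPOINT: ${{ vars.DEPLOYMENT_ENDPOINT }}
        run: |
          # Post-deploy health check (the script needs aiohttp)
          pip install aiohttp
          python scripts/verify_deployment.py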

2. Post-Deployment Verification Script

# scripts/verify_deployment.py
import asyncio
import aiohttp
import os
import sys
from typing import List, Dict

class DeploymentVerifier:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip('/')
        self.session = None
    
    async def __aenter__(self):
        self.session = aiohttp.ClientSession()
        return self
    
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            await self.session.close()
    
    async def verify_health(self) -> bool:
        """Health check."""
        try:
            async with self.session.get(f"{self.base_url}/health") as response:
                if response.status == 200:
                    data = await response.json()
                    return data.get('status') == 'healthy'
                return False
        except Exception as e:
            print(f"Health check failed: {e}")
            return False
    
    async def verify_agents(self) -> Dict[str, bool]:
        """Check that each agent is responding."""
        agents = [
            'customer-support',
            'smart-city', 
            'enterprise',
            'coordinator'
        ]
        
        results = {}
        for agent in agents:
            try:
                async with self.session.get(f"{self.base_url}/agents/{agent}/status") as response:
                    results[agent] = response.status == 200
            except Exception as e:
                print(f"Agent {agent} check failed: {e}")
                results[agent] = False
        
        return results
    
    async def verify_coordination(self) -> bool:
        """Check that agent coordination works."""
        payload = {
            "query": "deployment verification test",
            "query_type": "system_test"
        }
        
        try:
            async with self.session.post(
                f"{self.base_url}/coordinator/orchestrate",
                json=payload
            ) as response:
                if response.status == 200:
                    data = await response.json()
                    return 'workflow_id' in data
                return False
        except Exception as e:
            print(f"Coordination test failed: {e}")
            return False

async def main():
    endpoint = os.getenv('DEPLOYMENT_ENDPOINT')
    if not endpoint:
        print("ERROR: DEPLOYMENT_ENDPOINT environment variable not set")
        sys.exit(1)
    
    print(f"🔍 Verifying deployment: {endpoint}")
    
    async with DeploymentVerifier(endpoint) as verifier:
        # Health check
        print("1. Health check...")
        health_ok = await verifier.verify_health()
        print(f"   {'✅' if health_ok else '❌'} Health: {'OK' if health_ok else 'FAILED'}")
        
        if not health_ok:
            print("❌ Deployment verification failed at health check")
            sys.exit(1)
        
        # Agent check
        print("2. Agent status check...")
        agent_results = await verifier.verify_agents()
        for agent, status in agent_results.items():
            print(f"   {'✅' if status else '❌'} {agent}: {'OK' if status else 'FAILED'}")
        
        if not all(agent_results.values()):
            print("❌ Some agents are not responding")
            sys.exit(1)
        
        # Coordination check
        print("3. Coordination test...")
        coord_ok = await verifier.verify_coordination()
        print(f"   {'✅' if coord_ok else '❌'} Coordination: {'OK' if coord_ok else 'FAILED'}")
        
        if not coord_ok:
            print("❌ Agent coordination test failed")
            sys.exit(1)
    
    print("🎉 Deployment verification completed successfully!")

if __name__ == "__main__":
    asyncio.run(main())
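
A freshly deployed revision can take a little while to start answering requests, so a single health probe may fail even when the deployment is fine. The sketch below shows one way to retry an async check with exponential backoff before giving up. The helper name `retry_async`, its parameters, and the default delays are illustrative assumptions and are not part of the verification script above.

```python
import asyncio
from typing import Awaitable, Callable


async def retry_async(
    check: Callable[[], Awaitable[bool]],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
) -> bool:
    """Call `check` until it returns True or `attempts` is exhausted.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if await check():
            return True
        if attempt < attempts:
            # Wait before the next probe, growing the delay each time.
            await asyncio.sleep(delay)
            delay *= backoff
    return False
```

Inside `main()`, the health check could then be wrapped as `health_ok = await retry_async(verifier.verify_health)`, so that only a persistent failure aborts the verification.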

💰 Cost Optimization

1. Monitoring Resource Usage

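// Example cost-analysis queries (KQL)
// Assumption: exported cost records are ingested into a table, here called CostData (hypothetical name)

// Daily cost per service over the last 30 days
CostData
| where Date >= ago(30d)
| summarize TotalCost = sum(CostInBillingCurrency) by bin(Date, 1d), ServiceName
| render timechart

// Top 10 resources by cost
CostData
| summarize TotalCost = sum(CostInBillingCurrency) by ResourceName
| order by TotalCost desc
| take 10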
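
If you work from a cost export file rather than a query console, the same two aggregations can be sketched in plain Python. The field names (`Date`, `ServiceName`, `ResourceName`, `CostInBillingCurrency`) follow the queries above; the function names and the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Mapping, Tuple


def daily_cost_by_service(records: Iterable[Mapping]) -> dict:
    """Sum CostInBillingCurrency per (date, service) pair."""
    totals = defaultdict(float)
    for row in records:
        key = (row["Date"], row["ServiceName"])
        totals[key] += float(row["CostInBillingCurrency"])
    return dict(totals)


def top_resources(records: Iterable[Mapping], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n most expensive resources as (name, total_cost) tuples."""
    totals = defaultdict(float)
    for row in records:
        totals[row["ResourceName"]] += float(row["CostInBillingCurrency"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Sample rows are invented for illustration only.
sample = [
    {"Date": "2025-12-01", "ServiceName": "Container Apps",
     "ResourceName": "agents", "CostInBillingCurrency": 2.5},
    {"Date": "2025-12-01", "ServiceName": "Container Apps",
     "ResourceName": "coordinator", "CostInBillingCurrency": 1.5},
    {"Date": "2025-12-01", "ServiceName": "Cosmos DB",
     "ResourceName": "agent-db", "CostInBillingCurrency": 8.0},
]

for (date, service), cost in daily_cost_by_service(sample).items():
    print(date, service, cost)
print(top_resources(sample, n=2))
```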

2. Automated Optimization Script

# scripts/cost_optimizer.py
import asyncio
from azure.identity import DefaultAzureCredential
from azure.mgmt.consumption import ConsumptionManagementClient
from azure.mgmt.monitor import MonitorManagementClient
import datetime

class CostOptimizer:
    def __init__(self, subscription_id: str):
        self.subscription_id = subscription_id
        self.credential = DefaultAzureCredential()
        self.consumption_client = ConsumptionManagementClient(
            self.credential, subscription_id
        )
    
    async def analyze_usage_patterns(self):
        """Analyze usage patterns over the last 30 days."""
        end_date = datetime.datetime.now()
        start_date = end_date - datetime.timedelta(days=30)
        
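        # Fetch usage details.
        # usage_details.list() takes an OData filter rather than
        # start_date/end_date keyword arguments. The filter fields below are
        # an assumption; verify them against your azure-mgmt-consumption version.
        usage_details = self.consumption_client.usage_details.list(
            scope=f"/subscriptions/{self.subscription_id}",
            filter=(
                f"properties/usageStart ge '{start_date.strftime('%Y-%m-%d')}' "
                f"and properties/usageEnd le '{end_date.strftime('%Y-%m-%d')}'"
            )
        )
        
        # Aggregate cost per service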
        costs_by_service = {}
        for usage in usage_details:
            service = usage.consumed_service
            cost = usage.cost
            costs_by_service[service] = costs_by_service.get(service, 0) + cost
        
        return costs_by_service
    
    async def recommend_optimizations(self):
        """Suggest optimizations based on the per-service costs."""
        costs = await self.analyze_usage_patterns()
        recommendations = []
        
        # Container Apps: suggest a saving if it appears in the cost data
        if 'Microsoft.App' in costs:
            recommendations.append({
                "service": "Container Apps",
                "current_cost": costs['Microsoft.App'],
                "recommendation": "Set min replicas to 1 during low-usage hours",
                "potential_savings": costs['Microsoft.App'] * 0.3
            })
        
        # Cosmos DB: suggest a saving if it appears in the cost data
        if 'Microsoft.DocumentDB' in costs:
            recommendations.append({
                "service": "Cosmos DB", 
                "current_cost": costs['Microsoft.DocumentDB'],
                "recommendation": "Reduce RU/s during low-usage hours",
                "potential_savings": costs['Microsoft.DocumentDB'] * 0.2
            })
        
        return recommendations

# Example usage
async def main():
    optimizer = CostOptimizer(subscription_id="your-subscription-id")
    recommendations = await optimizer.recommend_optimizations()
    for rec in recommendations:
        print(f"💰 {rec['service']}: {rec['recommendation']}")
        print(f"   Potential savings: ${rec['potential_savings']:.2f}/month")

if __name__ == "__main__":
    asyncio.run(main())
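
Because the recommendation logic above is tied to the Azure client, it is hard to test without a subscription. One possible refactoring, sketched here, moves the rules into a plain table and a pure function that takes a cost dictionary. The names `RULES` and `build_recommendations` are hypothetical, and the savings rates simply mirror the fixed 30% and 20% estimates used above; they are rough assumptions, not measured values.

```python
from typing import Dict, List

# Resource provider namespace -> (display name, recommendation, assumed savings rate).
# The rates are placeholder estimates carried over from the script above.
RULES = {
    "Microsoft.App": (
        "Container Apps",
        "Set min replicas to 1 during low-usage hours",
        0.3,
    ),
    "Microsoft.DocumentDB": (
        "Cosmos DB",
        "Reduce RU/s during low-usage hours",
        0.2,
    ),
}


def build_recommendations(costs: Dict[str, float]) -> List[dict]:
    """Return one recommendation per service that appears in `costs`."""
    recommendations = []
    for namespace, (name, text, rate) in RULES.items():
        if namespace in costs:
            recommendations.append({
                "service": name,
                "current_cost": costs[namespace],
                "recommendation": text,
                "potential_savings": costs[namespace] * rate,
            })
    return recommendations
```

With this split, `recommend_optimizations()` would only need to fetch the costs and pass them to `build_recommendations()`, and the rules can be unit-tested with hand-written dictionaries.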

🎯 Operational Best Practices

1. Environment Management

#!/bin/bash
# Per-environment deployment script

deploy_to_environment() {
    local env=$1
    local resource_group="multiagent-${env}-rg"
    
    echo "🚀 Deploying to ${env} environment..."
    
    # Per-environment parameters file
    local params_file="infra/main.parameters.${env}.json"
    
    if [ ! -f "$params_file" ]; then
        echo "❌ Parameters file not found: $params_file"
        exit 1
    fi
    
    # Run the deployment
    azd env select "$env"
    azd provision --no-prompt
    azd deploy --no-prompt
    
    # Per-environment post-processing
    case $env in
        "dev")
            echo "🔧 Setting up development configuration..."
            # Development-specific settings
            ;;
        "staging")
            echo "🧪 Setting up staging configuration..."
            # Staging-specific settings
            ;;
        "prod")
            echo "🏭 Setting up production configuration..."
            # Production-specific settings
            ./scripts/setup_production_monitoring.sh
            ;;
    esac
}

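# Entry point: the target environment is the first argument
deploy_to_environment "${1:?Usage: $0 <dev|staging|prod>}"

# Usage
# ./deploy_to_environment.sh dev
# ./deploy_to_environment.sh staging
# ./deploy_to_environment.sh prod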

2. Disaster Recovery Plan

# disaster-recovery.yml
disaster_recovery:
  backup_strategy:
    cosmos_db:
      continuous_backup: enabled
      geo_redundancy: enabled
      retention_days: 30
      
    storage_account:
      geo_redundant_storage: enabled
      soft_delete: enabled
      retention_days: 365
      
    application_data:
      export_schedule: "0 2 * * *"  # daily at 02:00
      storage_location: "backup-storage-account"
      
  recovery_procedures:
    rto: "4 hours"  # Recovery Time Objective
    rpo: "1 hour"   # Recovery Point Objective
    
    steps:
      1: "Check the state of the primary region"
      2: "Start the container apps in the secondary region"
      3: "Restore the database"
      4: "Fail over DNS"
      5: "Verify application behavior"
      
  testing:
    schedule: "quarterly"
    automated_checks:
      - backup_integrity
      - restoration_process
      - failover_procedures
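
The RTO and RPO values in the plan are only useful if each recovery drill is measured against them. The sketch below shows one way to do that check, assuming you record three timestamps per drill: when the outage started, when service was restored, and when the last usable backup was taken. The function names and the duration format parser are illustrative assumptions, not part of any Azure tooling.

```python
from datetime import datetime, timedelta


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '4 hours' or '1 hour' into a timedelta."""
    value, unit = text.split()
    unit = unit.rstrip("s")  # accept both 'hour' and 'hours'
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    return timedelta(**{unit + "s": int(value)})


def drill_meets_objectives(
    outage_start: datetime,
    service_restored: datetime,
    last_backup: datetime,
    rto: str = "4 hours",
    rpo: str = "1 hour",
) -> dict:
    """Compare a drill's measured recovery time and data-loss window to RTO/RPO."""
    recovery_time = service_restored - outage_start
    data_loss_window = outage_start - last_backup
    return {
        "rto_met": recovery_time <= parse_duration(rto),
        "rpo_met": data_loss_window <= parse_duration(rpo),
    }


# Example drill with invented timestamps.
result = drill_meets_objectives(
    outage_start=datetime(2025, 12, 18, 10, 0),
    service_restored=datetime(2025, 12, 18, 13, 30),
    last_backup=datetime(2025, 12, 18, 9, 15),
)
print(result)
```

Recording the result of each quarterly drill in this form makes it easier to see whether the objectives stay realistic as the system grows.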

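📊 Summary

This article walked through an enterprise-grade Azure deployment of a multi-agent system in detail.

🎯 What This Enables

Technical outcomes

✅ 99.9%+ availability target: multi-region deployment of Azure Container Apps
✅ Autoscaling: flexible resource adjustment in response to demand
✅ Hardened security: VNet integration, Private Endpoints, Key Vault
✅ Operational automation: CI/CD, monitoring, alerting, log collection

Business impact

📈 Shorter response times: hours → minutes
💰 Lower operating costs: automation reduces manual effort
🔒 Compliance: meets enterprise security standards
🚀 Scalability: keeps pace with business growth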
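
To make the 99.9% availability figure concrete, the short sketch below converts an availability target into a monthly downtime budget and estimates the combined availability of several regions. It assumes regions fail independently and that traffic fails over instantly, which is an idealization; real deployments will fall somewhat short of the computed value.

```python
def downtime_budget_minutes(availability: float, days: float = 30.0) -> float:
    """Minutes of allowed downtime over `days` for a given availability."""
    return (1.0 - availability) * days * 24 * 60


def combined_availability(per_region: float, regions: int) -> float:
    """Availability when any one of `regions` independent regions suffices."""
    return 1.0 - (1.0 - per_region) ** regions


print(f"{downtime_budget_minutes(0.999):.1f} minutes per 30 days")
print(f"{combined_availability(0.999, 2):.6f}")
```

A 99.9% target therefore leaves roughly 43 minutes of downtime per 30-day month, which is a useful number to compare against the RTO in the disaster recovery plan.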

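🛣️ Adoption Roadmap

Phase 1: Foundation (2-3 weeks)

# Deploy a minimal configuration
azd init --template multiagent-system
azd provision --environment dev
azd deploy

Phase 2: Feature Expansion (2-4 weeks)

Add custom agents
Integrate with external systems
Refine the UI/UX

Phase 3: Production Operations (1-2 weeks)

Build the production environment
Configure monitoring and alerting
Run disaster recovery tests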

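🎯 Looking Back on What We Learned

Skills covered across this article series:

Agent-to-Agent (A2A) communication

Agent design patterns
Implementing coordination algorithms
Performance optimization

Model Context Protocol (MCP)

Standardized data access
WebSocket-based implementation
Extensible architecture

Azure cloud architecture

Operating Container Apps
Infrastructure as Code
DevOps and CI/CD in practice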

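🚀 Next Steps

Areas worth exploring next:

Deeper AI/ML integration

Azure Machine Learning integration
Real-time predictive analytics
More advanced natural language processing

Extending the microservices

Introducing a service mesh
Event-Driven Architecture
CQRS/Event Sourcing patterns

Industry-specific customization

Specialization for finance, healthcare, and manufacturing
Compliance automation
Integration with industry-standard APIs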

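Welcome to the world of Agent-to-Agent communication and MCP!

We hope this technology becomes a foundation for creating new value through human-AI collaboration in your business and projects.

🤝 Please keep building on this system to tackle real business challenges!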

#Azure #ContainerApps #Production #DevOps #Enterprise #A2A #MCP #CloudArchitecture #MultiAgent
