[Amazon ECS] How to check a task's stop reason without the console

Last updated at 2024-10-24 / Posted at 2024-10-24

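When you use ECS, a task will occasionally fail to start or stop unexpectedly for some reason.
You can check the stop reason in the AWS console, but once more than an hour has passed since the task stopped, that information is no longer visible there.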

[Screenshot: the console no longer shows anything for the stopped task]

Even then, there is no need to give up.
There are three ways to check a task's stop reason.

1. Using the CLI

First, list the stopped tasks. The command returns task ARNs; the task ID is the last segment of each ARN.

$ aws ecs list-tasks --cluster <your-cluster-name> --desired-status STOPPED
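`list-tasks` returns full task ARNs under `taskArns`, and the task ID is the part after the last `/`. If you save the output as JSON, a small helper like the one below pulls out the IDs to pass to `describe-tasks`. This is a minimal sketch: the function name `task_ids_from_list_tasks` and the sample account ID and cluster name are hypothetical.

```python
import json


def task_ids_from_list_tasks(output: str) -> list[str]:
    """Extract task IDs from the JSON printed by `aws ecs list-tasks`.

    Task ARNs look like arn:aws:ecs:<region>:<account>:task/<cluster>/<task-id>
    (older ARNs omit the cluster segment), so the ID is the text after the last "/".
    """
    data = json.loads(output)
    return [arn.rsplit("/", 1)[-1] for arn in data.get("taskArns", [])]


if __name__ == "__main__":
    # Hypothetical sample output; the account ID and cluster name are made up.
    sample = json.dumps({
        "taskArns": [
            "arn:aws:ecs:ap-northeast-1:123456789012:task/my-cluster/0123456789abcdef0",
        ]
    })
    print(task_ids_from_list_tasks(sample))  # ['0123456789abcdef0']
```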

Then run the following command with the task ID you obtained.

$ aws ecs describe-tasks \
  --cluster <your-cluster-name> \
  --tasks <your-task-id>
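`--tasks` accepts several task IDs or ARNs at once, and the DescribeTasks API takes at most 100 per request. When many tasks have stopped, a sketch like the following splits them into batches and builds one command per batch. The helper name `describe_tasks_commands` and the sample values are hypothetical; each returned list could be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` if you want to run it from Python.

```python
import shlex


def describe_tasks_commands(cluster: str, task_ids: list[str], batch_size: int = 100) -> list[list[str]]:
    """Build `aws ecs describe-tasks` argument lists with at most batch_size tasks each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    commands = []
    for start in range(0, len(task_ids), batch_size):
        batch = task_ids[start:start + batch_size]
        commands.append(
            ["aws", "ecs", "describe-tasks", "--cluster", cluster, "--tasks", *batch]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical cluster name and task IDs.
    for cmd in describe_tasks_commands("my-cluster", ["id-1", "id-2"]):
        # Prints: aws ecs describe-tasks --cluster my-cluster --tasks id-1 id-2
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings avoids quoting problems when the lists are handed to `subprocess.run`.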

This returns a response like the one below.
The stop reason is in stoppedReason, with a machine-readable code in stopCode.
In this case stoppedReason reads "Essential container in task exited".

{
  "tasks": [
    {
      "attachments": [
        {
          "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
          "type": "ElasticNetworkInterface",
          "status": "DELETED",
          "details": [
            {
              "name": "subnetId",
              "value": "subnet-xxxxxxxx"
            },
            {
              "name": "networkInterfaceId",
              "value": "eni-xxxxxxxxxxxxxxxxx"
            },
            {
              "name": "macAddress",
              "value": "xx:xx:xx:xx:xx:xx"
            },
            {
              "name": "privateDnsName",
              "value": "ip-xxx-xx-xx-xx.ap-northeast-1.compute.internal"
            },
            {
              "name": "privateIPv4Address",
              "value": "xxx.xx.xx.xx"
            }
          ]
        }
      ],
      "attributes": [
        {
          "name": "ecs.cpu-architecture",
          "value": "x86_64"
        }
      ],
      "availabilityZone": "ap-northeast-1a",
      "clusterArn": "arn:aws:ecs:ap-northeast-1:xxxxxxxxxx:cluster/xxxxxxxxxx",
      "connectivity": "CONNECTED",
      "connectivityAt": "2024-10-03T16:35:54.826000+09:00",
      "containers": [
        {
          "containerArn": "arn:aws:ecs:ap-northeast-1:xxxxxxxxxx:container/xxxxxxxxxx/xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxx",
          "taskArn": "arn:aws:ecs:ap-northeast-1:xxxxxxxxxx:task/xxxxxxxxxx/xxxxxxxxxxxxxxxxx",
          "name": "sample-project",
          "image": "xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/sample-project:xxxxxxxxxx",
          "imageDigest": "sha256:xxxxxxxxxxxxxxxxx",
          "runtimeId": "xxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxx",
          "lastStatus": "STOPPED",
          "exitCode": 0,
          "networkBindings": [],
          "networkInterfaces": [
            {
              "attachmentId": "xxxxxxxxxxxxxxxxx",
              "privateIpv4Address": "xxx.xx.xx.xx"
            }
          ],
          "healthStatus": "UNKNOWN",
          "cpu": "0"
        }
      ],
      "cpu": "2048",
      "createdAt": "2024-10-03T16:35:48.660000+09:00",
      "desiredStatus": "STOPPED",
      "enableExecuteCommand": false,
      "executionStoppedAt": "2024-10-03T16:42:26.311000+09:00",
      "group": "family:sample-project-task-def",
      "healthStatus": "UNKNOWN",
      "lastStatus": "STOPPED",
      "launchType": "FARGATE",
      "memory": "8192",
      "overrides": {
        "containerOverrides": [
          {
            "name": "sample-project",
            "command": [
              "/sample/sample.sh"
            ],
            "cpu": 2048,
            "memory": 8192
          }
        ],
        "inferenceAcceleratorOverrides": []
      },
      "platformVersion": "1.4.0",
      "platformFamily": "Linux",
      "pullStartedAt": "2024-10-03T16:36:02.893000+09:00",
      "pullStoppedAt": "2024-10-03T16:36:24.927000+09:00",
      "startedAt": "2024-10-03T16:36:36.584000+09:00",
      "startedBy": "xxxxx",
      "stopCode": "EssentialContainerExited",
      "stoppedAt": "2024-10-03T16:42:52.780000+09:00",
      "stoppedReason": "Essential container in task exited",
      "stoppingAt": "2024-10-03T16:42:37.306000+09:00",
      "tags": [],
      "taskArn": "arn:aws:ecs:ap-northeast-1:xxxxxxxxxx:task/xxxxxxxxxx/xxxxxxxxxxxxxxxxx",
      "taskDefinitionArn": "arn:aws:ecs:ap-northeast-1:xxxxxxxxxx:task-definition/sample-task-def:1",
      "version": 5,
      "ephemeralStorage": {
        "sizeInGiB": 20
      }
    }
  ],
  "failures": []
}
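The response is long, but only a few fields matter when investigating a stop. The sketch below reads saved `describe-tasks` JSON and prints, for each task, its `stopCode` and `stoppedReason` plus each container's `exitCode` and, when ECS includes one, the container's `reason`. Entries under `failures` are listed as well. The helper name `summarize_stopped_tasks` is hypothetical.

```python
import json


def summarize_stopped_tasks(output: str) -> list[str]:
    """Return one summary line per task from `aws ecs describe-tasks` JSON output."""
    data = json.loads(output)
    lines = []
    for task in data.get("tasks", []):
        task_id = task["taskArn"].rsplit("/", 1)[-1]
        line = f'{task_id} {task.get("stopCode", "-")}: {task.get("stoppedReason", "-")}'
        # Container-level details often explain why the essential container exited.
        for container in task.get("containers", []):
            detail = f'exitCode={container.get("exitCode", "-")}'
            if "reason" in container:
                detail += f' reason={container["reason"]}'
            line += f' [{container["name"]} {detail}]'
        lines.append(line)
    # Tasks that ECS could not describe are reported under "failures".
    for failure in data.get("failures", []):
        lines.append(f'FAILURE {failure.get("arn", "-")}: {failure.get("reason", "-")}')
    return lines


if __name__ == "__main__":
    with open("describe-tasks.json") as f:  # hypothetical file holding the saved output
        for line in summarize_stopped_tasks(f.read()):
            print(line)
```

For the response above, the line ends in `EssentialContainerExited: Essential container in task exited [sample-project exitCode=0]`. An exit code of 0 suggests the essential container (running `/sample/sample.sh`) finished normally, so the task most likely stopped because its main process completed rather than because it crashed.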

2. Using tracer

With tracer, you can view the task's logs in chronological order.

First, install tracer with Homebrew.

$ brew install fujiwara/tap/tracer

Run it with the cluster name and the task ID.
One caveat: tracer has to use the default AWS profile.

$ tracer <your-cluster-name> <your-task-id>

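The log is displayed.
The timestamps, stop reason, and stop code match what we saw with the CLI (executionStoppedAt, stoppingAt, stoppedAt, stoppedReason, and stopCode).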

2024-10-03T16:42:26.311+09:00	TASK	Execution stopped
2024-10-03T16:42:37.306+09:00	TASK	Stopping
2024-10-03T16:42:37.306+09:00	TASK	StoppedReason:Essential container in task exited
2024-10-03T16:42:37.306+09:00	TASK	StoppedCode:EssentialContainerExited
2024-10-03T16:42:52.780+09:00	TASK	Stopped
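The tracer lines above line up with fields in the `describe-tasks` output: `Execution stopped` with `executionStoppedAt`, `Stopping` with `stoppingAt`, and `Stopped` with `stoppedAt`, while the reason and code lines carry the `stoppingAt` timestamp. If you only have the saved JSON, a sketch like the following rebuilds a similar timeline from one task object. It is my own approximation based on the output shown here, not how tracer is implemented, and the helper name `task_timeline` is hypothetical.

```python
from datetime import datetime

# DescribeTasks timestamp fields paired with the labels tracer printed above.
TASK_EVENTS = [
    ("executionStoppedAt", "Execution stopped"),
    ("stoppingAt", "Stopping"),
    ("stoppedAt", "Stopped"),
]


def task_timeline(task: dict) -> list[str]:
    """Return 'timestamp<TAB>TASK<TAB>label' lines in chronological order."""
    rows = []
    for field, label in TASK_EVENTS:
        if field in task:
            rows.append((datetime.fromisoformat(task[field]), label))
    # In the tracer output above, the reason and code share the stoppingAt timestamp.
    if "stoppingAt" in task:
        at = datetime.fromisoformat(task["stoppingAt"])
        if "stoppedReason" in task:
            rows.append((at, f"StoppedReason:{task['stoppedReason']}"))
        if "stopCode" in task:
            rows.append((at, f"StoppedCode:{task['stopCode']}"))
    # sort() is stable, so rows with equal timestamps keep their insertion order.
    rows.sort(key=lambda row: row[0])
    return [f"{at.isoformat(timespec='milliseconds')}\tTASK\t{label}" for at, label in rows]


if __name__ == "__main__":
    # Fields copied from the describe-tasks response shown earlier.
    task = {
        "executionStoppedAt": "2024-10-03T16:42:26.311000+09:00",
        "stoppingAt": "2024-10-03T16:42:37.306000+09:00",
        "stoppedAt": "2024-10-03T16:42:52.780000+09:00",
        "stopCode": "EssentialContainerExited",
        "stoppedReason": "Essential container in task exited",
    }
    for line in task_timeline(task):
        print(line)
```

With the fields from the response above, this prints the same five lines as the tracer output.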

3. Saving logs to CloudWatch Logs

If you want an easy way to check task stop reasons from CloudWatch Logs,
you can also create an EventBridge rule that fires when a task stops and saves the event to CloudWatch Logs.

Classmethod's article explains the setup clearly, so refer to it for details.
https://dev.classmethod.jp/articles/stop-ecs-task-reason-cloudwatch-logs/
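For the EventBridge approach, the rule's event pattern decides which events reach CloudWatch Logs. ECS emits `ECS Task State Change` events from the `aws.ecs` source, and filtering on `lastStatus` = `STOPPED` keeps only stopped tasks; the event's `detail` carries task fields such as `stopCode` and `stoppedReason`. The sketch below holds that pattern as a Python dict and adds a deliberately simplified matcher so you can check sample events locally. The names `STOPPED_TASK_PATTERN` and `matches` are hypothetical, and `matches` only handles exact-value arrays, scalar event values, and nested objects, not EventBridge's full pattern syntax. `json.dumps(STOPPED_TASK_PATTERN)` gives the JSON you would pass to `aws events put-rule` via `--event-pattern` (along with `--name`).

```python
import json

# Event pattern for an EventBridge rule that fires when an ECS task stops.
STOPPED_TASK_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"lastStatus": ["STOPPED"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: exact-value arrays and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern objects must match nested event objects.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # The event value must equal one of the listed values.
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(STOPPED_TASK_PATTERN, indent=2))
```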
