Workarounds for the "insufficient capacity error" when starting a cluster on AWS Databricks

Posted at 2021-05-17 (last updated at 2021-05-17)

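This error occurs when the cloud service provider (AWS) cannot allocate the requested instances. If the limit you are hitting is your account's service quota, you need to request a quota increase.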

AWS service quotas - AWS General Reference

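On the Databricks side, the countermeasures are log monitoring and Auto-AZ (automatic availability zone selection), described below. Auto-AZ cannot be enabled from the GUI; you must enable it through the Clusters API.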
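For the log-monitoring side, a minimal sketch of classifying a terminated cluster is shown below. It assumes you already have the parsed JSON of a cluster description (for example from the Clusters API `clusters/get` call), with a `termination_reason` object holding `code` and `parameters`. The specific code `AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE`, the parameter name `aws_error_message`, and the sample dict are assumptions for illustration; check the current Clusters API reference for the exact values your workspace reports.

```python
# Hypothetical marker values: check them against the current Clusters API
# reference (termination reason codes) before relying on them.
CAPACITY_CODES = {"AWS_INSUFFICIENT_INSTANCE_CAPACITY_FAILURE"}
CAPACITY_MARKERS = ("insufficientinstancecapacity", "insufficient capacity")


def is_capacity_failure(cluster_info: dict) -> bool:
    """Return True if a clusters/get-style response dict looks like a capacity failure."""
    reason = cluster_info.get("termination_reason") or {}
    if reason.get("code") in CAPACITY_CODES:
        return True
    # Fall back to scanning the free-form parameter values for AWS error text.
    params = reason.get("parameters") or {}
    text = " ".join(str(v) for v in params.values()).lower()
    return any(marker in text for marker in CAPACITY_MARKERS)


sample = {  # made-up sample for illustration, not a real API response
    "state": "TERMINATED",
    "termination_reason": {
        "code": "CLOUD_PROVIDER_LAUNCH_FAILURE",
        "parameters": {"aws_error_message": "InsufficientInstanceCapacity (sample)"},
    },
}
print(is_capacity_failure(sample))  # True
```

Checking both the structured code and the raw error text keeps the check useful even if the exact code name differs from the one assumed here.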

Automatic availability zone selection (Auto-AZ)

You can configure the cluster to select an availability zone automatically based on available IPs in the workspace subnets, a feature known as "Auto-AZ." You must use the Clusters API to enable Auto-AZ, setting aws_attributes.zone_id = "auto". Auto-AZ retries in other availability zones if AWS returns insufficient capacity errors.

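When a capacity shortage error occurs, Auto-AZ retries in another AZ.
Once the cluster has started, the selected AZ is kept. To have Auto-AZ choose a zone again, you must delete and recreate the cluster.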
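To make the documented behavior concrete (pick an AZ based on free IPs in the workspace subnets, move on to another AZ on an insufficient-capacity error), here is a toy model. It is a conceptual sketch only, not Databricks' actual implementation; `pick_zone`, `InsufficientCapacity`, `fake_launch`, and the AZ names and IP counts are all hypothetical.

```python
from typing import Callable, Dict, Optional


class InsufficientCapacity(Exception):
    """Stand-in for an AWS insufficient-capacity response (illustrative only)."""


def pick_zone(free_ips: Dict[str, int],
              launch: Callable[[str], None]) -> Optional[str]:
    """Toy Auto-AZ: try AZs with the most free subnet IPs first, skip on capacity errors."""
    for zone in sorted(free_ips, key=free_ips.get, reverse=True):
        try:
            launch(zone)
        except InsufficientCapacity:
            continue  # capacity shortage: retry in the next AZ
        return zone  # once launched, this AZ stays fixed for the cluster
    return None  # every AZ reported insufficient capacity


def fake_launch(zone: str) -> None:
    """Hypothetical launcher: pretend us-west-2a is out of capacity."""
    if zone == "us-west-2a":
        raise InsufficientCapacity(zone)


print(pick_zone({"us-west-2a": 200, "us-west-2b": 150, "us-west-2c": 50}, fake_launch))
# us-west-2b
```

The zone with the most free IPs (`us-west-2a`) is tried first, fails, and the next one is used. Returning the zone rather than retrying forever mirrors the point above that the chosen AZ is kept after startup.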

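Create a personal access token and POST the settings to the following endpoint. Pass the token using, for example, Bearer authentication. For details on authentication, see "Authentication using Databricks personal access tokens" in the Databricks on AWS documentation.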

https://<Databricks workspace URL>/api/2.0/clusters/create
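A minimal sketch of building that authenticated POST with only the Python standard library follows. It assumes `host` is the workspace hostname without the `https://` scheme and `token` is your personal access token; the function name `build_create_request` and the example hostname are hypothetical.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, spec: dict) -> urllib.request.Request:
    """Build a POST to /api/2.0/clusters/create using Bearer token authentication."""
    url = f"https://{host}/api/2.0/clusters/create"
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),  # cluster settings as the JSON body
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request("example.cloud.databricks.com", "<token>",
                           {"cluster_name": "my-cluster"})
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and parse the response body with `json.loads`; on success the create call returns the new cluster's `cluster_id`. Keep the token out of source code, for example by reading it from an environment variable.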

The following configuration creates a cluster of i3.xlarge spot instances with Auto-AZ. Setting "zone_id" to "auto" enables Auto-AZ.

```json
{
  "cluster_name": "my-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "auto"
  },
  "num_workers": 1
}
```
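If you generate this request body from a script, a small sketch like the one below builds the same payload and checks that Auto-AZ is actually requested. The helper names `auto_az_spot_spec` and `uses_auto_az` are hypothetical; the field names and values are taken from the JSON above.

```python
import json


def auto_az_spot_spec(cluster_name: str, spark_version: str,
                      node_type_id: str, num_workers: int) -> dict:
    """Cluster spec mirroring the JSON above: spot instances with Auto-AZ enabled."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "aws_attributes": {
            "availability": "SPOT",
            "zone_id": "auto",  # "auto" turns on Auto-AZ
        },
        "num_workers": num_workers,
    }


def uses_auto_az(spec: dict) -> bool:
    """True if the spec asks Databricks to choose the availability zone."""
    return spec.get("aws_attributes", {}).get("zone_id") == "auto"


spec = auto_az_spot_spec("my-cluster", "7.3.x-scala2.12", "i3.xlarge", 1)
print(json.dumps(spec, indent=2))
```

A check like `uses_auto_az` is a cheap guard against accidentally posting a spec with a fixed zone, which would not get the retry-in-another-AZ behavior.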

Databricks free trial
