[AWS][boto3] How to avoid Rate Exceeded errors

Last updated at 2022-12-27, posted at 2020-04-08

TL;DR

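Rate Exceeded is the error you get when you make a large number of API calls in a short period
The way to avoid it is to add retries
- Specifically, use so-called exponential backoff
It looks like you can raise the retry count when configuring the client, but that had no effect for me
- Probably, that retry does not cover botocore.exceptions.ClientError
So use Python's retry module to handle botocore.exceptions.ClientError
- There is also AWSretry, but it is Linux only, so it is out of scope here
(Added 2022/12/27) These days, setting the retry mode to standard is the recommended approach.
- With standard mode, boto3 itself retries the exceptions that this article handles by hand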
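
As a minimal sketch of the standard retry mode mentioned in the addendum: botocore's Config accepts a retries dict with mode and max_attempts, and the resulting object is passed to the client. The max_attempts value of 10 and the region name are arbitrary choices for illustration, not values from this article.

```python
import boto3
from botocore.config import Config

# Assumption: 10 is an arbitrary illustrative retry limit.
retry_config = Config(
    retries={
        'mode': 'standard',
        'max_attempts': 10,
    }
)

# Assumption: the region is only an example; use your own.
quota_client = boto3.client(
    'service-quotas',
    region_name='us-east-1',
    config=retry_config,
)
```

With this in place, throttling errors should be retried inside boto3 before they ever reach your own code.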

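What is Rate Exceeded?

Rate Exceeded is the error you get when you make a large number of API calls in a short period
- It depends on the API load at that moment, so it happens intermittently
- This is quite annoying when a script makes a lot of API calls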

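How to avoid it

The way to avoid it is to add retries
However, a plain retry has little effect
- Failing and then immediately calling the API again only puts more load on the API.
- You end up in a vicious cycle and the script dies
But simply adding a sleep is inefficient too
- Say you sleep 5 seconds and then call the API: how long do you have to keep waiting until the throttling clears?
- What is the basis for 5 seconds? Does it really reduce the load?

In the end, a plain retry or a fixed sleep does not guarantee stable API calls.

So what should you do?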

exponential backoff

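The cause is a large number of API calls in a short period, so widen the interval between API calls each time one fails and retry efficiently
- That is, use so-called exponential backoff
Exponential backoff is a retry technique that lengthens the wait before the next API call on every retry
- The sleep time grows exponentially
- 1, 2, 4, 8, 16, ...
- This way you need only a few retries while still leaving intervals long enough to lower the API call load
It is a basic of calling AWS APIs, so it is worth remembering
That said, implementing it yourself is quite a bit of work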
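
To make the schedule concrete, here is a small sketch that generates the wait times and compares them with the fixed 5-second sleep discussed earlier. The function names and the scenario (throttling that clears 60 seconds after the first call) are hypothetical and only there for illustration.

```python
from itertools import islice, repeat


def backoff_delays(delay=1, backoff=2, max_delay=None):
    """Yield sleep times forever: delay, delay*backoff, ... capped at max_delay."""
    current = delay
    while True:
        yield current
        current *= backoff
        if max_delay is not None:
            current = min(current, max_delay)


def failed_calls_before_recovery(delays, recovery_seconds):
    """Count the calls that fail if the API recovers after recovery_seconds.

    The first call happens at t=0; each later call happens after sleeping
    for the next value taken from delays.
    """
    elapsed = 0
    failed = 0
    for wait in delays:
        if elapsed >= recovery_seconds:
            break
        failed += 1       # the call made at time `elapsed` is still throttled
        elapsed += wait   # sleep before the next call
    return failed


first_eight = list(islice(backoff_delays(delay=1, backoff=2, max_delay=64), 8))
print(first_eight)  # [1, 2, 4, 8, 16, 32, 64, 64]

# Hypothetical scenario: throttling clears 60 seconds after the first call.
fixed = failed_calls_before_recovery(repeat(5), 60)
exponential = failed_calls_before_recovery(backoff_delays(1, 2, 64), 60)
print(fixed, exponential)  # 12 6
```

In this scenario the fixed sleep sends twice as many throttled calls as the exponential schedule, which is the load reduction the list above describes.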

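Modules that implement exponential backoff

Python's retry module
[AWSretry](https://pypi.org/project/awsretry/)
- A Python module specialized for boto
- However, it is Linux only
Raising the retry count when defining boto3.client
- However, no matter how many times I tried, it had no effect
- Probably, that retry does not cover Rate Exceeded.

So my recommendation is to use the retry module.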

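Handling Rate Exceeded with the retry module

Rate Exceeded is raised as botocore.exceptions.ClientError (in the traceback further down it appears as TooManyRequestsException, which is a subclass of ClientError)
So catch that
I'll explain with the following example
- Run list_requested_service_quota_change_history_by_quota of AWS Service Quotas against a large number of quota_code values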

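import logging

import boto3
from botocore.exceptions import ClientError
from retry import retry

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


# Retry on ClientError: sleep 1s, then 2s, 4s, ... capped at 64s.
# Note: a retry re-runs the whole function, so the loop starts over.
@retry(ClientError, delay=1, backoff=2, max_delay=64)
def get_quota_history(quota_info_dict, quota_client):
    history = {}
    for service, quota_info in quota_info_dict.items():
        logger.info('[HISTORY] {} {}'.format(service, quota_info['quota_name']))
        response = quota_client.list_requested_service_quota_change_history_by_quota(
            ServiceCode=quota_info['service_code'],
            QuotaCode=quota_info['quota_code'],
        )
        history[service] = response['RequestedQuotas']
    return history


def main():
    # Only one entry is shown here; see the contents of quota_info_dict below.
    quota_info_dict = {
        'VPC': {'quota_code': 'L-F678F1CE', 'quota_name': 'VPCs per Region', 'service_code': 'vpc'},
    }
    session = boto3.Session()
    quota_client = session.client('service-quotas')
    return get_quota_history(quota_info_dict, quota_client)


if __name__ == '__main__':
    main()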

Contents of quota_info_dict

{
  "AutoScalingGroup": {
    "adjustable": "true",
    "count": "4",
    "default": "200.0",
    "quota_code": "L-CDE20ADC",
    "quota_name": "Auto Scaling groups per region",
    "service_code": "autoscaling"
  },
  "EIP": {
    "adjustable": "true",
    "count": "3",
    "default": "5.0",
    "quota_code": "L-0263D0A3",
    "quota_name": "Number of EIPs - VPC EIPs",
    "service_code": "ec2"
  },
  "FileSystem": {
    "adjustable": "true",
    "count": "1",
    "default": "1000.0",
    "quota_code": "L-848C634D",
    "quota_name": "File systems per account",
    "service_code": "elasticfilesystem"
  },
  "Instance": {
    "adjustable": "true",
    "count": "32",
    "default": "5.0",
    "quota_code": "L-1216C47A",
    "quota_name": "Running On-Demand Standard (A, C, D, H, I, M, R, "
    "T, Z) instances",
    "service_code": "ec2"
  },
  "InternetGateway": {
    "adjustable": "true",
    "count": "1",
    "default": "5.0",
    "quota_code": "L-A4707A72",
    "quota_name": "Internet gateways per Region",
    "service_code": "vpc"
  },
  "LaunchConfiguration": {
    "adjustable": "true",
    "count": "4",
    "default": "200.0",
    "quota_code": "L-6B80B8FA",
    "quota_name": "Launch configurations per region",
    "service_code": "autoscaling"
  },
  "LoadBalancer": {
    "adjustable": "true",
    "count": "1",
    "default": "20.0",
    "quota_code": "L-53DA6B97",
    "quota_name": "Application Load Balancers per Region",
    "service_code": "elasticloadbalancing"
  },
  "MountTarget": {
    "adjustable": "false",
    "count": "2",
    "default": "400.0",
    "quota_code": "L-7391004C",
    "quota_name": "Mount targets per VPC",
    "service_code": "elasticfilesystem"
  },
  "NatGateway": {
    "adjustable": "true",
    "count": "2",
    "default": "5.0",
    "quota_code": "L-FE5A380F",
    "quota_name": "NAT gateways per Availability Zone",
    "service_code": "vpc"
  },
  "Route": {
    "adjustable": "true",
    "count": "13",
    "default": "50.0",
    "quota_code": "L-93826ACB",
    "quota_name": "Routes per route table",
    "service_code": "vpc"
  },
  "RouteTable": {
    "adjustable": "true",
    "count": "4",
    "default": "200.0",
    "quota_code": "L-589F43AA",
    "quota_name": "Route tables per VPC",
    "service_code": "vpc"
  },
  "Rule": {
    "adjustable": "true",
    "count": "2",
    "default": "100.0",
    "quota_code": "L-7BF8015E",
    "quota_name": "Rules",
    "service_code": "waf-regional"
  },
  "SecurityGroup": {
    "adjustable": "true",
    "count": "4",
    "default": "2500.0",
    "quota_code": "L-E79EC296",
    "quota_name": "VPC security groups per Region",
    "service_code": "vpc"
  },
  "SecurityGroupIngress": {
    "adjustable": "true",
    "count": "10",
    "default": "60.0",
    "quota_code": "L-0EA8095F",
    "quota_name": "Inbound or outbound rules per "
    "security group",
    "service_code": "vpc"
  },
  "Stacks": {
    "adjustable": "true",
    "count": "12",
    "default": "200.0",
    "quota_code": "L-0485CB21",
    "quota_name": "Stack count",
    "service_code": "cloudformation"
  },
  "Subnet": {
    "adjustable": "true",
    "count": "4",
    "default": "200.0",
    "quota_code": "L-407747CB",
    "quota_name": "Subnets per VPC",
    "service_code": "vpc"
  },
  "VPC": {
    "adjustable": "true",
    "count": "1",
    "default": "5.0",
    "quota_code": "L-F678F1CE",
    "quota_name": "VPCs per Region",
    "service_code": "vpc"
  },
  "VPCPeeringConnection": {
    "adjustable": "true",
    "count": "1",
    "default": "50.0",
    "quota_code": "L-7E9ECCDB",
    "quota_name": "Active VPC peering connections per "
    "VPC",
    "service_code": "vpc"
  },
  "WebACL": {
    "adjustable": "true",
    "count": "1",
    "default": "50.0",
    "quota_code": "L-55785BA2",
    "quota_name": "Web ACLs",
    "service_code": "waf-regional"
  }
}

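The retry module can be used as a decorator. It is convenient, so I recommend it
Explanation of @retry(ClientError, delay=1, backoff=2, max_delay=64)
- Retries when a ClientError is caught
- delay=1 -> starts with a 1-second sleep
- backoff=2 -> multiplies the previous sleep time by 2 on each retry
- max_delay=64 -> caps the sleep time at 64 seconds
- tries is not given, so the retry module's default (tries=-1, no limit) applies and it keeps retrying until the call succeeds
max_delay is especially useful when you do not want the delay to grow too long.
This prevents most script failures caused by Rate Exceeded
For reference, without retry the script fails with an error like this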

2020-03-24 23:10:53,993 - INFO - [HISTORY] Route Routes per route table
2020-03-24 23:10:54,085 - INFO - [HISTORY] SecurityGroupIngress Inbound or outbound rules per security group
2020-03-24 23:10:54,140 - INFO - [HISTORY] VPC VPCs per Region
2020-03-24 23:10:54,178 - INFO - [HISTORY] InternetGateway Internet gateways per Region
2020-03-24 23:10:54,206 - INFO - [HISTORY] VPCPeeringConnection Active VPC peering connections per VPC
2020-03-24 23:10:54,247 - INFO - [HISTORY] EIP Number of EIPs - VPC EIPs
2020-03-24 23:10:54,306 - INFO - [HISTORY] NatGateway NAT gateways per Availability Zone
2020-03-24 23:10:54,381 - INFO - [HISTORY] Subnet Subnets per VPC
2020-03-24 23:10:54,397 - INFO - [HISTORY] RouteTable Route tables per VPC
2020-03-24 23:10:54,474 - INFO - [HISTORY] SecurityGroup VPC security groups per Region
2020-03-24 23:10:54,518 - INFO - [HISTORY] Rule Rules
2020-03-24 23:10:54,572 - INFO - [HISTORY] WebACL Web ACLs
2020-03-24 23:10:54,624 - INFO - [HISTORY] Instance Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
2020-03-24 23:10:54,678 - INFO - [HISTORY] FileSystem File systems per account
2020-03-24 23:10:54,743 - INFO - [HISTORY] MountTarget Mount targets per VPC
2020-03-24 23:10:54,788 - INFO - [HISTORY] LoadBalancer Application Load Balancers per Region
2020-03-24 23:10:54,875 - INFO - [HISTORY] LaunchConfiguration Launch configurations per region
2020-03-24 23:10:54,962 - INFO - [HISTORY] AutoScalingGroup Auto Scaling groups per region
2020-03-24 23:10:54,996 - INFO - [HISTORY] Stacks Stack count
Traceback (most recent call last):
  File "request_increase_quota.py", line 129, in <module>
    main(aws_account_id, env_count_per_account, quota_client, global_quota_client, is_dryrun)
  File "request_increase_quota.py", line 102, in main
    quota_history = get_quota_history(quota_info_dict, quota_client)
  File "request_increase_quota.py", line 86, in get_quota_history
    QuotaCode=quota_info['quota_code'],
  File "C:\Python36\lib\site-packages\botocore\client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Python36\lib\site-packages\botocore\client.py", line 626, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.TooManyRequestsException: An error occurred (TooManyRequestsException) when calling the ListRequestedServiceQuotaChangeHistoryByQuota operation: Rate exceeded
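
To see what the delay, backoff, and max_delay parameters explained above do in practice, here is a simplified stand-in for such a decorator. It is a sketch written for this illustration, not the retry module's actual implementation; simple_retry, FakeRateExceeded, and the injectable sleep argument are hypothetical names. The sleep function is replaced with a recorder so the example finishes instantly.

```python
import functools
import time


def simple_retry(exceptions, delay=1, backoff=2, max_delay=None, tries=-1, sleep=time.sleep):
    """Hypothetical stand-in for a retry decorator (not the retry module itself)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            remaining = tries  # -1 never reaches 0, so it retries without limit
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    remaining -= 1
                    if remaining == 0:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff
                    if max_delay is not None:
                        wait = min(wait, max_delay)
        return wrapper
    return decorator


class FakeRateExceeded(Exception):
    """Hypothetical error standing in for a throttled API call."""


sleeps = []            # records the waits instead of really sleeping
calls = {'count': 0}


@simple_retry(FakeRateExceeded, delay=1, backoff=2, max_delay=64, sleep=sleeps.append)
def flaky_call():
    calls['count'] += 1
    if calls['count'] <= 3:   # fail three times, then succeed
        raise FakeRateExceeded('Rate exceeded')
    return 'ok'


result = flaky_call()
print(result, sleeps)  # ok [1, 2, 4]
```

Three failures cost 1 + 2 + 4 = 7 seconds of waiting in total, and the fourth call goes through.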

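Handling individual errors inside ClientError

In the example above, every ClientError is retried
However, most AWS API errors are ClientError
So errors that do not need a retry get retried as well.
Also, if you want to handle a specific error within ClientError separately, you cannot.
In that case, catch the exception once with try/except, and re-raise as ClientError only the errors you want to retry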

import logging

import botocore.exceptions
from retry import retry

logger = logging.getLogger(__name__)


@retry(botocore.exceptions.ClientError, delay=1, backoff=2, max_delay=64)
def request_quota_handler(quota_info, quota_client, request_quota_count):
    response = None  # stays None when the request already exists
    try:
        response = quota_client.request_service_quota_increase(
            ServiceCode=quota_info['service_code'],
            QuotaCode=quota_info['quota_code'],
            DesiredValue=request_quota_count
        )
    except botocore.exceptions.ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'ResourceAlreadyExistsException':
            # Not worth retrying: log it and move on.
            logger.warning(e)
        elif error_code == 'TooManyRequestsException':
            # Re-raise as ClientError so that @retry retries with backoff.
            error_response = {"Error": {"Code": "TooManyRequestsException"}}
            raise botocore.exceptions.ClientError(error_response, 'RequestServiceQuotaIncrease')
        else:
            # Any other error becomes a plain Exception, which @retry ignores.
            raise Exception(e) from e
    return response

except botocore.exceptions.ClientError as e: catches the ClientError
e.response['Error']['Code'] gives the error code as a str, so compare strings to pick out the error you want to handle
Define error_response = {"Error": {"Code": "TooManyRequestsException"}} and pass it as an argument to raise botocore.exceptions.ClientError()
The arguments of ClientError are (error_response, operation_name), where operation_name is the name of the API

With this, only the ClientError that was re-raised goes through the retry.
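
Another way to express the same idea, sketched under assumptions: keep the retryable error codes in a set and translate only those into a dedicated exception type for the retry decorator to watch. FakeClientError below is a stand-in that only mimics the response['Error']['Code'] shape of botocore.exceptions.ClientError so the sketch runs without AWS access; ThrottledError, is_throttling_error, call_and_classify, and THROTTLING_CODES are hypothetical names. Only TooManyRequestsException comes from this article; the other codes are throttling codes that some AWS services return, listed as examples.

```python
class FakeClientError(Exception):
    """Stand-in mimicking the .response shape of botocore's ClientError."""

    def __init__(self, error_response, operation_name):
        code = error_response['Error']['Code']
        super().__init__('{} failed: {}'.format(operation_name, code))
        self.response = error_response
        self.operation_name = operation_name


class ThrottledError(Exception):
    """Hypothetical exception type meaning 'safe to retry with backoff'."""


# Assumption: example throttling codes; adjust to the services you call.
THROTTLING_CODES = {
    'TooManyRequestsException',
    'Throttling',
    'ThrottlingException',
    'RequestLimitExceeded',
}


def is_throttling_error(error):
    """Return True when the error code indicates throttling."""
    return error.response['Error']['Code'] in THROTTLING_CODES


def call_and_classify(api_call):
    """Run api_call; turn throttling errors into ThrottledError, re-raise the rest."""
    try:
        return api_call()
    except FakeClientError as error:
        if is_throttling_error(error):
            raise ThrottledError(str(error)) from error
        raise


def throttled():
    raise FakeClientError({'Error': {'Code': 'TooManyRequestsException'}}, 'RequestServiceQuotaIncrease')


def denied():
    raise FakeClientError({'Error': {'Code': 'AccessDeniedException'}}, 'RequestServiceQuotaIncrease')


outcomes = []
for api_call in (throttled, denied):
    try:
        call_and_classify(api_call)
    except ThrottledError:
        outcomes.append('retry')
    except FakeClientError:
        outcomes.append('fail fast')

print(outcomes)  # ['retry', 'fail fast']
```

In real code the except clause would catch botocore.exceptions.ClientError instead of the stand-in, and a decorator such as @retry(ThrottledError, delay=1, backoff=2, max_delay=64) would then retry only the throttling case.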

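In closing

Rate Exceeded is a quite annoying error once you actually run into it.
The allowed API rate differs per service, so even if you do not need this now, it may come up when you start using a new service
- The Service Quotas API used in this example has a fairly strict API rate out of the box
Even if you do not need it now, keeping it in mind should make working with AWS more comfortable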
