2
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

【Ansible】Route53モジュールがホストゾーンのレコードが多いとRateExceedになる

Posted at

AWSの運用をする中で、結構難儀な課題にぶつかったので記事にまとめます

この記事はこんな人におすすめ

  • Route53モジュールのRateExceedに悩んでいる人
  • Ansibleモジュールの修正(カスタムモジュール)をしてみたい人
  • 利用しているツール側でエラーになった際に自分で修正することができるようになりたい人
  • 大規模AWS運用ならではの悩みが知りたい人

背景

  • 弊社製品では1つのホストゾーンに10000件以上のレコードを保持しています
  • 製品のインフラ管理にAnsibeを使用しており、上記ホストゾーンのレコード管理にamazon.aws.route53モジュールを使用しています
  • 弊社製品のインフラメンテナンスは管理している複数環境に対して実行しており、Ansibeの実行が重複する可能性があります

実行環境

  • python 3.12
  • ansible: 2.18.4
  • collection amazon.aws: 9.3.0

発生事象

ListResourceRecordSets APIの実行がRate exceededでエラーになります

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (Throttling) when calling the ListResourceRecordSets operation: Rate exceeded
fatal: [localhost]: FAILED! => changed=false 
  attempts: 10
  module_stderr: |-
    Traceback (most recent call last):
      File "<stdin>", line 107, in <module>
      File "<stdin>", line 99, in _ansiballz_main
      File "<stdin>", line 47, in invoke_module
      File "<frozen runpy>", line 226, in run_module
      File "<frozen runpy>", line 98, in _run_module_code
      File "<frozen runpy>", line 88, in _run_code
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/modules/route53.py", line 873, in <module>
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/modules/route53.py", line 718, in main
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/modules/route53.py", line 490, in get_record
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py", line 119, in _retry_wrapper
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py", line 68, in _retry_func
      File "/tmp/ansible_route53_payload_yo8n1f4e/ansible_route53_payload.zip/ansible_collections/amazon/aws/plugins/modules/route53.py", line 480, in _list_record_sets
      File "/usr/local/lib/python3.12/dist-packages/botocore/paginate.py", line 483, in build_full_result
        for response in self:
      File "/usr/local/lib/python3.12/dist-packages/botocore/paginate.py", line 272, in __iter__
        response = self._make_request(current_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/botocore/context.py", line 124, in wrapper
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/botocore/paginate.py", line 361, in _make_request
        return self._method(**current_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/botocore/client.py", line 570, in _api_call
        return self._make_api_call(operation_name, kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/botocore/context.py", line 124, in wrapper
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/dist-packages/botocore/client.py", line 1031, in _make_api_call
        raise error_class(parsed_response, operation_name)
    botocore.exceptions.ClientError: An error occurred (Throttling) when calling the ListResourceRecordSets operation: Rate exceeded
  module_stdout: ''
  msg: |-
    MODULE FAILURE: No start of json char found
    See stdout/stderr for the exact error
  rc: 1

原因

amazon.aws.route53における以下のコード部分が原因です。
この処理ではlist_resource_record_setsをpaginateで実行し、指定のホストゾーンのすべてのレコードを取得しています。

また、この関数は2か所で呼び出していることが分かります。関数名を見るに、毎回の処理で実行をしていそうな関数です

実際には、モジュール1回での実行では問題ありませんが、Ansibe内では複数のRoute53レコードの取得・作成・削除を行っており、このモジュールが短期間で複数回実行されています。

また、複数環境に対して同じ時間帯でメンテナンスを行う場合、さらに実行回数は上がります

その結果、list_resource_record_setsが短期間で大量に実行されることになり、Rate exceededが発生していました。

解決方法検討

Route53モジュールの実行回数を減らす

まず考えるべきはこの対応です。
しかしながら、route53レコードは継続的に今後も作成・削除されることが考えられるため、今は減らせてもいずれ再び負荷が上がることが考えられます。

そのため、今後の運用を考えた場合、容易には選択したくありません。

retryを増やせばよいのでは?

amazon.aws.route53にはすでにリトライロジックが組み込まれています
以下の部分でroute53 clientにおけるリトライが定義されています。

そして、それを前述の関数では @AWSRetry.jittered_backoff(retries=MAX_AWS_RETRIES) という形でjittered backoffによるリトライを実施しています。

つまり、ベストプラクティスに基づくリトライロジックはすでに実装されています
それでも、レコード数とモジュール実行数が多いがために、耐えられない状態になっています

retryを増やす / 時間を伸ばす

これは、初めに私たちが実施した対応です。

しかし、結果としてはむしろ状況を悪くすることになりました

  • リトライを増やした結果、モジュールの実行時間が増える
  • モジュール実行中にほかの環境の実行も始まる
  • 上記が積み重なった結果、リトライをしている実行が増えていき、API負荷がさらに上がるため、すべての実行がRateExceedになってエラーになる

全件取得は本当に必要なのか

そもそもこのモジュールの処理はroute53レコードの取得・作成・削除です。

リファレンスを見ても、zone / record / typeを指定して実行するものです。

そのため、実行時にすでに対象のレコードは指定されており、わざわざ全件を取得する必要はありません

指定したレコードの存在有無が判断できればよいです

その観点で、前述のコードを見ると、全件取得ののち、指定のレコードがあるかをフィルタリングしています。

つまり、このフィルタリング処理をAPI側で実施できれば良いということになります

全件取得を防ぐ方法はあるのか

boto3のリファレンスを見ると、StartRecordName / StartRecordTypeという引数があります。

response = client.list_resource_record_sets(
    HostedZoneId='string',
    StartRecordName='string',
    StartRecordType='SOA'|'A'|'TXT'|'NS'|'CNAME'|'MX'|'NAPTR'|'PTR'|'SRV'|'SPF'|'AAAA'|'CAA'|'DS'|'TLSA'|'SSHFP'|'SVCB'|'HTTPS',
    StartRecordIdentifier='string',
    MaxItems='string'
)

実際に説明が書かれており、これを使うと、指定のレコード・Typeから検索を始めることが可能に見えます
そしてMaxItemsを1に設定することで、指定のレコード1件のみを取得することができます

Specifying where to start listing records
You can use the name and type elements to specify the resource record set that the list begins with:

If you do not specify Name or Type

The results begin with the first resource record set that the hosted zone contains.

If you specify Name but not Type

The results begin with the first resource record set in the list whose name is greater than or equal to Name.

If you specify Type but not Name

Amazon Route 53 returns the InvalidInput error.

If you specify both Name and Type

The results begin with the first resource record set in the list whose name is greater than or equal to Name, and whose type is greater than or equal to Type.

StartRecordName / StartRecordTypeの注意点

これらに指定したレコードが存在しない場合は、ResourceRecordSetsが空になることを想定していましたが、実際には異なります。
その場合は、指定した条件に近いレコードを表示します

例えばhoge.exmple.comというレコードがある場合に、StartRecordNameに存在しないレコードfoo.exmple.comを指定すると、hoge.exmple.comの結果が返ってくる場合があります

なので、取得レコードのValidationはpython側で実施する必要があります。

なお、そのホストゾーンの最後のレコードよりもアルファベット順で後ろの文字列をStartRecordNameにした場合やなど、近いレコードがない場合はResourceRecordSetsが空配列になる場合もあります

修正コード

上記の調査より、list_resource_record_setsの全件取得は不要で、対象のレコードを1件取得するロジックに修正が可能です。

また、同様の全件取得をホストゾーンでも実施しているので、そちらも合わせて修正します

コード全量

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Copyright: (c) 2018, Ansible Project
# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)

DOCUMENTATION = r"""
module: custom_route53
version_added: 1.0.0
short_description: Custom Route53 module to handle rate limiting issues
description:
  - Creates and deletes DNS records in Amazons Route 53 service.
  - This is a custom module based on amazon.aws.route53 with modifications to handle rate limiting.
  - Custom implementation to resolve rateExceed errors in Route53 operations.
  - This module was originally added to C(amazon.aws) in release 9.3.0.
options:
  state:
    description:
      - Specifies the state of the resource record.
    required: true
    aliases: [ 'command' ]
    choices: [ 'present', 'absent', 'get', 'create', 'delete' ]
    type: str
  zone:
    description:
      - The DNS zone to modify.
      - This is a required parameter, if parameter O(hosted_zone_id) is not supplied.
    type: str
  hosted_zone_id:
    description:
      - The Hosted Zone ID of the DNS zone to modify.
      - This is a required parameter, if parameter O(zone) is not supplied.
    type: str
  record:
    description:
      - The full DNS record to create or delete.
    required: true
    type: str
  ttl:
    description:
      - The TTL, in second, to give the new record.
      - Mutually exclusive with O(alias).
    default: 3600
    type: int
  type:
    description:
      - The type of DNS record to create.
      - Support for V(SSHFP) was added in release 9.2.0. See AWS Doc for more information
        U(https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/ResourceRecordTypes.html).
    required: true
    choices: [ 'A', 'CNAME', 'MX', 'AAAA', 'TXT', 'PTR', 'SRV', 'SPF', 'CAA', 'NS', 'SOA', 'SSHFP' ]
    type: str
  alias:
    description:
      - Indicates if this is an alias record.
      - Mutually exclusive with O(ttl).
      - Defaults to V(false).
    type: bool
  alias_hosted_zone_id:
    description:
      - The hosted zone identifier.
    type: str
  alias_evaluate_target_health:
    description:
      - Whether or not to evaluate an alias target health. Useful for aliases to Elastic Load Balancers.
    type: bool
    default: false
  value:
    description:
      - The new value when creating a DNS record. YAML lists or multiple comma-spaced values are allowed for non-alias records.
    type: list
    elements: str
  overwrite:
    description:
      - Whether an existing record should be overwritten on create if values do not match.
    type: bool
  retry_interval:
    description:
      - In the case that Route 53 is still servicing a prior request, this module will wait and try again after this many seconds.
        If you have many domain names, the default of V(500) seconds may be too long.
    default: 500
    type: int
  private_zone:
    description:
      - If set to V(true), the private zone matching the requested name within the domain will be used if there are both public and private zones.
      - The default is to use the public zone.
    type: bool
    default: false
  identifier:
    description:
      - Have to be specified for Weighted, latency-based and failover resource record sets only.
        An identifier that differentiates among multiple resource record sets that have the same combination of DNS name and type.
    type: str
  weight:
    description:
      - Weighted resource record sets only. Among resource record sets that
        have the same combination of DNS name and type, a value that
        determines what portion of traffic for the current resource record set
        is routed to the associated location.
      - Mutually exclusive with O(region) and O(failover).
    type: int
  region:
    description:
      - Latency-based resource record sets only Among resource record sets
        that have the same combination of DNS name and type, a value that
        determines which region this should be associated with for the
        latency-based routing
      - Mutually exclusive with O(weight) and O(failover).
    type: str
  geo_location:
    description:
      - Allows to control how Amazon Route 53 responds to DNS queries based on the geographic origin of the query.
      - Two geolocation resource record sets that specify same geographic location cannot be created.
      - Non-geolocation resource record sets that have the same values for the Name and Type elements as geolocation
        resource record sets cannot be created.
    suboptions:
      continent_code:
        description:
          - The two-letter code for the continent.
          - Specifying O(geo_location.continent_code) with either O(geo_location.country_code) or O(geo_location.subdivision_code)
            returns an InvalidInput error.
        type: str
      country_code:
        description:
          - The two-letter code for a country.
          - Amazon Route 53 uses the two-letter country codes that are specified in ISO standard 3166-1 alpha-2 .
        type: str
      subdivision_code:
        description:
          - The two-letter code for a state of the United States.
          - To specify O(geo_location.subdivision_code), O(geo_location.country_code) must be set to V(US).
        type: str
    type: dict
    version_added: 3.3.0
    version_added_collection: community.aws
  health_check:
    description:
      - Health check to associate with this record
    type: str
  failover:
    description:
      - Failover resource record sets only. Whether this is the primary or
        secondary resource record set. Allowed values are V(PRIMARY) and V(SECONDARY).
      - Mutually exclusive with O(weight) and O(region).
    type: str
    choices: ['SECONDARY', 'PRIMARY']
  vpc_id:
    description:
      - When used in conjunction with O(private_zone=true), this will only modify records in the private hosted zone attached to this VPC.
      - This allows you to have multiple private hosted zones, all with the same name, attached to different VPCs.
    type: str
  wait:
    description:
      - Wait until the changes have been replicated to all Amazon Route 53 DNS servers.
    type: bool
    default: false
  wait_timeout:
    description:
      - How long to wait for the changes to be replicated, in seconds.
    default: 300
    type: int
author:
  - Bruce Pennypacker (@bpennypacker)
  - Mike Buzzetti (@jimbydamonk)
extends_documentation_fragment:
  - amazon.aws.common.modules
  - amazon.aws.boto3
"""

RETURN = r"""
nameservers:
  description: Nameservers associated with the zone.
  returned: when state is 'get'
  type: list
  sample:
  - ns-1036.awsdns-00.org.
  - ns-516.awsdns-00.net.
  - ns-1504.awsdns-00.co.uk.
  - ns-1.awsdns-00.com.
resource_record_sets:
  description: Info specific to the resource record.
  returned: when state is 'get'
  type: complex
  contains:
    alias:
      description: Whether this is an alias.
      returned: always
      type: bool
      sample: false
    failover:
      description: Whether this is the primary or secondary resource record set.
      returned: always
      type: str
      sample: "PRIMARY"
    geo_location:
      description: Geograpic location based on which Route53 resonds to DNS queries.
      returned: when configured
      type: dict
      sample: { continent_code: "NA", country_code: "US", subdivision_code: "CA" }
      version_added: 3.3.0
      version_added_collection: community.aws
    health_check:
      description: Health check associated with this record.
      returned: always
      type: str
    identifier:
      description: An identifier that differentiates among multiple resource record sets that have the same combination of DNS name and type.
      returned: always
      type: str
    name:
      description: Domain name for the record set.
      returned: always
      type: str
      sample: "new.foo.com"
    record:
      description: Domain name for the record set.
      returned: always
      type: str
      sample: "new.foo.com"
    region:
      description: Which region this should be associated with for latency-based routing.
      returned: always
      type: str
      sample: "us-west-2"
    resource_records:
      description: Information about the resource records to act upon.
      type: list
      returned: always
      sample: [{"value": "1.1.1.1"}]
    ttl:
      description: Resource record cache TTL.
      returned: always
      type: str
      sample: "3600"
    type:
      description: Resource record set type.
      returned: always
      type: str
      sample: "A"
    value:
      description: Record value.
      returned: always
      type: str
      sample: "52.43.18.27"
    values:
      description: Record Values.
      returned: always
      type: list
      sample:
      - 52.43.18.27
    weight:
      description: Weight of the record.
      returned: always
      type: str
      sample: '3'
    zone:
      description: Zone this record set belongs to.
      returned: always
      type: str
      sample: "foo.bar.com"
wait_id:
  description:
    - The wait ID for the applied change. Can be used to wait for the change to propagate later on when O(wait=false).
  type: str
  returned: when changed
  version_added: 6.3.0
"""

EXAMPLES = r"""
- name: Add new.foo.com as an A record with 3 IPs and wait until the changes have been replicated
  custom_route53:
    state: present
    zone: foo.com
    record: new.foo.com
    type: A
    ttl: 7200
    value: 1.1.1.1,2.2.2.2,3.3.3.3
    wait: true

- name: Update new.foo.com as an A record with a list of 3 IPs and wait until the changes have been replicated
  custom_route53:
    state: present
    zone: foo.com
    record: new.foo.com
    type: A
    ttl: 7200
    value:
      - 1.1.1.1
      - 2.2.2.2
      - 3.3.3.3
    wait: true

- name: Retrieve the details for new.foo.com
  custom_route53:
    state: get
    zone: foo.com
    record: new.foo.com
    type: A
  register: rec

- name: Delete new.foo.com A record using the results from the get command
  custom_route53:
    state: absent
    zone: foo.com
    record: "{{ rec.set.record }}"
    ttl: "{{ rec.set.ttl }}"
    type: "{{ rec.set.type }}"
    value: "{{ rec.set.value }}"

# Add an AAAA record.  Note that because there are colons in the value
# that the IPv6 address must be quoted. Also shows using the old form command=create.
- name: Add an AAAA record
  custom_route53:
    command: create
    zone: foo.com
    record: localhost.foo.com
    type: AAAA
    ttl: 7200
    value: "::1"

# For more information on SRV records see:
# https://en.wikipedia.org/wiki/SRV_record
- name: Add a SRV record with multiple fields for a service on port 22222
  custom_route53:
    state: present
    zone: foo.com
    record: "_example-service._tcp.foo.com"
    type: SRV
    value: "0 0 22222 host1.foo.com,0 0 22222 host2.foo.com"

# Note that TXT and SPF records must be surrounded
# by quotes when sent to Route 53:
- name: Add a TXT record.
  custom_route53:
    state: present
    zone: foo.com
    record: localhost.foo.com
    type: TXT
    ttl: 7200
    value: '"bar"'

- name: Add an alias record that points to an Amazon ELB
  custom_route53:
    state: present
    zone: foo.com
    record: elb.foo.com
    type: A
    value: "{{ elb_dns_name }}"
    alias: true
    alias_hosted_zone_id: "{{ elb_zone_id }}"

- name: Retrieve the details for elb.foo.com
  custom_route53:
    state: get
    zone: foo.com
    record: elb.foo.com
    type: A
  register: rec

- name: Delete an alias record using the results from the get command
  custom_route53:
    state: absent
    zone: foo.com
    record: "{{ rec.set.record }}"
    ttl: "{{ rec.set.ttl }}"
    type: "{{ rec.set.type }}"
    value: "{{ rec.set.value }}"
    alias: true
    alias_hosted_zone_id: "{{ rec.set.alias_hosted_zone_id }}"

- name: Add an alias record that points to an Amazon ELB and evaluates it health
  custom_route53:
    state: present
    zone: foo.com
    record: elb.foo.com
    type: A
    value: "{{ elb_dns_name }}"
    alias: true
    alias_hosted_zone_id: "{{ elb_zone_id }}"
    alias_evaluate_target_health: true

- name: Add an AAAA record with Hosted Zone ID
  custom_route53:
    state: present
    zone: foo.com
    hosted_zone_id: Z2AABBCCDDEEFF
    record: localhost.foo.com
    type: AAAA
    ttl: 7200
    value: "::1"

- name: Use a routing policy to distribute traffic
  custom_route53:
    state: present
    zone: foo.com
    record: www.foo.com
    type: CNAME
    value: host1.foo.com
    ttl: 30
    # Routing policy
    identifier: "host1@www"
    weight: 100
    health_check: "d994b780-3150-49fd-9205-356abdd42e75"

- name: Add a CAA record (RFC 6844)
  custom_route53:
    state: present
    zone: example.com
    record: example.com
    type: CAA
    value:
      - 0 issue "ca.example.net"
      - 0 issuewild ";"
      - 0 iodef "mailto:security@example.com"

- name: Create a record with geo_location - country_code
  custom_route53:
    state: present
    zone: '{{ zone_one }}'
    record: 'geo-test.{{ zone_one }}'
    identifier: "geohost@www"
    type: A
    value: 1.1.1.1
    ttl: 30
    geo_location:
      country_code: US

- name: Create a record with geo_location - subdivision code
  custom_route53:
    state: present
    zone: '{{ zone_one }}'
    record: 'geo-test.{{ zone_one }}'
    identifier: "geohost@www"
    type: A
    value: 1.1.1.1
    ttl: 30
    geo_location:
      country_code: US
      subdivision_code: TX

- name: Add new.foo.com as an SSHFP record
  custom_route53:
    state: present
    zone: test-zone.com
    record: new.foo.com
    type: SSHFP
    ttl: 7200
    value: 1 1 11F1A11D1111112B111C1B11B1C11C11C1234567

- name: Delete new.foo.com as an SSHFP record
  custom_route53:
    state: absent
    zone: test-zone.com
    record: new.foo.com
    type: SSHFP
"""

from operator import itemgetter

try:
    import botocore
except ImportError:
    pass  # Handled by AnsibleAWSModule

from ansible.module_utils._text import to_native
from ansible.module_utils.common.dict_transformations import camel_dict_to_snake_dict

from ansible_collections.amazon.aws.plugins.module_utils.botocore import is_boto3_error_message
from ansible_collections.amazon.aws.plugins.module_utils.modules import AnsibleAWSModule
from ansible_collections.amazon.aws.plugins.module_utils.retries import AWSRetry
from ansible_collections.amazon.aws.plugins.module_utils.transformation import scrub_none_parameters
from ansible_collections.amazon.aws.plugins.module_utils.waiters import get_waiter

MAX_AWS_RETRIES = 10  # How many retries to perform when an API call is failing
WAIT_RETRY = 5  # how many seconds to wait between propagation status polls


def get_record(route53, zone_id, record_name, record_type, record_identifier):
    request_args = {
        "HostedZoneId": zone_id,
        "StartRecordName": record_name,
        "StartRecordType": record_type,
        "MaxItems": "1",
    }
    if record_identifier:
        request_args["StartRecordIdentifier"] = record_identifier

    record_sets_results = route53.list_resource_record_sets(
        aws_retry=True, **request_args
    )["ResourceRecordSets"]

    if len(record_sets_results) == 0:
        return None

    record_set = record_sets_results[0]

    record_set["Name"] = record_set["Name"].encode().decode("unicode_escape")
    # list_resource_record_sets may return another record name even StartRecordName is not exist.
    # So if the record name and type is not equal, return None
    if (record_name.lower(), record_type) != (
        record_set["Name"].lower(),
        record_set["Type"],
    ):
        return None

    if record_identifier and record_identifier != record_set.get("SetIdentifier"):
        return None

    return record_set


def get_zone_id_by_name(route53, module, zone_name, want_private, want_vpc_id):
    """Finds a zone by name or zone_id"""
    hosted_zones_results = route53.list_hosted_zones_by_name(
        aws_retry=True, DNSName=zone_name
    )["HostedZones"]

    for zone in hosted_zones_results:
        # only save this zone id if the private status of the zone matches
        # the private_zone_in boolean specified in the params
        private_zone = module.boolean(zone["Config"].get("PrivateZone", False))
        zone_id = zone["Id"].replace("/hostedzone/", "")

        if private_zone == want_private and zone["Name"] == zone_name:
            if want_vpc_id:
                # NOTE: These details aren't available in other boto3 methods, hence the necessary
                # extra API call
                hosted_zone = route53.get_hosted_zone(aws_retry=True, Id=zone_id)
                if want_vpc_id in [v["VPCId"] for v in hosted_zone["VPCs"]]:
                    return zone_id
            else:
                return zone_id
    return None


def format_record(record_in, zone_in, zone_id):
    """
    Formats a record in a way that's consistent with the pre-boto3 migration values
    as well as returning the 'normal' boto3 style values
    """
    if not record_in:
        return None

    record = dict(record_in)
    record["zone"] = zone_in
    record["hosted_zone_id"] = zone_id

    record["type"] = record_in.get("Type", None)
    record["record"] = record_in.get("Name").encode().decode("unicode_escape")
    record["ttl"] = record_in.get("TTL", None)
    record["identifier"] = record_in.get("SetIdentifier", None)
    record["weight"] = record_in.get("Weight", None)
    record["region"] = record_in.get("Region", None)
    record["failover"] = record_in.get("Failover", None)
    record["health_check"] = record_in.get("HealthCheckId", None)

    if record["ttl"]:
        record["ttl"] = str(record["ttl"])
    if record["weight"]:
        record["weight"] = str(record["weight"])
    if record["region"]:
        record["region"] = str(record["region"])

    if record_in.get("AliasTarget"):
        record["alias"] = True
        record["value"] = record_in["AliasTarget"].get("DNSName")
        record["values"] = [record_in["AliasTarget"].get("DNSName")]
        record["alias_hosted_zone_id"] = record_in["AliasTarget"].get("HostedZoneId")
        record["alias_evaluate_target_health"] = record_in["AliasTarget"].get("EvaluateTargetHealth")
    else:
        record["alias"] = False
        records = [r.get("Value") for r in record_in.get("ResourceRecords")]
        record["value"] = ",".join(sorted(records))
        record["values"] = sorted(records)

    return record


def get_hosted_zone_nameservers(route53, zone_id):
    hosted_zone_name = route53.get_hosted_zone(aws_retry=True, Id=zone_id)["HostedZone"]["Name"]
    # Since NS records cannot be deleted from the host zone, ResourceRecordSets elements must always exist.
    # Therefore, empty arrays of ResourceRecordSets need not be considered.
    nameservers_records = route53.list_resource_record_sets(
        aws_retry=True,
        HostedZoneId=zone_id,
        StartRecordName=hosted_zone_name,
        StartRecordType="NS",
        MaxItems="1",
    )["ResourceRecordSets"][0]["ResourceRecords"]

    return [ns_record["Value"] for ns_record in nameservers_records]


def main():
    argument_spec = dict(
        state=dict(
            type="str", required=True, choices=["absent", "create", "delete", "get", "present"], aliases=["command"]
        ),
        zone=dict(type="str"),
        hosted_zone_id=dict(type="str"),
        record=dict(type="str", required=True),
        ttl=dict(type="int", default=3600),
        type=dict(
            type="str",
            required=True,
            choices=["A", "AAAA", "CAA", "CNAME", "MX", "NS", "PTR", "SOA", "SPF", "SSHFP", "SRV", "TXT"],
        ),
        alias=dict(type="bool"),
        alias_hosted_zone_id=dict(type="str"),
        alias_evaluate_target_health=dict(type="bool", default=False),
        value=dict(type="list", elements="str"),
        overwrite=dict(type="bool"),
        retry_interval=dict(type="int", default=500),
        private_zone=dict(type="bool", default=False),
        identifier=dict(type="str"),
        weight=dict(type="int"),
        region=dict(type="str"),
        geo_location=dict(
            type="dict",
            options=dict(
                continent_code=dict(type="str"), country_code=dict(type="str"), subdivision_code=dict(type="str")
            ),
            required=False,
        ),
        health_check=dict(type="str"),
        failover=dict(type="str", choices=["PRIMARY", "SECONDARY"]),
        vpc_id=dict(type="str"),
        wait=dict(type="bool", default=False),
        wait_timeout=dict(type="int", default=300),
    )

    module = AnsibleAWSModule(
        argument_spec=argument_spec,
        supports_check_mode=True,
        required_one_of=[["zone", "hosted_zone_id"]],
        # If alias is True then you must specify alias_hosted_zone as well
        required_together=[["alias", "alias_hosted_zone_id"]],
        # state=present, absent, create, delete THEN value is required
        required_if=(
            ("state", "present", ["value"]),
            ("state", "create", ["value"]),
        ),
        # failover, region and weight are mutually exclusive
        mutually_exclusive=[
            ("failover", "region", "weight"),
            ("alias", "ttl"),
        ],
        # failover, region, weight and geo_location require identifier
        required_by=dict(
            failover=("identifier",),
            region=("identifier",),
            weight=("identifier",),
            geo_location=("identifier",),
        ),
    )

    if module.params["state"] in ("present", "create"):
        command_in = "create"
    elif module.params["state"] in ("absent", "delete"):
        command_in = "delete"
    elif module.params["state"] == "get":
        command_in = "get"

    zone_in = (module.params.get("zone") or "").lower()
    hosted_zone_id_in = module.params.get("hosted_zone_id")
    ttl_in = module.params.get("ttl")
    record_in = module.params.get("record").lower()
    type_in = module.params.get("type")
    value_in = module.params.get("value") or []
    alias_in = module.params.get("alias")
    alias_hosted_zone_id_in = module.params.get("alias_hosted_zone_id")
    alias_evaluate_target_health_in = module.params.get("alias_evaluate_target_health")
    retry_interval_in = module.params.get("retry_interval")

    if module.params["vpc_id"] is not None:
        private_zone_in = True
    else:
        private_zone_in = module.params.get("private_zone")

    identifier_in = module.params.get("identifier")
    weight_in = module.params.get("weight")
    region_in = module.params.get("region")
    health_check_in = module.params.get("health_check")
    failover_in = module.params.get("failover")
    vpc_id_in = module.params.get("vpc_id")
    wait_in = module.params.get("wait")
    wait_timeout_in = module.params.get("wait_timeout")
    geo_location = module.params.get("geo_location")

    if zone_in[-1:] != ".":
        zone_in += "."

    if record_in[-1:] != ".":
        record_in += "."

    if command_in == "create" or command_in == "delete":
        if alias_in and len(value_in) != 1:
            module.fail_json(msg="parameter 'value' must contain a single dns name for alias records")
        if (
            weight_in is None and region_in is None and failover_in is None and geo_location is None
        ) and identifier_in is not None:
            module.fail_json(
                msg=(
                    "You have specified identifier which makes sense only if you specify one of: weight, region,"
                    " geo_location or failover."
                )
            )

    retry_decorator = AWSRetry.jittered_backoff(
        retries=MAX_AWS_RETRIES,
        delay=retry_interval_in,
        catch_extra_error_codes=["PriorRequestNotComplete"],
        max_delay=max(60, retry_interval_in),
    )

    # connect to the route53 endpoint
    try:
        route53 = module.client("route53", retry_decorator=retry_decorator)
    except botocore.exceptions.HTTPClientError as e:
        module.fail_json_aws(e, msg="Failed to connect to AWS")

    # Find the named zone ID
    zone_id = hosted_zone_id_in or get_zone_id_by_name(route53, module, zone_in, private_zone_in, vpc_id_in)

    # Verify that the requested zone is already defined in Route53
    if zone_id is None:
        errmsg = f"Zone {zone_in or hosted_zone_id_in} does not exist in Route53"
        module.fail_json(msg=errmsg)

    aws_record = get_record(route53, zone_id, record_in, type_in, identifier_in)

    resource_record_set = scrub_none_parameters(
        {
            "Name": record_in,
            "Type": type_in,
            "Weight": weight_in,
            "Region": region_in,
            "Failover": failover_in,
            "TTL": ttl_in,
            "ResourceRecords": [dict(Value=value) for value in value_in],
            "HealthCheckId": health_check_in,
            "SetIdentifier": identifier_in,
        }
    )

    if geo_location:
        continent_code = geo_location.get("continent_code")
        country_code = geo_location.get("country_code")
        subdivision_code = geo_location.get("subdivision_code")

        if continent_code and (country_code or subdivision_code):
            module.fail_json(
                changed=False,
                msg=(
                    "While using geo_location, continent_code is mutually exclusive with country_code and"
                    " subdivision_code."
                ),
            )

        if not any([continent_code, country_code, subdivision_code]):
            module.fail_json(
                changed=False,
                msg="To use geo_location please specify either continent_code, country_code, or subdivision_code.",
            )

        if geo_location.get("subdivision_code") and geo_location.get("country_code").lower() != "us":
            module.fail_json(changed=False, msg="To use subdivision_code, you must specify country_code as US.")

        # Build geo_location suboptions specification
        resource_record_set["GeoLocation"] = {}
        if continent_code:
            resource_record_set["GeoLocation"]["ContinentCode"] = continent_code
        if country_code:
            resource_record_set["GeoLocation"]["CountryCode"] = country_code
        if subdivision_code:
            resource_record_set["GeoLocation"]["SubdivisionCode"] = subdivision_code

    if command_in == "delete" and aws_record is not None:
        resource_record_set["TTL"] = aws_record.get("TTL")
        if not resource_record_set["ResourceRecords"]:
            resource_record_set["ResourceRecords"] = aws_record.get("ResourceRecords")

    if alias_in:
        resource_record_set["AliasTarget"] = dict(
            HostedZoneId=alias_hosted_zone_id_in,
            DNSName=value_in[0],
            EvaluateTargetHealth=alias_evaluate_target_health_in,
        )
        if "ResourceRecords" in resource_record_set:
            del resource_record_set["ResourceRecords"]
        if "TTL" in resource_record_set:
            del resource_record_set["TTL"]

    # On CAA records order doesn't matter
    if type_in == "CAA":
        resource_record_set["ResourceRecords"] = sorted(resource_record_set["ResourceRecords"], key=itemgetter("Value"))
        if aws_record:
            aws_record["ResourceRecords"] = sorted(aws_record["ResourceRecords"], key=itemgetter("Value"))

    if command_in == "create" and aws_record == resource_record_set:
        rr_sets = [camel_dict_to_snake_dict(resource_record_set)]
        module.exit_json(changed=False, resource_records_sets=rr_sets)

    if command_in == "get":
        if type_in == "NS":
            ns = aws_record.get("values", [])
        else:
            # Retrieve name servers associated to the zone.
            ns = get_hosted_zone_nameservers(route53, zone_id)

        formatted_aws = format_record(aws_record, zone_in, zone_id)

        if formatted_aws is None:
            # record does not exist
            module.exit_json(changed=False, set=[], nameservers=ns, resource_record_sets=[])

        rr_sets = [camel_dict_to_snake_dict(aws_record)]
        module.exit_json(changed=False, set=formatted_aws, nameservers=ns, resource_record_sets=rr_sets)

    if command_in == "delete" and not aws_record:
        module.exit_json(changed=False)

    if command_in == "create" or command_in == "delete":
        if command_in == "create" and aws_record:
            if not module.params["overwrite"]:
                module.fail_json(msg="Record already exists with different value. Set 'overwrite' to replace it")
            command = "UPSERT"
        else:
            command = command_in.upper()

    wait_id = None
    if not module.check_mode:
        try:
            change_resource_record_sets = route53.change_resource_record_sets(
                aws_retry=True,
                HostedZoneId=zone_id,
                ChangeBatch=dict(Changes=[dict(Action=command, ResourceRecordSet=resource_record_set)]),
            )
            wait_id = change_resource_record_sets["ChangeInfo"]["Id"]

            if wait_in:
                waiter = get_waiter(route53, "resource_record_sets_changed")
                waiter.wait(
                    Id=change_resource_record_sets["ChangeInfo"]["Id"],
                    WaiterConfig=dict(
                        Delay=WAIT_RETRY,
                        MaxAttempts=wait_timeout_in // WAIT_RETRY,
                    ),
                )
        except is_boto3_error_message("but it already exists"):
            module.exit_json(changed=False)
        except botocore.exceptions.WaiterError as e:
            module.fail_json_aws(e, msg="Timeout waiting for resource records changes to be applied")
        except (
            botocore.exceptions.BotoCoreError,
            botocore.exceptions.ClientError,
        ) as e:  # pylint: disable=duplicate-except
            module.fail_json_aws(e, msg="Failed to update records")
        except Exception as e:
            module.fail_json(msg=f"Unhandled exception. ({to_native(e)})")

    formatted_aws = format_record(aws_record, zone_in, zone_id)
    formatted_record = format_record(resource_record_set, zone_in, zone_id)

    return_result = dict(
        changed=True,
        wait_id=wait_id,
        resource_record_sets=[camel_dict_to_snake_dict(formatted_record)] if command_in != "delete" else {},
    )

    if module._diff:
        return_result.update(
            {
                "diff": {
                    "before": formatted_aws,
                    "after": camel_dict_to_snake_dict(formatted_record) if command_in != "delete" else {},
                }
            }
        )

    module.exit_json(**return_result)


if __name__ == "__main__":
    main()

changes

  • list_resource_record_setsはStartRecordName・StartRecordTypeを使うことで全件取得しなくても、必要なレコードを取得するようにする
    • paginate処理の削除
    • NSレコード処理のループ処理削除
  • list_hosted_zonesもlist_hosted_zones_by_nameを使用することで全レコード取得を不要にする

主な修正点

モジュール名の修正

カスタムモジュールではモジュール名の変更をお勧めします

また、コードはこの部分で指定したものと同じファイル名である必要があります

module: route53

module: custom_route53

加えて、DOCUMENTATION内の例で呼び出すモジュール名も修正しておきましょう

get_recordの修正

paginateは不要になったため、_list_record_setsは削除
record_identifierが渡される場合もあるので、その有無によってrequest_argsの中身を変更しています

また、1件の取得で良いので、for文を削除しました。
一方でResourceRecordSetsが空になる場合もあるので、空配列化のチェックを追加しています

def get_record(route53, zone_id, record_name, record_type, record_identifier):
    request_args = {
        "HostedZoneId": zone_id,
        "StartRecordName": record_name,
        "StartRecordType": record_type,
        "MaxItems": "1",
    }
    if record_identifier:
        request_args["StartRecordIdentifier"] = record_identifier

    record_sets_results = route53.list_resource_record_sets(
        aws_retry=True, **request_args
    )["ResourceRecordSets"]

    if len(record_sets_results) == 0:
        return None

    record_set = record_sets_results[0]

    record_set["Name"] = record_set["Name"].encode().decode("unicode_escape")
    # list_resource_record_sets may return another record name even StartRecordName is not exist.
    # So if the record name and type is not equal, return None
    if (record_name.lower(), record_type) != (
        record_set["Name"].lower(),
        record_set["Type"],
    ):
        return None

    if record_identifier and record_identifier != record_set.get("SetIdentifier"):
        return None

    return record_set

get_zone_id_by_nameの修正

list_hosted_zones_by_name というぴったりのAPIがあるため、それを使用します。
これによって指定のホストゾーンを直接検索します

def get_zone_id_by_name(route53, module, zone_name, want_private, want_vpc_id):
    """Finds a zone by name or zone_id"""
    hosted_zones_results = route53.list_hosted_zones_by_name(
        aws_retry=True, DNSName=zone_name
    )["HostedZones"]

    for zone in hosted_zones_results:
        # only save this zone id if the private status of the zone matches
        # the private_zone_in boolean specified in the params
        private_zone = module.boolean(zone["Config"].get("PrivateZone", False))
        zone_id = zone["Id"].replace("/hostedzone/", "")

        if private_zone == want_private and zone["Name"] == zone_name:
            if want_vpc_id:
                # NOTE: These details aren't available in other boto3 methods, hence the necessary
                # extra API call
                hosted_zone = route53.get_hosted_zone(aws_retry=True, Id=zone_id)
                if want_vpc_id in [v["VPCId"] for v in hosted_zone["VPCs"]]:
                    return zone_id
            else:
                return zone_id
    return None

get_hosted_zone_nameserversの修正

StartRecordType="NS"とすることで、nameserversを取得します

ホストゾーンにおいて、NSレコードは必ず存在するため、ここでは空配列の考慮は不要です

def get_hosted_zone_nameservers(route53, zone_id):
    hosted_zone_name = route53.get_hosted_zone(aws_retry=True, Id=zone_id)["HostedZone"]["Name"]
    # Since NS records cannot be deleted from the host zone, ResourceRecordSets elements must always exist.
    # Therefore, empty arrays of ResourceRecordSets need not be considered.
    nameservers_records = route53.list_resource_record_sets(
        aws_retry=True,
        HostedZoneId=zone_id,
        StartRecordName=hosted_zone_name,
        StartRecordType="NS",
        MaxItems="1",
    )["ResourceRecordSets"][0]["ResourceRecords"]

    return [ns_record["Value"] for ns_record in nameservers_records]

カスタムモジュールのテスト

リファレンスを参考に実施してください。基本的にlibraryフォルダにPythonをデプロイし、TASKにてモジュールを指定すれば実行可能です

まとめ

今回の取り組みで、以下の点を学びました。今後ともより良い運用のために課題解決に取り組んでいきます

  • 大規模・大量の環境管理において、安易なリトライはむしろAPI負荷を上げることになる
  • 実現したい目的とコードを理解し、根本的な解決策を実施する
  • 大規模運用ならではの課題解決の面白さ
2
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
2
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?