
Restoring and copying Glacier objects so that Athena can query them

Posted at 2022-07-04

Athena's limitations on Glacier objects

Athena does not query objects in the Amazon S3 Glacier Flexible Retrieval and Amazon S3 Glacier Deep Archive storage classes.
In addition, as the documentation states, restoring an object in either storage class is not enough for Athena to read its data. To make the data queryable, you have to copy the restored object to create a separate object.
I wrote Python scripts that perform these two steps.
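
Before restoring anything, it can help to check whether an object is actually in one of the two archive storage classes named above. The following is a minimal sketch, not part of the original scripts: the helper name `needs_restore_and_copy` is hypothetical, and it assumes you already have the `StorageClass` value that S3 reports for the object (for example from a `head_object` or `list_objects_v2` response).

```python
# Storage class identifiers S3 reports for the two archive classes:
# "GLACIER" is Glacier Flexible Retrieval, "DEEP_ARCHIVE" is Glacier Deep Archive.
ARCHIVE_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def needs_restore_and_copy(storage_class):
    """Return True if an object with this storage class needs restore + copy.

    `storage_class` is the StorageClass value taken from an S3 response.
    head_object omits the field for S3 Standard objects, so None is
    treated as "no restore needed".
    """
    return storage_class in ARCHIVE_STORAGE_CLASSES


if __name__ == "__main__":
    for storage_class in (None, "STANDARD", "GLACIER", "DEEP_ARCHIVE"):
        print(storage_class, needs_restore_and_copy(storage_class))
```

Keeping the check as a pure function on a string makes it easy to reuse when looping over many keys.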

Script that restores an object

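Run the script with the bucket name as the first argument and the key name as the second. It exits as soon as the restore request for the specified object has been submitted; the restore itself completes later.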

import boto3
import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s  %(name)s  %(levelname)s: %(message)s')

file_handler = logging.FileHandler('glacier_restore.log')
file_handler.setFormatter(formatter)

logger.addHandler(file_handler)

s3 = boto3.client("s3")

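if len(sys.argv) != 3:
    # Wrong argument count is an error, so log it at ERROR level with a usage hint.
    logger.error(f"Invalid parameters. Usage: {sys.argv[0]} <bucket-name> <key>")
    sys.exit(1)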

bucketName = sys.argv[1].strip()
key = sys.argv[2].strip()

def restoreObject(bucketName, key):
    try:
        # Request a temporary copy that stays available for 1 day, using the Standard retrieval tier.
        s3.restore_object(Bucket=bucketName, Key=key, RestoreRequest={'Days': 1, 'GlacierJobParameters': {'Tier': 'Standard'}})
        logger.info(f"Restore request for '{key}' was submitted.")
    except s3.exceptions.ObjectAlreadyInActiveTierError:
        # Raised when the object is not in an archived storage class.
        logger.error(f"'{key}' is already in an active storage tier; no restore is needed.")

def main():
    restoreObject(bucketName,key)

if __name__ == "__main__":
    main()
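
The restore script hardcodes a one-day restore with the Standard tier. If you want to vary those values, the `RestoreRequest` dictionary can be built and validated separately before it is passed to `restore_object`. The sketch below is an illustration under assumptions: `build_restore_request` and its parameters are hypothetical names, and the only rules it encodes are that S3 accepts the tiers `Expedited`, `Standard`, and `Bulk`, and that `Expedited` is not available for Deep Archive objects.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days=1, tier="Standard", storage_class="GLACIER"):
    """Build the RestoreRequest dict for s3.restore_object (hypothetical helper).

    days:          how long the restored copy stays available
    tier:          retrieval tier name
    storage_class: storage class of the archived object, used only for validation
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if storage_class == "DEEP_ARCHIVE" and tier == "Expedited":
        raise ValueError("Expedited retrieval is not available for DEEP_ARCHIVE")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


if __name__ == "__main__":
    # Same request the script above sends.
    print(build_restore_request())
    # A cheaper, slower restore that stays available for a week.
    print(build_restore_request(days=7, tier="Bulk", storage_class="DEEP_ARCHIVE"))
```

The returned dictionary can be passed as `RestoreRequest=build_restore_request(...)` in the call shown in `restoreObject`.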

Script that copies an object

Run this script after the restore script.
Because a restore takes time to complete, the script first checks the restore status and copies the object only if the restore has finished.

import boto3
import logging
import sys
import os

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s  %(name)s  %(levelname)s: %(message)s')

file_handler = logging.FileHandler('glacier_restore.log')
file_handler.setFormatter(formatter)

logger.addHandler(file_handler)

s3 = boto3.client("s3")

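if len(sys.argv) != 3:
    # Wrong argument count is an error, so log it at ERROR level with a usage hint.
    logger.error(f"Invalid parameters. Usage: {sys.argv[0]} <bucket-name> <key>")
    sys.exit(1)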

bucketName = sys.argv[1].strip()
key = sys.argv[2].strip()

def copyObject(bucketName, key):

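    try:
        # head_object returns the restore status without downloading the object.
        response = s3.head_object(
            Bucket=bucketName,
            Key=key)
    except s3.exceptions.ClientError as e:
        # HEAD responses carry no error body, so a missing bucket or key surfaces
        # as a generic ClientError (for example code "404"), not as NoSuchBucket.
        logger.error(f"Cannot read metadata of '{key}' in bucket '{bucketName}': {e}. Exiting ...")
        sys.exit(1)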
    restoreStatus = response['ResponseMetadata']['HTTPHeaders'].get('x-amz-restore')
    if restoreStatus is None:
        # No x-amz-restore header: no restore has been requested for this object.
        logger.info(f"No restore has been requested for '{key}'.")
    elif restoreStatus == 'ongoing-request="true"':
        logger.info(f"The restore of '{key}' is still in progress.")
    else:
        logger.info(f"An object '{key}' is already restored")
        copySource = {'Bucket': bucketName, 'Key': key}
        # S3 keys are plain strings that always use '/', so append the suffix directly
        # instead of going through os.path, which would join with '\\' on Windows.
        copiedPath = f"{key}_copy"
        s3.copy_object(CopySource=copySource, Bucket=bucketName, Key=copiedPath)
        logger.info(f"'{key}' is copied to '{copiedPath}'.")

def main():
    copyObject(bucketName,key)


if __name__ == "__main__":
    main()
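
The copy script compares the `x-amz-restore` header against one exact string and treats every other value as "restored". Once a restore has finished, S3 reports a value of the form `ongoing-request="false", expiry-date="..."`, so the header can also be parsed into its fields and checked explicitly. The following is a sketch, not the author's method: `parse_restore_header` and `is_ready_to_copy` are hypothetical helper names, and the sample header value is only an illustration of the format.

```python
import re

# Matches key="value" pairs such as ongoing-request="false".
_PAIR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_restore_header(value):
    """Split an x-amz-restore header value into a dict of its fields.

    Returns an empty dict when the header is absent (value is None).
    """
    if value is None:
        return {}
    return dict(_PAIR.findall(value))


def is_ready_to_copy(value):
    """True only when the header explicitly says the restore has finished."""
    return parse_restore_header(value).get("ongoing-request") == "false"


if __name__ == "__main__":
    sample = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    print(parse_restore_header(sample))
    print(is_ready_to_copy(sample))
```

Checking for an explicit `"false"` means an unexpected header value is treated as "not ready" rather than triggering a copy.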
