
Downloading every object in an AWS S3 bucket (GLACIER included)

Last updated at 2024-08-28, posted at 2019-06-12

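Overview

This script downloads every object stored in an S3 bucket, whether it is in the standard or the GLACIER storage class.
I wrote it because a lifecycle rule had archived the objects I keep in S3 to the GLACIER storage class.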

Usage

Replace the sample bucket name and destination directory in the script, then start the Python interpreter with the script name as its argument.

$ python restore_n_download.py

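Restoring from GLACIER reportedly takes 3 to 5 hours under normal conditions, so run the script once to start the restore, then run it again 3 to 5 hours later to download the objects. Files that have already been downloaded are skipped, and the script also reports whether an object is still being restored.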
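The second run depends on reading an object's restore state. As a minimal sketch, the helper below classifies the `Restore` string that `head_object` returns. The function name `restore_state` and its return labels are hypothetical and do not appear in the script, which performs the same substring checks inline.

```python
def restore_state(restore_header):
    """Classify the Restore field from head_object (None or '' when absent)."""
    if not restore_header:
        # No restore has been requested for this archived object yet
        return 'not_requested'
    if 'ongoing-request="true"' in restore_header:
        # A restore was requested and is still running
        return 'in_progress'
    if 'ongoing-request="false"' in restore_header:
        # The restore finished and a temporary copy can be downloaded
        return 'restored'
    return 'unknown'


if __name__ == '__main__':
    # Sample header values used only for illustration
    samples = [
        None,
        'ongoing-request="true"',
        'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"',
    ]
    for sample in samples:
        print(restore_state(sample))
```

Keeping this check in one small function makes it possible to test the three cases without touching AWS.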

restore_n_download.py

import boto3
import os
from botocore.exceptions import ClientError

# Settings
CONFIG = {
    'bucket_name': 'awsexamplebucket',
    'filter_prefix': '',
    'base_dir': '/Users/name/awsexamplebucket',
    'restore_days': 1
}

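def download_object(s3_client, bucket, key, base_dir):
    # Keys ending in '/' are folder placeholders with no file to write
    if key.endswith('/'):
        print(f'Skipping folder placeholder: {key}')
        return

    save_path = os.path.join(base_dir, key)
    # Skip files that an earlier run already downloaded
    if os.path.exists(save_path):
        print(f'Already exists: {key}')
        return

    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    try:
        s3_client.download_file(bucket, key, save_path)
        print(f'Downloaded: {key}')
    except ClientError as e:
        print(f'Error downloading {key}: {e}')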

def main():
    s3 = boto3.client('s3')
    
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=CONFIG['bucket_name'], Prefix=CONFIG['filter_prefix']):
        for obj in page.get('Contents', []):
            key = obj['Key']
            try:
                obj_info = s3.head_object(Bucket=CONFIG['bucket_name'], Key=key)
                # head_object omits StorageClass for STANDARD objects
                storage_class = obj_info.get('StorageClass', 'STANDARD')

                if storage_class == 'GLACIER':
                    # The Restore field is absent until a restore is requested
                    restore_status = obj_info.get('Restore', '')
                    if 'ongoing-request="true"' in restore_status:
                        print(f'Restoration in-progress: {key}')
                    elif 'ongoing-request="false"' in restore_status:
                        # Restore finished; the temporary copy can be downloaded
                        download_object(s3, CONFIG['bucket_name'], key, CONFIG['base_dir'])
                    else:
                        print(f'Submitting restoration request: {key}')
                        # Days is how long the restored copy stays available
                        s3.restore_object(
                            Bucket=CONFIG['bucket_name'],
                            Key=key,
                            RestoreRequest={'Days': CONFIG['restore_days']}
                        )
                else:
                    download_object(s3, CONFIG['bucket_name'], key, CONFIG['base_dir'])
            except ClientError as e:
                print(f'Error processing {key}: {e}')

if __name__ == '__main__':
    main()
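
Before starting a restore, it can help to see how many objects sit in each storage class. The following is a rough sketch, not part of the original script: `count_storage_classes` is a hypothetical helper that tallies the `StorageClass` field from pages shaped like `list_objects_v2` results, and the sample pages are made-up data standing in for `paginator.paginate(...)` output.

```python
from collections import Counter


def count_storage_classes(pages):
    """Tally objects per storage class from list_objects_v2 result pages."""
    counts = Counter()
    for page in pages:
        for obj in page.get('Contents', []):
            # Fall back to STANDARD in case the field is missing
            counts[obj.get('StorageClass', 'STANDARD')] += 1
    return counts


if __name__ == '__main__':
    # Hypothetical pages standing in for paginator.paginate(...) output
    sample_pages = [
        {'Contents': [{'Key': 'a.txt', 'StorageClass': 'STANDARD'},
                      {'Key': 'b.txt', 'StorageClass': 'GLACIER'}]},
        {'Contents': [{'Key': 'c.txt', 'StorageClass': 'GLACIER'}]},
        {},  # a page with no objects
    ]
    print(dict(count_storage_classes(sample_pages)))
```

With a real bucket, the pages would come from the same paginator that `main()` uses, so the tally needs no extra request per object.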
