[AWS, Python] Saving a JSON file to S3 with Lambda

Posted at 2024-09-05 (last updated at 2024-09-05)

Saving to S3 from Lambda

I extended the Lambda function from my earlier post so that the JSON data it collects is saved to S3.

Saving to S3 requires boto3

Add an import of boto3 to the code from the earlier post.

import json
import boto3  # added
from datetime import datetime
import requests
from bs4 import BeautifulSoup
import urllib3

Getting the S3 resource from boto3

Call boto3.resource('s3') to get the S3 service resource, and keep it in a variable that can be used to read and write objects.

# Suppress the InsecureRequestWarning raised because requests are sent with verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

s3 = boto3.resource('s3')

The rest of the code is the same as before.

def get_h3_with_images(class_name_h3, class_name_img, url):
    response = requests.get(url, verify=False)
    soup = BeautifulSoup(response.content, 'html.parser')

    h3_tags = soup.find_all('h3', class_=class_name_h3)
    
    result = []
    for h3 in h3_tags:
        img_tag = h3.find_next('img', class_=class_name_img)
        img_src = img_tag['src'] if img_tag else None
        
        result.append({
            "title": h3.get_text(strip=True),
            "img_src": img_src
        })
    
    return result


def get_event_details(url):
    response = requests.get(url, verify=False)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    details_list = []
    
    events = soup.find_all('div', class_='event_detail')
    
    for event in events:
        details = {}
        

        start_time = event.find('div', class_='col left', string='開催時期')
        if start_time:
            details['開催時期'] = start_time.find_next_sibling('div').get_text(strip=True)

        end_time = event.find('div', class_='col left', string='終了時期')
        if end_time:
            details['終了時期'] = end_time.find_next_sibling('div').get_text(strip=True)
        
        location = event.find('div', class_='col left', string='場所')
        if location:
            details['場所'] = location.find_next_sibling('div').get_text(strip=True)
        
        details_list.append(details)
    
    return details_list


def lambda_handler(event, context):

    current_month = datetime.now().month
    if 3 <= current_month <= 5:
        season = 'spring'
    elif 6 <= current_month <= 8:
        season = 'summer'
    elif 9 <= current_month <= 11:
        season = 'fall'
    else:
        season = 'winter'

    url = f'https://otaru.gr.jp/{season}'
    
    class_name_h3 = 'head_item event_title'
    class_name_img = 'attachment-3x2 size-3x2 wp-post-image'

    h3_with_images = get_h3_with_images(class_name_h3, class_name_img, url)
    event_details = get_event_details(url)

    events = []
    if h3_with_images and event_details:
        for i in range(min(len(h3_with_images), len(event_details))):
            events.append({
                "title": h3_with_images[i]['title'],
                "img_src": h3_with_images[i]['img_src'],
                "details": event_details[i]
            })
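The month-to-season branch at the top of lambda_handler is easy to check on its own if it is pulled out into a small function. The following is a minimal sketch of that idea; the names season_for_month and season_url are hypothetical and do not appear in the original code.

```python
from datetime import datetime


def season_for_month(month: int) -> str:
    """Map a calendar month (1-12) to the season slug used in the URL."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "fall"
    # Remaining months are December, January and February.
    return "winter"


def season_url(now: datetime) -> str:
    """Build the seasonal page URL for the given moment."""
    return f"https://otaru.gr.jp/{season_for_month(now.month)}"


print(season_url(datetime(2024, 9, 5)))  # https://otaru.gr.jp/fall
```

Lambda runs in UTC unless configured otherwise, so around a month boundary datetime.now() may give a different season than the local clock in Japan would.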

Editing requirements.txt

Add boto3.

requests
beautifulsoup4
urllib3==1.26.5
boto3  # added

Uploading the retrieved JSON to S3

Call .Object() on the s3 variable with the following arguments:
bucket: the bucket name
key: the name of the JSON file to save

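Then upload the JSON with .put(). Because put() raises an exception when the upload fails, the handler only reaches the return of status code 200 after a successful upload.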

    file_contents = json.dumps(events, ensure_ascii=False)
    bucket = 'your-bucket-name'  # replace with the name of your S3 bucket
    key = 'event_data_' + datetime.now().strftime('%Y-%m-%d-%H-%M-%S') + '.json'
    s3_object = s3.Object(bucket, key)
    response = s3_object.put(Body=file_contents)
    
    return {
        'statusCode': 200,
        'body': json.dumps({"message": "File uploaded successfully", "key": key}, ensure_ascii=False)
    }
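The upload step can also be written as a function that receives the S3 resource as an argument, which lets it be exercised without touching AWS. This is only a sketch under assumptions: build_key and upload_events are hypothetical names, the body is encoded to UTF-8 explicitly, and ContentType is set as an optional extra that the original code does not use.

```python
import json
from datetime import datetime


def build_key(now: datetime) -> str:
    """Timestamped object key in the same format as the handler above."""
    return "event_data_" + now.strftime("%Y-%m-%d-%H-%M-%S") + ".json"


def upload_events(s3_resource, bucket: str, events: list, now: datetime) -> dict:
    """Serialize events to JSON and put them into the bucket."""
    key = build_key(now)
    # ensure_ascii=False keeps the Japanese text readable in the stored file;
    # encoding explicitly makes the bytes sent to S3 unambiguous UTF-8.
    body = json.dumps(events, ensure_ascii=False).encode("utf-8")
    # s3_resource is expected to behave like boto3.resource('s3').
    s3_resource.Object(bucket, key).put(Body=body, ContentType="application/json")
    return {
        "statusCode": 200,
        "body": json.dumps(
            {"message": "File uploaded successfully", "key": key},
            ensure_ascii=False,
        ),
    }
```

Inside lambda_handler this would be called as upload_events(s3, bucket, events, datetime.now()), with s3 being the boto3.resource('s3') variable created earlier.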

Uploading the code to Lambda

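As before, install the imported modules into a folder, put lambda_function.py in the same folder, create a zip file from it, and upload the zip to Lambda.

The procedure is the same as in the earlier post.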
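For reference, the zip step can be scripted. The sketch below assumes the dependencies were already installed into a folder (for example with pip install -r requirements.txt -t package); the folder name, the output name and the function build_lambda_zip are all hypothetical.

```python
import shutil
from pathlib import Path


def build_lambda_zip(package_dir: str, handler_file: str, output_base: str) -> str:
    """Copy the handler next to the installed modules and zip the folder.

    output_base should be an absolute path outside package_dir.
    Returns the path of the created archive (output_base + '.zip').
    """
    package = Path(package_dir)
    # The handler file must sit at the top level of the archive.
    shutil.copy(handler_file, package / Path(handler_file).name)
    # root_dir makes the paths inside the zip relative to the package folder,
    # so lambda_function.py ends up at the archive root.
    return shutil.make_archive(output_base, "zip", root_dir=str(package))
```

The resulting file can then be uploaded from the Lambda console in the same way as a manually created zip.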

Results

Running the test shows that the function succeeded with status 200.

Checking the bucket in S3 confirms that the JSON file is there.
The file named "event_data_..... .json" holds the data collected by web scraping.

I downloaded the file and checked its contents. It looks fine.
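The same check can be done from code instead of downloading the file by hand. This is a minimal sketch assuming s3_resource behaves like boto3.resource('s3'); the function name load_events is hypothetical.

```python
import json


def load_events(s3_resource, bucket: str, key: str) -> list:
    """Download the object and parse it back into Python data."""
    response = s3_resource.Object(bucket, key).get()
    # get() returns a dict whose 'Body' is a stream; read() yields bytes.
    return json.loads(response["Body"].read().decode("utf-8"))
```

Each element of the returned list should carry the title, img_src and details fields built in lambda_handler.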

Repository
