More than 3 years have passed since last update.

[Python] Merging multiple JSON files on S3

Use awswrangler to merge multiple JSON files stored on S3 and write the result back to S3.

Overview

This article uses AWS Data Wrangler (the awswrangler package).
The inputs are [sample1.json.gz] and [sample2.json.gz], which are gzip-compressed copies of the following JSONL files.

sample1.jsonl
{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}
{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}
sample2.jsonl
{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}
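
To reproduce the inputs, the two JSONL files above can be gzip-compressed locally before uploading them. The following is a minimal sketch using only the standard library; the helper name `write_samples` and the output directory are assumptions for illustration and do not appear in the original article.

```python
import gzip
from pathlib import Path

# Sample rows copied from the article, keyed by the compressed file name.
SAMPLES = {
    "sample1.json.gz": [
        '{"id":1,"father":"Mark","mother":"Charlotte","children":["Tom"]}',
        '{"id":2,"father":"John","mother":"Ann","children":["Jessika","Antony","Jack"]}',
    ],
    "sample2.json.gz": [
        '{"id":3,"father":"Bob","mother":"Monika","children":["Jerry","Karol"]}',
    ],
}


def write_samples(out_dir):
    """Write each sample as a gzip-compressed JSONL file and return the paths.

    Hypothetical helper: one JSON object per line, newline-terminated.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, lines in SAMPLES.items():
        path = out_dir / name
        # newline="\n" keeps line endings identical on every platform.
        with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
            f.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

The resulting files can then be copied to the bucket, for example with `aws s3 cp`, under the same `s3://testbucket/prefix/` keys used in the code further down.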

Prerequisites

Install awswrangler beforehand.

$ pip install awswrangler

Code

import awswrangler as wr
from datetime import datetime, timezone

# Input: the gzip-compressed JSONL files to merge
file_list = ["s3://testbucket/prefix/sample1.json.gz",
             "s3://testbucket/prefix/sample2.json.gz"]
# lines=True treats every line as one JSON record;
# passing a list of paths yields a single combined DataFrame
dfs = wr.s3.read_json(path=file_list, lines=True)

# Output: write the merged records as JSONL under a UTC-timestamped key
today = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
output_path = 's3://testbucket/output/{}'.format(today)
wr.s3.to_json(
    df=dfs,
    path=output_path,
    orient="records",
    lines=True
)
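
To see what the merge produces without touching S3, the same steps can be imitated on local files. The sketch below is a standard-library stand-in under stated assumptions, not how awswrangler works internally: the helper names (`read_jsonl_gz`, `merge_jsonl_gz`, `to_jsonl`, `timestamp_name`) are hypothetical, and the exact formatting of the file written by `wr.s3.to_json` may differ in small details.

```python
import gzip
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def read_jsonl_gz(path):
    """Return the records of one gzip-compressed JSONL file as dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def merge_jsonl_gz(paths):
    """Concatenate the records of several files, preserving input order."""
    merged = []
    for path in paths:
        merged.extend(read_jsonl_gz(path))
    return merged


def to_jsonl(records):
    """Serialize records as one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def timestamp_name(now):
    """Format a datetime in UTC with the same pattern as the output key."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


# Demo: write the article's sample rows to a temporary directory, then merge.
sample_rows = {
    "sample1.json.gz": [
        {"id": 1, "father": "Mark", "mother": "Charlotte", "children": ["Tom"]},
        {"id": 2, "father": "John", "mother": "Ann",
         "children": ["Jessika", "Antony", "Jack"]},
    ],
    "sample2.json.gz": [
        {"id": 3, "father": "Bob", "mother": "Monika",
         "children": ["Jerry", "Karol"]},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, rows in sample_rows.items():
        path = Path(tmp) / name
        with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
            f.write(to_jsonl(rows))
        paths.append(path)
    merged = merge_jsonl_gz(paths)
    output_text = to_jsonl(merged)

# Three lines, ids 1 to 3, in the order the input files were listed.
print(output_text, end="")
# A fixed datetime makes the key pattern visible: 20210102T030405Z
print(timestamp_name(datetime(2021, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
```

Note that the output key in the article carries no file extension; appending something like `.jsonl` to `output_path` is an optional choice if downstream tools rely on extensions.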