Renaming a file on S3 with PySpark on Glue

Posted at 2018-11-02

Rename operation

After some research, I ended up with the following:

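from pyspark.context import SparkContext

# SparkContext
sc = SparkContext()

# Java classes, accessed through the Py4J gateway
URI           = sc._gateway.jvm.java.net.URI
Path          = sc._gateway.jvm.org.apache.hadoop.fs.Path
# FileSystem.get() is a static method of the base class, so the base class is
# used here rather than a scheme-specific subclass
FileSystem    = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem

# Get the Hadoop FileSystem that handles the bucket
fs = FileSystem.get(URI("s3://{}".format('your.bucket.name')), sc._jsc.hadoopConfiguration())
# rename() returns True on success and False on failure
fs.rename(
    Path("s3://your.bucket.name/BEFORE_RENAME.csv"),
    Path("s3://your.bucket.name/RENAMED.csv")
)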
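
Hadoop's `FileSystem.rename` reports failure through its boolean return value, so a failed rename can go unnoticed if the result is ignored. The following is a minimal sketch of a wrapper that surfaces such failures. `safe_rename` is a hypothetical helper name, not part of PySpark or Glue, and it assumes the `fs` and `Path` objects are built as in the snippet above.

```python
def safe_rename(fs, path_cls, src, dst):
    """Rename src to dst and raise if the rename cannot be confirmed.

    fs       -- a Hadoop FileSystem object obtained through the Py4J gateway
    path_cls -- the org.apache.hadoop.fs.Path class from the same gateway
    (safe_rename itself is a hypothetical helper for illustration.)
    """
    src_path = path_cls(src)
    dst_path = path_cls(dst)

    # Fail early with a clear message if there is nothing to rename
    if not fs.exists(src_path):
        raise FileNotFoundError("source does not exist: {}".format(src))

    # FileSystem.rename returns False on failure, so check the result
    if not fs.rename(src_path, dst_path):
        raise OSError("rename failed: {} -> {}".format(src, dst))


# Example call, assuming fs and Path were created as shown earlier:
# safe_rename(fs, Path,
#             "s3://your.bucket.name/BEFORE_RENAME.csv",
#             "s3://your.bucket.name/RENAMED.csv")
```

Note that S3 has no native rename, so the operation is carried out as a copy followed by a delete and is not atomic.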

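Hitting a bug when using it on S3

When writing to S3 with overwrite specified and the target file does not exist, a conditional in the code seems to be wrong, so an Exception is raised?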
