1
3

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

AWS S3 にある Shift_JIS の CSV ファイルをアップロードした後、 UTF-8 に変換し、ダウンロードせずに展開する例 ( #AWS + #Ruby gem )

Last updated at Posted at 2020-01-06

ポイント

  • ファイルダウンロードはせず IO から string を読み取る
  • IO や String をそのまま取り扱って UTF-8 の状態で Ruby の CSV class に読ませる
  • あとはこちらのものだ
  • 途中のstringやioの取り扱いでものすごく無駄なことをやっている可能性もある
  • Ruby 2.5の罠に注意 ( #Ruby 2.5 では 1行のBodyだけのCSV ファイルで headers が取得できないっぽい? ( Ruby 2.6 では可能 ) - Qiita https://qiita.com/YumaInaura/items/56d42540c43523682abf )

確認用ファイル

sjis.csv.zip

"ヘッダ1","ヘッダ2","ヘッダ3"
"1234","わー","56789"
"5678","わー","56780"

S3

バケットのトップレベルに sjis.csv があるものとする

image

Code

require 'aws-sdk-s3'
require 'csv'

credentials = Aws::Credentials.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'])
s3_resource = Aws::S3::Resource::new(region: ENV['AWS_REGION'], credentials: credentials) # e.g reagion: 'ap-northeast-1'
s3_bucket = s3_resource.bucket(ENV['AWS_BUCKET']) # e.g bucket name : yourname : in this case "yumainaura"


# If bucket does not exists then create it
# s3_bucket.create
#
# Already exists error is it
# Aws::S3::Errors::BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.

# For preparing upload local to storage
s3_bucket.object('sjis.csv').upload_file('sjis.csv')

# Do not specify top level slash
storage_object = s3_bucket.object('sjis.csv')
io = storage_object.get.body
file_body = io.read
csv_utf8_strings = file_body.encode('UTF-8', 'Shift_JIS')
csv_string_io = CSV.new(csv_utf8_strings, headers: true)

csv_string_io.each do |line|
  puts line[0]
  puts line[1]
  puts line[2]
end

Reulst

storage_object
# => #<Aws::S3::Object:0x00007fc9a82fb0b0
#  @bucket_name="yumainaura",
#  @client=#<Aws::S3::Client>,
#  @data=nil,
#  @key="sjis.csv",
#  @waiter_block_warned=false>

storage_object_got = storage_object.get

# <struct Aws::S3::Types::GetObjectOutput
#  body=#<StringIO:0x00007fc9ac0a98d0>,
#  delete_marker=nil,
#  accept_ranges="bytes",
#  expiration=nil,
#  restore=nil,
#  last_modified=2020-01-05 03:09:31 +0000,
#  content_length=73,
#  etag="\"cdd7902d140737b497bdccae379ea93c\"",
#  missing_meta=nil,
#  version_id=nil,
#  cache_control=nil,
#  content_disposition=nil,
#  content_encoding=nil,
#  content_language=nil,
#  content_range=nil,
#  content_type="",
#  expires=nil,
#  expires_string=nil,
#  website_redirect_location=nil,
#  server_side_encryption=nil,
#  metadata={},
#  sse_customer_algorithm=nil,
#  sse_customer_key_md5=nil,
#  ssekms_key_id=nil,
#  storage_class=nil,
#  request_charged=nil,
#  replication_status=nil,
#  parts_count=nil,
#  tag_count=nil,
#  object_lock_mode=nil,
#  object_lock_retain_until_date=nil,
#  object_lock_legal_hold_status=nil>

io = storage_object_got.body
# => #<StringIO:0x00007fc9ac0a98d0>

file_body = io.read
# => "\"\x83w\x83b\x83_1\",\"\x83w\x83b\x83_2\",\"\x83w\x83b\x83_3\"\n\"1234\",\"\x82\xED\x81[\",\"56789\"\n\"5678\",\"\x82\xED\x81[\",\"56780\""

csv_utf8_strings = file_body.encode('UTF-8', 'Shift_JIS')
# => "\"ヘッダ1\",\"ヘッダ2\",\"ヘッダ3\"\n\"1234\",\"わー\",\"56789\"\n\"5678\",\"わー\",\"56780\""

csv_string_io = CSV.new(csv_utf8_strings, headers: true)
# => <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" headers:true>

csv_string_io.each do |line|
   puts line[0]
   puts line[1]
   puts line[2]
 end

# 1234
# わー
# 56789

# 5678
# わー
# 56780

image

Original by Github issue

チャットメンバー募集

何か質問、悩み事、相談などあればLINEオープンチャットもご利用ください。

Twitter

1
3
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
3

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?