Reading Shift-JIS data with data.table::fread() on a Mac


There are several ways to read CSV data in R.

  • read.csv()
  • readr::read_csv()
  • data.table::fread()

Of these, data.table::fread() is the fast option, but reading Shift-JIS data created on Windows with it produces garbled text (mojibake), so a small workaround is needed.

From the fread help

encoding:
default is "unknown". Other possible options are "UTF-8" and "Latin-1". Note: it is not used to re-encode the input, rather enables handling of encoded strings in their native encoding.

Workaround

# Load libraries
# install.packages("tidyverse")
# install.packages("data.table")

library(tidyverse)

# Area (km2) and population (persons) by prefecture
prefecture_data <- tibble(
  都道府県 = c(
    "北海道","青森県","岩手県","宮城県","秋田県","山形県","福島県",
    "茨城県","栃木県","群馬県","埼玉県","千葉県","東京都","神奈川県",
    "新潟県","富山県","石川県","福井県","山梨県","長野県","岐阜県",
    "静岡県","愛知県","三重県","滋賀県","京都府","大阪府","兵庫県",
    "奈良県","和歌山県","鳥取県","島根県","岡山県","広島県","山口県",
    "徳島県","香川県","愛媛県","高知県","福岡県","佐賀県","長崎県",
    "熊本県","大分県","宮崎県","鹿児島県","沖縄県"
  ),
  面積_km2 = c(
    83422.27, 9645.10, 15275.05, 7282.30, 11637.52, 9323.15, 13784.39,
    6098.31, 6408.09, 6362.28, 3797.75, 5156.48, 2199.94, 2416.55,
    12583.88, 4247.54, 4186.20, 4190.57, 4465.27, 13561.56, 10621.29,
    7776.99, 5173.21, 5774.48, 4017.38, 4612.21, 1905.34, 8400.82,
    3690.94, 4724.66, 3507.03, 6707.78, 7114.44, 8478.16, 6113.00,
    4147.00, 1876.86, 5675.89, 7102.28, 4987.66, 2440.64, 4131.20,
    7409.18, 6340.70, 7734.16, 9186.20, 2282.11
  ),
  人口_人 = c(
    3820016, 754751, 783242, 1829565, 560429, 710838, 1247000,
    2245065, 1502202, 1520630, 6633932, 5690156, 14399144, 8524492,
    1525004, 761719, 896801, 572885, 611586, 1581949, 1468392,
    2828823, 6676331, 1347202, 1222791, 2075975, 7263182, 4357576,
    950365, 631619, 405528, 496994, 1510460, 2229527, 926183,
    480669, 724120, 944634, 450980, 4479021, 620873, 868817,
    1355329, 841343, 796631, 1170602, 1391013
  )
)

# Check the data
print(prefecture_data)

# Write to a CSV file, saved as Shift-JIS (cp932)
write.csv(prefecture_data, "prefecture_data.csv", fileEncoding = "cp932")

library(data.table)

# Reading with plain fread produces garbled text
fread("prefecture_data.csv")

# Convert the character encoding while reading
f <- "prefecture_data.csv"
fread(cmd = paste("iconv -f CP932 -t UTF-8", shQuote(f)))

Explanation

iconv

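  • iconv is a command that converts character encodings
  • -f CP932 sets the source encoding to CP932 (Microsoft's Shift-JIS variant)
  • -t UTF-8 sets the target encoding to UTF-8
iconv -f CP932 -t UTF-8 filename

Running this command writes the converted text to standard output.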
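
As a side note, base R has an iconv() function that performs the same kind of conversion on strings in memory. The sketch below is only an illustration of what the conversion does to the bytes, not part of the workaround itself; it assumes the platform's iconv supports the name "CP932".

```r
# Encode a UTF-8 string as CP932, then decode it back to UTF-8.
x_utf8 <- "\u6771\u4eac\u90fd"  # "東京都" written with Unicode escapes
x_sjis <- iconv(x_utf8, from = "UTF-8", to = "CP932")

# The byte sequences differ: these kanji take 3 bytes each in UTF-8
# and 2 bytes each in CP932.
n_utf8 <- length(charToRaw(x_utf8))
n_sjis <- length(charToRaw(x_sjis))
print(c(utf8 = n_utf8, cp932 = n_sjis))

# Converting back restores the original bytes.
x_back <- iconv(x_sjis, from = "CP932", to = "UTF-8")
print(identical(charToRaw(x_back), charToRaw(x_utf8)))
```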

shQuote(f)

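  • f: a string holding the path of the target file (e.g. "データ/売上.csv")
  • shQuote(): quotes the string so that characters such as spaces, Japanese text, and symbols pass through the shell safely as a single command argument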
f <- "C:/データ/売上 2025.csv"
shQuote(f)
# => "'C:/データ/売上 2025.csv'"
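
To make the quoting behavior more concrete, here is a small sketch using made-up file names (they are illustrative only). It relies on shQuote()'s default type = "sh", which wraps the string in single quotes and escapes any single quote inside it.

```r
# Hypothetical file names: plain, with a space, and with a single quote.
paths <- c("sales.csv", "sales 2025.csv", "it's.csv")
quoted <- shQuote(paths)
print(quoted)

# Without quoting, the shell would split "sales 2025.csv" into two arguments.
cmd <- paste("iconv -f CP932 -t UTF-8", shQuote("sales 2025.csv"))
print(cmd)
```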

fread(cmd = cmdstr)

  • Passing a shell command to fread()'s cmd= argument makes it read that command's standard output
  • The text that arrives via cmd= is parsed and returned as a data.table object
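
The whole flow can be tried on a throwaway file. The following is a minimal sketch, assuming data.table is installed and an iconv command is available on the PATH (as on macOS and most Linux systems). The temporary file and its column names are invented for the demonstration; the two population figures are taken from the data above.

```r
library(data.table)

# Build a tiny CSV in UTF-8, then write its CP932 bytes to a temp file.
tmp <- tempfile(fileext = ".csv")
lines_utf8 <- c(
  "name,population",
  "\u6771\u4eac\u90fd,14399144",  # 東京都
  "\u5927\u962a\u5e9c,7263182"    # 大阪府
)
writeLines(iconv(lines_utf8, from = "UTF-8", to = "CP932"), tmp, useBytes = TRUE)

# fread runs the command and parses whatever it prints to standard output.
cmd <- paste("iconv -f CP932 -t UTF-8", shQuote(tmp))
dt <- fread(cmd = cmd)
print(dt)

unlink(tmp)
```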

Summary

Build the shell command string with paste("iconv -f CP932 -t UTF-8", shQuote(f))
Pass it to fread(cmd = <that string>) to read the command's standard output directly
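
If this pattern is needed in several places, the two steps could be wrapped in a small helper. The function name fread_cp932 below is hypothetical, not part of data.table, and the sketch assumes an iconv command is available on the PATH.

```r
library(data.table)

# Hypothetical helper: read a CP932 (Shift-JIS) CSV by piping it through iconv.
# Extra arguments in ... are passed on to fread().
fread_cp932 <- function(path, ...) {
  path <- path.expand(path)       # expand "~" ourselves; quoting disables shell expansion
  stopifnot(file.exists(path))    # fail early with a clear error
  cmd <- paste("iconv -f CP932 -t UTF-8", shQuote(path))
  fread(cmd = cmd, ...)
}

# Example call (assumes the file written earlier exists):
# dt <- fread_cp932("prefecture_data.csv")
```

Keeping the shQuote() call inside the helper means callers cannot forget it when a path contains spaces.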
