
Exporting basic analysis results to CSV with R, part 2: quantitative variables

Last updated at Posted at 2021-02-15

Introduction

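The previous article, "Rで基礎分析結果をCSV出力①全体編" (Exporting basic analysis results to CSV with R, part 1: overview), output the data type and the number of missing values for each column. This article moves on to the quantitative variables.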

Splitting quantitative and qualitative variables

According to the description of the "default of credit card clients Data Set", the data has the following attributes.

Attribute Information:
This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:
X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6 - X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .;X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .;X23 = amount paid in April, 2005.

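Columns 1, 5, and 12-23 are quantitative variables, and columns 2-4, 6-11, and 24 are qualitative variables, so the data is split into these two groups. (Note that d and h were already loaded in the previous article, "Rで基礎分析結果をCSV出力①全体編".)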

divide.R
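# Split the data into two parts (1: quantitative, 2: qualitative)
quantity.idx <- c(1, 5, 12:23)
quality.idx  <- c(2:4, 6:11, 24)

d.1 <- d[, quantity.idx]
d.2 <- d[, quality.idx]
# Header rows for each part; `header` is presumably the header table
# built in the previous article (referred to as h in the text)
h.1 <- data.frame(header[quantity.idx, ])
h.2 <- data.frame(header[quality.idx, ])
# Name the first column of each header table
names(h.1)[1] <- "ROWNAMES"
names(h.2)[1] <- "ROWNAMES"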
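
As a quick sanity check on the split, the sketch below confirms that the two index vectors do not overlap and together cover every column. It assumes the data frame has exactly 24 columns (X1 to X23 plus the response variable), which matches the column numbers given in the text; `n.col` is a name introduced here for illustration only.

```r
# Index vectors copied from divide.R
quantity.idx <- c(1, 5, 12:23)
quality.idx  <- c(2:4, 6:11, 24)

# Assumption: the data has 24 columns (X1-X23 plus the response variable)
n.col <- 24

# No column may appear in both groups
stopifnot(length(intersect(quantity.idx, quality.idx)) == 0)

# Every column must belong to one of the two groups
stopifnot(setequal(c(quantity.idx, quality.idx), seq_len(n.col)))
```

If either check fails, `stopifnot` raises an error, which makes an off-by-one mistake in the column numbers visible before any statistics are computed.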

Exporting basic statistics of the quantitative variables to CSV

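For the quantitative variables, we compute seven statistics (mean, standard deviation, minimum, first quartile, median, third quartile, and maximum), then attach them to the header table and write the result out as CSV.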

stats.R
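# Five-number summary per column:
# min, lower hinge, median, upper hinge, max
five <- data.frame()
for (i in seq_along(d.1)) {
  five <- rbind(five, fivenum(d.1[, i]))
}

# Attach each statistic to the header table
h.1$mean <- sapply(d.1, mean, na.rm=TRUE)
h.1$sd <- sapply(d.1, sd, na.rm=TRUE)
h.1$min <- five[, 1]
h.1$qualtile.1st <- five[, 2]
h.1$median <- five[, 3]
h.1$qualtile.3rd <- five[, 4]
h.1$max <- five[, 5]

# Write the header table with the statistics to CSV
write.csv(x=h.1, file="stats.csv")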
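
The same seven statistics can also be collected in one pass without growing a data frame inside a loop. The following is a minimal sketch on toy data, not the article's method: `toy`, `summarise.col`, and `toy.stats` are hypothetical names, and the values are made up rather than taken from the dataset.

```r
# Toy data standing in for d.1 (hypothetical values, not from the dataset)
toy <- data.frame(a = c(1, 2, 3, 4, NA), b = c(10, 20, 30, 40, 50))

# Hypothetical helper: the seven statistics for one numeric vector
summarise.col <- function(x) {
  f <- fivenum(x, na.rm = TRUE)  # min, lower hinge, median, upper hinge, max
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    min = f[1], q1 = f[2], median = f[3], q3 = f[4], max = f[5])
}

# vapply returns a 7 x (number of columns) matrix; transpose it so that
# each row is a variable and each column is a statistic
toy.stats <- as.data.frame(t(vapply(toy, summarise.col, numeric(7))))
print(toy.stats)
```

Note that `fivenum` returns Tukey's hinges, which can differ slightly from the quartiles given by `quantile` with its default settings. If conventional quartiles are needed, `quantile(x, c(0.25, 0.75), na.rm = TRUE)` could be swapped in.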

The output looks like this.

quantity.png

The following articles will carry out the basic analysis of the qualitative variables and export the results to CSV as well.
