はじめに
Rのデータフレームの集計の仕方について、サンプルデータを用いて具体的に練習してみました。
目次
Rのデータフレームの集計操作
Rのデータフレームの集計について、サンプルデータを用いて具体的に練習します。
Rのパッケージdplyr
, tidyr
を使います。
Rの標準パッケージを使ったデータフレーム操作については、こちら。
なお、以下については別記事でまとめています。
参考:
https://r4ds.had.co.nz/transform.html
https://github.com/tidyverse/dplyr
https://heavywatal.github.io/rstats/dplyr.html
練習に用いるデータ
Rのサンプルデータiris
とanscombe
を用います。
行番号を表示した方が分かりやすいため、tibble::rowid_to_column()
で行番号を列名にしています。
(こちらの記事と同じデータを用います。)
library(dplyr)
library(tibble)
library(tidyr)
iris
# 行番号を列名に
iris_tbl <- iris %>% as_tibble() %>% rowid_to_column("id")
iris_tbl
# # A tibble: 150 x 6
# id Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <int> <dbl> <dbl> <dbl> <dbl> <fct>
# 1 1 5.1 3.5 1.4 0.2 setosa
# 2 2 4.9 3 1.4 0.2 setosa
# 3 3 4.7 3.2 1.3 0.2 setosa
# 4 4 4.6 3.1 1.5 0.2 setosa
# 5 5 5 3.6 1.4 0.2 setosa
# 6 6 5.4 3.9 1.7 0.4 setosa
# 7 7 4.6 3.4 1.4 0.3 setosa
# 8 8 5 3.4 1.5 0.2 setosa
# 9 9 4.4 2.9 1.4 0.2 setosa
# 10 10 4.9 3.1 1.5 0.1 setosa
# # ... with 140 more rows
anscombe
anscombe_tbl <- anscombe %>% as_tibble() %>% rowid_to_column("id")
anscombe_tbl
# # A tibble: 11 x 9
# id x1 x2 x3 x4 y1 y2 y3 y4
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 10 10 10 8 8.04 9.14 7.46 6.58
# 2 2 8 8 8 8 6.95 8.14 6.77 5.76
# 3 3 13 13 13 8 7.58 8.74 12.7 7.71
# 4 4 9 9 9 8 8.81 8.77 7.11 8.84
# 5 5 11 11 11 8 8.33 9.26 7.81 8.47
# 6 6 14 14 14 8 9.96 8.1 8.84 7.04
# 7 7 6 6 6 8 7.24 6.13 6.08 5.25
# 8 8 4 4 4 19 4.26 3.1 5.39 12.5
# 9 9 12 12 12 8 10.8 9.13 8.15 5.56
# 10 10 7 7 7 8 4.82 7.26 6.42 7.91
# 11 11 5 5 5 8 5.68 4.74 5.73 6.89
tally():カウント集計
行をカウント集計します。
# tally()
iris_tbl %>%
tally()
# # A tibble: 1 x 1
# n
# <int>
# 1 150
iris_tbl %>%
group_by(Species) %>%
tally()
# # A tibble: 3 x 2
# Species n
# <fct> <int>
# 1 setosa 50
# 2 versicolor 50
# 3 virginica 50
count():グループ別カウント集計
グループ別にカウント集計します。
# count()
iris_tbl %>%
count()
# # A tibble: 1 x 1
# n
# <int>
# 1 150
# これと同じ
iris_tbl %>%
tally()
iris_tbl %>%
count(Species)
# # A tibble: 3 x 2
# Species n
# <fct> <int>
# 1 setosa 50
# 2 versicolor 50
# 3 virginica 50
# これと同じ
iris_tbl %>%
group_by(Species) %>%
tally()
引数name
を指定すると、カウントした結果の列名を変更できます(デフォルトの列名は"n")。
# 引数name
iris_tbl %>%
count(name = "count")
# # A tibble: 1 x 1
# count
# <int>
# 1 150
iris_tbl %>%
count(Species, name = "count")
# # A tibble: 3 x 2
# Species count
# <fct> <int>
# 1 setosa 50
# 2 versicolor 50
# 3 virginica 50
summarise():基本統計量の集計
基本統計量を集計します。
# summarise()
iris_tbl %>%
summarise(n = n(), max = max(Sepal.Length), min = min(Sepal.Length))
# # A tibble: 1 x 3
# n max min
# <int> <dbl> <dbl>
# 1 150 7.9 4.3
iris_tbl %>%
summarise(count = n(),
mean(Sepal.Length), mean(Sepal.Width),
mean(Petal.Length), mean(Petal.Width))
# # A tibble: 1 x 5
# count `mean(Sepal.Length)` `mean(Sepal.Width)` `mean(Petal.Length)` `mean(Petal.Width)`
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 150 5.84 3.06 3.76 1.20
iris_tbl %>%
summarise(count = n(),
mean_sl = mean(Sepal.Length), mean_sw = mean(Sepal.Width),
mean_pl = mean(Petal.Length), mean_pw = mean(Petal.Width))
# # A tibble: 1 x 5
# count mean_sl mean_sw mean_pl mean_pw
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 150 5.84 3.06 3.76 1.20
iris_tbl %>%
group_by(Species) %>%
summarise(count = n(),
mean_sl = mean(Sepal.Length), mean_sw = mean(Sepal.Width),
mean_pl = mean(Petal.Length), mean_pw = mean(Petal.Width))
# # A tibble: 3 x 6
# Species count mean_sl mean_sw mean_pl mean_pw
# <fct> <int> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 50 5.01 3.43 1.46 0.246
# 2 versicolor 50 5.94 2.77 4.26 1.33
# 3 virginica 50 6.59 2.97 5.55 2.03
iris_tbl %>%
group_by(Species) %>%
summarise(n = n(), # count
n_distinct = n_distinct(Sepal.Length), # length(unique(x))
mean = mean(Sepal.Length), # mean
median = median(Sepal.Length), # median
sd = sd(Sepal.Length), # SD (standard deviation)
IQR = IQR(Sepal.Length), # IQR (interquartile range)
mad = mad(Sepal.Length), # MAD (median absolute deviation)
min = min(Sepal.Length), # min
max = max(Sepal.Length), # max
Q1 = quantile(Sepal.Length, 0.25), # quantile
Q3 = quantile(Sepal.Length, 0.75), # quantile
sum = sum(Sepal.Length), # sum
prod = prod(Sepal.Length)) # prod
# # A tibble: 3 x 14
# Species n n_distinct mean median sd IQR mad min max Q1 Q3 sum prod
# <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 50 15 5.01 5 0.352 0.4 0.297 4.3 5.8 4.8 5.2 250. 8.35e34
# 2 versicolor 50 21 5.94 5.9 0.516 0.7 0.519 4.9 7 5.6 6.3 297. 3.93e38
# 3 virginica 50 21 6.59 6.5 0.636 0.675 0.593 4.9 7.9 6.22 6.9 329. 6.88e40
iris_tbl %>%
group_by(Species) %>%
summarise(first = first(Sepal.Length), # 先頭
last = last(Sepal.Length), # 末尾
nth_1 = nth(Sepal.Length, n = 1), # 1番目
nth_2 = nth(Sepal.Length, n = 2), # 2番目
nth_3 = nth(Sepal.Length, n = 3), # 3番目
nth_last_3 = nth(Sepal.Length, n = -3), # 後ろから3番目
nth_last_2 = nth(Sepal.Length, n = -2), # 後ろから2番目
nth_last_1 = nth(Sepal.Length, n = -1)) # 後ろから1番目
# # A tibble: 3 x 9
# Species first last nth_1 nth_2 nth_3 nth_last_3 nth_last_2 nth_last_1
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.1 5 5.1 4.9 4.7 4.6 5.3 5
# 2 versicolor 7 5.7 7 6.4 6.9 6.2 5.1 5.7
# 3 virginica 6.3 5.9 6.3 5.8 7.1 6.5 6.2 5.9
なお、統計量を集計する関数の結果が長さ1でない場合でも使えます。
iris_tbl %>%
summarise(range(Sepal.Length), range(Sepal.Width))
# # A tibble: 2 x 2
# `range(Sepal.Length)` `range(Sepal.Width)`
# <dbl> <dbl>
# 1 4.3 2
# 2 7.9 4.4
iris_tbl %>%
summarise(n = n(), range_sl = range(Sepal.Length), range_sw = range(Sepal.Width))
# # A tibble: 2 x 3
# n range_sl range_sw
# <int> <dbl> <dbl>
# 1 150 4.3 2
# 2 150 7.9 4.4
iris_tbl %>%
summarise(n = n(),
max = max(Sepal.Length),
min = min(Sepal.Length),
range = range(Sepal.Length))
# # A tibble: 2 x 4
# n max min range
# <int> <dbl> <dbl> <dbl>
# 1 150 7.9 4.3 4.3
# 2 150 7.9 4.3 7.9
summarise(across()):複数列にわたる集計
across
を使うと、複数の列にわたって同じ統計量を計算する関数を適用して集計できます。
# summarise(across())
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, mean))
# # A tibble: 3 x 5
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 3.43 1.46 0.246
# 2 versicolor 5.94 2.77 4.26 1.33
# 3 virginica 6.59 2.97 5.55 2.03
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, ~ mean(.) * 50))
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, ~ mean(.x) * 50))
# A tibble: 3 x 5
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 250. 171. 73.1 12.3
# 2 versicolor 297. 138. 213 66.3
# 3 virginica 329. 149. 278. 101.
# これと同じ結果になる
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width, sum))
where()
across()
に渡す列の指定にwhere()
で条件を指定できます。
# where()
iris_tbl %>%
group_by(Species) %>%
summarise(across(where(is.numeric), mean))
# # A tibble: 3 x 6
# Species id Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 25.5 5.01 3.43 1.46 0.246
# 2 versicolor 75.5 5.94 2.77 4.26 1.33
# 3 virginica 126. 6.59 2.97 5.55 2.03
iris_tbl %>%
group_by(Species) %>%
summarise(across(where(is.double), mean))
# # A tibble: 3 x 5
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 3.43 1.46 0.246
# 2 versicolor 5.94 2.77 4.26 1.33
# 3 virginica 6.59 2.97 5.55 2.03
iris_tbl %>%
group_by(Species) %>%
summarise(across(where(is.integer), mean))
# # A tibble: 3 x 2
# Species id
# <fct> <dbl>
# 1 setosa 25.5
# 2 versicolor 75.5
# 3 virginica 126.
anscombe_tbl %>%
summarise(across(where(is.double), mean))
# # A tibble: 1 x 8
# x1 x2 x3 x4 y1 y2 y3 y4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 9 9 9 9 7.50 7.50 7.5 7.50
引数.fns:複数の集計関数
across
した場合にも統計量を計算する関数を複数リストで与えることができます。
# 引数.fns
iris_tbl %>%
group_by(Species) %>%
summarise(across(starts_with("Sepal"),
.fns = list(mean, sd)))
# # A tibble: 3 x 5
# Species Sepal.Length_1 Sepal.Length_2 Sepal.Width_1 Sepal.Width_2
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 0.352 3.43 0.379
# 2 versicolor 5.94 0.516 2.77 0.314
# 3 virginica 6.59 0.636 2.97 0.322
anscombe_tbl %>%
summarise(across(x1:y4, .fns = list(mean, sd)))
# # A tibble: 1 x 16
# x1_1 x1_2 x2_1 x2_2 x3_1 x3_2 x4_1 x4_2 y1_1 y1_2 y2_1 y2_2 y3_1 y3_2 y4_1 y4_2
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 9 3.32 9 3.32 9 3.32 9 3.32 7.50 2.03 7.50 2.03 7.5 2.03 7.50 2.03
引数.names:出力列名指定
出力結果の列名を引数.names
で指定できます。
# 引数.names
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width,
.fns = mean, .names = "mean_{col}"))
# # A tibble: 3 x 5
# Species mean_Sepal.Length mean_Sepal.Width mean_Petal.Length mean_Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 3.43 1.46 0.246
# 2 versicolor 5.94 2.77 4.26 1.33
# 3 virginica 6.59 2.97 5.55 2.03
iris_tbl %>%
group_by(Species) %>%
summarise(across(starts_with("Sepal"),
.fns = list(Mean = mean, SD = sd),
.names = "{fn}_{col}"))
# # A tibble: 3 x 5
# Species Mean_Sepal.Length SD_Sepal.Length Mean_Sepal.Width SD_Sepal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 0.352 3.43 0.379
# 2 versicolor 5.94 0.516 2.77 0.314
# 3 virginica 6.59 0.636 2.97 0.322
iris_tbl %>%
group_by(Species) %>%
summarise(across(starts_with("Sepal"),
.fns = list(Mean = mean, SD = sd),
.names = "{fn}({col})"))
# # A tibble: 3 x 5
# Species `Mean(Sepal.Length)` `SD(Sepal.Length)` `Mean(Sepal.Width)` `SD(Sepal.Width)`
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 0.352 3.43 0.379
# 2 versicolor 5.94 0.516 2.77 0.314
# 3 virginica 6.59 0.636 2.97 0.322
iris_tbl %>%
group_by(Species) %>%
summarise(across(starts_with("Sepal"),
.fns = list(mean, sd),
.names = "f{fn}_{col}"))
# # A tibble: 3 x 5
# Species f1_Sepal.Length f2_Sepal.Length f1_Sepal.Width f2_Sepal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.01 0.352 3.43 0.379
# 2 versicolor 5.94 0.516 2.77 0.314
# 3 virginica 6.59 0.636 2.97 0.322
anscombe_tbl %>%
summarise(across(x1:y4,
.fns = list(mean = mean, sd = sd),
.names = "{fn}_{col}"))
# # A tibble: 1 x 16
# mean_x1 sd_x1 mean_x2 sd_x2 mean_x3 sd_x3 mean_x4 sd_x4 mean_y1 sd_y1 mean_y2 sd_y2 mean_y3 sd_y3 mean_y4 sd_y4
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 9 3.32 9 3.32 9 3.32 9 3.32 7.50 2.03 7.50 2.03 7.5 2.03 7.50 2.03
おまけ:pivotと組み合わせてsummary
tidyr
のpivot_longer()
, pivot_wider()
と組み合わせると、summary()
関数のような結果が出力できます。
iris_tbl %>%
summarise(across(Sepal.Length:Petal.Width,
.fns = list(mean = mean, sd = sd),
.names = "{fn}_{col}")) %>% print() %>%
pivot_longer(cols = everything(),
names_to = "name",
values_to = "value") %>% print() %>%
separate(name, c("stat", "var"), sep = "_") %>% print() %>%
pivot_wider(names_from = var, values_from = value) %>% print() %>%
column_to_rownames(var = "stat")
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# mean 5.8433333 3.0573333 3.758000 1.1993333
# sd 0.8280661 0.4358663 1.765298 0.7622377
anscombe_tbl %>%
summarise(across(x1:y4,
.fns = list(mean = mean, sd = sd),
.names = "{fn}_{col}")) %>% print() %>%
pivot_longer(cols = everything(),
names_to = "name",
values_to = "value") %>% print() %>%
separate(name, c("stat", "var"), sep = "_") %>% print() %>%
pivot_wider(names_from = var, values_from = value) %>% print() %>%
column_to_rownames(var = "stat")
# x1 x2 x3 x4 y1 y2 y3 y4
# mean 9.000000 9.000000 9.000000 9.000000 7.500909 7.500909 7.500000 7.500909
# sd 3.316625 3.316625 3.316625 3.316625 2.031568 2.031657 2.030424 2.030579
iris_tbl %>%
group_by(Species) %>%
summarise(across(Sepal.Length:Petal.Width,
list(mean = mean, sd = sd),
.names = "{fn}_{col}")) %>% print() %>%
pivot_longer(-Species,
names_to = "name", values_to = "value") %>% print() %>%
separate(name, c("stat", "var"), sep = "_") %>% print() %>%
pivot_wider(names_from = c(var), values_from = value)
# # A tibble: 6 x 6
# Species stat Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 setosa mean 5.01 3.43 1.46 0.246
# 2 setosa sd 0.352 0.379 0.174 0.105
# 3 versicolor mean 5.94 2.77 4.26 1.33
# 4 versicolor sd 0.516 0.314 0.470 0.198
# 5 virginica mean 6.59 2.97 5.55 2.03
# 6 virginica sd 0.636 0.322 0.552 0.275
iris_tbl %>%
summarise(across(Sepal.Length:Petal.Width,
.fns = list(Min = min,
Q1 = ~ quantile(., 0.25),
Median = median,
Mean = mean,
Q3 = ~ quantile(., 0.75),
Max = max),
.names = "{fn}_{col}")) %>% print() %>%
pivot_longer(cols = everything(),
names_to = "name",
values_to = "value") %>% print() %>%
separate(name, c("stat", "var"), sep = "_") %>% print() %>%
pivot_wider(names_from = c(var), values_from = value) %>%
column_to_rownames(var = "stat")
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# Min 4.300000 2.000000 1.000 0.100000
# Q1 5.100000 2.800000 1.600 0.300000
# Median 5.800000 3.000000 4.350 1.300000
# Mean 5.843333 3.057333 3.758 1.199333
# Q3 6.400000 3.300000 5.100 1.800000
# Max 7.900000 4.400000 6.900 2.500000
# これと同じ結果に
summary(iris[1:4])
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
# 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
# Median :5.800 Median :3.000 Median :4.350 Median :1.300
# Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
# 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
# Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
anscombe_tbl %>%
summarise(across(x1:y4,
.fns = list(Min = min,
Q1 = ~ quantile(., 0.25),
Median = median,
Mean = mean,
Q3 = ~ quantile(., 0.75),
Max = max),
.names = "{fn}_{col}")) %>% print() %>%
pivot_longer(cols = everything(),
names_to = "name",
values_to = "value") %>% print() %>%
separate(name, c("stat", "var"), sep = "_") %>% print() %>%
pivot_wider(names_from = c(var), values_from = value) %>%
column_to_rownames(var = "stat")
# x1 x2 x3 x4 y1 y2 y3 y4
# Min 4.0 4.0 4.0 8 4.260000 3.100000 5.39 5.250000
# Q1 6.5 6.5 6.5 8 6.315000 6.695000 6.25 6.170000
# Median 9.0 9.0 9.0 8 7.580000 8.140000 7.11 7.040000
# Mean 9.0 9.0 9.0 9 7.500909 7.500909 7.50 7.500909
# Q3 11.5 11.5 11.5 8 8.570000 8.950000 7.98 8.190000
# Max 14.0 14.0 14.0 19 10.840000 9.260000 12.74 12.500000
# これと同じ結果に
summary(anscombe)
# x1 x2 x3 x4 y1 y2 y3 y4
# Min. : 4.0 Min. : 4.0 Min. : 4.0 Min. : 8 Min. : 4.260 Min. :3.100 Min. : 5.39 Min. : 5.250
# 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 8 1st Qu.: 6.315 1st Qu.:6.695 1st Qu.: 6.25 1st Qu.: 6.170
# Median : 9.0 Median : 9.0 Median : 9.0 Median : 8 Median : 7.580 Median :8.140 Median : 7.11 Median : 7.040
# Mean : 9.0 Mean : 9.0 Mean : 9.0 Mean : 9 Mean : 7.501 Mean :7.501 Mean : 7.50 Mean : 7.501
# 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.: 8 3rd Qu.: 8.570 3rd Qu.:8.950 3rd Qu.: 7.98 3rd Qu.: 8.190
# Max. :14.0 Max. :14.0 Max. :14.0 Max. :19 Max. :10.840 Max. :9.260 Max. :12.74 Max. :12.500
pivot_wider():ピボット集計
ピボット集計については、tidyr
のpivot_wider()
関数(引数values_fn
に集計関数を指定)で計算できます(こちら参照)。
まず、使用するデータを作成しておきます。
# データの準備
iris_long <- iris_tbl %>%
pivot_longer(cols = -c(id, Species), names_to = c("part", "measure"), names_sep = "\\.", values_to = "value")
iris_long
# # A tibble: 600 x 5
# id Species part measure value
# <int> <fct> <chr> <chr> <dbl>
# 1 1 setosa Sepal Length 5.1
# 2 1 setosa Sepal Width 3.5
# 3 1 setosa Petal Length 1.4
# 4 1 setosa Petal Width 0.2
# 5 2 setosa Sepal Length 4.9
# 6 2 setosa Sepal Width 3
# 7 2 setosa Petal Length 1.4
# 8 2 setosa Petal Width 0.2
# 9 3 setosa Sepal Length 4.7
# 10 3 setosa Sepal Width 3.2
# # ... with 590 more rows
anscombe_long <- anscombe_tbl %>%
pivot_longer(cols = -id, names_to = c("axis", "set"), names_sep = 1L, values_to = "value")
anscombe_long
# # A tibble: 88 x 4
# id axis set value
# <int> <chr> <chr> <dbl>
# 1 1 x 1 10
# 2 1 x 2 10
# 3 1 x 3 10
# 4 1 x 4 8
# 5 1 y 1 8.04
# 6 1 y 2 9.14
# 7 1 y 3 7.46
# 8 1 y 4 6.58
# 9 2 x 1 8
# 10 2 x 2 8
# # ... with 78 more rows
ピボット集計でカウント
# ピボット集計でカウント
iris_long %>%
pivot_wider(id_cols = c(part, measure),
names_from = Species, values_from = value,
values_fn = list(value = length))
# # A tibble: 4 x 5
# part measure setosa versicolor virginica
# <chr> <chr> <int> <int> <int>
# 1 Sepal Length 50 50 50
# 2 Sepal Width 50 50 50
# 3 Petal Length 50 50 50
# 4 Petal Width 50 50 50
anscombe_long %>%
pivot_wider(id_cols = set,
names_from = axis, values_from = value,
values_fn = list(value = length))
# # A tibble: 4 x 3
# set x y
# <chr> <int> <int>
# 1 1 11 11
# 2 2 11 11
# 3 3 11 11
# 4 4 11 11
ピボット集計で合計
# ピボット集計で合計
iris_long %>%
pivot_wider(id_cols = c(part, measure),
names_from = Species, values_from = value,
values_fn = list(value = sum))
# # A tibble: 4 x 5
# part measure setosa versicolor virginica
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 Sepal Length 250. 297. 329.
# 2 Sepal Width 171. 138. 149.
# 3 Petal Length 73.1 213 278.
# 4 Petal Width 12.3 66.3 101.
anscombe_long %>%
pivot_wider(id_cols = set,
names_from = axis, values_from = value,
values_fn = list(value = sum))
# # A tibble: 4 x 3
# set x y
# <chr> <dbl> <dbl>
# 1 1 99 82.5
# 2 2 99 82.5
# 3 3 99 82.5
# 4 4 99 82.5
ピボット集計で平均
# ピボット集計で平均
iris_long %>%
pivot_wider(id_cols = c(part, measure),
names_from = Species, values_from = value,
values_fn = list(value = mean))
# # A tibble: 4 x 5
# part measure setosa versicolor virginica
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 Sepal Length 5.01 5.94 6.59
# 2 Sepal Width 3.43 2.77 2.97
# 3 Petal Length 1.46 4.26 5.55
# 4 Petal Width 0.246 1.33 2.03
anscombe_long %>%
pivot_wider(id_cols = set,
names_from = axis, values_from = value,
values_fn = list(value = mean))
# # A tibble: 4 x 3
# set x y
# <chr> <dbl> <dbl>
# 1 1 9 7.50
# 2 2 9 7.50
# 3 3 9 7.5
# 4 4 9 7.50
R標準パッケージによる集計操作
参考
- tidyverse
- R for Data Science
- Heavy Watal - R stats
- Rのデータベースの操作についての自前のまとめ