
Aggregating and summarizing data with R's aggregate function

Posted at 2020-04-16

##Aggregation and grouping with the aggregate function

When summarizing data, computing totals and basic statistics such as the mean is a very common task.
This article shows the aggregate function, the classic way of doing this in R.

Note that aggregate is relatively slow, so dplyr and similar packages are recommended instead.

##First, load the data

We use the diamonds dataset that ships with the ggplot2 package.

R
library(ggplot2)
data(diamonds)
diamonds

# Check the column names
colnames(diamonds)
 [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"   "x"       "y"       "z"  

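##Now, the aggregation

We compute the mean and the sum of price for each level of "cut" (similar to group_by).
The formula uses ~ (the tilde): the variable to summarize goes on the left, and the grouping variable goes on the right.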

R
# The second argument is the data; the third argument is the function to apply
aggregate(price ~  cut , diamonds, mean)
        cut    price
1      Fair 4358.758
2      Good 3928.864
3 Very Good 3981.760
4   Premium 4584.258
5     Ideal 3457.542

aggregate(price ~  cut , diamonds, sum)
        cut    price
1      Fair  7017600
2      Good 19275009
3 Very Good 48107623
4   Premium 63221498
5     Ideal 74513487

# Quick overview of each group with summary
aggregate(price ~  cut , diamonds, summary)

        cut price.Min. price.1st Qu. price.Median price.Mean price.3rd Qu. price.Max.
1      Fair    337.000      2050.250     3282.000   4358.758      5205.500  18574.000
2      Good    327.000      1145.000     3050.500   3928.864      5028.000  18788.000
3 Very Good    336.000       912.000     2648.000   3981.760      5372.750  18818.000
4   Premium    326.000      1046.000     3185.000   4584.258      6296.000  18823.000
5     Ideal    326.000       878.000     1810.000   3457.542      4678.500  18806.000
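
The formula call above is one of two ways to use aggregate. As a minimal sketch, the same grouping can also be written with the by argument, which takes a named list of grouping vectors. The toy data frame below is made up for illustration (its values are not taken from diamonds) so that the snippet runs without ggplot2.

```r
# Toy data for illustration only (made-up values, not from diamonds)
toy <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good", "Ideal"),
  price = c(300, 500, 400, 600, 1000)
)

# Formula interface, as used in this article
res_formula <- aggregate(price ~ cut, data = toy, FUN = mean)

# by interface: pass the values and a named list of grouping vectors
res_by <- aggregate(toy$price, by = list(cut = toy$cut), FUN = mean)

print(res_formula)
print(res_by)  # same group means; the value column is named "x" here
```

With the real data, the equivalent call would be aggregate(diamonds$price, by = list(cut = diamonds$cut), FUN = mean).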

##Adding grouping variables

To group by additional variables, add them to the right-hand side of the formula with the + sign.

R
aggregate(price ~  cut + color, diamonds, summary)
         cut color price.Min. price.1st Qu. price.Median price.Mean price.3rd Qu. price.Max.
1       Fair     D    536.000      2204.500     3730.000   4291.061      4797.000  16386.000
2       Good     D    361.000       957.250     2728.500   3405.382      4581.000  18468.000
3  Very Good     D    357.000       850.000     2310.000   3470.467      4633.000  18542.000
4    Premium     D    367.000       958.000     2009.000   3631.293      4915.000  18575.000
5      Ideal     D    367.000       854.250     1576.000   2629.095      3102.000  18693.000
6       Fair     E    337.000      1589.500     2956.000   3682.312      4518.250  15584.000
7       Good     E    327.000       969.000     2420.000   3423.644      4535.000  18236.000
8  Very Good     E    352.000       755.000     1989.500   3214.652      4355.000  18731.000
9    Premium     E    326.000       964.000     1928.000   3538.914      4628.000  18477.000
10     Ideal     E    326.000       872.000     1437.000   2597.550      3013.500  18729.000
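
The price.Min., price.1st Qu. and similar columns above appear because summary returns several values per group, and aggregate stores them together as a single matrix column. The sketch below reproduces this with a small custom function and shows one common way to flatten the result into ordinary columns. The toy data frame is made up for illustration (hypothetical values, not from diamonds).

```r
# Toy data for illustration only (made-up values, not from diamonds)
toy <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good", "Ideal"),
  price = c(300, 500, 400, 600, 1000)
)

# FUN may return several values per group; here a mean and a count
res <- aggregate(price ~ cut, data = toy,
                 FUN = function(x) c(mean = mean(x), n = length(x)))

print(ncol(res))          # 2: "cut" plus one matrix column "price"
print(res$price[, "mean"])  # access one statistic inside the matrix column

# Flatten the matrix column into separate data frame columns
flat <- do.call(data.frame, res)
print(names(flat))
```

After flattening, each statistic can be used like any other column, for example in order() or merge().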

```R
aggregate(price ~  cut + color + depth, diamonds, mean)
          cut color depth     price
1        Fair     G  43.0  3634.000
2       Ideal     J  43.0  4778.000
3        Fair     G  44.0  4032.000
4        Fair     I  50.8  6727.000
5        Fair     E  51.0   945.000
```

##To aggregate two variables, combine them with cbind() or similar

R
 aggregate( cbind(price,carat) ~ cut, diamonds, mean)
        cut    price     carat
1      Fair 4358.758 1.0461366
2      Good 3928.864 0.8491847
3 Very Good 3981.760 0.8063814
4   Premium 4584.258 0.8919549
5     Ideal 3457.542 0.7028370
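
As an alternative to cbind(), the left-hand side of the formula can be written as a dot, which stands for every column not used on the right-hand side. The sketch below uses a made-up toy data frame (hypothetical values, not from diamonds) to check that both forms give the same result.

```r
# Toy data for illustration only (made-up values, not from diamonds)
toy <- data.frame(
  cut   = c("Fair", "Fair", "Good", "Good", "Ideal"),
  price = c(300, 500, 400, 600, 1000),
  carat = c(0.3, 0.5, 0.4, 0.6, 1.0)
)

# Explicit form: name the variables to aggregate with cbind()
res_cbind <- aggregate(cbind(price, carat) ~ cut, data = toy, FUN = mean)

# Dot form: "." means all columns other than the grouping variable
res_dot <- aggregate(. ~ cut, data = toy, FUN = mean)

print(res_cbind)
print(res_dot)
```

diamonds also contains factor columns such as color and clarity, so with the real data it is probably safer to subset first, for example diamonds[, c("cut", "price", "carat")], before using the dot form.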