

summary() output differs between data.frame and data.table


Introduction

Someone asked me why summary() prints different results for a data.frame and a data.table, so I wrote up what is going on.

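Conclusion

With data.frame(), string columns become factors by default (this is the behavior of R before 4.0.0, where stringsAsFactors defaulted to TRUE; since R 4.0.0 the default is FALSE). With data.table(), string columns stay character.

For a character column, summary() reports only Length, Class, and Mode, not per-value counts.

So the difference comes from the column type: summary() treats factor and character columns differently.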
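The type dependence can be seen without any data frame at all. The following is a minimal sketch in base R; the names x_chr and x_fct are made up for illustration and are not part of the original session.

```r
# The same 100 strings, once as character and once as factor
x_chr <- c(rep("c", 50), rep("d", 50))
x_fct <- factor(x_chr)

# Character vector: summary() reports Length, Class, and Mode only
s_chr <- summary(x_chr)
print(s_chr)

# Factor: summary() reports the number of elements per level
s_fct <- summary(x_fct)
print(s_fct)
```

The summary() call on a data.frame or data.table summarizes each column in turn, which is why the column type decides what gets printed.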

Creating the data

> aaa <- seq(1,100)
> bbb <- seq(1,200,by=2)
> ccc <- c(rep("c",50), rep("d",50))

With data.frame

> df <- data.frame(a1 = aaa, b1 = bbb, c1 = ccc)
> sapply(df, class)
       a1        b1        c1 
"integer" "numeric"  "factor" 
> summary(df)
       a1               b1        c1    
 Min.   :  1.00   Min.   :  1.0   c:50  
 1st Qu.: 25.75   1st Qu.: 50.5   d:50  
 Median : 50.50   Median :100.0         
 Mean   : 50.50   Mean   :100.0         
 3rd Qu.: 75.25   3rd Qu.:149.5         
 Max.   :100.00   Max.   :199.0         
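Because the default for stringsAsFactors changed in R 4.0.0, the output above may not reproduce on a newer R. A sketch that pins the behavior explicitly, assuming the same data as above (df_factor and df_char are illustrative names):

```r
aaa <- seq(1, 100)
bbb <- seq(1, 200, by = 2)
ccc <- c(rep("c", 50), rep("d", 50))

# Force factor conversion, matching the output shown in this article
df_factor <- data.frame(a1 = aaa, b1 = bbb, c1 = ccc, stringsAsFactors = TRUE)

# Keep strings as character, matching what data.table() does
df_char <- data.frame(a1 = aaa, b1 = bbb, c1 = ccc, stringsAsFactors = FALSE)

print(sapply(df_factor, class))
print(summary(df_factor))  # c1 shows counts per level
print(summary(df_char))    # c1 shows Length / Class / Mode
```

Passing the argument explicitly makes the result independent of the R version in use.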

With data.table

> library(data.table)
> dt <- data.table(a1 = aaa, b1 = bbb, c1 = ccc)
> sapply(dt, class)
         a1          b1          c1 
  "integer"   "numeric" "character" 
> summary(dt)
       a1               b1             c1           
 Min.   :  1.00   Min.   :  1.0   Length:100        
 1st Qu.: 25.75   1st Qu.: 50.5   Class :character  
 Median : 50.50   Median :100.0   Mode  :character  
 Mean   : 50.50   Mean   :100.0                     
 3rd Qu.: 75.25   3rd Qu.:149.5                     
 Max.   :100.00   Max.   :199.0
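If per-value counts are wanted for a data.table as well, one option is to convert the column to factor. The sketch below assumes the data.table package is installed and shows two ways to do it; dt2 is an illustrative name.

```r
library(data.table)  # assumes the data.table package is installed

aaa <- seq(1, 100)
bbb <- seq(1, 200, by = 2)
ccc <- c(rep("c", 50), rep("d", 50))

# Option 1: convert the existing character column in place
dt <- data.table(a1 = aaa, b1 = bbb, c1 = ccc)
dt[, c1 := factor(c1)]

# Option 2: ask data.table() to create factors at construction time
dt2 <- data.table(a1 = aaa, b1 = bbb, c1 = ccc, stringsAsFactors = TRUE)

print(sapply(dt, class))
print(summary(dt))  # c1 now shows counts per level
```

Option 1 is handy when the table already exists; Option 2 avoids the extra step when building the table from scratch.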