Preventing tidyr::spread from automatically sorting columns by the key variable

Posted on 2017-04-06

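As the title says; this is a note to myself. The short version: if the variable you pass to spread as the key is a factor, the resulting columns follow its level order, so you can control the column order.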

library(dplyr)
#>  
#>  Attaching package: 'dplyr'
#>  The following objects are masked from 'package:stats':
#>  
#>      filter, lag
#>  The following objects are masked from 'package:base':
#>  
#>      intersect, setdiff, setequal, union
library(tidyr)

First, prepare the dataset.

df <- tibble::rownames_to_column(iris, var = "id") %>% 
  mutate(id=as.integer(id)) %>% 
  gather(key, value, contains("l."))

str(df)
#>  'data.frame':   600 obs. of  4 variables:
#>   $ id     : int  1 2 3 4 5 6 7 8 9 10 ...
#>   $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ key    : chr  "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
#>   $ value  : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

If you spread this as-is, the columns come out reordered automatically (sorted by key name) rather than in their original order:

df %>% 
  spread(key, value) %>%
  head()
#>    id Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>  1  1  setosa          1.4         0.2          5.1         3.5
#>  2  2  setosa          1.4         0.2          4.9         3.0
#>  3  3  setosa          1.3         0.2          4.7         3.2
#>  4  4  setosa          1.5         0.2          4.6         3.1
#>  5  5  setosa          1.4         0.2          5.0         3.6
#>  6  6  setosa          1.7         0.4          5.4         3.9
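To make the reordering easier to see, here is a small sketch that compares the order in which the key values first appear with their sorted order. It rebuilds the same `df` as above so that it runs on its own; the observation that the spread column order matches the sorted order is based on the output shown in this post.

```r
library(dplyr)
library(tidyr)

# Rebuild the long-format data used above
df <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id)) %>%
  gather(key, value, contains("l."))

# Order in which the key values first appear in the long data
appearance_order <- unique(df$key)
appearance_order

# Sorted order of the same values; the spread() columns above follow this
sorted_order <- sort(unique(df$key))
sorted_order

# Column names after spreading the character key
names(spread(df, key, value))
```

The first vector starts with the Sepal columns and the second with the Petal columns, which is the difference that shows up in the spread result.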

So, converting the key variable to a factor first (with levels in order of appearance) and then spreading works:

df %>% 
  mutate(key = forcats::fct_inorder(key)) %>% 
  spread(key, value) %>% 
  head()
#>    id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>  1  1  setosa          5.1         3.5          1.4         0.2
#>  2  2  setosa          4.9         3.0          1.4         0.2
#>  3  3  setosa          4.7         3.2          1.3         0.2
#>  4  4  setosa          4.6         3.1          1.5         0.2
#>  5  5  setosa          5.0         3.6          1.4         0.2
#>  6  6  setosa          5.4         3.9          1.7         0.4
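Since the columns follow the factor levels, the same idea should work for an arbitrary order, not only the order of appearance. The following is a minimal sketch using base `factor()` with explicit `levels`; the vector `custom_order` is a hypothetical ordering chosen purely for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the long-format data used above
df <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id)) %>%
  gather(key, value, contains("l."))

# Hypothetical column order, chosen only for this example
custom_order <- c("Petal.Width", "Sepal.Width", "Petal.Length", "Sepal.Length")

wide <- df %>%
  mutate(key = factor(key, levels = custom_order)) %>%  # levels define the column order
  spread(key, value)

# id and Species come first, followed by the key columns in custom_order
names(wide)
```

Note that any key value missing from `levels` would become NA in the factor, so the vector should list every value that appears in the key column.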

Incidentally, gather() has a factor_key argument; setting it to TRUE makes the key column a factor automatically, with levels in the original column order:

df2 <- tibble::rownames_to_column(iris, var = "id") %>% 
  mutate(id=as.integer(id)) %>% 
  gather(key, value, contains("l."), factor_key = TRUE)

str(df2)
#>  'data.frame':   600 obs. of  4 variables:
#>   $ id     : int  1 2 3 4 5 6 7 8 9 10 ...
#>   $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ key    : Factor w/ 4 levels "Sepal.Length",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ value  : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

So you can simply spread this directly:

df2 %>% 
  spread(key, value) %>% 
  head()
#>    id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>  1  1  setosa          5.1         3.5          1.4         0.2
#>  2  2  setosa          4.9         3.0          1.4         0.2
#>  3  3  setosa          4.7         3.2          1.3         0.2
#>  4  4  setosa          4.6         3.1          1.5         0.2
#>  5  5  setosa          5.0         3.6          1.4         0.2
#>  6  6  setosa          5.4         3.9          1.7         0.4
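As a quick sanity check, the two approaches can be compared directly. This sketch gathers the data both ways and checks that the resulting factor levels agree; the names `iris_id`, `key_a`, and `key_b` are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

iris_id <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id))

# Approach 1: gather to a character key, then convert with forcats::fct_inorder()
key_a <- iris_id %>%
  gather(key, value, contains("l.")) %>%
  mutate(key = forcats::fct_inorder(key)) %>%
  pull(key)

# Approach 2: let gather() create the factor via factor_key = TRUE
key_b <- iris_id %>%
  gather(key, value, contains("l."), factor_key = TRUE) %>%
  pull(key)

levels(key_a)
levels(key_b)

# TRUE when both approaches produce the same level order
identical(levels(key_a), levels(key_b))
```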

Enjoy!
