Preventing tidyr::spread from automatically sorting by the key variable #R

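As the title says, this is a note to myself. Bottom line: if you make the variable you pass as key to spread() a factor, the new columns follow the factor's level order, so you can control their order.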
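To see the rule in isolation, here is a minimal sketch on a hypothetical toy data frame (long, long_f and the keys a, b, c are made up for illustration). It assumes a tidyr version that still provides spread(). With a character key the new columns come out sorted; with a factor key they follow the factor levels, whatever order those levels are in.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format toy data; the key values are deliberately
# not in alphabetical order.
long <- tibble(
  id    = c(1, 1, 1, 2, 2, 2),
  key   = c("c", "a", "b", "c", "a", "b"),
  value = c(10, 20, 30, 40, 50, 60)
)

# Character key: spread() sorts the new columns
names(spread(long, key, value))
# -> "id" "a" "b" "c"

# Factor key: the new columns follow the level order we chose
long_f <- mutate(long, key = factor(key, levels = c("b", "c", "a")))
wide_f <- spread(long_f, key, value)
names(wide_f)
# -> "id" "b" "c" "a"
```

The levels here are set by hand; the rest of this memo uses "order of first appearance", which is just one particular choice of levels.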

library(dplyr)
#>  
#>  Attaching package: 'dplyr'
#>  The following objects are masked from 'package:stats':
#>  
#>      filter, lag
#>  The following objects are masked from 'package:base':
#>  
#>      intersect, setdiff, setequal, union
library(tidyr)

First, prepare the dataset.

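# Add a row id, then gather the four measurement columns (names containing "l.") into key/value pairs
df <- tibble::rownames_to_column(iris, var = "id") %>% 
  mutate(id = as.integer(id)) %>% 
  gather(key, value, contains("l."))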

str(df)
#>  'data.frame':   600 obs. of  4 variables:
#>   $ id     : int  1 2 3 4 5 6 7 8 9 10 ...
#>   $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ key    : chr  "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Length" ...
#>   $ value  : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

If you spread it as is, the new columns come out sorted alphabetically by key rather than in their original order:

df %>% 
  spread(key, value) %>%
  head()
#>    id Species Petal.Length Petal.Width Sepal.Length Sepal.Width
#>  1  1  setosa          1.4         0.2          5.1         3.5
#>  2  2  setosa          1.4         0.2          4.9         3.0
#>  3  3  setosa          1.3         0.2          4.7         3.2
#>  4  4  setosa          1.5         0.2          4.6         3.1
#>  5  5  setosa          1.4         0.2          5.0         3.6
#>  6  6  setosa          1.7         0.4          5.4         3.9
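A quick check that this order is simply the sorted key values: the sketch below rebuilds df from the code above and compares the columns that spread() creates with sort(unique(df$key)). The comparison is my own check, not something quoted from the spread() documentation.

```r
library(dplyr)
library(tidyr)

# Rebuild the long data from above
df <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id)) %>%
  gather(key, value, contains("l."))

wide <- spread(df, key, value)

# Order of first appearance in the long data
unique(df$key)
# -> "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# Columns created by spread(): the same values, but sorted
new_cols <- setdiff(names(wide), c("id", "Species"))
new_cols
# -> "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width"

identical(new_cols, sort(unique(df$key)))
# -> TRUE
```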

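So convert key to a factor first, with levels in order of first appearance (which is what forcats::fct_inorder() does), and then spread; the columns keep their original order: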

df %>% 
  mutate(key = forcats::fct_inorder(key)) %>% 
  spread(key, value) %>% 
  head()
#>    id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>  1  1  setosa          5.1         3.5          1.4         0.2
#>  2  2  setosa          4.9         3.0          1.4         0.2
#>  3  3  setosa          4.7         3.2          1.3         0.2
#>  4  4  setosa          4.6         3.1          1.5         0.2
#>  5  5  setosa          5.0         3.6          1.4         0.2
#>  6  6  setosa          5.4         3.9          1.7         0.4
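If you'd rather not depend on forcats, base R's factor() with levels = unique(key) should give the same first-appearance order for a character key like this one (no NA values). A sketch, again rebuilding df:

```r
library(dplyr)
library(tidyr)

df <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id)) %>%
  gather(key, value, contains("l."))

# Base-R alternative to forcats::fct_inorder():
# unique() keeps values in order of first appearance
wide <- df %>%
  mutate(key = factor(key, levels = unique(key))) %>%
  spread(key, value)

names(wide)
# -> "id" "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```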

Incidentally, gather() has a factor_key argument; setting it to TRUE stores key as a factor whose levels follow the original column order:

# Same as before, but factor_key = TRUE makes key a factor in the original column order
df2 <- tibble::rownames_to_column(iris, var = "id") %>% 
  mutate(id = as.integer(id)) %>% 
  gather(key, value, contains("l."), factor_key = TRUE)

str(df2)
#>  'data.frame':   600 obs. of  4 variables:
#>   $ id     : int  1 2 3 4 5 6 7 8 9 10 ...
#>   $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ key    : Factor w/ 4 levels "Sepal.Length",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   $ value  : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

So you can spread this directly:

df2 %>% 
  spread(key, value) %>% 
  head()
#>    id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>  1  1  setosa          5.1         3.5          1.4         0.2
#>  2  2  setosa          4.9         3.0          1.4         0.2
#>  3  3  setosa          4.7         3.2          1.3         0.2
#>  4  4  setosa          4.6         3.1          1.5         0.2
#>  5  5  setosa          5.0         3.6          1.4         0.2
#>  6  6  setosa          5.4         3.9          1.7         0.4
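Side note: from tidyr 1.0.0 on, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As far as I know, pivot_wider() orders the new columns by first appearance by default (recent versions expose this as names_sort = FALSE), so the factor trick isn't needed there. A sketch, assuming tidyr >= 1.0.0:

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()/pivot_wider()

long <- tibble::rownames_to_column(iris, var = "id") %>%
  mutate(id = as.integer(id)) %>%
  pivot_longer(contains("l."), names_to = "key", values_to = "value")

# key stays character; by default the new columns follow first appearance
wide <- pivot_wider(long, names_from = key, values_from = value)
names(wide)
# -> "id" "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```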

Enjoy!