Keeping only the rows of a data frame whose value in a given column is duplicated

Posted at 2017-06-01

A note to myself, since I was about to forget this tip I was given on r-wakalang a while ago.

Prepare a data frame like the following:

library(dplyr)  # provides %>%, group_by(), filter(), n() and ungroup()

df <- data.frame(
  id = 1:6,
  value = c("a", "b", "b", NA, "c", NA)
)

Here is the code for keeping only the rows whose df$value occurs more than once.

df %>% 
  group_by(value) %>% 
  filter(n() > 1) %>% 
  ungroup()
# > # A tibble: 4 x 2
# >      id  value
# >   <int> <fctr>
# > 1     2      b
# > 2     3      b
# > 3     4     NA
# > 4     6     NA

Conversely, to drop every row whose value is duplicated, something like this works:

df %>% 
  group_by(value) %>% 
  filter(n() == 1) %>% 
  ungroup()
# > # A tibble: 2 x 2
# >      id  value
# >   <int> <fctr>
# > 1     1      a
# > 2     5      c

All it takes is changing the filter condition.
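As one more illustration of changing the condition, here is a minimal sketch that keeps a single representative row per duplicated value. This variant is not from the original tip; it assumes the same df as above and combines n() with dplyr's row_number(), and the name first_of_dups is made up for this example.

```r
library(dplyr)

df <- data.frame(
  id = 1:6,
  value = c("a", "b", "b", NA, "c", NA)
)

# Keep only the first row of each value that occurs more than once.
# Both conditions are evaluated within each group of `value`.
first_of_dups <- df %>%
  group_by(value) %>%
  filter(n() > 1, row_number() == 1) %>%
  ungroup()

print(first_of_dups)
```

With this data, the rows with id 2 and id 4 should remain, since NA is treated as one group just like any other value.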

If you don't want the NA rows, just add a step that removes the NAs. Thanks to everyone who taught me this.
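The NA handling could look something like the sketch below. It assumes the same df as above and drops the NA rows with is.na() before grouping; the original post does not say which removal method was intended, so treat this as one possible way, and the name dups_without_na is made up for this example.

```r
library(dplyr)

df <- data.frame(
  id = 1:6,
  value = c("a", "b", "b", NA, "c", NA)
)

# Remove rows with a missing value first, then keep only duplicated values.
dups_without_na <- df %>%
  filter(!is.na(value)) %>%
  group_by(value) %>%
  filter(n() > 1) %>%
  ungroup()

print(dups_without_na)
```

Only the two "b" rows (id 2 and id 3) should be left, because the NA rows are gone before the group sizes are counted.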

Enjoy!
