
[R+dplyr] Applying a function to each row of a data frame and expanding the result into that row

Posted at 2024-01-28

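When you spend most of your time writing business systems in object-oriented languages, the R and dplyr way of doing things can be hard to get used to. One thing I struggled to find a good approach for was applying an arbitrary function to each row of a data frame, as in the example below.

There is a lot of mixed advice out there (do() is outdated, use purrr::map, use nest/unnest, and so on), but the approach below is the one that felt most natural to me.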

library(tidyverse)

process_row <- function(df) {
  # df =
  #   tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
  #   $ x: int 1
  #   $ y: int 6

  return(df %>% summarize(sum_x_y = sum(c(x, y)), mean_x_y = mean(c(x, y))))

  # (returned value) =
  #   tibble [1 × 2] (S3: tbl_df/tbl/data.frame)
  #     $ sum_x_y : int 7
  #     $ mean_x_y: num 3.5
}

df_foo <- tibble(x = 1:5, y = 6:10)

# df_foo =
#   tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#   $ x: int [1:5] 1 2 3 4 5
#   $ y: int [1:5] 6 7 8 9 10

result <- df_foo %>%
  # Make each row its own group
  rowwise() %>%
  # Pass the one-row data frame (`pick(everything())`) to process_row() and store the result in data
  transmute(data = process_row(pick(everything()))) %>%
  # Expand data, which is a nested data frame
  unnest(cols = data)

# result =
#   # A tibble: 5 × 2
#     sum_x_y mean_x_y
#       <int>    <dbl>
#   1       7      3.5
#   2       9      4.5
#   3      11      5.5
#   4      13      6.5
#   5      15      7.5

Using rowwise() together with transmute() (or mutate(), if you want to keep the original columns), each row is treated as a nested data frame, and a final unnest() removes the nesting.
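
The mutate() variant mentioned above might look like the following. This is a minimal sketch, assuming dplyr 1.1.0 or later (needed for pick()) and tidyr are installed. The name result_keep is introduced here for illustration only and does not appear in the original code.

```r
library(dplyr)
library(tidyr)

# Same helper as in the article: takes a one-row tibble, returns a one-row tibble
process_row <- function(df) {
  df %>% summarize(sum_x_y = sum(c(x, y)), mean_x_y = mean(c(x, y)))
}

df_foo <- tibble(x = 1:5, y = 6:10)

# result_keep is a hypothetical name used only in this sketch
result_keep <- df_foo %>%
  # Make each row its own group
  rowwise() %>%
  # mutate() keeps x and y alongside the new data column
  mutate(data = process_row(pick(everything()))) %>%
  # Expand the data column into sum_x_y and mean_x_y
  unnest(cols = data)

print(result_keep)
```

The only change from the transmute() version is the verb. The original x and y columns should remain in the result, followed by the columns returned by process_row().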
