R Data Analysis Notes, 2020 Edition

Last updated at 2020-11-01. Posted at 2020-03-18.

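Introduction

This article collects the libraries and functions I use most often when analyzing data in R.
These are mostly notes for my own reference, but I hope they are useful to anyone who reads them.
If you spot anything that looks dated for something labeled "2020 edition", I would appreciate hearing about it.

Loading libraries and setup

Objects left over from a previously run script can cause unexpected behavior, so I start each script by clearing the environment with rm(list=ls()) and running garbage collection with gc().
tidyverse bundles most of the libraries needed, so few additional packages have to be loaded.

Loading libraries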

rm(list=ls())
gc();  gc();

library(tidyverse)
library(knitr)
library(readxl) #input
library(vroom) #input
library(skimr) #display
library(kableExtra) #display
library(corrr) #analysis (correlate)
library(ggthemes) #visualisation
library(ggrepel) #visualisation (geom_label_repel)
library(patchwork) #visualisation

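File input and output

The files I read into R are mostly csv, tsv, and Excel.
Connecting to an RDB such as Oracle or MySQL also comes up, but I will add that at a later date.

readr, readxl, utils

As a rule, it is safer to specify the encoding explicitly for both input and output.
readr::write_csv has no option for the character encoding (it always writes UTF-8), so
I use utils::write.csv instead.
Which package to use for Excel is a harder call, but if you only need to read an Excel file and work with it in R, read_excel from the readxl package is the simplest choice.
For finer control over reading and writing Excel files, see the article "RでExcelファイルの読み込み・編集・出力 (XLConnect)" (on reading, editing, and writing Excel files in R with XLConnect).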

# Read an Excel file
read_excel("filename.xlsx", sheet = "sheetname", col_names = TRUE, col_types = NULL, na = "", skip = 0)
# Read a CSV file (encoding specified; skip = 6 skips the first 6 lines)
read_csv(filename, locale = locale(encoding = "CP932"), skip = 6, col_names = TRUE)
# Read a TSV (text) file (encoding specified)
read_tsv(filename, locale = locale(encoding = "CP932"), skip = 6, col_names = TRUE)

# Write a CSV file (encoding specified); dat is the data frame to write
utils::write.csv(dat, outfile, fileEncoding = "CP932", row.names = FALSE, na = "")
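
As a rough illustration of specifying the encoding on both ends, the sketch below writes a small data frame to a temporary file and reads it back. The data frame and its column names are made up for this example, and it assumes the platform's iconv supports CP932. The sample values are plain ASCII, so the encoding only starts to matter once the data contains Japanese text.

```r
library(readr)

# Hypothetical sample data (not from the article)
dat <- data.frame(id = 1:3, name = c("a", "b", NA), stringsAsFactors = FALSE)

outfile <- tempfile(fileext = ".csv")

# Write with an explicit encoding; NA is written as an empty field
utils::write.csv(dat, outfile, fileEncoding = "CP932", row.names = FALSE, na = "")

# Read it back, declaring the same encoding
dat2 <- read_csv(outfile, locale = locale(encoding = "CP932"), na = "")
dat2
```

Declaring the same encoding on the read side as on the write side is what keeps multibyte text from being garbled.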

vroom

The vroom package, released in May 2019, seems to be showing up more and more often lately. github/r-lib/vroom
It reads data quickly and lets you choose which columns to read at load time.
The data is returned as a tibble.

vroom

vroom(filename)
vroom(filename, delim = ",") # specify the delimiter
vroom(filename, col_select = c(col1, col2, col3)) # specify the columns to read
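
To try the column selection without a file of your own, here is a minimal sketch that first writes iris to a temporary CSV (the temporary file is only a stand-in for real data) and then reads back two columns.

```r
library(vroom)

# Write iris to a temporary CSV so the example is self-contained
path <- tempfile(fileext = ".csv")
utils::write.csv(iris, path, row.names = FALSE)

# Read only two of the five columns
dat <- vroom(path, delim = ",", col_select = c(Sepal.Length, Species))
names(dat)
```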

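Inspecting the data

Once data is loaded, the first step is a quick look at it. Here are the functions I use most for that.

head

Shows the first few rows of a data frame.
It is probably the most common way to check that the data was read correctly.

head

head(dat, 5) # show the first 5 rows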
> # A tibble: 5 x 5
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
> 1          5.1         3.5          1.4         0.2 setosa 
> 2          4.9         3            1.4         0.2 setosa 
> 3          4.7         3.2          1.3         0.2 setosa 
> 4          4.6         3.1          1.5         0.2 setosa 
> 5          5           3.6          1.4         0.2 setosa

sample_n

sample_n() draws a random sample of rows.
Calling set.seed() first fixes the random number sequence, which makes the result reproducible.

sample_n

set.seed(1234)
sample_n(dat, 5)
> # A tibble: 5 x 5
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
>          <dbl>       <dbl>        <dbl>       <dbl> <chr>     
> 1          5.2         3.5          1.5         0.2 setosa    
> 2          5.7         2.6          3.5         1   versicolor
> 3          6.3         3.3          6           2.5 virginica 
> 4          6.5         3.2          5.1         2   virginica 
> 5          6.3         3.4          5.6         2.4 virginica

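The effect of set.seed() is easy to confirm: the sketch below draws the same sample twice with the same seed and compares the results. It uses iris directly instead of dat so that it runs on its own.

```r
library(dplyr)

# Same seed, same sample
set.seed(1234)
a <- sample_n(iris, 5)

set.seed(1234)
b <- sample_n(iris, 5)

identical(a, b)
```

In dplyr 1.0.0 and later, slice_sample(n = 5) is offered as the successor to sample_n(); both still work.
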
glimpse

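Much like head(), but it shows the number of rows and columns along with the first few values of each column.
With many columns, head() output becomes hard to read, whereas glimpse() lists the column names vertically, which is easier to scan.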

glimpse

glimpse(dat)
> Rows: 150
> Columns: 5
> $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, ...
> $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, ...
> $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, ...
> $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, ...
> $ Species      <chr> "setosa", "setosa", "setosa", "setosa", "setosa", "seto...

summary

A classic: it reports summary statistics such as the minimum, maximum, mean, and median.
It is useful for checking whether there are obvious outliers.

summary

summary(dat)
>  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
> Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
> 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
> Median :5.800   Median :3.000   Median :4.350   Median :1.300  
> Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
> 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
> Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
>   Species         
> Length:150        
> Class :character  
> Mode  :character

skim

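skim() from skimr shows more detailed information about the distribution of each column than summary() does.
What is shown depends on the column type: for character columns, the string lengths, the number of unique values, and the number of empty strings; for numeric columns, the mean, standard deviation, and percentiles.
The number of missing values and complete_rate (the proportion of rows that are not missing) are shown for every column.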

skim

skim(dat)
> -- Data Summary ------------------------
>                            Values
> Name                       dat   
> Number of rows             150   
> Number of columns          5     
> _______________________          
> Column type frequency:           
>   character                1     
>   numeric                  4     
> ________________________         
> Group variables            None  
>
> -- Variable type: character -----------------------------------------------------
> # A tibble: 1 x 8
>   skim_variable n_missing complete_rate   min   max empty n_unique whitespace
> * <chr>             <int>         <dbl> <int> <int> <int>    <int>      <int>
> 1 Species               0             1     6    10     0        3          0
> 
> -- Variable type: numeric -------------------------------------------------------
> # A tibble: 4 x 11
>   skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100
> * <chr>             <int>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
> 1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9
> 2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4
> 3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9
> 4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5
>   hist 
> * <chr>
> 1 ▆▇▇▅▂
> 2 ▁▆▇▂▁
> 3 ▇▁▆▇▂
> 4 ▇▁▇▅▃

kable

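kable() from knitr renders tabular data as a neatly formatted table.
Piping the result into kable_styling() from kableExtra produces styled HTML output.
This is handy for displaying tables in notebooks and similar documents.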

kable

head(dat) %>% kable() %>% kable_styling()
> <table class="table" style="margin-left: auto; margin-right: auto;">
>  <thead>
>   <tr>
>    <th style="text-align:right;"> Sepal.Length </th>
>    <th style="text-align:right;"> Sepal.Width </th>
>    <th style="text-align:right;"> Petal.Length </th>
>    <th style="text-align:right;"> Petal.Width </th>
>    <th style="text-align:left;"> Species </th>
>   </tr>
>  </thead>
> <tbody>
>   <tr>
>    <td style="text-align:right;"> 5.1 </td>
>    <td style="text-align:right;"> 3.5 </td>
>    <td style="text-align:right;"> 1.4 </td>
>    <td style="text-align:right;"> 0.2 </td>
>    <td style="text-align:left;"> setosa </td>
>   </tr>
>   <tr>
>    <td style="text-align:right;"> 4.9 </td>
>    <td style="text-align:right;"> 3.0 </td>
>    <td style="text-align:right;"> 1.4 </td>
>    <td style="text-align:right;"> 0.2 </td>
>    <td style="text-align:left;"> setosa </td>
>   </tr>

・・・

> </tbody>
> </table>

In the Viewer, this output is rendered as a formatted table.

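Data wrangling

This section covers data wrangling, which is the most critical part of data analysis (more so than model building).
I think how quickly, accurately, and readably you can write this part is what separates skill levels.

I use the tidyverse libraries, which have gained broad support among R users in recent years.
Writing with the pipe (%>%) keeps the code concise and improves readability.
The sample data is the familiar iris dataset.

iris

head(iris)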
>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> 1            5.1         3.5          1.4         0.2     setosa
> 2            4.9         3.0          1.4         0.2     setosa
> 3            4.7         3.2          1.3         0.2     setosa
> 4            4.6         3.1          1.5         0.2     setosa
> 5            5.0         3.6          1.4         0.2     setosa
> 6            5.4         3.9          1.7         0.4     setosa

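Extracting rows

Use filter() to narrow rows down by conditions on column values.
Besides equality and inequality operators, the condition can use the membership operator %in% or anything else that returns a logical vector (TRUE or FALSE) with one value per row of the data frame.
To negate a condition, write !(condition).

# Filter on ranges of column values
iris %>% filter(Sepal.Length > 5 & Petal.Width > 1)  

# Filter by membership (Species is versicolor or virginica)
iris %>% filter(Species %in% c("versicolor", "virginica"))
# This gives the same result (Species is anything other than setosa)
iris %>% filter(!(Species %in% c("setosa")))

# Extract unique values (the first row of each Species is kept)
iris %>% distinct(Species, .keep_all = TRUE) 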
>   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
> 1          5.1         3.5          1.4         0.2     setosa
> 2          7.0         3.2          4.7         1.4 versicolor
> 3          6.3         3.3          6.0         2.5  virginica

# With .keep_all = FALSE, only the key column is kept
iris %>% distinct(Species, .keep_all = FALSE) 
>      Species
> 1     setosa
> 2 versicolor
> 3  virginica

Extracting columns

Use select() to extract columns.
You can either name the columns to keep or name the columns to drop.

# Name the columns to keep
iris %>% select(Sepal.Length, Species) 
# This gives the same result
iris %>% select(c(Sepal.Length, Species)) 
>     Sepal.Length    Species
> 1            5.1     setosa
> 2            4.9     setosa
> 3            4.7     setosa

# Name the columns to drop
iris %>% select(-Petal.Length,-Petal.Width) 
# This gives the same result
iris %>% select(-c(Petal.Length,Petal.Width)) 
>     Sepal.Length Sepal.Width    Species
> 1            5.1         3.5     setosa
> 2            4.9         3.0     setosa
> 3            4.7         3.2     setosa

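Adding, updating, and joining rows and columns

mutate() creates or updates columns.
mutate_if() updates only the columns that satisfy a condition.
This is convenient for things like changing column types.

Adding and updating columns

# Create or update a column
iris %>% mutate(Sepal.WplusL = Sepal.Width + Sepal.Length)  # creates a new column named Sepal.WplusL
# Update only the columns that satisfy a condition
iris %>% mutate_if(is.numeric, as.character) # convert numeric columns to character
# Rename columns
iris %>% rename(花弁の長さ = Petal.Length, 花弁の幅 = Petal.Width) # new name = old name (the new names mean "petal length" and "petal width")
# Join two columns into a new column (the original columns are dropped)
iris %>% unite("Petal.Length-Width", Petal.Length, Petal.Width, sep = "-")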
>    Sepal.Length Sepal.Width `Petal.Length-Width` Species
>           <dbl>       <dbl> <chr>                <chr>  
>  1          5.1         3.5 1.4-0.2              setosa 
>  2          4.9         3   1.4-0.2              setosa 
>  3          4.7         3.2 1.3-0.2              setosa 
>  4          4.6         3.1 1.5-0.2              setosa

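For reference, dplyr 1.0.0 and later also offer across() for the same kind of conditional update. The sketch below is an alternative to the mutate_if() call in the listing, not a replacement the article itself uses, and it assumes dplyr 1.0.0 or newer is installed.

```r
library(dplyr)

# Convert every numeric column to character, leaving the others untouched
converted <- iris %>%
  mutate(across(where(is.numeric), as.character))

# The four measurement columns are now character; Species is still a factor
sapply(converted, class)
```
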
Imputation

# Fill in missing values
dat %>% replace_na(list(col1 = 0)) # fill with 0
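
A small self-contained sketch of the same call, using a made-up tibble (the column names and values are only for illustration). Each column gets its own replacement value, and the replacement must match the column's type.

```r
library(tidyr)
library(dplyr)

# Hypothetical data with missing values in two columns
dat <- tibble(col1 = c(1, NA, 3), col2 = c("a", NA, "c"))

# Fill col1 with 0 and col2 with "unknown"
filled <- dat %>% replace_na(list(col1 = 0, col2 = "unknown"))
filled
```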

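left_join() joins two data frames (left join).

Joins

# Left join (same key name on both sides)
leftdata %>% left_join(rightdata, by = "key")
# Left join (different key names)
leftdata %>% left_join(rightdata, by = c("key_l" = "key_r"))
# Stack vertically (union)
dat1 %>% bind_rows(dat2)
# Append a single row (col1 is a placeholder column name)
dat1 %>% add_row(col1 = 0)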
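
To see what a left join does with unmatched keys, here is a minimal sketch with two made-up tibbles (the names leftdata, rightdata, key_l, and key_r follow the listing; the values are invented). Rows of the left table with no match keep their place and get NA in the columns coming from the right table.

```r
library(dplyr)

# Hypothetical tables; "c" has no partner on the right, "d" has none on the left
leftdata  <- tibble(key_l = c("a", "b", "c"), x = 1:3)
rightdata <- tibble(key_r = c("a", "b", "d"), y = c(10, 20, 40))

joined <- leftdata %>% left_join(rightdata, by = c("key_l" = "key_r"))
joined
```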

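Converting between long and wide formats is done with pivot_wider() and pivot_longer(); the arguments take some getting used to, but the conversion itself is simple.
When there are many columns to convert, for example turning every column from col_01 to col_99 into long format, naming each column is tedious; specifying cols = starts_with("col") selects all columns that match the condition.

Reshaping

# Long to wide
dat %>% pivot_wider(names_from = "year", values_from = "amount")
# Wide to long (col1 and col2 are placeholder column names)
dat %>% pivot_longer(cols = c(col1, col2), names_to = "names_to", values_to = "values_to")
# Select every column whose name starts with "aaa"
dat %>% pivot_longer(cols = starts_with("aaa"),names_to = "names_to", values_to = "values_to")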
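
The round trip is easier to follow on a tiny table. The sketch below uses an invented wide tibble with two col_ columns, converts it to long format with starts_with(), and then converts it back.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: one row per id, one column per measurement
wide <- tibble(id = c("x", "y"), col_01 = c(1, 2), col_02 = c(3, 4))

# Wide to long: every column starting with "col" becomes a name/value pair
long <- wide %>%
  pivot_longer(cols = starts_with("col"), names_to = "name", values_to = "value")

# Long to wide: back to the original shape
back <- long %>%
  pivot_wider(names_from = "name", values_from = "value")

long
back
```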

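String manipulation

When you need string manipulation for name matching or for building join keys, the functions in the stringr library are convenient.

Concatenating strings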

str_c("aaa","bbb","ccc",sep = "_")
> "aaa_bbb_ccc"
str_c("aaa","bbb","ccc")
> "aaabbbccc"

Extracting and splitting strings

substr("abcde",2,4)
> "bcd"

# Output as a list
str_split(c("2020-02-02", "2020-03-03", "2020-04-04"), pattern = "-")
> [[1]]
> [1] "2020" "02"   "02"  
>
> [[2]]
> [1] "2020" "03"   "03"  
>
> [[3]]
> [1] "2020" "04"   "04" 
# Output as a matrix
str_split(c("2020-02-02", "2020-03-03", "2020-04-04"), pattern = "-", simplify = TRUE)
>      [,1]   [,2] [,3]
> [1,] "2020" "02" "02"
> [2,] "2020" "03" "03"
> [3,] "2020" "04" "04"

Padding strings

formatC(1:10,width=2,flag="0")
>  [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10"

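Since this section is about stringr, it may be worth noting that str_pad() gives the same zero-padded result. This is only an alternative sketch; formatC() as shown in the listing works just as well.

```r
library(stringr)

# Left-pad each number to width 2 with "0"
padded <- str_pad(as.character(1:10), width = 2, side = "left", pad = "0")
padded
```
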
Type conversion

# Convert from a string to another type
as.Date("2020-02-02") # convert to Date
as.double("1.5") # convert to double
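
Two details tend to matter in practice, sketched below with invented inputs: as.Date() needs a format argument when the string is not in a standard form such as year-month-day, and as.double() returns NA (with a warning) for strings that are not numbers.

```r
# A date written as day.month.year needs an explicit format
d <- as.Date("02.02.2020", format = "%d.%m.%Y")
format(d)

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning
x <- suppressWarnings(as.double(c("1.5", "abc")))
x
```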

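Aggregation and analysis

This section covers aggregation and analysis once the data frame has been wrangled.

Aggregation

For a simple aggregation, specify the grouping keys with group_by() and then the aggregate functions with summarize().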

iris %>% group_by(Species) %>% summarize(count = n(),meanPL = mean(Petal.Length))
>   Species    count meanPL
>   <fct>      <int>  <dbl>
> 1 setosa        50   1.46
> 2 versicolor    50   4.26
> 3 virginica     50   5.55

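When the same aggregate is needed for several columns, across() inside summarize() avoids writing one expression per column. A minimal sketch, assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Mean of every numeric column per Species, plus the row count
summary_tbl <- iris %>%
  group_by(Species) %>%
  summarize(across(where(is.numeric), mean), count = n())

summary_tbl
```

The count is placed after across() so that it is not picked up as one of the numeric columns to average.
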
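Correlation analysis

When you want a quick overview of the data, computing a correlation matrix is one of the easiest ways, and correlate() from the corrr package does it in one call.
A categorical variable such as Species causes an error, so drop it with select() first.

# Correlation matrix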
iris %>%
  select(-Species) %>%
  correlate()

>   rowname      Sepal.Length Sepal.Width Petal.Length Petal.Width
>   <chr>               <dbl>       <dbl>        <dbl>       <dbl>
> 1 Sepal.Length       NA          -0.118        0.872       0.818
> 2 Sepal.Width        -0.118      NA           -0.428      -0.366
> 3 Petal.Length        0.872      -0.428       NA           0.963
> 4 Petal.Width         0.818      -0.366        0.963      NA

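If corrr is not installed, base R's cor() gives the same coefficients as a plain matrix. This sketch is an alternative to correlate(); the only visible difference is that the diagonal holds 1 rather than NA.

```r
# Keep only the numeric columns, then compute the correlation matrix
num_cols <- iris[, sapply(iris, is.numeric)]
cor_mat <- cor(num_cols)

round(cor_mat, 3)
```
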
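Visualization (ggplot)

For simply drawing charts, I recommend ggplot.
ggplot is flexible, which makes setting up a theme every time tedious, but the ggthemes library lets you pick a design you like from a set of ready-made themes.
For the designs available in ggthemes,
the article "ggthemesのテーマとカラーパレット" (on ggthemes themes and color palettes) is a very helpful reference.
I like theme_fivethirtyeight() for its simplicity.

Scatter plot (viridis color palette)

Scatter plots use geom_point().
scale_colour_viridis_d() applies the viridis color palette.
Use scale_colour_viridis_d() for discrete values and scale_colour_viridis_c() for continuous values.
The option argument switches between palettes.
For the available palettes, see "The viridis color palettes".

Scatter plot (using the viridis color palette)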

iris %>%
  ggplot(aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  theme_fivethirtyeight() +
  scale_colour_viridis_d(option = "viridis") +
  theme(axis.title = element_text()) +
  labs(x = "Petal length", y = "Petal width", title = "Iris petal length and width")

Bar chart

stat_summary() can draw a bar chart.

Simple bar chart

iris %>%
  ggplot(aes(Species, Petal.Length)) +
  stat_summary(aes(fill = "1"), fun = "mean", geom = "bar") + # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  coord_cartesian(ylim = c(1,6)) +
  theme_fivethirtyeight() +
  theme(axis.title = element_text(),legend.position = 'none') +
  labs(x = "Species", y = "Petal length (mean)", title = "Iris species and petal length")

coord_flip() swaps the x and y axes.

coord_flip

iris %>%
  ggplot(aes(Species, Petal.Length)) +
  stat_summary(aes(fill = "1"), fun = "mean", geom = "bar") +
  coord_flip(ylim = c(1,6)) + # a plot has only one coord, so the limits go here instead of a separate coord_cartesian()
  theme_fivethirtyeight() +
  theme(axis.title = element_text(),legend.position = 'none') +
  labs(x = "Species", y = "Petal length (mean)", title = "Iris species and petal length")

geom_label() adds labels to the chart.

Bar chart with labels

# distinct() keeps only the first row of each species.
iris %>%
  distinct(Species, .keep_all = TRUE) %>%
  ggplot(aes(Species, Petal.Length, fill = Species)) +
  geom_col() +
  geom_label(aes(label = Petal.Length)) +
  scale_fill_manual(values = c("#00ced1", "#ff6347","#00ced1"))+
  theme_fivethirtyeight() +
  theme(axis.title = element_text(),legend.position = 'none') +
  labs(x = "Species", y = "Petal length", title = "Iris species and petal length")

Setting theme(axis.text.x = element_text(angle = 90, hjust = 1)) rotates the x-axis tick labels to vertical.

Rotating the x-axis tick labels to vertical

# axis.text.x = element_text(angle = 90, hjust = 1) rotates the x-axis tick labels to vertical
iris %>%
  ggplot(aes(Species, Petal.Length)) +
  stat_summary(aes(fill = "1"), fun = "mean", geom = "bar") + # fun replaces the deprecated fun.y
  coord_cartesian(ylim = c(1,6)) +
  theme_fivethirtyeight() +
  theme(axis.title = element_text(),axis.text.x = element_text(angle = 90, hjust = 1),legend.position = 'none') +
  labs(x = "Species", y = "Petal length (mean)", title = "Iris species and petal length")

Line chart

# Line chart
EuStockMarkets %>%
  data.frame(time = time(.)) %>%
  filter(time > 1998.5) %>% 
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices") %>%
  ggplot(aes(time, Indices, colour = Stock)) +
  geom_line() + 
  geom_point() +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  labs(x = "Year", y = "Stock index", title = "Major European stock indices")

geom_label_repel() from ggrepel lets you place the legend labels inside the plot itself, but it takes a little care.
Adding geom_label_repel() naively attaches a label to every plotted point, which makes the chart very hard to read.
To show each label only once, the code uses mutate(label = if_else(time == median(time), Stock, NA_character_)).
In the code below, the rows at the median of time receive the label and all other times receive NA_character_, so each label appears only once.
Also remember theme(legend.position = "none") to remove the original legend.

ggrepel

EuStockMarkets %>%
  data.frame(time = time(.)) %>%
  filter(time > 1998.5) %>% 
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices") %>%
  mutate(label = if_else(time == median(time), Stock, NA_character_)) %>% # build the label column for ggrepel
  ggplot(aes(time, Indices, colour = Stock, group = Stock)) +
  geom_line() + 
  geom_point() +
  geom_label_repel(aes(label = label), nudge_x = 0, na.rm = TRUE, alpha = 0.8) + # place the series labels inside the plot
  theme_fivethirtyeight() +
  theme(axis.title = element_text(), legend.position = "none") +
  labs(x = "Year", y = "Stock index", title = "Major European stock indices")
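
The median-based trick depends on the median being an actual value in the time column. As a hedged alternative (my own variation, not the method used in the listing), the label row can be picked by position within each series, which avoids comparing floating-point times. The sketch below only builds the label column; it uses the full EuStockMarkets series and can be piped into the same ggplot code.

```r
library(dplyr)
library(tidyr)

# Reshape EuStockMarkets to long format (one row per time and Stock)
long <- as.data.frame(EuStockMarkets) %>%
  mutate(time = as.numeric(time(EuStockMarkets))) %>%
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices")

# Label only the middle row of each series, chosen by row position
labelled <- long %>%
  group_by(Stock) %>%
  mutate(label = if_else(row_number() == (n() + 1) %/% 2, Stock, NA_character_)) %>%
  ungroup()

# Exactly one labelled row per Stock
labelled %>% filter(!is.na(label)) %>% select(time, Stock, label)
```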

geom_density() draws a density curve.
alpha = sets the opacity; the smaller the value, the more transparent the fill.

Distribution

density

iris %>%
  ggplot(aes(Petal.Length, fill = Species)) +
  geom_density(alpha = 0.5) +
  theme_hc() +
  theme(legend.position = "top") +
  labs(x = "Petal.Length", y = "Density", fill = "", title = "Length of Petal")

facet_wrap() lays out multiple panels in a sensible arrangement.
In the code below, the chart is split into one panel per Stock.

Displaying multiple charts together

Arranging multiple charts in one figure

# facet_wrap
label <- as_labeller(c(`CAC` = "France",
                        `DAX` = "Germany",
                        `FTSE` = "UK",
                        `SMI` = "Switzerland"))
EuStockMarkets %>%
  data.frame(time = time(.)) %>%
  filter(time > 1998.5) %>% 
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices") %>%
  ggplot(aes(time, Indices, colour = Stock)) +
  geom_line() + 
  geom_point() +
  theme_fivethirtyeight() +
  facet_wrap(~ Stock,labeller = label) +
  theme(axis.title = element_text()) +
  labs(x = "Year", y = "Stock index", title = "Major European stock indices")

facet_grid() arranges panels by specifying one variable for the rows and one for the columns.
In the code below, the chart is split by month and Stock.

Arranging multiple charts by row and column

EuStockMarkets %>%
  data.frame(time = as.Date((as.double(time(.)) - 1991)*365, origin = "1991-01-01")) %>%
  # filter(time > as.Date("1998-01-01")) %>% 
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices") %>%
  mutate(Month = months(time)) %>% # months() returns the month name, so the column is named Month
  ggplot(aes(time, Indices, colour = Stock)) +
  geom_line() + 
  geom_point() +
  theme_fivethirtyeight() +
  facet_grid(Stock ~ Month) +
  theme(axis.title = element_text(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), legend.position = "none") +
  labs(x = "Year", y = "Stock index", title = "Major European stock indices")

patchwork lets you freely arrange completely different charts.
Each chart is built the same way as before, and at the end you combine them with + (side by side) or / (stacked), which is intuitive.
They can also be nested, as in p1 + (p2 / p3).

Arranging multiple charts with patchwork

# patchwork
p1 <- EuStockMarkets %>%
  data.frame(time = time(.)) %>%
  filter(time > 1998.5) %>% 
  pivot_longer(cols = -time, names_to = "Stock", values_to = "Indices") %>%
  ggplot(aes(time, Indices, colour = Stock)) +
  geom_line() + 
  geom_point() +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  labs(x = "Year", y = "Stock index", title = "Major European stock indices")

p2 <- iris %>%
  ggplot(aes(Species, Petal.Length)) +
  stat_summary(aes(fill = "1"), fun = "mean", geom = "bar") + # fun replaces the deprecated fun.y
  coord_cartesian(ylim = c(1,6)) +
  theme_fivethirtyeight() +
  theme(axis.title = element_text(),legend.position = 'none') +
  labs(x = "Species", y = "Petal length (mean)", title = "Iris species and petal length")
# Side by side
p1 + p2

# Stacked
p1/p2
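
The nested form p1 + (p2 / p3) is not shown in the listing, so here is a minimal sketch of it. The three plots are deliberately simple stand-ins built from iris, not the charts from this article.

```r
library(ggplot2)
library(patchwork)

# Three simple placeholder plots
p1 <- ggplot(iris, aes(Petal.Length, Petal.Width)) + geom_point()
p2 <- ggplot(iris, aes(Species, Petal.Length)) + geom_boxplot()
p3 <- ggplot(iris, aes(Petal.Length)) + geom_histogram(bins = 20)

# p1 on the left; p2 stacked on top of p3 on the right
combined <- p1 + (p2 / p3)
combined
```

The parentheses matter: they group p2 and p3 into one stacked unit before it is placed next to p1.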
