
Doing the Pandas 100 Knocks with Tidyverse (up to No. 42)

Posted at 2020-09-26

head --> head

library(tidyverse)
df <- read_csv('input//titanic3.csv')
df %>% head()

tail --> tail

df %>% tail()

shape --> dim

df %>% dim()

read_csv --> read_csv

df2 <- read_csv('input//data1.csv')
df2 %>% head()

sort_values --> arrange

df %>% arrange(fare)
df %>% arrange(desc(fare))

copy --> Not needed. Data frames in R are not passed by reference, so plain assignment already behaves like a copy.

df_copy <- df
df_copy <- tibble() # df_copy can be overwritten (here: emptied)
df # the original df is still intact
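
The same behavior can be checked on a small example. This is only a sketch: the tibble `toy` is made up for illustration and is not part of titanic3.csv.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(x = 1:3)

# Plain assignment is enough; no explicit copy() is needed
toy_copy <- toy

# Modifying the copy leaves the original untouched
toy_copy$x <- toy_copy$x * 10L

toy$x       # the original values
toy_copy$x  # the modified values
```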

dtypes, dtype --> I don't know of an equivalent.
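
One possible stand-in, offered as a suggestion rather than something from the original article: looking up the class of each column with `map_chr`, or printing an overview with `glimpse`. The tibble `toy` and the name `col_types` below are made up for illustration.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(name = c("a", "b"), age = c(30, 41), pclass = c(1L, 3L))

# First class of each column, similar in spirit to pandas dtypes
col_types <- map_chr(toy, ~ class(.x)[1])
col_types

# Compact overview of rows, columns and types (closer to pandas info)
glimpse(toy)
```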

astype --> as.character, as.numeric, as.integer, etc.

df %>% mutate(pclass = as.character(pclass))

len --> nrow

df %>% nrow()

info --> I don't know of an equivalent.

unique --> unique

df %>% pull(sex) %>% unique()
df %>% pull(cabin) %>% unique()

columns --> colnames

df %>% colnames()

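index --> Accessing rows by index is not recommended.

[] --> Also works in R, but it is better to choose between select and pull depending on whether you want a tibble or a vector.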

df[,'name'] # tibble
df$name # vector
df %>% select(name) # tibble
df %>% pull(name) # vector
df[ , c('name', 'sex')] # tibble
df %>% select(name, sex) # tibble
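
The difference between the two return types can be confirmed on a small example. This is a minimal sketch; `toy`, `as_tibble_col` and `as_vector` are names made up for illustration.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(name = c("Allen", "Brown"), sex = c("female", "male"))

as_tibble_col <- toy %>% select(name)  # one-column tibble
as_vector <- toy %>% pull(name)        # plain character vector

is_tibble(as_tibble_col)
is.character(as_vector)
```

select is the natural choice when the result feeds another data frame verb, and pull when a function such as unique expects a vector.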

[] --> []

df[1:4, ]
df[4:10, ]
df %>% slice(4:10)

loc --> Not needed. Use print(n = ) to adjust the number of rows displayed.

df[ , ] %>% print(n = 20)

Passing nrow(.) as n does print every row, though. (Explanation omitted.)

df[ , c('name', 'ticket')] %>% print(n = nrow(.))
df %>% select(name, ticket) %>% print(n = nrow(.))

: --> :

df %>% select(name:ticket)

iloc --> Not needed.

write_csv --> write_csv

df_copy <- df %>% select(name, age, sex)
df_copy %>% write_csv('output//sample.csv')

[] --> []

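With an inequality, rows where the condition evaluates to NA come back as all-NA rows, so [] is not recommended here. Use filter instead.

df[df$age >= 30, ] # not good: rows with NA age are returned as all-NA rows
df[(df$age >= 30) & !is.na(df$age), ] # works, but complicated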
df %>% filter(age >= 30) # easier

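With this equality test no NA rows appear, presumably because sex has no missing values in this data. In general == also returns NA for missing values, so filter is still the safer choice.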

df[df$sex == 'female', ]
df %>% filter(sex == 'female')
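
How [] and filter treat a condition that evaluates to NA can be seen on a small example. This is a sketch on made-up data; `toy` and the result names are hypothetical.

```r
library(tidyverse)

# Toy data with missing values (hypothetical; not from titanic3.csv)
toy <- tibble(
    name = c("a", "b", "c"),
    sex = c("female", NA, "male"),
    age = c(35, NA, 20)
)

# [] keeps a row of NAs wherever the condition is NA
bracket_age <- toy[toy$age >= 30, ]
bracket_sex <- toy[toy$sex == "female", ]

# filter drops rows where the condition is NA
filter_age <- toy %>% filter(age >= 30)
filter_sex <- toy %>% filter(sex == "female")

nrow(bracket_age)
nrow(filter_age)
```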

query --> filter

With filter in R, you do not have to join conditions with &. You can also separate them with commas.

df %>% filter(sex == "female" & age >= 40)
df %>% filter(sex == "female", age >= 40)

str.contains --> str_detect

df %>% filter(str_detect(name, "Mrs"))

select_dtypes --> I don't know of an equivalent.
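
One candidate, suggested here rather than taken from the original article, is the `where()` selection helper inside select, which picks columns by a predicate function. The sketch below assumes dplyr 1.0.0 or later and uses a made-up tibble `toy`.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(name = c("a", "b"), age = c(30, 41), fare = c(7.25, 71.3))

# Keep only the columns for which the predicate returns TRUE
numeric_cols <- toy %>% select(where(is.numeric))
character_cols <- toy %>% select(where(is.character))

colnames(numeric_cols)
colnames(character_cols)
```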

nunique --> I cannot think of a single function that does this.
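
dplyr's `n_distinct` may fill this gap; this is a suggestion, not part of the original article. Note that it counts NA as a value unless `na.rm = TRUE` is given, whereas pandas nunique ignores missing values by default. The tibble `toy` and the result names are made up for illustration.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(sex = c("male", "female", "male"), cabin = c("C85", NA, "C85"))

n_sex <- toy %>% pull(sex) %>% n_distinct()
n_cabin_with_na <- toy %>% pull(cabin) %>% n_distinct()           # NA counts as a value
n_cabin <- toy %>% pull(cabin) %>% n_distinct(na.rm = TRUE)      # NA is ignored

# Distinct counts for every column at once
toy %>% summarise(across(everything(), n_distinct))
```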

value_counts() --> group_by followed by count

df %>% group_by(embarked) %>% count()
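
As a shorter variant, count also accepts the grouping column directly, and `sort = TRUE` orders the result by frequency, which is closer to what value_counts prints. A minimal sketch on made-up data (`toy` and `counts` are hypothetical names):

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(embarked = c("S", "C", "S", "Q", "S", "C"))

# Count rows per value, most frequent first
counts <- toy %>% count(embarked, sort = TRUE)
counts
```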

[] --> []

Accessing values by index is not recommended.

df[3,'age'] <- 40

The following also works, but

df[df$sex == 'male', 'sex'] <- "0"
df[df$sex == 'female', 'sex'] <- "1"

I would rather use case_when, which handles both replacements in one call and may also be faster.

df %>% mutate(sex = case_when(
    sex == 'male' ~ '0',
    sex == 'female' ~ '1'
))
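
Two details are easy to miss with case_when: mutate returns a new tibble, so the result has to be assigned back to take effect, and values that match no condition become NA unless a fallback is given. The sketch below uses made-up data, and the label "unknown" is a hypothetical choice.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(sex = c("male", "female", NA))

# Assign the result back; mutate does not modify toy in place
toy <- toy %>% mutate(sex = case_when(
    sex == "male" ~ "0",
    sex == "female" ~ "1",
    TRUE ~ "unknown"  # fallback for everything else, including NA
))

toy$sex
```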

mutate makes this more intuitive to write.

df[,'fare'] <- df$fare + 100
df %>% mutate(fare = fare + 100)

round --> round

df[, 'fare'] <- df$fare %>% round()
df %>% mutate(fare = fare %>% round())

New column

df[,'test'] <- 1
df %>% mutate(test = 1)

str.cat --> str_c

df[ ,'test'] <- df$cabin %>% str_c(df$embarked, sep='_')
df %>% mutate(test = cabin %>% str_c(embarked, sep='_'))
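
One thing to watch for: str_c returns NA whenever any of its inputs is NA, so rows with a missing cabin end up with a missing test value. If that is not wanted, stringr's `str_replace_na` can substitute a placeholder first. A sketch on made-up data, where the placeholder "unknown" and the column name `test_filled` are hypothetical:

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(cabin = c("C85", NA), embarked = c("S", "Q"))

toy <- toy %>% mutate(
    # NA in cabin propagates to the result
    test = str_c(cabin, embarked, sep = "_"),
    # Replace NA with a placeholder before concatenating
    test_filled = str_c(str_replace_na(cabin, "unknown"), embarked, sep = "_")
)

toy
```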

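Dropping columns --> use select with a minus sign

I don't know of a function for dropping rows. Rather than dropping rows, it is better to use filter to extract the rows you want.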

df %>% select(-body)
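
For rows, a few options that might serve, suggested here rather than taken from the original article: slice with negative positions, filter with a condition, and tidyr's `drop_na` for rows with missing values. The tibble `toy` and the result names are made up for illustration.

```r
library(tidyverse)

# Toy data (hypothetical; not from titanic3.csv)
toy <- tibble(id = 1:5, age = c(22, NA, 35, 41, 8))

# Drop rows by position
without_first_two <- toy %>% slice(-(1:2))

# Keep rows by condition (rows where the condition is NA are dropped too)
adults <- toy %>% filter(age >= 18)

# Drop rows where age is missing
no_missing_age <- toy %>% drop_na(age)
```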