
"Machine Learning with R" in mlr3: Chapter 3, "Lazy Learning -- Classification Using Nearest Neighbors"

Posted at 2021-07-09

Introduction

Lantz (2017) [1] uses the RWeka package for machine learning in R. I recently started learning the mlr3 package, so I decided to write the code with mlr3 instead.

Setup

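# install.packages() takes the package names as one character vector.
# kknn is the backend package that the classif.kknn learner used below relies on.
install.packages(c("mlr3verse", "tidyverse", "gmodels", "kknn"))
library(mlr3verse)
library(tidyverse)
library(gmodels)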

3-2 Example -- Diagnosing breast cancer with k-nearest neighbors

First, prepare the data.
The use of outer was corrected following a comment from StrawBerryMoon.

# The data file has no header row, so column names are assigned afterwards
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
# Column types: integer id, factor diagnosis, then 30 double-valued features
col_types <- paste("if", paste(rep("d", 30), collapse = ""), sep = "")
wbcd <- read_csv(url,
                 col_names = FALSE,
                 col_types = col_types)
specs <- c("radius", "texture",
           "perimeter", "area",
           "smoothness", "compactness",
           "concavity", "concave_points",
           "symmetry", "fractal_dimension")
postfix <- c("mean", "SE", "worst")
# c(outer(...)) flattens column by column: all *_mean names, then *_SE, then *_worst
colnames(wbcd) <- c("id", "diagnosis", c(outer(specs, postfix, paste, sep = "_")))
colnames(wbcd)
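
To see why `c(outer(specs, postfix, paste, sep = "_"))` produces the order the data file needs (ten means, then ten standard errors, then ten worst values), here is a minimal sketch. The shortened vectors `specs_demo` and `postfix_demo` are hypothetical stand-ins used only for illustration.

```r
# Shortened stand-ins for specs and postfix (hypothetical, for illustration only)
specs_demo <- c("radius", "texture")
postfix_demo <- c("mean", "SE", "worst")

# outer() builds a 2 x 3 character matrix:
# rows follow specs_demo, columns follow postfix_demo
name_matrix <- outer(specs_demo, postfix_demo, paste, sep = "_")

# c() flattens the matrix column by column,
# so every "_mean" name comes before any "_SE" name
demo_names <- c(name_matrix)
print(demo_names)
```

Swapping the two arguments of `outer` would instead group the names by feature (radius_mean, radius_SE, radius_worst, ...), which would not match the column layout of the file.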

Remove the id column.

wbcd <- wbcd[-1]

Check diagnosis and relabel its levels.

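table(wbcd$diagnosis)
# Inside mutate(), refer to the column directly rather than through wbcd$
wbcd <- wbcd %>%
  mutate(
    diagnosis = factor(diagnosis,
                       levels = c("B", "M"),
                       labels = c("Benign", "Malignant")
                       )
    )
# Class proportions in percent
round(prop.table(table(wbcd$diagnosis)) * 100, digits = 1)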
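
As a small illustration of what the `levels` and `labels` arguments do in the relabeling step, the sketch below applies the same `factor()` call to a toy vector. The vector `codes` is hypothetical and not taken from the data set.

```r
# Toy codes standing in for the raw diagnosis column (hypothetical values)
codes <- c("M", "B", "B", "M", "B")

# levels fixes the order of the categories; labels renames them position by position
diagnosis_demo <- factor(codes,
                         levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))

print(levels(diagnosis_demo))
# Share of each class in percent
print(round(prop.table(table(diagnosis_demo)) * 100, digits = 1))
```

Because `levels` is given explicitly, "Benign" is the first level regardless of which code appears first in the data.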

Inspect a few of the features.

summary(wbcd[c("radius_mean", "area_mean", "smoothness_mean")])

Apply min-max normalization to the numeric features.

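# Rescale a numeric vector to the range [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# mutate() keeps one row per observation; summarise() is intended for one-row summaries
wbcd_n <- wbcd %>%
  select(-diagnosis) %>%
  mutate(across(everything(), normalize))
summary(wbcd_n$area_mean)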
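
A quick worked example may make the min-max formula concrete. The sketch below repeats the `normalize` function so that it runs on its own, and applies it to a small hypothetical vector `small`.

```r
# Same min-max formula as above, repeated so this snippet is self-contained
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical values: min is 10, max is 50, so the range is 40
small <- c(10, 20, 30, 50)
small_n <- normalize(small)
print(small_n)  # 0.00 0.25 0.50 1.00
```

The smallest value always maps to 0 and the largest to 1. Note that a constant column would give NaN, because the range in the denominator is zero.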

Classify with k-nearest neighbors.

# Put the target column back next to the normalized features
wbcd_backend_n <- bind_cols(wbcd_n, diagnosis = wbcd$diagnosis)
task <- TaskClassif$new("wbcd_task_n",
                        backend = wbcd_backend_n,
                        target = "diagnosis")
learner <- lrn("classif.kknn")
learner$param_set$values <- list(k = 21)
# Rows 1-469 are used for training, the remaining 100 rows for testing
learner$train(task, row_ids = 1:469)
wbcd_pred_n <- learner$predict(task, row_ids = 470:569)

Create a cross-tabulation of the predictions. The result differs from the one in the book, but I will not worry about that here.

CrossTable(x = wbcd_backend_n$diagnosis[470:569],
           y = wbcd_pred_n$response, 
           prop.chisq = FALSE)
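
The quantities usually read off such a cross-tabulation are the overall accuracy and the number of malignant cases predicted as benign. As a minimal sketch in base R, the snippet below computes both from two toy vectors, `truth` and `response`, which are hypothetical and not the actual predictions.

```r
# Toy true labels and predicted labels (hypothetical, not the real results)
lv <- c("Benign", "Malignant")
truth    <- factor(c("Benign", "Benign", "Malignant", "Malignant", "Benign", "Malignant"), levels = lv)
response <- factor(c("Benign", "Benign", "Malignant", "Benign",    "Benign", "Malignant"), levels = lv)

# Confusion matrix: rows are true classes, columns are predicted classes
confusion <- table(truth = truth, response = response)
print(confusion)

# Share of correctly classified cases (5 of 6 here)
accuracy <- mean(truth == response)
print(accuracy)

# Malignant cases predicted as benign (false negatives)
false_negatives <- confusion["Malignant", "Benign"]
print(false_negatives)
```

In mlr3 itself, the prediction object's `$confusion` field and `$score(msr("classif.acc"))` should give the same kind of information without a separate package.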

Next, standardize the features with z-scores.

wbcd_z <- as_tibble(scale(wbcd[-1]))
summary(wbcd_z$area_mean)
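
For reference, `scale()` with its default arguments subtracts each column's mean and divides by its standard deviation. The sketch below checks this against the manual formula on a small hypothetical vector `x`.

```r
# Hypothetical values for illustration
x <- c(2, 4, 4, 6, 9)

# z-score by hand: centre on the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_scale <- as.vector(scale(x))

print(all.equal(z_manual, z_scale))
print(c(mean = mean(z_scale), sd = sd(z_scale)))
```

Unlike min-max normalization, z-scores have no fixed minimum or maximum, so extreme values remain visibly extreme after the transformation.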

Classify with k-nearest neighbors in the same way as before.

wbcd_backend_z <- bind_cols(wbcd_z, diagnosis = wbcd$diagnosis)
task <- TaskClassif$new("wbcd_task_z",
                        backend = wbcd_backend_z,
                        target = "diagnosis")
learner$train(task, row_ids = 1:469)
wbcd_pred_z <- learner$predict(task, row_ids = 470:569)

Create a cross-tabulation of the predictions. Unlike in the book, switching to z-scores made no difference to the result here.

CrossTable(x = wbcd_backend_z$diagnosis[470:569],
           y = wbcd_pred_z$response,
           prop.chisq = FALSE)
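
One plausible reason why the two preprocessing methods give the same result, offered as an assumption rather than something verified in this post: the kknn package documents a `scale` argument, TRUE by default, that rescales every feature to equal standard deviation before computing distances. Min-max normalization and z-scores are both shifts plus rescalings of the same column, so after such a step the pairwise distances coincide. The base R sketch below illustrates this on a hypothetical data frame `toy`, with `rescale_sd` as a hypothetical helper mimicking that internal step.

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical two-feature data on very different scales
toy <- data.frame(a = c(1, 4, 9, 16, 25),
                  b = c(0.2, 0.1, 0.5, 0.3, 0.4))

toy_n <- as.data.frame(lapply(toy, normalize))  # min-max normalized
toy_z <- as.data.frame(scale(toy))              # z-score standardized

# Hypothetical helper: divide each column by its standard deviation,
# mimicking an "equal sd" rescaling applied before the distance computation
rescale_sd <- function(df) as.data.frame(lapply(df, function(col) col / sd(col)))

# Euclidean distances between all pairs of rows after the rescaling
d_n <- as.vector(dist(rescale_sd(toy_n)))
d_z <- as.vector(dist(rescale_sd(toy_z)))

# Distances agree, so the nearest neighbours would be the same
print(all.equal(d_n, d_z))

# Without the rescaling, the two preprocessings give different distances
print(isTRUE(all.equal(as.vector(dist(toy_n)), as.vector(dist(toy_z)))))
```

If this is indeed the cause, setting the learner's `scale` parameter to FALSE should make the choice of preprocessing matter again. The difference from the book's numbers may also come from kknn weighting neighbours with a kernel by default, whereas the book's k-NN uses unweighted voting. Both points are worth checking against the kknn documentation.
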

  1. Lantz, Brett (2017). 『Rによる機械学習』 (Machine Learning with R), translated by 長尾高弘. 翔泳社 (Shoeisha).
