
caret: Model Training and Tuning - Models By Similarity

Posted at 2016-04-17

Translator's note:
This post is a translation of http://topepo.github.io/caret/similarity.html (originally published in Japanese).

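This page shows every model available through caret's train function as a network graph. For how the graph was built (and how the networkD3 package was used), see the Revolutions blog. In short, each model is annotated with a set of tags (for example "Bagging" or "L1 Regularization"), and that information is used to cluster models that are similar to one another.
Green circles are regression models, blue circles are classification-only models, and orange circles are models that handle both regression and classification. Hovering over a model's circle shows the model name and its caret model code. To see the full name, move the node to the left (translator's note: the meaning of this sentence is unclear). Thirteen models are not shown in this graph.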
(Figure omitted)
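As a rough illustration of clustering models by their tags, here is a minimal base R sketch. The toy matrix, the model names (modelA to modelD), and the tag assignments are all hypothetical. Using the Jaccard distance with average-linkage hierarchical clustering is also an assumption made for illustration, not necessarily how the original graph was built.

```r
## Hypothetical toy tag matrix: rows are models, columns are 0/1 tags.
## Model names and tag assignments are invented for illustration only.
toy <- matrix(c(1, 0, 1, 0,
                1, 0, 1, 1,
                0, 1, 0, 0,
                0, 1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("modelA", "modelB", "modelC", "modelD"),
                              c("Bagging", "L1 Regularization",
                                "Tree-Based", "Linear")))

## For 0/1 data, method = "binary" is the Jaccard distance:
## the share of tags held by only one model among tags held by either.
d <- dist(toy, method = "binary")

## Group models whose tag sets are similar (assumed clustering method)
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

print(round(as.matrix(d), 2))
print(groups)
```

In this toy data, modelA and modelB share two of their three combined tags, so they land in the same group, while modelC and modelD form the other.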

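The data used to create this graph is available here (it is read below as tag_data.csv). To find a diverse set of models, we use maximum dissimilarity sampling. Suppose we want to fit an SVM model with a radial basis function kernel to regression data. Based on the tags, which four additional models would make up a diverse set?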

library(caret)  ## provides maxDissim()

tag <- read.csv("tag_data.csv", row.names = 1)
tag <- as.matrix(tag)

## Select only models for regression
regModels <- tag[tag[,"Regression"] == 1,]

all <- 1:nrow(regModels)
## Seed the analysis with the SVM model
start <- grep("(svmRadial)", rownames(regModels), fixed = TRUE)
pool <- all[all != start]

## Select 4 more models by maximizing the Jaccard
## dissimilarity between sets of models
nextMods <- maxDissim(regModels[start,,drop = FALSE],
                      regModels[pool, ],
                      method = "Jaccard",
                      n = 4)

rownames(regModels)[c(start, nextMods)]

[1] "Support Vector Machines with Radial Basis Function Kernel (svmRadial)"
[2] "Cubist (cubist)"                                                      
[3] "Bayesian Regularized Neural Networks (brnn)"                          
[4] "Generalized Linear Model with Stepwise Feature Selection (glmStepAIC)"
[5] "Ridge Regression with Variable Selection (foba)"
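To make the selection step more concrete, the following is a minimal base R sketch of greedy maximum dissimilarity sampling under the Jaccard measure. The helpers jaccard_dissim and greedy_max_dissim and the toy tag matrix are hypothetical. The scoring rule (distance to the nearest already-selected model) is an assumption chosen for illustration, and caret's maxDissim may differ in details such as tie handling.

```r
## Jaccard dissimilarity between two 0/1 tag vectors (hypothetical helper)
jaccard_dissim <- function(x, y) {
  union_n <- sum(x | y)
  if (union_n == 0) return(0)   # two empty tag sets: treat as identical
  1 - sum(x & y) / union_n
}

## Greedy maximum dissimilarity selection (hypothetical helper).
## At each step, add the candidate whose nearest selected model is
## farthest away; ties go to the first candidate in the pool.
greedy_max_dissim <- function(tags, start, n) {
  selected <- start
  pool <- setdiff(seq_len(nrow(tags)), start)
  for (i in seq_len(n)) {
    score <- vapply(pool, function(p) {
      min(vapply(selected,
                 function(s) jaccard_dissim(tags[p, ], tags[s, ]),
                 numeric(1)))
    }, numeric(1))
    best <- pool[which.max(score)]
    selected <- c(selected, best)
    pool <- setdiff(pool, best)
  }
  selected
}

## Invented toy data: rows are models, columns are 0/1 tags
toy <- matrix(c(1, 0, 1, 0,
                1, 0, 1, 1,
                0, 1, 0, 0,
                0, 1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("modelA", "modelB", "modelC", "modelD"),
                              NULL))

picked <- greedy_max_dissim(toy, start = 1L, n = 2)
print(rownames(toy)[picked])
```

Seeding with modelA, the sketch skips modelB (which shares most of its tags with the seed) and picks models with no tags in common first, which mirrors the intent of the maxDissim call above.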
