
21 Must-Know Q&As for Data Science Interviews (Japanese Translation, Part 1: "What Is Regularization?")

Last updated at 2016-04-30, posted at 2016-03-25

Originally published in KDnuggets: 21 Must-Know Data Science Interview Questions and Answers
KDnuggets has officially given me permission to publish a Japanese translation.
Gregory-san, thank you for the approval.

I found an interesting article on KDnuggets, so I am translating it here, partly as a note to myself.

21 Must-Know Data Science Interview Questions and Answers

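As the title says, the article presents 21 questions you should be able to answer in a data science interview, together with their answers. A few of them, such as which data scientists you admire, have no clear right answer, but on the whole it is a good article that collects the questions and answers you need to have down.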

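The original article assumes a fair amount of background knowledge, so I will translate it one question at a time while filling in that background.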

1. Explain what regularization is and why it is useful.

The question asks us to explain what regularization is and why it is useful.

Original text

Regularization is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting. (see also KDnuggets posts on Overfitting)

Regularization is the process of adding a tuning parameter to a model in order to make the model smoother and thereby prevent overfitting.

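For overfitting, the original points to separate posts. A model that is too simple does not predict accurately, so we bring in many variables and make the model more complex to improve accuracy. But when a model is made too complex, it achieves high accuracy on the training data yet cannot cope with data outside the training set. This is called overfitting. A moderately complex model is therefore best, and preventing overfitting means building such a moderately complex model.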
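To make the idea concrete, here is a minimal sketch of overfitting in pure Python. Everything in it is my own assumption for illustration and not from the original article: the true relationship `y = 2x`, the noise level, the polynomial degrees 1 and 7, and the helper names `fit_polynomial` and `mse`. The more complex model can never have a larger training error than the simpler one, but its error on held-out data is often worse, depending on the noise that was drawn.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) w = A^T y."""
    n_coef = degree + 1
    # Normal equations built from design-matrix rows [1, x, x^2, ...].
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n_coef)]
           for i in range(n_coef)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n_coef)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n_coef):
            factor = m[r][col] / m[col][col]
            for c in range(col, n_coef + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coefs = [0.0] * n_coef
    for i in reversed(range(n_coef)):
        tail = sum(m[i][j] * coefs[j] for j in range(i + 1, n_coef))
        coefs[i] = (m[i][n_coef] - tail) / m[i][i]
    return coefs

def mse(coefs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    preds = [sum(c * x ** k for k, c in enumerate(coefs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical data: a straight line plus Gaussian noise.
x_train = [-1.0 + 2.0 * i / 9 for i in range(10)]
y_train = [2.0 * x + random.gauss(0.0, 0.3) for x in x_train]
x_test = [-0.95 + 1.9 * i / 19 for i in range(20)]
y_test = [2.0 * x + random.gauss(0.0, 0.3) for x in x_test]

results = {}
for degree in (1, 7):
    coefs = fit_polynomial(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print("degree", degree, "train MSE", results[degree][0],
          "test MSE", results[degree][1])
```

Comparing the two printed rows shows the gap between training and test error that regularization is meant to keep under control.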

Original text

This is most often done by adding a constant multiple to an existing weight vector. This constant is often either the L1 (Lasso) or L2 (ridge), but can in actuality can be any norm. The model predictions should then minimize the mean of the loss function calculated on the regularized training set.

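When regularizing, this is usually done by adding a constant multiple of (a norm of) the existing weight vector to the loss. L1 (Lasso) and L2 (ridge) are the most common choices, but in theory any norm can be used. The model is then built by minimizing the mean of the loss function computed on the regularized training set.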

Let us look at the difference between Ridge regression and Lasso regression, starting from the formula for linear regression.

\begin{eqnarray}
\min_{(\beta_0,\beta)\in \mathbb{R}^{p+1}}R_\lambda(\beta_0,\beta) & = & \min_{(\beta_0,\beta)\in \mathbb{R}^{p+1}} \Big[\frac{1}{2N}
{\displaystyle \sum_{i=1}^{N}} (y_i-\beta_0-x_i^T\beta)^2+\lambda P_\alpha (\beta)\Big],\\
P_\alpha (\beta) & = & (1-\alpha)\frac{1}{2}||\beta||^2_{\ell_2}+\alpha||\beta||_{\ell_1}\\
& = & {\displaystyle \sum_{j=1}^{p}} \Big[\frac{1}{2}(1-\alpha)\beta^2_j+\alpha|\beta_j|\Big]
\end{eqnarray}

Here $\lambda P_\alpha (\beta)$ is the regularization (penalty) term, and $\lambda$ and $\alpha$ are its tuning parameters.
Changing $\alpha$ gives Ridge regression, Lasso regression, or Elastic Net.

Ridge regression: $\alpha=0$
Elastic Net: $0<\alpha<1$
Lasso regression: $\alpha=1$
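As a quick check of the three cases, the penalty from the formula above can be written as a small function. This is only a sketch: the function name `elastic_net_penalty` and the example vector are my own and do not come from the original article.

```python
def elastic_net_penalty(beta, alpha):
    """Sum over j of (1 - alpha)/2 * beta_j^2 + alpha * |beta_j|."""
    return sum(0.5 * (1.0 - alpha) * b * b + alpha * abs(b) for b in beta)

beta = [3.0, -4.0, 0.0]  # hypothetical coefficient vector

ridge_penalty = elastic_net_penalty(beta, alpha=0.0)  # half the squared L2 norm
lasso_penalty = elastic_net_penalty(beta, alpha=1.0)  # the L1 norm
mixed_penalty = elastic_net_penalty(beta, alpha=0.5)  # an even mix of the two

print(ridge_penalty, lasso_penalty, mixed_penalty)  # 12.5 7.0 9.75
```

With $\alpha=0$ only the squared term survives, with $\alpha=1$ only the absolute-value term, and anything in between blends the two.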

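$\lambda$ is called the complexity parameter. At $\lambda=0$ the problem reduces to ordinary least squares, and as $\lambda$ goes to infinity the coefficients of all variables are shrunk toward zero.
To regularize, you pick one of Ridge regression, Lasso regression, or Elastic Net and then tune the complexity parameter.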
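The effect of $\lambda$ is easy to see in the simplest possible case. The sketch below assumes a single feature, no intercept (the data are centered), and the Ridge case $\alpha=0$; under those assumptions, setting the derivative of the objective above to zero gives $\beta = \sum_i x_i y_i / (\sum_i x_i^2 + N\lambda)$. The data and the function name `ridge_coef_1d` are hypothetical.

```python
def ridge_coef_1d(xs, ys, lam):
    """Ridge coefficient for one centered feature and no intercept.

    Minimizes 1/(2N) * sum (y_i - x_i * b)^2 + lam/2 * b^2,
    whose solution is b = sum(x*y) / (sum(x^2) + N*lam).
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

# Hypothetical centered data with a slope of roughly 2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.0, 2.1, 3.9]

lambdas = [0.0, 0.1, 1.0, 10.0, 1000.0]
coefs = [ridge_coef_1d(xs, ys, lam) for lam in lambdas]
for lam, b in zip(lambdas, coefs):
    print("lambda =", lam, "coefficient =", b)
```

At `lam = 0.0` the result is the ordinary least-squares slope, and the coefficient decreases steadily toward zero as `lam` grows.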

Original text

Xavier Amatriain presents a good comparison of L1 and L2 regularization here, for those interested.

The article says that Xavier Amatriain's comparison of L1 and L2 is a good one, so I will introduce it here as well.

Original text

It is well known, as explained by others, that L1 regularization helps perform feature selection in sparse feature spaces, and that is a good practical reason to use L1 in some situations. However, beyond that particular reason I have never seen L1 to perform better than L2 in practice. And, to be clear, I don't think I am the only one to be in this situation. If you take a look at LIBLINEAR FAQ on this issue you will see how they have not seen a practical example where L1 beats L2 and encourage users of the library to contact them if they find one. Even in a situation where you might benefit from L1's sparsity in order to do feature selection, using L2 on the remaining variables is likely to give better results than L1 by itself.

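While acknowledging that Lasso regression (L1) helps with feature selection in sparse feature spaces, he comments that, beyond that, he has never seen L1 outperform L2 (Ridge regression) in practice. He adds that even where L1's sparsity is useful for feature selection, applying L2 to the remaining variables is likely to give better results than L1 alone.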
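Why L1 performs feature selection while L2 does not can be seen in a simplified setting. The sketch below assumes standardized, mutually orthogonal features, in which case each coefficient can be solved separately from its least-squares value `z`: the L1 part subtracts a fixed amount and clips at zero (soft-thresholding), while the L2 part only rescales. The function names and the example coefficients are hypothetical, and real data rarely satisfy the orthogonality assumption.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def shrink(z, lam, alpha):
    """Minimizer of 0.5*(b - z)**2 + lam*((1 - alpha)/2*b**2 + alpha*abs(b))."""
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))

def count_zeros(coefs):
    """Number of coefficients that were set exactly to zero."""
    return sum(1 for b in coefs if b == 0.0)

# Hypothetical least-squares coefficients: two strong, three weak.
ols = [3.0, -1.5, 0.4, -0.2, 0.05]
lam = 0.5

ridge = [shrink(z, lam, 0.0) for z in ols]  # alpha = 0
lasso = [shrink(z, lam, 1.0) for z in ols]  # alpha = 1
enet = [shrink(z, lam, 0.5) for z in ols]   # alpha = 0.5

print("ridge zeros:", count_zeros(ridge))        # 0
print("lasso zeros:", count_zeros(lasso))        # 3
print("elastic net zeros:", count_zeros(enet))   # 2
```

Ridge keeps every feature with a smaller weight, Lasso drops the weak ones outright, and Elastic Net sits between the two.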

Original text

I would say that as a rule-of-thumb, you should always go for L2 in practice. Even in the case when you have a strong reason to use L1 given the number of features, I would recommend going for Elastic Nets instead.

He writes that you should basically use Ridge regression, and use Elastic Net in cases where you really do need Lasso regression.

Summary

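Regularization helps prevent overfitting, and its methods include Lasso regression and Ridge regression.
Personally, I found Xavier Amatriain's rule of thumb, that Ridge regression generally gives the better accuracy in practice, to be the most useful tip.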
