
Attention-related: comparing additive attention and dot-product (multiplicative) attention.

Posted at 2021-04-11 (last updated at 2021-04-11)

Overview

I couldn't figure out how additive attention and dot-product (multiplicative) attention are supposed to be compared, so I'm writing it up as an article.

I'll use the following paper, Attention Is All You Need, as my reference.

Attention Is All You Need

Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017).

Comparison

Quote from the paper

The two most commonly used attention functions are additive attention [2], and dot-product (multiplicative) attention.

First, the paper names the following two as the most commonly used attention functions:

additive attention
dot-product (multiplicative) attention
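For reference, the two score functions can be written roughly as follows. This is my own notation, not the paper's: q is a query, k_i and v_i are the i-th key and value, and W_q, W_k, w and the hidden size d_a of the additive scoring network are symbols I'm introducing here. The Transformer paper additionally divides the dot product by √d_k, which I leave out here, as in the rest of this article.

```latex
\begin{aligned}
\text{additive:}\quad & e_i = w^\top \tanh\left(W_q\, q + W_k\, k_i\right),
  \qquad W_q, W_k \in \mathbb{R}^{d_a \times d_k},\ w \in \mathbb{R}^{d_a} \\
\text{dot-product:}\quad & e_i = q^\top k_i \\
\text{both:}\quad & \alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)},
  \qquad \text{output} = \sum_i \alpha_i\, v_i
\end{aligned}
```

Only the score e_i differs. The softmax and the weighted sum of the values are the same in both.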

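Next, on how the two compare, the paper says:

While the two are similar in theoretical complexity, dot-product attention is much faster and more space-efficient in practice, since it can be implemented using highly optimized matrix multiplication code.

In other words, compared with additive attention, dot-product (multiplicative) attention is:

similar in theoretical complexity
implementable with highly optimized matrix multiplication code (hence ↓)
faster in practice
more space-efficient

That is how the paper organizes the comparison.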
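To make the "matrix multiplication" point concrete, here is a minimal pure-Python sketch. The toy numbers and the function names (dot_scores, additive_scores, attend) are my own; this is not code from the paper. The dot-product scores for all query/key pairs are just the entries of one product QK^T. Additive attention, by contrast, builds a d_a-dimensional tanh hidden vector for every (query, key) pair before reducing it to a score. My reading, which is an assumption rather than something the paper spells out, is that this per-pair intermediate is where the extra memory goes: in a batched implementation it is an n × m × d_a tensor.

```python
import math

def dot_scores(Q, K):
    """Dot-product scores: entry (i, j) is q_i . k_j, i.e. the matrix product Q K^T."""
    return [[sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def additive_scores(Q, K, W_q, W_k, w):
    """Additive scores w^T tanh(W_q q_i + W_k k_j).

    Also counts the hidden activations produced: one d_a-dim vector per
    (query, key) pair, i.e. an n x m x d_a tensor if done in one batch.
    """
    PQ = [matvec(W_q, q) for q in Q]  # project each query once
    PK = [matvec(W_k, k) for k in K]  # project each key once
    scores, hidden_count = [], 0
    for pq in PQ:
        row = []
        for pk in PK:
            h = [math.tanh(a + b) for a, b in zip(pq, pk)]  # per-pair hidden vector
            hidden_count += len(h)
            row.append(sum(w_a * h_a for w_a, h_a in zip(w, h)))
        scores.append(row)
    return scores, hidden_count

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, V):
    """Shared final step: softmax over the keys, then a weighted sum of the values."""
    out = []
    for row in scores:
        alpha = softmax(row)
        out.append([sum(a * v[d] for a, v in zip(alpha, V)) for d in range(len(V[0]))])
    return out

# Toy example (made-up values): n=2 queries, m=3 keys/values, d_k=2, d_a=4
Q = [[1.0, 2.0], [3.0, 4.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_q = [[0.1, 0.2], [0.0, 0.1], [0.3, -0.1], [0.2, 0.2]]
W_k = [[0.2, 0.0], [0.1, 0.1], [-0.2, 0.3], [0.0, 0.1]]
w = [1.0, -1.0, 0.5, 0.5]

S_dot = dot_scores(Q, K)
S_add, hidden = additive_scores(Q, K, W_q, W_k, w)
print(S_dot)   # [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
print(hidden)  # 24 (= 2 * 3 * 4), versus only 6 scores for dot-product
print(attend(S_dot, V))
print(attend(S_add, V))
```

In a real implementation, dot_scores would be a single call into an optimized matrix-multiplication routine. The nested loops here are only there to make the per-pair work of the additive version visible.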
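For now, I'll skip the point that follows it ("While for small values of d_k the two mechanisms perform similarly, ...").
⇒ What I want to focus on instead is the flip side: the paper mentions no differences other than the ones above.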

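What's a little hard to follow, from the standpoint of someone who isn't an expert (like me), is this:

Questions such as "How are the parameters of the feed-forward network in additive attention determined?" and "When are they trained?" come to mind, but
presumably that isn't the point here,
since the paper says nothing of the kind...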
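As far as I understand it, the usual answer to that question is unremarkable. In the additive attention of Bahdanau et al. (reference [2] in the quote above), the scoring network's weights are ordinary model parameters, trained jointly with the rest of the translation model by backpropagation, with no separate training phase. The sketch below illustrates only that idea, under heavy simplifications: a toy squared-error loss stands in for the real training objective, and central finite differences stand in for backpropagation. All names and numbers are hypothetical.

```python
import math

D_K, D_A = 2, 3  # query/key size and hidden size of the scoring network (toy values)

def unpack(theta):
    """Split a flat parameter list into W_q (D_A x D_K), W_k (D_A x D_K) and w (D_A)."""
    n = D_A * D_K
    W_q = [theta[i * D_K:(i + 1) * D_K] for i in range(D_A)]
    W_k = [theta[n + i * D_K:n + (i + 1) * D_K] for i in range(D_A)]
    w = theta[2 * n:2 * n + D_A]
    return W_q, W_k, w

def additive_attention(theta, q, keys, values):
    """Scalar-valued additive attention: softmax of w^T tanh(W_q q + W_k k_j), then a weighted sum."""
    W_q, W_k, w = unpack(theta)
    pq = [sum(a * b for a, b in zip(row, q)) for row in W_q]
    scores = []
    for k in keys:
        pk = [sum(a * b for a, b in zip(row, k)) for row in W_k]
        scores.append(sum(w_a * math.tanh(a + b) for w_a, a, b in zip(w, pq, pk)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    return sum(e / z * v for e, v in zip(exps, values))

def numerical_grad(f, theta, eps=1e-6):
    """Central-difference gradient; a stand-in for backpropagation."""
    grad = []
    for i in range(len(theta)):
        plus = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        minus = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

# Hypothetical toy data: the "right" output is 1.0, i.e. attend to the first key.
q = [1.0, -0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [1.0, 0.0, 0.5]
target = 1.0

def loss(theta):
    # Toy squared error; a real model would use its task loss (e.g. translation cross-entropy).
    return (additive_attention(theta, q, keys, values) - target) ** 2

theta = [0.1 * ((i % 5) - 2) for i in range(2 * D_A * D_K + D_A)]  # arbitrary initial weights
before = loss(theta)
for _ in range(50):
    grad = numerical_grad(loss, theta)
    theta = [t - 0.5 * g for t, g in zip(theta, grad)]  # plain gradient descent, learning rate 0.5
after = loss(theta)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The point is only that W_q, W_k and w are updated by the same gradient signal as every other weight. Nothing about them is fixed in advance.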

A paper that may help

The following paper,

Effective Approaches to Attention-based Neural Machine Translation

Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).

lists this family of attention functions side by side as three score functions: dot, general, and concat.
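For reference, here are the three score functions from Luong et al. as I understand them, written in pure Python. They are dot h_t^T h̄_s, general h_t^T W_a h̄_s, and concat v_a^T tanh(W_a [h_t; h̄_s]), where h_t is the current target hidden state and h̄_s a source hidden state. The function names, score_additive, and the toy values are my own. The checks at the end show two relationships that follow directly from the formulas. First, general with W_a = I reduces to dot. Second, concat with W_a split column-wise into [W_1 | W_2] is exactly the additive form v_a^T tanh(W_1 h_t + W_2 h̄_s).

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    """Matrix (a list of rows) times vector."""
    return [dot(row, x) for row in W]

def score_dot(h_t, h_s):
    return dot(h_t, h_s)

def score_general(h_t, h_s, W_a):
    return dot(h_t, matvec(W_a, h_s))

def score_concat(h_t, h_s, W_a, v_a):
    # h_t + h_s on Python lists is concatenation, i.e. [h_t; h_s]
    return dot(v_a, [math.tanh(z) for z in matvec(W_a, h_t + h_s)])

def score_additive(h_t, h_s, W_1, W_2, v_a):
    # Additive (Bahdanau-style) form: v_a^T tanh(W_1 h_t + W_2 h_s)
    return dot(v_a, [math.tanh(a + b) for a, b in zip(matvec(W_1, h_t), matvec(W_2, h_s))])

# Toy values (hypothetical)
h_t = [1.0, 2.0]
h_s = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_a = [[0.1, 0.2, 0.3, 0.4],
       [-0.2, 0.1, 0.0, 0.5]]
v_a = [1.0, -0.5]

# Split W_a column-wise: the first two columns act on h_t, the last two on h_s
W_1 = [row[:2] for row in W_a]
W_2 = [row[2:] for row in W_a]

print(score_dot(h_t, h_s))                # -1.5
print(score_general(h_t, h_s, identity))  # -1.5
print(math.isclose(score_concat(h_t, h_s, W_a, v_a),
                   score_additive(h_t, h_s, W_1, W_2, v_a)))  # True
```

Seen this way, "dot" is the dot-product attention of the Transformer paper without the √d_k scaling, and "concat" is essentially additive attention. "general" sits in between.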

⇒ With so much freedom in what gets compared with what, and from which perspective, it struck me as a difficult field.

Summary

Nothing in particular.
I wrote this article hoping, at most, to share a vague sense that something feels off.
