N tips for not misreading the paper "Attention is all you need" (so far, N=2).

Last updated at 2021-03-27
Posted at 2021-03-16

Introduction

Paper: "Attention is all you need"
https://arxiv.org/abs/1706.03762

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.

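This paper is hard for me to read, since I am not used to the material.
I have already written a scattering of articles about it, but I am still working on understanding it.

This article points out things you need to be aware of in order to read the paper correctly, and presents them as tips.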

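[Tip 1] The figure draws the inputs in the order Value ⇒ Key ⇒ Query

All figures are quoted from the paper.

The Multi-Head Attention figure below shows this order explicitly.

The text of the paper explains the same thing, as quoted below.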

3.2.3 Applications of Attention in our Model
The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [38, 2, 9].

The key point, restated:

In the "encoder-decoder attention" layers, the queries are taken from the previous decoder layer, while the memory keys and values are taken from the output of the encoder.

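To make the role of each input concrete, here is a minimal sketch of the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, written in plain Python. It is an illustration under stated assumptions, not the paper's implementation: it uses a single head, no masking, and no learned projection matrices, and the names `softmax`, `scaled_dot_product_attention`, `encoder_output`, and `decoder_state` as well as the toy numbers are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d_k)) V on nested lists.

    The arguments are ordered (query, key, value), following the formula.
    The Multi-Head Attention figure lists the same inputs as V, K, Q,
    so keyword arguments are used below to avoid mixing them up.
    """
    d_k = len(key[0])
    d_v = len(value[0])
    output = []
    for q in query:  # one output row per query position
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in key
        ]
        weights = softmax(scores)  # one weight per key/value position
        row = [sum(w * v[j] for w, v in zip(weights, value)) for j in range(d_v)]
        output.append(row)
    return output


# Hypothetical toy data: 3 encoder positions, 2 decoder positions, d_k = d_v = 2.
encoder_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [[1.0, 0.0], [0.0, 1.0]]

# Encoder-decoder attention: queries come from the decoder,
# keys and values come from the encoder output.
context = scaled_dot_product_attention(
    query=decoder_state, key=encoder_output, value=encoder_output
)
```

The result has one row per decoder position, and each row is a weighted average of the encoder's value rows. This matches the quoted statement that every decoder position can attend over all positions in the input sequence.
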
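[Tip 2] For "Figure 1: The Transformer - model architecture.", read a proper explanation

I find the paper's own explanation of Figure 1 quite insufficient.
Explanations that simply retrace the paper follow it closely, and so they remain just as insufficient.

↓ The article below is highly recommended.
(I also strongly recommend it in my own related articles.)
機械学習におけるtransformer ("Transformers in Machine Learning", by Lionbridge Japan K.K.)
https://lionbridge.ai/ja/articles/machine-learning-transformer/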

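Summary

Nothing in particular.
(I do not think this paper can be understood by reading it alone, without the surrounding background knowledge.
The way forward is probably to work through it steadily, drawing on the many articles written by others.)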

N articles I found easy to follow on Self-Attention (so far, N=9)
