
A step-by-step way to understand the "Attention" in Attention Is All You Need

Last updated at 2021-04-04. Posted at 2021-01-14.

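Overview

I came up with a step-by-step way to understand Attention, so I am writing it up as an article.
(Though I feel like I might give up partway.)

Stage 1: Get used to query, key, value (<-- my progress is still here.)
Stage 2: Understand that attention has the property of being able to pull in information from a wide range
Stage n: Understand how training works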

Stage 1

Goal of stage 1

Get used to query, key, value

Method A (read the original paper)

I saw somewhere that the following paper is important, so I skim through it.
https://arxiv.org/pdf/1606.03126.pdf

Key-Value Memory Networks for Directly Reading Documents

Abstract

Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective. Unfortunately KBs often suffer from being too restrictive, as the schema cannot support certain types of answers, and too sparse, e.g. Wikipedia contains much more information than Freebase. In this work we introduce a new method, Key-Value Memory Networks, that makes reading documents more viable by utilizing different encodings in the addressing and output stages of the memory read operation. To compare using KBs, information extraction or Wikipedia documents directly in a single framework we construct an analysis tool, WIKIMOVIES, a QA dataset that contains raw text alongside a preprocessed KB, in the domain of movies. Our method reduces the gap between all three settings. It also achieves state-of-the-art results on the existing WIKIQA benchmark.

Key points

Answering questions by reading documents directly is still unsolved, so QA has mostly relied on Knowledge Bases (KBs).
KBs are too restrictive (the schema cannot express some answers) and too sparse (Wikipedia holds far more than Freebase).
Key-Value Memory Networks use one encoding (the key) to address a memory slot and a different encoding (the value) to produce the output.
WIKIMOVIES is a movie-domain QA dataset with raw text alongside a preprocessed KB, built to compare the KB, information extraction, and raw Wikipedia settings in one framework.
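To make the "different encodings in the addressing and output stages of the memory read operation" a little more concrete, here is a minimal sketch of a single key-value read. It is not the paper's implementation: the learned embedding matrices are left out, and the vectors in `keys`, `values`, and `query` are made-up toy numbers. The keys are used only to decide where to look (softmax over query-key dot products), and the values are used only for what gets read out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kv_read(query, keys, values):
    """Address memory slots via the keys, then read a weighted sum of the values."""
    # Addressing stage: only the keys are compared with the query.
    weights = softmax([dot(query, k) for k in keys])
    # Output stage: only the values contribute to what is returned.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy memory with three slots (illustrative numbers, not from the paper).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]

weights, output = kv_read(query, keys, values)
print(weights)  # slots 0 and 2 match the query equally well; slot 1 barely matters
print(output)
```

The same query/key/value split is what the Attention in Attention Is All You Need uses, which is presumably why this paper is a useful warm-up for stage 1.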

Representative figure

An explanation that gives the feel of it

KB Triple: Knowledge base entries have a structure of triple “subject relation object” (see Table 1 for examples). The representation we consider is simple: the key is composed of the left-hand side entity (subject) and the relation, and the value is the right-hand side entity (object). We double the KB and consider the reversed relation as well (e.g. we now have two triples “Blade Runner directed_by Ridley Scott” and “Ridley Scott !directed_by Blade Runner” where !directed_by is a different entry in the dictionary than directed_by). Having the entry both ways round is important for answering different kinds of questions (“Who directed Blade Runner?” vs. “What did Ridley Scott direct?”). For a standard MemNN that does not have key-value pairs the whole triple has to be encoded into the same memory slot.

Key points

A KB entry is a triple "subject relation object".
The key is the subject plus the relation, and the value is the object.
Each triple is also stored in the reversed direction, with !directed_by treated as a different dictionary entry from directed_by, so that both "Who directed Blade Runner?" and "What did Ridley Scott direct?" can be answered.
A standard MemNN without key-value pairs has to encode the whole triple into a single memory slot.
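The KB triple representation described above can be sketched as a plain Python dictionary. This is only an illustration of the data layout, not the paper's code: `build_kv_memory` is a hypothetical helper name, and the second triple ("Alien") is added here just so that the reversed lookup returns more than one item.

```python
from collections import defaultdict

def build_kv_memory(triples):
    """Store each triple as key=(subject, relation) -> value=object, in both directions."""
    memory = defaultdict(list)
    for subject, relation, obj in triples:
        memory[(subject, relation)].append(obj)
        # Reversed entry: "!" marks the inverse relation as a separate dictionary entry.
        memory[(obj, "!" + relation)].append(subject)
    return dict(memory)

triples = [
    ("Blade Runner", "directed_by", "Ridley Scott"),
    ("Alien", "directed_by", "Ridley Scott"),  # added for illustration
]
memory = build_kv_memory(triples)

# "Who directed Blade Runner?"
print(memory[("Blade Runner", "directed_by")])   # ['Ridley Scott']
# "What did Ridley Scott direct?"
print(memory[("Ridley Scott", "!directed_by")])  # ['Blade Runner', 'Alien']
```

A dictionary does an exact-match lookup. The network in the paper instead compares an embedded question against embedded keys, so the match is soft, but the split into "key = what you search by" and "value = what you get back" is the same.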

Method B (YouTube material that uses 3D graphics)

In my other article,

Self-Attention: N explanations I found easy to understand (N = 7 so far)

I listed, as No. 7,

"How to get meaning from text with language model BERT | AI Explained (Peltarion), YouTube" ( https://www.youtube.com/watch?v=-9vVhYEXeyQ&t=337s )

I watch that and similar videos.

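Progress

My current state of understanding is as follows.

I don't think this explanation is enough to understand things like multi-head.
"bank" as in "riverbank" versus "bank" as in "financial institution" seems to be the standard example for this kind of explanation.
I have not achieved the goal of stage 1 yet... I plan to add Method C.

(The material below is questionable copyright-wise, so this Method C is no good!! Starting over) Method C (YouTube for understanding multi-head)

I picked these from a casual search. From the little I watched, they seemed helpful.
Is this OK copyright-wise???

https://www.youtube.com/watch?v=dichIcUZfOw
Visual Guide to Transformer Neural Networks - (Part 1) Position Embeddings
(Intellimath - The Math of Intelligence)

https://www.youtube.com/watch?v=mMa2PmYJlCo
Visual Guide to Transformer Neural Networks - (Part 2) Multi-Head & Self-Attention
(Intellimath - The Math of Intelligence)

↓ Below is a screen quoted from the video. It does show multi-head.

Method C (xxx for understanding multi-head; under construction!)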

Stage 2

Goal of stage 2

Understand that attention has the property of being able to pull in information from a wide range
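As a rough illustration of this "wide range" property, the sketch below computes scaled dot-product attention weights from position 0 over a toy sequence of 10 positions. The vectors are made up, and no learned projections or position encodings are used, which is a simplification. The point is that the score depends on how similar two vectors are, not on how far apart their positions are, whereas a single local window of width 3 around position 0 can only see its immediate neighbors.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Toy sequence: only the last position resembles the first one.
sequence = [[0.0, 1.0] for _ in range(10)]
sequence[0] = [3.0, 0.0]
sequence[9] = [3.0, 0.0]

weights = attention_weights(sequence[0], sequence)
print(round(weights[1], 4), round(weights[9], 4))  # neighbor gets little weight, far position gets a lot

# Positions a width-3 local window centered on position 0 can reach in one step.
local_window = [i for i in range(len(sequence)) if abs(i - 0) <= 1]
print(local_window)  # [0, 1]
```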

Stage n

Goal of stage n

Understand how training works

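List of things I don't understand

Q ⇒ done
What is a question embedding?

(My own) A
Embedding is a very common notion in language processing and similar fields. For example, quoting
https://ishitonton.hatenablog.com/entry/2018/11/25/200332

In natural language processing, embedding means "assigning a vector in some space to a constituent of natural language, such as a sentence, a word, or a character".

For example, Word Embedding means giving each word its own vector.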
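A word embedding can be pictured as a lookup table from words to vectors. The sketch below uses random vectors, whereas in a real model the vectors are learned. Summing the word vectors to get one vector for the whole question is only an assumed bag-of-words style simplification, and `make_embedding_table` and `embed` are hypothetical helper names.

```python
import random

def make_embedding_table(vocab, dim, seed=0):
    """Give each word its own vector (random here; learned in a real model)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}

def embed(sentence, table):
    """Replace each word of the sentence with its vector."""
    return [table[word] for word in sentence.lower().split()]

vocab = ["who", "directed", "blade", "runner"]
table = make_embedding_table(vocab, dim=4)

vectors = embed("Who directed Blade Runner", table)
print(len(vectors), len(vectors[0]))  # 4 words, 4 numbers per word

# Assumed simplification: one vector for the whole question, as a sum of word vectors.
question_vector = [sum(column) for column in zip(*vectors)]
print(len(question_vector))
```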

Q

What is the "Self" in Self-Attention? As I recall, it is something like making the input and the output the same, but what do you gain from that??

(My own) A
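The following is not an answer from the original article, just a sketch that may help with this question. In self-attention, the queries, keys, and values all come from the same sequence, so each position looks at every position of its own sequence (including itself). What comes out is one new vector per input position, each a weighted mix of the whole sequence. The learned projection matrices are omitted here as a simplification, so the raw input vectors play all three roles.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Queries, keys, and values are all taken from the same sequence x."""
    d = len(x[0])
    outputs = []
    for query in x:                                   # query: one position of x
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in x]                       # keys: every position of x
        weights = softmax(scores)
        mixed = [sum(w * value[i] for w, value in zip(weights, x))
                 for i in range(d)]                   # values: every position of x
        outputs.append(mixed)
    return outputs

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence of 3 positions
y = self_attention(x)
print(len(y), len(y[0]))  # same number of positions and same width as the input
```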

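Q

By what mechanism can Attention relate positions that are quite far apart?
(This is sometimes mentioned in comparison with CNNs.)

(My own) A

Q

I don't really understand where the learnable parameters are or how they are trained.

(My own) A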
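Again, not an answer from the original article, only a hedged pointer. In the multi-head attention of Attention Is All You Need, the learnable parameters of the attention layer itself are the projection matrices W_Q, W_K, W_V (one set per head) and the output projection W_O. Like the other weights of the network, they are adjusted by gradient descent with backpropagation. The sketch below just counts them for the base setting of the paper (d_model = 512, 8 heads). `attention_param_count` is a hypothetical helper, and biases are ignored.

```python
def attention_param_count(d_model, num_heads, d_k=None, d_v=None):
    """Count the weights in one multi-head attention layer (no biases)."""
    d_k = d_k if d_k is not None else d_model // num_heads
    d_v = d_v if d_v is not None else d_model // num_heads
    # Per head: W_Q (d_model x d_k), W_K (d_model x d_k), W_V (d_model x d_v).
    per_head = d_model * d_k + d_model * d_k + d_model * d_v
    # Output projection W_O: (num_heads * d_v) x d_model.
    output_projection = num_heads * d_v * d_model
    return num_heads * per_head + output_projection

count = attention_param_count(d_model=512, num_heads=8)
print(count)  # 1048576
```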

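Q ⇒ done

To begin with, in the figure from the paper below, I understand the input, but there are two outputs, and I don't understand why one of them is fed into the model... even though it is an output...
This is super basic, though...
Also, I don't really understand what the architecture in this figure is for.

(My own) A

↓ The following is highly recommended
(I also highly recommend it in my related article.)
"機械学習におけるtransformer" (Transformers in machine learning), by Lionbridge Japan K.K.
https://lionbridge.ai/ja/articles/machine-learning-transformer/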
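As a small supplement to the "output that is fed back into the model" question, here is a toy sketch of generation one token at a time: the tokens produced so far are given back to the decoder to produce the next one. During training, the known target sequence shifted right by one position is fed in instead. `toy_next_token` is a hypothetical stand-in for a trained model (it simply copies the source), and the `<s>` and `</s>` markers are assumed names for the start and end tokens.

```python
def greedy_decode(next_token, source, max_len=10, bos="<s>", eos="</s>"):
    """Generate tokens one by one, feeding the output so far back in as input."""
    generated = [bos]
    for _ in range(max_len):
        token = next_token(source, generated)  # the model sees its own previous outputs
        generated.append(token)
        if token == eos:
            break
    return generated[1:]  # drop the start marker

def toy_next_token(source, generated):
    """Hypothetical stand-in for a trained model: copies the source, then stops."""
    position = len(generated) - 1
    return source[position] if position < len(source) else "</s>"

result = greedy_decode(toy_next_token, ["ich", "bin", "hier"])
print(result)  # ['ich', 'bin', 'hier', '</s>']
```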

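Summary

Nothing in particular.
The paper I cite should, as far as I remember, be meaningful. (For now, that is about all the value this article has...)
I feel like I might give up.

Self-Attention: N explanations I found easy to understand (N = 9 so far)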
