Learning Grammatical Analysis with Attention Is All You Need

Posted at 2026-02-01

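Analyze the following English passage grammatically, one sentence at a time. For every sentence, output the items below in exactly this order and format.

【Output format (same for every sentence)】
(1) Original sentence (one sentence, verbatim)
(2) Sentence-pattern label (SVC / SVO / SVOO / SVOC / SVA, etc.)
(3) SVOC breakdown (state S=…, V=…, O=…, C=… as far as possible; omit elements that do not occur)
(4) Modifier structure (list relative clauses, participial constructions, appositives, prepositional phrases, that-clauses, infinitives, etc. as bullet points, and state the antecedent or attachment target of each)
(5) Japanese translation (literal-leaning; translate so that the structure corresponds to the original)
(6) Notes (1 to 3 points, if any: pronoun referents, omitted subjects, prepositional phrases whose attachment is easily ambiguous, etc.)

【Constraints】

Do not fill in by guessing. If an attachment target is unclear, write "unclear".
Write an omitted subject as "omitted subject=…" only when the evidence for it is clear.
Add no extra commentary; output only items (1) to (6) above.

【Target passage】
<paste the English text here>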
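
As a rough illustration of how a template like the one above might be used, the sketch below splits a passage into sentences and fills in a prompt. Everything named here is hypothetical: `PROMPT_TEMPLATE` is a condensed paraphrase rather than the full wording, and `split_sentences` and `build_prompt` are helper names invented for this example. The regular-expression splitter is deliberately naive and will mis-split abbreviations such as "e.g. The".

```python
import re

# Condensed paraphrase of the prompt above (an assumption, not the full wording).
PROMPT_TEMPLATE = (
    "Analyze the following English passage one sentence at a time.\n"
    "For each sentence output, in order: (1) original sentence, "
    "(2) sentence-pattern label, (3) SVOC breakdown, (4) modifier structure, "
    "(5) Japanese translation, (6) notes.\n"
    "Do not guess; write 'unclear' for any attachment you cannot determine.\n\n"
    "[Target passage]\n{passage}"
)


def split_sentences(text: str) -> list[str]:
    """Naively split after ., ? or ! when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.?!])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


def build_prompt(passage: str) -> str:
    """Insert the passage into the condensed template."""
    return PROMPT_TEMPLATE.format(passage=passage.strip())


if __name__ == "__main__":
    sample = (
        "The best performing models also connect the encoder and decoder "
        "through an attention mechanism. The fundamental constraint of "
        "sequential computation, however, remains."
    )
    for sentence in split_sentences(sample):
        print(sentence)
    print(build_prompt(sample))
```

Splitting first makes it easy to check that the model returned exactly one (1) to (6) block per sentence.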

Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.

SV(C)

S: The dominant sequence transduction models（主要な系列変換モデル）
V: are based on（〜に基づいている）
C: complex recurrent or convolutional neural networks（複雑な再帰型または畳み込み型ニューラルネットワーク）

Modifier (relative clause)

that include an encoder and a decoder（エンコーダとデコーダを含む）
- S: that (= neural networks)
- V: include
- O: an encoder and a decoder

Japanese translation (literal-leaning)

主要な系列変換モデルは、エンコーダとデコーダを含む複雑な再帰型または畳み込み型ニューラルネットワークに基づいている。

The best performing models also connect the encoder and decoder through an attention mechanism.

SVO

S: The best performing models（最も性能の高いモデル）
V: connect（接続する）
O: the encoder and decoder（エンコーダとデコーダを）
Modifier: also（さらに）
Modifier: through an attention mechanism（注意機構を介して）

Japanese translation

最も性能の高いモデルは、さらに、注意機構を介してエンコーダとデコーダを接続している。

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

SVO

S: We（私たちは）
V: propose（提案する）
O: a new simple network architecture（新しい単純なネットワーク構造）

Apposition

the Transformer (= the name of that architecture)

Participial phrases (supplementary description)

based solely on attention mechanisms（注意機構だけに基づく）
dispensing with recurrence and convolutions entirely（再帰と畳み込みを完全に捨てる）

Japanese translation

私たちは、注意機構だけに基づき、再帰と畳み込みを完全に用いない新しい単純なネットワーク構造、Transformer を提案する。

Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.

SVOC

S: Experiments on two machine translation tasks（2つの機械翻訳タスクでの実験）
V: show（示す）
O: these models（これらのモデルが）
C: to be superior in quality（品質で優れていることを）

Accompanying circumstance (while + V-ing: simultaneity)

while being more parallelizable（より並列化しやすく）
and requiring significantly less time to train（学習時間が大幅に少なくて済む）

Japanese translation

2つの機械翻訳タスクでの実験は、これらのモデルが、品質で優れているだけでなく、より並列化しやすく、学習に必要な時間も大幅に少ないことを示している。

Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU.

SVO

S: Our model（我々のモデル）
V: achieves（達成する）
O: 28.4 BLEU（BLEU 28.4を）
Modifier: on the WMT 2014 English-to-German translation task（WMT2014英独タスクで）

Participial phrase (accompanying: result / additional information)

improving over the existing best results（既存の最良結果を上回り）
including ensembles（アンサンブルを含む最良結果を）
by over 2 BLEU（2 BLEU超だけ）

Japanese translation

我々のモデルは、WMT2014英独翻訳タスクでBLEU 28.4を達成し、アンサンブルを含む既存の最良結果を2 BLEU超上回った。

On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.

SVO

Fronted adverbial: On the WMT 2014 English-to-French translation task（WMT2014英仏タスクでは）
S: our model（我々のモデルが）
V: establishes（打ち立てる/確立する）
O: a new single-model state-of-the-art BLEU score of 41.8（単一モデルとして新SOTAのBLEU 41.8を）

Adverbial (after + gerund)

after training for 3.5 days on eight GPUs（8GPUで3.5日学習した後で）

Apposition (a noun phrase restating the preceding content)

a small fraction of the training costs of the best models from the literature
=（既存研究の最良モデルに比べれば）学習コストのごく一部

Japanese translation

WMT2014英仏翻訳タスクでは、我々のモデルは8GPUで3.5日学習しただけで、単一モデルとして新たな最先端となるBLEU 41.8を達成した。これは、既存研究の最良モデルの学習コストのごく一部にすぎない。

We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Main clause (SVO)

S: We（私たちは）
V: show（示す）
O: that-clause（〜ということを）

that-clause

S: the Transformer（Transformer が）
V: generalizes（一般化する）
Modifier: well（うまく）
Target: to other tasks（他のタスクへ）

Means (by + V-ing)

by applying it successfully to English constituency parsing（英語の句構造解析にうまく適用することで）
- it = the Transformer

Both conditions (both A and B)

both with large and limited training data（大量データでも少量データでも）

Japanese translation

私たちは、Transformer を英語の句構造解析にうまく適用することで、学習データが多い場合でも少ない場合でも、Transformer が他のタスクへうまく一般化することを示す。

1 Introduction

Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].

Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter. The fundamental constraint of sequential computation, however, remains.

Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19]. In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network.

In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.

(1) Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5].
(2) SVC
(3) S=Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular / V=have been established / C=state of the art approaches in sequence modeling and transduction problems
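(4) Modifier structure

in particular: marks the inserted phrase (long short-term memory [13] and gated recurrent [7] neural networks) as a specific subset of S (Recurrent neural networks)
as state of the art approaches: introduces the complement (C) of established
in sequence modeling and transduction problems: modifies approaches (the field the approaches belong to)
such as language modeling and machine translation: exemplifies problems
(5) Japanese translation
特に長短期記憶[13]やゲート付き再帰[7]ニューラルネットワークを含む再帰型ニューラルネットワークは、言語モデリングや機械翻訳[35, 2, 5]といった系列モデリングおよび変換（transduction）の問題における最先端の手法として、確固として確立されてきた。
(6) Notes
S is long because it contains the inserted phrase (... in particular), but the main verb is have been established.
The as-phrase is a complement ("established as ..."), not a reason.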

(1) Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].
(2) SVO
(3) S=Numerous efforts / V=have continued / O=to push the boundaries of recurrent language models and encoder-decoder architectures
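(4) Modifier structure

since: time modifier of V (have continued), meaning "since then"
to push ...: infinitive (object equivalent of continued: what they continued to do)
of recurrent language models and encoder-decoder architectures: modifies boundaries (boundaries of what)
(5) Japanese translation
それ以降も、多くの取り組みが、再帰型言語モデルとエンコーダ・デコーダ型アーキテクチャ[38, 24, 15]の限界を押し広げることを続けてきた。
(6) Notes
The object of push is the boundaries (used figuratively).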

(1) Recurrent models typically factor computation along the symbol positions of the input and output sequences.
(2) SVO
(3) S=Recurrent models / V=factor / O=computation
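(4) Modifier structure

typically: modifies V ("generally")
along the symbol positions ...: modifies factor (the axis along which computation is split)
of the input and output sequences: modifies positions (positions in which sequences)
(5) Japanese translation
再帰型モデルは一般に、入力列と出力列の記号位置に沿って計算を分解する。
(6) Notes
factor is used here in the sense of "decompose / split up" (分解する), not arithmetic factorization (因数分解).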

(1) Aligning the positions to steps in computation time, they generate a sequence of hidden states h_t, as a function of the previous hidden state h_{t-1} and the input for position t.
(2) SVO
(3) S=they / V=generate / O=a sequence of hidden states h_t
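(4) Modifier structure

Aligning the positions to steps in computation time: participial construction (modifies the whole main clause)
- Attachment: they generate ... (main clause)
- Omitted subject=they (evidence: the subject that immediately follows is they)
as a function of ...: supplements a sequence (or the manner of generate). Attachment=unclear (grammatically both readings are possible)
of the previous hidden state ... and the input ...: modifies function (a function of what)
for position t: modifies the input (the input at which position)
(5) Japanese translation
計算時間のステップにそれらの位置を対応づけることで、それら（=再帰型モデル）は、直前の隠れ状態 h_{t-1} と位置 t の入力の関数として、隠れ状態 h_t の系列を生成する。
(6) Notes
Referent of they: Recurrent models in the preceding sentence.
Whether as a function of attaches to hidden states or to generate can be ambiguous, so it is recorded as unclear.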
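
The recurrence this sentence describes, h_t as a function of h_{t-1} and the input at position t, can be sketched as a plain loop. This is a toy scalar version: the tanh update and the weights `w_h` and `w_x` are assumptions chosen for illustration, since the quoted passage does not specify the function. The only point is that each step needs the previous step's result, so the positions within one sequence cannot be computed in parallel.

```python
import math


def rnn_hidden_states(inputs, w_h=0.5, w_x=1.0):
    """Return the hidden states h_1..h_n for a scalar toy recurrence.

    h_t = tanh(w_h * h_{t-1} + w_x * x_t), starting from h_0 = 0.
    The update rule and weights are illustrative assumptions.
    """
    h = 0.0
    states = []
    for x_t in inputs:  # strictly sequential: step t depends on step t-1
        h = math.tanh(w_h * h + w_x * x_t)
        states.append(h)
    return states


if __name__ == "__main__":
    print(rnn_hidden_states([1.0, 0.0, -1.0]))
```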

(1) This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.
(2) SVO
(3) S=This inherently sequential nature / V=precludes / O=parallelization within training examples
(4) Modifier structure

inherently sequential: modifies nature
within training examples: modifies parallelization (where parallelization is precluded)
which becomes critical at longer sequence lengths: relative clause (antecedent=parallelization)
at longer sequence lengths: modifies becomes critical (condition)
as memory constraints limit batching across examples: as-clause (reason)
- S=memory constraints / V=limit / O=batching across examples
(5) Japanese translation
この本質的に逐次的な性質は、学習例の内部での並列化を妨げるが、その並列化は系列長が長くなるほど重要になる。というのも、メモリ制約が例をまたいだバッチ化を制限するからである。
(6) Notes
The antecedent of which is parallelization (the head noun of the preceding object, not the nearer training examples).
as expresses a reason ("because").

(1) Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter.
(2) SVO
(3) S=Recent work / V=has achieved / O=significant improvements in computational efficiency
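(4) Modifier structure

through factorization tricks ... and conditional computation ...: modifies achieved (means)
while also improving model performance ...: while-clause (accompanying / simultaneous)
- Omitted subject=Recent work (evidence: it is most naturally the same as the main-clause subject)
in case of the latter: modifies improving model performance (in the latter case)
the latter: the second of the two preceding items=conditional computation
(5) Japanese translation
近年の研究は、因数分解の工夫[21]と条件付き計算[32]によって計算効率を大きく改善してきた。また同時に、後者の場合にはモデル性能も改善している。
(6) Notes
the latter refers to conditional computation.
while is basically accompanying ("at the same time") here, not concessive.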

(1) The fundamental constraint of sequential computation, however, remains.
(2) SV
(3) S=The fundamental constraint of sequential computation / V=remains
(4) Modifier structure

of sequential computation: modifies constraint
however: modifies the whole sentence (contrast)
(5) Japanese translation
しかし、逐次計算という根本的な制約は残ったままである。
(6) Notes
remains is intransitive here and takes no complement.

(1) Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19].
(2) SVC
(3) S=Attention mechanisms / V=have become / C=an integral part of compelling sequence modeling and transduction models in various tasks
(4) Modifier structure

of compelling sequence modeling and transduction models ...: modifies part (part of what)
in various tasks: modifies models (models in which settings)
allowing modeling of dependencies ...: participial phrase (modifies the whole main clause)
- Omitted subject=Attention mechanisms (evidence: attention mechanisms is the natural subject of allowing)
without regard to their distance ...: modifies modeling of dependencies (modeling regardless of distance)
their: refers to dependencies (the distance of the dependencies)
in the input or output sequences: modifies distance (distance where)
(5) Japanese translation
注意機構は、さまざまなタスクにおける有力な系列モデリング／変換モデルの不可欠な一部となってきた。これは、入力列または出力列における距離に関係なく依存関係をモデリングできるようにする[2, 19]。
(6) Notes
allowing is a participial phrase of result / accompaniment.
their most naturally refers to dependencies.

(1) In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network.
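(2) SVA (passive)
(3) S=such attention mechanisms / V=are used / A=in conjunction with a recurrent network
(4) Modifier structure

In all but a few cases: modifies the whole sentence (in almost all cases)
however: modifies the whole sentence (contrast)
in conjunction with ...: modifies are used (used together with)
(5) Japanese translation
しかし、ごく少数の例外[27]を除けば、そうした注意機構は再帰ネットワークと併用されている。
(6) Notes
all but a few = "all except a few" = nearly all. The citation [27] belongs to a few cases.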

(1) In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output.
(2) SVO
(3) S=we / V=propose / O=the Transformer
(4) Modifier structure

In this work: modifies the whole sentence
a model architecture ...: apposition (explains the Transformer)
eschewing recurrence: participle (modifies architecture)
and instead relying entirely on ...: participle (modifies architecture; contrast)
to draw global dependencies ...: infinitive (modifies an attention mechanism = purpose)
between input and output: modifies dependencies (dependencies between what)
(5) Japanese translation
本研究では、再帰を避け、代わりに入力と出力の間の大域的な依存関係を引き出すための注意機構だけに全面的に依拠するモデル構造である Transformer を提案する。
(6) Notes
The appositive a model architecture ... explains the Transformer.
to draw is most naturally read as attaching to an attention mechanism (purpose).

(1) The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
(2) SVO (+ modal can)
(3) S=The Transformer / V=allows for / O=significantly more parallelization / V=can reach / O=a new state of the art in translation quality
(4) Modifier structure

and: coordination (joins allows for ... and can reach ...)
in translation quality: modifies state of the art (state of the art in what)
after being trained ...: prepositional phrase (after + passive gerund) modifying can reach (condition / timing)
for as little as twelve hours: modifies trained (duration)
on eight P100 GPUs: modifies trained (compute resources)
(5) Japanese translation
Transformer は、大幅に多くの並列化を可能にし、8基のP100 GPUでわずか12時間学習しただけで、翻訳品質において新たな最先端に到達し得る。
(6) Notes
The sentence has two coordinated predicates, allows for and can reach.
after being trained expresses the precondition for reaching the result.

2 Background

The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difficult to learn dependencies between distant positions [12]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2.

Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22].

End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [34].

To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].

(1) The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions.
(2) SVO
(3) S=The goal of reducing sequential computation / V=forms / O=the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9]
(4) Modifier structure

of reducing sequential computation: modifies goal (goal of what)
also: modifies V ("also")
of the Extended Neural GPU ..., ByteNet ... and ConvS2S ...: modifies foundation (foundation of what)
all of which use convolutional neural networks ...: relative clause (antecedent=the Extended Neural GPU, ByteNet, ConvS2S)
- use A as B: use convolutional neural networks as basic building block
computing hidden representations ...: participial phrase (supplements the relative clause)
- Attachment: use (or all of which ... as a whole). Unclear (both readings are possible)
in parallel: modifies computing
for all input and output positions: modifies computing (for which positions)
(5) Japanese translation
逐次計算を減らすという目標は、Extended Neural GPU[16]、ByteNet[18]、ConvS2S[9]の基盤も成している。これらはすべて、畳み込みニューラルネットワークを基本ブロックとして用い、入力と出力の全位置について隠れ表現を並列に計算する。
(6) Notes
The antecedent of all of which is the three models (Extended Neural GPU / ByteNet / ConvS2S).
Whether computing ... attaches to use or to the whole clause can be ambiguous, so it is recorded as unclear.

(1) In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet.
(2) SV
(3) S=the number of operations required to relate signals from two arbitrary input or output positions / V=grows
(4) Modifier structure

In these models: modifies the whole sentence
required to relate signals ...: past-participial phrase (modifies operations)
- to relate ...: infinitive (what the operations are required for)
- from two arbitrary input or output positions: modifies signals (signals from where)
in the distance between positions: modifies grows (grows with respect to distance)
linearly for ConvS2S and logarithmically for ByteNet: modifies grows (specifies how it grows)
(5) Japanese translation
これらのモデルでは、任意の2つの入力位置または出力位置からの信号を関連づけるのに必要な演算回数は、位置間の距離に応じて増大する。ConvS2Sでは線形に、ByteNetでは対数的に増える。
(6) Notes
required to relate ... is a restrictive modifier of operations.
What grows is measured against is given by in the distance ....
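
To make "linearly" versus "logarithmically" concrete, here is a minimal sketch under a simplified receptive-field model. It is an assumption for illustration, not the actual ConvS2S or ByteNet architecture: a stack of contiguous convolutions is taken to widen its reach by `kernel_width - 1` positions per layer, and a dilated stack is taken to multiply its reach by `kernel_width` per layer. The function names are hypothetical. The constant count for self-attention reflects the Background paragraph's statement that the Transformer reduces this to a constant number of operations.

```python
import math


def layers_contiguous(distance: int, kernel_width: int) -> int:
    """Layers needed when each layer widens the reach by (kernel_width - 1)."""
    return math.ceil(distance / (kernel_width - 1))


def layers_dilated(distance: int, kernel_width: int) -> int:
    """Layers needed when each layer multiplies the reach by kernel_width."""
    layers, reach = 0, 1
    while reach < distance + 1:  # reach must cover both endpoints
        reach *= kernel_width
        layers += 1
    return layers


def steps_self_attention(distance: int) -> int:
    """Any two positions are related in one step, regardless of distance."""
    return 1


if __name__ == "__main__":
    for d in (8, 64, 512):
        print(d, layers_contiguous(d, 3), layers_dilated(d, 3), steps_self_attention(d))
```

Under these assumptions the first count grows in proportion to the distance, the second only with its logarithm, and the third not at all.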

(1) This makes it more difficult to learn dependencies between distant positions [12].
(2) SVOC
(3) S=This / V=makes / O=it / C=more difficult to learn dependencies between distant positions [12]
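(4) Modifier structure

to learn dependencies ...: infinitive (what is difficult)
between distant positions: modifies dependencies
This: referent=the content of the preceding sentence (the number of operations growing with distance)
(5) Japanese translation
これは、離れた位置間の依存関係を学習することをより困難にする[12]。
(6) Notes
it is a formal object (the real content is to learn ...).
This refers to the whole preceding sentence (the property that the number of operations grows with distance).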

(1) In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2.
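(2) SVA (passive)
(3) S=this / V=is reduced / A=to a constant number of operations
(4) Modifier structure

In the Transformer: modifies the whole sentence
to a constant number of operations: adverbial (A) of reduced (how far it is reduced)
albeit at the cost of ...: concessive insertion ("though at the cost of ...")
of reduced effective resolution: modifies cost (what the cost is)
due to averaging attention-weighted positions: modifies reduced effective resolution (cause)
averaging attention-weighted positions: gerund phrase (object of averaging=attention-weighted positions)
an effect we counteract ...: apposition (restates the preceding content as "an effect")
- we counteract ...: relative clause (antecedent=an effect)
with Multi-Head Attention: modifies counteract (means)
as described in section 3.2: supplements counteract ... with Multi-Head Attention; in section 3.2 modifies described (where it is described)
(5) Japanese translation
Transformer では、これは一定回数の演算へと減らされる。ただし、注意重み付きの位置を平均化することによって有効解像度が低下するという代償を伴う。この効果は、第3.2節で述べるMulti-Head Attentionによって私たちが打ち消す。
(6) Notes
Whether this refers to "the property that the number of operations grows with distance" or to "the number of operations itself" depends on context; from the sentence alone it is unclear.
an effect is an appositive picking up the preceding reduced effective resolution ....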

(1) Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.
(2) SVC
(3) S=Self-attention / V=is / C=an attention mechanism
(4) Modifier structure

sometimes called intra-attention: past-participial phrase (modifies Self-attention)
relating different positions ...: present-participial phrase (modifies mechanism)
- of a single sequence: modifies positions
in order to compute ...: infinitive phrase (modifies relating = purpose)
a representation of the sequence: object of compute
of the sequence: modifies representation
(5) Japanese translation
Self-attention（ときに intra-attention と呼ばれる）は、単一の系列の異なる位置どうしを関連づける注意機構であり、その系列の表現を計算することを目的とする。
(6) Notes
In "Self-attention, sometimes called intra-attention is ...", the main verb is follows the inserted phrase (the source has no closing comma after intra-attention).
relating post-modifies mechanism.

(1) Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22].
(2) SV (passive perfect)
(3) S=Self-attention / V=has been used
(4) Modifier structure

successfully: modifies used
in a variety of tasks: modifies used (where it has been used)
including reading comprehension, ...: exemplifies tasks
learning task-independent sentence representations: one of the items listed after including
- task-independent: modifies sentence representations
(5) Japanese translation
Self-attention は、読解、抽象要約、テキスト含意、タスク非依存の文表現の学習[4, 27, 28, 22]などを含む多様なタスクで、成功裏に用いられてきた。
(6) Notes
What follows including gives examples of tasks (not an exhaustive list).

(1) End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [34].
(2) SVC (+ coordination)
(3) S=End-to-end memory networks / V=are based on / C=a recurrent attention mechanism / V=have been shown / C=to perform well on simple-language question answering and language modeling tasks [34]
(4) Modifier structure

instead of sequence-aligned recurrence: modifies are based on ("instead of ...")
and: coordination (are based ... and have been shown ...)
to perform well ...: infinitive (what has been shown)
on simple-language question answering and language modeling tasks: modifies perform well (on what)
(5) Japanese translation
エンドツーエンド・メモリネットワークは、系列に整列した再帰の代わりに再帰的注意機構に基づいており、平易な言語での質問応答と言語モデリングのタスク[34]で良好に機能することが示されている。
(6) Notes
The sentence has two coordinated predicates, are based on and have been shown.

(1) To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.
(2) SVC
(3) S=the Transformer / V=is / C=the first transduction model
(4) Modifier structure

To the best of our knowledge: modifies the whole sentence
however: modifies the whole sentence (contrast)
relying entirely on self-attention ...: present-participial phrase (modifies model)
to compute representations ...: infinitive (modifies relying = purpose)
of its input and output: modifies representations
without using sequence-aligned RNNs or convolution: modifies compute (or relying). Attachment=unclear (both readings are possible)
its: refers to the Transformer
(5) Japanese translation
しかし、我々の知る限り、Transformer は、系列整列RNNや畳み込みを用いずに、その入力と出力の表現を計算するためにself-attentionのみに全面的に依拠する、最初の変換（transduction）モデルである。
(6) Notes
Whether without using ... attaches to compute or to relying can be ambiguous, so it is recorded as unclear.
its refers to the Transformer.

(1) In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].
(2) SVO (+ coordination)
(3) S=we / V=will describe / O=the Transformer / V=(will) motivate / O=self-attention / V=(will) discuss / O=its advantages over models such as [17, 18] and [9]
(4) Modifier structure

In the following sections: modifies the whole sentence
and: coordination (describe / motivate / discuss in parallel)
its: referent=unclear (both the Transformer and self-attention precede it, and the sentence alone does not settle whose advantages are meant)
over models such as ...: modifies advantages (what they are compared against)
(5) Japanese translation
以降の節では、私たちはTransformerを説明し、self-attentionの動機づけを行い、[17, 18]や[9]のようなモデルに対するその利点を議論する。
(6) Notes
The referent of its is unclear from the sentence alone (it could be either the Transformer or self-attention).
Three verbs are coordinated.

3 Model Architecture

Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35]. Here, the encoder maps an input sequence of symbol representations (x_1, ..., x_n) to a sequence of continuous representations z = (z_1, ..., z_n). Given z, the decoder then generates an output sequence (y_1, ..., y_m) of symbols one element at a time. At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input when generating the next.

Figure 1: The Transformer - model architecture.

The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1, respectively.

(1) Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35].
(2) SVO
(3) S=Most competitive neural sequence transduction models / V=have / O=an encoder-decoder structure [5, 2, 35]
(4) Modifier structure

Most: modifies models (the majority of)
competitive: modifies models
neural sequence transduction: modifies models (what kind of models)
(5) Japanese translation
競争力のあるニューラル系列変換モデルの大半は、エンコーダ・デコーダ構造[5, 2, 35]を持つ。
(6) Notes
Most means "the majority of", not 100%.

(1) Here, the encoder maps an input sequence of symbol representations (x_1, ..., x_n) to a sequence of continuous representations z = (z_1, ..., z_n).
(2) SVOA (A = to ...)
(3) S=the encoder / V=maps / O=an input sequence of symbol representations (x_1, ..., x_n) / A=to a sequence of continuous representations z = (z_1, ..., z_n)
(4) Modifier structure

Here: modifies the whole sentence
of symbol representations: modifies sequence (a sequence of what)
(x_1, ..., x_n): notation for the symbol representations (in parentheses)
to a sequence ...: destination of maps (what the input is mapped to)
of continuous representations: modifies sequence
z = (z_1, ..., z_n): apposition to the sequence of continuous representations (definition)
(5) Japanese translation
ここでは、エンコーダは、記号表現 (x_1, ..., x_n) からなる入力系列を、連続表現の系列 z = (z_1, ..., z_n) へ写像する。
(6) Notes
map A to B means "map A onto B"; to marks the direction.

(1) Given z, the decoder then generates an output sequence (y_1, ..., y_m) of symbols one element at a time.
(2) SVO
(3) S=the decoder / V=generates / O=an output sequence (y_1, ..., y_m) of symbols
(4) Modifier structure

Given z: past-participial phrase (condition)
- Attachment: the decoder generates ... (main clause)
then: modifies V ("then")
(y_1, ..., y_m): notation for the output sequence
of symbols: modifies sequence
one element at a time: modifies generates (sequentially)
(5) Japanese translation
z が与えられると、デコーダはその後、記号の出力系列 (y_1, ..., y_m) を、1要素ずつ生成する。
(6) Notes
Given z is a condition ("when z is given"). Its subject is not stated, but it attaches to the main clause.

(1) At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input when generating the next.
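(2) SVC (+ participial phrase)
(3) S=the model / V=is / C=auto-regressive [10]
(4) Modifier structure

At each step: modifies the whole sentence
consuming the previously generated symbols ...: present-participial phrase (supplements the whole main clause)
- Omitted subject=the model (evidence: the model is what can act as the agent of consuming)
the previously generated symbols: object of consuming
as additional input: modifies consuming (consumed as what)
when generating the next: when-clause (time)
- Attachment: consuming (when the symbols are consumed)
the next: referent=unclear (could be the next symbol / the next element / the next output)
(5) Japanese translation
各ステップでモデルは自己回帰型[10]であり、次を生成するときに、先に生成した記号を追加入力として消費する。
(6) Notes
consuming ... is a participial phrase supplementing the main clause (is auto-regressive).
The referent of the next is unclear from the sentence alone.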
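
The auto-regressive loop this sentence describes, where previously generated symbols are fed back as input when generating the next one, can be sketched as follows. This is a minimal illustration under stated assumptions: `generate`, `toy_next_symbol`, and the `"<eos>"` end marker are hypothetical names, and the toy rule stands in for a real decoder, which the quoted passage does not specify.

```python
def generate(next_symbol, z, max_len, end_symbol="<eos>"):
    """Generate symbols one at a time, feeding the prefix back in at each step.

    next_symbol(z, prefix) is any callable returning the next symbol given
    the encoder output z and the tuple of symbols generated so far.
    """
    output = []
    for _ in range(max_len):
        symbol = next_symbol(z, tuple(output))  # consumes earlier outputs
        if symbol == end_symbol:
            break
        output.append(symbol)
    return output


def toy_next_symbol(z, prefix):
    """Hypothetical stand-in for a decoder: counts up, then stops after 3 symbols."""
    return str(len(prefix) + 1) if len(prefix) < 3 else "<eos>"


if __name__ == "__main__":
    print(generate(toy_next_symbol, z=None, max_len=10))
```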

(1) Figure 1: The Transformer - model architecture.
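(2) Sentence pattern cannot be determined (noun phrase)
(3) No SVOC breakdown (heading)
(4) Modifier structure

The Transformer - model architecture: description (title) of Figure 1
(5) Japanese translation
図1：Transformer―モデル構成。
(6) Notes
This is a figure caption, not a sentence.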

(1) The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1, respectively.
(2) SVO
(3) S=The Transformer / V=follows / O=this overall architecture
(4) 修飾構造

using stacked self-attention and point-wise, fully connected layers ...：分詞句（手段）（follow の方法）
for both the encoder and decoder：layers を修飾（適用先）
shown in the left and right halves of Figure 1, respectively：過去分詞句（encoder and decoder を修飾）
- 係り先：the encoder and decoder（直前の名詞句）
in the left and right halves of Figure 1：shown を修飾（どこに示されるか）
respectively：left/right と encoder/decoder の対応を明示
(5) 日本語訳
Transformer は、積み重ねたself-attentionと、点ごとの全結合層をエンコーダとデコーダの双方に用いることで、この全体アーキテクチャに従う。エンコーダとデコーダはそれぞれ図1の左半分と右半分に示されている。
(6) 注意点
shown ... は encoder and decoder を後置修飾し、respectively で左右対応を示す。
using ... は follow の手段・方法。
