
scouty Advent Calendar 2018

A Review of the Neural History of Natural Language Processing (Japanese Translation), Part 2

Last updated at 2018-12-06. Posted at 2018-12-06.

This article is day 6 of the scouty Advent Calendar 2018.

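Continuing from the previous post, I translate the second half of "A Review of the Neural History of Natural Language Processing" into Japanese. The translation is fairly free. Technical terms for which I could not find a good Japanese equivalent are left in English. If you spot any mistakes, I would be grateful if you let me know quietly.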

2014 - Sequence-to-sequence models

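In 2014, Sutskever et al.¹ proposed sequence-to-sequence learning, a general framework for mapping one sequence to another using a neural network.
In this framework, an encoder processes a sentence one symbol (e.g., one word) at a time and compresses it into a vector representation (Translator's note: this vector becomes the decoder's initial state).
A decoder then predicts the output symbols one at a time based on the encoder state, taking the symbol predicted at the previous step as input at the next step, as shown in Figure 8 below.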

Figure 8: A sequence-to-sequence model (Sutskever et al., 2014)
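To make the encoder/decoder loop concrete, here is a minimal sketch in plain Python. It is an illustration under loudly stated assumptions, not Sutskever et al.'s model. It uses a simple tanh RNN cell instead of deep LSTMs. The weights are random and untrained, so the output tokens are meaningless. `VOCAB`, `HIDDEN` and `ToySeq2Seq` are hypothetical names. The point is the data flow: the encoder compresses the source into one vector, and the decoder starts from it and feeds each predicted symbol back in.

```python
import math
import random

# Hypothetical toy vocabulary and hidden size; the weights below are random,
# i.e. the model is untrained and its outputs are meaningless.
VOCAB = ["<s>", "</s>", "a", "b", "c"]
HIDDEN = 4


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(token):
    v = [0.0] * len(VOCAB)
    v[VOCAB.index(token)] = 1.0
    return v


def rnn_step(w_in, w_h, x, h):
    # Simple (Elman) RNN cell: h' = tanh(W_in x + W_h h); the paper uses LSTMs.
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


class ToySeq2Seq:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        n = len(VOCAB)
        self.enc_in, self.enc_h = rand_matrix(HIDDEN, n, rng), rand_matrix(HIDDEN, HIDDEN, rng)
        self.dec_in, self.dec_h = rand_matrix(HIDDEN, n, rng), rand_matrix(HIDDEN, HIDDEN, rng)
        self.out = rand_matrix(n, HIDDEN, rng)

    def encode(self, tokens):
        # Read the source one symbol at a time and compress it into a single vector.
        h = [0.0] * HIDDEN
        for tok in tokens:
            h = rnn_step(self.enc_in, self.enc_h, one_hot(tok), h)
        return h

    def decode(self, h, max_len=5):
        # The encoder's final state initializes the decoder; greedy decoding
        # feeds each predicted symbol back in as the next input.
        prev, output = "<s>", []
        for _ in range(max_len):
            h = rnn_step(self.dec_in, self.dec_h, one_hot(prev), h)
            logits = matvec(self.out, h)
            prev = VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]
            if prev == "</s>":
                break
            output.append(prev)
        return output


model = ToySeq2Seq()
print(model.decode(model.encode(["a", "b", "c"])))
```

Greedy decoding (always taking the argmax) is the simplest choice. The paper itself uses beam search.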

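Machine translation became the killer application of this framework.
In 2016, Google announced that it was starting to replace its monolithic phrase-based machine translation models with neural machine translation models (Wu et al., 2016)².
According to Jeff Dean, this meant replacing 500,000 lines of phrase-based machine translation code with a 500-line neural network model.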

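Thanks to its flexibility, this framework has become the go-to choice for natural language generation tasks, with different models taking on the roles of encoder and decoder.
Notably, the decoder can be conditioned not only on a sequence but on data of arbitrary form.
This enables many applications, such as generating a caption from an image (Vinyals et al., 2015)³ (see Figure 9 below), text from a table (Lebret et al., 2016)⁴, and a description from source code changes (Loyola et al., 2017)⁵.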

Figure 9: Generating a caption from an image (Vinyals et al., 2015)

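Sequence-to-sequence learning can also be applied to structured prediction tasks, which are common in NLP and whose output has a particular structure.
For simplicity, the output is linearized, as shown for the constituency parse in Figure 10.
Neural networks have been shown to be able to directly generate such a linearized output, given enough training data, for a range of tasks including constituency parsing (Vinyals et al., 2015)⁶ and named entity recognition (Gillick et al., 2016)⁷.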

Figure 10: Linearizing a constituency parse tree (Vinyals et al., 2015)
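The linearization step can be sketched as a depth-first traversal. In this sketch, each constituent becomes an opening token `(LABEL` and a closing token `)LABEL`, and each word is replaced by its part-of-speech tag. That follows the bracketed style of Figure 10 as we understand it. The nested-tuple tree format and the `linearize` function are hypothetical choices made for illustration.

```python
def linearize(tree):
    # A tree is a tuple (label, child, child, ...); a pre-terminal is (tag, word).
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal: emit only the POS tag, dropping the word itself.
        return [label]
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)  # closing brackets carry the label too
    return tokens


tree = ("S",
        ("NP", ("NNP", "John")),
        ("VP", ("VBZ", "has"), ("NP", ("DT", "a"), ("NN", "dog"))),
        (".", "."))
print(" ".join(linearize(tree)))
# (S (NP NNP )NP (VP VBZ (NP DT NN )NP )VP . )S
```

A sequence-to-sequence parser is trained to emit token sequences like this one. Afterwards the tree is recovered by matching the brackets.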

Encoders and decoders for sequences are typically based on RNNs, but other model types can be used as well.
New architectures mainly emerge from work in machine translation, which acts as a Petri dish for sequence-to-sequence architectures.
Recent models include deep LSTMs (Wu et al., 2016)⁸, convolutional encoders (Kalchbrenner et al., 2016; Gehring et al., 2017)⁹ ¹⁰, the Transformer (Vaswani et al., 2017)¹¹ (discussed in the next section), and a combination of an LSTM and a Transformer (Chen et al., 2018)¹².

2015 - Attention

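Attention (Bahdanau et al., 2015)¹³ is one of the core innovations in neural machine translation and the key idea that enabled neural MT models to outperform classic phrase-based MT systems.
The main bottleneck of sequence-to-sequence learning is that it requires compressing the entire content of the source sequence into a fixed-size vector (Translator's note: the internal state vector of an LSTM or similar).
Attention alleviates this by allowing the decoder to look back at the hidden states of the source sequence, which are provided to the decoder as additional input in the form of a weighted average, as shown in Figure 11.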

Figure 11: Attention (Bahdanau et al., 2015)
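The "weighted average of hidden states" can be written in a few lines. This is a minimal sketch, and one substitution should be named plainly: it scores each encoder state with a dot product against the decoder state, which is one of the variants in Luong et al. (2015). Bahdanau et al. instead use an additive score computed by a small feed-forward network. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    # Score each source hidden state against the current decoder state
    # (dot-product scoring; Bahdanau et al. use an additive MLP score instead).
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    # Context vector = attention-weighted average of the encoder hidden states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return context, weights


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

The context vector is then fed to the decoder together with its usual input. The `weights` list is also what attention visualizations plot.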

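Different forms of attention have been proposed (Luong et al., 2015)¹⁴.
See this article for a brief overview.
Attention is widely applicable and potentially useful for any task that requires making decisions based on certain parts of the input.
It has been applied to constituency parsing (Vinyals et al., 2015)¹⁵, reading comprehension (Hermann et al., 2015)¹⁶, and one-shot learning (Vinyals et al., 2016)¹⁷, among others.
The input does not even need to be a sequence; it can consist of other representations, as in image captioning (Xu et al., 2015)¹⁸, shown in Figure 12 below.
A useful side effect of attention is that it offers a small glimpse into the inner workings of the model: the attention weights show which parts of the input are relevant for a particular output.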

Figure 12: Visualization of what an image captioning model attends to when generating the word "frisbee" (Xu et al., 2015)

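Attention is not restricted to looking at the input sequence;
self-attention can be used to look at the surrounding words in a sentence or document to obtain more context-sensitive word representations.
Multiple layers of self-attention are at the core of the Transformer architecture (Vaswani et al., 2017)¹⁹, the current state-of-the-art model for neural machine translation.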
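Self-attention can be sketched the same way, except that every position attends to every position of the same sequence. This is a deliberately simplified, single-head version of the Transformer's scaled dot-product attention. To keep it short, it assumes that queries, keys, and values are the input vectors themselves. The real model applies learned projection matrices and uses multiple heads, residual connections, and layer normalization. `self_attention` is a hypothetical name.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    # Each position's new representation is a weighted average over ALL
    # positions, with weights from scaled dot products (q . k / sqrt(d)).
    # Simplification: queries, keys and values are the raw inputs (no projections).
    d = len(xs[0])
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for vec in self_attention(tokens):
    print([round(x, 3) for x in vec])
```

Each output vector mixes information from its neighbours, which is what makes the representation context-dependent.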

2015 - Memory-based networks

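Attention can be seen as a form of fuzzy memory, where the memory consists of the model's past hidden states and the model chooses what to retrieve from it.
For a more detailed description of the relationship between attention and memory, see this post.
Many models with a more explicit memory than attention have been proposed.
Examples of such memory-based networks include Neural Turing Machines (Graves et al., 2014)²⁰, Memory Networks (Weston et al., 2015)²¹ and End-to-end Memory Networks (Sukhbaatar et al., 2015)²², Dynamic Memory Networks (Kumar et al., 2015)²³, the Differentiable Neural Computer (Graves et al., 2016)²⁴, and the Recurrent Entity Network (Henaff et al., 2017)²⁵.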

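Like attention, memory is often accessed based on similarity to the current state, and it can typically be both read from and written to.
Models differ in how they implement this mechanism and how they use the memory.
For instance, End-to-end Memory Networks process the input multiple times and update the memory to enable multiple steps of inference.
Neural Turing Machines have a location-based memory addressing mechanism, which allows them to learn simple programs such as sorting.
Memory-based models are typically applied to tasks where retaining information over long time spans is useful, such as language modelling and reading comprehension.
The concept of memory is very versatile: a knowledge base or a table can function as a memory, and writing to memory can be based on the entire input or only on parts of it.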
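Content-based reading over several hops can be sketched as follows. The sketch is loosely modelled on End-to-end Memory Networks, with simplifying assumptions stated up front. The key and value memories are given directly as vectors, so the embedding matrices of the real model are omitted. Between hops, the sketch updates the query (controller) state as u ← u + o, which is how we understand the multi-hop formulation. Memory writes are not shown. `memory_hops` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_hops(query, keys, values, hops=2):
    # Content-based addressing: attend over memory slots by similarity to the
    # current query, read a weighted sum of values, and repeat for several hops.
    u = list(query)
    attention_per_hop = []
    for _ in range(hops):
        p = softmax([dot(u, k) for k in keys])
        o = [sum(pi * v[j] for pi, v in zip(p, values)) for j in range(len(u))]
        u = [a + b for a, b in zip(u, o)]  # updated query for the next hop
        attention_per_hop.append(p)
    return u, attention_per_hop


keys = [[5.0, 0.0], [0.0, 5.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
state, attention = memory_hops([1.0, 0.0], keys, values, hops=2)
print(state, attention)
```

Running more hops lets later reads depend on what earlier reads retrieved. That is the "multiple steps of inference" mentioned above.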

2018 - Pretrained language models

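Pretrained word embeddings are context-agnostic and are only used to initialize the first layer of our models today.
Recently, a range of supervised tasks has been used to pretrain neural networks (Conneau et al., 2017; McCann et al., 2017; Subramanian et al., 2018)²⁶ ²⁷ ²⁸.
In contrast, language models only require unlabelled text for training.
This allows training to scale to billions of tokens, to new domains, and to new languages.
Pretrained language models were first proposed in 2015 (Dai & Le, 2015)²⁹, but only recently have they been shown to be beneficial across a diverse range of tasks.
Language model embeddings can be used as features in a target model (Peters et al., 2018)³⁰, or a language model can be fine-tuned on target task data (Ramachandran et al., 2017; Howard & Ruder, 2018)³¹ ³².
As shown in Figure 13, adding language model embeddings yields large improvements over state-of-the-art models across many different tasks.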

Figure 13: Improvements from language model embeddings over the state of the art (Peters et al., 2018)

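Pretrained language models have been shown to enable learning with significantly less data.
Because language models only require unlabelled data, they are particularly beneficial for low-resource languages where labelled data is scarce.
For more information about the potential of pretrained language models, see this article.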
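Why unlabelled text suffices can be shown directly: the training target at each position is simply the next token of the raw text itself. The sketch below makes that concrete. It uses a count-based bigram model as a stand-in for a neural language model, a deliberate substitution made for brevity. The function names and the two-sentence corpus are hypothetical.

```python
from collections import Counter, defaultdict


def next_token_pairs(text):
    # (context, target) training pairs derived from raw text alone: no labels needed.
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def bigram_lm(corpus):
    # Count-based bigram model (stand-in for a neural LM): P(cur | prev).
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1

    def prob(prev, cur):
        total = sum(counts[prev].values())
        return counts[prev][cur] / total if total else 0.0

    return prob


print(next_token_pairs("the cat sat"))
prob = bigram_lm(["the cat sat", "the dog sat"])
print(prob("the", "cat"))  # 0.5
```

Any text collection can be turned into training pairs this way. That is why language model pretraining scales to billions of tokens without annotation.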

Other milestones

Some other developments are less pervasive than the ones mentioned above, but they have still had wide-ranging impact.

Character-based representations

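Applying a CNN or an LSTM directly to the characters of a word to obtain a character-based representation has become fairly common, particularly for morphologically rich languages, for tasks where morphological information is important, and for tasks with many unknown words.
To the author's knowledge, character-based representations were first used for sequence labelling (Lample et al., 2016; Plank et al., 2016)³³ ³⁴.
Character-based representations remove the need for a fixed vocabulary, albeit at an increased computational cost, and they enable applications such as fully character-based neural machine translation (Ling et al., 2016; Lee et al., 2017)³⁵ ³⁶.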
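The vocabulary argument can be illustrated without any neural network. A word-level vocabulary maps an unseen inflection to `<unk>`, whereas the same word is still fully representable as a sequence of known characters. In a real model, a CNN or LSTM would then consume these character ids. That network is omitted here, and the tiny training word list and helper names are hypothetical.

```python
def build_vocab(items):
    # Map each distinct item to an integer id; id 0 is reserved for unknowns.
    vocab = {"<unk>": 0}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


train_words = ["walk", "walked", "talk"]
word_vocab = build_vocab(train_words)
char_vocab = build_vocab(ch for w in train_words for ch in w)


def word_ids(word):
    return word_vocab.get(word, word_vocab["<unk>"])


def char_ids(word):
    # The character sequence a char-CNN/LSTM would encode into a word vector.
    return [char_vocab.get(ch, char_vocab["<unk>"]) for ch in word]


print(word_ids("talked"))  # 0, i.e. <unk>: the word was never seen
print(char_ids("talked"))  # every character is known
```

The open-ended character inventory is what removes the fixed-vocabulary limit. The cost is running an extra encoder over every character.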

Adversarial learning

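Adversarial methods have taken the field of machine learning by storm and have been used in various forms in NLP.
Adversarial examples are increasingly used not only as a tool to probe models and understand their failure cases, but also to make them more robust (Jia & Liang, 2017)³⁷.
(Virtual) adversarial training, i.e., worst-case perturbations (Miyato et al., 2017; Yasunaga et al., 2018)³⁸ ³⁹, and domain-adversarial losses (Ganin et al., 2016; Kim et al., 2017)⁴⁰ ⁴¹ are useful forms of regularization that can make models more robust.
Generative adversarial networks (GANs) are not yet very effective for natural language generation (Semeniuta et al., 2018)⁴², but they are useful, for example, for matching distributions (Conneau et al., 2018)⁴³.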
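A worst-case perturbation can be illustrated on the smallest possible model. In this sketch a logistic regression classifier stands in for a text model, and the input vector `x` plays the role of a word embedding. The perturbation is the loss gradient with respect to the input, rescaled to L2 norm `epsilon`. The training objective adds the loss on the perturbed input as a regularizer. This is only a sketch of the labelled (non-virtual) variant, assuming the model, weights, and `epsilon` shown here. The virtual variant uses a KL-divergence objective that needs no labels and is not shown.

```python
import math


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def logistic_loss(w, b, x, y):
    # y is +1 or -1; loss = log(1 + exp(-y * z)).
    return math.log1p(math.exp(-y * logit(w, b, x)))


def input_gradient(w, b, x, y):
    # d loss / d x = -y * sigmoid(-y * z) * w
    s = 1.0 / (1.0 + math.exp(y * logit(w, b, x)))
    return [-y * s * wi for wi in w]


def adversarial_perturbation(w, b, x, y, epsilon=0.1):
    # Worst-case direction under a first-order approximation, L2-normalized.
    g = input_gradient(w, b, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [epsilon * gi / norm for gi in g]


def adversarial_training_loss(w, b, x, y, epsilon=0.1):
    r = adversarial_perturbation(w, b, x, y, epsilon)
    x_adv = [xi + ri for xi, ri in zip(x, r)]
    return logistic_loss(w, b, x, y) + logistic_loss(w, b, x_adv, y)


w, b, x, y = [1.0, -2.0], 0.0, [0.5, 0.2], 1
r = adversarial_perturbation(w, b, x, y)
print(r, adversarial_training_loss(w, b, x, y))
```

During training, the gradient of `adversarial_training_loss` with respect to the model parameters would be minimized. The perturbation itself is treated as a constant.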

Reinforcement learning

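Reinforcement learning has been shown to be useful for tasks with a temporal dependency, such as selecting data during training (Fang et al., 2017; Wu et al., 2018)⁴⁴ ⁴⁵ and modelling dialogue (Liu et al., 2018)⁴⁶.
RL is also effective for directly optimizing a non-differentiable end metric such as ROUGE or BLEU instead of a surrogate loss such as cross-entropy, as shown in summarization (Paulus et al., 2018; Celikyilmaz et al., 2018)⁴⁷ ⁴⁸ and machine translation (Ranzato et al., 2016)⁴⁹.
Similarly, inverse reinforcement learning can be useful in settings where the reward is too complex to be specified, such as visual storytelling (Wang et al., 2018)⁵⁰.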
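The core trick for optimizing a non-differentiable metric is the score-function (REINFORCE) estimator. The policy samples an output, and the metric is used only as a scalar reward that scales ∇ log p(sample). This minimal sketch makes several assumptions. The "policy" is a softmax over a three-word vocabulary, and an exact-match reward stands in for BLEU or ROUGE. A running-average baseline reduces variance. All names and hyperparameters are hypothetical.

```python
import math
import random

TOKENS = ["good", "okay", "bad"]  # hypothetical output vocabulary
REFERENCE = "good"


def reward(sampled):
    # Stand-in for a non-differentiable metric such as BLEU/ROUGE: exact match.
    return 1.0 if sampled == REFERENCE else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reinforce(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(TOKENS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(len(TOKENS)), weights=probs)[0]  # sample an output
        advantage = reward(TOKENS[k]) - baseline
        # d log p(k) / d logit_j = 1[j == k] - p_j
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == k else 0.0) - probs[j])
        baseline = 0.9 * baseline + 0.1 * reward(TOKENS[k])
    return softmax(logits)


print([round(p, 3) for p in train_reinforce()])
```

The reward function is never differentiated. This is why the same recipe works with sequence-level metrics, where sampling whole output sequences replaces sampling single tokens.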

Non-neural milestones

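Starting in 1998, the FrameNet project (Baker et al., 1998)⁵¹ introduced the task of semantic role labelling, a form of shallow semantic parsing that is still actively researched today.
In the early 2000s, the shared tasks organized at the Conference on Natural Language Learning (CoNLL) catalyzed research on core NLP tasks such as chunking (Tjong Kim Sang et al., 2000)⁵², named entity recognition (Tjong Kim Sang et al., 2003)⁵³, and dependency parsing (Buchholz et al., 2006)⁵⁴.
Many of the CoNLL shared task datasets are still the standard benchmarks for evaluation today.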

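In 2001, conditional random fields (CRF; Lafferty et al., 2001)⁵⁵, one of the most influential classes of sequence labelling methods, were introduced; the paper won the Test-of-time award at ICML 2011.
A CRF layer is a core part of current state-of-the-art models for sequence labelling problems with label interdependencies, such as named entity recognition (Lample et al., 2016)⁵⁶.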
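What a CRF layer adds on top of per-token scores is a transition score between adjacent labels. At prediction time, the best label sequence is then found jointly with the Viterbi algorithm. This sketch covers decoding only; training with the forward algorithm is not shown. It assumes hypothetical emission scores and a hypothetical transition table that heavily penalizes `O -> I`, mimicking a BIO constraint. An exhaustive search is included as a sanity check.

```python
import itertools

TAGS = ["O", "B", "I"]
# Hypothetical scores: emissions[t][tag] per token, transitions[(prev, cur)].
emissions = [
    {"O": 1.0, "B": 2.0, "I": 0.5},
    {"O": 0.2, "B": 0.1, "I": 1.5},
    {"O": 1.2, "B": 0.3, "I": 1.0},
    {"O": 0.1, "B": 0.4, "I": 2.0},
]
transitions = {(p, c): 0.0 for p in TAGS for c in TAGS}
transitions[("O", "I")] = -10.0  # discourage I directly after O


def path_score(path, emissions, transitions):
    # Sum of per-position emission scores plus pairwise transition scores.
    score = sum(emissions[t][y] for t, y in enumerate(path))
    score += sum(transitions[(a, b)] for a, b in zip(path, path[1:]))
    return score


def viterbi(emissions, transitions, tags):
    # best[t][y]: highest score of any tag prefix ending in tag y at position t.
    best = [{y: emissions[0][y] for y in tags}]
    back = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] + transitions[(p, y)])
            best[t][y] = best[t - 1][prev] + transitions[(prev, y)] + emissions[t][y]
            back[t][y] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def brute_force_best(emissions, transitions, tags):
    # Exhaustive search; only feasible for tiny inputs, used as a sanity check.
    return max(itertools.product(tags, repeat=len(emissions)),
               key=lambda p: path_score(p, emissions, transitions))


print(viterbi(emissions, transitions, TAGS))  # ['B', 'I', 'I', 'I']
```

Viterbi runs in O(n·|tags|²) instead of the O(|tags|ⁿ) of exhaustive search. That efficiency is what makes jointly decoding interdependent labels practical.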

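In 2002, the bilingual evaluation understudy metric (BLEU; Papineni et al., 2002)⁵⁷ was proposed; it enabled MT systems to scale up and is still the standard metric for MT evaluation today.
In the same year, the structured perceptron (Collins, 2002)⁵⁸ was introduced, laying the foundation for work on structured prediction.
At the same conference, sentiment analysis, one of the most popular and widely studied NLP tasks, was introduced (Pang et al., 2002)⁵⁹.
All three papers won the Test-of-time award at NAACL 2018.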
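The two ingredients of BLEU are clipped n-gram precision and a brevity penalty for short outputs. The sketch below shows both. It is a simplified single-reference, sentence-level variant without smoothing, combining the precisions for n = 1..4 with a geometric mean. The original metric is defined at the corpus level and supports multiple references, so scores from this sketch should not be compared with published BLEU numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
print(sentence_bleu(reference, reference))                   # 1.0
print(sentence_bleu("the cat sat on".split(), reference))    # exp(-0.5) ~ 0.607
```

Without the brevity penalty, the truncated candidate would score a perfect 1.0, because every n-gram it contains appears in the reference.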

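In 2003, latent Dirichlet allocation (LDA; Blei et al., 2003)⁶⁰, one of the most widely used techniques in machine learning, was introduced; it is still the standard approach to topic modelling.
In 2004, novel max-margin models were proposed that are better suited than SVMs for capturing correlations in structured data (Taskar et al., 2004a; 2004b)⁶¹ ⁶².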
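LDA is easiest to grasp through its generative story. Each topic is a distribution over words, and each document draws its own mixture of topics. Each word is produced by first picking a topic from that mixture and then a word from that topic. This sketch samples a synthetic corpus from that story only. It does not show inference (Blei et al. use variational inference; Gibbs sampling is another common choice). The symmetric hyperparameters `alpha` and `beta`, the sizes, and the function names are hypothetical. Dirichlet draws are built from normalized gamma variates.

```python
import random


def sample_dirichlet(alpha, rng):
    # A Dirichlet sample is a vector of independent Gamma(a_i, 1) draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    rng = random.Random(seed)
    # One word distribution per topic.
    topics = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        theta = sample_dirichlet([alpha] * num_topics, rng)  # document's topic mixture
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(num_topics), weights=theta)[0]      # pick a topic
            w = rng.choices(range(vocab_size), weights=topics[z])[0]  # pick a word
            doc.append((z, w))
        corpus.append(doc)
    return topics, corpus


topics, corpus = generate_corpus(num_docs=3, doc_len=10, num_topics=2, vocab_size=8)
print([w for _, w in corpus[0]])
```

Topic modelling runs this process in reverse. Given only the word ids, it infers the hidden topics and mixtures.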

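In 2006, OntoNotes (Hovy et al., 2006)⁶³, a large multilingual corpus with multiple annotation layers and high inter-annotator agreement, was released.
OntoNotes has been used for training and evaluating a variety of tasks, such as dependency parsing and coreference resolution.
In 2008, Milne and Witten (2008)⁶⁴ described how Wikipedia can be used to enrich machine learning methods.
Wikipedia remains one of the most useful resources for machine learning, whether for entity linking and disambiguation, language modelling, as a knowledge base, or for a variety of other tasks.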

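In 2009, the idea of distant supervision (Mintz et al., 2009)⁶⁵ was proposed.
Distant supervision leverages information from heuristics or existing knowledge bases to generate noisy patterns, which can be used to automatically extract examples from large corpora.
Distant supervision has been used extensively and has become a standard technique in relation extraction, information extraction, sentiment analysis, and other tasks.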
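The labelling heuristic behind distant supervision fits in a few lines. If a sentence mentions both entities of a knowledge-base fact, it is labelled with that fact's relation. The sketch uses a hypothetical two-fact knowledge base and plain substring matching in place of real entity recognition and linking. The second example sentence is deliberately chosen to show where the label noise comes from.

```python
# Hypothetical knowledge base: (entity1, entity2) -> relation.
KB = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Google", "Mountain View"): "headquartered_in",
}


def distant_label(sentences, kb):
    # Label every sentence that mentions both entities of a KB fact with its
    # relation. Substring matching stands in for NER and entity linking.
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples


sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last year.",  # labelled too, but wrongly: noise
    "Google opened an office in Zurich.",        # no KB pair mentioned: no label
]
for example in distant_label(sentences, KB):
    print(example)
```

Both Obama sentences receive the `born_in` label, although only the first one expresses it. This noise is exactly what distantly supervised models must learn to tolerate.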

Thanks to Djamé Seddah, Daniel Khashabi, Shyam Upadhyay, Chris Dyer, and Michael Roth for providing pointers (see the Twitter thread).

References

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … Dean, J. (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. ArXiv Preprint ArXiv:1609.08144.
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156-3164).
Lebret, R., Grangier, D., & Auli, M. (2016). Generating Text from Structured Data with Application to the Biography Domain. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Retrieved from http://arxiv.org/abs/1603.07771
Loyola, P., Marrese-Taylor, E., & Matsuo, Y. (2017). A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes. In ACL 2017. Retrieved from http://arxiv.org/abs/1704.04856
Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., & Hinton, G. (2015). Grammar as a Foreign Language. Advances in Neural Information Processing Systems.
Gillick, D., Brunk, C., Vinyals, O., & Subramanya, A. (2016). Multilingual Language Processing From Bytes. In NAACL (pp. 1296–1306). Retrieved from http://arxiv.org/abs/1512.00103
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … Dean, J. (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. ArXiv Preprint ArXiv:1609.08144.
Kalchbrenner, N., Espeholt, L., Simonyan, K., Oord, A. van den, Graves, A., & Kavukcuoglu, K. (2016). Neural Machine Translation in Linear Time. ArXiv Preprint ArXiv:1610.10099. Retrieved from http://arxiv.org/abs/1610.10099
Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional Sequence to Sequence Learning. ArXiv Preprint ArXiv:1705.03122. Retrieved from http://arxiv.org/abs/1705.03122
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
Chen, M. X., Foster, G., & Parmar, N. (2018). The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In Proceedings of ACL 2018.
Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR 2015.
Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of EMNLP 2015. Retrieved from http://arxiv.org/abs/1508.04025
Vinyals, O., Kaiser, L., Koo, T., Petrov, S., Sutskever, I., & Hinton, G. (2015). Grammar as a Foreign Language. Advances in Neural Information Processing Systems.
Hermann, K. M., Kočiský, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., & Blunsom, P. (2015). Teaching Machines to Read and Comprehend. Advances in Neural Information Processing Systems. Retrieved from http://arxiv.org/abs/1506.03340v1
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching Networks for One Shot Learning. In Advances in Neural Information Processing Systems 29 (NIPS 2016). Retrieved from http://arxiv.org/abs/1606.04080
Xu, K., Courville, A., Zemel, R. S., & Bengio, Y. (2015). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of ICML 2015.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural turing machines. arXiv preprint arXiv:1410.5401.
Weston, J., Chopra, S., & Bordes, A. (2015). Memory Networks. In Proceedings of ICLR 2015.
Sukhbaatar, S., Szlam, A., Weston, J., & Fergus, R. (2015). End-To-End Memory Networks. In Proceedings of NIPS 2015. Retrieved from http://arxiv.org/abs/1503.08895
Kumar, A., Irsoy, O., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., … & Socher, R. (2016, June). Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning (pp. 1378-1387).
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., … Hassabis, D. (2016). Hybrid computing using a neural network with dynamic external memory. Nature.
Henaff, M., Weston, J., Szlam, A., Bordes, A., & LeCun, Y. (2017). Tracking the World State with Recurrent Entity Networks. In Proceedings of ICLR 2017.
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., & Bordes, A. (2017). Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.
McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in Translation: Contextualized Word Vectors. In Advances in Neural Information Processing Systems.
Subramanian, S., Trischler, A., Bengio, Y., & Pal, C. J. (2018). Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning. In Proceedings of ICLR 2018.
Dai, A. M., & Le, Q. V. (2015). Semi-supervised Sequence Learning. Advances in Neural Information Processing Systems (NIPS '15). Retrieved from http://arxiv.org/abs/1511.01432
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL-HLT 2018.
Ramachandran, P., Liu, P. J., & Le, Q. V. (2017). Unsupervised Pretraining for Sequence to Sequence Learning. In Proceedings of EMNLP 2017.
Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. In Proceedings of ACL 2018. Retrieved from http://arxiv.org/abs/1801.06146
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition. In NAACL-HLT 2016.
Plank, B., Søgaard, A., & Goldberg, Y. (2016). Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.
Ling, W., Trancoso, I., Dyer, C., & Black, A. (2016). Character-based Neural Machine Translation. In ICLR. Retrieved from http://arxiv.org/abs/1511.04586
Lee, J., Cho, K., & Bengio, Y. (2017). Fully Character-Level Neural Machine Translation without Explicit Segmentation. In Transactions of the Association for Computational Linguistics.
Jia, R., & Liang, P. (2017). Adversarial Examples for Evaluating Reading Comprehension Systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.
Miyato, T., Dai, A. M., & Goodfellow, I. (2017). Adversarial Training Methods for Semi-supervised Text Classification. In Proceedings of ICLR 2017.
Yasunaga, M., Kasai, J., & Radev, D. (2018). Robust Multilingual Part-of-Speech Tagging via Adversarial Training. In Proceedings of NAACL 2018. Retrieved from http://arxiv.org/abs/1711.04903
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., … Lempitsky, V. (2016). Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research, 17.
Kim, Y., Stratos, K., & Kim, D. (2017). Adversarial Adaptation of Synthetic or Stale Data. In Proceedings of ACL (pp. 1297–1307).
Semeniuta, S., Severyn, A., & Gelly, S. (2018). On Accurate Evaluation of GANs for Language Generation. Retrieved from http://arxiv.org/abs/1806.04936
Conneau, A., Lample, G., Ranzato, M., Denoyer, L., & Jégou, H. (2018). Word Translation Without Parallel Data. In Proceedings of ICLR 2018. Retrieved from http://arxiv.org/abs/1710.04087
Fang, M., Li, Y., & Cohn, T. (2017). Learning how to Active Learn: A Deep Reinforcement Learning Approach. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Retrieved from https://arxiv.org/pdf/1708.02383v1.pdf
Wu, J., Li, L., & Wang, W. Y. (2018). Reinforced Co-Training. In Proceedings of NAACL-HLT 2018.
Liu, B., Tür, G., Hakkani-Tür, D., Shah, P., & Heck, L. (2018). Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems. In Proceedings of NAACL-HLT 2018.
Paulus, R., Xiong, C., & Socher, R. (2018). A deep reinforced model for abstractive summarization. In Proceedings of ICLR 2018.
Celikyilmaz, A., Bosselut, A., He, X., & Choi, Y. (2018). Deep communicating agents for abstractive summarization. In Proceedings of NAACL-HLT 2018.
Ranzato, M. A., Chopra, S., Auli, M., & Zaremba, W. (2016). Sequence level training with recurrent neural networks. In Proceedings of ICLR 2016.
Wang, X., Chen, W., Wang, Y.-F., & Wang, W. Y. (2018). No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling. In Proceedings of ACL 2018. Retrieved from http://arxiv.org/abs/1804.09160
Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998, August). The berkeley framenet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1 (pp. 86-90). Association for Computational Linguistics.
Tjong Kim Sang, E. F., & Buchholz, S. (2000, September). Introduction to the CoNLL-2000 shared task: Chunking. In Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning-Volume 7 (pp. 127-132). Association for Computational Linguistics.
Tjong Kim Sang, E. F., & De Meulder, F. (2003, May). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 142-147). Association for Computational Linguistics.
Buchholz, S., & Marsi, E. (2006, June). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the tenth conference on computational natural language learning (pp. 149-164). Association for Computational Linguistics.
Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural Architectures for Named Entity Recognition. In NAACL-HLT 2016.
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 311-318). Association for Computational Linguistics.
Collins, M. (2002, July). Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 1-8). Association for Computational Linguistics.
Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine Learning research, 3(Jan), 993-1022.
Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.
Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2006, June). OntoNotes: the 90% solution. In Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers (pp. 57-60). Association for Computational Linguistics.
Milne, D., & Witten, I. H. (2008, October). Learning to link with wikipedia. In Proceedings of the 17th ACM conference on Information and knowledge management (pp. 509-518). ACM.
Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009, August). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2 (pp. 1003-1011). Association for Computational Linguistics.
