
[Review] Summary of Context Embedding

Posted at 2018-05-10

Foreword

This article reviews the research papers below.

What motivated me so much is Google's blog post about a novel AI system called Duplex. Duplex is able to converse with strangers to fulfill specific tasks such as reserving a restaurant or booking a flight ticket. Ever since the Turing test, many researchers have been drawn to the immense domain of human-like dialogue systems powered by AI.

Agenda

  1. Summary of A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
  2. Summary of Sequential Dialogue Context Modeling for Spoken Language Understanding

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses

Authors:
Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan

1. Introduction

The work of Ritter et al. (2011), for example, demonstrates that a response generation system can be constructed from Twitter conversations using statistical machine translation techniques, where a status post by a Twitter user is “translated” into a plausible-looking response. However, it does not address the challenge of generating context-sensitive responses within a conversation.
The ability to take into account previous utterances is key to building dialog systems that can keep conversations active and engaging.
They propose to address the challenge of context-sensitive response generation by using continuous representations, or embeddings, of words and phrases to compactly encode semantic and syntactic similarity.
They present a neural network architecture for response generation that is both context-sensitive and data-driven. As such, it can be trained end to end on massive amounts of social media data.

2. Recurrent Language Model

In this section, they give a brief overview of the core model behind their approach, the recurrent language model (RLM) (Mikolov et al., 2010).
An RLM is a generative model of sentences, i.e. given a sentence $s = s_1, s_2, ... , s_T$, it models:
$p(s) = \prod^T_{t=1} p(s_t | s_1, s_2, ... , s_{t-1}) $
The model is parametrised by three weight matrices:
$\Theta_{RNN} = \langle W_{in}, W_{out}, W_{hh} \rangle$

The forward pass is:

h_t = \sigma(s_t^T W_{in} + h_{t-1}^T W_{hh})\\
o_t = h_t^T W_{out}\\
p(s_t = w | s_1, s_2 , ... , s_{t-1}) = \frac{\exp(o_{tw})}{\sum^V_{v=1}\exp(o_{tv})}\\
L(s) = - \sum^T_{t=1} \log p(s_t | s_1, s_2 , ... , s_{t-1})

The recurrence is then unrolled backwards in time and the gradients are computed with BPTT (backpropagation through time).

Slides summarizing the Mikolov et al. (2010) RLM: https://pdfs.semanticscholar.org/presentation/bba8/a2c9b9121e7c78e91ea2a68630e77c0ad20f.pdf
Research paper: http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf
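To make the forward pass concrete, here is a minimal NumPy sketch of the RLM recurrence and the per-sentence loss. The vocabulary size, hidden size, initialisation, and one-hot encoding below are my own assumptions for illustration, not the setup used in the paper.

```python
import numpy as np

V, H = 10_000, 512                        # assumed vocabulary and hidden sizes
rng = np.random.default_rng(0)
W_in  = rng.normal(0, 0.01, (V, H))       # input word weights
W_hh  = rng.normal(0, 0.01, (H, H))       # hidden-to-hidden weights
W_out = rng.normal(0, 0.01, (H, V))       # hidden-to-output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rlm_step(word_id, h_prev):
    """One recurrence step: h_t = sigma(s_t W_in + h_{t-1} W_hh)."""
    s_t = np.zeros(V); s_t[word_id] = 1.0          # one-hot word vector
    h_t = sigmoid(s_t @ W_in + h_prev @ W_hh)
    o_t = h_t @ W_out                              # unnormalised scores
    p_t = np.exp(o_t - o_t.max()); p_t /= p_t.sum()  # softmax over vocabulary
    return h_t, p_t

def sentence_loss(word_ids):
    """Negative log-likelihood L(s) = -sum_t log p(s_t | s_1..s_{t-1})."""
    h, loss = np.zeros(H), 0.0
    for t in range(1, len(word_ids)):
        h, p = rlm_step(word_ids[t - 1], h)        # condition on the previous word
        loss -= np.log(p[word_ids[t]])
    return loss
```

Training would minimise this loss with gradients obtained by unrolling the recurrence (BPTT), which the sketch omits.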

3. Context-Sensitive Models

To clarify the role of each utterance, they distinguish three entities in a conversation:
1. r : response (the target to predict)
2. c : context (a feature for prediction: the previous utterances)
3. m : message (a feature for prediction: the current utterance)
Based on this decomposition, they propose three models:

  1. Tripled Language Model
  2. Dynamic-Context Generative Model I
  3. Dynamic-Context Generative Model II

So in this section, let's go through them one by one.

1. Tripled Language Model

So far we have seen the RNN architecture itself and the general idea of encoding context into the input.
In this subsection, I would like to elaborate on the simplest method of context embedding: the Tripled Language Model treats the context, message, and response as one long sequence and trains an ordinary RLM on it, so the response probability factorises as

p(r|c,m) = \prod^T_{t=1} p(r_t | r_1, r_2 , ... , r_{t-1}, c, m)

c : context (any previously exchanged utterances)
m : message (the current utterance)
r : response (the response message)

Simply put, we condition on the whole conversation history to predict each word of the response.
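As a rough sketch of this idea, the triple (c, m, r) can simply be concatenated into one token sequence and fed to a plain RLM; the `tokenize` helper and the `<eos>` boundary token below are hypothetical choices, not the paper's exact preprocessing.

```python
EOS = "<eos>"                     # assumed sentence-boundary token

def tokenize(text):
    # hypothetical whitespace tokenizer; the paper's preprocessing may differ
    return text.lower().split()

def build_triple_sequence(context, message, response):
    """Concatenate (c, m, r) into a single sequence for an ordinary RLM."""
    return (tokenize(context) + [EOS]
            + tokenize(message) + [EOS]
            + tokenize(response) + [EOS])

seq = build_triple_sequence(
    "because of your game?",      # c: previous turn(s)
    "yeah i'm on my way now",     # m: current message
    "ok good luck!",              # r: response to be generated
)
# an RLM trained on such concatenated sequences implicitly models p(r | c, m)
```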

2. Dynamic-Context Generative Model I


In this architecture, they encode the context and message with a simple feed-forward neural network (FNN) with $L$ layers, and its output is then merged into the hidden state of the RNN.

Algorithm
1. The context and message are represented as a single bag-of-words vector $b_{cm} \in R^V$ (word indices mapped to one-hot vectors and summed).
2. $b_{cm}$ is fed into the FNN.
3. The output of the FNN is merged into the hidden layer of the RNN.

Params
$\langle W_{in}, W_{hh}, W_{out}, [W^l_f]^L_{l=1} \rangle$
where $[W^l_f]^L_{l=1}$ are the weights used in the FNN.

In the FNN:

k_1 = b^T_{cm} W^1_f\\
k_l = \sigma (k^T_{l-1} W^l_f) \quad for \space l = 2, ... , L

In the RNN:

h_t = \sigma(h^T_{t-1}W_{hh} + k_L + r^T_t W_{in})\\
o_t = h^T_t W_{out}\\
p(r_t | r_1, r_2 , ... , r_{t-1}, c, m) = softmax(o_t)

With this architecture, we achieve the two things below.
1. The context is represented with continuous embeddings rather than raw words.
2. The RLM can properly and naturally take the context into account while generating the response.
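Putting the FNN encoder and the biased RNN decoder together, here is a minimal NumPy sketch of a DCGM-I-style forward pass following the equations above. The sizes, initialisation, and two-layer FNN are my own assumptions; see the Keras repo linked below for an actual implementation.

```python
import numpy as np

V, H, DEPTH = 10_000, 512, 2              # assumed vocabulary, hidden size, FNN depth
rng = np.random.default_rng(0)
W_in  = rng.normal(0, 0.01, (V, H))       # decoder input weights
W_hh  = rng.normal(0, 0.01, (H, H))       # decoder recurrent weights
W_out = rng.normal(0, 0.01, (H, V))       # decoder output weights
W_f   = [rng.normal(0, 0.01, (V, H))] + [rng.normal(0, 0.01, (H, H))
                                         for _ in range(DEPTH - 1)]   # FNN weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_bow(word_ids):
    """Bag-of-words vector b_cm for the concatenated context and message."""
    b = np.zeros(V)
    for i in word_ids:
        b[i] += 1.0
    return b

def encode_context(b_cm):
    """FNN encoder: k_1 = b_cm W_f^1, k_l = sigma(k_{l-1} W_f^l)."""
    k = b_cm @ W_f[0]
    for W in W_f[1:]:
        k = sigmoid(k @ W)
    return k                               # k_L

def decoder_step(resp_word_id, h_prev, k_L):
    """One decoder step, biased at every step by the context vector k_L."""
    r_t = np.zeros(V); r_t[resp_word_id] = 1.0
    h_t = sigmoid(h_prev @ W_hh + k_L + r_t @ W_in)
    o_t = h_t @ W_out
    p_t = np.exp(o_t - o_t.max()); p_t /= p_t.sum()   # distribution over response words
    return h_t, p_t
```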

3. Dynamic-Context Generative Model II

The issue with Dynamic-Context Generative Model I is that it does not distinguish between the context and the current message: it simply mixes both into one bag-of-words vector.

To address this issue, they modify the input layer of the FNN so that it takes the two bags of words as separate inputs and concatenates their projections.

k_1 = [b^T_c W^1_f, b^T_m W^1_f] \\
k_l = \sigma(k^T_{l-1} W^l_f) \quad for \space l = 2, ... , L

With this modification, the model is able to handle c and m individually.
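The change relative to the DCGM-I sketch above is only in the first encoder layer: the context and message get separate bag-of-words inputs whose projections are concatenated. Again a rough sketch under my own shape assumptions (the deeper layers must accept the doubled width 2H); the decoder is unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_context_dcgm2(b_c, b_m, W_f):
    """DCGM-II encoder: k_1 = [b_c W_f^1, b_m W_f^1], then sigmoid layers.

    b_c, b_m : bag-of-words vectors for context and message (length V)
    W_f      : list of FNN weights; W_f[0] has shape (V, H), deeper
               layers have shape (2H, 2H) so the concatenation fits.
    """
    k = np.concatenate([b_c @ W_f[0], b_m @ W_f[0]])   # keeps c and m separate
    for W in W_f[1:]:
        k = sigmoid(k @ W)
    return k                                           # k_L, fed to the same decoder
```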

Implementation

Please refer to this GitHub repo:
https://github.com/Rowing0914/Dynamic_Context_Generative_Model_keras

2. Sequential Dialogue Context Modeling for Spoken Language Understanding

Authors: Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, Larry Heck

1. Introduction
