Foreword
This article reviews the two research papers below.
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
- Sequential Dialogue Context Modeling for Spoken Language Understanding
What also motivated me is Google's blog post about its novel AI system called Duplex. Duplex can converse with strangers to fulfil a specific task, such as reserving a table at a restaurant or booking a flight. Ever since the Turing test, researchers have long been drawn to the immense domain of human-like dialogue systems powered by AI.
Agenda
- Summary of A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
- Summary of Sequential Dialogue Context Modeling for Spoken Language Understanding
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Authors:
Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan
1. Introduction
The work of Ritter et al. (2011), for example, demonstrates that a response generation system can be constructed from Twitter conversations using statistical machine translation techniques, where a status post by a Twitter user is “translated” into a plausible-looking response. However, it does not address the challenge of generating context-sensitive responses within a conversation.
The ability to take into account previous utterances is key to building dialog systems that can keep conversations active and engaging.
We propose to address the challenge of context-sensitive response generation by using continuous representations or embeddings of words and phrases to compactly encode semantic and syntactic similarity.
We present a neural network architecture for response generation that is both context-sensitive and data-driven. As such, it can be trained from end to end on massive amounts of social media data.
2. Recurrent Language Model
In this section, they give a brief overview of the core model behind their approach, the recurrent language model (RLM) (Mikolov et al., 2010).
An RLM is a generative model of sentences, i.e. given a sentence $s = s_1, s_2, \ldots, s_T$, it models:
$p(s) = \prod^T_{t=1} p(s_t \mid s_1, s_2, \ldots, s_{t-1})$
The model is parametrised by three weight matrices:
$\Theta_{RNN} = \langle W_{in}, W_{out}, W_{hh} \rangle$
As for the forward pass:
h_t = \sigma(s_t^T W_{in} + h_{t-1}^T W_{hh})\\
o_t = h_t^T W_{out}\\
p(s_t = w \mid s_1, s_2, \ldots, s_{t-1}) = \frac{\exp(o_{tw})}{\sum^V_{v=1}\exp(o_{tv})}\\
L(s) = - \sum^T_{t=1} \log p(s_t \mid s_1, s_2, \ldots, s_{t-1})
The recurrence is then unrolled backwards in time and the gradients are computed with BPTT (backpropagation through time).
Slide summary of the Mikolov et al. (2010) RLM: https://pdfs.semanticscholar.org/presentation/bba8/a2c9b9121e7c78e91ea2a68630e77c0ad20f.pdf
Research paper: http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf
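To make the forward pass above concrete, here is a minimal NumPy sketch of a single RLM evaluation (my own illustration, not the authors' code); the dimensions, random initialisation, and explicit one-hot encoding are placeholder assumptions.

```python
import numpy as np

V, H = 1000, 128          # vocabulary size and hidden size (arbitrary choices)
rng = np.random.default_rng(0)

# The three weight matrices of Theta_RNN
W_in  = rng.normal(scale=0.1, size=(V, H))   # input (embedding) weights
W_hh  = rng.normal(scale=0.1, size=(H, H))   # recurrent weights
W_out = rng.normal(scale=0.1, size=(H, V))   # output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rlm_negative_log_likelihood(token_ids):
    """L(s) = -sum_t log p(s_t | s_1, ..., s_{t-1}) for one sentence."""
    h = np.zeros(H)                          # initial hidden state
    loss = 0.0
    for t in range(1, len(token_ids)):
        s_prev = np.zeros(V)
        s_prev[token_ids[t - 1]] = 1.0       # one-hot vector of the previous word
        h = sigmoid(s_prev @ W_in + h @ W_hh)    # h_t
        o = h @ W_out                            # o_t
        p = softmax(o)                           # p(. | s_1, ..., s_{t-1})
        loss -= np.log(p[token_ids[t]])
    return loss

print(rlm_negative_log_likelihood([3, 17, 42, 9]))   # toy sentence of word ids
```

In practice the gradients of this loss would be computed with BPTT and the weight matrices updated accordingly, as described above.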
3. Context-Sensitive Models
To clarify the role of each sentence, they distinguish three entities in a conversation:
- r : response (the target to predict)
- c : context (feature for prediction: the previous utterances)
- m : message (feature for prediction: the current utterance)

- Tripled Language Model
- Dynamic-Context Generative Model I
- Dynamic-Context Generative Model II
In this section, let's go through these models one by one.
1. Tripled Language Model
So far we have seen the foundation for encoding the context into the input and the architecture of the RNN itself.
In this subsection, I would like to elaborate on how the context is embedded.
p(r \mid c, m) = \prod^T_{t=1} p(r_t \mid r_1, r_2, \ldots, r_{t-1}, c, m)
c : context (any previously exchanged dialogue)
m : message (the current utterance)
r : response (the response to be generated)
Simply put, the tripled language model concatenates the context, the message, and the response into one long sentence, so the entire conversation history is considered when predicting the next utterance.
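As a toy illustration of this preprocessing step (my own sketch, not the authors' pipeline; the example utterances and the on-the-fly vocabulary are made up), the three parts are simply concatenated into one token sequence on which an RLM can be trained:

```python
# Hypothetical preprocessing for the tripled language model: context, message,
# and response tokens are concatenated into a single training sequence.
context  = ["do", "you", "like", "coffee"]
message  = ["yes", "a", "lot"]
response = ["me", "too"]

vocab = {}                       # toy vocabulary built on the fly
def token_id(word):
    return vocab.setdefault(word, len(vocab))

triple = [token_id(w) for w in context + message + response]
print(triple)                    # one training sequence s_1, ..., s_T for the RLM
```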
2. Dynamic-Context Generative Model I

In this architecture, they encode the context and the message into the input using a simple multi-layer feed-forward neural network (FNN); its output is then propagated and merged into the hidden layer of the RNN.
Algorithm
- Using a bag-of-words representation ($b_{cm} \in \mathbb{R}^V$), words are mapped to vocabulary indices and the context and message are encoded together as a single vector of word counts
- $b_{cm}$ is the input to the FNN
- The output of the FNN is merged into the hidden layer of the RNN
Params
$\langle W_{in}, W_{hh}, W_{out}, [W^l_f]^L_{l=1} \rangle$
where $[W^l_f]^L_{l=1}$ are the weights used in the FNN.
In the FNN:
k_1 = b^T_{cm} W^1_f \quad \text{(first layer)}\\
k_l = \sigma(k^T_{l-1} W^l_f) \quad \text{for } l = 2, \ldots, L
In the RNN:
h_t = \sigma(h^T_{t-1} W_{hh} + k_L + r^T_t W_{in})\\
o_t = h^T_t W_{out}\\
p(r_t \mid r_1, r_2, \ldots, r_{t-1}, c, m) = softmax(o_t)
With this architecture, we achieve the two things below.
- The context is represented as a continuous embedding, in the same spirit as word embeddings.
- The RLM can properly and naturally take the context into account.
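Here is a minimal NumPy sketch of the DCGM-I equations above (my own illustration, not the authors' implementation); the dimensions, initialisation, and the way the response token is fed into the decoder step are illustrative assumptions.

```python
import numpy as np

V, H, L = 1000, 128, 2        # vocab size, hidden size, FNN depth (arbitrary)
rng = np.random.default_rng(0)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
softmax = lambda x: np.exp(x - x.max()) / np.exp(x - x.max()).sum()

# RLM parameters plus the FNN weights [W_f^l]
W_in  = rng.normal(scale=0.1, size=(V, H))
W_hh  = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(H, V))
W_f   = [rng.normal(scale=0.1, size=(V, H))] + \
        [rng.normal(scale=0.1, size=(H, H)) for _ in range(L - 1)]

def encode_context(b_cm):
    """FNN encoder: k_1 = b_cm^T W_f^1, then k_l = sigma(k_{l-1}^T W_f^l)."""
    k = b_cm @ W_f[0]                # first layer is a linear projection
    for l in range(1, L):
        k = sigmoid(k @ W_f[l])
    return k                         # k_L: fixed-length context/message summary

def decode_step(h_prev, k_L, r_token_id):
    """One RNN step biased by k_L: h_t = sigma(h_{t-1}^T W_hh + k_L + r_t^T W_in)."""
    r_onehot = np.zeros(V)
    r_onehot[r_token_id] = 1.0       # one-hot of the current response word
    h = sigmoid(h_prev @ W_hh + k_L + r_onehot @ W_in)
    p = softmax(h @ W_out)           # distribution over the next response word
    return h, p

# Toy usage: bag-of-words over context+message, then one decoding step.
b_cm = np.zeros(V); b_cm[[3, 17, 42]] = 1.0
k_L = encode_context(b_cm)
h, p = decode_step(np.zeros(H), k_L, r_token_id=7)
print(p.shape)                       # (1000,) -- a distribution over the vocabulary
```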
3. Dynamic-Context Generative Model II
The issue with Dynamic-Context Generative Model I is that it does not distinguish between the context and the current message: both are simply mixed into a single bag-of-words vector.
To address this issue, they modified the input layer of the FNN so that it takes the two inputs separately.
k_1 = [b^T_c W^1_f, b^T_m W^1_f] \\
k_l = \sigma(k^T_{l-1} W^l_f) \quad \text{for } l = 2, \ldots, L
With this modification, the model can handle c and m individually.
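Only the first FNN layer changes compared with DCGM-I. Below is a minimal sketch of that change, assuming (as in the equation above) that the same projection $W^1_f$ is applied to both bag-of-words vectors; later layers would then operate on the concatenated, twice-as-wide vector.

```python
import numpy as np

V, H = 1000, 128                     # vocab and projection sizes (arbitrary)
rng = np.random.default_rng(0)
W_f1 = rng.normal(scale=0.1, size=(V, H))

# Separate bag-of-words vectors for the context and for the message.
b_c = np.zeros(V); b_c[[3, 17]] = 1.0
b_m = np.zeros(V); b_m[[42, 8]] = 1.0

# DCGM-II first layer: project c and m separately, then concatenate,
# so k_1 has dimension 2H instead of H (W_f^2 must accept a 2H input).
k_1 = np.concatenate([b_c @ W_f1, b_m @ W_f1])
print(k_1.shape)                     # (256,)
```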
Implementation
Please refer to this GitHub repo:
https://github.com/Rowing0914/Dynamic_Context_Generative_Model_keras
2. Sequential Dialogue Context Modeling for Spoken Language Understanding
Authors: Ankur Bapna, Gokhan Tür, Dilek Hakkani-Tür, Larry Heck