
[Review] Bidirectional RNN/LSTM


Last updated at 2018-06-04. Posted at 2018-05-28.

Introduction

This article explains the basic concept of the bidirectional RNN, makes the benefits of the architecture clear, and shows an implementation.
I wrote it because I needed to understand the bidirectional LSTM, which is used in state-of-the-art language models and is sometimes combined with other models to obtain richer representations of language.

Bidirectional RNN

The idea first appeared in the well-known paper "Bidirectional Recurrent Neural Networks" by Mike Schuster and Kuldip K. Paliwal, published in 1997.
Link: https://pdfs.semanticscholar.org/4b80/89bc9b49f84de43acc2eb8900035f7d492b2.pdf

However, I couldn't find a clear mathematical reference for the architecture.
So I decided to dig into it further myself.

Architecture

I created these slides myself! lol

Legacy RNN/LSTM

Bidirectional RNN/LSTM

Maths

For the basic RNN, please refer to my earlier article.
https://qiita.com/Rowing0914/items/6803fbc0af9163788a0c

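Building on that, the bidirectional RNN differs only in how it consumes the input sequence: one hidden state, h_1, reads it forward in time (it depends on h_1^{(t-1)}), a second hidden state, h_2, reads it backward (it depends on h_2^{(t+1)}), and the output at each step combines both.
The equations look like this.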

h_1^{(t)} = \sigma(W_{in}X^{(t)} + W_{hh_1}h_1^{(t-1)})\\
h_2^{(t)} = \sigma(W_{in}X^{(t)} + W_{hh_2}h_2^{(t+1)})\\
o^{(t)} = softmax(W_{out,h_1}h_1^{(t)} + W_{out,h_2}h_2^{(t)})
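
The three equations above can be sketched in NumPy as follows. This is only a minimal illustration, not a reference implementation: it assumes that \sigma is the logistic sigmoid, that both hidden states start from zero, and that there are no bias terms (none appear in the equations). As in the equations, W_in is shared by both directions. The function name `bidirectional_rnn` and the toy dimensions are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def bidirectional_rnn(X, W_in, W_hh1, W_hh2, W_out1, W_out2):
    """Forward pass of the equations above for one sequence X of shape (T, D)."""
    T = X.shape[0]
    H = W_hh1.shape[0]
    h1 = np.zeros((T, H))  # forward hidden states
    h2 = np.zeros((T, H))  # backward hidden states

    # Forward direction: h1[t] depends on h1[t-1] (assumed zero before t = 0).
    prev = np.zeros(H)
    for t in range(T):
        prev = sigmoid(W_in @ X[t] + W_hh1 @ prev)
        h1[t] = prev

    # Backward direction: h2[t] depends on h2[t+1] (assumed zero after t = T-1).
    nxt = np.zeros(H)
    for t in reversed(range(T)):
        nxt = sigmoid(W_in @ X[t] + W_hh2 @ nxt)
        h2[t] = nxt

    # The output at each step combines both directions.
    out = np.stack([softmax(W_out1 @ h1[t] + W_out2 @ h2[t]) for t in range(T)])
    return h1, h2, out

# Toy dimensions: T time steps, D input features, H hidden units, K classes.
rng = np.random.default_rng(0)
T, D, H, K = 5, 3, 4, 2
X = rng.normal(size=(T, D))
W_in = rng.normal(size=(H, D))
W_hh1 = rng.normal(size=(H, H))
W_hh2 = rng.normal(size=(H, H))
W_out1 = rng.normal(size=(K, H))
W_out2 = rng.normal(size=(K, H))

h1, h2, out = bidirectional_rnn(X, W_in, W_hh1, W_hh2, W_out1, W_out2)
print(out.shape)  # (5, 2)
```

Note that h1 at the first step has seen only the first input, while h2 at the first step has already seen the whole sequence in reverse. This is why the combined output at every position can use both past and future context.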

I also worked through a mathematical proof of concept.

imdb_bidirectional_lstm.py

from keras.layers import LSTM, Bidirectional, Dense, Dropout, Embedding
from keras.datasets import imdb
from keras.models import Sequential
from keras.preprocessing import sequence
import numpy as np

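max_features = 20000  # vocabulary size: keep only the most frequent words
maxlen = 100          # pad or truncate every review to this many tokens
batch_size = 32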

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

# Pad or truncate each review so that all sequences have length maxlen
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape: ', x_train.shape)
print('x_test shape: ', x_test.shape)
y_train = np.array(y_train)
y_test = np.array(y_test)


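model = Sequential()
# Map each word index to a 128-dimensional vector
model.add(Embedding(max_features, 128, input_length=maxlen))
# One LSTM reads the sequence forward, another backward;
# their outputs are concatenated by default
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
# Single sigmoid unit for binary sentiment classification
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])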

print('Train...')
# validation_data is documented as a tuple (x_val, y_val)
model.fit(x_train, y_train, batch_size=batch_size, epochs=4, validation_data=(x_test, y_test))
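
To get a feel for how much the bidirectional wrapper adds, the number of trainable parameters of the model above can be estimated by hand. The sketch below is plain Python and does not need Keras. It assumes the standard LSTM layout (four gate blocks, each with input weights, recurrent weights and a bias) and the default behaviour of `Bidirectional`, which concatenates the two directions so that the `Dense` layer sees 2 x 64 = 128 features. The helper name `lstm_params` is made up for this example.

```python
def lstm_params(input_dim, units):
    # 4 gate blocks, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

max_features, embed_dim, units = 20000, 128, 64

embedding = max_features * embed_dim         # one vector per word index
bi_lstm = 2 * lstm_params(embed_dim, units)  # forward LSTM + backward LSTM
dense = 2 * units * 1 + 1                    # 128 concatenated features -> 1 unit, plus bias
total = embedding + bi_lstm + dense

print(embedding, bi_lstm, dense, total)  # 2560000 98816 129 2658945
```

The two LSTMs do not share weights, so the recurrent part costs exactly twice as much as a single `LSTM(64)`. Even so, the embedding table dominates the total.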
