
[Review] Advances in Semantic Textual Similarity

Posted at 2018-05-18

This post follows a post on Google's excellent research blog:
https://ai.googleblog.com/

Introduction

Recent dramatic growth in natural language understanding research has produced great achievements. In the blog post, they explain their approaches from two recently published research papers:

  1. Learning Semantic Textual Similarity from Conversations
  2. Universal Sentence Encoder
    They have also made the pre-trained models accessible on TensorFlow Hub.

Learning Semantic Textual Similarity from Conversations

In this paper, they introduce a novel approach to semantic textual similarity based on natural conversations. The idea is that sentences are likely to be semantically similar if they have a similar distribution of responses. For instance, "How are you?" and "How old are you?" mean totally different things, and indeed their responses should differ. But "How old are you?" and "What is your age?" mean the same thing: both ask about age.


In this work, they aim to learn sentence similarity through a response classification task: given a conversational input, classify the correct response from a batch of randomly selected responses.
However, the ultimate goal is a model that returns embeddings representing a variety of natural language relationships, including similarity and relatedness.
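
As a rough illustration of this training signal, here is a minimal Python sketch (the encoder outputs are stand-in random vectors, and all names and shapes are my own assumptions, not the paper's): score one conversational input against a batch of candidate responses with a dot product and treat the task as a softmax classification over the batch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-in encoders: in the paper these are learned networks mapping text to
# fixed-size vectors; here we just use random embeddings for illustration.
rng = np.random.default_rng(0)
input_vec = rng.normal(size=128)            # encoding of "How old are you?"
response_vecs = rng.normal(size=(5, 128))   # encodings of 5 candidate responses

# Score each candidate response by dot product with the input encoding,
# then turn the scores into a probability distribution over the batch.
scores = response_vecs @ input_vec
probs = softmax(scores)

# Training would maximize the probability of the true response (say index 0),
# i.e. minimize this cross-entropy loss.
loss = -np.log(probs[0])
print(probs, loss)
```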

Universal Sentence Encoder

They have built three sentence-encoding models for NLP tasks and made them available on TensorFlow Hub (a minimal loading sketch follows the list below).
Although the models are inspired by existing, well-regarded neural network techniques, the decoder is dispensed with and the models are built from encoders only.

  1. Using a DAN (Deep Averaging Network) as the encoder
  2. Using a Transformer as the encoder
  3. Using multiple DAN models for encoding
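
For reference, loading one of the pre-trained encoders from TensorFlow Hub looks roughly like the snippet below. This is a minimal sketch; the module handle and version are assumptions, so check tfhub.dev for the current ones.

```python
import numpy as np
import tensorflow_hub as hub

# Load a pre-trained Universal Sentence Encoder module from TensorFlow Hub.
# The exact handle/version here is an assumption; check tfhub.dev.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["How old are you?", "What is your age?", "How are you?"]
embeddings = embed(sentences)  # fixed-size sentence vectors, shape (3, 512)

# Semantic similarity as the inner product of the (approximately unit-norm) vectors.
sim = np.inner(embeddings, embeddings)
print(sim)
```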

Through validation, they found that the compute time for the Transformer model increases noticeably as sentence length increases, whereas the compute time for the DAN model stays nearly constant.

[Figure: compute time vs. sentence length for the Transformer and DAN encoders]

DAN

It simply averages the input word vectors and propagates the average through a feed-forward network.
[Figure: DAN architecture]
reference: https://www.cs.umd.edu/~miyyer/pubs/2015_acl_dan.pdf
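
As a minimal sketch of the idea (a toy Keras model of my own, not the released encoder; all sizes are arbitrary assumptions), averaging the token embeddings and passing the result through a small feed-forward network looks like this:

```python
import tensorflow as tf

# Toy DAN: embed tokens, average the embeddings, then apply feed-forward layers.
vocab_size, embed_dim, max_len = 10000, 128, 32

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.GlobalAveragePooling1D(),   # the "averaging" step
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(512),                 # sentence embedding
])

# Dummy batch of 4 token-id sequences, just to show the output shape.
dummy_tokens = tf.random.uniform((4, max_len), maxval=vocab_size, dtype=tf.int32)
print(model(dummy_tokens).shape)  # (4, 512)
```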

Transformer

It is a neural network architecture based on a self-attention mechanism.

[Figure: Transformer architecture]
reference: https://arxiv.org/pdf/1706.03762.pdf

Scaled dot-product attention
$Attention(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
[Figure: scaled dot-product attention]
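
A minimal numpy sketch of scaled dot-product attention, with illustrative shapes of my own choosing:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 64))   # 3 query vectors, d_k = 64
K = rng.normal(size=(5, 64))   # 5 key vectors
V = rng.normal(size=(5, 64))   # 5 value vectors
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 64)
```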

