
Attention-style visualization deserves more recognition in text mining

Last updated at 2019-09-11, posted at 2019-08-26

I think attention-style visualization deserves more recognition, so I am sharing it here.
The original implementation is available at the links below.

[self attention] Implementing a document classification model that makes it easy to visualize the reasons for its predictions
Qiita article:
https://qiita.com/itok_msi/items/ad95425b6773985ef959
GitHub:
https://github.com/nn116003/self-attention-classification/blob/master/view_attn.py

I came across this implementation through the following book.

つくりながら学ぶ！PyTorchによる発展ディープラーニング (Learn by Building! Advanced Deep Learning with PyTorch)
https://www.amazon.co.jp/dp/B07VPDVNKW/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1

Attention-style visualization

Based on the original, I made a few modifications to the functions.


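from html import escape


def highlight(word, attn):
    """Return an HTML span whose background gets a deeper red as attn approaches 1."""
    # Clamp to [0, 1] so out-of-range scores still yield a valid color code.
    attn = min(max(attn, 0.0), 1.0)
    level = int(255 * (1 - attn))  # green/blue channels: 255 (white) down to 0 (red)
    html_color = '#%02X%02X%02X' % (255, level, level)
    # Escape the word so characters such as '<' or '&' cannot break the markup.
    return '<span style="background-color: {}"> {}</span>'.format(html_color, escape(word))


def mk_html(words, attns):
    """Join the highlighted spans for each (word, score) pair into one HTML string."""
    return ''.join(' ' + highlight(word, attn) for word, attn in zip(words, attns)) + "<br><br>"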
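
As a quick check of the color mapping used above, the following minimal sketch isolates the score-to-color step. The helper name attn_to_hex is hypothetical and only introduced here for illustration; it reproduces the same formula as highlight, assuming scores in the range 0 to 1.

```python
def attn_to_hex(attn):
    """Map a score in [0, 1] to a hex color from white (0) to pure red (1).

    Hypothetical helper for illustration; same formula as in highlight.
    """
    level = int(255 * (1 - attn))  # green and blue channels shrink as attn grows
    return '#%02X%02X%02X' % (255, level, level)


for score in (0.0, 0.5, 1.0):
    print(score, attn_to_hex(score))
# 0.0 #FFFFFF
# 0.5 #FF7F7F
# 1.0 #FF0000
```

The red channel stays at 255 while green and blue fall together, so a score of 0 gives a white background and a score of 1 gives pure red.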

Passing a list of words and a list of scores (each between 0 and 1) to mk_html produces the attention-style visualization.
Here, the sample data is created with a simple rule.


import random

from janome.tokenizer import Tokenizer

t = Tokenizer()

# Target sentence
sentence = "すもももももももものうち"

# Rule-based scoring: nouns get a high score, every other part of speech a low one
words = []
attns = []

for token in t.tokenize(sentence):
    words.append(token.surface)
    if token.part_of_speech.startswith('名詞'):
        attns.append(0.6 + 0.2 * random.random())  # 0.6 to 0.8; closer to 1 means deeper red
    else:
        attns.append(0.2 + 0.2 * random.random())  # 0.2 to 0.4; closer to 0 means paler

print(words)
print(attns)

Output:
['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
[0.7397653917795854, 0.38089028798835384, 0.7423692670794249, 0.27461082771113776, 0.6142608833129594, 0.21088886758561579, 0.7646013231487092]

Using this result, the text analysis output is visualized in an attention style.

# Import display explicitly so the snippet does not rely on the notebook's built-in
from IPython.display import HTML, display

display(HTML(mk_html(words, attns)))

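Output:

The words are displayed on red backgrounds of varying depth, with the higher-scoring nouns in a deeper red than the particles.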
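
The display call above assumes a Jupyter or IPython environment. Outside a notebook, one option is to write the generated HTML to a file and open it in a browser. The following is a minimal sketch under that assumption; the helper name save_html and the file name attention_view.html are hypothetical and not part of the original implementation.

```python
import tempfile
from pathlib import Path


def save_html(body, path):
    """Wrap an HTML fragment in a minimal UTF-8 page and write it to path.

    Hypothetical helper for illustration only.
    """
    page = (
        '<!DOCTYPE html>\n'
        '<html><head><meta charset="utf-8"></head>\n'
        '<body>{}</body></html>\n'
    ).format(body)
    Path(path).write_text(page, encoding="utf-8")
    return Path(path)


# A fragment in the shape produced by the highlight function above.
fragment = '<span style="background-color: #FF7F7F"> もも</span><br><br>'
out_path = save_html(fragment, Path(tempfile.gettempdir()) / "attention_view.html")
print(out_path)
```

In practice the fragment would be the string returned by mk_html(words, attns). Declaring the charset matters here because the words are Japanese text.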

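Until now I had used tools such as WordCloud to visualize text analysis results,
but attention-style visualization makes the noteworthy words easy to spot, and it is my personal favorite.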

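This style of visualization is usually associated with models such as Transformer and BERT, but it can be applied whenever a list of words and a matching list of scores are available, so I feel its range of applications is quite wide.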
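
To illustrate that the scores do not have to come from an attention layer, here is a minimal sketch that derives them from plain word counts and rescales them to the range 0 to 1. Both the inverse-frequency scoring and the helper name min_max_normalize are assumptions made for this example, not something taken from the original implementation.

```python
from collections import Counter


def min_max_normalize(scores):
    """Rescale scores linearly to [0, 1]; hypothetical helper for illustration."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so use a neutral mid value for every word.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


words = ['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']

# Assumed scoring rule: rarer words in the sentence get higher raw scores.
counts = Counter(words)
raw_scores = [1.0 / counts[w] for w in words]

attns = min_max_normalize(raw_scores)
print(attns)
# [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

The resulting words and attns lists have the same shape as the rule-based sample data, so they can be passed to mk_html in the same way.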

Please give it a try.
