
Coreference Resolution with AllenNLP

Posted at 2019-11-08

Coreference Resolution

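Coreference resolution is the task of finding two or more expressions in a text that refer to the same entity and linking them together.

In this article, partly as a note to myself, I walk through the steps for running coreference resolution with AllenNLP's pretrained Coreference Resolution model.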

[Illustration of coreference resolution]
"The legal pressures facing Michael Cohen are growing in a wide-ranging investigation of his personal business affairs and his work on behalf of his former client, President Trump. In addition to his work for Mr. Trump, he pursued his own business interests, including ventures in real estate, personal loans and investments in taxi medallions."



pip install allennlp

## Downloading the pretrained Coreference Resolution model

wget https://s3-us-west-2.amazonaws.com/allennlp/models/coref-model-2018.02.05.tar.gz


tar -zxvf coref-model-2018.02.05.tar.gz
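The script further down loads the model from a path, so it can help to check how the downloaded archive is laid out before relying on a particular directory name. The following is a minimal sketch using only Python's standard `tarfile` module; the helper names `list_archive_members` and `top_level_entries` are hypothetical, and nothing here is specific to AllenNLP.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


def top_level_entries(names):
    """Return the sorted first path components, ignoring a leading './'."""
    tops = set()
    for name in names:
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            tops.add(parts[0])
    return sorted(tops)
```

Calling `top_level_entries(list_archive_members("coref-model-2018.02.05.tar.gz"))` after the download shows whether the files unpack into a single directory or directly into the current one.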


import pprint
from allennlp.predictors.predictor import Predictor

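def my_coref(text, model):
    """Return coreference clusters as lists of (phrase, (start, end)) tuples."""
    d = model.predict(document=text)
    clusters = d['clusters']  # word-offset spans, grouped by entity
    words = d['document']     # tokenized input text

    coref_phrase_list = []
    for cluster in clusters:
        # Span ends are inclusive, hence the +1 when slicing.
        phrases = [(' '.join(words[start:end + 1]), (start, end)) for start, end in cluster]
        coref_phrase_list.append(phrases)
    return coref_phrase_list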

if __name__ == '__main__':
    predictor = Predictor.from_path("coref-model-2018.02.05")
    doc1 = """Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen. Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates."""

    d = predictor.predict(document=doc1)


    pprint.pprint(my_coref(doc1, predictor))


# pprint.pprint(d)
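# d is a dictionary object
# d['clusters'][i]: list of word offsets [start, end] (end inclusive) for the expressions (phrases or single words) that refer to the same entity
# d['document']: list of words (tokens) of the input text
# d['predicted_antecedents']: for each span in d['top_spans'], the predicted antecedent, where -1 means no antecedent was predicted
# d['top_spans']: candidate mention spans [start, end] that the model kept after pruning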

{'clusters': [[[0, 1], [24, 24], [36, 36]], [[11, 11], [33, 33]]],
 'document': ['Paul',
 'predicted_antecedents': [-1,
 'top_spans': [[0, 1],
               [6, 6],
               [8, 8],
               [11, 11],
               [11, 13],
               [13, 13],
               [16, 18],
               [16, 22],
               [17, 18],
               [20, 22],
               [24, 24],
               [25, 25],
               [33, 33],
               [33, 39],
               [36, 36],
               [38, 39]]}
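The output above contains more spans in `top_spans` than in `clusters`. As a rough sketch of how the two relate, the snippet below lists the candidate spans that did not end up in any cluster. The helper name `unclustered_spans` is hypothetical; the data is copied from the output above.

```python
def unclustered_spans(top_spans, clusters):
    """Return the candidate spans that do not belong to any cluster."""
    clustered = {tuple(span) for cluster in clusters for span in cluster}
    return [span for span in top_spans if tuple(span) not in clustered]


# Values copied from the prediction output shown above.
clusters = [[[0, 1], [24, 24], [36, 36]], [[11, 11], [33, 33]]]
top_spans = [[0, 1], [6, 6], [8, 8], [11, 11], [11, 13], [13, 13],
             [16, 18], [16, 22], [17, 18], [20, 22], [24, 24], [25, 25],
             [33, 33], [33, 39], [36, 36], [38, 39]]

leftover = unclustered_spans(top_spans, clusters)
print(len(top_spans), len(leftover))  # 16 11
```

In this example, 5 of the 16 candidate spans were linked into clusters and the other 11 were left on their own.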

# pprint.pprint(my_coref(doc1, predictor))
[[('Paul Allen', (0, 1)), ('Allen', (24, 24)), ('he', (36, 36))],
 [('Seattle', (11, 11)), ('Seattle', (33, 33))]]
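One way to use these clusters is to rewrite the text so that every later mention is replaced by the first mention in its cluster. The sketch below is only an illustration under a few assumptions: the function name `resolve_coreferences` is hypothetical, the first span of each cluster is treated as the representative mention, nested or overlapping mentions are not handled, and the token list is reconstructed by hand to match the word offsets in the output above (it is not the model's actual `d['document']`).

```python
def resolve_coreferences(words, clusters):
    """Replace every later mention in a cluster with the cluster's first mention."""
    replacements = {}  # start index -> (inclusive end index, replacement tokens)
    for cluster in clusters:
        first_start, first_end = cluster[0]
        head = words[first_start:first_end + 1]
        for start, end in cluster[1:]:
            replacements[start] = (end, head)

    resolved = []
    i = 0
    while i < len(words):
        if i in replacements:
            end, head = replacements[i]
            resolved.extend(head)
            i = end + 1  # skip over the replaced mention
        else:
            resolved.append(words[i])
            i += 1
    return resolved


# Hand-tokenized version of doc1 (an assumption, chosen to match the offsets above).
words = ("Paul Allen was born on January 21 , 1953 , in Seattle , Washington , "
         "to Kenneth Sam Allen and Edna Faye Allen . Allen attended Lakeside School , "
         "a private school in Seattle , where he befriended Bill Gates .").split()
clusters = [[[0, 1], [24, 24], [36, 36]], [[11, 11], [33, 33]]]

resolved = resolve_coreferences(words, clusters)
print(" ".join(resolved))
```

With these inputs, the standalone "Allen" and the pronoun "he" are both rewritten as "Paul Allen", while the second "Seattle" is replaced by itself.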

In this article, I used AllenNLP to try out coreference resolution in a simple way.


