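Introduction
This post uses the Python library spaCy to perform named entity recognition.
Here I use en_core_web_trf, the RoBERTa-based model, which is spaCy's most accurate one.
Implementation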
Install
# model
!pip install https://huggingface.co/spacy/en_core_web_trf/resolve/main/en_core_web_trf-any-py3-none-any.whl
# spacy-transformers
!pip install spacy-transformers -f https://download.pytorch.org/whl/torch_stable.html
load
# Using spacy.load().
import spacy
import spacy_transformers
# model
nlp = spacy.load("en_core_web_trf")
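If the model wheel from the Install step was not installed, `spacy.load` fails, so it can help to check first. The following is a minimal sketch using only the standard library; `is_model_installed` is a hypothetical helper name, and it relies on the assumption that the model is installed as a regular Python package named `en_core_web_trf`, which is what the pip command above does.

```python
import importlib.util


def is_model_installed(package_name):
    """Return True if Python's import system can find a package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Hypothetical pre-check before calling spacy.load("en_core_web_trf").
if not is_model_installed("en_core_web_trf"):
    print("en_core_web_trf is not installed; run the pip command from the Install step first.")
```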
use
This part is taken from the spaCy website.
# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
"Google in 2007, few people outside of the company took him "
"seriously. “I can tell you very senior CEOs of major American "
"car companies would shake my hand and turn away because I wasn’t "
"worth talking to,” said Thrun, in an interview with Recode earlier "
"this week.")
doc = nlp(text)
# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
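Printing one entity per line gets hard to read for longer texts, so grouping the entities by label may be more convenient. This is only a sketch: `group_entities` is a hypothetical helper, not part of spaCy, and it assumes each item exposes `.text` and `.label_` the way the spans in `doc.ents` do.

```python
from collections import defaultdict


def group_entities(entities):
    """Group entity texts by label, keeping first-seen order and dropping duplicates."""
    grouped = defaultdict(list)
    for ent in entities:
        # Skip texts already recorded under the same label.
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Hypothetical usage with the doc created above:
# for label, texts in group_entities(doc.ents).items():
#     print(label, texts)
```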
Impressions
- The results differ depending on the length of the input text.
- Sometimes a long input returns nothing at all, which is a problem for me.
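I have not confirmed the cause of the empty results. One thing that might be worth trying is to split a long text into smaller pieces and run the pipeline on each piece separately. The sketch below is an untested workaround idea, not a verified fix: `split_into_chunks` and the `max_chars` default are hypothetical, and the sentence split is a naive regular expression rather than spaCy's own sentence segmentation.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Split text into chunks of roughly max_chars, breaking only at sentence ends.

    A single sentence longer than max_chars is kept whole, so a chunk can
    exceed the limit in that case.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Hypothetical usage with the nlp pipeline loaded earlier:
# for chunk in split_into_chunks(long_text):
#     for entity in nlp(chunk).ents:
#         print(entity.text, entity.label_)
```

Note that splitting changes the context the model sees, so entities near chunk boundaries may be labeled differently than when the whole text is processed at once.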