Quick and Easy Named Entity Recognition

Posted at 2022-12-06

Introduction

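This post uses the Python library spaCy to perform named entity recognition.
Here we use en_core_web_trf, the RoBERTa-based model that is the most accurate of spaCy's English models.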

Implementation

Install

# Install the en_core_web_trf model package
!pip install https://huggingface.co/spacy/en_core_web_trf/resolve/main/en_core_web_trf-any-py3-none-any.whl

# Install spacy-transformers (-f points pip at the PyTorch wheel index)
!pip install spacy-transformers -f https://download.pytorch.org/whl/torch_stable.html

Load

# Using spacy.load().
import spacy
import spacy_transformers

# Load the transformer-based pipeline
nlp = spacy.load("en_core_web_trf")

Use

This part is taken from the spaCy website.

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
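The loop above prints one entity per line. When the same entity appears several times, it can be easier to read the result grouped by label. The following is a minimal sketch of that idea; `group_entities_by_label` is a hypothetical helper name, not part of spaCy, and it relies only on the `doc.ents`, `ent.text`, and `ent.label_` attributes already used above.

```python
from collections import defaultdict


def group_entities_by_label(doc):
    """Group entity texts by label, keeping first-seen order and dropping duplicates.

    `doc` is assumed to be a spaCy Doc (anything with an `ents` iterable whose
    items expose `text` and `label_` will work).
    """
    grouped = defaultdict(list)
    for ent in doc.ents:
        # ent.text is the surface string, ent.label_ is the label name
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Usage with the doc created above:
# for label, texts in group_entities_by_label(doc).items():
#     print(label, texts)
```

The helper returns a plain dict, so the result can be printed or serialized without further conversion.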

Impressions

The results vary depending on the length of the input text.
Occasionally, even when I pass in a long sentence, nothing is returned, which is a problem.
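I have not identified the cause, but one thing that may be worth trying, assuming the behavior really is tied to input length, is to split the text into shorter chunks and run the pipeline on each chunk. The following is only a sketch under that assumption: `split_into_chunks` and `entities_from_chunks` are hypothetical helper names, the sentence splitting is a naive regex, and the `max_chars=1000` default is an arbitrary value, not a spaCy limit. `nlp.pipe` is spaCy's method for processing a sequence of texts.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Split text at sentence-ending punctuation into chunks of at most
    max_chars characters where possible (a single over-long sentence is
    kept whole in its own chunk)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def entities_from_chunks(nlp, text, max_chars=1000):
    """Run the pipeline chunk by chunk and collect (text, label) pairs."""
    results = []
    for doc in nlp.pipe(split_into_chunks(text, max_chars)):
        results.extend((ent.text, ent.label_) for ent in doc.ents)
    return results


# Usage with the nlp object loaded earlier:
# print(entities_from_chunks(nlp, text, max_chars=500))
```

Comparing this output with the result of a single `nlp(text)` call on the same input would show whether length is actually the factor.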
