Trying NER with GPT-3.5 Turbo

Posted at 2024-06-03

I tried NER (named entity recognition) with GPT-3.5 Turbo, using LangChain. The program is based on existing sample code.

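Results

The task is to extract the manga title from each prompt. The bold text in the "Prompt" column is the correct answer. The prompts are in Japanese; most have the form "<title>の主人公を教えて" ("Tell me who the protagonist of <title> is"), and the last one asks "<title>はどんなお話？" ("What kind of story is <title>?").
Surprisingly, it does reasonably well: 6 of the 7 prompts below were extracted correctly. I have not tried it on long texts.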

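Prompt	Result	Notes
One Pieceの主人公を教えて	Correct
貸した魔力は【リボ払い】で強制徴収の主人公を教えて	Incorrect (リボ払い)	Do long, fairly new titles fail?
ドラゴン・ボールの主人公を教えて	Correct	Works even with "・" inserted
転生したらスライムだった件の主人公を教えて	Correct	Long titles are fine if they are well known
我妻さんは俺のヨメの主人公を教えて	Correct	Worked even though it is not that well known
竜師をクビになったけど竜がついてきました。の主人公を教えて	Correct	New series that started in May 2024. Correct because of the "。"?
カイジはどんなお話？	Correct	Incomplete title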

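Environment

I am running this on Python 3.12.2. The Poetry Dotenv Plugin runs behind the scenes: it automatically reads the .env file and puts the OpenAI key into an environment variable.
The packages are listed below.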

Package	Version
langchain	0.2.1
langchain-openai	0.1.8
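
Because the key is loaded implicitly, a missing or empty .env file only shows up when the first API call fails. A minimal sketch of an early check follows. The helper name `require_env` is hypothetical and not part of the original program; `OPENAI_API_KEY` is the variable that `ChatOpenAI` reads by default.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank.

    `env` defaults to os.environ; a plain dict can be passed instead for testing.
    This helper is an illustration, not part of the original program.
    """
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value.strip():
        raise RuntimeError(f"{name} is not set; check that the .env file was loaded")
    return value
```

Calling `require_env("OPENAI_API_KEY")` in the first notebook cell would fail immediately, with a clear message, if the plugin did not load the key.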

Program

I run it in Jupyter.

from typing import Optional

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

class Comic(BaseModel):
    """Information about a comic."""

    # ^ Doc-string for the entity Comic.
    # This doc-string is sent to the LLM as the description of the schema Comic,
    # and it can help to improve extraction results.

    # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # Having a good description can help improve extraction results.
    comic: Optional[str] = Field(
        default=None, description="The name of comic"
    )


from langchain_core.prompts import ChatPromptTemplate

# Define a custom prompt to provide instructions and any additional context.
# 1) You can add examples into the prompt template to improve extraction quality
# 2) Introduce additional parameters to take context into account (e.g., include metadata
#    about the document from which the text was extracted.)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "If you do not know the value of an attribute asked to extract, "
            "return null for the attribute's value.",
        ),
        # Please see the how-to about improving performance with
        # reference examples.
        # MessagesPlaceholder('examples'),
        ("human", "{text}"),
    ]
)

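# No model is given, so ChatOpenAI falls back to its default,
# which is gpt-3.5-turbo in langchain-openai 0.1.8.
llm = ChatOpenAI()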
runnable = prompt | llm.with_structured_output(schema=Comic)

text = "カイジはどんなお話？"
runnable.invoke({"text": text})

I tried the different prompts by changing the value of text at the end.
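
Rather than editing text by hand for every prompt, the checks could be run in a loop. The following is a rough sketch under a few assumptions: `evaluate` and `fake_extract` are hypothetical names, the expected titles are my reading of the results table, and `fake_extract` is only a stand-in for the LLM call so that the sketch runs without an API key.

```python
from typing import Callable, Iterable, Optional


def evaluate(
    extract: Callable[[str], Optional[str]],
    cases: Iterable[tuple[str, str]],
) -> list[tuple[str, Optional[str], bool]]:
    """Run `extract` on each prompt and compare the result with the expected title."""
    rows = []
    for text, expected in cases:
        got = extract(text)
        rows.append((text, got, got == expected))
    return rows


# Two prompts from the results table, paired with the title assumed to be correct.
cases = [
    ("One Pieceの主人公を教えて", "One Piece"),
    ("カイジはどんなお話？", "カイジ"),
]


def fake_extract(text: str) -> Optional[str]:
    """Offline stand-in for the LLM: strips the known question suffixes (hypothetical)."""
    for marker in ("の主人公を教えて", "はどんなお話？"):
        if text.endswith(marker):
            return text[: -len(marker)]
    return None


for text, got, ok in evaluate(fake_extract, cases):
    print("OK" if ok else "NG", repr(got), text)
```

With the real chain, `fake_extract` would be replaced by something like `lambda text: runnable.invoke({"text": text}).comic`, since `with_structured_output(schema=Comic)` returns a `Comic` instance. Exact string equality is a strict criterion; a looser comparison may be more appropriate for titles written with or without "・".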
