Natural Language Processing with Transformer Models

Posted at 2024-09-22

Transformer models perform well across a wide range of natural language processing (NLP) tasks. This article uses the Hugging Face transformers library to run six of them in practice: text classification, named entity recognition (NER), question answering, summarization, translation, and text generation.

1. Installation and Setup

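First, install the Hugging Face transformers library. The pipelines also need a deep learning backend such as PyTorch, and the Marian translation model used in section 6 needs sentencepiece for its tokenizer, so both are installed here as well.

pip install transformers torch sentencepiece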

The examples also use the pandas library, which is convenient for displaying results.

pip install pandas

Next, import the required libraries in Python.

from transformers import pipeline
import pandas as pd

2. Text Classification

Text classification predicts which category a text belongs to. The code below runs sentiment analysis on a sample customer complaint.

classifier = pipeline("text-classification")

text = """Dear Amazon, last week I ordered an Optimus Prime action figure 
from your online store in Germany. Unfortunately, when I opened the package, 
I discovered to my horror that I had been sent an action figure of Megatron 
instead! As a lifelong enemy of the Decepticons, I hope you can understand my 
dilemma. To resolve the issue, I demand an exchange of Megatron for the 
Optimus Prime figure I ordered. Enclosed are copies of my records concerning 
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

outputs = classifier(text)
print(pd.DataFrame(outputs))

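Result:
The code classifies the sentiment of the text as positive or negative. Because no model name is given, the pipeline loads a default pretrained model from Hugging Face and prints the predicted label together with its score.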
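
To make the `label` and `score` fields more concrete, here is a minimal sketch of how a classifier's raw scores (logits) are typically turned into a predicted label and a probability with softmax. The label names and logit values are hypothetical and chosen only for illustration; this is not the pipeline's actual implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical label order and logits for a single input text.
labels = ["NEGATIVE", "POSITIVE"]
logits = [2.0, -1.0]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])

# Same shape as one row of the classifier output: a label and its score.
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```

The score is the probability assigned to the winning label, so a value close to 1 means the model strongly prefers that label over the others.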

3. Named Entity Recognition (NER)

NER extracts named entities such as person names, organization names, and place names from text.

ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
print(pd.DataFrame(outputs))

Result:
Named entities (for example, Amazon and Bumblebee) are extracted, and the output shows which spans of the text were recognized as entities.
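
The `aggregation_strategy="simple"` argument asks the pipeline to merge token-level predictions into whole entities. The sketch below illustrates that idea on hand-written B-/I-/O tags. The tags, the `group_entities` helper, and the sample tokens are assumptions made for illustration, and the logic is a simplified stand-in rather than the library's own grouping code.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens of the same entity type into one span."""
    groups = []
    current = None
    for word, tag in tagged_tokens:
        if tag == "O":
            # Outside any entity: close the current group, if there is one.
            current = None
            continue
        prefix, entity = tag.split("-", 1)
        if current is not None and prefix == "I" and current["entity_group"] == entity:
            # Continuation of the entity that is currently open.
            current["word"] += " " + word
        else:
            # A B- tag (or a type change) starts a new entity.
            current = {"entity_group": entity, "word": word}
            groups.append(current)
    return groups

# Hypothetical token-level predictions, not real model output.
tagged = [
    ("Optimus", "B-MISC"), ("Prime", "I-MISC"), ("figure", "O"),
    ("from", "O"), ("Germany", "B-LOC"), ("and", "O"), ("Megatron", "B-PER"),
]

entities = group_entities(tagged)
for entity in entities:
    print(entity)
```

Without aggregation, a multi-word name such as "Optimus Prime" would appear as separate token rows, which is harder to read in a DataFrame.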

4. Question Answering

Question answering finds the answer to a question within a given text.

reader = pipeline("question-answering")
question = "What is the problem?"
outputs = reader(question=question, context=text)
print(pd.DataFrame([outputs]))

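Result:
For the question "What is the problem?", the pipeline extracts an answer span from the text and returns it with a score and its start and end positions.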
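
Extractive question answering models usually score every token as a possible start and a possible end of the answer, then return the best-scoring span. The following sketch shows that selection step under stated assumptions: the tokens and scores are hypothetical, and `best_span` is an illustrative helper, not a function from the transformers library.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end) indices of the span with the highest combined score."""
    best_score = float("-inf")
    best = (0, 0)
    for start, s_score in enumerate(start_scores):
        # Only consider spans that end at or after the start and stay short enough.
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best

# Hypothetical context tokens and per-token scores, not real model output.
tokens = ["I", "was", "sent", "an", "action", "figure", "of", "Megatron", "instead"]
start_scores = [0.1, 0.0, 0.2, 2.5, 0.3, 0.1, 0.0, 0.4, 0.0]
end_scores = [0.0, 0.0, 0.1, 0.0, 0.2, 0.9, 0.0, 3.0, 0.5]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

Because the answer is always a slice of the context, this kind of model cannot produce words that do not appear in the input text.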

5. Summarization

Summarization condenses a long text into a short one.

summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=56, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

Result:
The input text is condensed and printed as a short summary.

6. Translation

Translation converts text into another language. This example translates from English to German.

translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Result:
The text is translated into German and the translation is printed.

7. Generation

Text generation produces a continuation of a given text.

from transformers import set_seed
set_seed(42)

generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=500)
print(outputs[0]['generated_text'])

Result:
The model generates a continuation of the given context. Here it continues a customer service reply to Bumblebee's complaint.
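
The call to `set_seed(42)` matters when generation samples the next token at random, because the seed makes those draws repeatable. The toy sketch below illustrates this with a hand-written table of next-word probabilities. The `NEXT_WORD` table and the `generate` helper are hypothetical; a real language model computes these probabilities from the whole context.

```python
import random

# Hypothetical next-word probabilities keyed by the previous word.
NEXT_WORD = {
    "I": [("am", 0.6), ("will", 0.4)],
    "am": [("sorry", 0.7), ("happy", 0.3)],
    "will": [("send", 0.5), ("check", 0.5)],
    "sorry": [("to", 1.0)],
    "to": [("hear", 0.8), ("say", 0.2)],
}

def generate(start, max_new_words, seed):
    """Sample a continuation word by word using a seeded random generator."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_new_words):
        options = NEXT_WORD.get(words[-1])
        if not options:
            break  # No known continuation for this word.
        choices, weights = zip(*options)
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)

first = generate("I", 5, seed=42)
second = generate("I", 5, seed=42)
print(first)
print(first == second)  # Same seed, same sampled text.
```

Changing the seed can change the sampled continuation, which is why fixing it is useful when results need to be reproducible.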

These examples walk through the basic flow of common NLP tasks with Transformer models. The Hugging Face pipeline function makes it possible to run complex models with only a few lines of code.
