[huggingface/transformers] How to use Pipelines

Posted at 2022-07-23, last updated at 2022-07-23

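About Pipelines

Hugging Face's transformers library is very useful when working with transformer models such as BERT, and for running inference its pipeline() function is especially convenient.
The following is the official usage example.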

>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
>>> unmasker("Hello I'm a [MASK] model.")

[{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
  'score': 0.1073106899857521,
  'token': 4827,
  'token_str': 'fashion'},
 {'sequence': "[CLS] hello i'm a role model. [SEP]",
  'score': 0.08774490654468536,
  'token': 2535,
  'token_str': 'role'},
 {'sequence': "[CLS] hello i'm a new model. [SEP]",
  'score': 0.05338378623127937,
  'token': 2047,
  'token_str': 'new'},
 {'sequence': "[CLS] hello i'm a super model. [SEP]",
  'score': 0.04667217284440994,
  'token': 3565,
  'token_str': 'super'},
 {'sequence': "[CLS] hello i'm a fine model. [SEP]",
  'score': 0.027095865458250046,
  'token': 2986,
  'token_str': 'fine'}]
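
Because the result is a plain list of dictionaries, ordinary Python is enough to post-process it. The following is a minimal sketch, assuming the output shape shown above; the helper name confident_candidates and the 0.05 threshold are hypothetical, and sample_output is copied from the first four entries of that output.

```python
def confident_candidates(predictions, min_score=0.05):
    """Keep (token_str, score) pairs whose score is at least min_score."""
    return [
        (p["token_str"], p["score"])
        for p in predictions
        if p["score"] >= min_score
    ]


# First four entries of the fill-mask output shown above.
sample_output = [
    {"sequence": "[CLS] hello i'm a fashion model. [SEP]",
     "score": 0.1073106899857521, "token": 4827, "token_str": "fashion"},
    {"sequence": "[CLS] hello i'm a role model. [SEP]",
     "score": 0.08774490654468536, "token": 2535, "token_str": "role"},
    {"sequence": "[CLS] hello i'm a new model. [SEP]",
     "score": 0.05338378623127937, "token": 2047, "token_str": "new"},
    {"sequence": "[CLS] hello i'm a super model. [SEP]",
     "score": 0.04667217284440994, "token": 3565, "token_str": "super"},
]

for token, score in confident_candidates(sample_output):
    print(f"{token}: {score:.3f}")
```

With the default threshold this keeps fashion, role and new, and drops super.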

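The sections below explain this in more detail.

Official documentation

For usage, see the official documentation (in English).
As shown below, you can define inference pipelines for masked-word prediction, text classification, text generation, and more.

Masked-word prediction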

>>> from transformers import pipeline
>>> pipe = pipeline("fill-mask")
>>> pipe("Paris is the <mask> of France.")
[{'score': 0.6790180802345276, 'token': 812, 'token_str': ' capital', 'sequence': 'Paris is the capital of France.'}, {'score': 0.05177970975637436, 'token': 32357, 'token_str': ' birthplace', 'sequence': 'Paris is the birthplace of France.'}, {'score': 0.038252782076597214, 'token': 1144, 'token_str': ' heart', 'sequence': 'Paris is the heart of France.'}, {'score': 0.02434905804693699, 'token': 29778, 'token_str': ' envy', 'sequence': 'Paris is the envy of France.'}, {'score': 0.022851234301924706, 'token': 1867, 'token_str': ' Capital', 'sequence': 'Paris is the Capital of France.'}]

Text classification

>>> from transformers import pipeline
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]

Text generation

>>> from transformers import pipeline
>>> pipe = pipeline("text-generation")
>>> pipe("Hello, I'm a language model,")
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'Hello, I\'m a language model, you will understand that at the beginning, when most people did not understand anything, and had a bad memory, and they began saying "here is a model that was created." And I said, "you can'}]

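Masked-word prediction models

To use a masked-word prediction model, pass fill-mask as the first argument when instantiating the pipeline and the model name as the second argument. You can also set top_k to control how many candidates are returned.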

>>> from transformers import pipeline
>>> unmasker = pipeline("fill-mask", "cl-tohoku/bert-base-japanese-whole-word-masking", top_k=1)
Some weights of the model checkpoint at cl-tohoku/bert-base-japanese-whole-word-masking were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>> unmasker("吾輩は[MASK]である。")
[{'score': 0.7356285452842712, 'token': 6040, 'token_str': '猫', 'sequence': '吾輩 は 猫 で ある 。'}]
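
Note that the mask token differs between models: the BERT examples use [MASK], while the default fill-mask model earlier in this article uses <mask>. The following is a rough sketch of a helper that avoids hard-coding it by reading the token from the pipeline's tokenizer. The name fill_blank and the "___" placeholder are hypothetical, and the code assumes the sentence contains exactly one blank.

```python
def fill_blank(unmasker, template, blank="___"):
    """Swap `blank` for the model's own mask token and return the top token string.

    Assumes `template` contains exactly one blank, so the pipeline returns a
    flat list of candidate dicts like the output shown above.
    """
    masked = template.replace(blank, unmasker.tokenizer.mask_token)
    predictions = unmasker(masked)
    best = max(predictions, key=lambda p: p["score"])
    # Some tokenizers return a leading space (e.g. ' capital'), so strip it.
    return best["token_str"].strip()
```

With the unmasker defined above, this would be called as fill_blank(unmasker, "吾輩は___である。").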

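Text classification models

To build a text classification pipeline, pass text-classification as the first argument.
The following runs Japanese sentiment analysis with daigo's Japanese sentiment analysis model.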

>>> from transformers import pipeline
>>> classifier = pipeline("text-classification", "daigo/bert-base-japanese-sentiment")
>>> classifier("私は幸せです。")
[{'label': 'ポジティブ', 'score': 0.9945743680000305}]
>>> classifier("私は不幸です。")
[{'label': 'ネガティブ', 'score': 0.9909571409225464}]

To classify a text pair, pass the pipeline a dictionary with the keys text and text_pair, as follows.

>>> from transformers import pipeline
>>> classifier = pipeline("text-classification", model="typeform/distilbert-base-uncased-mnli")
>>> classifier({"text": "This man is walking in a park", "text_pair": "the man is not in the park"})
{'label': 'CONTRADICTION', 'score': 0.999176561832428}
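
If the pairs start out as tuples, they can be converted into this dictionary format first. This is only a small sketch: the helper name to_pair_inputs is hypothetical, and the second pair is made up for illustration.

```python
def to_pair_inputs(pairs):
    """Turn (text, text_pair) tuples into the dict format used above."""
    return [{"text": text, "text_pair": text_pair} for text, text_pair in pairs]


pairs = [
    ("This man is walking in a park", "the man is not in the park"),
    # Illustrative second pair, not taken from the official documentation.
    ("This man is walking in a park", "a man is outdoors"),
]
inputs = to_pair_inputs(pairs)
print(inputs[0])
```

Each element of inputs can then be passed to the classifier in the same way as the single dictionary above.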

To classify several texts at once, pass the target texts (or text pairs) as a list.

>>> from transformers import pipeline
>>> classifier = pipeline("text-classification", "daigo/bert-base-japanese-sentiment")
>>> classifier(["私は幸せです。", "私は不幸です。"])
[{'label': 'ポジティブ', 'score': 0.9945743680000305}, {'label': 'ネガティブ', 'score': 0.9909571409225464}]
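
The results come back in the same order as the inputs, so they can be zipped back together. The following is a minimal sketch; label_texts is a hypothetical helper, and it assumes the classifier returns one result dictionary per input text, as in the output above.

```python
def label_texts(classifier, texts):
    """Run the classifier on a list of texts and map each text to its label."""
    results = classifier(texts)  # one result dict per input, in input order
    return {text: result["label"] for text, result in zip(texts, results)}
```

With the classifier defined above, label_texts(classifier, ["私は幸せです。", "私は不幸です。"]) would return a dictionary keyed by the input sentences.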

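Generation models

For generation models, the first argument is text-generation. Here I generated text with rinna's GPT-2. rinna's GPT-2 model uses T5Tokenizer as its tokenizer; when the tokenizer class is not the one normally paired with the model class, as here, you need to instantiate the model and the tokenizer separately and then pass both to pipeline().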

>>> from transformers import pipeline, T5Tokenizer, GPT2LMHeadModel
>>> tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-small")
>>> tokenizer.do_lower_case = True  # due to some bug of tokenizer config loading
>>> model = GPT2LMHeadModel.from_pretrained("rinna/japanese-gpt2-small")
>>> generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
>>> generator("吾輩は猫である。")
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
[{'generated_text': '吾輩は猫である。 猫はとても賢い。 ある意味、犬の様に利口である。 猫は、頭の上に上がらず、背中を向いて寝そべっていることが多い。 人間は、寝そ'}]
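
As the output shows, generated_text includes the prompt itself. If only the continuation is needed, the prompt can be sliced off. This is a rough sketch, with continuation_only as a hypothetical helper name, assuming the output shape shown above.

```python
def continuation_only(generator, prompt):
    """Generate text and return only the part that follows the prompt."""
    outputs = generator(prompt)
    full_text = outputs[0]["generated_text"]
    # The pipeline output shown above repeats the prompt at the start.
    if full_text.startswith(prompt):
        return full_text[len(prompt):].lstrip()
    # Fall back to the full text if the prompt was not echoed verbatim.
    return full_text
```

With the generator defined above, this would be called as continuation_only(generator, "吾輩は猫である。").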

Summary

Pipelines seem to offer many more features than the ones covered here. They look like a powerful tool for deploying transformer models!
