Trying abstractive summarization of English text, English-to-German translation, and English-to-French translation with T5 (Text-to-Text Transfer Transformer), Google's latest multitask model

Posted at 2020-12-12

In this post I run T5 (Text-to-Text Transfer Transformer), Google's latest multitask model.

( About the T5 model )

Google AI Blog

The paper is available here. It is a substantial work, 67 pages in total.

( How to declare the target task for the multitask T5 model )

tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)

The task is selected by changing the string stored in t5_prepared_Text.

Note that the variable preprocess_text used below is a str object holding the English text.

  1. Abstractive summarization: "summarize: "+preprocess_text
  2. English-to-German machine translation: "translate English to German: "+preprocess_text
  3. English-to-French machine translation: "translate English to French: "+preprocess_text
Example: English-to-German machine translation
t5_prepared_Text = "translate English to German: "+preprocess_text
tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
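The prefix switching described above can be sketched as a small helper. This is only an illustration: the dictionary keys (summarize, en_to_de, en_to_fr) and the function name build_t5_input are hypothetical names chosen here, while the prefix strings themselves are the three listed above.

```python
# Prefix strings are the three listed in this article.
# The dictionary keys are hypothetical names chosen for this sketch.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "en_to_de": "translate English to German: ",
    "en_to_fr": "translate English to French: ",
}


def build_t5_input(task, preprocess_text):
    """Return the prefixed string to store in t5_prepared_Text."""
    if task not in TASK_PREFIXES:
        raise ValueError("unknown task: %r" % (task,))
    return TASK_PREFIXES[task] + preprocess_text


t5_prepared_Text = build_t5_input("en_to_de", "The house is small.")
print(t5_prepared_Text)
```

The resulting string is what gets passed to tokenizer.encode, exactly as in the line above.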

Reading the original paper shows that, in addition to these, the following task-declaration strings are also provided.

  • sst2 sentence
  • stsb sentence1
  • cb hypothesis
  • copa choice1
  • multirc question
  • wic pos
  • wsc
  • question
  • translate English to Romanian

Code that runs abstractive summarization of English text with the Google T5 model is published on Ramsri's web page.

Here I run Ramsri's Python code.

The script also appears to be registered on Hugging Face.


( Execution environment )

  • Machine : MacBook Pro (CPU)
  • OS : macOS Catalina
  • Python : ver.3.6.3
  • transformers : ver.2.8.0
  • torch : ver.1.4.0
Terminal
Desktop % mkdir t5_asbtract_summarization
Desktop % cd t5_asbtract_summarization 
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % pyenv local TensorFlow   
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % pyenv versions                 
  system
  3.6.0
  3.6.0/envs/TensorFlow
  3.6.1
  3.6.3
  3.6.3/envs/gpt2_ja
  3.9.0
* TensorFlow (set by /Users/ocean/Desktop/t5_asbtract_summarization/.python-version)
  gpt2_ja
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % 

Install transformers and torch with pinned versions

Terminal
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % pip install transformers==2.8.0
pip3 install torch torchvision install transformers==2.8.0
Collecting torch
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/b6/01/fffb29c3892d80801bc6400e07c90b8fa6cd5f3db5ce9d7ca8068e14e0b2/torch-1.7.1-cp36-none-macosx_10_9_x86_64.whl (108.8MB)
    100% |████████████████████████████████| 108.8MB 14kB/s 

( ... omitted ... )

Installing collected packages: typing-extensions, dataclasses, numpy, torch, pillow, torchvision, install, idna, certifi, urllib3, chardet, requests, tokenizers, filelock, sentencepiece, tqdm, regex, six, click, joblib, sacremoses, jmespath, python-dateutil, botocore, s3transfer, boto3, transformers
  Running setup.py install for sacremoses ... done
Successfully installed boto3-1.16.35 botocore-1.19.35 certifi-2020.12.5 chardet-3.0.4 click-7.1.2 dataclasses-0.8 filelock-3.0.12 idna-2.10 install-1.3.4 jmespath-0.10.0 joblib-0.17.0 numpy-1.19.4 pillow-8.0.1 python-dateutil-2.8.1 regex-2020.11.13 requests-2.25.0 s3transfer-0.3.3 sacremoses-0.0.43 sentencepiece-0.1.94 six-1.15.0 tokenizers-0.5.2 torch-1.7.1 torchvision-0.8.2 tqdm-4.54.1 transformers-2.8.0 typing-extensions-3.7.4.3 urllib3-1.26.2
You are using pip version 9.0.1, however version 20.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization %
Terminal
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % pip install torch==1.4.0
( ... omitted ... )
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization %
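Because the first install pulled in torch 1.7.1 before it was pinned to 1.4.0, it may be worth confirming which versions are actually importable. The following is a minimal sketch; installed_version is a hypothetical helper name, and it assumes the package exposes a __version__ attribute, which both transformers and torch do.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to None for modules that do not define __version__.
    return getattr(module, "__version__", None)
```

In the environment described above, installed_version("transformers") and installed_version("torch") would be expected to report the pinned versions, although I have not captured that output here.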

Launch the Python 3.6.3 interactive interpreter

Terminal
ocean@AfoGuardMacBook-Pro t5_asbtract_summarization % python
Python 3.6.3 (default, Dec 10 2020, 22:43:16) 
[GCC Apple LLVM 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Task 1: Abstractive summarization of English text

( Input text )

Input text
The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We'll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.

( Text output by the T5 model )

Output text
 the us has over 637,000 confirmed Covid-19 cases and over 30,826 deaths. president Donald Trump predicts some states will reopen the country in april, he said. "we'll be the comeback kids, all of us," the president says.

( Code executed )

Python3.6.3
>>> import torch
>>> import json
>>> from transformers import T5Tokenizer, T5ForConditionalGeneration, T5Config
>>> 
>>> model = T5ForConditionalGeneration.from_pretrained('t5-small')
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.20k/1.20k [00:00<00:00, 354kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 242M/242M [00:24<00:00, 10.1MB/s]
>>> 
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:01<00:00, 730kB/s]
>>> 
>>> device = torch.device('cpu')
>>> 
>>> text = """The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.
... 
... The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.
... 
... At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors.
... 
... "We'll be the comeback kids, all of us," he said. "We want to get our country back."
... 
... The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that."""
>>> 
>>> preprocess_text = text.strip().replace("\n","")
>>> print(preprocess_text)
The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We'll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.
>>> 
>>> t5_prepared_Text = "summarize: "+preprocess_text
>>> print(("original text preprocessed: \n", preprocess_text))
('original text preprocessed: \n', 'The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We\'ll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world\'s largest economy, but the president said some states may be able to return to normalcy earlier than that.')
>>> 
>>> tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
>>> type(tokenized_text)
<class 'torch.Tensor'>
>>> 
>>> summary_ids = model.generate(tokenized_text, num_beams=4, no_repeat_ngram_size=2, min_length=30, max_length=100, early_stopping=True)
>>> 
>>> print(type(summary_ids))
<class 'torch.Tensor'>
>>> 
>>> output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
>>> print(type(output))
<class 'str'>
>>> 
>>> print ("\n\nSummarized text: \n",output)


Summarized text: 
 the us has over 637,000 confirmed Covid-19 cases and over 30,826 deaths. president Donald Trump predicts some states will reopen the country in april, he said. "we'll be the comeback kids, all of us," the president says.
>>> 
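In the transcript above, text.strip().replace("\n","") deletes the newlines without inserting anything, so sentences from adjacent paragraphs run together ("this month.The US"). A minimal sketch of one alternative is shown below; normalize_whitespace is a hypothetical helper name, and whether this changes the model output was not tested in this article.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    # str.split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so joining with " " is enough.
    return " ".join(text.split())


sample = """First paragraph ends here.

Second paragraph starts here."""
print(normalize_whitespace(sample))
```

The result could be assigned to preprocess_text in place of the strip().replace() expression.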

Task 2: Machine translation from English to German

( Input text )

Input text
The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We'll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.

( Text output by the T5 model )

Output text
 Die USA haben den Höchststand auf neuen Koronavirus-Fällen "passiert", sagte Präsident Donald Trump und prognostizierte, dass einige Staaten in diesem Monat wieder eröffnen würden.Die USA verfügen über mehr als 637.000 bestätigte Covid-19-Fälle und über 30.826 Todesfälle, die höchste für jedes Land der Welt.

( Code executed )

Python3.6.3
>>> t5_prepared_Text = "translate English to German: "+preprocess_text
>>> print ("original text preprocessed: \n", preprocess_text)
original text preprocessed: 
 The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We'll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than tha
>>> 
>>> tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
>>> summary_ids = model.generate(tokenized_text,
...                                     num_beams=4,
...                                     no_repeat_ngram_size=2,
...                                     min_length=30,
...                                     max_length=100,
...                                     early_stopping=True)

>>> 
>>> output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
>>> print ("\n\nTranslated text: \n",output)


Translated text: 
 Die USA haben den Höchststand auf neuen Koronavirus-Fällen "passiert", sagte Präsident Donald Trump und prognostizierte, dass einige Staaten in diesem Monat wieder eröffnen würden.Die USA verfügen über mehr als 637.000 bestätigte Covid-19-Fälle und über 30.826 Todesfälle, die höchste für jedes Land der Welt.
>>> 
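The German output above covers only the first two sentences of the input, which is likely a consequence of the max_length=100 cap passed to model.generate. One possible workaround is to translate sentence by sentence. The sketch below shows only the splitting step; split_sentences is a hypothetical helper, it assumes sentences are separated by whitespace (which is not the case for preprocess_text as built in this article), and it will split incorrectly on abbreviations such as "U.S.".

```python
import re

# Split on whitespace that follows ., ! or ?, optionally followed by a
# closing quote. Each lookbehind is fixed-width, as Python's re requires.
_SENTENCE_BOUNDARY = re.compile(r"""(?:(?<=[.!?])|(?<=[.!?]["']))\s+""")


def split_sentences(text):
    """Return a list of sentences from whitespace-separated English text."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


for sentence in split_sentences('He left. "We want it back." She stayed.'):
    print(sentence)
```

Each sentence could then be given the "translate English to German: " prefix and passed through tokenizer.encode and model.generate separately.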

Task 3: Machine translation from English to French

( Input text )

Input text
The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world.At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors."We'll be the comeback kids, all of us," he said. "We want to get our country back."The Trump administration has previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.

( Text output by the T5 model )

Output text
 Les Etats-Unis ont «passé le sommet» sur les nouveaux cas de coronavirus, a déclaré le président Donald Trump et prévoyait que certains États rouvriraient ce mois-ci. Les ÉtatsUnis affichent plus de 637 000 cas confirmés de Covid-19 ainsi que 30 826 décès, le plus élevé pour tout pays du monde. Lors de l'exposé quotidien

( Code executed )

Python3.6.3
>>> t5_prepared_Text = "translate English to French: "+preprocess_text
>>> tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)
>>> summary_ids = model.generate(tokenized_text,
...                                     num_beams=4,
...                                     no_repeat_ngram_size=2,
...                                     min_length=30,
...                                     max_length=100,
...                                     early_stopping=True)
>>> 
>>> output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
>>> print ("\n\nTranslated text: \n",output)


Translated text: 
 Les Etats-Unis ont «passé le sommet» sur les nouveaux cas de coronavirus, a déclaré le président Donald Trump et prévoyait que certains États rouvriraient ce mois-ci. Les ÉtatsUnis affichent plus de 637 000 cas confirmés de Covid-19 ainsi que 30 826 décès, le plus élevé pour tout pays du monde. Lors de l'exposé quotidien
>>>
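The three tasks above differ only in the prefix string, so the encode, generate, and decode steps can be gathered into one function. This is a sketch rather than the original author's code: run_t5_task and GENERATION_KWARGS are hypothetical names, and the model and tokenizer are assumed to be the objects loaded earlier with from_pretrained('t5-small'). The generation parameters are the ones used in the transcripts.

```python
# Generation settings copied from the transcripts in this article.
GENERATION_KWARGS = dict(
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=30,
    max_length=100,
    early_stopping=True,
)


def run_t5_task(model, tokenizer, prefix, text, device=None):
    """Prefix the text, generate with beam search, and decode the result."""
    input_ids = tokenizer.encode(prefix + text, return_tensors="pt")
    if device is not None:
        # Move the input tensor to the chosen device (e.g. CPU).
        input_ids = input_ids.to(device)
    output_ids = model.generate(input_ids, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the session above, a call such as run_t5_task(model, tokenizer, "translate English to French: ", preprocess_text, device) would correspond to Task 3.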