- Set up a venv environment
python3.10 -m venv bert
source bert/bin/activate
- Install the Python libraries
pip install transformers==4.50.0
pip install fugashi
pip install ipadic==1.0.0
pip install numpy torch
pip install sentencepiece unidic_lite
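Before running the script, it may help to confirm that the packages actually landed in the active venv. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the package list simply mirrors the pip commands above. Name normalization (for example `unidic_lite` versus `unidic-lite`) is assumed to be handled by `importlib.metadata`, which may vary with the Python version.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above
packages = ['transformers', 'fugashi', 'ipadic', 'numpy', 'torch',
            'sentencepiece', 'unidic_lite']

for name in packages:
    print(f'{name}: {installed_version(name) or "NOT INSTALLED"}')
```

Any line reporting NOT INSTALLED suggests that the corresponding pip command failed or ran outside the `bert` venv.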
- Python script for running BERT
run.py
import numpy as np
import torch
from transformers import BertJapaneseTokenizer, BertForMaskedLM
model_name = 'tohoku-nlp/bert-base-japanese-whole-word-masking'
tokenizer = BertJapaneseTokenizer.from_pretrained(model_name)
bert_mlm = BertForMaskedLM.from_pretrained(model_name)
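# Use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
bert_mlm = bert_mlm.to(device)
text = '今日はバンコクで[MASK]を食べる。'
text0 = text
tokens = tokenizer.tokenize(text)
input_ids = tokenizer.encode(text, return_tensors='pt')
input_ids = input_ids.to(bert_mlm.device)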
with torch.no_grad():
    output = bert_mlm(input_ids=input_ids)
    scores = output.logits
print(scores.shape)
# Look up the [MASK] id from the tokenizer instead of hard-coding 4
mask_position = input_ids[0].tolist().index(tokenizer.mask_token_id)
id_best = scores[0, mask_position].argmax(-1).item()
token_best = tokenizer.convert_ids_to_tokens(id_best)
token_best = token_best.replace('##', '')
text = text.replace('[MASK]',token_best)
print(text0, "->", text)
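The script keeps only the single best token via `argmax`. To see how the ranking at the masked position works, here is a toy sketch in plain Python that applies a softmax to a list of logits and returns the top k candidates. The function name `top_k_predictions`, the four-word vocabulary, and the logit values are all made up for illustration; they are not output from the model.

```python
import math


def top_k_predictions(logits, vocab, k=3):
    """Return the k highest-scoring (token, probability) pairs."""
    # Softmax with max-subtraction for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank vocabulary indices by logit, highest first
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # Strip the WordPiece continuation prefix, as run.py does
    return [(vocab[i].replace('##', ''), probs[i]) for i in ranked[:k]]


# Hypothetical toy vocabulary and logits (not real model output)
toy_vocab = ['カレー', '寿司', '##麺', 'パン']
toy_logits = [4.0, 1.0, 3.0, 0.5]

for token, prob in top_k_predictions(toy_logits, toy_vocab, k=2):
    print(f'{token}: {prob:.3f}')
```

With the real model, calling `torch.topk(scores[0, mask_position], k)` should give the corresponding ranking directly on the logits tensor, and the returned indices can be passed to `tokenizer.convert_ids_to_tokens`.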
- Execution result
python run.py
今日はバンコクで[MASK]を食べる。 -> 今日はバンコクでカレーを食べる。
So I'm having curry today.