@kii_2688 (kii) posted at 2022-10-13

tokenizer.encode doesn't work

Q&A

Closed

Python tokenizer PyTorch GoogleColaboratory

What I want to solve

・I want to resolve AttributeError: module 'tokenizer' has no attribute 'encode'

I'm trying to save a PyTorch model that I trained on Google Colab and use it in my local environment.

https://qiita.com/Yokohide/items/e74254f334e1335cd502
↑The model I'm using is based on this article.
It runs successfully on Colab.

Relevant code

test_model.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import tokenizer 

    
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(3, 3)
 
    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x

def generate_reply(inp, num_gen=1):
    device = torch.device('cpu')
    input_text = "<s>" + str(inp) + "[SEP]"
    model= Net().to(device)
    model.load_state_dict(torch.load("model.pth", map_location=device),strict=False)
    

    input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
    out = model.generate(input_ids, do_sample=True, max_length=64, num_return_sequences=num_gen, 
                         top_p=0.95, top_k=20, bad_words_ids=[[1], [5]], no_repeat_ngram_size=3)
    
    
    print(">", "あなた")
    print(inp)
    print(">", "あいて")
    for sent in tokenizer.batch_decode(out):
        sent = sent.split('[SEP]</s>')[1]
        sent = sent.replace('</s>', '')
        sent = sent.replace('<br>', '\n')
        print(sent)

call_test_model.py

import test_model

test_model.generate_reply('こんにちわ')

Error message

Traceback (most recent call last):
  File "c:\Users\81906\Downloads\linebot\test_model_call.py", line 3, in <module>
    test_model.generate_reply('こんにちわ')
  File "c:\Users\81906\Downloads\linebot\test_model.py", line 23, in generate_reply
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
AttributeError: module 'tokenizer' has no attribute 'encode'

What I tried

・Read the official documentation and confirmed that tokenizer has an encode attribute
・Reconfirmed that the code works on Colab

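I'm trying to run a model that I trained with a GPU on Colab locally on a CPU (my own PC has no GPU).
Since this is my first time uploading (?) and using a trained model, the problem may not be the tokenizer at all; I may be loading the model incorrectly.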
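One way to test the suspicion that the model is loaded incorrectly: load_state_dict(..., strict=False) skips every key that does not match, so a checkpoint for a different architecture can "load" without any error and leave the model at its random initialization. The sketch below is a stdlib-only version of the key comparison that strict loading performs. The helper name diff_state_dict_keys is hypothetical, and the GPT-2-style checkpoint keys are illustrative assumptions, not read from the actual model.pth.

```python
from collections.abc import Iterable


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected): the key mismatches that strict loading rejects."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)  # the model expects these, the checkpoint lacks them
    unexpected = sorted(ckpt_set - model_set)  # the checkpoint has these, the model has no slot for them
    return missing, unexpected


# Parameter names of the question's Net (a single nn.Linear named fc1).
net_keys = ["fc1.weight", "fc1.bias"]
# Illustrative GPT-2-style checkpoint keys (assumption: if model.pth is a
# state_dict, list the real ones with torch.load("model.pth", map_location="cpu").keys()).
checkpoint_keys = ["transformer.wte.weight", "transformer.wpe.weight", "lm_head.weight"]

missing, unexpected = diff_state_dict_keys(net_keys, checkpoint_keys)
print("missing:", missing)  # missing: ['fc1.bias', 'fc1.weight']
print("unexpected:", unexpected)
```

If every model key shows up as missing, strict=False copied nothing into the model. In PyTorch itself, the value returned by model.load_state_dict(..., strict=False) carries the same information in its missing_keys and unexpected_keys fields.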

If anyone knowledgeable could advise me, I would be grateful. Thanks in advance.


@PondVillege posted at 2022-10-14

"Read the official documentation and confirmed that tokenizer has an encode attribute"

Since you wrote import tokenizer, it seems you installed the following library, but

it has no encode attribute...
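The error message itself says that tokenizer here is a module, not a tokenizer object: import tokenizer binds the name to whatever top-level package called tokenizer is installed, and attribute lookup on a module only finds names that module defines. The sketch below reproduces the message with a stand-in module (an assumption, not the real package) and shows a stdlib way to check which file import tokenizer picks up. The helper name describe_module is hypothetical.

```python
import importlib.util
import types


def describe_module(name: str) -> str:
    """Report which file `import name` would load for a top-level module name."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not installed"
    return f"{name}: {spec.origin}"


# Stand-in for an installed package named `tokenizer` that defines no
# top-level encode (assumption; this is not the real package).
fake_tokenizer = types.ModuleType("tokenizer")

message = ""
try:
    fake_tokenizer.encode("<s>こんにちわ[SEP]")
except AttributeError as err:
    message = str(err)

print(message)  # module 'tokenizer' has no attribute 'encode'
print(describe_module("tokenizer"))  # shows the file that import tokenizer would use
```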

I checked the Jupyter Notebook linked from the article you referenced, and it does the following:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small")

In other words, it creates the tokenizer instance with AutoTokenizer from transformers.

The GPT2Tokenizer class obtained here inherits from the PreTrainedTokenizer class, which does have an encode() method.
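To check the inheritance point yourself, you can ask Python which class in an object's method resolution order actually defines encode. The helper below (a hypothetical name, stdlib only) does that; BaseTokenizer and MyTokenizer are stand-ins for a library base class and its subclass, not transformers code. Run on the real tokenizer instance, it should point at a transformers base class; run on the tokenizer module, it returns None.

```python
import types


def defining_class(obj, name):
    """Return the first class in type(obj).__mro__ whose own namespace defines `name`, else None."""
    for cls in type(obj).__mro__:
        if name in vars(cls):
            return cls
    return None


class BaseTokenizer:  # hypothetical stand-in for a library base class
    def encode(self, text):
        return [ord(ch) for ch in text]


class MyTokenizer(BaseTokenizer):  # hypothetical subclass that inherits encode
    pass


tok = MyTokenizer()
print(defining_class(tok, "encode").__name__)  # BaseTokenizer
print(defining_class(tok, "decode"))  # None
# A module's type is types.ModuleType, and nothing in its MRO defines encode.
print(defining_class(types.ModuleType("tokenizer"), "encode"))  # None
```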

So make sure you create the tokenizer from transformers.
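Putting the answer together, a corrected test_model.py might look like the sketch below. It builds the tokenizer with AutoTokenizer as in the referenced notebook, and replaces the question's Net class (an nn.Module with no generate method) with AutoModelForCausalLM. It assumes model.pth holds a state_dict saved on Colab with torch.save(model.state_dict(), "model.pth"); if the whole model object was saved instead, the loading step has to change. The helpers clean_reply and load_model_and_tokenizer are hypothetical names, and caching the loaded pair with functools.lru_cache is a design choice, not part of the original code.

```python
import functools

# Base model named in the referenced notebook.
PRETRAINED_NAME = "rinna/japanese-gpt2-small"
# Assumption: a state_dict saved on Colab with torch.save(model.state_dict(), "model.pth").
WEIGHTS_PATH = "model.pth"


def clean_reply(decoded: str) -> str:
    """Drop the prompt and special tokens from one decoded sequence (same steps as the question)."""
    reply = decoded.split("[SEP]</s>")[1]
    reply = reply.replace("</s>", "")
    return reply.replace("<br>", "\n")


@functools.lru_cache(maxsize=1)
def load_model_and_tokenizer():
    """Load the tokenizer and fine-tuned model once; later calls reuse the cached pair."""
    # Imported here so clean_reply stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # A tokenizer *instance* from transformers, not the unrelated `tokenizer` module.
    tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_NAME)
    # Same architecture as the notebook, so the fine-tuned weights fit (unlike Net).
    model = AutoModelForCausalLM.from_pretrained(PRETRAINED_NAME)
    # map_location remaps tensors saved from the Colab GPU onto the local CPU.
    state_dict = torch.load(WEIGHTS_PATH, map_location="cpu")
    model.load_state_dict(state_dict)  # strict by default: mismatched keys raise
    model.eval()
    return tokenizer, model


def generate_reply(inp, num_gen=1):
    tokenizer, model = load_model_and_tokenizer()
    input_text = "<s>" + str(inp) + "[SEP]"
    input_ids = tokenizer.encode(input_text, return_tensors="pt")
    out = model.generate(
        input_ids, do_sample=True, max_length=64, num_return_sequences=num_gen,
        top_p=0.95, top_k=20, bad_words_ids=[[1], [5]], no_repeat_ngram_size=3,
    )
    print(">", "あなた")
    print(inp)
    print(">", "あいて")
    for sent in tokenizer.batch_decode(out):
        print(clean_reply(sent))
```

call_test_model.py can stay as it is (test_model.generate_reply('こんにちわ')). Dropping strict=False is deliberate: if the saved weights do not match the architecture, the script fails at load time instead of silently generating from untrained weights.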

