
Releasing Swallow-MS-7b-upos, a Japanese POS-tagging model built with MistralForTokenClassification

Posted at 2024-03-28 (last updated at 2024-03-28)

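Using the MistralForTokenClassification I wrote on March 18, I tried building Swallow-MS-7b-upos. With eight NVIDIA A100-SXM4-40GB GPUs, I managed to cut the model-building time down to 1 hour 30 minutes, but, contrary to my expectations, the part-of-speech tagging accuracy is rather underwhelming.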
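A token classifier like this one needs a fixed label set. Judging from the tags in the output below, a plain UPOS tag seems to mark a word that is a single token, while `B-`/`I-` mark the first and following tokens of a word split across several tokens. The sketch below builds such a label set and the `id2label`/`label2id` mappings a Transformers token-classification config carries. The exact label inventory of Swallow-MS-7b-upos is an assumption here, and `build_label_maps` is a hypothetical helper.

```python
# Assumed label scheme (inferred from the pipeline output, not from the model card):
# "X" = one-token word, "B-X" = first token of a multi-token word, "I-X" = later tokens.
UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def build_label_maps(tags):
    """Return (id2label, label2id) covering X, B-X and I-X for every tag."""
    labels = []
    for t in tags:
        labels.extend([t, "B-" + t, "I-" + t])
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(UPOS)
print(len(id2label))  # 17 tags x 3 variants = 51 labels for the classification head
```

Giving every tag all three variants keeps the scheme uniform, at the cost of some labels that may never occur in training data.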

```python
>>> from transformers import pipeline
>>> tag=pipeline("upos","KoichiYasuoka/Swallow-MS-7b-upos",trust_remote_code=True)
>>> nlp=lambda x:[(x[t["start"]:t["end"]],t["entity"]) for t in tag(x)]
>>> print(nlp("予想に反して品詞付与の精度はイマイチ"))
[('予想', 'NOUN'), ('に', 'ADP'), ('反', 'B-NOUN'), ('して', 'I-NOUN'), ('品', 'NOUN'), ('詞', 'B-NOUN'), ('付与', 'I-NOUN'), ('が', 'ADP'), ('イマイチ', 'B-VERB')]
```
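One way to see why 反して is hard to tag: the output above shows 「して」 as a single token, whereas the intended words are 「反し」 (verb) and 「て」 (subordinating conjunction). A token that straddles a word boundary can only receive one label, so no labeling of it is correct. The sketch below, using a hypothetical helper `crossing_tokens`, finds such tokens given a tokenization and a word segmentation of the same string.

```python
def crossing_tokens(tokens, words):
    """Return the tokens that straddle a word boundary.

    Both lists must concatenate to the same string.
    """
    if "".join(tokens) != "".join(words):
        raise ValueError("tokens and words cover different text")
    # Character offsets where one word ends and the next begins.
    boundaries = set()
    pos = 0
    for w in words[:-1]:
        pos += len(w)
        boundaries.add(pos)
    crossing = []
    start = 0
    for tok in tokens:
        end = start + len(tok)
        # A boundary strictly inside the token means it covers parts of two words.
        if any(start < b < end for b in boundaries):
            crossing.append(tok)
        start = end
    return crossing


tokens = ["予想", "に", "反", "して"]  # tokenization seen in the pipeline output
words = ["予想", "に", "反し", "て"]   # intended word segmentation
print(crossing_tokens(tokens, words))  # ['して']
```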

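Hmm. As in yesterday's article, I would like it to tokenize 反して as 「反」「し」「て」, so that 反し comes out as a single VERB and て as SCONJ, and tag the sentence like this: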

```python
[('予想', 'NOUN'), ('に', 'ADP'), ('反', 'B-VERB'), ('し', 'I-VERB'), ('て', 'SCONJ'), ('品', 'B-NOUN'), ('詞', 'I-NOUN'), ('付与', 'NOUN'), ('の', 'ADP'), ('精度', 'NOUN'), ('は', 'ADP'), ('イマイチ', 'ADV')]
```

That is the tagging I am after. A Pull Request has already been submitted as well, so now, how should I go about raising the accuracy?
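To read word-level results off token-level tags like the ones above, the `B-`/`I-` pieces have to be glued back into words. Below is a minimal sketch assuming the scheme suggested by the outputs ("X" for a one-token word, "B-X" opening a multi-token word, "I-X" continuing it). `merge_bio` is a hypothetical helper, not part of the model's remote pipeline code.

```python
def merge_bio(pairs):
    """Merge (token, tag) pairs into (word, UPOS) pairs.

    Assumed scheme: "X" tags a one-token word, "B-X" opens a multi-token
    word and "I-X" appends to the word opened just before it.
    A stray "I-X" that cannot be attached is treated like "B-X".
    """
    words = []          # each entry: [surface, upos]
    can_extend = False  # True while the last word was opened by B-/I-
    for token, tag in pairs:
        if tag.startswith("I-") and can_extend and words[-1][1] == tag[2:]:
            words[-1][0] += token            # continue the current word
        elif tag.startswith(("B-", "I-")):
            words.append([token, tag[2:]])   # open a multi-token word
            can_extend = True
        else:
            words.append([token, tag])       # one-token word
            can_extend = False
    return [tuple(w) for w in words]


# The target tagging from the article, as (token, tag) pairs.
target = [("予想", "NOUN"), ("に", "ADP"), ("反", "B-VERB"), ("し", "I-VERB"),
          ("て", "SCONJ"), ("品", "B-NOUN"), ("詞", "I-NOUN"), ("付与", "NOUN"),
          ("の", "ADP"), ("精度", "NOUN"), ("は", "ADP"), ("イマイチ", "ADV")]
print(merge_bio(target))
```

On the target this yields 反し as VERB and 品詞 as NOUN. Applied to the model's actual pair `('反', 'B-NOUN'), ('して', 'I-NOUN')`, it yields `('反して', 'NOUN')`, which makes the mistake easy to spot: a single noun where a verb plus a conjunction was wanted.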
