
What does ModernBERT-base put into the [MASK] of "It don't [MASK] a thing if it ain't got that swing"?

Posted at 2024-12-26

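ModernBERT was announced on December 19, so I decided to try it out. However, support in transformers has been pushed back to v4.48 or later, so running it with the current transformers v4.47.1 needs some help from trust_remote_code=True. On Google Colaboratory, it goes like this: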

# Install transformers (v4.47.1 at the time of writing) and triton
!pip install transformers triton
# Clone the model repository from Hugging Face
!test -d ModernBERT-base || git clone --depth=1 https://huggingface.co/answerdotai/ModernBERT-base
# Fetch the ModernBERT config/model code from transformers main, rewriting the relative imports "from ..." into "from transformers."
!test -f ModernBERT-base/configuration_modernbert.py || ( curl -L https://github.com/huggingface/transformers/raw/refs/heads/main/src/transformers/models/modernbert/configuration_modernbert.py | sed 's/^from \.\.\./from transformers./' > ModernBERT-base/configuration_modernbert.py )
# For the model code, also replace the is_triton_available import with a local check via importlib.util.find_spec
!test -f ModernBERT-base/modeling_modernbert.py || ( curl -L https://github.com/huggingface/transformers/raw/refs/heads/main/src/transformers/models/modernbert/modeling_modernbert.py | sed -e 's/^from \.\.\./from transformers./' -e 's/^from .* import is_triton_available/import importlib.util\nis_triton_available = lambda: importlib.util.find_spec("triton") is not None/' > ModernBERT-base/modeling_modernbert.py )
# Register the downloaded classes in config.json so that trust_remote_code=True can find them
import json
with open("ModernBERT-base/config.json","r",encoding="utf-8") as r:
  d=json.load(r)
if "auto_map" not in d:
  d["auto_map"]={
    "AutoConfig":"configuration_modernbert.ModernBertConfig",
    "AutoModel":"modeling_modernbert.ModernBertModel",
    "AutoModelForMaskedLM":"modeling_modernbert.ModernBertForMaskedLM",
    "AutoModelForSequenceClassification":"modeling_modernbert.ModernBertForSequenceClassification",
    "AutoModelForTokenClassification":"modeling_modernbert.ModernBertForTokenClassification"
  }
  with open("ModernBERT-base/config.json","w",encoding="utf-8") as w:
    json.dump(d,w,indent=2)
# Load the tokenizer and the masked-LM model, then let the pipeline fill in [MASK]
from transformers import AutoTokenizer,AutoModelForMaskedLM,FillMaskPipeline
tkz=AutoTokenizer.from_pretrained("ModernBERT-base")
mdl=AutoModelForMaskedLM.from_pretrained("ModernBERT-base",trust_remote_code=True)
fmp=FillMaskPipeline(tokenizer=tkz,model=mdl)
print(fmp("It don't [MASK] a thing if it ain't got that swing"))
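
Two steps of the setup above can also be expressed in plain Python, which may help when GNU sed is not at hand. The sketch below is an illustration under assumptions, not part of the original procedure: `needs_remote_code` and `patch_modeling_source` are hypothetical helper names, native ModernBERT support is assumed to begin at transformers v4.48 (as stated above), and the triton check imports `importlib.util` explicitly.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption (from the article): native ModernBERT support starts at transformers v4.48.
NATIVE_SUPPORT = (4, 48)


def needs_remote_code(transformers_version: str) -> bool:
    """True if this transformers version predates native ModernBERT support."""
    m = re.match(r"(\d+)\.(\d+)", transformers_version)
    if m is None:
        raise ValueError(f"unrecognized version string: {transformers_version!r}")
    return (int(m.group(1)), int(m.group(2))) < NATIVE_SUPPORT


# Replacement text for the is_triton_available import (no backslashes, safe for re.sub).
TRITON_CHECK = (
    'import importlib.util\n'
    'is_triton_available = lambda: importlib.util.find_spec("triton") is not None'
)


def patch_modeling_source(src: str) -> str:
    """Apply the same two line-level rewrites as the sed commands above."""
    # "from ...x import y" -> "from transformers.x import y" at the start of a line
    src = re.sub(r"^from \.\.\.", "from transformers.", src, flags=re.MULTILINE)
    # A line importing is_triton_available becomes a local find_spec-based check
    return re.sub(r"^from .* import is_triton_available", TRITON_CHECK, src, flags=re.MULTILINE)


if __name__ == "__main__":
    try:
        v = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(f"transformers {v}: trust_remote_code workaround needed = {needs_remote_code(v)}")
```

If `needs_remote_code` returns False, the cloning and patching should presumably be unnecessary, and `AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")` would load the model without `trust_remote_code=True`.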

When I checked what ModernBERT-base puts into the [MASK] of "It don't [MASK] a thing if it ain't got that swing," I (Koichi Yasuoka) got the following result on my machine:

[{'score': 0.8329909443855286, 'token': 1599, 'token_str': ' mean', 'sequence': "It don't mean a thing if it ain't got that swing"}, {'score': 0.026023954153060913, 'token': 513, 'token_str': ' do', 'sequence': "It don't do a thing if it ain't got that swing"}, {'score': 0.020469805225729942, 'token': 2647, 'token_str': ' matter', 'sequence': "It don't matter a thing if it ain't got that swing"}, {'score': 0.009303263388574123, 'token': 320, 'token_str': ' be', 'sequence': "It don't be a thing if it ain't got that swing"}, {'score': 0.008411774411797523, 'token': 1056, 'token_str': ' make', 'sequence': "It don't make a thing if it ain't got that swing"}]
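
As a reading aid for the output above, a small sketch that prints each candidate with `repr()` (so the leading space in `token_str` is visible) and measures how far the top candidate is ahead of the runner-up. The scores are copied from the output above; `summarize` is a hypothetical helper and the model is not called.

```python
# Top-5 fill-mask results copied from the output above (only the fields used here).
results = [
    {"score": 0.8329909443855286, "token": 1599, "token_str": " mean"},
    {"score": 0.026023954153060913, "token": 513, "token_str": " do"},
    {"score": 0.020469805225729942, "token": 2647, "token_str": " matter"},
    {"score": 0.009303263388574123, "token": 320, "token_str": " be"},
    {"score": 0.008411774411797523, "token": 1056, "token_str": " make"},
]


def summarize(results):
    """Return (top token_str, top score, ratio of the top score to the runner-up's)."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return top["token_str"], top["score"], top["score"] / runner_up["score"]


for r in results:
    # repr() makes the leading space visible, e.g. ' mean'
    print(f"{r['token_str']!r:>10}  {r['score']:6.1%}")

token, score, ratio = summarize(results)
print(f"top token {token!r} at {score:.1%}, about {ratio:.0f}x the runner-up")
```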

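With " mean" far in the lead at 83%, ModernBERT-base probably knows this English sentence. Note, however, that " mean" carries a leading space. In this respect the ModernBERT tokenizer is more GPT-like than BERT-like. Hmm, I wonder whether part-of-speech tagging and dependency parsing will be tough with this tokenizer.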
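
The leading-space behavior can be illustrated without downloading anything. The sketch below is a toy, not ModernBERT's actual tokenizer: `GPT_STYLE` is an ASCII-only approximation of the GPT-2 pre-tokenization regex (the real one uses `\p{L}`/`\p{N}` from the third-party `regex` module), and `BERT_STYLE` is a simplified BERT-style whitespace-and-punctuation split without lowercasing or WordPiece. The hypothetical helper `word_starts` shows one common way to map GPT-style pieces back to words, the kind of alignment POS tagging or dependency parsing would need.

```python
import re

# ASCII-only approximation of the GPT-2 pre-tokenization pattern:
# a word piece absorbs the single space in front of it.
GPT_STYLE = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")
# Simplified BERT-style basic tokenization: drop whitespace, split off each punctuation mark.
BERT_STYLE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def gpt_style(text):
    return GPT_STYLE.findall(text)


def bert_style(text):
    return BERT_STYLE.findall(text)


def word_starts(pieces):
    """Group GPT-style pieces into words: a piece with a leading space starts a new word."""
    words = []
    for p in pieces:
        if p.startswith(" ") or not words:
            words.append(p.lstrip(" "))
        else:
            words[-1] += p  # e.g. " don" + "'t" -> "don't"
    return words


text = "It don't mean a thing if it ain't got that swing"
print(gpt_style(text))
print(bert_style(text))
print(word_starts(gpt_style(text)))
```

In the GPT-style output, " mean" keeps its space just like the `token_str` above, while the BERT-style split discards whitespace entirely; in both, "don't" ends up split at a point that a downstream tagger has to reconcile.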
