Running JMTEB on Python 3.9

Posted at 2025-02-12

I decided to evaluate modernbert-base-japanese-wikipedia with JMTEB. JMTEB does not run on Python 3.9, however, so I had to put in a little extra work.
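For context, the script that follows rewrites `X | Y` type unions, a syntax that only evaluates at runtime from Python 3.10 onward. A minimal check is sketched here, on the assumption that this syntax is the incompatibility at play; `supports_pipe_unions` is a hypothetical helper name, not part of JMTEB.

```python
import sys


def supports_pipe_unions() -> bool:
    """Return True if `int | None` can be evaluated at runtime."""
    try:
        int | None  # builds a union type on Python 3.10+, raises TypeError on 3.9
        return True
    except TypeError:
        return False


# Print the interpreter version next to the result of the check.
print(sys.version_info[:2], supports_pipe_unions())
```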

#! /bin/sh
model="KoichiYasuoka/modernbert-base-japanese-wikipedia"
# Install the packages needed to run JMTEB under Python 3.9
pip3.9 install sentence-transformers accelerate datasets jsonnet jsonargparse smart-open openai loguru tiktoken triton
test -d JMTEB || git clone --depth=1 https://github.com/sbintuitions/JMTEB
# Pick an awk implementation: gawk if present, then mawk, otherwise nawk
NAWK=nawk
if [ -x /usr/bin/gawk ]
then NAWK=/usr/bin/gawk
elif [ -x /usr/bin/mawk ]
then NAWK=/usr/bin/mawk
fi
# Mirror JMTEB/src into JMTEB/src9, patching .py files and copying everything else
find JMTEB/src -type f -print |
( while read F
  do T=`echo $F | sed 's?/src/?/src9/?'`
     mkdir -p `dirname $T`
     case $T in
     *.py) $NAWK '
{
  # Insert the Union import once, before the first line that is not a __future__ import
  if(f!=1){
    if($0!~/__future__/){
      printf("from typing import Union\n");
      f=1;
    }
  }
  # Rewrite every "X | Y" type union on this line as Union[X , Y]
  s=$0;
  while(match(s,/[A-Za-z]+(\[[A-Za-z ",]+\])?( \| [A-Za-z]+(\[[A-Za-z ",]+\])?)+/)>0){
    u=substr(s,RSTART,RLENGTH)
    gsub(/\|/,",",u);
    s=substr(s,1,RSTART-1)"Union["u"]"substr(s,RSTART+RLENGTH);
  }
  print s;
}' $F > $T ;;
     *) cp $F $T ;;
     esac
  done
)
# Run the evaluation from the patched source tree and save results under result/
( cd JMTEB/src9 && python3.9 -m jmteb --embedder SentenceBertEmbedder --embedder.model_name_or_path $model --save_dir ../../result/$model )
echo '***' $model
cat result/$model/summary.json
echo ''
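To make the awk step easier to follow, here is a rough Python equivalent of the same textual rewrite. It is only a sketch: the regular expression is copied from the script above, while the names `rewrite_line` and `convert` are made up for illustration.

```python
import re

# Same pattern as the awk script: a name, an optional [...] subscript,
# then one or more " | name[...]" alternatives.
TYPE = r'[A-Za-z]+(?:\[[A-Za-z ",]+\])?'
UNION = re.compile(TYPE + r'(?: \| ' + TYPE + r')+')


def rewrite_line(line: str) -> str:
    """Rewrite every `X | Y` union on one line as `Union[X , Y]`."""
    return UNION.sub(
        lambda m: "Union[" + m.group(0).replace("|", ",") + "]", line
    )


def convert(source: str) -> str:
    """Patch a whole file: add the Union import once, then rewrite each line."""
    out, inserted = [], False
    for line in source.splitlines():
        # Like the awk code, place the import before the first line
        # that does not mention __future__.
        if not inserted and "__future__" not in line:
            out.append("from typing import Union")
            inserted = True
        out.append(rewrite_line(line))
    return "\n".join(out) + "\n"


print(convert("from __future__ import annotations\nx: int | None = None\n"))
```

Like the awk pattern, this is purely textual, so it would also rewrite a bitwise-or between two plain names (for example `A | B` in an ordinary expression), and it leaves unions nested inside a subscript that itself contains `|` only partly converted.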

On an NVIDIA A100-SXM4-40GB, the run took about 8 hours and produced the following output.

*** KoichiYasuoka/modernbert-base-japanese-wikipedia
{
    "Classification": {
        "amazon_counterfactual_classification": {
            "macro_f1": 0.7421788737578211
        },
        "amazon_review_classification": {
            "macro_f1": 0.48802715434889077
        },
        "massive_intent_classification": {
            "macro_f1": 0.7691018550046452
        },
        "massive_scenario_classification": {
            "macro_f1": 0.8592072858263317
        }
    },
    "Reranking": {
        "esci": {
            "ndcg@10": 0.9074315783061467
        }
    },
    "Retrieval": {
        "jagovfaqs_22k": {
            "ndcg@10": 0.31588139624501516
        },
        "jaqket": {
            "ndcg@10": 0.04412074696344684
        },
        "mrtydi": {
            "ndcg@10": 0.02025729160659547
        },
        "nlp_journal_abs_intro": {
            "ndcg@10": 0.7693263292058586
        },
        "nlp_journal_title_abs": {
            "ndcg@10": 0.5514830546550054
        },
        "nlp_journal_title_intro": {
            "ndcg@10": 0.32297762338199365
        }
    },
    "STS": {
        "jsick": {
            "spearman": 0.7487648151029657
        },
        "jsts": {
            "spearman": 0.6872200019629169
        }
    },
    "Clustering": {
        "livedoor_news": {
            "v_measure_score": 0.5042800077305779
        },
        "mewsc16": {
            "v_measure_score": 0.3847169582408134
        }
    },
    "PairClassification": {
        "paws_x_ja": {
            "binary_f1": 0.623541887592789
        }
    }
}

The Retrieval scores are quite low, but modernbert-base-japanese-wikipedia was not trained on any retrieval corpus, so that is hardly surprising. The other scores look reasonable enough.
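To compare categories at a glance, one could average the scores within each category of summary.json. The sketch below assumes the nesting shown in the output above (category, then task, then metric); `category_means` and `report` are hypothetical helper names.

```python
import json
from statistics import mean


def category_means(summary: dict) -> dict:
    """Average every metric value within each task category."""
    return {
        category: mean(
            score for metrics in tasks.values() for score in metrics.values()
        )
        for category, tasks in summary.items()
    }


def report(path: str) -> None:
    """Load a summary.json file and print one mean score per category."""
    with open(path, encoding="utf-8") as f:
        summary = json.load(f)
    for category, value in category_means(summary).items():
        print(f"{category}: {value:.4f}")
```

Calling `report("result/KoichiYasuoka/modernbert-base-japanese-wikipedia/summary.json")` after the run would print one line per category. Note that this is a plain unweighted mean over tasks, which may differ from any aggregate reported elsewhere.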
