
Evaluating Named Entity Recognition on Classical Chinese C-CLUE with run_ner.py

A paper by Yongrui Xu, Caixia Mao, Zhiyong Wang, Guonian Jin, Liangji Zhong & Tao Qian, "Semantic-enhanced graph neural network for named entity recognition in ancient Chinese books", appeared in Nature's Scientific Reports 14, so I skimmed through it. It proposes a graph neural network for named entity recognition in Classical Chinese, and it apparently uses C-CLUE's data_ner for its evaluation, but something seems off. For example, take "Table 1. Statistics of the C-CLUE datatset": the word "datatset" in the caption makes no sense, and the numbers in the table look wrong too.

[Image: Table 1 of the paper (ScientificReports17488Table1.png)]

Let's check this on Google Colaboratory.

!test -d C-CLUE || git clone --depth=1 https://github.com/jizijing/C-CLUE
!tr " " "\012" < C-CLUE/data_ner/target.txt | sort | uniq -c | sed -n "s/^ *\([0-9]*\) B-\([A-Z]*\)$/\2 \1/p" > train.uniq
!tr " " "\012" < C-CLUE/data_ner/dev-label.txt | sort | uniq -c | sed -n "s/^ *\([0-9]*\) B-\([A-Z]*\)$/\2 \1/p" > dev.uniq
!tr " " "\012" < C-CLUE/data_ner/test_tgt.txt | sort | uniq -c | sed -n "s/^ *\([0-9]*\) B-\([A-Z]*\)$/\2 \1/p" > test.uniq
!join -a1 train.uniq dev.uniq | awk '{{if(NF==2)print $$0,0;else print}}' | join -a1 - test.uniq | awk '{{if(NF==3)print $$0,0;else print}}' > all.uniq
s='''BEGIN{printf("%5s %10s %10s %10s\\n","","Train","Dev","Test")}
  {train+=$2;dev+=$3;test+=$4;printf("%5s %10d %10d %10d\\n",$1,$2,$3,$4)}
  END{printf("%5s %10d %10d %10d\\n","Total",train,dev,test)}'''
!awk '{s}' all.uniq
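
For readers not fluent in join/awk, the pipeline above just counts the "B-" tags of each entity type in each split. A minimal pure-Python sketch of the same counting (reading the same tag files from the C-CLUE repository; not the code that produced the table below) might look like this:

from collections import Counter
def count_entities(tag_file):
  # count how many entities of each type ("B-" tags) appear in one tag file
  c=Counter()
  with open(tag_file,"r",encoding="utf-8") as r:
    for line in r:
      for tag in line.split():
        if tag.startswith("B-"):
          c[tag[2:]]+=1
  return c
counts={"Train":count_entities("C-CLUE/data_ner/target.txt"),
        "Dev":count_entities("C-CLUE/data_ner/dev-label.txt"),
        "Test":count_entities("C-CLUE/data_ner/test_tgt.txt")}
print("%5s %10s %10s %10s"%("","Train","Dev","Test"))
for x in sorted(set().union(*counts.values())):
  print("%5s %10d %10d %10d"%(x,counts["Train"][x],counts["Dev"][x],counts["Test"][x]))
print("%5s %10d %10d %10d"%("Total",*[sum(c.values()) for c in counts.values()]))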

On my (Koichi Yasuoka's) machine, I got the following result.

           Train        Dev       Test
  BOO        119          0         16
  JOB       2252        448        349
  LOC       3625        220        236
  ORG       2041          4         45
  PER      11532        756        859
  WAR          6          0          0
Total      19575       1428       1505

As you can see, no label called "OFI" is used anywhere in C-CLUE. Even if we suppose that "OFI" actually means "JOB", and even if we exclude "BOO" and "WAR", the Dev (validation set) total still doesn't add up. And as long as that doesn't add up, the results in the other tables will come out slightly different as well. Table 5 in particular,

[Image: Table 5 of the paper (ScientificReports17488Table5.png)]

which lists Roberta-Classical-Chinese second, is of particular concern to me. However, at the end of the paper we find

All codes and resources are released at the website: https://github.com/qtxcm/BAC-GNN-CRF.

and yet, at the moment, nothing has been placed at that website. Frankly, this annoyed me, so I used the idea from my diary entry of May 12, 2023 and took a shot at named entity recognition on C-CLUE with transformers' run_ner.py. On Google Colaboratory (with a GPU), it goes like this.

!pip install transformers datasets evaluate seqeval accelerate
!test -d C-CLUE || git clone --depth=1 https://github.com/jizijing/C-CLUE
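# clone the transformers source tree at the release tag matching the installed version, to get run_ner.py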
s='$1=="transformers"{printf("-b v%s",$2)}'
!test -d transformers || git clone `pip list | awk '{s}'` https://github.com/huggingface/transformers
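# makejson: convert C-CLUE's space-separated token/tag files into JSON Lines for run_ner.py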
def makejson(token_file,tag_file,json_file):
  with open(token_file,"r",encoding="utf-8") as r1, open(tag_file,"r",encoding="utf-8") as r2, open(json_file,"w",encoding="utf-8") as w:
    for s,t in zip(r1,r2):
      print('{"tokens":["'+s.rstrip().replace(' ','","')+'"],"tags":["'+t.rstrip().replace(' ','","')+'"]}',file=w)
makejson("C-CLUE/data_ner/source.txt","C-CLUE/data_ner/target.txt","train.json")
makejson("C-CLUE/data_ner/dev.txt","C-CLUE/data_ner/dev-label.txt","dev.json")
makejson("C-CLUE/data_ner/test1.txt","C-CLUE/data_ner/test_tgt.txt","test.json")
import sys,subprocess
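# fine-tune each model with run_ner.py (default hyperparameters, 3 epochs) and print the metrics blocks from its output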
for b in ["google-bert/bert-base-chinese","KoichiYasuoka/roberta-classical-chinese-base-char","ethanyt/guwenbert-base","SIKU-BERT/sikubert","Jihuai/bert-ancient-chinese"]:
  f=False
  for s in subprocess.run([sys.executable,"transformers/examples/pytorch/token-classification/run_ner.py","--model_name_or_path",b,"--train_file","train.json","--validation_file","dev.json","--test_file","test.json","--output_dir","/tmp","--overwrite_output_dir","--do_train","--do_eval","--do_predict"],capture_output=True,text=True).stdout.split("\n"):
    if f and s.find("INFO")<0:
      print(s)
    elif s.startswith("***** train metrics "):
      f=True
      print("##### "+b+" #####\n"+s)

On my machine, I got the following results.

##### google-bert/bert-base-chinese #####
***** train metrics *****
  epoch                    =        3.0
  total_flos               =   335817GF
  train_loss               =     0.1562
  train_runtime            = 0:02:30.87
  train_samples            =       1902
  train_samples_per_second =      37.82
  train_steps_per_second   =      4.732
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9037
  eval_f1                 =     0.5964
  eval_loss               =     0.3251
  eval_precision          =     0.5417
  eval_recall             =     0.6634
  eval_runtime            = 0:00:02.21
  eval_samples            =        238
  eval_samples_per_second =    107.595
  eval_steps_per_second   =     13.562
***** predict metrics *****
  predict_accuracy           =     0.9119
  predict_f1                 =     0.6513
  predict_loss               =      0.313
  predict_precision          =     0.5745
  predict_recall             =      0.752
  predict_runtime            = 0:00:02.06
  predict_samples_per_second =    115.369
  predict_steps_per_second   =     14.542

##### KoichiYasuoka/roberta-classical-chinese-base-char #####
***** train metrics *****
  epoch                    =        3.0
  total_flos               =   335817GF
  train_loss               =     0.2046
  train_runtime            = 0:03:19.43
  train_samples            =       1902
  train_samples_per_second =     28.611
  train_steps_per_second   =       3.58
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9066
  eval_f1                 =     0.6265
  eval_loss               =     0.3079
  eval_precision          =     0.5608
  eval_recall             =     0.7096
  eval_runtime            = 0:00:02.15
  eval_samples            =        238
  eval_samples_per_second =    110.426
  eval_steps_per_second   =     13.919
***** predict metrics *****
  predict_accuracy           =     0.9104
  predict_f1                 =     0.6592
  predict_loss               =      0.298
  predict_precision          =     0.5705
  predict_recall             =     0.7805
  predict_runtime            = 0:00:02.09
  predict_samples_per_second =    113.744
  predict_steps_per_second   =     14.338

##### ethanyt/guwenbert-base #####
***** train metrics *****
  epoch                    =        3.0
  total_flos               =   335817GF
  train_loss               =     0.2094
  train_runtime            = 0:02:57.02
  train_samples            =       1902
  train_samples_per_second =     32.232
  train_steps_per_second   =      4.033
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9054
  eval_f1                 =     0.6258
  eval_loss               =     0.3065
  eval_precision          =      0.555
  eval_recall             =     0.7173
  eval_runtime            = 0:00:02.15
  eval_samples            =        238
  eval_samples_per_second =    110.254
  eval_steps_per_second   =     13.898
***** predict metrics *****
  predict_accuracy           =     0.9077
  predict_f1                 =      0.652
  predict_loss               =     0.3033
  predict_precision          =     0.5622
  predict_recall             =     0.7759
  predict_runtime            = 0:00:02.44
  predict_samples_per_second =      97.44
  predict_steps_per_second   =     12.282

##### SIKU-BERT/sikubert #####
***** train metrics *****
  epoch                    =        3.0
  total_flos               =   335817GF
  train_loss               =     0.1446
  train_runtime            = 0:03:08.72
  train_samples            =       1902
  train_samples_per_second =     30.234
  train_steps_per_second   =      3.783
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9088
  eval_f1                 =     0.6211
  eval_loss               =     0.3067
  eval_precision          =     0.5628
  eval_recall             =     0.6928
  eval_runtime            = 0:00:02.07
  eval_samples            =        238
  eval_samples_per_second =    114.492
  eval_steps_per_second   =     14.432
***** predict metrics *****
  predict_accuracy           =      0.913
  predict_f1                 =     0.6561
  predict_loss               =     0.3037
  predict_precision          =     0.5796
  predict_recall             =      0.756
  predict_runtime            = 0:00:02.42
  predict_samples_per_second =     98.247
  predict_steps_per_second   =     12.384

##### Jihuai/bert-ancient-chinese #####
***** train metrics *****
  epoch                    =        3.0
  total_flos               =   335817GF
  train_loss               =     0.1322
  train_runtime            = 0:03:41.68
  train_samples            =       1902
  train_samples_per_second =      25.74
  train_steps_per_second   =      3.221
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.9118
  eval_f1                 =      0.613
  eval_loss               =     0.2833
  eval_precision          =     0.5734
  eval_recall             =     0.6585
  eval_runtime            = 0:00:02.05
  eval_samples            =        238
  eval_samples_per_second =    115.602
  eval_steps_per_second   =     14.572
***** predict metrics *****
  predict_accuracy           =     0.9174
  predict_f1                 =      0.661
  predict_loss               =     0.2658
  predict_precision          =     0.5929
  predict_recall             =     0.7467
  predict_runtime            = 0:00:02.01
  predict_samples_per_second =    118.208
  predict_steps_per_second   =       14.9

Let's put the "predict metrics" into a table.

                           Precision  Recall     F1
Bert-Base-Chinese              57.45   75.20  65.13
Roberta-Classical-Chinese      57.05   78.05  65.92
GuwenBert-Base                 56.22   77.59  65.20
SikuBert                       57.96   75.60  65.61
Bert-Ancient-Chinese           59.29   74.67  66.10
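
For reference, here is a minimal sketch of how such a table could be built automatically; it assumes the "#####"-headed output above has been saved to a hypothetical file metrics.txt (this is not how the table above was made):

import re
table={}
model=None
# pick up each "##### model #####" header and the predict_precision/recall/f1 lines that follow it
with open("metrics.txt","r",encoding="utf-8") as r:
  for line in r:
    m=re.match(r"##### (\S+) #####",line)
    if m:
      model=m.group(1)
      table[model]={}
    m=re.match(r"\s*predict_(precision|recall|f1)\s*=\s*([0-9.]+)",line)
    if m and model:
      table[model][m.group(1)]=float(m.group(2))*100
print("%-25s %9s %6s %6s"%("","Precision","Recall","F1"))
for model,v in table.items():
  print("%-25s %9.2f %6.2f %6.2f"%(model.split("/")[-1],v["precision"],v["recall"],v["f1"]))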

Hmm, these results differ quite a bit from Table 5 of the paper. Or rather, the Recall values in Table 5 fall, across the board, below the potential these models actually have. How did that happen, I wonder.
