Comparing embedding models available in Ollama


Embedding verification

Target data:
Roughly 60 reference documents on security, CTF, Python libraries, and similar topics.

Queries:

  • Tools that can be used as an alternative to Nmap
  • About x86 alignment

Scoring, judged against the information I actually wanted:
0: out of the question
1: nothing relevant comes up at all
2: for at least one of the two queries, the desired document appears in the top 3 results
3: for at least one of the two queries, the desired document is the first result
4: best possible
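
Concretely, each reference document is embedded once, each query is embedded with the same model, and documents are ranked by cosine similarity to see whether the desired document lands in the top 3. Below is a minimal sketch of that flow, assuming a local Ollama server and the `ollama` Python client; the document strings are placeholders standing in for the ~60 reference files actually used.

```python
# Minimal sketch of the verification setup (assumption: Ollama is running
# locally and the model has been pulled, e.g. `ollama pull nomic-embed-text`).
import ollama
import numpy as np

MODEL = "nomic-embed-text"  # swap in any model from the results table

# Placeholder corpus standing in for the ~60 security/CTF/Python references.
documents = [
    "Masscan is a fast TCP port scanner comparable to Nmap.",
    "Notes on x86 structure alignment and padding.",
    # ... remaining reference documents
]

def embed(text: str) -> np.ndarray:
    res = ollama.embeddings(model=MODEL, prompt=text)
    return np.asarray(res["embedding"], dtype=np.float64)

# Embed and L2-normalize the corpus so a dot product equals cosine similarity.
doc_vecs = np.stack([embed(d) for d in documents])
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

queries = [
    "Tools that can be used as an alternative to Nmap",
    "About x86 alignment",
]
for query in queries:
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q
    top3 = np.argsort(scores)[::-1][:3]  # the rubric checks these three hits
    print(query)
    for i in top3:
        print(f"  {scores[i]:.3f}  {documents[i][:50]}")
```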

Results

Model                     Score   Size
mxbai-embed-large         2 (3)   670 MB
nomic-embed-text          2       274 MB
snowflake-arctic-embed    error   669 MB
all-minilm                1 (0)   46 MB
bge-m3                    1       1.2 GB
bge-large                 error   671 MB
paraphrase-multilingual   3       563 MB
snowflake-arctic-embed2   1 (0)   1.2 GB
granite-embedding         error   63 MB

The parenthesized number shows a leaning: a score written as 2 (1) would mean a 1 that leans toward 2, i.e. roughly 1.5.
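
Three of the models simply errored out in this setup rather than scoring poorly. A hedged sketch of how the sweep over models can tolerate that, again assuming the `ollama` Python client and that each model was pulled beforehand:

```python
# Probe every model from the table with a single embedding call, logging
# failures (as seen here with snowflake-arctic-embed, bge-large, and
# granite-embedding) instead of aborting the whole run.
import ollama

MODELS = [
    "mxbai-embed-large", "nomic-embed-text", "snowflake-arctic-embed",
    "all-minilm", "bge-m3", "bge-large", "paraphrase-multilingual",
    "snowflake-arctic-embed2", "granite-embedding",
]

for model in MODELS:
    try:
        res = ollama.embeddings(model=model, prompt="connectivity check")
        print(f"{model}: OK (dimension {len(res['embedding'])})")
    except Exception as exc:  # failing models land here, like the "error" rows
        print(f"{model}: error ({exc})")
```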

The following article also reports strong performance from paraphrase-multilingual (the table below is quoted from it; see the article for the metric definitions):
https://www.jiang.jp/posts/20230601_embedding_benchmark/
 #  Model                                              Score 1   Score 2   Score 3
 0  OpenAIEmbeddings                                   0.684147  0.548009  0.868852
 1  TensorflowHubEmbeddings                            0.560619  0.407494  0.761124
 2  paraphrase-multilingual-mpnet-base-v2              0.525899  0.398126  0.676815
 3  oshizo/sbert-jsnli-luke-japanese-base-lite         0.520106  0.405152  0.655738
 4  paraphrase-multilingual-MiniLM-L12-v2              0.497027  0.370023  0.639344
 5  intfloat/multilingual-e5-base                      0.481144  0.337237  0.632319
 6  sonoisa/sentence-bert-base-ja-mean-tokens-v2       0.465294  0.327869  0.622951
 7  setu4993/smaller-LaBSE                             0.450434  0.290398  0.632319
 8  sonoisa/sentence-bert-base-ja-en-mean-tokens       0.438923  0.304450  0.599532
 9  setu4993/LaBSE                                     0.434725  0.274005  0.625293
10  Blaxzter/LaBSE-sentence-embeddings                 0.434725  0.274005  0.625293
11  distiluse-base-multilingual-cased-v2               0.428484  0.264637  0.620609
12  ZurichNLP/unsup-simcse-xlm-roberta-base            0.419397  0.299766  0.526932
13  sentence-transformers/stsb-xlm-r-multilingual      0.361811  0.231850  0.484778
14  sonoisa/clip-vit-b-32-japanese-v1                  0.320160  0.203747  0.437939
15  sonoisa/sentence-bert-base-ja-mean-tokens          0.293779  0.177986  0.402810
16  google/canine-s                                    0.270446  0.159251  0.358314
17  google/canine-c                                    0.258978  0.159251  0.341920
18  colorfulscoop/sbert-base-ja                        0.227531  0.133489  0.295082
19  sonoisa/t5-base-japanese                           0.213053  0.135831  0.278689
20  M-CLIP/M-BERT-Distil-40                            0.170714  0.084309  0.236534
21  microsoft/unihanlm-base                            0.162957  0.098361  0.187354
22  nielsr/lilt-xlm-roberta-base                       0.143722  0.074941  0.173302
23  severinsimmler/xlm-roberta-longformer-base-16384   0.129116  0.072600  0.145199
24  facebook/nllb-moe-54b                              0.000000  0.000000  0.000000
25  sonoisa/sentence-luke-japanese-base-lite           0.000000  0.000000  0.000000
26  TylorShine/distilhubert-ft-japanese-50k            0.000000  0.000000  0.000000
27  rinna/japanese-hubert-base                         0.000000  0.000000  0.000000
28  ArthurZ/nllb-moe-128                               0.000000  0.000000  0.000000
29  pkshatech/simcse-ja-bert-base-clcmlp               0.000000  0.000000  0.000000
30  paulhindemith/fasttext-jp-embedding                0.000000  0.000000  0.000000
31  rinna/japanese-cloob-vit-b-16                      0.000000  0.000000  0.000000
32  rinna/japanese-clip-vit-b-16                       0.000000  0.000000  0.000000
33  sonoisa/sentence-t5-base-ja-mean-tokens            0.000000  0.000000  0.000000
34  megagonlabs/transformers-ud-japanese-electra-b...

Summary

It turned out that accuracy does not depend on model size: the 563 MB paraphrase-multilingual came out on top, ahead of 1.2 GB models such as bge-m3 and snowflake-arctic-embed2.
