Evaluating Ollama Models on Patent Data: Differences Between LLMs and Embedding Models

Posted at 2024-10-17, last updated at 2024-10-17

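Introduction

In an earlier post, I wrote about running RAG (Retrieval-Augmented Generation) on my own PC using an Ollama LLM and an Ollama embedding model.

Afterwards, @SNAMGN kindly pointed out that Ollama offers a wide range of models. I checked for myself and confirmed that there are indeed many to choose from.

So in this post I pick several LLMs and embedding models registered with Ollama and compare them on two points: processing speed and answer accuracy.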

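My PC specs

First, here are the specs of the PC I used. As mentioned in the earlier post, 8 GB of memory is not enough, so I set the swap area to 19,456 MB, which was just enough to get Ollama running.

Resource	Details
CPU	Intel(R) Core(TM) i3-8145U CPU @ 2.10GHz
GPU	None
Memory	8 GB
SSD	1 TB
Swap area	19,456 MB

All of the tests below were run on this machine.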

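Model ranking

The tables below list LLMs and embedding models taken from the Ollama site.

Because there are so many models, I picked out the ones with the most downloads. The figures are as of October 14, 2024.

LLM models

Models marked "〇" in the "Target" column are the ones compared in this post. I selected models with a high download (pull) count.

Model	Downloads	Updated	Description	Target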
llama3.2	610,100	2 weeks ago	Meta's Llama 3.2 goes small with 1B and 3B models.	〇
llama3.1	6,200,000	3 weeks ago	Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes.	〇
qwen2.5	779,000	3 weeks ago	Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.	〇
nemotron-mini	16,600	3 weeks ago	A commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.
mistral-small	19,200	3 weeks ago	Mistral Small is a lightweight model designed for cost-effective use in tasks like translation and summarization.
mistral-nemo	298,400	2 weeks ago	A state-of-the-art 12B model with 128k context length, built by Mistral AI in collaboration with NVIDIA.
mistral	3,900,000	4 months ago	The 7B model released by Mistral AI, updated to version 0.3.	〇
mixtral	446,200	5 months ago	A set of Mixture of Experts (MoE) model with open weights by Mistral AI in 8x7b and 8x22b parameter sizes.
command-r	227,900	6 weeks ago	Command R is a Large Language Model optimized for conversational interaction and long context tasks.
command-r-plus	99,200	6 weeks ago	Command R+ is a powerful, scalable large language model purpose-built to excel at real-world enterprise use cases.
qwen2	3,900,000	4 months ago	Qwen2 is a new series of large language models from Alibaba group	〇
qwen2.5-coder	206,400	3 weeks ago	Qwen2 is a new series of large language models from Alibaba group
mistral-large	87,700	2 months ago	Mistral Large 2 is Mistral's new flagship model that is significantly more capable in code generation, mathematics, and reasoning with 128k context window and support for dozens of languages.
gdisney/mistral-large-uncensored	53,500	2 months ago
hermes3	43,800	6 weeks ago	Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research
llama3-groq-tool-use	30,100	2 months ago	A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use/function calling.

Embedding models

Models marked "〇" in the "Target" column are the ones compared in this post. I selected models with a high download (pull) count.

Model	Downloads	Updated	Description	Target
nomic-embed-text	715,100	7 months ago	A high-performing open embedding model with a large token context window.	〇
mxbai-embed-large	448,100	6 months ago	State-of-the-art large embedding model from mixedbread.ai	〇
snowflake-arctic-embed	158,000	5 months ago	A suite of text embedding models by Snowflake, optimized for performance.	× (due to an error)
all-minilm	107,400	7 months ago	Embedding models on very large sentence level datasets.	〇
unclemusclez/jina-embeddings-v2-base-code	52,400	3 months ago	(Updated: 07/1/2024) https://huggingface.co/jinaai/jina-embeddings-v2-base-code	〇
hellord/mxbai-embed-large-v1	41,200	6 months ago		〇
znbang/bge	21,500	7 months ago	BAAI General Embedding
bge-m3	20,000	2 months ago	BGE-M3 is a new model from BAAI distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.	〇
shaw/dmeta-embedding-zh	16,500	6 months ago	https://huggingface.co/DMetaSoul/Dmeta-embedding-zh
jina/jina-embeddings-v2-base-de	13,800	4 months ago	Text embedding model (base) for English and German input of size up to 8192 tokens
chroma/all-minilm-l6-v2-f32	11,000	6 months ago
bge-large	7,501	2 months ago	Embedding model from BAAI mapping texts to vectors.
quentinz/bge-large-zh-v1.5	4,642	2 months ago
paraphrase-multilingual	4,361	2 months ago	Sentence-transformers model that can be used for tasks like clustering or semantic search.
milkey/m3e	2,364	6 months ago	Moka-AI Massive Mixed Embedding

Note that snowflake-arctic-embed raised an error at retrieval time, so it was excluded from the comparison.

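Evaluation criteria

The evaluation criteria are as follows.

Database registration

I used Chroma as the vector database and patent information XML files as the input. The metric is the time taken to read the XML files and finish registering them in the Chroma database.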
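The programs in this article print a start time and an end time. As an alternative, the elapsed seconds can be measured directly. The following is a minimal sketch of that idea, not the method used for the measurements here: `stopwatch`, `timings`, and `register_documents` are hypothetical names, and `register_documents` is only a stand-in for the real registration step.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Measure the wall-clock time of the enclosed block and store it in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs still leave a timing
        results[label] = time.perf_counter() - start


def register_documents():
    # Hypothetical stand-in for parsing the XML files and adding them to Chroma
    time.sleep(0.01)


timings = {}
with stopwatch("registration", timings):
    register_documents()

print(f"registration: {timings['registration']:.2f} s")
```

`time.perf_counter()` is a monotonic clock, so the measurement is not affected by system clock adjustments and does not break when a run crosses midnight.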

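Question answering

Using the vector database built in the database registration step, I measured the time taken to answer a user's question. I also rated whether each answer met expectations on the following four-point scale.

Answer rating scale

Score	Description
3	Accurately summarized
2	Takes a different angle, but the summary is convincing
1	The summary contains only the keywords
0	Does not even contain the keywords

The following four questions were used for the evaluation.

No.	Question
1	Give me an overview of "Combination Treatment and Method Thereof" (「組合せ処置およびその方法」の概要を教えて)
2	Give me an overview of "Furin Inhibitor" (「フューリンインヒビター」の概要を教えて)
3	What does the term "alkyl" mean? (用語「アルキル」の意味は？)
4	What is the term "agonist"? (用語「アゴニスト」とは？)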

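Programs

The Python programs for database registration and question answering are shown below.
In each program, the declarations for the alternative models are commented out, so uncomment the one you need before running.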
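Rather than editing comments for every run, the model could also be chosen from the command line. The following is a minimal sketch of that idea, not the approach used in the programs below. `EMBEDDING_MODELS`, `persist_directory_for`, `parse_args`, and the `--embedding-model` flag are hypothetical names; the base directory matches the paths that appear in the programs.

```python
import argparse

# Embedding models compared in this article
EMBEDDING_MODELS = [
    "nomic-embed-text",
    "mxbai-embed-large",
    "all-minilm",
    "unclemusclez/jina-embeddings-v2-base-code",
    "bge-m3",
]


def persist_directory_for(model, base_dir="C:/Users/ogiki/vectorDB"):
    """Build one database directory per model; '/' in a model name becomes '_'."""
    return f"{base_dir}/{model.replace('/', '_')}"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select the embedding model to evaluate")
    parser.add_argument("--embedding-model", choices=EMBEDDING_MODELS, default="bge-m3")
    return parser.parse_args(argv)


# An explicit argument list is passed here only to keep the example self-contained
args = parse_args(["--embedding-model", "unclemusclez/jina-embeddings-v2-base-code"])
print(args.embedding_model)
print(persist_directory_for(args.embedding_model))
```

Deriving the directory from the model name in one place keeps the registration program and the question answering program pointed at the same database.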

Database registration

chroma_retriever.py

import glob
import os
import xml.etree.ElementTree as ET
from dotenv import load_dotenv
from langchain.text_splitter import CharacterTextSplitter
#from langchain.embeddings.openai import OpenAIEmbeddings
#from langchain.vectorstores import Chroma
from langchain_chroma import Chroma
import ollama
from datetime import datetime

load_dotenv()

docs = []

# Namespace-qualified tag names to extract
name_spaces_tag_names = [
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PublicationNumber",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PublicationDate",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}RegistrationDate",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}ApplicationNumberText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PartyIdentifier",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}EntityName",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PostalAddressText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PatentCitationText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PersonFullName",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}P",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Common}FigureReference",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}PlainLanguageDesignationText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}FilingDate",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}InventionTitle",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}MainClassification",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}FurtherClassification",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}PatentClassificationText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}SearchFieldText",
    "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}ClaimText",
]

# Wrapper that exposes Ollama embeddings as an embedding function for Chroma
class OllamaEmbeddingFunction:
    def __init__(self, model):
        self.model = model
    def embed_documents(self, texts):
        embeddings = []
        for text in texts:
            response = ollama.embeddings(model=self.model, prompt=text)
            embeddings.append(response['embedding'])
        return embeddings  # Return the embeddings computed above

def set_element(level, trees, el):
    trees.append({"tag" : el.tag, "attrib" : el.attrib, "content_page" :el.text})

def set_child(level, trees, el):
    set_element(level, trees, el)
    for child in el:
        set_child(level+1, trees, child)

def parse_and_get_element(input_file):
    tmp_elements = []
    new_elements = []
    tree = ET.parse(input_file)
    root = tree.getroot()
    set_child(1, tmp_elements, root)
    for name_space_tag_name in name_spaces_tag_names:
        for tmp_element in tmp_elements:
            if tmp_element["tag"] == name_space_tag_name:
                new_elements.append(tmp_element)
    return new_elements

def execute():
    title = ""
    entryName = ""
    patentCitationText = ""

    files = glob.glob(os.path.join("C:/Users/ogiki/JPB_2024999", "**/*.*"), recursive=True)
    for file in files:
        base, ext = os.path.splitext(file)
        if ext == '.xml':
            topic_name = os.path.splitext(os.path.basename(file))[0]
            print(file)

            text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0)
            new_elements = parse_and_get_element(file)
            for new_element in new_elements:
                try:
                    text = new_element["content_page"]
                    tag = new_element["tag"]
                    title = text if tag == "{http://www.wipo.int/standards/XMLSchema/ST96/Patent}InventionTitle" else ""
                    entryName = text if tag == "{http://www.wipo.int/standards/XMLSchema/ST96/Common}EntityName" else ""
                    patentCitationText = text if tag == "{http://www.wipo.int/standards/XMLSchema/ST96/Common}PatentCitationText" else ""

                    documents = text_splitter.create_documents(texts=[text], metadatas=[{
                        "name": topic_name, 
                        "source": file, 
                        "tag": tag, 
                        "title": title,
                        "entry_name": entryName, 
                        "patent_citation_text" : patentCitationText}]
                    )
                    docs.extend(documents)
                except Exception:
                    # Skip any element that cannot be turned into documents and move on
                    continue

    # Create an OllamaEmbeddingFunction instance (uncomment the pair for the model to test)
    # embedding_function = OllamaEmbeddingFunction(model='nomic-embed-text')
    # db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/nomic-embed-text", embedding_function=embedding_function)
    # embedding_function = OllamaEmbeddingFunction(model='mxbai-embed-large')
    # db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/mxbai-embed-large", embedding_function=embedding_function)
    # embedding_function = OllamaEmbeddingFunction(model='snowflake-arctic-embed')
    # db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/snowflake-arctic-embed", embedding_function=embedding_function)
    # embedding_function = OllamaEmbeddingFunction(model='all-minilm')
    # db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/all-minilm", embedding_function=embedding_function)
    # embedding_function = OllamaEmbeddingFunction(model='unclemusclez/jina-embeddings-v2-base-code')
    # db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/unclemusclez_jina-embeddings-v2-base-code", embedding_function=embedding_function)
    embedding_function = OllamaEmbeddingFunction(model='bge-m3')
    db = Chroma(persist_directory="C:/Users/ogiki/vectorDB/bge-m3", embedding_function=embedding_function)

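    # Register the documents in batches of 500. Stepping with range() avoids a
    # trailing empty batch when len(docs) is an exact multiple of the batch size,
    # and skips the loop entirely when no documents were collected.
    batch_size = 500
    for start in range(0, len(docs), batch_size):
        splitted_documents = text_splitter.split_documents(docs[start : start + batch_size])
        db.add_documents(splitted_documents)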

if __name__ == "__main__":
    formatted_time = datetime.now().strftime("%H:%M:%S")
    print("Start time:", formatted_time)
    execute()
    formatted_time = datetime.now().strftime("%H:%M:%S")
    print("End time:", formatted_time)

For a detailed explanation of this program, please see my earlier post.
If anything is unclear, feel free to ask.
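The registration program adds documents to Chroma in slices of 500. The following is a minimal sketch of that batching pattern on its own, stepping through the list with `range()` so that no empty batch is produced when the number of documents is an exact multiple of the batch size. `iter_batches` is a hypothetical helper, and the `docs` list here is dummy data.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items; yields nothing for an empty list."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Dummy data standing in for the parsed patent chunks
docs = [f"chunk-{i}" for i in range(1000)]
batches = list(iter_batches(docs, 500))
print([len(batch) for batch in batches])  # [500, 500]
```

In the real program, each batch would be passed to `db.add_documents(...)` in place of the `print` call.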

Question answering

chroma_streamlit.py

import streamlit as st
from langchain_community.chat_models.ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain.schema import HumanMessage
from langchain.vectorstores import Chroma
import ollama
from datetime import datetime

# Wrapper that exposes Ollama embeddings as an embedding function for Chroma
class OllamaEmbeddingFunction:
    def __init__(self, model):
        self.model = model
    def embed_documents(self, texts):
        embeddings = []
        for text in texts:
            response = ollama.embeddings(model=self.model, prompt=text)
            embeddings.append(response['embedding'])
        return embeddings  # Return the embeddings computed above
    def embed_query(self, query):
        response = ollama.embeddings(model=self.model, prompt=query)
        return response['embedding']  # Return the embedding of the query


# embedding_function = OllamaEmbeddingFunction(model='nomic-embed-text')
# embedding_function = OllamaEmbeddingFunction(model='mxbai-embed-large')
# embedding_function = OllamaEmbeddingFunction(model='snowflake-arctic-embed')
# embedding_function = OllamaEmbeddingFunction(model='all-minilm')
# embedding_function = OllamaEmbeddingFunction(model='unclemusclez/jina-embeddings-v2-base-code')
embedding_function = OllamaEmbeddingFunction(model='bge-m3')
chat = ChatOllama(model="llama3.2", temperature=0)
# chat = ChatOllama(model="llama3.1", temperature=0)
# chat = ChatOllama(model="qwen2.5", temperature=0)
# chat = ChatOllama(model="mistral", temperature=0)
# chat = ChatOllama(model="qwen2", temperature=0)
database = Chroma(
    # persist_directory="C:/Users/ogiki/vectorDB/nomic-embed-text", 
    # persist_directory="C:/Users/ogiki/vectorDB/mxbai-embed-large", 
    # persist_directory="C:/Users/ogiki/vectorDB/snowflake-arctic-embed", 
    # persist_directory="C:/Users/ogiki/vectorDB/all-minilm", 
    # persist_directory="C:/Users/ogiki/vectorDB/unclemusclez_jina-embeddings-v2-base-code", 
    persist_directory="C:/Users/ogiki/vectorDB/bge-m3", 
    embedding_function=embedding_function
)

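# The prompt stays in Japanese, the language of the questions. In English:
# "Answer the question based on the text. / Text: {document} / Question: {query}"
prompt = PromptTemplate(template="""文章を元に質問に答えてください。 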

文章: 
{document}

質問: {query}
""", input_variables=["document", "query"])


# =====================================================
st.title("Patent Search System")

if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

input_message = st.chat_input("Ready! Please enter your message.")
text_input = st.text_input("Enter the number here")

if input_message:
    formatted_time = datetime.now().strftime("%H:%M:%S")
    print("Start time:", formatted_time)
    st.session_state.messages.append({"role": "user", "content": input_message})
    print(f"Message entered: {input_message}")
    
    with st.chat_message("user"):
        st.markdown(input_message)

    with st.chat_message("assistant"):
        # ----- Retrieve documents from the vector DB (using the local embedding model) -----
        documents = database.similarity_search_with_score(input_message, k=3, filter={"name":text_input})
        documents_string = ""
        for document in documents:
            print("---------------document.metadata---------------")
            print(document[0].metadata)
            print(document[1])
            documents_string += f"""
                ---------------------------
                {document[0].page_content}
                """
        print("---------------documents_string---------------")
        print(input_message)
        print(documents_string)
        # ----- Get an answer based on the prompt (using the local LLM) -----
        result = chat([
            HumanMessage(content=prompt.format(document=documents_string,
                                            query=input_message))
        ])
        st.markdown(result.content)
        st.session_state.messages.append({"role": "assistant", "content": result.content})
    formatted_time = datetime.now().strftime("%H:%M:%S")
    print("End time:", formatted_time)

For a detailed explanation of this program as well, please see my earlier post.
If anything is unclear, feel free to ask.
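The question answering program retrieves the three closest chunks with `similarity_search_with_score`, restricted to one patent through a metadata filter on `name`. The following is a minimal pure-Python sketch of that kind of filtered top-k retrieval. It is an illustration under assumptions, not Chroma's implementation: it uses cosine similarity on invented 3-dimensional vectors, whereas Chroma uses its own index and may report a distance (smaller means closer) rather than a similarity. `cosine_similarity`, `top_k`, and `indexed` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3, name=None):
    """Return the k (text, score) pairs most similar to query_vec, optionally filtered by name."""
    candidates = [
        (doc["text"], cosine_similarity(query_vec, doc["vector"]))
        for doc in indexed
        if name is None or doc["name"] == name
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors invented for illustration; real embeddings have hundreds of dimensions
indexed = [
    {"name": "A", "text": "alkyl definition", "vector": [1.0, 0.0, 0.0]},
    {"name": "A", "text": "agonist definition", "vector": [0.0, 1.0, 0.0]},
    {"name": "A", "text": "claim text", "vector": [0.7, 0.7, 0.0]},
    {"name": "B", "text": "other patent", "vector": [1.0, 0.1, 0.0]},
]

results = top_k([1.0, 0.0, 0.0], indexed, k=2, name="A")
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The chunk from patent "B" is never scored because the filter removes it first, which is why the value typed into the filter field matters as much as the question itself.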

Comparison results (embedding)

I first compared the embedding models. llama3.2 was used as the LLM throughout this comparison.

Database registration (seconds)

Model	nomic-embed-text	mxbai-embed-large	all-minilm	unclemusclez/jina-embeddings-v2-base-code	bge-m3
Processing time	1,142	2,529	94	92	2,984

These results show that all-minilm and unclemusclez/jina-embeddings-v2-base-code are by far the fastest. The slowest model (bge-m3, 2,984 s) took about 32 times as long as the fastest (92 s).
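As a quick check of how far apart the models are, the registration times from the table can be turned into ratios relative to the fastest model. This is a small worked example; the variable names are my own.

```python
# Registration times in seconds, copied from the table above
registration_seconds = {
    "nomic-embed-text": 1142,
    "mxbai-embed-large": 2529,
    "all-minilm": 94,
    "unclemusclez/jina-embeddings-v2-base-code": 92,
    "bge-m3": 2984,
}

fastest = min(registration_seconds, key=registration_seconds.get)
slowest = max(registration_seconds, key=registration_seconds.get)
ratio = registration_seconds[slowest] / registration_seconds[fastest]
print(f"{slowest} takes {ratio:.1f}x as long as {fastest}")

# Every model relative to the fastest one, from fastest to slowest
for model, seconds in sorted(registration_seconds.items(), key=lambda kv: kv[1]):
    print(f"{model}: {seconds / registration_seconds[fastest]:.1f}x")
```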

Question answering (seconds)

Question	nomic-embed-text	mxbai-embed-large	all-minilm	unclemusclez/jina-embeddings-v2-base-code	bge-m3
Give me an overview of "Combination Treatment and Method Thereof"	107	91	64	44	64
Give me an overview of "Furin Inhibitor"	104	31	19	86	173
What does the term "alkyl" mean?	45	39	25	17	57
What does "agonist" mean?	120	124	141	10	135
Average	94.0	71.3	62.3	39.3	107.3

unclemusclez/jina-embeddings-v2-base-code had the shortest average response time (39.3 s). all-minilm was also relatively fast (62.3 s).

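Answer accuracy

Next are the results of rating the accuracy of the answers.

Question	nomic-embed-text	mxbai-embed-large	all-minilm	unclemusclez/jina-embeddings-v2-base-code	bge-m3
Give me an overview of "Combination Treatment and Method Thereof"	2	3	3	0	1
Give me an overview of "Furin Inhibitor"	2	1	0	1	3
What does the term "alkyl" mean?	3	3	3	1	3
What does "agonist" mean?	2	2	2	0	2
Average	2.3	2.3	2.0	0.5	2.3

nomic-embed-text, mxbai-embed-large, and bge-m3 were the most accurate, each averaging 2.3. unclemusclez/jina-embeddings-v2-base-code, by contrast, answers quickly but with low accuracy (0.5). all-minilm combines relatively fast responses with reasonable accuracy (2.0).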
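The averages in the accuracy table can be reproduced from the per-question scores. This is a small worked example; `scores`, `averages`, and `most_accurate` are names of my own.

```python
from statistics import mean

# Per-question scores (0 to 3), copied from the accuracy table above
scores = {
    "nomic-embed-text": [2, 2, 3, 2],
    "mxbai-embed-large": [3, 1, 3, 2],
    "all-minilm": [3, 0, 3, 2],
    "unclemusclez/jina-embeddings-v2-base-code": [0, 1, 1, 0],
    "bge-m3": [1, 3, 3, 2],
}

averages = {model: mean(values) for model, values in scores.items()}
best = max(averages.values())
most_accurate = [model for model, avg in averages.items() if avg == best]

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
print("most accurate:", most_accurate)
```

The three top models tie at exactly 2.25, which the table shows rounded to 2.3. With only four questions per model, a one-point change on a single answer shifts an average by 0.25, so small differences between models should be read with care.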

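Comparison results (LLM)

Next, let's look at the LLM comparison. Here I fixed the embedding model to nomic-embed-text and checked performance with different LLMs.

Question answering (seconds)

Question	llama3.2	llama3.1	qwen2.5	mistral	qwen2
Give me an overview of "Combination Treatment and Method Thereof"	107	374	306	552	726
Give me an overview of "Furin Inhibitor"	104	116	278	232	289
What does the term "alkyl" mean?	45	226	159	196	157
What does "agonist" mean?	120	307	690	900	419
Average	94.0	255.8	358.3	470.0	397.8

llama3.2 is by far the fastest: on average it answers more than twice as fast as llama3.1 (94.0 s versus 255.8 s).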

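Answer accuracy

Next are the results for answer accuracy.

Question	llama3.2	llama3.1	qwen2.5	mistral	qwen2
Give me an overview of "Combination Treatment and Method Thereof"	2	1	3	1	1
Give me an overview of "Furin Inhibitor"	2	0	3	1	1
What does the term "alkyl" mean?	3	3	3	3	3
What does "agonist" mean?	2	1	2	1	3
Average	2.3	1.3	2.8	1.5	2.0

qwen2.5 was the most accurate, averaging 2.8. llama3.2 also deserves attention: compared with llama3.1, it is better in both processing speed and answer accuracy.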

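Summary

I looked at which embedding model and which LLM are the best choices.

Embedding models

If processing speed is the top priority, unclemusclez/jina-embeddings-v2-base-code may be a good fit. If answer accuracy matters more, I recommend nomic-embed-text, mxbai-embed-large, or bge-m3. For those who want accuracy close to that level with faster processing, all-minilm is also an option.

LLM models

llama3.2 suits cases where processing speed matters most, while qwen2.5 is the recommendation when answer accuracy matters most. Overall, llama3.2 is the well-balanced choice.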
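One way to make the speed versus accuracy trade-off explicit is to list the models that no other model beats on both measures at once (a Pareto front). The following is a minimal sketch using the per-question figures from the two LLM tables, taking the fourth row in each as the "agonist" question. The `dominates` helper and the variable names are my own, and this is an illustration rather than the method used to reach the conclusions above.

```python
from statistics import mean

# Response times in seconds per question, from the LLM timing table
answer_seconds = {
    "llama3.2": [107, 104, 45, 120],
    "llama3.1": [374, 116, 226, 307],
    "qwen2.5": [306, 278, 159, 690],
    "mistral": [552, 232, 196, 900],
    "qwen2": [726, 289, 157, 419],
}
# Accuracy scores (0 to 3) per question, from the LLM accuracy table
accuracy_scores = {
    "llama3.2": [2, 2, 3, 2],
    "llama3.1": [1, 0, 3, 1],
    "qwen2.5": [3, 3, 3, 2],
    "mistral": [1, 1, 3, 1],
    "qwen2": [1, 1, 3, 3],
}

# (average seconds, average score) per model
summary = {
    model: (mean(answer_seconds[model]), mean(accuracy_scores[model]))
    for model in answer_seconds
}


def dominates(a, b):
    """True if a is at least as fast and as accurate as b, and strictly better in one of them."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


pareto = [
    model
    for model in summary
    if not any(dominates(summary[other], summary[model]) for other in summary if other != model)
]
print(pareto)  # ['llama3.2', 'qwen2.5']
```

On these numbers only llama3.2 and qwen2.5 remain, which agrees with the recommendations above: every other model is both slower and less accurate than llama3.2.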

If there are other models you would like me to try, feel free to get in touch. I would be glad to test them. Thank you for reading to the end.
