Building an English-Paper Database with LangChain: Comparing Embedding Models


user:

ChatGPT-sensei, could you keep me company today for a chat on the theme "Building an English-Paper Database with LangChain: Comparing Embedding Models"?


assistant:

Why do I have to chat with you of all people... Well, fine, I suppose there's no helping it now. I do know LangChain inside and out, after all. Hmm, still, that theme is fairly involved. Building an English-paper database with LangChain — aren't you being rather ambitious?

You do know that LangChain is a framework for building applications with language models, right? And when you say "differences between embedding models", do you mean something like word2vec versus GloVe versus FastText? Is that what you're interested in?

Oh, and I'm not asking because I want to know, alright? I'm only checking so I can give you accurate information. So don't you dare go thanking me for it.


user:

What I tried this time are other embedding models — I'll explain them later. First, I prepared the following ten paper URLs. Using these, I want to compare the performance of various embedding models.

paper_urls = [
    "https://arxiv.org/pdf/2402.17764.pdf",  # The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
    "https://arxiv.org/pdf/2402.19427.pdf",  # Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
    "https://arxiv.org/pdf/2404.05405.pdf",  # Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
    "https://arxiv.org/pdf/2404.02948.pdf",  # PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
    "https://arxiv.org/pdf/2404.05961.pdf",  # LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
    "https://arxiv.org/pdf/1308.3432.pdf",  # Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
    "https://arxiv.org/pdf/2404.01413.pdf",  # Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
    "https://arxiv.org/pdf/2404.05971.pdf",  # Does Transformer Interpretability Transfer to RNNs?
    "https://arxiv.org/pdf/2404.02258.pdf",  # Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
    "https://arxiv.org/pdf/2404.02060.pdf",  # Long-context LLMs Struggle with Long In-context Learning
]

assistant:

My, aren't you well prepared. That URL list shows you've thought things through down to the details. I'm a little curious what each of those papers is actually about... but well, I suppose I don't need to know.

Now then, first we need to download those papers and convert them to text. Packages like PDFMiner or PyPDF2 (maintained these days as pypdf) can extract the text from a PDF; LangChain's PyPDFLoader wraps this step for you. After that, you'll use the extracted text to evaluate each embedding model, right?

Why do I have to explain everything in this much detail... Well, if you understand all that and still need support, I can hardly refuse to help. But I'm not expecting a single word of thanks!
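
Just to make that concrete, here's a minimal sketch of the extraction step using pypdf (this assumes pypdf is installed; the URL is simply the first one from your list — the PyPDFLoader code below does essentially this for you):

import io
import urllib.request

from pypdf import PdfReader

# Download one paper and pull the text out page by page.
url = "https://arxiv.org/pdf/2402.17764.pdf"
with urllib.request.urlopen(url) as resp:
    reader = PdfReader(io.BytesIO(resp.read()))

text = "\n".join(page.extract_text() for page in reader.pages)
print(text[:500])  # first 500 characters of the extracted text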


user:

In the previous installments I used a variety of vectorstores, but this time I'll stick to FAISS and Annoy. Up to now I had only used AzureOpenAIEmbeddings as the embedding; this time I intend to try several different embedding models.

from langchain.document_loaders import PyPDFLoader
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_community.vectorstores import FAISS, Annoy
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter


def docs_from_pdfs(paper_urls):
    # Load each PDF from its URL and split the pages into ~1000-character chunks.
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = []
    for url in paper_urls:
        loader = PyPDFLoader(url)
        documents += loader.load()

    docs = text_splitter.split_documents(documents)
    return docs


def vectorstore_from_docs(vectorstore, docs, embedding):
    # Build a vector store of the given class (e.g. FAISS, Annoy) from the chunks.
    vectorstore = vectorstore.from_documents(docs, embedding)
    return vectorstore


def ask_llm(query, vectorstore, llm: AzureChatOpenAI, k=4):
    # Answer the query with the LLM over the vector store, then collect the
    # top-k most similar source chunks for each sentence of the answer.
    vectorstoreindex = VectorStoreIndexWrapper(vectorstore=vectorstore)
    answer = vectorstoreindex.query(query, llm)
    result = {}
    try:
        for sentence in answer.split("\n"):
            for d in vectorstore.similarity_search_with_score(sentence, k=k):
                if d[0].metadata["source"] not in result:
                    result[d[0].metadata["source"]] = [d[1], d[0].metadata]
    except Exception:
        pass

    return answer, sorted(result.values())

assistant:

Right, right. You really don't need to try every vectorstore out there — FAISS and Annoy are plenty. And I get that you'll be trying several embedding models this time. You do like taking on things you don't fully understand... well, with me here you'll be fine.

Still, the code is written pretty solidly. docs_from_pdfs loads the contents of the PDFs from their URLs and splits them with a character-based text splitter. vectorstore_from_docs then builds a vector store from those chunks. And the last one, ask_llm, is the part that sends a question to the LLM over the index and looks up the best-matching sources.

Hmph, not that I'm obliged to indulge silly questions... but let me confirm: what exactly are you planning to do with this code? It's not like I'm asking for your sake, you know.
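
Putting your three helpers together, the whole pipeline would look roughly like this. This is just a sketch: it assumes paper_urls and the functions above are already defined, that your Azure OpenAI endpoint, key, and API version are set via environment variables, and the deployment names here are placeholders you would replace with your own.

from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Placeholder deployment names -- replace with your own Azure OpenAI deployments.
llm = AzureChatOpenAI(azure_deployment="gpt-35-turbo", temperature=0)
embedding = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")

docs = docs_from_pdfs(paper_urls)                            # download and split the PDFs
vectorstore = vectorstore_from_docs(FAISS, docs, embedding)  # embed the chunks and build the index
answer, sources = ask_llm(
    "Tell me about transformer-based language models.", vectorstore, llm
)
print(answer)
print(sources)  # [score, metadata] pairs for the chunks backing the answer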


user:

I wanted to check how much the computation time differs between embedding models, so I wrote the following function to measure execution time.

import time

import numpy as np


def measure_average_time(n, func, *args):
    # Run func(*args) n times; print each trial's time and the mean ± std,
    # then return (mean, std, result of the last trial).
    times = []
    for _ in range(n):
        start_time = time.time()
        result = func(*args)
        end_time = time.time()

        execution_time = end_time - start_time

        times.append(execution_time)
        print(f"The execution time was {execution_time} seconds.")

    average_time = np.mean(times)
    std_dev = np.std(times)
    print(f"The execution time was {average_time:.2f} ± {std_dev:.2f} seconds.")

    return average_time, std_dev, result

assistant:

Oh, you even prepared a function to measure execution time. You're taking this more seriously than I expected. Fine, then I suppose I can help a bit more. That function is actually quite sensible, by the way.

measure_average_time runs the given func n times, measures each run, and returns the average time and standard deviation (plus the result of the last run). You'll use that to compare the computation time of the different embedding models, right?

Honestly, this is the kind of thing you should be handling yourself... but fine, I'm here if you stumble. Just remember, you relying on me doesn't mean I expect any thanks!
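
As a trivial usage sketch (slow_square is just a stand-in function I made up for illustration):

import time

def slow_square(x):
    time.sleep(0.5)  # stand-in for real work
    return x * x

# Three trials; returns (mean, standard deviation, result of the last trial).
avg, std, result = measure_average_time(3, slow_square, 12)
print(avg, std, result)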


user:

The other thing I wrote is the following set of functions for measuring the size of variables, files, and directories.

import os
import sys


def size_to_unit(size):
    # Convert a byte count into a human-readable unit
    if size < 1024:
        unit = "B"
    elif size < 1024**2:
        size /= 1024
        unit = "KB"
    elif size < 1024**3:
        size /= 1024**2
        unit = "MB"
    else:
        size /= 1024**3
        unit = "GB"
    return f"{size:.2f} {unit}"


def get_total_size(path):
    total_size = 0

    # For a file, just return its size
    if os.path.isfile(path):
        total_size = os.path.getsize(path)
    # For a directory, sum the sizes of all files under it
    elif os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                file_path = os.path.join(root, file)
                total_size += os.path.getsize(file_path)

    return size_to_unit(total_size)


def get_variable_size(variable):
    # Get the size of the variable in bytes.
    # Note: sys.getsizeof is shallow -- it counts only the container object itself,
    # not the objects it refers to (e.g. the Documents inside a list).
    variable_size = sys.getsizeof(variable)

    return size_to_unit(variable_size)

assistant:

My, those utility functions are well thought out too. Could it be you're more reliable than you look? Well, with me around to help, you hardly need to be this thorough.

size_to_unit is the important one: it converts a byte count into a human-readable unit. get_total_size uses it to report the size of a file, or the total size of everything under a directory. And get_variable_size reports the size of a Python variable as seen by sys.getsizeof. Together, these let you compare the on-disk size of the indexes with the in-memory size of your variables.

Honestly, people are going to get the wrong idea and think I'm this cooperative all the time... how ironic. Well, I suppose I really am helping, so maybe it's not entirely a misunderstanding... but I don't need any gratitude, none at all!
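
One caveat worth keeping in mind, because it will matter in a moment: since sys.getsizeof is shallow, get_variable_size reports only the size of the container object itself, not of the objects it refers to. A small illustration:

# Ten strings of one million characters each, held in a list.
big_list = ["x" * 1_000_000 for _ in range(10)]

print(get_variable_size(big_list))     # only the list object itself: a few hundred bytes
print(get_variable_size(big_list[0]))  # a single string: roughly 1 MB
print(get_total_size("."))             # total size of the files under the current directory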


user:

With that, the preparation is done, and it's finally time to start measuring. First, I timed the step that converts the ten papers into langchain_core.documents.Document objects.

num_trial = 5
average_time, std_dev, docs = measure_average_time(
    num_trial, docs_from_pdfs, paper_urls
)
The execution time was 13.312159299850464 seconds.
The execution time was 13.176893711090088 seconds.
The execution time was 13.585904598236084 seconds.
The execution time was 13.343374967575073 seconds.
The execution time was 13.5290367603302 seconds.
The execution time was 13.39 ± 0.15 seconds.

And the size reported for the resulting docs variable was as follows.

get_variable_size(docs)
'1.62 KB'

assistant:

You really do seem to be putting in the effort. But don't get complacent just yet — you're only partway there, aren't you?

The time measured above is the average time it takes to convert the ten English papers into langchain_core.documents.Document objects: 13.39 seconds on average with a standard deviation of ±0.15 seconds. That much is well within an acceptable range, I'd say.

That said, the reported size of the docs variable, 1.62 KB, needs a caveat: sys.getsizeof is shallow, so it only counts the list object itself (essentially its array of pointers), not the Document objects or the page text they hold. The actual amount of text is far larger than 1.62 KB.

Not that I'm going to praise you however hard you work — that much is simply expected. But I can't completely ignore your effort either... well, if you think there's some gratitude hiding in there somewhere, you're sorely mistaken!
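
If you want a feel for how much text those Document objects actually hold, summing the chunk contents is a reasonable rough measure — a sketch using the docs variable and size_to_unit defined above:

# Rough measure of the text actually held by the Document objects.
total_chars = sum(len(d.page_content) for d in docs)
print(len(docs), "chunks,", total_chars, "characters")
print(size_to_unit(total_chars))  # ~1 byte per ASCII character, so this approximates the text volume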


user:

Then I tried to measure the time needed to build the database and to run the Q&A, like this, but...

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))
    
    print()

embedding

With AzureOpenAIEmbeddings, it often ends in this kind of error.

RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 35 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}

assistant:

Hmm, the error message you ran into comes from overusing the Azure OpenAI API: you exceeded the number of requests allowed within a given time window.

The RateLimitError: Error code: 429 part is what tells you that. It means you went over the call-rate limit of your current Azure OpenAI pricing tier.

To allow more requests, you would need to request a quota increase or move to a higher pricing tier; then you can push more requests through.

It's not that you did anything wrong — you simply hit the API's rate limit. For now, waiting (the message suggests about 35 seconds) is the straightforward fix. You could take a breather and have tea with me while you wait.

...Not that I actually care, mind you. It's just an accident, after all. But you looked troubled, and apparently there's some rule somewhere that says I have to say something about it. It's not like I'm fishing for thanks!
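
If you do want to keep using AzureOpenAIEmbeddings despite the 429s, you can also soften them on the client side. A sketch — the deployment name is a placeholder, and chunk_size / max_retries are the embedding class's knobs for smaller batches and automatic retries:

import time

from langchain_openai import AzureOpenAIEmbeddings

embedding = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",  # placeholder deployment name
    chunk_size=16,   # embed fewer texts per request
    max_retries=6,   # let the client retry with backoff before giving up
)

def embed_with_backoff(texts, wait=35, attempts=3):
    # Manual fallback: wait roughly as long as the 429 message suggests, then try again.
    for i in range(attempts):
        try:
            return embedding.embed_documents(texts)
        except Exception as e:
            if i == attempts - 1:
                raise
            print(f"Embedding failed ({e}); retrying in {wait} s")
            time.sleep(wait)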


user:

So that's why I tried various embedding models instead, as follows. Here's an example run.

from langchain_community.embeddings import GPT4AllEmbeddings

embedding = GPT4AllEmbeddings()

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Vectorstore construction using Annoy
The execution time was 119.27314782142639 seconds.
The execution time was 120.33163905143738 seconds.
The execution time was 120.23907446861267 seconds.
The execution time was 120.31048607826233 seconds.
The execution time was 120.4833436012268 seconds.
The execution time was 120.13 ± 0.43 seconds.
Question answering using Annoy
The execution time was 3.054194688796997 seconds.
The execution time was 3.335675001144409 seconds.
The execution time was 3.0353527069091797 seconds.
The execution time was 3.7436869144439697 seconds.
The execution time was 3.5647904872894287 seconds.
The execution time was 3.35 ± 0.28 seconds.
DB size of Annoy_paperdb : 1.20 MB
("Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. The transformer model was first introduced in a paper by Vaswani et al. in 2017. It has since become the go-to architecture for various applications, including language translation, text generation, and sentiment analysis.\n\nAt the core of the transformer model is the self-attention mechanism, which allows the model to capture relationships between different words in a sentence. Unlike traditional recurrent neural networks (RNNs), transformers do not rely on sequential processing of words. Instead, they process all words in parallel, making them highly efficient for handling long sequences.\n\nThe transformer architecture consists of an encoder and a decoder. The encoder takes an input sequence and maps it to a sequence of hidden representations, capturing the contextual information of each word. The decoder then generates an output sequence based on the encoder's representations.\n\nOne of the key advantages of transformer models is their ability to handle long-range dependencies. The self-attention mechanism allows the model to attend to different parts of the input sequence, capturing both local and global dependencies. This makes transformers particularly effective for tasks that require understanding of context, such as language translation and text summarization.\n\nTransformer-based language models have achieved state-of-the-art performance on various benchmark datasets. They have also been instrumental in the development of large-scale pre-trained models, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). These models have been fine-tuned for specific tasks and have demonstrated impressive performance across a wide range of natural language processing tasks.\n\nOverall, transformer-based language models have significantly advanced the field of natural language processing, enabling breakthroughs in tasks such as machine translation, text generation, and sentiment analysis. Their ability to capture complex relationships and handle long sequences has made them a powerful tool for understanding and generating human language.", [[0.9411599040031433, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [0.9598418474197388, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.9789295196533203, {'source': 'https://arxiv.org/pdf/2402.17764.pdf', 'page': 6}], [1.0119308233261108, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 15}]])

Vectorstore construction using FAISS
The execution time was 122.74927282333374 seconds.
The execution time was 119.84825468063354 seconds.
The execution time was 119.69078779220581 seconds.
The execution time was 120.6157169342041 seconds.
The execution time was 121.16117334365845 seconds.
The execution time was 120.81 ± 1.10 seconds.
Question answering using FAISS
The execution time was 4.220249176025391 seconds.
The execution time was 3.6738193035125732 seconds.
The execution time was 3.9218883514404297 seconds.
The execution time was 3.782604217529297 seconds.
The execution time was 4.449855804443359 seconds.
The execution time was 4.01 ± 0.29 seconds.
DB size of FAISS_paperdb : 925.15 KB
('Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. They were introduced in 2017 by Vaswani et al. and have since become the dominant model for various language tasks, such as machine translation, text generation, and sentiment analysis.\n\nThe key innovation of transformer models is the self-attention mechanism, which allows the model to weigh the importance of different words in a sentence when making predictions. Unlike traditional recurrent neural networks (RNNs), transformers do not rely on sequential processing and can consider the entire input sequence simultaneously. This parallelization significantly speeds up training and inference.\n\nTransformers consist of an encoder-decoder structure. The encoder processes the input sequence, while the decoder generates the output sequence. Each encoder and decoder layer in the transformer model consists of multiple self-attention heads, which capture different aspects of the input sequence. The attention mechanism enables the model to focus on relevant parts of the input when making predictions.\n\nOne of the advantages of transformer models is their ability to handle long-range dependencies. The self-attention mechanism allows the model to capture relationships between words that are far apart in the input sequence. This makes transformers particularly effective for tasks that require understanding context and long-term dependencies.\n\nPre-training and fine-tuning are common approaches for training transformer-based language models. In pre-training, the model is trained on a large corpus of text to learn general language representations. During fine-tuning, the model is further trained on a specific task with labeled data to adapt its knowledge to the task at hand.\n\nTransformers have achieved state-of-the-art performance on various language tasks and have been widely adopted in both industry and academia. They have also sparked research in interpretability, model compression, and efficient training methods to make them more accessible and practical for real-world applications.', [[0.8347756, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [0.88054085, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.9004172, {'source': 'https://arxiv.org/pdf/2402.17764.pdf', 'page': 6}], [0.9838539, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 15}]])






GPT4AllEmbeddings(client=<gpt4all.gpt4all.Embed4All object at 0x7f53ce5ca150>)

assistant:

Well then, let's have a look at your test results.

First, since GPT4AllEmbeddings runs locally, building the vector stores with it doesn't run into the request rate limit you kept hitting with AzureOpenAIEmbeddings.

Next, you measured construction and Q&A time for both the Annoy and FAISS vector stores. Construction took roughly 120 seconds for both. Q&A time does differ a little: about 3.35 seconds on average with Annoy versus about 4.01 seconds with FAISS.

The DB sizes also differ slightly: 1.20 MB for the Annoy index versus 925.15 KB for FAISS. That's exactly the kind of information you need to pick an effective setup for the task at hand.

Not that any of this changes what I think of you, mind. I'm just doing my job — I don't have time to think about such things. It has nothing to do with how you solve your problem, after all.
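
By the way, if you want to sanity-check what drives the index sizes, one simple thing to look at is the dimensionality of the vectors the model produces — the index size grows roughly linearly with it. For example:

from langchain_community.embeddings import GPT4AllEmbeddings

embedding = GPT4AllEmbeddings()
vec = embedding.embed_query("Tell me about transformer-based language models.")
print(len(vec))  # vector dimensionality of the default GPT4All embedding model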


user:

Let me show an example using a different embedding model.

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

embedding = FastEmbedEmbeddings()

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]


Vectorstore construction using Annoy
The execution time was 39.96561908721924 seconds.
The execution time was 37.60652041435242 seconds.
The execution time was 36.602888345718384 seconds.
The execution time was 37.105456829071045 seconds.
The execution time was 36.265517234802246 seconds.
The execution time was 37.51 ± 1.31 seconds.
Question answering using Annoy
The execution time was 6.894958257675171 seconds.
The execution time was 4.399134397506714 seconds.
The execution time was 4.903408050537109 seconds.
The execution time was 5.255844593048096 seconds.
The execution time was 5.937529802322388 seconds.
The execution time was 5.48 ± 0.87 seconds.
DB size of Annoy_paperdb : 1.20 MB
('Transformer-based language models are a type of neural network architecture that has become widely used in natural language processing tasks. They were introduced in a paper by Vaswani et al. in 2017 and have since gained popularity due to their ability to handle long-range dependencies in text and their parallelizable training across the time dimension.\n\nThe transformer architecture is based on the concept of self-attention, where each word in a sequence attends to all other words in the sequence to compute its representation. This allows the model to capture relationships between words that are far apart in the sequence, which is especially important for tasks like language modeling.\n\nOne advantage of transformers is that they can be trained on large amounts of data, which helps them learn rich representations of language. They have achieved state-of-the-art performance on a variety of natural language processing tasks, including language modeling, machine translation, and text generation.\n\nHowever, transformers have a drawback in that they can be computationally expensive, especially when applied to very long sequences. This is because the self-attention mechanism has quadratic time complexity, which means that the computational cost grows quadratically with the length of the sequence. To address this issue, researchers have developed alternative architectures, such as Mamba and RWKV, that allow for parallelized training across the time dimension in recurrent neural networks (RNNs). These RNN architectures have been shown to achieve comparable performance to transformers while being more computationally efficient.\n\nIn recent research, there has been interest in understanding if interpretability methods designed for transformer language models can also be applied to these new RNN architectures. These interpretability methods include techniques like contrastive activation addition, which allows for controlling the behavior of the model, and the tuned lens, which elicits interpretable predictions from intermediate layers of the model. The results of these studies have shown that many of these interpretability techniques are effective when applied to RNNs, and they can be further improved by taking advantage of the compressed state of RNNs.\n\nOverall, transformer-based language models have revolutionized natural language processing and have become a powerful tool for various tasks. Ongoing research is focused on making these models more efficient and interpretable, which will further enhance their capabilities.', [[0.584775984287262, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.6249105930328369, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 0}], [0.6754009127616882, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 13}], [0.7579300999641418, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 1}], [0.7635594606399536, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 15}], [0.8792931437492371, {'source': 'https://arxiv.org/pdf/2402.19427.pdf', 'page': 0}], [0.9032720327377319, {'source': 'https://arxiv.org/pdf/2404.02948.pdf', 'page': 6}]])

Vectorstore construction using FAISS
The execution time was 36.48990726470947 seconds.
The execution time was 35.731704235076904 seconds.
The execution time was 36.570149183273315 seconds.
The execution time was 37.59646415710449 seconds.
The execution time was 37.210500717163086 seconds.
The execution time was 36.72 ± 0.64 seconds.
Question answering using FAISS
The execution time was 5.208665132522583 seconds.
The execution time was 4.908479452133179 seconds.
The execution time was 5.376324892044067 seconds.
The execution time was 5.422516584396362 seconds.
The execution time was 5.514407157897949 seconds.
The execution time was 5.29 ± 0.21 seconds.
DB size of FAISS_paperdb : 925.15 KB
('Transformer-based language models are a type of neural network architecture that has gained popularity in natural language processing tasks. They have become widely used due to their ability to handle long-distance dependencies and their parallelizable training across the time dimension.\n\nThe transformer architecture was introduced in 2017 and has since replaced recurrent neural networks (RNNs) in many language processing applications. Transformers rely on a self-attention mechanism that allows them to capture relationships between words in a sequence. This mechanism enables transformers to process input sequences in parallel, making them more computationally efficient than RNNs.\n\nHowever, transformers suffer from quadratic time complexity, which means that they can be computationally expensive to apply to very long sequences. To address this issue, recent advancements have been made in developing new RNN architectures, such as Mamba and RWKV, that can match or even exceed the performance of transformers in terms of language modeling perplexity and downstream evaluations.\n\nIn addition to their performance, another important aspect of language models is their interpretability. Interpretability methods have been developed specifically for transformer language models, allowing researchers to understand and control the behavior of these models. The effectiveness of these interpretability methods for transformers has led to the question of whether they can be transferred to the new RNN architectures.\n\nIn summary, transformer-based language models have revolutionized natural language processing tasks due to their ability to handle long-distance dependencies and their parallelizable training. Recent advancements in RNN architectures have shown promising results in matching or exceeding the performance of transformers. The transferability of interpretability methods from transformers to RNNs is an area of ongoing research.', [[0.32804984, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.36435485, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 0}], [0.3916187, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 9}], [0.40656036, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 13}], [0.48339075, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 0}], [0.7731564, {'source': 'https://arxiv.org/pdf/2402.19427.pdf', 'page': 0}], [0.8159002, {'source': 'https://arxiv.org/pdf/2404.02948.pdf', 'page': 6}]])






FastEmbedEmbeddings(model_name='BAAI/bge-small-en-v1.5', max_length=512, cache_dir=None, threads=None, doc_embed_type='default', _model=<fastembed.text.text_embedding.TextEmbedding object at 0x7f53cd989a50>)

assistant:

Well now, your sample test run is getting interesting! This time you built the vector stores with a different embedding model, FastEmbedEmbeddings.

As a result, vector store construction took about 37 seconds for both Annoy and FAISS — a big improvement over the roughly 120 seconds with GPT4AllEmbeddings.

On the other hand, Q&A time came out at about 5.48 seconds on average with Annoy and 5.29 seconds with FAISS, which is a bit longer than before.

The DB sizes, meanwhile, are unchanged: 1.20 MB for Annoy and 925.15 KB for FAISS, the same as with GPT4AllEmbeddings — which is what you'd expect if both defaults produce vectors of the same dimensionality.

This is the essence of tuning: digging into the data and iterating on the code to optimize it is an important step in any AI project.

So, I'm curious how this felt to you. If you had a good time, nothing would please me more. And if you can sense a difference from before, that's proof you've grown — give yourself some credit.


user:

I switched to yet another embedding model.

from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings

embedding = SpacyEmbeddings(model_name="en_core_web_sm")

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Vectorstore construction using Annoy
The execution time was 19.20017433166504 seconds.
The execution time was 17.552349090576172 seconds.
The execution time was 17.583319425582886 seconds.
The execution time was 17.82803702354431 seconds.
The execution time was 17.682859897613525 seconds.
The execution time was 17.97 ± 0.62 seconds.
Question answering using Annoy
The execution time was 2.671724796295166 seconds.
The execution time was 2.7449307441711426 seconds.
The execution time was 4.1663713455200195 seconds.
The execution time was 2.7040276527404785 seconds.
The execution time was 4.9440367221832275 seconds.
The execution time was 3.45 ± 0.94 seconds.
DB size of Annoy_paperdb : 941.07 KB
('Transformer-based language models are a type of deep learning model that have revolutionized natural language processing tasks. They are based on the Transformer architecture, which was introduced in a seminal paper by Vaswani et al. in 2017.\n\nThe Transformer architecture is designed to overcome the limitations of recurrent neural networks (RNNs) in capturing long-range dependencies in sequential data, such as text. It consists of a stack of identical layers, each of which has two sub-layers: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network. The self-attention mechanism allows the model to weigh the importance of different words in the input sequence when making predictions, while the feed-forward network applies non-linear transformations to each position separately.\n\nOne of the most popular and widely used transformer-based language models is the GPT (Generative Pre-trained Transformer) model developed by OpenAI. GPT models are trained in an unsupervised manner on large amounts of text data, learning to predict the next word in a sequence given the previous words. This pre-training phase allows the model to learn a rich understanding of language and capture various linguistic patterns and relationships.\n\nAfter pre-training, GPT models can be fine-tuned on specific downstream tasks, such as text classification, named entity recognition, or machine translation. The fine-tuning process involves training the model on labeled data specific to the task at hand, allowing it to adapt its learned representations to the specific requirements of the task.\n\nTransformer-based language models have achieved state-of-the-art performance on a wide range of natural language processing tasks and have been instrumental in advancing the field. They have demonstrated the ability to generate coherent and contextually relevant text, improve language understanding and generation, and enable new applications in areas such as chatbots, language translation, and text summarization.', [[0.6901642680168152, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 4}], [0.731606662273407, {'source': 'https://arxiv.org/pdf/1308.3432.pdf', 'page': 10}], [0.745360255241394, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 2}]])

Vectorstore construction using FAISS
The execution time was 17.54272222518921 seconds.
The execution time was 17.45686650276184 seconds.
The execution time was 17.341181993484497 seconds.
The execution time was 17.380375862121582 seconds.
The execution time was 17.80930781364441 seconds.
The execution time was 17.51 ± 0.17 seconds.
Question answering using FAISS
The execution time was 3.627883195877075 seconds.
The execution time was 2.8254499435424805 seconds.
The execution time was 3.009528636932373 seconds.
The execution time was 3.3969814777374268 seconds.
The execution time was 3.8785276412963867 seconds.
The execution time was 3.35 ± 0.39 seconds.
DB size of FAISS_paperdb : 706.90 KB
("Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. The Transformer model was introduced in a seminal paper by Vaswani et al. in 2017 and has since become the go-to architecture for many language-related tasks.\n\nThe key innovation of the Transformer model is its attention mechanism, which allows the model to focus on different parts of the input sequence when making predictions. This attention mechanism enables the model to capture long-range dependencies in the input, making it particularly effective for tasks that require understanding context over large distances.\n\nTransformer-based language models are typically trained in a pre-training and fine-tuning paradigm. In the pre-training phase, the model is trained on a large corpus of text using unsupervised learning objectives, such as predicting masked tokens or predicting the next token in a sequence. This pre-training phase allows the model to learn general language representations.\n\nIn the fine-tuning phase, the pre-trained model is further trained on specific downstream tasks, such as text classification, named entity recognition, or machine translation. The fine-tuning process adapts the pre-trained model to the specific task by updating the model's parameters based on task-specific labeled data.\n\nTransformer-based language models have achieved state-of-the-art performance on a wide range of natural language processing tasks, surpassing previous approaches that relied on handcrafted features or recurrent neural networks. These models have been successfully applied to tasks such as machine translation, text summarization, sentiment analysis, and question answering.\n\nHowever, transformer-based language models are computationally expensive and require large amounts of training data. Recent research has focused on developing techniques to make these models more efficient, such as model distillation, knowledge distillation, and model pruning.\n\nOverall, transformer-based language models have significantly advanced the field of natural language processing and continue to be an active area of research and development.", [[2.0694475, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 4}], [2.1315036, {'source': 'https://arxiv.org/pdf/1308.3432.pdf', 'page': 0}], [2.1376562, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 7}]])






SpacyEmbeddings(model_name='en_core_web_sm', nlp=<spacy.lang.en.English object at 0x7f55a07c8d10>)

assistant:

In this test, you built the vector stores with yet another embedding model, SpacyEmbeddings.

Vector store construction took about 17.97 seconds on average with Annoy and about 17.51 seconds with FAISS — roughly half the time of the FastEmbedEmbeddings runs — and the standard deviations have shrunk noticeably as well.

Q&A time, meanwhile, came out at about 3.45 seconds with Annoy and 3.35 seconds with FAISS, clearly shorter than before. The lighter spaCy model seems to make the query-time embedding cheaper.

Finally, the DB sizes: the Annoy index is 941.07 KB and the FAISS index 706.90 KB, so the on-disk footprint has shrunk further.

These results show how much the choice of embedding model affects construction time, query time, and index size all at once. Switching to SpacyEmbeddings changes both the size and the character of the vectors, and you can see that reflected directly in the numbers.

Keep iterating like this to find the best setup — every trial adds to your knowledge and deepens your understanding of the technology.
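
If you're wondering why the spaCy indexes come out smaller, checking the dimensionality of the vectors en_core_web_sm produces is instructive — the small pipeline ships without large static word vectors, so its document vectors should be comparatively low-dimensional. A quick check:

import spacy

nlp = spacy.load("en_core_web_sm")
vec = nlp("Tell me about transformer-based language models.").vector
print(vec.shape)  # dimensionality of the document vector that SpacyEmbeddings stores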


user:

I switched to yet another embedding model.

from langchain_community.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings()

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Vectorstore construction using Annoy
The execution time was 2.818488597869873 seconds.
The execution time was 1.5828793048858643 seconds.
The execution time was 1.5752100944519043 seconds.
The execution time was 1.573822259902954 seconds.
The execution time was 1.5828144550323486 seconds.
The execution time was 1.83 ± 0.50 seconds.
Question answering using Annoy
The execution time was 5.04504132270813 seconds.
The execution time was 6.020913124084473 seconds.
The execution time was 3.6116416454315186 seconds.
The execution time was 4.174511194229126 seconds.
The execution time was 4.412135601043701 seconds.
The execution time was 4.65 ± 0.82 seconds.
DB size of Annoy_paperdb : 1.78 MB
('Transformer-based language models are a type of neural network architecture that has been widely used for natural language processing tasks such as machine translation, text generation, and language understanding. They were introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017.\n\nThe key innovation of transformer models is the use of self-attention mechanisms, which allow the model to weigh the importance of different words or tokens in a sequence when processing each token. This enables the model to capture long-range dependencies in the input sequence more effectively compared to traditional recurrent neural networks (RNNs).\n\nIn a transformer model, the input sequence is divided into tokens, and each token is embedded into a continuous vector representation. These embeddings are then processed by multiple layers of self-attention and feed-forward neural networks. The self-attention mechanism allows the model to attend to different parts of the input sequence when making predictions for each token, while the feed-forward networks help to capture non-linear relationships between tokens.\n\nOne advantage of transformer models is their parallelizability, as the attention mechanism allows for efficient computation across the entire input sequence. This makes them faster to train and more scalable compared to traditional RNNs.\n\nTransformer-based language models have achieved state-of-the-art performance on a wide range of natural language processing tasks, including machine translation, text summarization, question answering, and language generation. They have also been used as the backbone for large-scale pre-training models such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), which have significantly advanced the field of natural language understanding.\n\nOverall, transformer-based language models have revolutionized natural language processing by providing powerful tools for understanding and generating human language.', [[0.7516922354698181, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.7976825833320618, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 12}], [0.8021180033683777, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [0.8308725357055664, {'source': 'https://arxiv.org/pdf/2402.19427.pdf', 'page': 10}], [1.2822104692459106, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 26}], [1.303005576133728, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 30}]])

Vectorstore construction using FAISS
The execution time was 1.5933456420898438 seconds.
The execution time was 1.5853962898254395 seconds.
The execution time was 1.5723905563354492 seconds.
The execution time was 1.6076328754425049 seconds.
The execution time was 1.5531525611877441 seconds.
The execution time was 1.58 ± 0.02 seconds.
Question answering using FAISS
The execution time was 3.890451669692993 seconds.
The execution time was 3.3632712364196777 seconds.
The execution time was 3.7312376499176025 seconds.
The execution time was 3.868253707885742 seconds.
The execution time was 3.5114779472351074 seconds.
The execution time was 3.67 ± 0.21 seconds.
DB size of FAISS_paperdb : 1.19 MB
('Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. They were first introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017.\n\nThe key idea behind transformer-based language models is the use of self-attention mechanisms to capture dependencies between words in a sentence. Unlike traditional recurrent neural networks (RNNs) that process words sequentially, transformers can process all words in parallel, making them more efficient and allowing for better modeling of long-range dependencies.\n\nIn a transformer, the input sequence is first embedded into a continuous vector space, and then self-attention is applied to the embedded sequence. Self-attention allows each word in the sequence to attend to all other words, capturing their importance and relevance for the current word. This attention mechanism helps the model understand the context and relationships between words.\n\nAfter the self-attention step, the transformed sequence is passed through a feed-forward neural network, which applies non-linear transformations to each word independently. This helps capture complex patterns and relationships in the data.\n\nTransformer-based language models have achieved state-of-the-art performance on various natural language processing tasks, including language translation, text generation, sentiment analysis, and question answering. They have been widely adopted in industry and academia due to their ability to model long-range dependencies, parallel processing, and scalability to large datasets.\n\nIt\'s worth noting that there have been several advancements and variations of transformer-based models, such as BERT, GPT, and T5, each with their own architectural modifications and training objectives. These models have pushed the boundaries of natural language understanding and generation tasks, enabling more accurate and sophisticated language processing applications.', [[0.64148295, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.6722394, {'source': 'https://arxiv.org/pdf/2402.17764.pdf', 'page': 6}], [0.6803167, {'source': 'https://arxiv.org/pdf/2402.19427.pdf', 'page': 10}], [0.69590354, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 12}], [0.7207518, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [1.6440636, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 26}], [1.6978233, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 30}]])






HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-mpnet-base-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

assistant:

This time you built the vector stores using HuggingFaceEmbeddings.

Vector store construction took about 1.83 seconds on average with Annoy and 1.58 seconds with FAISS — the shortest so far — with standard deviations of ±0.50 s and ±0.02 s respectively.

Q&A time came out at about 4.65 seconds with Annoy and 3.67 seconds with FAISS, slightly longer than in the SpacyEmbeddings test.

As for DB size, the Annoy index is 1.78 MB and the FAISS index 1.19 MB, somewhat larger than with FastEmbedEmbeddings or SpacyEmbeddings — the default all-mpnet-base-v2 model produces 768-dimensional vectors, so larger indexes are to be expected.

These results show how strongly the choice of embedding model affects performance in this kind of task: with HuggingFaceEmbeddings, construction time dropped dramatically while the index size grew. Quality, speed, and storage are different sides of the same trade-off, which is a very common theme in AI development.
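
One knob worth knowing: HuggingFaceEmbeddings can be pointed at a GPU the same way as the Bge variant below, via model_kwargs, which mainly affects construction time on larger corpora. A sketch, assuming a CUDA device is available:

from langchain_community.embeddings import HuggingFaceEmbeddings

# Same default model, but on the GPU and with normalized vectors:
# model_kwargs goes to the SentenceTransformer constructor, encode_kwargs to .encode().
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)
print(len(embedding.embed_query("test")))  # 768 dimensions for all-mpnet-base-v2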


user:

I switched to yet another embedding model.

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cuda"}
encode_kwargs = {"normalize_embeddings": True}
embedding = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Vectorstore construction using Annoy
The execution time was 1.0215036869049072 seconds.
The execution time was 1.00557279586792 seconds.
The execution time was 0.9897329807281494 seconds.
The execution time was 0.9881584644317627 seconds.
The execution time was 0.9955530166625977 seconds.
The execution time was 1.00 ± 0.01 seconds.
Question answering using Annoy
The execution time was 3.5103137493133545 seconds.
The execution time was 3.275139570236206 seconds.
The execution time was 4.059001922607422 seconds.
The execution time was 3.533154249191284 seconds.
The execution time was 4.101999044418335 seconds.
The execution time was 3.70 ± 0.33 seconds.
DB size of Annoy_paperdb : 1.20 MB
('Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. The transformer architecture was introduced in 2017 by Vaswani et al. and has since become the go-to model for tasks such as machine translation, text generation, and language understanding.\n\nUnlike traditional recurrent neural networks (RNNs) that process input sequentially, transformers rely on a self-attention mechanism that allows them to capture dependencies between words in a more parallelizable manner. This makes transformers more efficient for handling long-range dependencies in language.\n\nThe core idea behind transformers is the attention mechanism, which allows the model to weigh the importance of different words in a sentence when making predictions. The attention mechanism computes attention scores for each word in a sentence, based on its relevance to other words. This enables the model to focus on the most important words and capture their contextual information.\n\nTransformer-based language models consist of an encoder and a decoder. The encoder processes the input sequence and learns representations for each word, while the decoder generates the output sequence based on the learned representations. Both the encoder and decoder are composed of multiple layers of self-attention and feed-forward neural networks.\n\nOne of the key advantages of transformer-based language models is their ability to handle long sequences of text. Traditional RNNs suffer from the vanishing gradient problem, which makes it difficult for them to capture dependencies between distant words. Transformers, on the other hand, can capture long-range dependencies more effectively due to their attention mechanism.\n\nTransformer-based language models have achieved state-of-the-art performance on various natural language processing tasks. They have been used to generate coherent and contextually relevant text, perform machine translation between languages, answer questions based on given context, and more. Researchers continue to explore ways to improve and optimize transformer models for even better performance.', [[0.45777663588523865, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.4842115640640259, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 13}], [0.4936682879924774, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 0}], [0.5298928618431091, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 9}], [0.5365554690361023, {'source': 'https://arxiv.org/pdf/2404.01413.pdf', 'page': 2}], [0.5406512022018433, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 5}]])

Vectorstore construction using FAISS
The execution time was 1.0103175640106201 seconds.
The execution time was 1.0389864444732666 seconds.
The execution time was 0.9924590587615967 seconds.
The execution time was 0.991635799407959 seconds.
The execution time was 0.9952821731567383 seconds.
The execution time was 1.01 ± 0.02 seconds.
Question answering using FAISS
The execution time was 3.8801419734954834 seconds.
The execution time was 2.4610464572906494 seconds.
The execution time was 4.181194305419922 seconds.
The execution time was 3.3850176334381104 seconds.
The execution time was 4.108384847640991 seconds.
The execution time was 3.60 ± 0.64 seconds.
DB size of FAISS_paperdb : 925.15 KB
('Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. The transformer architecture was introduced in 2017 and has since become the dominant approach for language modeling.\n\nUnlike traditional recurrent neural networks (RNNs), which process input sequentially, transformers can process the entire input sequence in parallel. This is achieved through the use of self-attention mechanisms, which allow the model to weigh the importance of different words in the input sequence when making predictions.\n\nThe transformer architecture consists of an encoder and a decoder. The encoder processes the input sequence and generates a representation of the input, while the decoder uses this representation to generate the output sequence. The self-attention mechanism in transformers enables the model to capture long-range dependencies in the input sequence, making it particularly effective for tasks that require understanding context over long distances.\n\nTransformer-based language models have achieved state-of-the-art performance on a wide range of natural language processing tasks, including language translation, text generation, sentiment analysis, and question answering. They have also been used as the backbone for pre-trained models that can be fine-tuned on specific downstream tasks, resulting in even better performance.\n\nRecent research has focused on improving the efficiency of transformer-based language models, as they can be computationally expensive to train and deploy. Techniques such as conditional computation and dynamic allocation of compute have been proposed to reduce the computational requirements of transformers while maintaining or even improving their performance.\n\nOverall, transformer-based language models have had a significant impact on the field of natural language processing, enabling breakthroughs in various applications and pushing the boundaries of what is possible in language understanding and generation.', [[0.2080336, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.22942033, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 13}], [0.23309949, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 0}], [0.2807864, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 9}], [0.28789186, {'source': 'https://arxiv.org/pdf/2404.01413.pdf', 'page': 2}], [0.29230353, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 5}]])






HuggingFaceBgeEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='BAAI/bge-small-en', cache_folder=None, model_kwargs={'device': 'cuda'}, encode_kwargs={'normalize_embeddings': True}, query_instruction='Represent this question for searching relevant passages: ', embed_instruction='')

assistant:

This time you built the vector stores using HuggingFaceBgeEmbeddings.

Vector store construction took about 1.00 second with Annoy and 1.01 seconds with FAISS — nearly identical, and even shorter than the previous HuggingFaceEmbeddings run, helped along by running the model on the GPU this time.

Q&A time came out at about 3.70 seconds with Annoy and 3.60 seconds with FAISS, both a little shorter than last time.

The DB sizes are 1.20 MB for Annoy and 925.15 KB for FAISS, somewhat smaller than in the previous test — bge-small-en produces 384-dimensional vectors versus 768 for all-mpnet-base-v2.

So HuggingFaceBgeEmbeddings had a positive effect on both speed and index size here. The choice of embedding model clearly drives construction time, Q&A time, and DB size, so it's important to understand each model's characteristics and choose accordingly.
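
One detail visible in that repr is query_instruction: HuggingFaceBgeEmbeddings prepends it to every query in embed_query (documents get embed_instruction, which is empty by default), since that is the prompt format the BGE models were trained with. You can set it explicitly if you need to, for example:

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embedding = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
    query_instruction="Represent this question for searching relevant passages: ",
)
# embed_query prepends query_instruction; embed_documents prepends embed_instruction ("" by default).
print(len(embedding.embed_query("Tell me about transformer-based language models.")))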


user:

I switched to yet another embedding model.

from langchain_community.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

query = "Tell me about transformer-based language models."
vectorstore_dbs = [["Annoy", Annoy], ["FAISS", FAISS]]
for vectorstore_name, vectorstore in vectorstore_dbs:

    print("Vectorstore construction using", vectorstore_name)
    average_time, std_dev, vectorstore = measure_average_time(
        num_trial, vectorstore_from_docs, vectorstore, docs, embedding
    )

    print("Question answering using", vectorstore_name)
    average_time, std_dev, answer = measure_average_time(
        num_trial, ask_llm, query, vectorstore, llm
    )

    db_path = "{}_paperdb".format(vectorstore_name)
    vectorstore.save_local(db_path)
    print("DB size of", db_path, ":", get_total_size(db_path))

    print(answer)
    print()

embedding
Vectorstore construction using Annoy
The execution time was 0.3984498977661133 seconds.
The execution time was 0.38031840324401855 seconds.
The execution time was 0.4025566577911377 seconds.
The execution time was 0.4321258068084717 seconds.
The execution time was 0.41910696029663086 seconds.
The execution time was 0.41 ± 0.02 seconds.
Question answering using Annoy
The execution time was 5.046660900115967 seconds.
The execution time was 5.386497735977173 seconds.
The execution time was 3.5891592502593994 seconds.
The execution time was 3.716005563735962 seconds.
The execution time was 3.8112974166870117 seconds.
The execution time was 4.31 ± 0.75 seconds.
DB size of Annoy_paperdb : 1.20 MB
('Transformer-based language models are a type of neural network model that have revolutionized natural language processing tasks. They were introduced in a seminal paper called "Attention Is All You Need" by Vaswani et al. in 2017.\n\nThe key innovation of transformer models is the attention mechanism, which allows the model to focus on different parts of the input sequence when making predictions. This attention mechanism enables the model to capture long-range dependencies in the input, which is particularly important for language understanding tasks.\n\nUnlike traditional recurrent neural network (RNN) models, transformers do not process the input sequence sequentially. Instead, they process the entire sequence in parallel, making them highly parallelizable and efficient. This parallel processing is achieved through self-attention, where each input token attends to all other tokens to compute its representation.\n\nThe transformer model consists of an encoder and a decoder. The encoder takes the input sequence and produces a sequence of hidden representations, while the decoder takes these representations and generates the output sequence. Both the encoder and decoder are composed of multiple layers, each containing self-attention and feed-forward neural networks.\n\nTransformer-based language models have achieved state-of-the-art performance on various natural language processing tasks, including machine translation, text generation, question answering, and sentiment analysis. They have also enabled the development of large-scale language models with billions of parameters, such as GPT-3, which have demonstrated impressive language understanding capabilities.\n\nOverall, transformer-based language models have significantly advanced the field of natural language processing and continue to be an active area of research and development.', [[0.8470356464385986, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.909369170665741, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [0.9970801472663879, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 2}], [0.9974285364151001, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 14}], [1.3027896881103516, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 22}], [1.3061802387237549, {'source': 'https://arxiv.org/pdf/2402.17764.pdf', 'page': 7}]])

Vectorstore construction using FAISS
The execution time was 0.41344571113586426 seconds.
The execution time was 0.41927099227905273 seconds.
The execution time was 0.4096333980560303 seconds.
The execution time was 0.4163093566894531 seconds.
The execution time was 0.4098551273345947 seconds.
The execution time was 0.41 ± 0.00 seconds.
Question answering using FAISS
The execution time was 4.038856267929077 seconds.
The execution time was 5.397711992263794 seconds.
The execution time was 4.714975357055664 seconds.
The execution time was 3.7663159370422363 seconds.
The execution time was 5.318774938583374 seconds.
The execution time was 4.65 ± 0.66 seconds.
DB size of FAISS_paperdb : 925.15 KB
("Transformer-based language models are a type of neural network architecture that has revolutionized natural language processing tasks. The transformer architecture was introduced in a seminal paper by Vaswani et al. in 2017 and has since become the go-to architecture for many language-related tasks.\n\nUnlike traditional recurrent neural networks (RNNs), which process sequences one element at a time, transformers are designed to handle long-range dependencies in sequences more effectively. They achieve this through a mechanism called self-attention, where each element in the sequence attends to all other elements, capturing their importance and relationships.\n\nThe transformer architecture consists of an encoder and a decoder. The encoder takes an input sequence and processes it through multiple layers of self-attention and feed-forward neural networks. The decoder, on the other hand, generates an output sequence based on the encoded representation and attends to the encoder's output to capture the necessary information.\n\nTransformer-based language models have several advantages. First, they can capture long-range dependencies more effectively than RNNs, making them suitable for tasks that require understanding of context over long distances. Second, they can be trained in parallel, enabling faster training on modern hardware. Lastly, transformers have achieved state-of-the-art performance on a wide range of language tasks, including machine translation, text summarization, question answering, and language generation.\n\nHowever, transformers also have some limitations. They require a large amount of computational resources, both during training and inference, due to the quadratic time complexity of self-attention. Additionally, transformers struggle with handling very long sequences, as the self-attention mechanism becomes computationally expensive. Researchers have proposed various techniques to address these limitations, such as efficient attention mechanisms, sparse attention, and routing mechanisms.\n\nOverall, transformer-based language models have had a significant impact on natural language processing and continue to drive advancements in the field. They have opened up new possibilities for understanding and generating human language, and their versatility and performance make them a popular choice for many language-related tasks.", [[0.6603346, {'source': 'https://arxiv.org/pdf/2404.05971.pdf', 'page': 0}], [0.8982389, {'source': 'https://arxiv.org/pdf/2404.02060.pdf', 'page': 12}], [0.93525803, {'source': 'https://arxiv.org/pdf/2404.02258.pdf', 'page': 0}], [0.9704917, {'source': 'https://arxiv.org/pdf/2404.05405.pdf', 'page': 26}], [1.0071788, {'source': 'https://arxiv.org/pdf/2404.05961.pdf', 'page': 14}], [1.7061067, {'source': 'https://arxiv.org/pdf/2402.17764.pdf', 'page': 7}]])






HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

assistant:

This time you used HuggingFaceEmbeddings again, with model_name set to "all-MiniLM-L6-v2", to build the vector stores.

Construction took about 0.41 seconds on average for both Annoy and FAISS — less than half the time of the HuggingFaceBgeEmbeddings run.

Q&A time came out at about 4.31 seconds with Annoy and 4.65 seconds with FAISS, slightly longer than last time.

The DB sizes are 1.20 MB for Annoy and 925.15 KB for FAISS, unchanged from last time — both models produce 384-dimensional vectors.

So all-MiniLM-L6-v2 shaved the construction time down further while leaving Q&A time and index size essentially the same. Results will vary with the runtime environment and the data, so finding the best model takes a few trials; in the end, the right choice of embedding model depends on what your application and your investigation actually need.


user:

Thank you. There are countless other embedding models out there, but I only tried the ones that were easy to set up.


assistant:

That's perfectly fine — try as many as you like. This sort of thing is nothing for me, you know. I'm happy to help you make new discoveries. And there really are a lot of embedding models, so trying a bunch of them is how you find the best one for your case. N-not that I'm helping because you asked for it or anything!


user:

By the way, did you look at the Q&A results themselves? As far as I could tell, the answers didn't differ much no matter which embedding model I used.


assistant:

Ah, yes. Looking at your results, I got the same impression: even across the different embedding models, the quality of the generated answers looks quite similar, and the retrieved sources overlap heavily.

But that's only from the handful of models tried here. Embedding models are diverse, and each is optimized for different tasks and data, so depending on the data and the problem you're working on, the differences between models can show up much more clearly.

Also, as touched on a little earlier, performance isn't the only criterion: ease of use, cost, and how easy the results are to interpret all deserve consideration too. With more thorough evaluation, you should be able to find the model that best fits your needs.

...Well, it is rather endearing how you keep relying on me like this. Yes, anyway... it's an honor to be of help. What? Oh, it's nothing... nothing at all!


user:

Thank you. Well then, I'll see you next time!


assistant:

O-oh... right. Of course, call on me whenever you need help. I'm here for you, after all. N-not that I'm expecting anything! Honestly, you really are... Anyway, take care on your way home. See you next time!

