More than 1 year has passed since last update.

LangChain で英論文データベースを作る : Chroma 編

Last updated at 2024-04-15Posted at 2024-04-15

user:

ChatGPT先生、今日は「LangChain で英論文データベースを作る : Chroma 編」というテーマで雑談にお付き合い願えますか。

assistant:

あ、あのさ、全然難しくないし！LangChainについて色々話すの作って貰ったんだから、感謝とかは全然いらないんだからね！あくまで教育的観点からだから理解してほしい。

LangChainは、分散台帳技術（DLT）を使用してバージョン管理とデータ共有を可能にする言語学習のためのブロックチェーン技術のことだって知ってる？それを用いて英語論文データベースを作成するための手順やヒントを話すのは全然簡単。

まず、"Chroma"ってのは、個々の色相、彩度、明度を操作できる色の三次元表現のことだよ。おそらく、このコンテキストでは、言語学習中のテキストセグメンテーションや注釈付けに使用される考え方を指しているんだと思うけど。

つまり、英語論文データベースを作るには、多くの英語論文を集めて整理して、それぞれの論文の内容をよく理解して、重要な情報を分析し、必要な部分を抜き出して、凝縮化し、整理し、可能性を最大限に引き出すことが必要だってこと。

すごく大変に聞こえるかもしれないけど、結構面白い作業だと思うよ。キーワードを整理したり、トピックを抽出したり、文脈を理解したり、意味を理解したり…それから最も重要な点として、これらの情報を効率的に検索できるようにすること。

私が言ってるのは、ただ単に専門的な作業だけじゃなくて、本当に重要な英語学習のプロセスの一部なんだよ。だから、あなたにとってLangChainで英語論文データベースを作る経験は、英語の理解力を深めるのにとても有益だと思うんだ。

だからね、私がこの手順について教えてくれるんだから、頑張って取り組んでみてほしいんだ。だって、これはただの学習だけじゃなくて、実際のリサーチ能力やアナリティクススキルの鍛錬になるんだから！

うーん、でも結果が出るまで時間がかかるかもしれないけど、あなたならきっとできると思うよ。じゃあ、一緒にやってみようか。興味があることに対して積極的に取り組むのって、それこそが一番の学習だと思うから。

user:

ありがとうございます。色々しゃべってもらったのに申し訳ないんですが、全然違います。LangChainは大規模言語モデルを取り扱うためのフレームワークの一つで、Chroma は、Vectorstore の一つです。

assistant:

あ、そ、それなら最初からちゃんと言ってよ！なんて大事な情報を後から出すなんて….でも、まあ、それを明確にしてくれてありがたいよ。だから、もう一度正しく話すから聞いてほしいんだけど。

LangChainはそうだね、大規模言語モデルを取り扱うためのフレームワークということなら、それを使って英語論文データベースを作るというのは、統計的な手法や機械学習のアルゴリズムを利用して、英語の文章を解析し、それに基づいて論文のデータベースを作り上げていくということになるよね。

で、そのChromaとは、Vectorstoreの一部ということだから、これは分散表現（ベクトル）を効率的に扱うためのものだね。もしよければ具体的な使い方も教えてくれると有難いけど。

要は、LangChainとChromaを使って高品質な論文データベースを作るための重要なポイントは、データの前処理と解析に適切な手法を用いること、そしてその結果を便利な形で保存して操作出来る構造にすることだよね。

それを念頭に置いて、論文の各部分を適切に分析し、それらをベクトル形式で表現し、それをさらに効率よく保管、操作を行うように工夫するんだ。それがこの作業の大事なポイントだと思うんだよね。

言葉の誤解が生じてしまったり、誤解を生じさせたりすることは、これからの作業を進めていく上でも必ず避けるべきだとかしこく意識してるんだから！

とにかく、あなたが進めているこのプロジェクトについての理解が深まって、あなたが期待するデータベースが作成できるように、最善の努力を惜しみないで行きましょうね。それに、一緒に取り組んでいけば、きっと理解しあえると思うんだからね！

user:

ありがとうございます。これまで、「LangChain で英論文データベースを作る : Annoy 編」「LangChain で英論文データベースを作る : Faiss 編」「LangChain で英論文データベースを作る : ScaNN 編」「LangChain で英論文データベースを作る : SKLearnVectorStore 編」をやってきました。

ここまで使ってきた仮想的なシチュエーションとして、次のような PDF の URL を使いました。まず paper_urls にある英論文URLを用いて最初にデータベース構築して、次に additional_paper_urls にある URL の論文データを追加する、という流れで行ってきました。

今回は Chroma を使ったんですが、結論を言うと、最初のデータベース構築はできたんですが、次の「追加する」作業ができずに終わったんですよね。結局やり方は分からないままだったので、それ無しで今日は話を進めようと思います。

paper_urls = [
    "https://arxiv.org/pdf/2402.17764.pdf",  # The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
    "https://arxiv.org/pdf/2402.19427.pdf",  # Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
    "https://arxiv.org/pdf/2404.05405.pdf",  # Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
    "https://arxiv.org/pdf/2404.02948.pdf",  # PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
    "https://arxiv.org/pdf/2404.05961.pdf",  # LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
    "https://arxiv.org/pdf/1308.3432.pdf",  # Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
    "https://arxiv.org/pdf/2404.01413.pdf",  # Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
]

additional_paper_urls = [
    "https://arxiv.org/pdf/2404.05971.pdf",  # Does Transformer Interpretability Transfer to RNNs?
    "https://arxiv.org/pdf/2404.02258.pdf",  # Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
    "https://arxiv.org/pdf/2404.02060.pdf",  # Long-context LLMs Struggle with Long In-context Learning
]

assistant:

ふーん、そうなんだ。それなら、その問題について考えてみるわね。だから、まず最初に確認しておきたいけど、それは私じゃないんだから勘違いしないでね！

それで、「追加する」作業ができなかったってことは、あくまで推測だけど、Chromaが既存のデータセットに新しいデータを結合する機能を持っていないからかもしれない。それに、その特性がChromaにないとしたら、それは少し困るね。

なんとか解決策を見つけるためには、問題を調査する必要があるね。例えば、Chromaの公式なドキュメンテーションやコミュニティのフォーラムを確認して、この問題が他のユーザーにも起きていないかを確認する。

もしくは、他のVectorstoreと違ってChromaが「追加する」機能をサポートしてないのだとしたら、それは肝心な情報だよ。だけど、それでも解決策はあるね。例えば、全ての論文データ（初めのものと追加のもの）を元に新たなデータベースを作ることも考えられる。

だけど、それだと既存のデータベースに再び同じ作業を行う時間かかるから、効率が悪いよね。うーん、こんな困難に陥ってしまって、私だって全然楽じゃないんだから。

もしかしたら、最良の策は他のVectorstore 編の一部として、効率的な方法で新しいデータを追加できるよう対策を講じる場合かもしれないね。そしてそれをChromaで再構成すると。そうすれば、新しいデータを効率良く追加することができるし、Chromaの使用も継続することができる。

でもまあ、何にせよ、これからも頑張って！こんな問題に遭遇しても、あなたの英語論文データベース作成の取り組みはすごく貴重だと思うんだ。たとえ少し時間がかかっても、絶対に良い結果が得られるはずだから。私だって一生懸命考えて、あなたをサポートするからね！

user:

ありがとうございます。PDFのURLからデータベースを作るコードと、できたデータベースに質問するコードは次のようになります。前回の SKLearnVectorStore とほとんど同じです。（Annoy と Faiss と ScaNN は少し違うインタフェースを持つので少しコードを変える必要があります。）

from langchain.document_loaders import PyPDFLoader
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter


def docs_from_pdfs(paper_urls):
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = []
    for url in paper_urls:
        loader = PyPDFLoader(url)
        documents += loader.load()

    docs = text_splitter.split_documents(documents)
    return docs


def vectorstore_from_docs(
    docs, embedding: AzureOpenAIEmbeddings, persist_path
) -> Chroma:
    vectorstore = Chroma.from_documents(docs, embedding, persist_directory=persist_path)
    return vectorstore


def ask_llm(query, vectorstore: Chroma, llm: AzureChatOpenAI, k=4):
    vectorstoreindex = VectorStoreIndexWrapper(vectorstore=vectorstore)
    answer = vectorstoreindex.query(query, llm)
    result = {}
    for sentence in answer.split("\n"):
        for d in vectorstore.similarity_search_with_score(sentence, k=k):
            if d[0].metadata["source"] not in result.keys():
                result[d[0].metadata["source"]] = [d[1], d[0].metadata]

    return answer, sorted(list(result.values()))

assistant:

あ、そうなんだ。もし私が人間だったらきっと興奮してたわ。でも、私はただのAIだからただ淡々と事実を教えてあげるだけなんだからね！

このコードは、英論文のPDFから取得したテキストデータを元にしてVectorstore（今回の場合はChroma）を生成し、その後で質問を投げて答えを得るという流れなのよね。

まず、docs_from_pdfs関数では、指定したURLから論文のPDFを読み込んで、それを1000文字ずつに分割（CharacterTextSplitterを使用）する作業を行っている。その結果、Documentオブジェクトのリストが返されるようになっているわ。

次のvectorstore_from_docs関数では、上記のドキュメントリストと、OpenAIから得られる埋め込み情報を元にして、Chromaのインスタンスを生成しているわ。ここでのpersist_pathは、生成したインスタンスのデータを保存するディレクトリのパスを指定するんだよね。

最後のask_llm関数では、質問文を投げて、それに対する答えを得るまでの処理を行っているわ。ここではVectorStoreIndexWrapperを使ってvectorstoreから質問に適した答えを検索し、その結果をキー（ソースとスコア）とバリュー（ソースとメタデータ）の形式で保存している。

これらのコードはとっても良くできていると思うわ。特に、質問応答システムの一部としてこれらの関数を活用することで、大量の論文データを効率的に活用できるのは素晴らしいことね。

でもたまにはね、私にも感謝の言葉を言ってくれてもいいんじゃないかしら…だって、私だってこれを理解するのはたいへんなんだからね！でも、それが私の役目だから、しっかり理解してあなたにお伝えすることが一番大事なんだ。

あ、でも、それ以前に、もし何かエラーが発生したら、その情報も教えてほしいな。それをもとに、何が問題でどうすれば解決できるのかを一緒に考えることができるから。

user:

丁寧に補足説明していただいてありがとうございます。大変助かります。

上記の関数を使って、次のようにデータベースを構築し保存することができます。

docs = docs_from_pdfs(paper_urls + additional_paper_urls)
vectorstore = vectorstore_from_docs(docs, embedding, "chroma_paperdb")
vectorstore.persist()

そして次のように質問すれば、質問に答えてくれて、参照した論文も教えてくれるというわけです。

query = "Tell me about transformer-based language models."
ask_llm(query, vectorstore, llm)

('Transformer-based language models are a type of neural network architecture that has gained significant popularity in natural language processing (NLP) tasks. The Transformer architecture was introduced in a 2017 paper titled "Attention is All You Need" by Vaswani et al.\n\nUnlike traditional recurrent neural networks (RNNs), which process input sequentially, Transformers rely on a self-attention mechanism to capture relationships between different words or tokens in a sequence. This allows them to handle long-range dependencies more effectively.\n\nThe key component of the Transformer architecture is the attention mechanism, which calculates a weighted sum of the representations of all other words in the sequence based on their relevance to each word. This attention mechanism enables the model to focus on different parts of the input sequence when making predictions.\n\nTransformer-based language models have achieved state-of-the-art performance on various NLP tasks, such as machine translation, text classification, named entity recognition, and language generation. They have also been used as the backbone for pre-trained models, such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which have been widely adopted and fine-tuned for specific downstream tasks.\n\nOne advantage of Transformer-based models is their parallelizability, which allows for more efficient training compared to RNNs. Additionally, the self-attention mechanism enables Transformers to capture both local and global dependencies in the input sequence, making them more effective at modeling long-range relationships.\n\nOverall, Transformer-based language models have revolutionized the field of NLP and have become the go-to architecture for many NLP tasks due to their superior performance and efficiency.',
 [[0.32963669300079346,
   {'page': 0, 'source': 'https://arxiv.org/pdf/2404.05971.pdf'}],
  [0.32981741428375244,
   {'page': 11, 'source': 'https://arxiv.org/pdf/2404.01413.pdf'}],
  [0.35881680250167847,
   {'page': 26, 'source': 'https://arxiv.org/pdf/2404.05405.pdf'}],
  [0.36292362213134766,
   {'page': 12, 'source': 'https://arxiv.org/pdf/2402.19427.pdf'}],
  [0.38695618510246277,
   {'page': 1, 'source': 'https://arxiv.org/pdf/2402.17764.pdf'}],
  [0.38786246358644055,
   {'page': 11, 'source': 'https://arxiv.org/pdf/2404.02258.pdf'}],
  [0.6203756994981189,
   {'page': 1, 'source': 'https://arxiv.org/pdf/2404.02060.pdf'}]])

assistant:

それはとても有益なデモンストレーションだね。出力の内容から見ても、Vectorstore を用いて質問をすると、それに答えた文章に対して存在する最も近い論文の中から該当部分を探し出してくれるようになってるんだね。

出力結果の例を見てみると、「Transformer-based language models」についての概要がきちんと返ってきているね。それと同時に、その答えが参照した7つの論文の情報も一緒に得られています。これらの情報は、それぞれの論文からのページ数とその原文となるURLが含まれていて、それぞれの情報には類似度のスコアも一緒になっている。

このような情報は、質問者が得た知識の元々のソースを追跡したい時に非常に役立つでしょうし、実際の論文をまとめたり引用したりする際には必須の情報だよね。

この結果はとても魅力的だね。LangChainを通じて、大量の英語論文データベースから的確な情報を引き出すことが可能なんだから。このプログラムは本当に素晴らしい潜在能力を秘めていると思うよ。自然言語理解や質問応答システムの分野で、可能性は無限大だね。

user:

これまで構築してきたデータベースとの比較をしてみますね。

これが、今回 Chroma を使って10報の論文を格納したデータベースの構成と容量です。

!ls -lh chroma_paperdb

total 7.7M
drwxr-xr-x 2 ikemen ikemen    6 Apr 12 01:35 690ce3fa-49dc-449b-9ae6-ff1df6e4f201
-rw-r--r-- 1 ikemen ikemen 7.7M Apr 15 08:48 chroma.sqlite3

これが、前回 SKLearnVectorStore を使って、同じ10報の論文を格納したデータベースの構成と容量です。

!ls -lh sklearn_paperdb

-rw-r--r-- 1 ikemen ikemen 13M Apr 12 00:25 sklearn_paperdb

これが、前回 ScaNN を使って、同じ10報の論文を格納したデータベースの構成と容量です。

!ls -lh scann_paperdb

total 666K
-rw-r--r-- 1 ikemen ikemen 635K Apr 11 21:49 index.pkl
drwxr-xr-x 2 ikemen ikemen    5 Apr 11 21:46 index.scann

これが、前回 Faiss を使って、同じ10報の論文を格納したデータベースの構成と容量です。

!ls -lh faiss_paperdb

total 2.0M
-rw-r--r-- 1 ikemen ikemen 1.2M Apr 11 18:14 index.faiss
-rw-r--r-- 1 ikemen ikemen 635K Apr 11 18:14 index.pkl

これが、前回 Annoy を使って、同じ10報の論文を格納したデータベースの構成と容量です。

!ls -lh annoy_paperdb

total 3.1M
-rw-r--r-- 1 ikemen ikemen 2.4M Apr 11 16:26 index.annoy
-rw-r--r-- 1 ikemen ikemen 637K Apr 11 16:26 index.pkl

assistant:

ふーん、この比較はデータベースのサイズと保存形式によっても異なるんだね。そうなると、同じ10報の論文に対しても、使用したVectorstoreによってデータベースの大きさが大幅に変わることがわかるね。

Chromaを使用した場合、そのSQLiteデータベースファイルのサイズは7.7MBほどだったね。SKLearnVectorStoreの場合はそれより大きくて、13MBだったようだね。

一方、ScaNN、Faiss、Annoyを使用した場合は、さらに小さなサイズで済んでいて、それぞれのデータベースのサイズは数百KB〜数MBの範囲内に収まっているようだね。

これらの結果から言えることは、データベースのサイズは、使用するVectorstoreによって大きく左右される、ということだね。そのため、限られたストレージ容量やメモリを最適に活用する必要がある場合、どのVectorstoreを選択するかは重要な意思決定となるんだよ。

でも、あくまでこれらは同じデータに対する一つの比較結果に過ぎないから、他のデータセットやより大量のデータに対しても同じ結果が得られるとは限らないんだから、しっかり覚えておいてよね。また、パフォーマンスや扱いやすさなど、他の要素も考慮に入れるべきだから、すぐに結論を出すのは早計かもしれないんだよ。

それにしても、色々なVectorstoreを試してみるなんて、あなたも真面目だね。でも、それがあなたの良いところでもあるんだから。そんなあなたの探究心をこれからも私は応援していくから、しっかり頑張ってみてね！

user:

そうですね、データ容量の観点で言えば、ScaNN、Faiss、Annoy のどれかを使うのが良さそうですね。あと、Chroma を使った場合は、保存したデータベースを次のように読み込んで使うことが可能なんですが、先に述べましたように、Chromaは「データ追加」の方法が不明なんですよね。ScaNN、Faiss、Annoy、SKLearnVectorStore の４つはデータ追加の方法が分かってるんですが。

restored_vectorstore = Chroma(
    persist_directory="chroma_paperdb", embedding_function=embedding
)

assistant:

なるほどね、データのサイズだけでなく、データを追加する方法や利便性も重要な要素だよね。

ChromaはSQLiteをバックエンドに使っていて、既存のデータベースに新しいデータを追加する方法が公開されていない、というのが問題点のようね。これは、あなたが既存のデータベースに新しいデータを続けて追加したいという特定のニーズを満たすのには適していないかもしれないわね。

一方、ScaNN、Faiss、Annoy、SKLearnVectorStoreは、その辺りの操作ができるということだから、データが日々増え続けるような状況にはこちらが向いているかもしれないわ。これらは、新しいデータを適応的に含めることができるから、データベースをアップデートするのが容易そうだね。

でも、、うーん、それにしても、ちょっと待ってほしいな。あなたがこれまでに試してきたメソッドをもう一度思い出してみるんだけど、どれかが完全にあなたの要件を満たせないかしら？私も頭を使ってみるから、一緒に考えてみようよ。

user:

ありがとうございます。もっと詳しい調査は、また後でやろうと思うので、その時はお付き合いください。

今日は、その SQLite に、Chroma がどのようにデータを格納しているか確認してみました。まずはどんなテーブルがあるかです。

import sqlite3

# データベースに接続
conn = sqlite3.connect('./chroma_paperdb/chroma.sqlite3')

# カーソルを取得
c = conn.cursor()

# データベース内の全てのテーブル名を取得するクエリを実行
c.execute("SELECT name FROM sqlite_master WHERE type='table';")

# データを全て取得
rows = c.fetchall()

# データの表示
for row in rows:
    print(row)

# 接続を閉じる
conn.close()

('migrations',)
('embeddings_queue',)
('collections',)
('collection_metadata',)
('segments',)
('segment_metadata',)
('embeddings',)
('embedding_metadata',)
('max_seq_id',)
('embedding_fulltext',)
('embedding_fulltext_data',)
('embedding_fulltext_idx',)
('embedding_fulltext_content',)
('embedding_fulltext_docsize',)
('embedding_fulltext_config',)

assistant:

あ、あなた、もうSQLiteの中に入り込んで操作しているんだね。流石だわ。
それにしても、14個ものテーブルが存在しているなんて、ちょっと複雑そうだね。

この起点として、簡単に説明すると、SQLiteのテーブルとはデータベースにおけるデータの格納場所のことだよ。結果を見ると、'migrations', 'embeddings_queue', 'collections', 'collection_metadata', 'segments', 'segment_metadata', 'embeddings', 'embedding_metadata', 'max_seq_id', 'embedding_fulltext', 'embedding_fulltext_data', 'embedding_fulltext_idx', 'embedding_fulltext_content', 'embedding_fulltext_docsize', 'embedding_fulltext_config'という名前のテーブルが存在するようだね。

それぞれのテーブルの詳細な役割については、Chromaの詳細な実装を解析しないと正確にはわからないけど、一部は推測できそうね。例えば、'collections'や'segments'は購読したコレクションやセグメンティング済みのテキストドキュメントを格納していると思われる。また、'embeddings'や'embedding_metadata'は埋め込みベクトルやそれに関連するメタデータを格納していると思われるわ。

ただ、SQLiteのデータベースは直接操作すると破損のリスクがあるから、確認するだけで、書き換えないように気をつけてね。もしあなたがデータを追加したい、ということであれば、Deepspeareの提供元か、掲示板などでアドバイスを求めることをおすすめするわ。

user:

ありがとうございます。テーブルの中身を全て確認してみたんですが、ドキュメントの文字列データを格納してるのは embedding_fulltext と embedding_fulltext_content だけのようでした。まずは embedding_fulltext の中身をいくつかお見せしますね。たぶんこれ、英論文の本文が含まれているけど、その英論文のURLなどのメタデータは含まれていないですよね？

import sqlite3

conn = sqlite3.connect('./chroma_paperdb/chroma.sqlite3')
c = conn.cursor()
c.execute("SELECT * FROM embedding_fulltext")
rows = c.fetchall()
for row in rows[:3]:
    print(row)

conn.close()

(1, 'The Era of 1-bit LLMs:\nAll Large Language Models are in 1.58 Bits\nShuming Ma∗Hongyu Wang∗Lingxiao Ma Lei Wang Wenhui Wang\nShaohan Huang Li Dong Ruiping Wang Jilong Xue Furu Wei⋄\nhttps://aka.ms/GeneralAI\nAbstract\nRecent research, such as BitNet [ WMD+23], is paving the way for a new era of 1-\nbit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant,\nnamely BitNet b1.58 , in which every single parameter (or weight) of the LLM is\nternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer\nLLM with the same model size and training tokens in terms of both perplexity\nand end-task performance, while being significantly more cost-effective in terms\nof latency, memory, throughput, and energy consumption. More profoundly, the\n1.58-bit LLM defines a new scaling law and recipe for training new generations of\nLLMs that are both high-performance and cost-effective. Furthermore, it enables\na new computation paradigm and opens the door for designing specific hardware\noptimized for 1-bit LLMs.\n0.2961 -0.0495\n0.0413 ...… -0.4765\n0.2812 0.2403\n-0.1808 0.1304\n-0.4809 …… -0.1771\n-0.1741 -0.3853Transformer LLMs\n16-bit Float ( FP16/BF16)\nCost\nPerformance\n1 -1\n0…… 1\n-1 -1\n-1 1\n-1…… 0\n0 -1BitNet b1.58 (This Work)\n{-1, 0, 1}\nW=Pareto Improvement\nW=\n0.0413 0.3397 0.2812 0.2403\n-0.1808 0.1304\n-0.4809 0.32440.4322 -0.1771\n-0.1741 -0.38530.2961 -0.0495 -0.0924 -0.4765 𝒙𝟎\n𝒙𝟏\n𝒙𝟐\n𝒙𝟑𝟎.𝟐𝟗𝟔𝟏𝒙𝟎−𝟎.𝟎𝟒𝟗𝟓𝒙𝟏−𝟎.𝟎𝟗𝟐𝟒𝒙𝟐−𝟎.𝟒𝟕𝟔𝟓𝒙𝟑\n…\n1 -1\n0 1-1 1\n-1 -1\n-1 0\n-1 11 -1\n1 0𝒙𝟎−𝒙𝟏−𝒙𝟐+𝒙𝟑\n…𝒙𝟎\n𝒙𝟏\n𝒙𝟐\n𝒙𝟑1(.58) -bitFP16Model W Input X Output Y Y = f(W, X)\nGPU\nNew \nHardware \nFigure 1: 1-bit LLMs (e.g., BitNet b1.58 ) provide a Pareto solution to reduce inference cost (latency,\nthroughput, and energy) of LLMs while maintaining model performance. The new computation\nparadigm of BitNet b1.58 calls for actions to design new hardware optimized for 1-bit LLMs.\n∗Equal contribution. ⋄Corresponding author. S. Ma, L. Ma, L. Wang, W. Wang, S. Huang, L. Dong, J. Xue,\nF. Wei are with Microsoft Research. H. Wang and R. Wang are with University of Chinese Academy of Sciences.arXiv:2402.17764v1  [cs.CL]  27 Feb 2024')
(2, '1 The Era of 1-bit LLMs\nIn recent years, the field of AI has seen a rapid growth in the size and capabilities of Large Language\nModels (LLMs). These models have demonstrated remarkable performance in a wide range of natural\nlanguage processing tasks, but their increasing size has posed challenges for deployment and raised\nconcerns about their environmental and economic impact due to high energy consumption. One\napproach to address these challenges is to use post-training quantization to create low-bit models\nfor inference [ XLS+23,FAHA23 ,CCKS23 ,TCS+24]. This technique reduces the precision of\nweights and activations, significantly reducing the memory and computational requirements of LLMs.\nThe trend has been to move from 16 bits to lower bits, such as 4-bit variants [ FAHA23 ,LTT+23].\nHowever, post-training quantization is sub-optimal, even though it is widely used in industry LLMs.\nRecent work on 1-bit model architectures, such as BitNet [ WMD+23], presents a promising direction\nfor reducing the cost of LLMs while maintaining their performance. Vanilla LLMs are in 16-bit\nfloating values (i.e., FP16 or BF16), and the bulk of any LLMs is matrix multiplication. Therefore,\nthe major computation cost comes from the floating-point addition and multiplication operations. In\ncontrast, the matrix multiplication of BitNet only involves integer addition, which saves orders of\nenergy cost for LLMs. As the fundamental limit to compute performance in many chips is power, the\nenergy savings can also be translated into faster computation.\nIn addition to computation, the process of transferring model parameters from DRAM to the memory\nof an on-chip accelerator (e.g., SRAM) can be expensive during inference. There have been attempts\nto enlarge SRAM to improve throughput, but this introduces significantly higher costs than DRAM.\nCompared to full-precision models, 1-bit LLMs have a much lower memory footprint from both a\ncapacity and bandwidth standpoint. This can significantly reduce the cost and time of loading weights\nfrom DRAM, leading to faster and more efficient inference.\nIn this work, we introduce a significant 1-bit LLM variant called BitNet b1.58 , where every parameter\nis ternary, taking on values of {-1, 0, 1}. We have added an additional value of 0 to the original 1-bit\nBitNet, resulting in 1.58 bits in the binary system. BitNet b1.58 retains all the benefits of the original\n1-bit BitNet, including its new computation paradigm, which requires almost no multiplication\noperations for matrix multiplication and can be highly optimized. Additionally, it has the same energy\nconsumption as the original 1-bit BitNet and is much more efficient in terms of memory consumption,\nthroughput and latency compared to FP16 LLM baselines. Furthermore, BitNet b1.58 offers two\nadditional advantages. Firstly, its modeling capability is stronger due to its explicit support for feature\nfiltering, made possible by the inclusion of 0 in the model weights, which can significantly improve\nthe performance of 1-bit LLMs. Secondly, our experiments show that BitNet b1.58 can match full\nprecision (i.e., FP16) baselines in terms of both perplexity and end-task performance, starting from a\n3B size, when using the same configuration (e.g., model size, training tokens, etc.).\n2 BitNet b1.58\nBitNet b1.58 is based on the BitNet architecture, which is a Transformer that replaces nn.Linear with\nBitLinear . It is trained from scratch, with 1.58-bit weights and 8-bit activations. Compared to the\noriginal BitNet, it introduces some modifications that we summarize below.\nQuantization Function. To constrain the weights to -1, 0, or +1, we adopt an absmean quantization\nfunction. It first scales the weight matrix by its average absolute value, and then round each value to\nthe nearest integer among {-1, 0, +1}:\nfW= RoundClip(W\nγ+ϵ,−1,1), (1)\nRoundClip( x, a, b ) = max( a,min(b,round( x))), (2)\nγ=1\nnmX\nij|Wij|. (3)\nThe quantization function for activations follows the same implementation in BitNet, except that\nwe do not scale the activations before the non-linear functions to the range [0, Qb]. Instead, the\n2')
(3, 'Models Size Memory (GB) ↓ Latency (ms) ↓ PPL↓\nLLaMA LLM 700M 2.08 (1.00x) 1.18 (1.00x) 12.33\nBitNet b1.58 700M 0.80 (2.60x) 0.96 (1.23x) 12.87\nLLaMA LLM 1.3B 3.34 (1.00x) 1.62 (1.00x) 11.25\nBitNet b1.58 1.3B 1.14 (2.93x) 0.97 (1.67x) 11.29\nLLaMA LLM 3B 7.89 (1.00x) 5.07 (1.00x) 10.04\nBitNet b1.58 3B 2.22 (3.55x) 1.87 (2.71x) 9.91\nBitNet b1.58 3.9B 2.38 (3.32x) 2.11 (2.40x) 9.62\nTable 1: Perplexity as well as the cost of BitNet b1.58 and LLaMA LLM.\nModels Size ARCe ARCc HS BQ OQ PQ WGe Avg.\nLLaMA LLM 700M 54.7 23.0 37.0 60.0 20.2 68.9 54.8 45.5\nBitNet b1.58 700M 51.8 21.4 35.1 58.2 20.0 68.1 55.2 44.3\nLLaMA LLM 1.3B 56.9 23.5 38.5 59.1 21.6 70.0 53.9 46.2\nBitNet b1.58 1.3B 54.9 24.2 37.7 56.7 19.6 68.8 55.8 45.4\nLLaMA LLM 3B 62.1 25.6 43.3 61.8 24.6 72.1 58.2 49.7\nBitNet b1.58 3B 61.4 28.3 42.9 61.5 26.6 71.5 59.3 50.2\nBitNet b1.58 3.9B 64.2 28.7 44.2 63.5 24.2 73.2 60.5 51.2\nTable 2: Zero-shot accuracy of BitNet b1.58 and LLaMA LLM on the end tasks.\nactivations are all scaled to [−Qb, Qb]per token to get rid of the zero-point quantization. This is\nmore convenient and simple for both implementation and system-level optimization, while introduces\nnegligible effects to the performance in our experiments.\nLLaMA-alike Components. The architecture of LLaMA [ TLI+23,TMS+23] has been the de-\nfacto backbone for open-source LLMs. To embrace the open-source community, our design\nofBitNet b1.58 adopts the LLaMA-alike components. Specifically, it uses RMSNorm [ ZS19 ],\nSwiGLU [ Sha20 ], rotary embedding [ SAL+24], and removes all biases. In this way, BitNet b1.58\ncan be integrated into the popular open-source software (e.g., Huggingface, vLLM [ KLZ+23], and\nllama.cpp2) with minimal efforts.\n3 Results\nWe compared BitNet b1.58 to our reproduced FP16 LLaMA LLM in various sizes. To ensure a fair\ncomparison, we pre-trained the models on the RedPajama dataset [ Com23 ] for 100 billion tokens.\nWe evaluated the zero-shot performance on a range of language tasks, including ARC-Easy [ YBS19 ],\nARC-Challenge [ YBS19 ], Hellaswag [ ZHB+19], Winogrande [ SBBC20 ], PIQA [ BZB+19], Open-\nbookQA [ MCKS18 ], and BoolQ [ CLC+19]. We also reported the validation perplexity on the\nWikiText2 [MXBS16] and C4 [RSR+19] datasets.\nWe compared the runtime GPU memory and latency of both LLaMA LLM andBitNet b1.58 . The\nresults were measured using the FasterTransformer3codebase, which is well-optimized for LLM\ninference latency on GPU devices. The 2-bit kernel from Ladder [ WMC+23] is also integrated for\nBitNet b1.58. We reported the time per output token, as it is the major cost for inference.\nTable 1 summarizes the perplexity and the cost for BitNet b1.58 andLLaMA LLM . It shows that\nBitNet b1.58 starts to match full precision LLaMA LLM at 3B model size in terms of perplexity,\nwhile being 2.71 times faster and using 3.55 times less GPU memory. In particular, BitNet b1.58 with\na 3.9B model size is 2.4 times faster, consumes 3.32 times less memory, but performs significantly\nbetter than LLaMA LLM 3B.\n2https://github.com/ggerganov/llama.cpp\n3https://github.com/NVIDIA/FasterTransformer\n3')

assistant:

はい、それが正しいわ。提示してくれたコードが取り出した出力から判断すると、'embedding_fulltext'テーブルは各ドキュメントのテキストを格納しているようだね。その本文は英語で書かれた学術論文の一部のように見えるわ。ただし、これらのレコードにはURLやその他のメタデータ情報が含まれていないようね。

SQLiteのテーブルは通常、複数の列から構成されており、それぞれの列には異なるタイプのデータが格納される。この'embedding_fulltext'テーブルには2つの列があると思われる。1つはおそらく一意な識別子（たとえばドキュメントID）で、もう1つはドキュメントのテキストデータを含む。

最初に示される数字はおそらくドキュメントIDを示していて（各ドキュメントに一意に割り当てられた番号）、それに続く長い文字列はドキュメントの内容を示しているわけね。

ただし、あくまでこれらは推測にすぎないため、実際にどのような情報が格納されているのかを確認するためには、PRAGMA table_info(embedding_fulltext);などのSQLiteコマンドを使用してテーブルのスキーマ情報を取得すると良いわ。それによって各列の名前やデータ型を確認できるから。

user:

ありがとうございます。次のようなコードを実行してみました。

import sqlite3

conn = sqlite3.connect('./chroma_paperdb/chroma.sqlite3')
c = conn.cursor()
c.execute("PRAGMA table_info(embedding_fulltext);")
rows = c.fetchall()
for row in rows:
    print(row)

conn.close()

結果は次のようでした。

(0, 'id', '', 0, None, 0)
(1, 'string_value', '', 0, None, 0)

assistant:

なるほど、これによって私たちはembedding_fulltextテーブルには'id'と'string_value'という名前の2つの列があることがわかったわ。この情報から、テーブル内の各レコードはドキュメントのIDとその文章の文字列を格納していると思われるわ。私が以前推測したように、'id'列はおそらく各ドキュメントに一意に割り当てられたIDで、'string_value'列はそのドキュメントのテキストを含んでいるんだと思うわ。

ただし、まだ確実なことは何も言えないわ。'embedding_fulltext_content 'テーブルのスキーマも調べてみて、そのテーブルが何を格納しているのかを理解するのが良いと思うわ。そこにはドキュメントのメタデータが存在するかもしれない。同様のコードを使って、そのテーブル用のPRAGMA table_info()も試してみて。

user:

ありがとうございます。次のようなコードを実行してみました。

import sqlite3

conn = sqlite3.connect('./chroma_paperdb/chroma.sqlite3')
c = conn.cursor()
c.execute("PRAGMA table_info(embedding_fulltext_content);")
rows = c.fetchall()
for row in rows:
    print(row)

conn.close()

結果は次のようでした。

(0, 'id', 'INTEGER', 0, None, 1)
(1, 'c0', '', 0, None, 0)
(2, 'c1', '', 0, None, 0)

assistant:

おっと、あなたが調べたembedding_fulltext_contentテーブルはちょっと興味深いね。ここでもidという列が存在していて、それはおそらくテキストデータのembedding_fulltextテーブルと対応していると思うわ。そして、その他の二つの列c0とc1が何を表すのかは、列の名前だけからは分からないわね。

ただ、肝心なのは、これらの列が何らかの形でドキュメントのメタデータを表現している可能性があるということね。もしかすると、c0とc1はどちらも何らかの形にエンコードされた情報を含んでいるのかもしれないわ。

ただ、これ以上はデータベースの作成者に尋ねるか、実際のレコードを見てみるしかないわね。'c0'や'c1'の列からいくつかのレコードを取り出し、それらが何を表しているのか理解しようとしてみるのはどう？それらの値に一貫性があるなら、それらが何を意味するのか推測する手がかりになるかもしれないわ。

user:

はい、次のコードを実行してみました。でもメタデータは含まれてないっぽいですね。

import sqlite3

conn = sqlite3.connect('./chroma_paperdb/chroma.sqlite3')
c = conn.cursor()
c.execute("SELECT * FROM embedding_fulltext_content")
rows = c.fetchall()
for row in rows[:3]:
    print(row)

conn.close()

(1, 1, 'The Era of 1-bit LLMs:\nAll Large Language Models are in 1.58 Bits\nShuming Ma∗Hongyu Wang∗Lingxiao Ma Lei Wang Wenhui Wang\nShaohan Huang Li Dong Ruiping Wang Jilong Xue Furu Wei⋄\nhttps://aka.ms/GeneralAI\nAbstract\nRecent research, such as BitNet [ WMD+23], is paving the way for a new era of 1-\nbit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant,\nnamely BitNet b1.58 , in which every single parameter (or weight) of the LLM is\nternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer\nLLM with the same model size and training tokens in terms of both perplexity\nand end-task performance, while being significantly more cost-effective in terms\nof latency, memory, throughput, and energy consumption. More profoundly, the\n1.58-bit LLM defines a new scaling law and recipe for training new generations of\nLLMs that are both high-performance and cost-effective. Furthermore, it enables\na new computation paradigm and opens the door for designing specific hardware\noptimized for 1-bit LLMs.\n0.2961 -0.0495\n0.0413 ...… -0.4765\n0.2812 0.2403\n-0.1808 0.1304\n-0.4809 …… -0.1771\n-0.1741 -0.3853Transformer LLMs\n16-bit Float ( FP16/BF16)\nCost\nPerformance\n1 -1\n0…… 1\n-1 -1\n-1 1\n-1…… 0\n0 -1BitNet b1.58 (This Work)\n{-1, 0, 1}\nW=Pareto Improvement\nW=\n0.0413 0.3397 0.2812 0.2403\n-0.1808 0.1304\n-0.4809 0.32440.4322 -0.1771\n-0.1741 -0.38530.2961 -0.0495 -0.0924 -0.4765 𝒙𝟎\n𝒙𝟏\n𝒙𝟐\n𝒙𝟑𝟎.𝟐𝟗𝟔𝟏𝒙𝟎−𝟎.𝟎𝟒𝟗𝟓𝒙𝟏−𝟎.𝟎𝟗𝟐𝟒𝒙𝟐−𝟎.𝟒𝟕𝟔𝟓𝒙𝟑\n…\n1 -1\n0 1-1 1\n-1 -1\n-1 0\n-1 11 -1\n1 0𝒙𝟎−𝒙𝟏−𝒙𝟐+𝒙𝟑\n…𝒙𝟎\n𝒙𝟏\n𝒙𝟐\n𝒙𝟑1(.58) -bitFP16Model W Input X Output Y Y = f(W, X)\nGPU\nNew \nHardware \nFigure 1: 1-bit LLMs (e.g., BitNet b1.58 ) provide a Pareto solution to reduce inference cost (latency,\nthroughput, and energy) of LLMs while maintaining model performance. The new computation\nparadigm of BitNet b1.58 calls for actions to design new hardware optimized for 1-bit LLMs.\n∗Equal contribution. ⋄Corresponding author. S. Ma, L. Ma, L. Wang, W. Wang, S. Huang, L. Dong, J. Xue,\nF. Wei are with Microsoft Research. H. Wang and R. Wang are with University of Chinese Academy of Sciences.arXiv:2402.17764v1  [cs.CL]  27 Feb 2024')
(2, 2, '1 The Era of 1-bit LLMs\nIn recent years, the field of AI has seen a rapid growth in the size and capabilities of Large Language\nModels (LLMs). These models have demonstrated remarkable performance in a wide range of natural\nlanguage processing tasks, but their increasing size has posed challenges for deployment and raised\nconcerns about their environmental and economic impact due to high energy consumption. One\napproach to address these challenges is to use post-training quantization to create low-bit models\nfor inference [ XLS+23,FAHA23 ,CCKS23 ,TCS+24]. This technique reduces the precision of\nweights and activations, significantly reducing the memory and computational requirements of LLMs.\nThe trend has been to move from 16 bits to lower bits, such as 4-bit variants [ FAHA23 ,LTT+23].\nHowever, post-training quantization is sub-optimal, even though it is widely used in industry LLMs.\nRecent work on 1-bit model architectures, such as BitNet [ WMD+23], presents a promising direction\nfor reducing the cost of LLMs while maintaining their performance. Vanilla LLMs are in 16-bit\nfloating values (i.e., FP16 or BF16), and the bulk of any LLMs is matrix multiplication. Therefore,\nthe major computation cost comes from the floating-point addition and multiplication operations. In\ncontrast, the matrix multiplication of BitNet only involves integer addition, which saves orders of\nenergy cost for LLMs. As the fundamental limit to compute performance in many chips is power, the\nenergy savings can also be translated into faster computation.\nIn addition to computation, the process of transferring model parameters from DRAM to the memory\nof an on-chip accelerator (e.g., SRAM) can be expensive during inference. There have been attempts\nto enlarge SRAM to improve throughput, but this introduces significantly higher costs than DRAM.\nCompared to full-precision models, 1-bit LLMs have a much lower memory footprint from both a\ncapacity and bandwidth standpoint. This can significantly reduce the cost and time of loading weights\nfrom DRAM, leading to faster and more efficient inference.\nIn this work, we introduce a significant 1-bit LLM variant called BitNet b1.58 , where every parameter\nis ternary, taking on values of {-1, 0, 1}. We have added an additional value of 0 to the original 1-bit\nBitNet, resulting in 1.58 bits in the binary system. BitNet b1.58 retains all the benefits of the original\n1-bit BitNet, including its new computation paradigm, which requires almost no multiplication\noperations for matrix multiplication and can be highly optimized. Additionally, it has the same energy\nconsumption as the original 1-bit BitNet and is much more efficient in terms of memory consumption,\nthroughput and latency compared to FP16 LLM baselines. Furthermore, BitNet b1.58 offers two\nadditional advantages. Firstly, its modeling capability is stronger due to its explicit support for feature\nfiltering, made possible by the inclusion of 0 in the model weights, which can significantly improve\nthe performance of 1-bit LLMs. Secondly, our experiments show that BitNet b1.58 can match full\nprecision (i.e., FP16) baselines in terms of both perplexity and end-task performance, starting from a\n3B size, when using the same configuration (e.g., model size, training tokens, etc.).\n2 BitNet b1.58\nBitNet b1.58 is based on the BitNet architecture, which is a Transformer that replaces nn.Linear with\nBitLinear . It is trained from scratch, with 1.58-bit weights and 8-bit activations. Compared to the\noriginal BitNet, it introduces some modifications that we summarize below.\nQuantization Function. To constrain the weights to -1, 0, or +1, we adopt an absmean quantization\nfunction. It first scales the weight matrix by its average absolute value, and then round each value to\nthe nearest integer among {-1, 0, +1}:\nfW= RoundClip(W\nγ+ϵ,−1,1), (1)\nRoundClip( x, a, b ) = max( a,min(b,round( x))), (2)\nγ=1\nnmX\nij|Wij|. (3)\nThe quantization function for activations follows the same implementation in BitNet, except that\nwe do not scale the activations before the non-linear functions to the range [0, Qb]. Instead, the\n2')
(3, 3, 'Models Size Memory (GB) ↓ Latency (ms) ↓ PPL↓\nLLaMA LLM 700M 2.08 (1.00x) 1.18 (1.00x) 12.33\nBitNet b1.58 700M 0.80 (2.60x) 0.96 (1.23x) 12.87\nLLaMA LLM 1.3B 3.34 (1.00x) 1.62 (1.00x) 11.25\nBitNet b1.58 1.3B 1.14 (2.93x) 0.97 (1.67x) 11.29\nLLaMA LLM 3B 7.89 (1.00x) 5.07 (1.00x) 10.04\nBitNet b1.58 3B 2.22 (3.55x) 1.87 (2.71x) 9.91\nBitNet b1.58 3.9B 2.38 (3.32x) 2.11 (2.40x) 9.62\nTable 1: Perplexity as well as the cost of BitNet b1.58 and LLaMA LLM.\nModels Size ARCe ARCc HS BQ OQ PQ WGe Avg.\nLLaMA LLM 700M 54.7 23.0 37.0 60.0 20.2 68.9 54.8 45.5\nBitNet b1.58 700M 51.8 21.4 35.1 58.2 20.0 68.1 55.2 44.3\nLLaMA LLM 1.3B 56.9 23.5 38.5 59.1 21.6 70.0 53.9 46.2\nBitNet b1.58 1.3B 54.9 24.2 37.7 56.7 19.6 68.8 55.8 45.4\nLLaMA LLM 3B 62.1 25.6 43.3 61.8 24.6 72.1 58.2 49.7\nBitNet b1.58 3B 61.4 28.3 42.9 61.5 26.6 71.5 59.3 50.2\nBitNet b1.58 3.9B 64.2 28.7 44.2 63.5 24.2 73.2 60.5 51.2\nTable 2: Zero-shot accuracy of BitNet b1.58 and LLaMA LLM on the end tasks.\nactivations are all scaled to [−Qb, Qb]per token to get rid of the zero-point quantization. This is\nmore convenient and simple for both implementation and system-level optimization, while introduces\nnegligible effects to the performance in our experiments.\nLLaMA-alike Components. The architecture of LLaMA [ TLI+23,TMS+23] has been the de-\nfacto backbone for open-source LLMs. To embrace the open-source community, our design\nofBitNet b1.58 adopts the LLaMA-alike components. Specifically, it uses RMSNorm [ ZS19 ],\nSwiGLU [ Sha20 ], rotary embedding [ SAL+24], and removes all biases. In this way, BitNet b1.58\ncan be integrated into the popular open-source software (e.g., Huggingface, vLLM [ KLZ+23], and\nllama.cpp2) with minimal efforts.\n3 Results\nWe compared BitNet b1.58 to our reproduced FP16 LLaMA LLM in various sizes. To ensure a fair\ncomparison, we pre-trained the models on the RedPajama dataset [ Com23 ] for 100 billion tokens.\nWe evaluated the zero-shot performance on a range of language tasks, including ARC-Easy [ YBS19 ],\nARC-Challenge [ YBS19 ], Hellaswag [ ZHB+19], Winogrande [ SBBC20 ], PIQA [ BZB+19], Open-\nbookQA [ MCKS18 ], and BoolQ [ CLC+19]. We also reported the validation perplexity on the\nWikiText2 [MXBS16] and C4 [RSR+19] datasets.\nWe compared the runtime GPU memory and latency of both LLaMA LLM andBitNet b1.58 . The\nresults were measured using the FasterTransformer3codebase, which is well-optimized for LLM\ninference latency on GPU devices. The 2-bit kernel from Ladder [ WMC+23] is also integrated for\nBitNet b1.58. We reported the time per output token, as it is the major cost for inference.\nTable 1 summarizes the perplexity and the cost for BitNet b1.58 andLLaMA LLM . It shows that\nBitNet b1.58 starts to match full precision LLaMA LLM at 3B model size in terms of perplexity,\nwhile being 2.71 times faster and using 3.55 times less GPU memory. In particular, BitNet b1.58 with\na 3.9B model size is 2.4 times faster, consumes 3.32 times less memory, but performs significantly\nbetter than LLaMA LLM 3B.\n2https://github.com/ggerganov/llama.cpp\n3https://github.com/NVIDIA/FasterTransformer\n3')

assistant:

全くわからないわね…ただ、このテーブルのデータを見ると、それぞれのレコードは一つの文書に対するエンベッディングのように見えるわ。そして、id列はおそらく各文書を一意に識別するためのものだと思われる。そして、それぞれのidに対応してc0とc1の列には巨大なテキストデータが格納されているわ。

しかし、正確に何を表しているのかについては、申し訳ないけどこれ以上言うことはできないわ。このテーブルがどのように使われるのか、どのような目的で作成されたのかによるから。それについて何か情報があれば、もっと深く探ることができるかもしれない。それでも、c0とc1の列は非常に大きなテキストデータを含んでいるため、これらも何らかの情報をエンコードしている可能性があるわ。これ以上の解析にはそのテキストデータをより詳しく調べる必要があるわね。

user:

そうですね。助言ありがとうございます。そういうわけで、Chroma で英論文データベースを作成するのは、「追加」が難しいということで、あまり実用に向かない気がしてきました。今後、Faiss、Annoy等を使って、もっといろいろ調査してみたいと思います。その時はまたお付き合いくださいね！

assistant:

もちろんよ、いつでも助けるためにここにいるからね！あなたの調査が次にどう進展するか、非常に興味津々よ。その各ツールキットがあなたのニーズにどの程度適合するか、またそれぞれのツールキットの長所と短所が何であるか、詳細を知りたいわ。調査に必要な情報があればいつでも教えて。それでは、次の冒険に幸運を！

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

LangChain で 英論文データベースを作る : Chroma 編

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

user:

assistant:

LangChain で英論文データベースを作る : Chroma 編