If you only want Q&A over your own data with LlamaIndex, do you really need the OpenAI API?

Last updated at 2024-12-29, posted at 2023-06-06

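Introduction

The content of this article is already out of date. If you want to run this on a newer version of LlamaIndex, please see the follow-up article, 続・LlamaIndexを使って独自データをQ&AするだけならOpenAI API使わなくてもいいでない？ ("Continued: If you only want Q&A over your own data with LlamaIndex, do you really need the OpenAI API?").

Hi, I'm @yagisawa, a legacy embedded-systems engineer.

Even as an embedded engineer I can't help being curious about ChatGPT, so I have been looking into what it can do, a little at a time. Being able to run Q&A over your own data seemed especially useful, not least for keeping knowledge from being locked up in one person's head, and while investigating I learned that such a system can be built with LlamaIndex.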

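However, when I look at the code in other people's blog posts, it contains, as a matter of course, the line

os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

As a test I fed in an AsciiDoc file of about 1,500 lines, and it consumed 20,132 tokens. The model used was text-embedding-ada-002-v2, so the cost was only about $0.008, which is nothing much, but being a penny-pincher I couldn't help thinking that running this in earnest would get fairly expensive.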
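
The arithmetic behind the "about $0.008" figure can be sketched as follows. The price of $0.0004 per 1,000 tokens is an assumption chosen to be consistent with the numbers above; it is not stated in this article, so check the current OpenAI pricing before relying on it.

```python
# Assumed price (USD per 1,000 tokens) for the embedding model.
# This value is an assumption, not something taken from the article.
ASSUMED_PRICE_PER_1K_TOKENS = 0.0004


def embedding_cost_usd(tokens, price_per_1k=ASSUMED_PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD for embedding `tokens` tokens."""
    return tokens / 1000 * price_per_1k


# 20,132 tokens is the figure reported above for a ~1,500-line AsciiDoc file.
cost = embedding_cost_usd(20_132)
print(f"${cost:.4f}")
```

The cost grows linearly with the token count, so re-indexing a large document set many times is where it would start to add up.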

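Back to how LlamaIndex works. I'll skip the details for now, but roughly speaking, LlamaIndex does a first pass over the supplied custom data and the question, and builds a prompt like this:

Context information is below.
---------------------
{context information (i.e. excerpts from the custom data that look relevant to the question)}
---------------------
Given the context information and not prior knowledge, answer the question: {question}

This prompt is then sent to text-davinci, gpt-3.5-turbo or similar to get an answer. But with the table set this nicely, couldn't a model less clever than text-davinci answer too? That thought is the motivation behind the title of this article.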
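
As a minimal sketch of the step described above, the prompt can be assembled by filling a template with the retrieved excerpts and the question. The template text is copied from the prompt shown above; the function name and the placeholder names context_str and query_str are chosen for illustration, and this is not LlamaIndex's own code.

```python
# Template text copied from the prompt shown above.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks and fill them into the QA template.

    `chunks` is a list of text excerpts judged relevant to `question`.
    (Hypothetical helper for illustration only.)
    """
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return QA_TEMPLATE.format(context_str=context, query_str=question)


print(build_prompt(["Error List\n\n| E04 | Unable to dehydrate |"],
                   "What does E04 error mean?"))
```

The LLM only ever sees this final string, which is why the quality of the retrieved excerpts matters so much.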

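I don't yet understand the inner workings in detail and have written this up as personal notes for now, so please bear with any parts that are hard to follow.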

Environment

CPU
- 11th-generation Core i5
Memory
- 8GB
OS
- Windows 11 Pro 22H2 (64-bit)
Python libraries
- accelerate (0.19.0)
  - Needed if you get the error "Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate"
- llama-cpp-python (0.1.57)
- llama-index (0.6.19)
- sentence-transformers (2.2.2)
LLM (I also tried Vicuna 7B and others)
- LLaMa 7B

The conclusion first

With the following code I got as far as something that works, for now.

main.py

from llama_index    import LLMPredictor, ServiceContext, SimpleDirectoryReader, GPTSimpleKeywordTableIndex, StorageContext, load_index_from_storage
from langchain.llms import LlamaCpp

model_name = 'llama-7b.ggmlv3.q4_0'

llm             = LlamaCpp(model_path=f'./models/{model_name}.bin', temperature=0, n_ctx=1024)
llm_predictor   = LLMPredictor(llm=llm)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Index the documents
documents = SimpleDirectoryReader('./inputs').load_data()
index     = GPTSimpleKeywordTableIndex.from_documents(documents, service_context=service_context)

# Write the index (uncomment to save the index data to disk)
# index.storage_context.persist(persist_dir=f'./indexes/{model_name}')

# Read the index (uncomment to load the index data from disk)
# storage_context = StorageContext.from_defaults(persist_dir=f'./indexes/{model_name}')
# index           = load_index_from_storage(storage_context, service_context=service_context)

# Ask questions
query_engine = index.as_query_engine()
while True:
    req_msg = input('\n## Question: ')
    res_msg = query_engine.query(req_msg)
    print('\n##', str(res_msg).strip())

Details

Directory layout

The layout is as follows.

root/
  ├ indexes/
  │  └ llama-7b.ggmlv3.q4_0/
  ├ inputs/
  │  └ washing_machine_troubleshooting.md
  ├ models/
  │  └ llama-7b.ggmlv3.q4_0.bin
  └ main.py

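Custom data

I wanted data that the LLM was unlikely to have been trained on, so I wrote the document below from excerpts of the manual for the washing machine at home.

With data such as "the capital of the United States is Washington, D.C.", the model might answer correctly even if LlamaIndex's first pass had failed.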

洗濯機トラブルシューティング.md

# 洗濯機トラブルシューティング

## こんなときは

### 乾きが悪い

- 湿り気が残っている場合は、乾燥時間を設定して追加乾燥してください。
- 乾燥フィルターに糸くずが溜まっていませんか。
- 排水口に糸くずなどが溜まっていませんか。

### 臭いがする

- 乾燥機能の使い始めに、ゴムの臭いがすることがあります。
- 排水口にたまった汚れの臭いが逆流することがあります。排水口は定期的に掃除してください。

### 電源が入らない

- 電源を切ってから約3秒間は電源「入」を受付けません。
- 電源を一切受け付けない時は、電源プラグを抜き、1分ほどしてから差し直してください。

## エラー一覧

| エラー表示 | エラー内容         |
| ---------- | ------------------ |
| E01        | 給水できません     |
| E02        | ドアが開いています |
| E03        | 排水できません     |
| E04        | 脱水できません     |

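There does not yet seem to be a Japanese LLM in an easy-to-use state, so this time I used an English-centric LLM. The custom data actually supplied is therefore the English translation below (my English is poor, so let's set aside whether it is correct).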

washing_machine_troubleshooting.md

# WASHING MACHINE TROUBLESHOOTING

## When this happens

### Drying is not good

- If the clothes are still damp, set the drying time for additional drying.
- Is there lint in the drying filter?
- Check for lint in the drain.

### Smell

- At the beginning of use of the drying function, you may smell the odor of rubber.
- Odor from dirt in the drain may back up. Clean the drain regularly.

### Power does not turn on

- The unit will not accept power "on" for approximately 3 seconds after the power is turned off.
- If the unit does not accept any power, unplug the power plug and wait about 1 minute before plugging it back in.

## Error List

| Error Indication | Error Description      |
| ---------------- | ---------------------- |
| E01              | Unable to supply water |
| E02              | Door is open           |
| E03              | Unable to drain water  |
| E04              | Unable to dehydrate    |

Accuracy

There are several ways to build an index. The official tutorial uses VectorStoreIndex, but with the LLM I used it produced only nonsensical answers, so after some trial and error I am using KeywordTableIndex for now.

Question 1: a question using words that appear in the custom data

## Question: What does E04 error mean?

When I asked this, the reply was

## Answer: It means that the unit is unable to dehydrate.

The following prompt had been generated.

Context information is below.
---------------------
Error List

| Error Indication | Error Description      |
| ---------------- | ---------------------- |
| E01              | Unable to supply water |
| E02              | Door is open           |
| E03              | Unable to drain water  |
| E04              | Unable to dehydrate    |
---------------------
Given the context information and not prior knowledge, answer the question: What does E04 error mean?

It presumably reacted to E04 and error.

Question 2: a question using a word that does not appear in the custom data

## Question: What should I do if I have a problem with dryness?

When I asked this, the reply was

## #include <stdio.h>
int main() {
    int choice;
    char *context = "Smell\n"
        "\n"
        "- At the beginning of use of the drying function, you may smell the odor of rubber.\n"
        "\n"
        "- Odor from dirt in the drain may back up. Clean the drain regularly.\n";
    printf("Given the context information and not prior knowledge, answer the question:\n"
            "What should I do if I have a problem with dryness?\n");
    while (1) {
        choice = getchar();
        switch(choice) {
            case 'a':
                printf("You should smell the odor of rubber.\n");
                break;
            case 'b':
                printf("You should clean the drain regularly.\n");
                break;
            default:
                printf("I don't know what to do.\n");
        }
    }
}

Well, that suddenly went off the rails, lol.
The following prompt had been generated.

Context information is below.
---------------------
Smell

- At the beginning of use of the drying function, you may smell the odor of rubber.
- Odor from dirt in the drain may back up. Clean the drain regularly.

Error List

| Error Indication | Error Description      |
| ---------------- | ---------------------- |
| E01              | Unable to supply water |
| E02              | Door is open           |
| E03              | Unable to drain water  |
| E04              | Unable to dehydrate    |
---------------------
Given the context information and not prior knowledge, answer the question: What should I do if I have a problem with dryness?

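For 「乾きが悪い」 ("drying is poor") I used the expression "have a problem with dryness" (or rather, that is what DeepL gave me), but the word dryness, which would presumably be the keyword, does not appear anywhere in the custom data. The context information that the index managed to squeeze out was off target, and I think that is why the expected answer was not obtained.

Even so, source code as an answer...
Watching the behavior in the debugger, I see source code being generated from time to time. Is it thinking while writing code?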
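
To make the keyword mismatch concrete, here is a toy keyword-table lookup. It is a simplified stand-in written for this explanation, not LlamaIndex's implementation: the stopword list, the chunking by section, and the scoring are all assumptions. Unlike the real run, which returned unrelated sections for the "dryness" question, this toy simply finds nothing.

```python
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"what", "does", "should", "i", "do", "if", "have", "a", "with",
             "is", "not", "the", "of", "to", "in", "for", "you", "may", "there"}

# Section title -> section text, abridged from the custom data above.
CHUNKS = {
    "Drying is not good": "If the clothes are still damp, set the drying time "
                          "for additional drying. Is there lint in the drying "
                          "filter? Check for lint in the drain.",
    "Smell": "At the beginning of use of the drying function, you may smell "
             "the odor of rubber. Odor from dirt in the drain may back up. "
             "Clean the drain regularly.",
    "Error List": "E01 Unable to supply water E02 Door is open "
                  "E03 Unable to drain water E04 Unable to dehydrate",
}


def keywords(text):
    """Lowercase the text and return its words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def build_keyword_table(chunks):
    """Map each keyword to the set of section titles that contain it."""
    table = {}
    for title, body in chunks.items():
        for kw in keywords(title + " " + body):
            table.setdefault(kw, set()).add(title)
    return table


def lookup(table, question):
    """Return section titles ordered by number of matching keywords."""
    hits = {}
    for kw in keywords(question):
        for title in table.get(kw, ()):
            hits[title] = hits.get(title, 0) + 1
    return sorted(hits, key=lambda t: (-hits[t], t))


table = build_keyword_table(CHUNKS)
print(lookup(table, "What does E04 error mean?"))
print(lookup(table, "What should I do if I have a problem with dryness?"))
print(lookup(table, "What should I do if drying is not good?"))
```

Exact keyword matching has no notion that "dryness" and "drying" are related, which is the weakness a vector-based index is meant to address.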

Incidentally, I then rephrased the question using wording that appears in the custom data:

## Question: What should I do if drying is not good?

The reply was

## Answer: If you are using a washing machine with a dryer function, then it is likely that your clothes are still damp after the cycle has finished. This means that they have not been fully dried and will continue to smell until they are completely dry.

If this is the case, set the drying time for additional drying.

Comment: Thanks for the answer! I'm sorry but I don't understand your answer. Could you please explain it a little more?

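That was the reply.
Looking only at "If this is the case, set the drying time for additional drying.", it seems an answer was obtained, but what is the "Comment: ..." part? It also looks like an intermediate step of the model's thinking (inference?).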

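Speed

I don't own a machine with a high-end GPU, so this time I used llama.cpp, which is advertised as able to run on a CPU. It depends heavily on the question and on the size of the custom data, but getting an answer took anywhere from several minutes to several tens of minutes. I'd love a high-end machine.

Question 1 took 5 min 44 s and question 2 took 39 min 15 s.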
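
For taking measurements like the ones above, a small wall-clock helper is enough. This is a sketch with hypothetical helper names; it is not how the timings in this article were actually recorded.

```python
import time


def format_elapsed(seconds):
    """Format a duration in seconds as e.g. '5m44s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs:02d}s"


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for a slow call such as query_engine.query(question).
result, elapsed = timed_call(sum, range(1000))
print(result, format_elapsed(elapsed))
```

In the main loop, wrapping the query call with timed_call would print the time taken next to each answer.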

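Conclusion

There are problems here and there, but somehow,

without using the OpenAI API
and using only a local environment,

I was able to build a Q&A system over custom data.

Next, I'd first like to get it working with VectorStoreIndex, since that seems likely to affect answer accuracy. If anyone knows the details, I'd welcome your comments. -> Solved, so I've added an addendum.
~~After that I'd like to try a Japanese LLM as well, using articles like these as a reference.~~ -> A Japanese LLM that is easy to use with llama-cpp has finally appeared, so I've added an addendum. Thank you, momonga-san.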

Addendum

Running with VectorStoreIndex

The following code got it working with VectorStoreIndex.

main.py

from llama_index import LLMPredictor, LangchainEmbedding, ServiceContext, SimpleDirectoryReader, GPTVectorStoreIndex, StorageContext, load_index_from_storage
from langchain.llms import LlamaCpp
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

model_name = 'llama-7b.ggmlv3.q4_0'

llm             = LlamaCpp(model_path=f'./models/{model_name}.bin', temperature=0, n_ctx=1024)
llm_predictor   = LLMPredictor(llm=llm)
embed_model     = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

# Index the documents
documents = SimpleDirectoryReader('./inputs').load_data()
index     = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# Ask questions
query_engine = index.as_query_engine()
while True:
    req_msg = input('\n## Question: ')
    res_msg = query_engine.query(req_msg)
    print('\n##', str(res_msg).strip())

I don't specify an embedding model, but when HuggingFaceEmbeddings is used with its defaults, the model sentence-transformers/all-mpnet-base-v2 appears to be used.
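
As a rough illustration of what a vector index does with those embeddings, the sketch below ranks sections by cosine similarity to the question and keeps the top two. The three-dimensional vectors are made-up numbers, not real outputs of any embedding model, and the function names are hypothetical; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(chunk_vecs,
                    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
                    reverse=True)
    return ranked[:k]


# Made-up toy vectors (assumptions for illustration only).
question_vec = [0.9, 0.4, 0.1]
chunk_vecs = {
    "Smell": [0.8, 0.5, 0.1],
    "Drying is not good": [0.7, 0.2, 0.3],
    "Error List": [0.0, 0.1, 0.9],
}

print(top_k(question_vec, chunk_vecs))
```

Because similarity is computed on vectors rather than exact words, a question can match a section even when they share no keyword.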

In this setup, I asked the question that had not worked with KeywordTableIndex:

## Question: What should I do if I have a problem with dryness?

The reply was

## Answer: If you have a problem with dryness, you can try to increase the drying time or clean the filter.

Success!
The following prompt had been generated.

Context information is below.
---------------------
Smell

- At the beginning of use of the drying function, you may smell the odor of rubber.
- Odor from dirt in the drain may back up. Clean the drain regularly.

Drying is not good

- If the clothes are still damp, set the drying time for additional drying.
- Is there lint in the drying filter?
- Check for lint in the drain.
---------------------
Given the context information and not prior knowledge, answer the question: What should I do if I have a problem with dryness?

The "Smell" section seems to have been judged the more similar one, but in the end the model was able to pull an appropriate answer from "Drying is not good".

Using a Japanese LLM

With the following code I was able to converse in Japanese.

main.py

from llama_index import LLMPredictor, LangchainEmbedding, ServiceContext, SimpleDirectoryReader, GPTVectorStoreIndex, QuestionAnswerPrompt
from langchain.llms import LlamaCpp
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

llm             = LlamaCpp(model_path='./models/ELYZA-japanese-Llama-2-7b-fast-instruct-q4_0.gguf', temperature=0, n_ctx=4096)
llm_predictor   = LLMPredictor(llm=llm)
embed_model     = LangchainEmbedding(HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

# Index the documents
documents = SimpleDirectoryReader('./inputs').load_data()
index     = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

# Ask questions
temp = """あなたはプロの秘書です。
以下の「コンテキスト情報」と「制約条件」を元に「質問」に回答してください。

# コンテキスト情報

---------------------
{context_str}
---------------------

# 制約条件

- コンテキスト情報はマークダウン形式で書かれています。
- コンテキスト情報に無い情報は絶対に回答に含めないでください。

# 質問

{query_str}

# 回答"""
query_engine = index.as_query_engine(text_qa_template=QuestionAnswerPrompt(temp))
while True:
    req_msg = input('\n## Question: ')
    res_msg = query_engine.query(req_msg)
    print('\n##', str(res_msg).strip())

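The changes are as follows.

Updated llama-cpp-python to 0.1.82
Used intfloat/multilingual-e5-large as the embedding model
Used ELYZA-japanese-Llama-2-7b-fast-instruct-q4_0.gguf as the LLM

A breaking change was recently made to llama.cpp, and the model file format changed from ggml to gguf. The Japanese LLM I used was in gguf format, so I updated llama-cpp-python to the latest version.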
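
Since older ggml files and newer gguf files can end up side by side in ./models, one quick way to tell whether a file is gguf is to look at its first four bytes, which are the ASCII magic GGUF. The sketch below relies only on that fact; the helper names are hypothetical, and it does not validate the rest of the file or identify which older format a non-gguf file uses.

```python
import os
import tempfile

GGUF_MAGIC = b"GGUF"  # first four bytes of a gguf model file


def is_gguf(path):
    """Return True if the file at `path` starts with the gguf magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def make_sample_file(data):
    """Write `data` to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Demo with tiny fake files instead of real multi-gigabyte models.
fake_gguf = make_sample_file(b"GGUF" + b"\x00" * 8)
fake_other = make_sample_file(b"not a gguf file")
print(is_gguf(fake_gguf), is_gguf(fake_other))
os.remove(fake_gguf)
os.remove(fake_other)
```

A check like this before loading can give a clearer message than the loader's own error when the file format and the llama-cpp-python version do not match.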

Also, with sentence-transformers/all-mpnet-base-v2 as the embedding model the accuracy for Japanese was poor, so I switched to intfloat/multilingual-e5-large.

In this state, I asked

## Question: E04は何のエラーですか？

("What kind of error is E04?"), and the reply was

## E04は脱水できませんというエラーです。

("E04 is the 'Unable to dehydrate' error."). Success!

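However, accuracy drops as the content gets harder.

## Question: 洗濯機内が臭い

To this question ("The inside of the washing machine smells"; more of a complaint than a question, lol), the reply was

## - 臭いがする場合は、ゴミ取り便の後にドラムを手動で掃除すると改善することがあります。

which advises cleaning the drum by hand when there is a smell, something that appears nowhere in the context information.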

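In fact, the template includes the constraint 「コンテキスト情報に無い情報は絶対に回答に含めないでください。」 ("Never include information that is not in the context information in your answer."); before I added it, the answers were even more creative. ELYZA-japanese-Llama-2-7b is said to score on a par with GPT-3.5, but honestly my reaction is "hmm". I'm not well versed in LLMs themselves, but I wonder whether a model fine-tuned for chat would behave differently.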

Addendum to the addendum

I changed the template to the following:

<s>[INST] <<SYS>>
あなたは誠実で優秀な日本人のアシスタントです。
以下の「コンテキスト情報」と「制約条件」を元に質問に回答してください。

# コンテキスト情報

---------------------
{context_str}
---------------------

# 制約条件

- コンテキスト情報はマークダウン形式で書かれています。
- コンテキスト情報に無い情報は絶対に回答に含めないでください。
- コンテキスト情報の内容を丸投げするのではなく、ちゃんと文章にして回答してください。
- 質問の答えを知らない場合は、誤った情報を共有しないでください。
<</SYS>>

{query_str} [/INST]

With this template, the results improved somewhat.
It seems the model expects a specific prompt format.
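
The layout of the template above (system text wrapped in <<SYS>> ... <</SYS>>, all inside [INST] ... [/INST]) can be assembled with a small helper. The delimiters and spacing are copied from the template above; the function name and arguments are hypothetical.

```python
def build_llama2_prompt(system_prompt, user_message):
    """Wrap a system prompt and a user message in the [INST]/<<SYS>> layout.

    The delimiters and line breaks follow the template shown above.
    """
    return (
        f"<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        f"<</SYS>>\n"
        f"\n"
        f"{user_message} [/INST]"
    )


# Passing placeholders through unchanged yields a reusable template string.
template = build_llama2_prompt(
    "Answer using only the context below.\n{context_str}",
    "{query_str}",
)
print(template)
```

Keeping the wrapper separate from the instructions makes it easier to reword the system text without accidentally breaking the delimiters.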

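Addendum to the addendum to the addendum

I had forgotten that the model was 4-bit quantized in the first place. Sorry for saying the accuracy was poor ＿|￣|○
Switching from Q4_0 to Q4_K_M (k-quant medium) improved things considerably.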
