Demo: embedding locally and searching an index with a local LLM

Last updated at 2025-01-11 / Posted at 2025-01-11

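Note

Pain points
I worked through this while asking ChatGPT, but the API specs change so often that finding the right information is hard.
ChatGPT also tends to write fairly dumb implementations, so you are better off reading the official Llama samples.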

Purpose

I wanted an embedding and retrieval LLM environment that runs completely offline, without using the OpenAI API. In short, I'm too broke to pay per token.

Result

It works without the OpenAI API.
It still works with the network disconnected.
llama_index and LlamaCpp are godsends.
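
One way to make the "network disconnected" case explicit is to tell the Hugging Face libraries not to attempt any network access at all. The sketch below is not part of the script in this article; it assumes the embedding model has already been downloaded into the local cache once, and it uses a hypothetical helper name, `force_hf_offline`. It sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables, which have to be in place before the Hugging Face modules are imported.

```python
import os


def force_hf_offline() -> dict:
    """Ask huggingface_hub / transformers to rely on the local cache only.

    Hypothetical helper for illustration; the original script does not call it.
    """
    flags = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}
    os.environ.update(flags)
    return flags


# Call this at the very top of the script, before importing llama_index
# or anything else that pulls in huggingface_hub.
force_hf_offline()
print(os.environ["HF_HUB_OFFLINE"])
```

With these flags set, a missing model should fail fast with an error instead of waiting on a network request.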

Overview

Component	Module
Embedding	"intfloat/multilingual-e5-large" used via HuggingFaceEmbedding
LLM	"elyza/Llama-3-ELYZA-JP-8B-q4_k_m.gguf" used via LlamaCpp
Vector search and similarity search	Faiss

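Processing overview

Source data: text clipped from a news article
The source is a Yahoo! News article
https://news.yahoo.co.jp/expert/articles/3966f26897ac443366150559f3ecf9a88dd5375c
Embed the text with "intfloat/multilingual-e5-large", build a Faiss index, and save it to disk
Set "elyza/Llama-3-ELYZA-JP-8B-q4_k_m.gguf" as the LLM model
Ask a question about the article in Japanese and get an answer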
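
The retrieval step in the flow above boils down to a nearest-neighbour search over embedding vectors. The following is a minimal pure-Python sketch of what an exact L2 search with top-k selection does conceptually. It is not how Faiss is implemented, and the 3-dimensional vectors are made-up toy values (real multilingual-e5-large vectors have 1024 dimensions).

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors closest to the query, nearest first."""
    order = sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))
    return order[:k]


# Toy "embeddings" for three chunks (hypothetical values for illustration only)
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [0.8, 0.0, 0.0]

nearest = top_k(query_vector, chunk_vectors, k=2)
print(nearest)  # indices of the two closest chunks
```

In the real script, the chunks whose vectors come out nearest to the question's vector are the ones handed to the LLM as context.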

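Environment

A Fujitsu UH75/B3 bought on Yahoo! Auctions for 13,000 yen, plus 16 GB of memory
CPU: Intel Core i5-8250U
Memory: DDR4 20.0 GB
GPU: effectively none
Python 3.11.9

The UH75/B3 has a great keyboard! Seriously a bargain!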

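Pages I referred to

Code

※ The script saves the index to a file and loads it back, but that is only a reference for when you split the steps into separate files; you don't have to do it.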

OfflineEmbeddingAndLlm.py

# Demo: build an index fully offline, without the OpenAI API, and have an LLM answer questions

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Document
from llama_index.core.settings import Settings
from llama_index.core import get_response_synthesizer
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.query_engine import RetrieverQueryEngine
# Use a FAISS index
# pip install faiss-cpu
import faiss
# pip install llama-index-vector-stores-faiss
from llama_index.vector_stores.faiss import FaissVectorStore    # vector store
from llama_index.core.retrievers import VectorIndexRetriever    # retriever
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
# Run a local gguf model as the LLM through LlamaCpp
# pip install llama-index-llms-langchain langchain-community llama-cpp-python
from langchain_community.llms.llamacpp import LlamaCpp

# Point the LLM and the embedding model at local models
print("Settings...")
# https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF/blob/main/Llama-3-ELYZA-JP-8B-q4_k_m.gguf
# The file name matches the download above (this matters on case-sensitive file systems)
llm = LlamaCpp(model_path='./models/Llama-3-ELYZA-JP-8B-q4_k_m.gguf', temperature=0, n_ctx=1024)
Settings.llm         = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-large")

########## Sample documents (news article excerpts) ##########
texts = [
    "1月7日（米国時間）、メタが第三者機関と連携したファクトチェックのプログラムを終了し、『コミュニティノート』を導入すると発表したことが話題になっています。",
    "ファクトチェックとは、SNSなどで広まっている真偽不明の情報を検証し、発表することを指しています。",
    "メタはファクトチェックのプログラムを終了し、代わりにコミュニティノートの仕組みを導入しました。",
]
documents = [Document(text=t) for t in texts]

########## Build the index ##########
# Initialize the FAISS index
print("Settings faiss.")
dimension = 1024  # embedding dimension of "intfloat/multilingual-e5-large"
faiss_index = faiss.IndexFlatL2(dimension)  # initialized with L2 (Euclidean) distance

# Create the FAISS store
print("FaissVectorStore.")
faiss_store = FaissVectorStore(faiss_index=faiss_index)

# Build the index from the documents.
# The vector store has to be passed through a StorageContext; a bare
# vector_store= keyword is not picked up by from_documents, so the default
# in-memory store would be used instead of FAISS.
print("VectorStoreIndex.from_documents.")
storage_context = StorageContext.from_defaults(vector_store=faiss_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    show_progress=True,
)

# Save the FAISS index and its related data together
# (only a reference for when you split indexing and querying into separate files)
print("save faiss data.")
persist_dir = "./faiss_storage"
index.storage_context.persist(persist_dir=persist_dir)

########## Index search starts here ##########
# Load the saved data (only a reference for when you split the files)
print("load faiss data.")
loaded_store = FaissVectorStore.from_persist_dir(persist_dir)
loaded_context = StorageContext.from_defaults(
    vector_store=loaded_store,
    persist_dir=persist_dir,
)
loaded_index = load_index_from_storage(storage_context=loaded_context)

# Configure the retriever
print("VectorIndexRetriever.")
retriever = VectorIndexRetriever(
    index=loaded_index,
    similarity_top_k=2,
)

# Configure the response synthesizer
print("get_response_synthesizer.")
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

# Assemble the query engine
print("RetrieverQueryEngine.")
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# Run the question
question = "メタがファクトチェックの代わりに導入した仕組みは何ですか？"
print(f"Q:{question}")
response = query_engine.query(question)
print(f"A:{response}")

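Load

Embedding uses surprisingly little CPU.
CPU load while the LLM is running is heavy. It is tough without a GPU or NPU.
For LLM use, an 8th-gen Core i5 is already a fossil. It reminds me of the days of running XP on a Pentium III.
16 GB of memory might be enough.
The disk access is from writing the index data to disk as a test.
See, no Wi-Fi in use!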
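
To put rough numbers on observations like these instead of eyeballing a system monitor, each stage can be wrapped in a small timer. This is a sketch using only the standard library; `stage_timer` and the stage name are hypothetical, and the summing loop is just a stand-in for the embedding step or `query_engine.query(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(name, results):
    """Record wall-clock seconds and process CPU seconds spent inside the with-block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        results[name] = {
            "wall_s": time.perf_counter() - wall_start,
            "cpu_s": time.process_time() - cpu_start,
        }


results = {}
with stage_timer("dummy_stage", results):
    # Stand-in workload; replace with the indexing or query call to measure it
    total = sum(i * i for i in range(100_000))

print(results)
```

Because `time.process_time()` counts CPU time across all threads of the process, a `cpu_s` value well above `wall_s` suggests the stage is keeping several cores busy.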

pip freeze memo (list of modules that worked)

pip freeze

accelerate==1.2.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
asgiref==3.8.1
attrs==24.3.0
backoff==2.2.1
bcrypt==4.2.1
beautifulsoup4==4.12.3
build==1.2.2.post1
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.1
chroma-hnswlib==0.7.6
chromadb==0.6.2
click==8.1.8
cmake==3.31.2
colorama==0.4.6
coloredlogs==15.0.1
dataclasses-json==0.6.7
decorator==5.1.1
Deprecated==1.2.15
dirtyjson==1.0.8
diskcache==5.6.3
distro==1.9.0
durationpy==0.9
faiss-cpu==1.9.0.post1
fastapi==0.115.6
ffmpeg-python==0.2.0
filelock==3.16.1
filetype==1.2.0
flatbuffers==24.12.23
frozenlist==1.5.0
fsspec==2024.12.0
future==1.0.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
greenlet==3.1.1
grpcio==1.69.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
httpx-sse==0.4.0
huggingface-hub==0.27.1
humanfriendly==10.0
idna==3.10
imageio==2.36.1
imageio-ffmpeg==0.5.1
importlib_metadata==8.5.0
importlib_resources==6.5.2
InstructorEmbedding==1.0.1
Jinja2==3.1.5
jiter==0.8.2
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==31.0.0
langchain==0.3.14
langchain-community==0.3.14
langchain-core==0.3.29
langchain-text-splitters==0.3.5
langsmith==0.2.10
llama-cloud==0.1.8
llama-index==0.12.10
llama-index-agent-openai==0.4.1
llama-index-cli==0.4.0
llama-index-core==0.12.10.post1
llama-index-embeddings-huggingface==0.5.0
llama-index-embeddings-instructor==0.3.0
llama-index-embeddings-langchain==0.3.0
llama-index-embeddings-openai==0.3.1
llama-index-indices-managed-llama-cloud==0.6.3
llama-index-llms-huggingface==0.4.2
llama-index-llms-langchain==0.5.0
llama-index-llms-ollama==0.5.0
llama-index-llms-openai==0.3.13
llama-index-multi-modal-llms-openai==0.4.2
llama-index-program-openai==0.3.1
llama-index-question-gen-openai==0.3.0
llama-index-readers-file==0.4.2
llama-index-readers-llama-parse==0.4.0
llama-index-vector-stores-faiss==0.3.0
llama-parse==0.5.19
llama_cpp_python==0.3.6
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.25.0
mdurl==0.1.2
mmh3==5.0.1
monotonic==1.6
moviepy==2.1.1
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
oauthlib==3.2.2
ollama==0.4.5
onnxruntime==1.20.1
openai==1.59.6
opentelemetry-api==1.29.0
opentelemetry-exporter-otlp-proto-common==1.29.0
opentelemetry-exporter-otlp-proto-grpc==1.29.0
opentelemetry-instrumentation==0.50b0
opentelemetry-instrumentation-asgi==0.50b0
opentelemetry-instrumentation-fastapi==0.50b0
opentelemetry-proto==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-semantic-conventions==0.50b0
opentelemetry-util-http==0.50b0
orjson==3.10.14
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pathspec==0.12.1
pillow==10.4.0
posthog==3.7.5
proglog==0.1.10
propcache==0.2.1
protobuf==5.29.3
psutil==6.1.1
pyasn1==0.6.1
pyasn1_modules==0.4.1
pydantic==2.10.5
pydantic-settings==2.7.1
pydantic_core==2.27.2
Pygments==2.19.1
pypdf==5.1.0
PyPika==0.48.9
pyproject_hooks==1.2.0
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.9.4
rsa==4.9
safetensors==0.5.2
scikit-learn==1.6.0
scikit_build_core==0.10.7
scipy==1.15.0
sentence-transformers==2.7.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.36
starlette==0.41.3
striprtf==0.0.26
sympy==1.13.1
tenacity==8.5.0
text-generation==0.7.0
threadpoolctl==3.5.0
tiktoken==0.8.0
tokenizers==0.21.0
torch==2.5.1
tqdm==4.67.1
transformers==4.47.1
typer==0.15.1
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==1.26.20
uvicorn==0.34.0
watchfiles==1.0.3
websocket-client==1.8.0
websockets==14.1
wrapt==1.17.0
yarl==1.18.3
zipp==3.21.0

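For those who want to swap out the sample text

A sample tool that splits the text in example.txt to fit the chunk size and writes it out as an array.
Either paste the resulting text into the script or read the text file directly, whichever you prefer.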

TextChunkSplit.py

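import json
import os
import textwrap

# Get the directory of the running script
current_dir = os.path.dirname(os.path.abspath(__file__))

# Build the path to example.txt
file_path = os.path.join(current_dir, 'example.txt')

# Read the file
with open(file_path, 'r', encoding='utf-8') as file:
    text = file.read()

# Wrap at around 1024 characters. textwrap turns newlines into spaces and breaks
# at whitespace (and after hyphens), so with break_long_words=False a run longer
# than 1024 characters with no such break point stays in a single chunk.
wrapped_text = textwrap.fill(text, width=1024, break_long_words=False)

# Build the list of chunks
texts = wrapped_text.split('\n')

# Write the result to split_chunk.txt
output_path = os.path.join(current_dir, 'split_chunk.txt')
with open(output_path, 'w', encoding='utf-8') as output_file:
    output_file.write('texts = [\n')
    for chunk in texts:
        # json.dumps escapes quotes and backslashes so each entry is a valid string literal
        outtext = f'    {json.dumps(chunk, ensure_ascii=False)},\n'
        output_file.write(outtext)  # one chunk per line
    output_file.write(']')


print(f"The split text was saved to {output_path}.")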
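
For text with few spaces, such as Japanese, splitting on whitespace can leave very long chunks. As an alternative, here is a minimal sketch of a fixed-size splitter that cuts by character count and keeps a small overlap between neighbouring chunks. The function name `split_with_overlap` and its defaults are hypothetical, and it counts characters, whereas `SentenceSplitter` in the main script works in tokens, so the two are not interchangeable.

```python
def split_with_overlap(text, chunk_size=512, overlap=20):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(split_with_overlap(sample, chunk_size=4, overlap=1))
```

The resulting list can be assigned to `texts` in the main script in the same way as the pasted array.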
