
Databricks Japan Inc.

Running knowledge-graph-based RAG with NebulaGraph on Databricks

Last updated at 2024-01-24. Posted at 2024-01-22.

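While reading this article, I learned that there is an approach to RAG that uses knowledge graphs.

I also learned about a product called NebulaGraph, which is a graph database.

Here, I run NebulaGraph on a Databricks cluster, access it from a notebook, and get as far as running RAG backed by a knowledge graph.

Installing NebulaGraph

There is a QuickStart, but it assumes Docker, so I skip it.

Instead, I install from a tar file onto a single-node machine.

Start a Databricks cluster and open the web terminal.

The cluster driver runs Ubuntu Linux, so the installation proceeds the same way as on any ordinary Linux machine.

Download the tar file and extract it.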

mkdir /databricks/driver/nebula
wget https://oss-cdn.nebula-graph.com.cn/package/3.6.0/nebula-graph-3.6.0.el7.x86_64.tar.gz
tar -xvzf nebula-graph-3.6.0.el7.x86_64.tar.gz -C /databricks/driver/nebula

Prepare the configuration files. I use the defaults.

cd /databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/etc/
cp nebula-metad.conf.default nebula-metad.conf
cp nebula-graphd.conf.default nebula-graphd.conf
cp nebula-storaged.conf.default nebula-storaged.conf

Start the services.

/databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/scripts/nebula.service start all

[INFO] Starting nebula-metad...
[INFO] Done
[INFO] Starting nebula-graphd...
[INFO] Done
[INFO] Starting nebula-storaged...
[INFO] Done
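
The "[INFO] Done" lines only report that the start script ran. One way to confirm that the three daemons are actually accepting connections is to probe their ports. The following is a minimal sketch, not part of the original procedure. The helper names are hypothetical, and the port numbers are assumptions based on the default configuration files: 9669 (graphd) and 9779 (storaged) appear later in this article, while 9559 (metad) does not.

```python
import socket

# Assumed default ports. 9669 and 9779 appear later in this article;
# 9559 for metad is an assumption based on the default configuration.
NEBULA_PORTS = {
    "nebula-metad": 9559,
    "nebula-graphd": 9669,
    "nebula-storaged": 9779,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in NEBULA_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```

Run from a notebook cell on the same driver, this gives a quick check before trying to connect with the console.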

Install the nebula console.

cd /databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/bin
wget https://github.com/vesoft-inc/nebula-console/releases/download/v3.6.0/nebula-console-linux-amd64-v3.6.0

mv nebula-console-linux-amd64-v3.6.0 nebula-console
chmod 111 nebula-console

Start the console.

/databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/bin/nebula-console -port 9669 -u root -p nebula

The prompt appears.

Welcome!

(root@nebula) [(none)]>

Putting the commands so far into an init script makes nebula start automatically whenever the cluster starts. An init script like the following works.

nebula_install.sh

#!/bin/bash
mkdir -p /databricks/driver/nebula
wget https://oss-cdn.nebula-graph.com.cn/package/3.6.0/nebula-graph-3.6.0.el7.x86_64.tar.gz
tar -xvzf nebula-graph-3.6.0.el7.x86_64.tar.gz -C /databricks/driver/nebula

cd /databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/etc/
cp nebula-metad.conf.default nebula-metad.conf
cp nebula-graphd.conf.default nebula-graphd.conf
cp nebula-storaged.conf.default nebula-storaged.conf

/databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/scripts/nebula.service start all

cd /databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/bin
wget https://github.com/vesoft-inc/nebula-console/releases/download/v3.6.0/nebula-console-linux-amd64-v3.6.0

mv nebula-console-linux-amd64-v3.6.0 nebula-console
chmod 111 nebula-console
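
The script above installs and starts the services, but it does not cover the ADD HOSTS step that the walkthrough below turns out to need. Here is a rough sketch of issuing that statement non-interactively, for example from a notebook cell after the cluster is up. It assumes that nebula-console accepts a statement through its `-e` option. The helper names `console_command` and `run_statement` are hypothetical, and this was not part of the original run.

```python
import os
import subprocess

# Path of the console binary as installed earlier in this article.
CONSOLE = "/databricks/driver/nebula/nebula-graph-3.6.0.el7.x86_64/bin/nebula-console"


def console_command(statement: str, port: int = 9669,
                    user: str = "root", password: str = "nebula") -> list:
    """Build the argument list for one non-interactive nebula-console call."""
    # Assumption: `-e` passes a single nGQL statement on the command line.
    return [CONSOLE, "-port", str(port), "-u", user, "-p", password, "-e", statement]


def run_statement(statement: str) -> str:
    """Run one nGQL statement through nebula-console and return its stdout."""
    result = subprocess.run(console_command(statement),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only attempt the call where the console binary is actually installed.
    if os.path.exists(CONSOLE):
        print(run_statement("ADD HOSTS 127.0.0.1:9779"))
```

Keeping the command construction separate from the execution makes it easy to inspect the exact command line before running it.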

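I verify the installation by following this procedure.

Enter the commands at the prompt shown above. The instructions said to run CREATE SPACE first, but doing so produced an error. According to the FAQ, a host has to be added first.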

(root@nebula) [(none)]> CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
[ERROR (-1005)]: Host not enough!

Add a host with the ADD HOSTS command.

Mon, 22 Jan 2024 10:08:09 UTC

(root@nebula) [(none)]> add hosts 127.0.0.1:9779
Execution succeeded (time spent 1.064ms/1.423862ms)

Mon, 22 Jan 2024 10:09:07 UTC

(root@nebula) [(none)]> show hosts
+-------------+------+-----------+--------------+----------------------+------------------------+---------+
| Host        | Port | Status    | Leader count | Leader distribution  | Partition distribution | Version |
+-------------+------+-----------+--------------+----------------------+------------------------+---------+
| "127.0.0.1" | 9779 | "OFFLINE" | 0            | "No valid partition" | "No valid partition"   |         |
+-------------+------+-----------+--------------+----------------------+------------------------+---------+
Got 1 rows (time spent 754µs/1.250391ms)

The host has been added, so I create the space again.

(root@nebula) [(none)]> CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
Execution succeeded (time spent 1.153ms/1.459396ms)

The other commands worked as well.

(root@nebula) [(none)]> USE llamaindex;
Execution succeeded (time spent 883µs/1.227876ms)

Mon, 22 Jan 2024 10:09:53 UTC

(root@nebula) [llamaindex]> CREATE TAG entity(name string);
Execution succeeded (time spent 1.136ms/1.476564ms)

Mon, 22 Jan 2024 10:10:00 UTC

(root@nebula) [llamaindex]> CREATE EDGE relationship(relationship string);
Execution succeeded (time spent 1.078ms/1.411607ms)

Mon, 22 Jan 2024 10:10:07 UTC

(root@nebula) [llamaindex]> CREATE TAG INDEX entity_index ON entity(name(256));
Execution succeeded (time spent 1.118ms/1.484611ms)

Mon, 22 Jan 2024 10:10:17 UTC
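
The same schema setup could also be driven from Python instead of the console. The following is a sketch using the nebula3-python client (installed in the next section), not something that was run for this article. The `IF NOT EXISTS` clauses and the pause after CREATE SPACE are my additions: schema changes in NebulaGraph are applied asynchronously, and the 20-second wait is an assumed value rather than a measured one. `create_schema` is a hypothetical helper name.

```python
import time

# The statements from the console session above, with IF NOT EXISTS added
# so that the function can be re-run without failing on existing objects.
SCHEMA_STATEMENTS = [
    "CREATE SPACE IF NOT EXISTS llamaindex"
    "(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);",
    "USE llamaindex;",
    "CREATE TAG IF NOT EXISTS entity(name string);",
    "CREATE EDGE IF NOT EXISTS relationship(relationship string);",
    "CREATE TAG INDEX IF NOT EXISTS entity_index ON entity(name(256));",
]


def create_schema(host="127.0.0.1", port=9669, user="root",
                  password="nebula", wait_seconds=20):
    """Run the schema statements in order against nebula-graphd."""
    # Imported lazily so the module loads even without nebula3-python.
    from nebula3.gclient.net import ConnectionPool
    from nebula3.Config import Config

    pool = ConnectionPool()
    if not pool.init([(host, port)], Config()):
        raise RuntimeError("could not connect to nebula-graphd")
    try:
        with pool.session_context(user, password) as session:
            for i, stmt in enumerate(SCHEMA_STATEMENTS):
                result = session.execute(stmt)
                if not result.is_succeeded():
                    raise RuntimeError(f"{stmt!r} failed: {result.error_msg()}")
                if i == 0:
                    # Assumed wait for the new space to become usable.
                    time.sleep(wait_seconds)
    finally:
        pool.close()
```

Calling `create_schema()` from a notebook cell would then replace the manual console session, provided the host has already been added.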

Accessing from a notebook

Create a notebook and run these commands.

%pip install llama-index
%pip install ipython-ngql nebula3-python
dbutils.library.restartPython()

Incidentally, OpenAI has deprecated text-davinci-002, so I switched to gpt-3.5-turbo-instruct.

import os

os.environ["OPENAI_API_KEY"] = dbutils.secrets.get("demo-token-takaaki.yayoi", "openai_api_key")

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output

from llama_index import (
    KnowledgeGraphIndex,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.llms import OpenAI

from IPython.display import Markdown, display


# define LLM
# NOTE: the original demo used text-davinci-002; it is deprecated, so gpt-3.5-turbo-instruct is used here
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=512)

os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"  # default is "nebula"
os.environ[
    "NEBULA_ADDRESS"
] = "127.0.0.1:9669"  # assumed we have NebulaGraph installed locally

space_name = "llamaindex"
edge_types, rel_prop_names = ["relationship"], [
    "relationship"
]  # default, could be omitted if creating from an empty kg
tags = ["entity"]  # default, could be omitted if creating from an empty kg

Create the graph store and specify it in the storage context.

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

Load a Wikipedia document.

from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()

documents = loader.load_data(
    pages=["Guardians of the Galaxy Vol. 3"], auto_suggest=False
)

Create the knowledge graph index from the documents.

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)
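
To make the indexing step more concrete: KnowledgeGraphIndex asks the LLM to extract up to `max_triplets_per_chunk` (subject, predicate, object) triplets from each chunk and writes them to the graph store. The sketch below illustrates that data shape in plain Python. The triplets are hand-written, hypothetical examples, not actual output from this run, and `to_graph` is an illustrative helper, not a LlamaIndex function.

```python
from collections import defaultdict

# Hand-written, hypothetical triplets for illustration only;
# these are not output from the LLM or from this article's run.
triplets = [
    ("Peter Quill", "is leader of", "Guardians of the Galaxy"),
    ("Rocket", "is member of", "Guardians of the Galaxy"),
    ("Peter Quill", "is friend of", "Rocket"),
]


def to_graph(triplets):
    """Split triplets into vertex names and a map of outgoing edges."""
    vertices = set()
    edges = defaultdict(list)
    for subj, pred, obj in triplets:
        vertices.update((subj, obj))     # each name maps to an `entity` vertex
        edges[subj].append((pred, obj))  # each pair maps to a `relationship` edge
    return vertices, dict(edges)


vertices, edges = to_graph(triplets)
print(sorted(vertices))
print(edges["Peter Quill"])
```

In the NebulaGraph space created earlier, the subject and object correspond to vertices with the `entity` tag, and the predicate is stored in the `relationship` property of the edge between them.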

%pip install ipython-ngql networkx pyvis
%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula

The space was listed in the notebook, so the connection works.

I do not understand the syntax yet, but I query the graph store anyway.

# Query some random Relationships with Cypher
%ngql USE llamaindex;
%ngql MATCH ()-[e]->() RETURN e LIMIT 10

I cannot read the output yet, but results are coming back.

Graph visualization also seems to be possible, but nothing was rendered in the notebook.

# draw the result

%ng_draw

Instead, nebulagraph.html was generated. After exporting this file, I was able to check the visualization.

I would like to get this rendered in the notebook.
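
One thing that might work is to pass the generated file to the `displayHTML` function that Databricks notebooks provide. The following is an untested sketch. It assumes that nebulagraph.html was written to the notebook's current working directory and that the HTML can load whatever scripts it references from inside the notebook's output frame. `read_graph_html` is a hypothetical helper name.

```python
from pathlib import Path


def read_graph_html(path: str = "nebulagraph.html") -> str:
    """Return the contents of the HTML file written by %ng_draw."""
    return Path(path).read_text(encoding="utf-8")


# displayHTML is provided by Databricks notebooks and is not defined elsewhere,
# so the call is guarded to keep this snippet runnable outside a notebook.
if "displayHTML" in globals() and Path("nebulagraph.html").exists():
    displayHTML(read_graph_html())  # noqa: F821
```

If the graph does not render this way, exporting the file as described above remains the fallback.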

Then I run a query using the query engine.

from llama_index.query_engine import KnowledgeGraphQueryEngine

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))

I got an answer!

That is all for today. I would also like to properly understand the trade-offs compared with a vector store.

Getting Started with Databricks

Databricks free trial
