
Semantic search with Qdrant, a vector search engine

Last updated at 2023-05-11, posted at 2023-05-10

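Introduction

These are brief notes explaining the features of Qdrant, a vector search engine.

Building a recommendation system with Qdrant and the OpenAI API is covered in more detail in the following article.

[Article in progress]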

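What is Qdrant?

A vector search engine written in Rust.
It supports searches similar to those of full-text search engines such as Elasticsearch, as well as vector search (semantic search).

Main use cases

It can be used for text-based semantic search, image search, and similar tasks.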

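Key terms

Qdrant has some terminology of its own, summarized here.

collection

A set of points (described below).
The equivalent of an index in Elasticsearch.

point

A record that holds a vector and a payload.
The equivalent of a document in Elasticsearch.

vector

The vector representation used for semantic search.
See the vector section later in this article for details.

payload

One of Qdrant's distinguishing features: additional information can be stored alongside the vector.
It can be used for filtering, among other things (e.g. narrowing down by payload data and then running the vector search).

Reference: https://qdrant.tech/documentation/payload/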

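Getting started

Qdrant provides a Docker image and a Python client (qdrant-client), so we use them to quickly try out vector search.

This basically follows the official Qdrant quick start.

Development environment

macOS 13.2.1
Python 3.11.2
Docker Engine: 20.10.23

Running Qdrant with Docker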

$ docker pull qdrant/qdrant
$ docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
 __ _  __| |_ __ __ _ _ __ | |_
 / _` |/ _` | '__/ _` | '_ \| __|
| (_| | (_| | | | (_| | | | | |_
 \__, |\__,_|_|  \__,_|_| |_|\__|
    |_|

Access web UI at https://ui.qdrant.tech/?v=v1.1.3

[2023-05-10T14:52:50.651Z INFO  storage::content_manager::consensus::persistent] Initializing new raft state at ./storage/raft_state
[2023-05-10T14:52:50.702Z INFO  qdrant] Distributed mode disabled
[2023-05-10T14:52:50.702Z INFO  qdrant] Telemetry reporting enabled, id: dff3a929-c0cf-411c-a31c-3a9d80ee9de7
[2023-05-10T14:52:50.703Z INFO  qdrant::tonic] Qdrant gRPC listening on 6334
[2023-05-10T14:52:50.704Z INFO  actix_server::builder] Starting 3 workers
[2023-05-10T14:52:50.704Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime

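The log above shows that Qdrant uses Actix and that it can also be reached over gRPC (the docker run command above only publishes port 6333, so using gRPC from the host would also require publishing port 6334).

Installing the Qdrant client

$ pip install qdrant-client

Creating a collection

The vectors are registered with 4 dimensions.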

collection.py

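from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

# host takes a bare hostname, not a URL with a scheme
client = QdrantClient(host="localhost", port=6333)

# A single unnamed 4-dimensional vector per point, compared by cosine similarity,
# so that the plain-list vectors used in index.py and search.py match this config
client.recreate_collection(
    collection_name="sample_collection",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)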

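Indexing

Strictly speaking this is not quite "indexing", but here we store the points described above in the collection.
Each point is given a payload along with its 4-dimensional vector.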

index.py

from qdrant_client.http.models import PointStruct

# Reuses the client created in collection.py
client.upsert(
    collection_name="sample_collection",
    wait=True,
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "京都"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": ["大阪", "奈良"]}),
        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": ["大阪", "滋賀"]}),
        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": ["奈良", "兵庫"]}),
        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"count": [0]}),
        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44]),
    ]
)

Searching

Search with a filter on the city field of the payload.

search.py

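from qdrant_client.http.models import FieldCondition, Filter, MatchValue

# Reuses the client created in collection.py
search_result = client.search(
    collection_name="sample_collection",
    query_vector=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[
            # Keep only points whose "city" payload contains "奈良"
            FieldCondition(
                key="city",
                match=MatchValue(value="奈良")
            )
        ]
    ),
    limit=3
)

print(search_result[0])
# ScoredPoint(id=4, score≈0.99, ...)
# With Distance.COSINE the score is at most 1; 1.362 is the raw dot product of the two vectors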
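To make it clearer what this filtered search computes, here is a minimal pure-Python sketch that needs no Qdrant server. It reuses the six sample points from index.py, keeps only the points whose `city` payload matches, and ranks them by cosine similarity. The helper names (`cosine`, `matches`, `search`) are made up for illustration and are not part of the Qdrant API. The brute-force scan is only an assumption for readability; Qdrant itself uses an index.

```python
import math

# Sample points copied from index.py: (id, vector, payload)
points = [
    (1, [0.05, 0.61, 0.76, 0.74], {"city": "京都"}),
    (2, [0.19, 0.81, 0.75, 0.11], {"city": ["大阪", "奈良"]}),
    (3, [0.36, 0.55, 0.47, 0.94], {"city": ["大阪", "滋賀"]}),
    (4, [0.18, 0.01, 0.85, 0.80], {"city": ["奈良", "兵庫"]}),
    (5, [0.24, 0.18, 0.22, 0.44], {"count": [0]}),
    (6, [0.35, 0.08, 0.11, 0.44], {}),
]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matches(payload, key, value):
    """A list-valued field matches if any element equals the value."""
    field = payload.get(key)
    if isinstance(field, list):
        return value in field
    return field == value


def search(query, key, value, limit):
    """Filter by payload, then rank the survivors by cosine similarity."""
    hits = [
        (point_id, cosine(query, vector))
        for point_id, vector, payload in points
        if matches(payload, key, value)
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


results = search([0.2, 0.1, 0.9, 0.7], "city", "奈良", limit=3)
for point_id, score in results:
    print(point_id, round(score, 3))
```

Only points 2 and 4 pass the filter, so fewer than `limit` results come back. For point 4 the raw dot product with the query is 1.362, while the cosine similarity is about 0.992.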

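Feature details

From here on, this article mainly summarizes Qdrant's individual features.

Most of this is what is written in the Qdrant documentation.

storage

Vectors and payloads are each kept in their own separate storage.

Vector storage

There are two options: RAM (in-memory) and mmap (memmap).

These can be specified in the configuration file.

RAM
- Everything is kept in RAM, so it is fast
memmap
- memory mapped storage
- The data is stored on disk and the file is mapped into the process's virtual address space, so the data on disk can be accessed almost as if it were in memory
When creating a collection, you can set the threshold above which mmap is used.

In the example below, mmap is used instead of RAM once the vector data in a segment exceeds 20000 KB (about 20 MB).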

from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

client.recreate_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    optimizers_config=models.OptimizersConfigDiff(memmap_threshold=20000)
)
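As a rough illustration of the memory-mapped idea, the sketch below uses only Python's standard library: it writes a few float32 vectors to a file, maps the file with `mmap`, and reads one vector by its byte offset without loading the whole file. The file layout (vectors packed back to back as little-endian float32) and the helper `exceeds_threshold` are assumptions made for this example, not Qdrant's actual on-disk format or logic. It also assumes 1 KB = 1024 bytes when comparing against a threshold.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
RECORD_SIZE = DIM * 4  # 4 bytes per float32 component

vectors = [
    [0.05, 0.61, 0.76, 0.74],
    [0.19, 0.81, 0.75, 0.11],
    [0.36, 0.55, 0.47, 0.94],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")

    # Persist the vectors on disk, packed back to back
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))

    # Map the file and read only the second vector via its offset
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            file_size = len(mm)
            second = struct.unpack_from(f"<{DIM}f", mm, 1 * RECORD_SIZE)


def exceeds_threshold(num_vectors, dim, threshold_kb):
    """True if the raw float32 vector data is larger than threshold_kb."""
    return num_vectors * dim * 4 > threshold_kb * 1024


print(file_size, [round(x, 2) for x in second])
print(exceeds_threshold(10_000, 768, 20_000))
```

Under these assumptions a 768-dimensional float32 vector takes 3072 bytes, so a 20000 KB threshold is crossed somewhere between 6,000 and 7,000 vectors.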

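payload storage

Payloads can likewise be stored in one of two places: RAM or disk.

RocksDB is used as the database that stores payloads.

RocksDB: a key-value database designed to be embedded in applications (created at Facebook).

RAM
- As with vectors, the payload is held in memory.
- RocksDB is used only for persistence
on disk
- The payload is not kept in memory; every read and write goes directly to RocksDB
- This saves RAM cost, but access latency can become a problem
- Creating a payload index can speed up searches on the payload
  - See the index section below for details

This can be specified either in the configuration file or as a parameter when creating a collection.

Configuration file settings: https://qdrant.tech/documentation/configuration/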

client.recreate_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=100, distance=models.Distance.COSINE),
    on_disk_payload=True
)

Reference:

Storage - Qdrant

index

Indexes can be created on both vectors and payloads (like adding an index in an RDB).

payload

This is similar to indexes in document-oriented databases such as MongoDB and Firestore.

An index can be created as follows.

client.create_payload_index(collection_name="{collection_name}", 
                            field_name="name_of_the_field_to_index", 
                            field_schema="keyword")

The following index types are available:

keyword
integer
float
geo
text
- Used for full-text search
- Conversely, full-text search is not possible without this index
  - https://qdrant.tech/documentation/filtering/#full-text-match

Indexing - Qdrant
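To show why a payload index helps with filtering, here is a conceptual sketch of a keyword index as a plain mapping from field value to point ids. This is only an illustration of the general idea; the source does not describe Qdrant's internal index structures, and `build_keyword_index` is a hypothetical helper. The payloads are the sample data used earlier in this article.

```python
from collections import defaultdict

# Payloads of the sample points, keyed by point id
payloads = {
    1: {"city": "京都"},
    2: {"city": ["大阪", "奈良"]},
    3: {"city": ["大阪", "滋賀"]},
    4: {"city": ["奈良", "兵庫"]},
    5: {"count": [0]},
    6: {},
}


def build_keyword_index(payloads, field):
    """Map each value of `field` to the set of point ids that contain it."""
    index = defaultdict(set)
    for point_id, payload in payloads.items():
        value = payload.get(field)
        if value is None:
            continue  # points without the field are simply not indexed
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(point_id)
    return index


city_index = build_keyword_index(payloads, "city")
nara_ids = sorted(city_index["奈良"])
print(nara_ids)
```

With such a mapping, a filter like `city = "奈良"` becomes a single lookup instead of a scan over every payload.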

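full text index

So-called full-text search, as in Lucene, is also possible.

Without this index, tokenized full-text search is not available (according to the Qdrant filtering documentation, a text match on an unindexed field behaves as an exact substring match).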

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(host="localhost", port=6333)

client.create_payload_index(
    collection_name="{collection_name}",
    field_name="name_of_the_field_to_index",
    field_schema=models.TextIndexParams(
        type="text",
        tokenizer=models.TokenizerType.WORD,
        min_token_len=2,
        max_token_len=15,
        lowercase=True,
    )
)
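The parameters above can be read as a small text-processing pipeline. The following sketch approximates it in plain Python: lowercase the text, split it into word tokens, and keep only tokens whose length falls within the configured range. It is an interpretation of the parameter names, not Qdrant's actual tokenizer, and the `tokenize` function is hypothetical.

```python
import re


def tokenize(text, min_token_len=2, max_token_len=15, lowercase=True):
    """Approximate a word tokenizer with length limits and lowercasing."""
    if lowercase:
        text = text.lower()
    # Split on anything that is not a letter, digit, or underscore
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if min_token_len <= len(t) <= max_token_len]


tokens = tokenize("A Quick brown-fox jumps over Supercalifragilistic dogs")
print(tokens)
```

In this sketch the one-letter token "a" and the 20-letter word are dropped, and "brown-fox" is split into two tokens.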

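vector

A vector index is a data structure built over the vectors according to a particular mathematical model.

Through the vector index, multiple vectors similar to a target vector can be queried efficiently.

Qdrant uses a graph algorithm called HNSW.

HNSW (Hierarchical Navigable Small World graph) is a graph-based indexing algorithm. It builds a multi-layer navigable graph structure according to certain rules. In this structure the upper layers are sparser and the distances between nodes are larger, while the lower layers are denser and the distances between nodes are smaller. The search starts at the top layer, finds the node closest to the target in that layer, and then moves down to the next layer to start another search. After repeating this several times, it quickly approaches the target position.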
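The layered descent described above can be sketched in a few lines of Python. This is a deliberately simplified toy, not real HNSW: the layers are fixed by hand (every 16th, every 4th, then every point) instead of being assigned randomly, each node links to its 2 nearest neighbours in its layer, and the search keeps a single candidate rather than a candidate list. All names are hypothetical.

```python
import math


def build_layer(nodes, points, k=2):
    """Link every node in a layer to its k nearest neighbours in that layer."""
    graph = {}
    for n in nodes:
        others = sorted(
            (m for m in nodes if m != n),
            key=lambda m: math.dist(points[n], points[m]),
        )
        graph[n] = others[:k]
    return graph


def greedy_search(layers, points, query, entry):
    """Walk greedily in each layer, from the sparsest down to the densest."""
    current = entry
    for graph in layers:  # layers[0] is the top (sparsest) layer
        while True:
            best = min(graph[current], key=lambda n: math.dist(points[n], query))
            if math.dist(points[best], query) >= math.dist(points[current], query):
                break  # no neighbour is closer: descend to the next layer
            current = best
    return current


# 32 evenly spaced points on a line, indexed 0..31
points = [(float(i), 0.0) for i in range(32)]
layers = [
    build_layer(range(0, 32, 16), points),  # top layer: very sparse
    build_layer(range(0, 32, 4), points),   # middle layer
    build_layer(range(32), points),         # bottom layer: all points
]

nearest = greedy_search(layers, points, query=(21.3, 0.0), entry=0)
print(nearest, points[nearest])
```

Starting from node 0, the search jumps to 16 in the top layer, to 20 in the middle layer, and to 21 in the bottom layer, taking a few long hops first and short ones last.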

Explanation of HNSW:

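clustering

As the amount of data grows, a single node no longer delivers enough performance and runs short of computing resources, so setting up a cluster is effective.

To do this with Docker locally, prepare a docker compose file like the following.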

version: '3.9'
services:
  qdrant-primary:
    container_name: qdrant-primary # https://hub.docker.com/r/qdrant/qdrant
    image: qdrant/qdrant
    volumes:
      - ./micce-engine/config/production.yaml:/qdrant/config/production.yaml
    ports:
      - "6333:6333"
      - "6334:6334"
    # --uri tells the cluster the URI of this first (leader) node
    # When building on a real server, specify that server's URI
    command: ["./qdrant", "--uri", "http://qdrant-primary:6335"]
  qdrant-secondary:
    image: qdrant/qdrant
    volumes:
      - ./micce-engine/config/production.yaml:/qdrant/config/production.yaml
    # Every peer other than the primary is given the primary peer's URI via --bootstrap
    command: ["./qdrant", "--bootstrap", "http://qdrant-primary:6335"]

# Start the primary peer
$ docker compose up qdrant-primary

# Start two secondary peers
$ docker compose up --scale qdrant-secondary=2

# Check the containers
$ docker stats
CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O       PIDS
0fdb8c2276aa   qdrant-primary                    1.44%     12.91MiB / 7.773GiB   0.16%     208kB / 203kB   4.1kB / 127kB   22
90485acc831e   micce-engine-qdrant-secondary-2   1.05%     12.24MiB / 7.773GiB   0.15%     107kB / 102kB   0B / 135kB      22
2ae386e2e8b7   micce-engine-qdrant-secondary-1   1.15%     12.39MiB / 7.773GiB   0.16%     107kB / 102kB   0B / 139kB      22

# Check the cluster
# The output shows that three nodes are connected
$ curl http://localhost:6333/cluster | jq .
{
  "result": {
    "status": "enabled",
    "peer_id": 17468249985399575000,
    "peers": {
      "10168068004055521811": {
        "uri": "http://172.20.0.3:6335/"
      },
      "7651517115579658003": {
        "uri": "http://172.20.0.4:6335/"
      },
      "17468249985399574590": {
        "uri": "http://qdrant-primary:6335/"
      }
    },
    "raft_info": {
      "term": 1,
      "commit": 5,
      "pending_operations": 0,
      "leader": 17468249985399575000,
      "role": "Leader",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2023-04-25T17:32:28.582156625Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 1.234e-05
}

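Once the cluster is in this state, run the indexing.

Send the request that creates the collection to the cluster endpoint.

The nodes coordinate with each other using the Raft consensus protocol.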
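The cluster check shown earlier can also be done programmatically. Below is a minimal sketch that parses a `/cluster` response and summarizes it. The JSON here is an abbreviated copy of the output shown earlier in this article, embedded as a string so that the example runs without a live cluster. `summarize_cluster` is a hypothetical helper, and only fields visible in that output are used.

```python
import json

# Abbreviated copy of the /cluster output shown earlier
raw = """
{
  "result": {
    "status": "enabled",
    "peers": {
      "10168068004055521811": {"uri": "http://172.20.0.3:6335/"},
      "7651517115579658003": {"uri": "http://172.20.0.4:6335/"},
      "17468249985399574590": {"uri": "http://qdrant-primary:6335/"}
    },
    "raft_info": {"term": 1, "commit": 5, "pending_operations": 0, "role": "Leader"}
  },
  "status": "ok"
}
"""


def summarize_cluster(response):
    """Extract the facts needed to judge whether the cluster is ready."""
    result = response["result"]
    raft = result["raft_info"]
    return {
        "enabled": result["status"] == "enabled",
        "peer_count": len(result["peers"]),
        "peer_uris": sorted(p["uri"] for p in result["peers"].values()),
        "is_leader": raft["role"] == "Leader",
        "pending_operations": raft["pending_operations"],
    }


summary = summarize_cluster(json.loads(raw))
print(summary)
```

A check such as `summary["peer_count"] == 3 and summary["pending_operations"] == 0` could then gate the indexing step.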

Sharding

We recommend selecting the number of shards as a factor of the number of nodes you are currently running in your cluster. For example, if you have 3 nodes, 6 shards could be a good option.

It looks like about 2 shards per node is the recommendation.

The number of shards is specified when creating the collection.

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient("localhost", port=6333)

client.recreate_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE),
    shard_number=6
)
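One way to see why a shard count that is a multiple of the node count is attractive: the shards can then be spread evenly. The small sketch below assumes shards are distributed as evenly as possible across nodes, which is an assumption for illustration and not a description of Qdrant's placement logic. `shards_per_node` is a hypothetical helper.

```python
def shards_per_node(shard_number, node_count):
    """Number of shards each node holds under an as-even-as-possible split."""
    base, extra = divmod(shard_number, node_count)
    # The first `extra` nodes take one additional shard
    return [base + 1 if i < extra else base for i in range(node_count)]


even = shards_per_node(6, 3)
uneven = shards_per_node(4, 3)
print(even, uneven)
```

With 6 shards on 3 nodes every node holds 2 shards, whereas 4 shards on 3 nodes leaves one node with twice the load of the others.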

Scaling the cluster

When adding a new node to the cluster or retiring a running node, shards can be moved between nodes, which allows the cluster to be scaled without downtime.

POST /collections/{collection_name}/cluster

{
  "move_shard": {
    "shard_id": 0,
    "from_peer_id": 381894127,
    "to_peer_id": 467122995
  }
}

Once the shard move has completed, remove the target node from the cluster.

Once the removal has completed, the server can be stopped.

DELETE /cluster/peer/{peer_id}
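The two REST calls above could be issued from Python using only the standard library. The sketch below builds the requests without sending them, so it runs without a cluster. The base URL `http://localhost:6333` follows the local setup used in this article, the collection name and peer ids are the sample values from above, and the helper function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6333"  # assumed local Qdrant endpoint


def move_shard_request(collection_name, shard_id, from_peer_id, to_peer_id):
    """Build POST /collections/{collection_name}/cluster with a move_shard body."""
    body = {
        "move_shard": {
            "shard_id": shard_id,
            "from_peer_id": from_peer_id,
            "to_peer_id": to_peer_id,
        }
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/collections/{collection_name}/cluster",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remove_peer_request(peer_id):
    """Build DELETE /cluster/peer/{peer_id}."""
    return urllib.request.Request(
        url=f"{BASE_URL}/cluster/peer/{peer_id}",
        method="DELETE",
    )


move_req = move_shard_request("sample_collection", 0, 381894127, 467122995)
remove_req = remove_peer_request(381894127)
print(move_req.get_method(), move_req.full_url)
print(remove_req.get_method(), remove_req.full_url)
# To actually send a request: urllib.request.urlopen(move_req)
```

Sending the delete request only after every shard has been moved off the peer mirrors the order described above.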

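Replication

Replication is also supported.

Creating replicas of each shard improves availability.

Replication is configured when the collection is created.

In the example below there are 6 shards and each shard is kept as 2 replicas, so 12 shard replicas are created in total.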

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient("localhost", port=6333)

client.recreate_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=300, distance=models.Distance.COSINE),
    shard_number=6,
    replication_factor=2,  # replication setting
)
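To check the arithmetic for 6 shards with a replication factor of 2, here is a small sketch that places replicas on 3 nodes so that no node holds two copies of the same shard. The round-robin placement and the rule that the replication factor must not exceed the node count are assumptions of this sketch, not a description of how Qdrant places replicas. `place_replicas` is a hypothetical helper.

```python
def place_replicas(shard_number, replication_factor, node_count):
    """Assign each replica of each shard to a distinct node, round-robin."""
    if replication_factor > node_count:
        raise ValueError("replication_factor cannot exceed the number of nodes")
    placement = {node: [] for node in range(node_count)}
    for shard in range(shard_number):
        for replica in range(replication_factor):
            # Consecutive nodes guarantee the copies of a shard never collide
            node = (shard + replica) % node_count
            placement[node].append(shard)
    return placement


placement = place_replicas(shard_number=6, replication_factor=2, node_count=3)
total_replicas = sum(len(shards) for shards in placement.values())
print(placement, total_replicas)
```

Under these assumptions each of the 3 nodes ends up with 4 shard replicas, and losing any single node still leaves one copy of every shard.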

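Summary

This article summarized the features that are likely to be needed to run Qdrant.

As a next step, I plan to take the cluster setup needed for production operation to a cloud environment and actually build it.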
