Verifying vector storage with PostgreSQL@17 (Mac)

Posted at 2024-10-01

Stop postgresql@16, then install postgresql@17 and start its service:

brew services stop postgresql@16
brew install postgresql@17
brew services start postgresql@17
brew services list

Name          Status  User File
postgresql@14 none    tn
postgresql@16 none
postgresql@17 started tn   ~/Library/LaunchAgents/homebrew.mxcl.postgresql@17.plist
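If you switch between Homebrew PostgreSQL versions often, a small helper can pick the running service out of `brew services list`. This is a minimal sketch that assumes the whitespace-separated column layout shown above (Name, Status, User, File); the function names are hypothetical.

```python
import subprocess


def started_postgres_services(listing: str) -> list[str]:
    """Return the names of postgresql@* services whose Status column is 'started'."""
    names = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("postgresql@") and fields[1] == "started":
            names.append(fields[0])
    return names


def brew_services_listing() -> str:
    """Run `brew services list` and return its stdout (requires Homebrew)."""
    result = subprocess.run(
        ["brew", "services", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For the listing above, `started_postgres_services(brew_services_listing())` should return `['postgresql@17']`.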

Create a database:

createdb testdb
psql -U tn -d testdb

WARNING: psql major version 16, server major version 17.
         Some psql features might not work.

Even after reinstalling psql (via libpq), it was still 16.4:

~ $brew reinstall libpq
~ $psql --version
psql (PostgreSQL) 16.4
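The warning comes from the psql client found on PATH (16.4) talking to a 17 server. To see both numbers side by side, a sketch like the following may help. It assumes psycopg 3, whose `conn.info.server_version` is an integer such as 170000 (major × 10000 + minor for PostgreSQL 10 and later); the helper names and the default connection string are illustrative. Running the server's own client directly, e.g. `/usr/local/opt/postgresql@17/bin/psql` (same prefix as the PG_CONFIG path used below), is likely one way to avoid the mismatch.

```python
import re
import subprocess


def psql_client_major(version_output: str) -> int:
    """Parse output like 'psql (PostgreSQL) 16.4' and return the major version (16)."""
    match = re.search(r"\(PostgreSQL\)\s+(\d+)", version_output)
    if match is None:
        raise ValueError(f"unexpected psql --version output: {version_output!r}")
    return int(match.group(1))


def server_major(server_version_num: int) -> int:
    """Convert a numeric server version (e.g. 170000) to its major version (17).

    Valid for PostgreSQL 10 and later, where the number is major * 10000 + minor.
    """
    return server_version_num // 10000


def check_versions(conninfo: str = "dbname=testdb user=tn") -> tuple[int, int]:
    """Return (client_major, server_major); needs psql on PATH and psycopg 3 installed."""
    import psycopg  # imported lazily so the parsing helpers work without psycopg

    out = subprocess.run(
        ["psql", "--version"], capture_output=True, text=True, check=True
    ).stdout
    with psycopg.connect(conninfo) as conn:
        return psql_client_major(out), server_major(conn.info.server_version)
```

In the situation above, `check_versions()` would be expected to return `(16, 17)`.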

Creating the vector extension on @17 fails, as expected:

testdb=# create extension vector;

ERROR:  extension "vector" is not available
DETAIL:  Could not open extension control file "/usr/local/share/postgresql@17/extension/vector.control": No such file or directory.
HINT:  The extension must first be installed on the system where PostgreSQL is running.
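The error shows that the server looks for vector.control under its share directory. A sketch for checking this before running CREATE EXTENSION, assuming that `pg_config --sharedir` of the target installation returns that directory (judging from the error message, `/usr/local/share/postgresql@17` here); the helper names are hypothetical. From SQL, `SELECT * FROM pg_available_extensions WHERE name = 'vector';` answers the same question.

```python
import subprocess
from pathlib import Path


def control_file_path(sharedir: str, extension: str) -> Path:
    """Path of <sharedir>/extension/<name>.control, where the server looks for extensions."""
    return Path(sharedir) / "extension" / f"{extension}.control"


def extension_installed(pg_config: str, extension: str = "vector") -> bool:
    """Ask the given pg_config for its share directory and check for the control file."""
    sharedir = subprocess.run(
        [pg_config, "--sharedir"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return control_file_path(sharedir, extension).is_file()
```

`extension_installed("/usr/local/opt/postgresql@17/bin/pg_config")` would be expected to return False before the build below and True after make install.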

Installing the vector extension

For a first-time install, follow the same steps as in *1:

git clone --branch v0.7.4 https://github.com/pgvector/pgvector.git
cd pgvector
export PG_CONFIG=/usr/local/opt/postgresql@17/bin/pg_config
make PG_CONFIG=$PG_CONFIG all
make PG_CONFIG=$PG_CONFIG install

If you have already run make before, as I had this time, clean first and rebuild:

cd ~/pgvector
export PG_CONFIG=/usr/local/opt/postgresql@17/bin/pg_config
make PG_CONFIG=$PG_CONFIG clean
make PG_CONFIG=$PG_CONFIG all 
make PG_CONFIG=$PG_CONFIG install
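The two build paths differ only in the extra clean step. As a sketch (hypothetical helpers, assuming the ~/pgvector checkout and the PG_CONFIG path above), the same sequence can be driven from Python with subprocess; check=True stops at the first failing make step.

```python
import subprocess
from pathlib import Path

PG_CONFIG = "/usr/local/opt/postgresql@17/bin/pg_config"


def make_steps(pg_config: str, rebuild: bool) -> list[list[str]]:
    """Return the make invocations: clean (only when rebuilding), then all, then install."""
    targets = (["clean"] if rebuild else []) + ["all", "install"]
    return [["make", f"PG_CONFIG={pg_config}", target] for target in targets]


def build_pgvector(src_dir: Path, pg_config: str = PG_CONFIG, rebuild: bool = True) -> None:
    """Run each make step inside the pgvector checkout; raises CalledProcessError on failure."""
    for argv in make_steps(pg_config, rebuild):
        subprocess.run(argv, cwd=src_dir, check=True)
```

Usage would be `build_pgvector(Path.home() / "pgvector")` for a rebuild, or `rebuild=False` for a fresh clone.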

Then try again:

testdb=# create extension vector;
CREATE EXTENSION
testdb=# \dx
List of installed extensions
  Name   | Version |   Schema   |                     Description                      
---------+---------+------------+------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
(2 rows)

The vector extension is now installed.
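The same check can be scripted against the pg_extension catalog instead of reading the \dx output. A sketch assuming psycopg 3 and the testdb/tn connection used in this article; version_tuple and vector_version are hypothetical helper names.

```python
from typing import Optional


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '0.7.4' into (0, 7, 4) for comparison."""
    return tuple(int(part) for part in version.split("."))


def vector_version(conninfo: str = "dbname=testdb user=tn") -> Optional[str]:
    """Return the pgvector version created in the database, or None if absent."""
    import psycopg  # lazy import: only needed when actually querying the database

    with psycopg.connect(conninfo) as conn:
        row = conn.execute(
            "SELECT extversion FROM pg_extension WHERE extname = %s", ("vector",)
        ).fetchone()
    return row[0] if row else None
```

For the installation above, `version_tuple(vector_version())` should give `(0, 7, 4)`. Comparing tuples rather than strings keeps "0.10.0" correctly newer than "0.7.4".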

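Next, following *2, we write data to the database and verify it.
Beforehand, install Ollama and pull the embedding model "bge-m3" with ollama pull. (The model choice is arbitrary, and you can of course use an OpenAI embedding model instead of Ollama; adjust the code accordingly.)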

!pip install langchain-postgres psycopg langchain-ollama langchain

from langchain_postgres import PGVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings

embedding = OllamaEmbeddings(
   model="bge-m3"
)

# No password is set on this database
connection = "postgresql+psycopg://tn@localhost:5432/testdb" 
collection_name = "my_docs"

vectorstore = PGVector(
   embeddings=embedding,
   collection_name=collection_name,
   connection=connection,
   use_jsonb=True,
)
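The connection string above follows SQLAlchemy's URL format with the psycopg 3 driver (`postgresql+psycopg://user@host:port/dbname`). If your server does require a password, it must be percent-encoded when it contains characters such as `@` or `/`. A minimal standard-library sketch; make_connection_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote


def make_connection_url(
    user: str,
    dbname: str,
    host: str = "localhost",
    port: int = 5432,
    password: Optional[str] = None,
) -> str:
    """Build a SQLAlchemy-style URL for the psycopg 3 driver, percent-encoding credentials."""
    auth = quote(user, safe="")
    if password is not None:
        auth += ":" + quote(password, safe="")
    return f"postgresql+psycopg://{auth}@{host}:{port}/{dbname}"
```

`make_connection_url("tn", "testdb")` reproduces the passwordless string used above.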

# Check the tables created by LangChain
!psql -d testdb -c "\dt"
               List of relations
Schema |          Name           | Type  | Owner 
--------+-------------------------+-------+-------
public | langchain_pg_collection | table | tn
public | langchain_pg_embedding  | table | tn
(2 rows)
!psql -d testdb -c "\d langchain_pg_embedding"               

Table "public.langchain_pg_embedding"
   Column     |       Type        | Collation | Nullable | Default 
---------------+-------------------+-----------+----------+---------
id            | character varying |           | not null | 
collection_id | uuid              |           |          | 
embedding     | vector            |           |          | 
document      | character varying |           |          | 
cmetadata     | jsonb             |           |          | 
...
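The embedding column above has pgvector's vector type, which accepts text literals of the form `[x1,x2,...]`. As a sketch of writing one row without LangChain, assuming a hypothetical table `items(id text primary key, embedding vector)` that you create yourself (it is not one of LangChain's tables) and a psycopg 3 connection:

```python
from typing import Sequence


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats as a pgvector text literal, e.g. [0.5,-0.25]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def insert_item(conn, item_id: str, embedding: Sequence[float]) -> None:
    """Insert one row into the hypothetical `items` table; `conn` is a psycopg 3 connection."""
    conn.execute(
        # the explicit ::vector cast converts the text literal to pgvector's type
        "INSERT INTO items (id, embedding) VALUES (%s, %s::vector)",
        (item_id, to_vector_literal(embedding)),
    )
    conn.commit()
```

The separate pgvector Python package also provides type adapters for psycopg, which avoid building the literal by hand.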
# Data from *3, with id 10 modified and id 11 added
docs = [
   Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
   Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
   Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
   Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
   Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
   Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
   Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
   Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
   Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
   Document(page_content='there are tigers in the yard', metadata={"id": 10, "location": "zoo", "topic": "animals"}),
   Document(page_content='there are dogs in the backyard', metadata={"id": 11, "location": "my home", "topic": "animals"})
]

# Write the documents to the DB
vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])

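# Bonus: similarity search. With the default cosine strategy the score is a distance (smaller = more similar).
results = vectorstore.similarity_search_with_score(query="lion", k=5)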
for doc, score in results:
   print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.457508] there are tigers in the yard [{'id': 10, 'topic': 'animals', 'location': 'zoo'}]
* [SIM=0.494071] there are dogs in the backyard [{'id': 11, 'topic': 'animals', 'location': 'my home'}]
* [SIM=0.540048] ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.541976] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.557055] the book club meets at the library [{'id': 8, 'topic': 'reading', 'location': 'library'}]
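For reference, the cosine distance computed by pgvector's `<=>` operator is 1 - (a·b)/(|a||b|), ranging from 0 (same direction) to 2 (opposite direction), which is why the results above are sorted in ascending order. A pure-Python sketch of the formula for illustration only; it is not how LangChain or pgvector implement it internally.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; raises ZeroDivisionError for zero vectors in this sketch."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Scaling a vector does not change the distance, since only the direction matters.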

Retrieving data from the database

Restart the Jupyter kernel to clear memory, then run:

!pip install psycopg
import psycopg

conn = psycopg.connect("dbname=testdb user=tn")
cur = conn.cursor()
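cur.execute('select * from langchain_pg_embedding')
# select * returns: id, collection_id, embedding, document, cmetadata.
# So "uuid" below is really collection_id (the same for every row),
# "page_content" is the embedding vector (returned as text), and
# "page_content(string)" is the document text.
for row in cur: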
    formatted_output = f"id: {row[0]}\n" \
                    f"uuid: {row[1]}\n" \
                    f"page_content: {row[2][:100]}...\n" \
                    f"page_content(string): {row[3]}\n" \
                    f"metadata: {row[4]}\n"
    print(formatted_output)
cur.close()
conn.close()

id: 1
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.041045193,0.009569716,-0.093480445,0.01990515,0.00062993786,-0.057626557,-0.02735376,-0.01263911...
page_content(string): there are cats in the pond
metadata: {'id': 1, 'topic': 'animals', 'location': 'pond'}

id: 2
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.0365837,0.0019632874,-0.0848764,-0.007010041,-0.028483586,-0.027577631,-1.8776509e-05,-0.0053111...
page_content(string): ducks are also found in the pond
metadata: {'id': 2, 'topic': 'animals', 'location': 'pond'}

id: 3
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [0.02102992,0.006635084,-0.05879326,-0.004522216,-0.0065848893,-0.03556204,0.009028944,0.03192697,0....
page_content(string): fresh apples are available at the market
metadata: {'id': 3, 'topic': 'food', 'location': 'market'}

id: 4
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.021443207,0.001050144,-0.07157226,0.009703566,9.951767e-05,0.0027933442,-0.01428185,0.008426342,...
page_content(string): the market also sells fresh oranges
metadata: {'id': 4, 'topic': 'food', 'location': 'market'}

id: 5
...
page_content: [-0.04939314,0.0038462288,-0.08330972,0.017653793,-0.023387564,0.011244474,0.02507997,0.012847753,-0...
page_content(string): there are dogs in the backyard
metadata: {'id': 11, 'topic': 'animals', 'location': 'my home'}

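A variant of the query above that selects the columns by name and parses the vector text back into floats. This sketch assumes psycopg 3 and its `psycopg.rows.dict_row` row factory; `embedding::text` makes the column come back as a string even if a pgvector adapter happens to be registered. The helper names are hypothetical.

```python
from typing import Any


def parse_vector_text(text: str) -> list[float]:
    """Parse pgvector's text form, e.g. '[-0.04,0.0095]', into a list of floats."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a pgvector literal: {text[:30]!r}")
    body = body[1:-1]
    return [float(part) for part in body.split(",")] if body else []


def fetch_embeddings(conninfo: str = "dbname=testdb user=tn") -> list[dict[str, Any]]:
    """Read LangChain's embedding table with named columns and parsed vectors."""
    import psycopg  # lazy import so parse_vector_text works without psycopg
    from psycopg.rows import dict_row

    query = (
        "SELECT id, collection_id, embedding::text AS embedding, document, cmetadata "
        "FROM langchain_pg_embedding"
    )
    with psycopg.connect(conninfo, row_factory=dict_row) as conn:
        rows = conn.execute(query).fetchall()
    for row in rows:
        row["embedding"] = parse_vector_text(row["embedding"])
    return rows
```

Each returned dict then has keys id, collection_id, embedding (a list of floats), document, and cmetadata (a dict, since psycopg decodes jsonb).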

The data was read back correctly, so drop the database we created:

!dropdb -f testdb

References:
*1: https://qiita.com/tnagata/items/7e6ae9956bdcaf167d94
*2: https://qiita.com/tnagata/items/c4a08d868b838e3bb3ea
*3: https://github.com/langchain-ai/langchain-postgres/blob/main/examples/vectorstore.ipynb
