Install postgresql@17 and start the service:
brew services stop postgresql@16
brew install postgresql@17
brew services start postgresql@17
brew services list
Name Status User File
postgresql@14 none tn
postgresql@16 none
postgresql@17 started tn ~/Library/LaunchAgents/homebrew.mxcl.postgresql@17.plist
Create a database:
createdb testdb
psql -U tn -d testdb
WARNING: psql major version 16, server major version 17.
Some psql features might not work.
Reinstalling psql (via libpq) did not help; it stayed at 16.4:
~ $brew reinstall libpq
~ $psql --version
psql (PostgreSQL) 16.4
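The warning means the `psql` client that the shell resolves is still a 16.x binary, while the server is 17. As a rough diagnostic, the sketch below extracts the major version from a version string and can report which `psql` is found on `PATH`. The helper names `parse_major` and `resolved_psql` are hypothetical, and the parser assumes the `psql (PostgreSQL) X.Y` format shown above.

```python
import re
import shutil
import subprocess


def parse_major(version_text: str) -> int:
    """Return the major version from text such as 'psql (PostgreSQL) 16.4'."""
    match = re.search(r"\(PostgreSQL\)\s+(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_text!r}")
    return int(match.group(1))


def resolved_psql():
    """Return (path, major version) of the psql found on PATH, or None if absent."""
    path = shutil.which("psql")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return path, parse_major(result.stdout)


print(parse_major("psql (PostgreSQL) 16.4"))  # 16
```

Calling `resolved_psql()` shows which binary the shell actually picks up. If it is not the one under `/usr/local/opt/postgresql@17/bin` (the directory used for `pg_config` later in this post), putting that directory earlier on `PATH` is one option to try.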
Creating the vector extension fails on @17 too, as expected:
testdb=# create extension vector;
ERROR: extension "vector" is not available
DETAIL: Could not open extension control file "/usr/local/share/postgresql@17/extension/vector.control": No such file or directory.
HINT: The extension must first be installed on the system where PostgreSQL is running.
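The DETAIL line names the exact file PostgreSQL looks for: `extension/vector.control` under the server's share directory. A small sketch for checking that file before running `create extension`; the function name `control_file_path` is hypothetical, and the share directory is copied from the error message above (on another machine, `pg_config --sharedir` should print it).

```python
from pathlib import Path


def control_file_path(sharedir: str, extension: str = "vector") -> Path:
    """Build the control-file path PostgreSQL checks for an extension."""
    return Path(sharedir) / "extension" / f"{extension}.control"


# Share directory taken from the error message; an assumption on other setups.
path = control_file_path("/usr/local/share/postgresql@17")
print(path, "exists" if path.exists() else "missing")
```

If the file is reported as missing, the extension has not been installed for this server version yet, which matches the HINT.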
Installing the vector extension
For a first-time install, follow the same steps as in *1:
git clone --branch v0.7.4 https://github.com/pgvector/pgvector.git
cd pgvector
export PG_CONFIG=/usr/local/opt/postgresql@17/bin/pg_config
make PG_CONFIG=$PG_CONFIG all
make PG_CONFIG=$PG_CONFIG install
If, as in my case, this is the second or later time you run make in that checkout, clean first:
cd ~/pgvector
export PG_CONFIG=/usr/local/opt/postgresql@17/bin/pg_config
make PG_CONFIG=$PG_CONFIG clean
make PG_CONFIG=$PG_CONFIG all
make PG_CONFIG=$PG_CONFIG install
Try again:
testdb=# create extension vector;
CREATE EXTENSION
testdb=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vector | 0.7.4 | public | vector data type and ivfflat and hnsw access methods
(2 rows)
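The vector extension now shows up in the list.
Below, following *2, we write to the database and check the result.
Install ollama beforehand and fetch the embedding model "bge-m3" with ollama pull (the model choice is arbitrary; you can of course use an OpenAI embedding model instead of ollama, in which case adjust the code accordingly).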
!pip install langchain-postgres psycopg langchain-ollama langchain
from langchain_postgres import PGVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
embedding = OllamaEmbeddings(
model="bge-m3"
)
# No password is set for this user
connection = "postgresql+psycopg://tn@localhost:5432/testdb"
collection_name = "my_docs"
vectorstore = PGVector(
embeddings=embedding,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
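The connection string follows the SQLAlchemy URL form `driver://user[:password]@host:port/dbname`. If a password is set, special characters in it need percent-encoding. A minimal sketch, where `build_connection` is a hypothetical helper and not part of langchain-postgres:

```python
from typing import Optional
from urllib.parse import quote


def build_connection(user: str, host: str, port: int, dbname: str,
                     password: Optional[str] = None) -> str:
    """Assemble a postgresql+psycopg URL, percent-encoding user and password."""
    auth = quote(user, safe="")
    if password is not None:
        auth += ":" + quote(password, safe="")
    return f"postgresql+psycopg://{auth}@{host}:{port}/{dbname}"


print(build_connection("tn", "localhost", 5432, "testdb"))
# postgresql+psycopg://tn@localhost:5432/testdb
print(build_connection("tn", "localhost", 5432, "testdb", password="p@ss/word"))
# postgresql+psycopg://tn:p%40ss%2Fword@localhost:5432/testdb
```

The first call reproduces the passwordless string used above; the second password is made up purely for illustration.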
# Check the tables created by langchain
!psql -d testdb -c "\dt"
List of relations
Schema | Name | Type | Owner
--------+-------------------------+-------+-------
public | langchain_pg_collection | table | tn
public | langchain_pg_embedding | table | tn
(2 rows)
!psql -d testdb -c "\d langchain_pg_embedding"
Table "public.langchain_pg_embedding"
Column | Type | Collation | Nullable | Default
---------------+-------------------+-----------+----------+---------
id | character varying | | not null |
collection_id | uuid | | |
embedding | vector | | |
document | character varying | | |
cmetadata | jsonb | | |
...
# Data from *3, with id 10 modified and id 11 added
docs = [
Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
Document(page_content='there are tigers in the yard', metadata={"id": 10, "location": "zoo", "topic": "animals"}),
Document(page_content='there are dogs in the backyard', metadata={"id": 11, "location": "my home", "topic": "animals"})
]
# Write to the database
vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])
# Bonus: similarity search
results = vectorstore.similarity_search_with_score(query="lion", k=5)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.457508] there are tigers in the yard [{'id': 10, 'topic': 'animals', 'location': 'zoo'}]
* [SIM=0.494071] there are dogs in the backyard [{'id': 11, 'topic': 'animals', 'location': 'my home'}]
* [SIM=0.540048] ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.541976] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.557055] the book club meets at the library [{'id': 8, 'topic': 'reading', 'location': 'library'}]
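The results come back in ascending order of score, which fits the score being a distance (smaller means closer) rather than a similarity, despite the `SIM` label. Assuming the store uses cosine distance (as far as I know, the default strategy of `PGVector`), the value corresponds to 1 minus the cosine similarity. A self-contained sketch with toy vectors; `cosine_distance` is a hypothetical helper, not a library call:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite direction)
```

Read this way, "tigers" at about 0.46 is the closest match to "lion", and the unrelated library sentence at about 0.56 is the farthest of the five.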
Retrieving data from the database
Restart the Jupyter kernel to clear memory, then:
!pip install psycopg
import psycopg
conn = psycopg.connect("dbname=testdb user=tn")
cur = conn.cursor()
cur.execute('select * from langchain_pg_embedding')
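# Column order: id, collection_id, embedding, document, cmetadata.
# Note: the "uuid" label prints collection_id (row[1]) and the "page_content"
# label prints the first 100 characters of the embedding (row[2]);
# the document text itself is row[3].
for row in cur:
    formatted_output = (
        f"id: {row[0]}\n"
        f"uuid: {row[1]}\n"
        f"page_content: {row[2][:100]}...\n"
        f"page_content(string): {row[3]}\n"
        f"metadata: {row[4]}\n"
    )
    print(formatted_output)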
cur.close()
conn.close()
id: 1
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.041045193,0.009569716,-0.093480445,0.01990515,0.00062993786,-0.057626557,-0.02735376,-0.01263911...
page_content(string): there are cats in the pond
metadata: {'id': 1, 'topic': 'animals', 'location': 'pond'}
id: 2
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.0365837,0.0019632874,-0.0848764,-0.007010041,-0.028483586,-0.027577631,-1.8776509e-05,-0.0053111...
page_content(string): ducks are also found in the pond
metadata: {'id': 2, 'topic': 'animals', 'location': 'pond'}
id: 3
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [0.02102992,0.006635084,-0.05879326,-0.004522216,-0.0065848893,-0.03556204,0.009028944,0.03192697,0....
page_content(string): fresh apples are available at the market
metadata: {'id': 3, 'topic': 'food', 'location': 'market'}
id: 4
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.021443207,0.001050144,-0.07157226,0.009703566,9.951767e-05,0.0027933442,-0.01428185,0.008426342,...
page_content(string): the market also sells fresh oranges
metadata: {'id': 4, 'topic': 'food', 'location': 'market'}
id: 5
...
page_content: [-0.04939314,0.0038462288,-0.08330972,0.017653793,-0.023387564,0.011244474,0.02507997,0.012847753,-0...
page_content(string): there are dogs in the backyard
metadata: {'id': 11, 'topic': 'animals', 'location': 'my home'}
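As the output shows, the `embedding` column arrives here as text such as `[-0.041045193,0.009569716,...]`, which is why slicing it with `[:100]` works. To do arithmetic on a stored embedding it has to be converted to numbers first. A minimal sketch, assuming the bracketed comma-separated form seen above; `parse_vector` is a hypothetical helper:

```python
import json
from typing import List


def parse_vector(text: str) -> List[float]:
    """Convert a vector's text form, e.g. '[0.1,-2e-05,3]', to a list of floats."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError(f"not a vector literal: {text!r}")
    return [float(v) for v in values]


# A few components copied from the id 2 row above, for illustration only.
vec = parse_vector("[-0.0365837,0.0019632874,-1.8776509e-05]")
print(len(vec), vec)
```

In the loop above this would be applied to the full `row[2]`, not to the 100-character slice, since the slice cuts the literal off mid-number.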
The data was read back correctly, so drop the database we created:
!dropdb -f testdb
References:
*1: https://qiita.com/tnagata/items/7e6ae9956bdcaf167d94
*2: https://qiita.com/tnagata/items/c4a08d868b838e3bb3ea
*3: https://github.com/langchain-ai/langchain-postgres/blob/main/examples/vectorstore.ipynb