
Trying out langchain-postgres and psycopg3

Python environment:

(myenv) ~/myenv $pip -V

pip 24.2 from /Users/tn/myenv/lib/python3.12/site-packages/pip (python 3.12)

(myenv) ~/myenv $python -V

Python 3.12.6

I ran the code below in VS Code (see *1 for installing and registering ipykernel, etc.). The vector extension (pgvector) also has to be installed on the PostgreSQL server beforehand (see *2 for how to install it on PostgreSQL 16).

First, create a database on the already-running PostgreSQL server:

!createdb sampledb1

Install psycopg3:

!pip install psycopg  # note: the package name is psycopg, not psycopg3

Initialize the vector store (most of the code and data are borrowed from *3; the embedding model is an arbitrary choice):

from langchain_postgres import PGVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings

embedding = OllamaEmbeddings(
    model="bge-m3"
)

# no password is set
connection = "postgresql+psycopg://tn@localhost:5432/sampledb1"
collection_name = "my_docs"

vectorstore = PGVector(
    embeddings=embedding,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
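The `connection` string above is a SQLAlchemy-style URL: the `+psycopg` part of the scheme tells SQLAlchemy (which langchain-postgres uses internally) to use the psycopg 3 driver. `psycopg.connect` itself expects a plain libpq connection string or a `postgresql://` URI, so the driver suffix has to be dropped before reusing the URL there. A minimal stdlib-only sketch; `to_libpq_uri` is a hypothetical helper, not part of either library:

```python
from urllib.parse import urlsplit, urlunsplit


def to_libpq_uri(sqlalchemy_url: str) -> str:
    """Drop the '+driver' suffix from a SQLAlchemy-style URL (hypothetical helper)."""
    parts = urlsplit(sqlalchemy_url)
    # "postgresql+psycopg" -> "postgresql": libpq only understands the base scheme
    base_scheme = parts.scheme.split("+", 1)[0]
    return urlunsplit(parts._replace(scheme=base_scheme))


connection = "postgresql+psycopg://tn@localhost:5432/sampledb1"
parts = urlsplit(connection)
print(parts.scheme, parts.username, parts.hostname, parts.port)
print(to_libpq_uri(connection))  # postgresql://tn@localhost:5432/sampledb1
```

The result names the same user and database as the `dbname=sampledb1 user=tn` string passed to psycopg later in this article.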

Prepare the data (slightly extended and modified from *3):

docs = [
    Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
    Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
    Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
    Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
    Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
    Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
    Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
    Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
    Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
    Document(page_content='there are tigers in the yard', metadata={"id": 10, "location": "zoo", "topic": "animals"}),
    Document(page_content='there are dogs in the backyard', metadata={"id": 11, "location": "my home", "topic": "animals"})
]

Write the documents to the db:

vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])
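The `id` column of langchain_pg_embedding is `character varying`, and the search results in this article come back with string ids such as `Document(id='10', ...)`. Because each id identifies one row, it can be worth converting the metadata ids to strings yourself and checking for duplicates before calling `add_documents`. A minimal sketch; `FakeDocument` and `ids_from_metadata` are hypothetical names, and with the real `Document` objects you would call `vectorstore.add_documents(docs, ids=ids_from_metadata(docs))`:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for langchain_core.documents.Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def ids_from_metadata(docs, key="id"):
    """Return the metadata ids as strings; fail fast on missing or duplicate values."""
    ids = [str(doc.metadata[key]) for doc in docs]  # KeyError if a document lacks the key
    seen, duplicates = set(), set()
    for doc_id in ids:
        if doc_id in seen:
            duplicates.add(doc_id)
        seen.add(doc_id)
    if duplicates:
        raise ValueError(f"duplicate ids: {sorted(duplicates)}")
    return ids


sample = [
    FakeDocument("there are cats in the pond", {"id": 1, "topic": "animals"}),
    FakeDocument("ducks are also found in the pond", {"id": 2, "topic": "animals"}),
]
print(ids_from_metadata(sample))  # ['1', '2']
```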

As a bonus, some search examples, starting with similarity_search_with_score:

results = vectorstore.similarity_search_with_score(query="lion",k=5)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.457508] there are tigers in the yard [{'id': 10, 'topic': 'animals', 'location': 'zoo'}]
* [SIM=0.494071] there are dogs in the backyard [{'id': 11, 'topic': 'animals', 'location': 'my home'}]
* [SIM=0.540048] ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.541976] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.557055] the book club meets at the library [{'id': 8, 'topic': 'reading', 'location': 'library'}]
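About the numbers above: assuming PGVector's default cosine distance strategy, the score returned by similarity_search_with_score is pgvector's cosine distance (1 minus the cosine similarity), not a similarity. Smaller means closer, and the results come back in ascending order, so the `SIM` label in the print is a little misleading. Also, in the format spec `3f` the 3 is a minimum field width, not the number of decimals, which is why six decimals are printed; `.3f` gives three. A small stdlib sketch of both points on made-up vectors; `cosine_distance` is a hypothetical helper, not a library function:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors (real bge-m3 embeddings are much longer)
query = [1.0, 0.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0, 0.0],
    "45 degrees apart": [1.0, 1.0, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
}

# Ascending distance = most similar first, the same ordering as the results above
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
for name in ranked:
    d = cosine_distance(query, candidates[name])
    # ":3f" only sets a minimum width of 3 (six decimals are still printed);
    # ":.3f" rounds to three decimal places
    print(f"{name}: {d:.3f} (with :3f -> {d:3f})")
```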

Whether these are really ordered by closeness to "lion" is debatable (smaller scores are closer).
Applying a filter:

vectorstore.similarity_search('lion', k=5, filter={
    'topic': { "$eq": 'animals'}
})

[Document(id='10', metadata={'id': 10, 'topic': 'animals', 'location': 'zoo'}, page_content='there are tigers in the yard'),
Document(id='11', metadata={'id': 11, 'topic': 'animals', 'location': 'my home'}, page_content='there are dogs in the backyard'),
Document(id='2', metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond'),
Document(id='1', metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
As expected, only the four documents whose topic is "animals" are returned, even though k=5.
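The filter restricts which rows are candidates before the k nearest are picked, so asking for k=5 still yields only the four "animals" documents. As I understand it, langchain-postgres turns the filter into SQL conditions on the JSONB `cmetadata` column; the sketch below is not that implementation, just an in-memory emulation of the `$eq` / `$ne` / `$in` semantics on the article's sample metadata. `matches` is a hypothetical helper:

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Evaluate a tiny subset of the filter syntax ($eq, $ne, $in) against one dict."""
    for field_name, condition in flt.items():
        value = metadata.get(field_name)
        for op, operand in condition.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$ne":
                ok = value != operand
            elif op == "$in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator in this sketch: {op}")
            if not ok:
                return False
    return True


# (id, location, topic) of the 11 sample documents defined earlier
rows = [(1, "pond", "animals"), (2, "pond", "animals"), (3, "market", "food"),
        (4, "market", "food"), (5, "museum", "art"), (6, "museum", "art"),
        (7, "Main Street", "food"), (8, "library", "reading"),
        (9, "library", "reading"), (10, "zoo", "animals"), (11, "my home", "animals")]
metadatas = [{"id": i, "location": loc, "topic": t} for i, loc, t in rows]

animal_ids = [m["id"] for m in metadatas if matches(m, {"topic": {"$eq": "animals"}})]
print(animal_ids)  # [1, 2, 10, 11]: at most 4 rows can be returned, whatever k is
```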

Back to the main topic: check that the tables were created in the db:

(myenv) ~ $psql -h localhost -p 5432 -U tn -d sampledb1

psql (16.4)
Type "help" for help.

Show the tables generated by langchain:

sampledb1=# \dt

or, from Jupyter:

!psql -d sampledb1 -c "\dt"

               List of relations
 Schema |          Name           | Type  | Owner
--------+-------------------------+-------+-------
 public | langchain_pg_collection | table | tn
 public | langchain_pg_embedding  | table | tn
(2 rows)

Show the table structure:

sampledb1=# \d langchain_pg_embedding

or, from Jupyter:

!psql -d sampledb1 -c "\d langchain_pg_embedding"

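            Table "public.langchain_pg_embedding"
    Column     |       Type        | Collation | Nullable | Default
---------------+-------------------+-----------+----------+---------
 id            | character varying |           | not null |
 collection_id | uuid              |           |          |
 embedding     | vector            |           |          |
 document      | character varying |           |          |
 cmetadata     | jsonb             |           |          |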

The data is written to langchain_pg_embedding, so let's read it back from there. Just to be safe, first click the "Restart" button of the Jupyter kernel in VS Code, then:

!pip install psycopg

The following code is based on *4.

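import psycopg

# Connection string form; see Note 1 for the keyword-argument form
with psycopg.connect("dbname=sampledb1 user=tn") as conn:
    with conn.cursor() as cur:
        # Columns listed explicitly, in table order
        cur.execute(
            "SELECT id, collection_id, embedding, document, cmetadata"
            " FROM langchain_pg_embedding"
        )
        for row in cur:
            # row[1] is the collection_id (printed as "uuid");
            # row[2] is the embedding, returned as text since no pgvector adapter
            # is registered, so [:100] shows its first 100 characters;
            # row[3] is the original text (the Document's page_content).
            formatted_output = f"id: {row[0]}\n" \
                               f"uuid: {row[1]}\n" \
                               f"page_content: {row[2][:100]}...\n" \
                               f"page_content(string): {row[3]}\n" \
                               f"metadata: {row[4]}\n"
            print(formatted_output)
# Leaving the with blocks closes the cursor and the connection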
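Indexing rows by position (`row[2]`, `row[3]`) makes it easy to mix up columns. psycopg can return rows with named fields through a row factory, e.g. `conn.cursor(row_factory=namedtuple_row)` after `from psycopg.rows import namedtuple_row`. The self-contained sketch below shows the same idea with a hand-written NamedTuple; `EmbeddingRow` and `describe` are hypothetical names, and the sample values are copied from the output below (embedding shortened):

```python
import uuid
from typing import NamedTuple


class EmbeddingRow(NamedTuple):
    """One row of langchain_pg_embedding, fields in table order (hypothetical class)."""
    id: str
    collection_id: uuid.UUID
    embedding: str  # vector column, received as text when no pgvector adapter is registered
    document: str
    cmetadata: dict


def describe(row: EmbeddingRow, preview: int = 100) -> str:
    """Format a row using field names instead of positional indexes."""
    return (f"id: {row.id}\n"
            f"collection_id: {row.collection_id}\n"
            f"embedding: {row.embedding[:preview]}...\n"
            f"document: {row.document}\n"
            f"metadata: {row.cmetadata}\n")


row = EmbeddingRow(
    "1",
    uuid.UUID("8259217a-01a8-4034-9442-2233ec98c7c7"),
    "[-0.041045193,0.009569716,-0.093480445]",
    "there are cats in the pond",
    {"id": 1, "topic": "animals", "location": "pond"},
)
print(describe(row))
```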

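Running it prints the following ("uuid" is the collection_id, so it is the same for every document in the collection):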
id: 1
uuid: 8259217a-01a8-4034-9442-2233ec98c7c7
page_content: [-0.041045193,0.009569716,-0.093480445,0.01990515,0.00062993786,-0.057626557,-0.02735376,-0.01263911...
page_content(string): there are cats in the pond
metadata: {'id': 1, 'topic': 'animals', 'location': 'pond'}

id: 2
uuid: 8259217a-01a8-4034-9442-2233ec98c7c7
page_content: [-0.0365837,0.0019632874,-0.0848764,-0.007010041,-0.028483586,-0.027577631,-1.8776509e-05,-0.0053111...
page_content(string): ducks are also found in the pond
metadata: {'id': 2, 'topic': 'animals', 'location': 'pond'}

id: 3
...
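Since the embedding arrives as text like `[-0.041045193,0.009569716,...]`, it has to be parsed before doing any arithmetic on it. A minimal stdlib sketch; `parse_vector_text` is a hypothetical helper. An alternative is the pgvector Python package, whose `pgvector.psycopg.register_vector(conn)` makes psycopg return vectors as NumPy arrays:

```python
def parse_vector_text(text: str) -> list[float]:
    """Parse pgvector's text representation '[x1,x2,...]' into a list of floats."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a pgvector literal: {text[:20]!r}")
    body = text[1:-1]
    return [float(x) for x in body.split(",")] if body else []


# A few components taken from the output above
vec = parse_vector_text("[-0.0365837,0.0019632874,-0.0848764,-1.8776509e-05]")
print(len(vec), vec)
```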

To dump the langchain_pg_embedding table from the db:

!pg_dump -U tn -h localhost -p 5432 -d sampledb1 -t langchain_pg_embedding -f langchain_pg_embedding_dump.sql
!cat langchain_pg_embedding_dump.sql
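The same dump can be driven from Python without going through a shell, which sidesteps quoting issues. A minimal sketch; `pg_dump_tables` is a hypothetical helper that uses the same pg_dump flags as the command above. `-t` can be repeated, so passing `["langchain_pg_collection", "langchain_pg_embedding"]` would include the collection table too. With `dry_run=True` (the default here) it only builds the command; set it to False to actually run pg_dump, which must then be on PATH:

```python
import shlex
import subprocess


def pg_dump_tables(dbname, tables, outfile, *, user="tn", host="localhost",
                   port=5432, dry_run=True):
    """Build (and optionally run) a pg_dump command that dumps only the given tables."""
    cmd = ["pg_dump", "-U", user, "-h", host, "-p", str(port), "-d", dbname]
    for table in tables:
        cmd += ["-t", table]  # -t may be repeated to include several tables
    cmd += ["-f", outfile]
    if not dry_run:
        # Runs without a shell; raises CalledProcessError if pg_dump fails
        subprocess.run(cmd, check=True)
    return cmd


cmd = pg_dump_tables("sampledb1", ["langchain_pg_embedding"],
                     "langchain_pg_embedding_dump.sql")
print(shlex.join(cmd))
```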

References:
*1: https://qiita.com/tnagata/items/a88febd0f8cea88e1be8
*2: https://qiita.com/tnagata/items/7e6ae9956bdcaf167d94
*3: https://github.com/langchain-ai/langchain-postgres/blob/main/examples/vectorstore.ipynb and https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html
*4: https://www.psycopg.org/psycopg3/docs/basic/usage.html

Note 1:

conn = psycopg.connect(dbname="sampledb1", host="localhost", port=5432, user="tn")

The connection can also be written this way, but unlike psycopg2, passing database=... here raises an error; the keyword must be dbname.
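The difference comes from how psycopg 3 handles keyword arguments: they are passed on to libpq as connection parameters, and libpq knows `dbname` but not `database` (psycopg2 accepted `database` as a deprecated alias). A stdlib-only sketch of that behaviour; `to_conninfo` and its keyword list are hypothetical and intentionally incomplete. psycopg itself ships `psycopg.conninfo.make_conninfo` for building such strings:

```python
# A small subset of libpq connection keywords (deliberately not exhaustive)
LIBPQ_KEYWORDS = {"host", "hostaddr", "port", "dbname", "user", "password",
                  "connect_timeout", "sslmode", "application_name"}


def _quote(value) -> str:
    """Quote a value the way libpq key=value strings require (spaces, quotes, backslashes)."""
    s = str(value)
    if s and not any(c in s for c in " '\\"):
        return s
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_conninfo(**kwargs) -> str:
    """Build a libpq 'key=value' string, rejecting keywords libpq would not accept."""
    unknown = sorted(set(kwargs) - LIBPQ_KEYWORDS)
    if unknown:
        raise ValueError(f"invalid connection option(s): {unknown}")
    return " ".join(f"{key}={_quote(val)}" for key, val in kwargs.items())


print(to_conninfo(dbname="sampledb1", host="localhost", port=5432, user="tn"))
# dbname=sampledb1 host=localhost port=5432 user=tn
```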
