0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

Postgresql@17でベクトル保存を確認する(WSL)

Posted at

2024/9/24に正式リリースされたPostgreSQL@17にベクトル拡張をインストールし、動作を確認します。*1のWSL編です。
環境:Windows 11 Home, WSL: Ubuntu 24.04

WSLにPostgreSQL@17をインストール

*2に従って:

sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
# あるいは
sudo apt install curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# 
sudo apt update  # ここでエラーが発生した場合、下記の[Error1]を参照
sudo apt -y install postgresql-17

postgresql17の起動と設定

sudo -i -u postgres
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64) ...
postgres@PavilionWin11:~$ psql
psql (17.0 (Ubuntu 17.0-1.pgdg24.04+1))
Type "help" for help.
# user postgresのパスワードを適当に設定する("postgres"に設定)
postgres=# \password postgres
Enter new password for user "postgres":
Enter it again:

postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \c testdb;
You are now connected to database "testdb" as user "postgres".
testdb=# create extension vector;
ERROR:  extension "vector" is not available
DETAIL:  Could not open extension control file "/usr/share/postgresql/17/extension/vector.control": No such file or directory.
HINT:  The extension must first be installed on the system where PostgreSQL is running.
testdb=# \q

上記のエラーに対処する

pgvectorのインストール

macOSと違ってmakeが不要なので非常に楽です

postgres@PavilionWin11:~$ su - your_user_name
Password:
sudo apt install postgresql-17-pgvector
Reading package lists... Done
...
Removing obsolete dictionary files:

sudo -i -u postgres
postgres@PavilionWin11:~$ psql
psql (17.0 (Ubuntu 17.0-1.pgdg24.04+1))
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create extension vector;
CREATE EXTENSION  # 無事成功
testdb=# \dt
Did not find any relations.
testdb=# \dx
                             List of installed extensions
  Name   | Version |   Schema   |                     Description
---------+---------+------------+------------------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
(2 rows)

とベクトル拡張が確認できる。

以下で、*1に習って、dbへの書き込みとその確認を行います。
予めollamaのインストールとembedding モデル"bge-m3"をollama pullしておきます(モデルは適当ですし、ollamaを使わずにOpenAIのものでも、もちろん結構です。その場合は適宜変更ください。私は貧しいのでLLMアプリにはローカルモデル一本鎗です :-) )
以下は、VScodeのJupyter notebookで確認してます。

データベースへの書き込み

psycopg3の名前はpsycopgなので注意ください

!pip install langchain-postgres psycopg langchain-ollama langchain

from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings

embedding = OllamaEmbeddings(
   model="bge-m3"
)

connection = "postgresql+psycopg://postgres:postgres@localhost:5432/testdb" 
collection_name = "my_docs"

vectorstore = PGVector(
   embeddings=embedding,
   collection_name=collection_name,
   connection=connection,
   use_jsonb=True,
)

# langchainが作成したテーブルの確認
!psql -d testdb -c "\dt"
               List of relations
Schema |          Name           | Type  | Owner 
--------+-------------------------+-------+-------
public | langchain_pg_collection | table | tn
public | langchain_pg_embedding  | table | tn
(2 rows)
!psql -d testdb -c "\d langchain_pg_embedding"               

Table "public.langchain_pg_embedding"
   Column     |       Type        | Collation | Nullable | Default 
---------------+-------------------+-----------+----------+---------
id            | character varying |           | not null | 
collection_id | uuid              |           |          | 
embedding     | vector            |           |          | 
document      | character varying |           |          | 
cmetadata     | jsonb             |           |          | 
...
# データは、*3のものでid:10を修正、id:11を追加した
docs = [
   Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
   Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
   Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
   Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
   Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
   Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
   Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
   Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
   Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
   Document(page_content='there are tigers in the yard', metadata={"id": 10, "location": "zoo", "topic": "animals"}),
   Document(page_content='there are dogs in the backyard', metadata={"id": 11, "location": "my home", "topic": "animals"})
]

# dbに書き込む
vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])

# オマケ
results = vectorstore.similarity_search_with_score(query="lion",k=5)
for doc, score in results:
   print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.457508] there are tigers in the yard [{'id': 10, 'topic': 'animals', 'location': 'zoo'}]
* [SIM=0.494071] there are dogs in the backyard [{'id': 11, 'topic': 'animals', 'location': 'my home'}]
* [SIM=0.540048] ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.541976] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.557055] the book club meets at the library [{'id': 8, 'topic': 'reading', 'location': 'library'}]

データベースからデータを取得する

メモリクリアのために、jupyterカーネルの再起動を行なって:

!pip install psycopg

import psycopg

conn = psycopg.connect("dbname=testdb user=postgres password=postgres") 
# ここでエラーが発生した場合、下記の[Error2]を参照
cur = conn.cursor()
cur.execute('select * from langchain_pg_embedding')
for row in cur:
    formatted_output = f"id: {row[0]}\n" \
                    f"uuid: {row[1]}\n" \
                    f"page_content: {row[2][:100]}...\n" \
                    f"page_content(string): {row[3]}\n" \
                    f"metadata: {row[4]}\n"
    print(formatted_output)
cur.close()
conn.close()

id: 1
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.041045193,0.009569716,-0.093480445,0.01990515,0.00062993786,-0.057626557,-0.02735376,-0.01263911...
page_content(string): there are cats in the pond
metadata: {'id': 1, 'topic': 'animals', 'location': 'pond'}

id: 2
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.0365837,0.0019632874,-0.0848764,-0.007010041,-0.028483586,-0.027577631,-1.8776509e-05,-0.0053111...
page_content(string): ducks are also found in the pond
metadata: {'id': 2, 'topic': 'animals', 'location': 'pond'}

id: 3
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [0.02102992,0.006635084,-0.05879326,-0.004522216,-0.0065848893,-0.03556204,0.009028944,0.03192697,0....
page_content(string): fresh apples are available at the market
metadata: {'id': 3, 'topic': 'food', 'location': 'market'}

id: 4
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.021443207,0.001050144,-0.07157226,0.009703566,9.951767e-05,0.0027933442,-0.01428185,0.008426342,...
page_content(string): the market also sells fresh oranges
metadata: {'id': 4, 'topic': 'food', 'location': 'market'}

id: 5
...
page_content: [-0.04939314,0.0038462288,-0.08330972,0.017653793,-0.023387564,0.011244474,0.02507997,0.012847753,-0...
page_content(string): there are dogs in the backyard
metadata: {'id': 11, 'topic': 'animals', 'location': 'my home'}

Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

エラーへの対処:
[Error1]:エラーが

"E: Conflicting values set for option Signed-By regarding source https://apt.postgresql.org/pub/repos/apt/ noble-pgdg: /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc != /usr/share/postgresql-common/pgdg/apt.postgresql.org.gpg
E: The list of sources could not be read."

のような場合の解決方法(*4による):

cd /etc/apt/sources.list.d
sudo rm pgdg.sources 
sudo rm pgdg.list 
cd
sudo apt update
OperationalError: connection failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user

である場合は、
/etc/postgresql/17/main/pg_hba.confの以下の1行を下記のように書き換える:

local   all             postgres          peer    -->
local   all             postgres          md5

そして、postgresqlをリブートする

参考情報:
*1: Postgresql@17でベクトル保存を確認する(Mac)https://qiita.com/tnagata/items/1b7d7fe3a54d4e82e468
*2: https://www.postgresql.org/download/linux/ubuntu/
*3: https://github.com/langchain-ai/langchain-postgres/blob/main/examples/vectorstore.ipynb
*4: https://dev.to/motouom_victor/apt-error-unable-to-use-apt-on-ubuntu-3i4p

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?