2024/9/24に正式リリースされたPostgreSQL@17にベクトル拡張をインストールし、動作を確認します。*1のWSL編です。
環境:Windows 11 Home, WSL: Ubuntu 24.04
WSLにPostgreSQL@17をインストール
*2に従って:
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
# あるいは
sudo apt install curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
#
sudo apt update # ここでエラーが発生した場合、下記の[Error1]を参照
sudo apt -y install postgresql-17
postgresql17の起動と設定
sudo -i -u postgres
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64) ...
postgres@PavilionWin11:~$ psql
psql (17.0 (Ubuntu 17.0-1.pgdg24.04+1))
Type "help" for help.
# user postgresのパスワードを適当に設定する("postgres"に設定)
postgres=# \password postgres
Enter new password for user "postgres":
Enter it again:
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \c testdb;
You are now connected to database "testdb" as user "postgres".
testdb=# create extension vector;
ERROR: extension "vector" is not available
DETAIL: Could not open extension control file "/usr/share/postgresql/17/extension/vector.control": No such file or directory.
HINT: The extension must first be installed on the system where PostgreSQL is running.
testdb=# \q
上記のエラーに対処する
pgvectorのインストール
macOSと違ってmakeが不要なので非常に楽です
postgres@PavilionWin11:~$ su - your_user_name
Password:
sudo apt install postgresql-17-pgvector
Reading package lists... Done
...
Removing obsolete dictionary files:
sudo -i -u postgres
postgres@PavilionWin11:~$ psql
psql (17.0 (Ubuntu 17.0-1.pgdg24.04+1))
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create extension vector;
CREATE EXTENSION # 無事成功
testdb=# \dt
Did not find any relations.
testdb=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
vector | 0.7.4 | public | vector data type and ivfflat and hnsw access methods
(2 rows)
とベクトル拡張が確認できる。
以下で、*1に習って、dbへの書き込みとその確認を行います。
予めollamaのインストールとembedding モデル"bge-m3"をollama pullしておきます(モデルは適当ですし、ollamaを使わずにOpenAIのものでも、もちろん結構です。その場合は適宜変更ください。私は貧しいのでLLMアプリにはローカルモデル一本鎗です :-) )
以下は、VScodeのJupyter notebookで確認してます。
データベースへの書き込み
psycopg3の名前はpsycopgなので注意ください
!pip install langchain-postgres psycopg langchain-ollama langchain
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
embedding = OllamaEmbeddings(
model="bge-m3"
)
connection = "postgresql+psycopg://postgres:postgres@localhost:5432/testdb"
collection_name = "my_docs"
vectorstore = PGVector(
embeddings=embedding,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
# langchainが作成したテーブルの確認
!psql -d testdb -c "\dt"
List of relations
Schema | Name | Type | Owner
--------+-------------------------+-------+-------
public | langchain_pg_collection | table | tn
public | langchain_pg_embedding | table | tn
(2 rows)
!psql -d testdb -c "\d langchain_pg_embedding"
Table "public.langchain_pg_embedding"
Column | Type | Collation | Nullable | Default
---------------+-------------------+-----------+----------+---------
id | character varying | | not null |
collection_id | uuid | | |
embedding | vector | | |
document | character varying | | |
cmetadata | jsonb | | |
...
# データは、*3のものでid:10を修正、id:11を追加した
docs = [
Document(page_content='there are cats in the pond', metadata={"id": 1, "location": "pond", "topic": "animals"}),
Document(page_content='ducks are also found in the pond', metadata={"id": 2, "location": "pond", "topic": "animals"}),
Document(page_content='fresh apples are available at the market', metadata={"id": 3, "location": "market", "topic": "food"}),
Document(page_content='the market also sells fresh oranges', metadata={"id": 4, "location": "market", "topic": "food"}),
Document(page_content='the new art exhibit is fascinating', metadata={"id": 5, "location": "museum", "topic": "art"}),
Document(page_content='a sculpture exhibit is also at the museum', metadata={"id": 6, "location": "museum", "topic": "art"}),
Document(page_content='a new coffee shop opened on Main Street', metadata={"id": 7, "location": "Main Street", "topic": "food"}),
Document(page_content='the book club meets at the library', metadata={"id": 8, "location": "library", "topic": "reading"}),
Document(page_content='the library hosts a weekly story time for kids', metadata={"id": 9, "location": "library", "topic": "reading"}),
Document(page_content='there are tigers in the yard', metadata={"id": 10, "location": "zoo", "topic": "animals"}),
Document(page_content='there are dogs in the backyard', metadata={"id": 11, "location": "my home", "topic": "animals"})
]
# dbに書き込む
vectorstore.add_documents(docs, ids=[doc.metadata['id'] for doc in docs])
# オマケ
results = vectorstore.similarity_search_with_score(query="lion",k=5)
for doc, score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.457508] there are tigers in the yard [{'id': 10, 'topic': 'animals', 'location': 'zoo'}]
* [SIM=0.494071] there are dogs in the backyard [{'id': 11, 'topic': 'animals', 'location': 'my home'}]
* [SIM=0.540048] ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.541976] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* [SIM=0.557055] the book club meets at the library [{'id': 8, 'topic': 'reading', 'location': 'library'}]
データベースからデータを取得する
メモリクリアのために、jupyterカーネルの再起動を行なって:
!pip install psycopg
import psycopg
conn = psycopg.connect("dbname=testdb user=postgres password=postgres")
# ここでエラーが発生した場合、下記の[Error2]を参照
cur = conn.cursor()
cur.execute('select * from langchain_pg_embedding')
for row in cur:
formatted_output = f"id: {row[0]}\n" \
f"uuid: {row[1]}\n" \
f"page_content: {row[2][:100]}...\n" \
f"page_content(string): {row[3]}\n" \
f"metadata: {row[4]}\n"
print(formatted_output)
cur.close()
conn.close()
id: 1
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.041045193,0.009569716,-0.093480445,0.01990515,0.00062993786,-0.057626557,-0.02735376,-0.01263911...
page_content(string): there are cats in the pond
metadata: {'id': 1, 'topic': 'animals', 'location': 'pond'}
id: 2
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.0365837,0.0019632874,-0.0848764,-0.007010041,-0.028483586,-0.027577631,-1.8776509e-05,-0.0053111...
page_content(string): ducks are also found in the pond
metadata: {'id': 2, 'topic': 'animals', 'location': 'pond'}
id: 3
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [0.02102992,0.006635084,-0.05879326,-0.004522216,-0.0065848893,-0.03556204,0.009028944,0.03192697,0....
page_content(string): fresh apples are available at the market
metadata: {'id': 3, 'topic': 'food', 'location': 'market'}
id: 4
uuid: 46dd887b-7d08-43f5-b89f-6e6650b8594c
page_content: [-0.021443207,0.001050144,-0.07157226,0.009703566,9.951767e-05,0.0027933442,-0.01428185,0.008426342,...
page_content(string): the market also sells fresh oranges
metadata: {'id': 4, 'topic': 'food', 'location': 'market'}
id: 5
...
page_content: [-0.04939314,0.0038462288,-0.08330972,0.017653793,-0.023387564,0.011244474,0.02507997,0.012847753,-0...
page_content(string): there are dogs in the backyard
metadata: {'id': 11, 'topic': 'animals', 'location': 'my home'}
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
エラーへの対処:
[Error1]:エラーが
"E: Conflicting values set for option Signed-By regarding source https://apt.postgresql.org/pub/repos/apt/ noble-pgdg: /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc != /usr/share/postgresql-common/pgdg/apt.postgresql.org.gpg
E: The list of sources could not be read."
のような場合の解決方法(*4による):
cd /etc/apt/sources.list.d
sudo rm pgdg.sources
sudo rm pgdg.list
cd
sudo apt update
OperationalError: connection failed: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: Peer authentication failed for user
である場合は、
/etc/postgresql/17/main/pg_hba.confの以下の1行を下記のように書き換える:
local all postgres peer -->
local all postgres md5
そして、postgresqlをリブートする
参考情報:
*1: Postgresql@17でベクトル保存を確認する(Mac)https://qiita.com/tnagata/items/1b7d7fe3a54d4e82e468
*2: https://www.postgresql.org/download/linux/ubuntu/
*3: https://github.com/langchain-ai/langchain-postgres/blob/main/examples/vectorstore.ipynb
*4: https://dev.to/motouom_victor/apt-error-unable-to-use-apt-on-ubuntu-3i4p