Using SQLModel as the ORM for a compound database built on the RDKit database cartridge

Last updated at 2025-05-11 / Posted at 2024-09-17

Introduction

I have published sample code that uses SQLModel as the ORM for a compound database built on the RDKit database cartridge (hereafter, the RDKit cartridge).

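Motivation

When integrating the RDKit cartridge with a Python web framework, many people probably reach for django-rdkit on Django.
That library is also introduced in the official documentation, which makes it a strong option.

However, I did not want a dependency on one particular library to narrow my choice of framework, so I looked into how to integrate the RDKit cartridge with SQLModel (and, by extension, SQLAlchemy), which FastAPI recommends.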

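Existing approaches

My search turned up razi as the only library that looked usable.
razi is a library for working with the RDKit cartridge from SQLAlchemy, so I expected it to work with SQLModel as well, since SQLModel is built on SQLAlchemy.
It did work in part, but it is not well maintained: some features are missing, there are bugs, and the documentation has gaps, so I decided against adopting it.
(Overriding parts of it to force it to work was possible, but that would hurt maintainability, and I was also wary of the risk of adopting an immature library.)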

Environment

PostgreSQL: 16.2
RDKit: 2024.03.1
Python: 3.10
SQLModel: 0.0.22

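Implementation

The types and functions that correspond to the RDKit cartridge are defined with SQLAlchemy's UserDefinedType and GenericFunction.

For the implementation I referred to razi and django-rdkit (models/fields.py).

The Mol type

This section walks through excerpts of the implementation of the mol type, which corresponds to an RDKit molecule, covering insertion, retrieval, and comparison (see models.py for the details).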

class Mol(UserDefinedType):
    cache_ok = True

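Insertion

get_col_spec : declares mol as the column type
bind_processor : converts an RDKit Chem.Mol object or a SMILES string into binary form
bind_expression : wraps the bound value in the mol_from_pkl function so that the database converts the binary data into a mol value (e.g. INSERT INTO ... VALUES (mol_from_pkl(...)) )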

class Mol(UserDefinedType):
    #...
    def get_col_spec(self, **kw):
        return "mol"

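    def bind_processor(self, dialect):
        def process(value):
            # Accept a SMILES string or an RDKit molecule and serialize it
            # to RDKit's binary format.
            if isinstance(value, str):
                mol = Chem.MolFromSmiles(value)
                if mol is None:
                    # MolFromSmiles returns None when the SMILES cannot be parsed
                    raise ValueError(f"Invalid SMILES: {value!r}")
                value = mol
            if isinstance(value, Chem.Mol):
                value = memoryview(value.ToBinary())
            return value

        return process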

    def bind_expression(self, bindvalue):
        return mol_from_pkl(bindvalue)

class mol_from_pkl(GenericFunction):
    name = "mol_from_pkl"
    type = Mol()
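
To see what bind_expression does to the generated SQL, the following minimal sketch compiles an INSERT against the PostgreSQL dialect without connecting to a database. It is a trimmed stand-in for the type above, not the code in models.py: the `compound` table and the placeholder bytes are assumptions made for illustration, bind_processor is left out so that RDKit is not needed, and `inherit_cache = True` is added only so that SQLAlchemy does not warn about statement caching.

```python
from sqlalchemy import Column, Integer, MetaData, Table, insert
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql.functions import GenericFunction
from sqlalchemy.types import UserDefinedType


class Mol(UserDefinedType):
    """Trimmed stand-in for the Mol type (no RDKit required)."""

    cache_ok = True

    def get_col_spec(self, **kw):
        return "mol"

    def bind_expression(self, bindvalue):
        # Wrap every bound parameter of this type in mol_from_pkl(...)
        return mol_from_pkl(bindvalue)


class mol_from_pkl(GenericFunction):
    name = "mol_from_pkl"
    type = Mol()
    inherit_cache = True


# Hypothetical table used only for this illustration
metadata = MetaData()
compound = Table(
    "compound",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("molecule", Mol),
)

# Compile only; nothing is sent to a database
stmt = insert(compound).values(molecule=b"placeholder-bytes")
insert_sql = str(stmt.compile(dialect=postgresql.dialect()))
print(insert_sql)
```

The printed statement should show the molecule parameter wrapped in mol_from_pkl(...). That is the SQL-side half of the conversion; bind_processor supplies the Python-side half by producing the bytes that fill the parameter.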

Retrieval

column_expression : uses the mol_to_pkl function to fetch the molecule from the database in binary form (e.g. SELECT mol_to_pkl(molecule) )
result_processor : converts the fetched binary data into an RDKit Chem.Mol object

class Mol(UserDefinedType):
    #...
    def column_expression(self, colexpr):
        return mol_to_pkl(colexpr, type_=self)

    def result_processor(self, dialect, coltype):
        def process(value):
            if value is None:
                return value
            return Chem.Mol(bytes(value))

        return process

class mol_to_pkl(GenericFunction):
    name = "mol_to_pkl"
    type = postgresql.BYTEA()
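
The effect of column_expression can be checked the same way, by compiling a SELECT without a database connection. This is a sketch under assumptions: the `compound` table is hypothetical, result_processor is omitted so that RDKit is not required, and `inherit_cache = True` is an addition to avoid SQLAlchemy's caching warning.

```python
from sqlalchemy import Column, Integer, MetaData, Table, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql.functions import GenericFunction
from sqlalchemy.types import UserDefinedType


class Mol(UserDefinedType):
    """Trimmed stand-in for the Mol type (no RDKit required)."""

    cache_ok = True

    def get_col_spec(self, **kw):
        return "mol"

    def column_expression(self, colexpr):
        # Applied when a Mol column appears in the SELECT list
        return mol_to_pkl(colexpr, type_=self)


class mol_to_pkl(GenericFunction):
    name = "mol_to_pkl"
    type = postgresql.BYTEA()
    inherit_cache = True


# Hypothetical table used only for this illustration
metadata = MetaData()
compound = Table(
    "compound",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("molecule", Mol),
)

# Compile only; nothing is sent to a database
stmt = select(compound.c.molecule)
select_sql = str(stmt.compile(dialect=postgresql.dialect()))
print(select_sql)
```

The column in the SELECT list should come out as mol_to_pkl(compound.molecule). Passing `type_=self` keeps the result typed as Mol, so in the full implementation result_processor still runs on the returned bytes.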

Comparison

comparator_factory : implements comparisons that use custom operators (e.g. @> and <@ for substructure search)

class Mol(UserDefinedType):
    #...
    class comparator_factory(UserDefinedType.Comparator):
        def hassubstruct(self, other):
            return self.operate(
                operators.custom_op("@>"), other, result_type=sqltypes.Boolean
            )

        def issubstruct(self, other):
            return self.operate(
                operators.custom_op("<@"), other, result_type=sqltypes.Boolean
            )

        def __eq__(self, other):
            return self.operate(
                operators.custom_op("@="), other, result_type=sqltypes.Boolean
            )
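
As a rough check of how such a comparator method renders, the sketch below compiles a WHERE clause that uses hassubstruct. The `compound` table and the query SMILES are assumptions for illustration, and the type is cut down to the comparator alone, so the right-hand side appears as a bare parameter here. With the bind_expression from the insertion section in place, it would additionally be wrapped in mol_from_pkl(...).

```python
from sqlalchemy import Column, Integer, MetaData, Table, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import operators, sqltypes
from sqlalchemy.types import UserDefinedType


class Mol(UserDefinedType):
    """Trimmed stand-in for the Mol type: comparator only."""

    cache_ok = True

    def get_col_spec(self, **kw):
        return "mol"

    class comparator_factory(UserDefinedType.Comparator):
        def hassubstruct(self, other):
            # Renders as: <column> @> <other>, typed as a boolean expression
            return self.operate(
                operators.custom_op("@>"), other, result_type=sqltypes.Boolean
            )


# Hypothetical table used only for this illustration
metadata = MetaData()
compound = Table(
    "compound",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("molecule", Mol),
)

# Compile only; nothing is sent to a database
stmt = select(compound.c.id).where(compound.c.molecule.hassubstruct("c1ccccc1"))
where_sql = str(stmt.compile(dialect=postgresql.dialect()))
print(where_sql)
```

Because the method is defined on the type's comparator, it is available directly on any column of that type, which is what makes `Compound.molecule.hassubstruct(...)` possible on the model.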

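Model

Define a model that contains a field of the Mol type. The excerpt shows only the id and molecule fields; the name and mfp2 fields used in the usage example are not shown here.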

class Compound(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    molecule: Mol = Field(sa_type=Mol)
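
To illustrate what a column of this type turns into at the DDL level, here is a small sketch that prints the CREATE TABLE statement for a comparable table. It uses a plain SQLAlchemy Core Table as a stand-in for the SQLModel class, which is an assumption made to keep the example free of a database and of RDKit. The GiST index is also an addition that the article does not cover: the RDKit cartridge documentation builds such an index on mol columns to speed up substructure searches, and the index name here is hypothetical.

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex, CreateTable
from sqlalchemy.types import UserDefinedType


class Mol(UserDefinedType):
    """Trimmed stand-in for the Mol type: column spec only."""

    cache_ok = True

    def get_col_spec(self, **kw):
        return "mol"


# Core Table standing in for the SQLModel class (assumption for illustration)
metadata = MetaData()
compound = Table(
    "compound",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("molecule", Mol),
)

# Optional GiST index on the mol column; the name is hypothetical
molecule_index = Index(
    "ix_compound_molecule", compound.c.molecule, postgresql_using="gist"
)

# Compile only; nothing is sent to a database
pg = postgresql.dialect()
table_ddl = str(CreateTable(compound).compile(dialect=pg))
index_ddl = str(CreateIndex(molecule_index).compile(dialect=pg))
print(table_ddl)
print(index_ddl)
```

The table DDL shows the column declared as `molecule mol`, which is the string returned by get_col_spec. The mol type itself only exists once the rdkit extension has been created in the target database.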

Usage example

The steps below are excerpted from the README to give a quick look at how it works.

from sqlmodel import create_engine, select, Session

from models import Compound, morganbv_fp
from molecules import SMILES_SAMPLE

engine = create_engine(
    "postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres"
)
session = Session(engine)

# Create the table
Compound.__table__.create(engine)

# Insert test data
for i, smiles in enumerate(SMILES_SAMPLE):
    compound = Compound(
        name=f"Compound {i}",
        molecule=smiles,
        mfp2=morganbv_fp(smiles)
    )
    session.add(compound)
session.commit()

# Substructure search
statement = select(Compound).where(Compound.molecule.hassubstruct("C1=C(C)C=CC=C1"))
session.exec(statement).all()

# Similarity search
smiles = "CCN1c2ccccc2Sc2ccccc21"
statement = select(Compound).where(Compound.mfp2.tanimoto_sml(morganbv_fp(smiles)))
session.exec(statement).all()
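
The similarity search compares fingerprints by Tanimoto similarity. As a reminder of what that score measures, here is a plain-Python sketch that treats a fingerprint as a set of on-bit indices. The toy sets and the `tanimoto` helper are assumptions for illustration only; the cartridge computes the score inside PostgreSQL on its own fingerprint types, not with this code.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch when both fingerprints are empty
        return 0.0
    return len(a & b) / union


# Toy fingerprints: two bits in common out of four distinct bits
fp_query = {1, 2, 3}
fp_target = {2, 3, 4}

score = tanimoto(fp_query, fp_target)
print(score)  # 0.5
```

A score of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none.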

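Conclusion

This article showed how to integrate the RDKit cartridge with SQLModel.
With this approach, FastAPI and other frameworks, not just Django, become options when working with the RDKit cartridge, which allows for more flexible development.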
