
Installing DiffDock

Posted at 2022-10-25

A docking tool based on a diffusion model. It raised the success rate of docking ligands to protein structures from the previous best of 23% to 38% in a single step (this is a rough summary).

Original paper: https://arxiv.org/abs/2210.01776
GitHub: https://github.com/gcorso/DiffDock

Machine environment

Confirmed on Ubuntu 22.04.1 with an NVIDIA GPU (3070 Ti), and on CentOS 7.
Root privileges are not required. Miniconda is used.

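Installation

Using Terrari's article on building a container for running DiffDock as a reference, this post attempts to install DiffDock without Singularity. The official procedure sometimes fails with errors and does not run, so this post also serves as a supplement to it.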

## Choose any directory as the installation target
INSTALLDIR=/home/apps

## After that, copying and pasting the commands below is enough
# Download the DiffDock repository from GitHub into INSTALLDIR
mkdir -p ${INSTALLDIR}
cd ${INSTALLDIR}
git clone https://github.com/gcorso/DiffDock.git
cd DiffDock
# Create a virtual environment for DiffDock with Miniconda
DIFFDOCK_HOME="${INSTALLDIR}/DiffDock"
cd ${DIFFDOCK_HOME}
wget -q -P . https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p ${DIFFDOCK_HOME}/conda
rm Miniconda3-latest-Linux-x86_64.sh
. "${DIFFDOCK_HOME}/conda/etc/profile.d/conda.sh"
export PATH="${DIFFDOCK_HOME}/conda/condabin:${PATH}"
conda update -n base -c defaults conda -y
# Create the diffdock environment from the environment.yml in the repository
conda env create -f environment.yml

# Run everything below with the diffdock virtual environment activated
conda activate diffdock

# Clone esm from GitHub into ${DIFFDOCK_HOME} and install it
cd ${DIFFDOCK_HOME}
git clone https://github.com/facebookresearch/esm
cd ${DIFFDOCK_HOME}/esm
python3 -m pip install -e .
# Download the model parameters. They are downloaded automatically on first use, but you can fetch them in advance.
mkdir -p ${DIFFDOCK_HOME}/esm/model_weights/.cache/torch/hub/checkpoints
cd ${DIFFDOCK_HOME}/esm/model_weights/.cache/torch/hub/checkpoints
wget https://dl.fbaipublicfiles.com/fair-esm/models/esm2_t33_650M_UR50D.pt
wget https://dl.fbaipublicfiles.com/fair-esm/regression/esm2_t33_650M_UR50D-contact-regression.pt

## Everything up to here follows the official instructions, but the libraries seem to need reinstalling once with pip before it works.
# I think it will not run unless the torch-related libraries are reinstalled with pip.
# https://github.com/gcorso/DiffDock/issues/1#issuecomment-1268823855
python3 -m pip uninstall torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -y
python3 -m pip install -U torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu113.html

# On CentOS 7, if you get "GLIBC 2.27 not found", uninstall torch-spline-conv (skip this step otherwise).
# https://github.com/pyg-team/pytorch_geometric/issues/3593#issuecomment-1169504495
python3 -m pip uninstall torch-spline-conv -y

The required packages end up installed under DiffDock/conda/envs/diffdock/lib/python3.9/site-packages.
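
For the CentOS 7 note above, it can help to check the system glibc version before deciding whether torch-spline-conv needs to be removed. The following is a minimal sketch, not part of the official procedure: `version_ge` is a hypothetical helper name, and the 2.27 threshold is simply taken from the error message quoted above. It only prints a suggestion and changes nothing.

```bash
# Succeeds (exit 0) when version $1 >= version $2, comparing "major.minor".
# version_ge is a helper made up for this sketch.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 > y[1] + 0)
        exit !(x[2] + 0 >= y[2] + 0)
    }'
}

# "getconf GNU_LIBC_VERSION" prints something like "glibc 2.35" on glibc systems.
glibc_version="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}' || true)"

if [ -n "${glibc_version}" ] && ! version_ge "${glibc_version}" 2.27; then
    echo "glibc ${glibc_version} is older than 2.27."
    echo "If import fails with 'GLIBC 2.27 not found', try:"
    echo "  python3 -m pip uninstall torch-spline-conv -y"
fi
```

On a machine with a newer glibc, such as Ubuntu 22.04, this should print nothing, in which case the uninstall step can be skipped.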

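Example run script

Create the run_diffdock.sh script below, then create an input directory in the same location as the script and place two files in it: 7vu6_A_noHET.pdb, the structure file for chain A of PDB: 7VU6 with the ligand removed, and ensitrelvir_CID_162533924.sdf, the ligand in SDF format.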
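
If you do not yet have a ligand-free, single-chain PDB file, one way to produce it is to filter the fixed-column PDB records. This is a rough sketch under my own assumptions: `extract_chain_atoms` is a hypothetical helper, it keeps only ATOM records whose chain ID (column 22) matches, and it drops all HETATM records, which also removes waters and any modified residues stored as HETATM.

```bash
# Print only the ATOM records of one chain from a PDB file, followed by END.
# Usage: extract_chain_atoms <pdb file> <chain id>
extract_chain_atoms() {
    awk -v chain="$2" '
        # Columns 1-6 hold the record name, column 22 the chain identifier.
        substr($0, 1, 6) == "ATOM  " && substr($0, 22, 1) == chain { print }
        END { print "END" }
    ' "$1"
}

# Hypothetical usage, assuming 7vu6.pdb was downloaded beforehand:
#   mkdir -p input
#   extract_chain_atoms 7vu6.pdb A > input/7vu6_A_noHET.pdb
```

Check the result in a viewer before docking, since a simple column filter like this does not handle alternate locations or missing residues.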

run_diffdock.sh

#!/bin/bash

# Set this to the location where DiffDock was installed in the steps above.
DIFFDOCK_HOME="/home/apps/DiffDock"

# Use the directory containing run_diffdock.sh as the base for everything below
INFERENCE_DIR="$(cd "$(dirname "$0")" && pwd)"

####
# Locations of the protein and ligand input files (edit for each run)
protein_path="${INFERENCE_DIR}/input/7vu6_A_noHET.pdb"
ligand_path="${INFERENCE_DIR}/input/ensitrelvir_CID_162533924.sdf"
####

. "${DIFFDOCK_HOME}/conda/etc/profile.d/conda.sh"
export PATH="${DIFFDOCK_HOME}/conda/condabin:${PATH}"
conda activate diffdock
export PYTHONPATH="${DIFFDOCK_HOME}"
export PYTHONPATH="$PYTHONPATH:${DIFFDOCK_HOME}/esm"

# Prepare the .fasta used by ESM2 (generated from the input protein PDB)
mkdir -p ${INFERENCE_DIR}/esm2_output
python3 ${DIFFDOCK_HOME}/datasets/esm_embedding_preparation.py \
    --protein_path ${protein_path} \
    --out_file ${INFERENCE_DIR}/esm2_output/prepared_for_esm.fasta

# Generate the protein embeddings with ESM2
python3 ${DIFFDOCK_HOME}/esm/scripts/extract.py \
    ${DIFFDOCK_HOME}/esm/model_weights/.cache/torch/hub/checkpoints/esm2_t33_650M_UR50D.pt \
    ${INFERENCE_DIR}/esm2_output/prepared_for_esm.fasta \
    ${INFERENCE_DIR}/esm2_output \
    --repr_layers 33 --include per_tok

# Run DiffDock
python3 -m inference \
    --model_dir ${DIFFDOCK_HOME}/workdir/paper_score_model \
    --ckpt best_ema_inference_epoch_model.pt \
    --confidence_model_dir ${DIFFDOCK_HOME}/workdir/paper_confidence_model \
    --confidence_ckpt best_model_epoch75.pt \
    --protein_path ${protein_path} \
    --ligand ${ligand_path} \
    --out_dir ${INFERENCE_DIR}/results \
    --esm_embeddings_path ${INFERENCE_DIR}/esm2_output \
    --inference_steps 20 \
    --save_visualisation \
    --samples_per_complex 40 \
    --batch_size 10 \
    --cache_path ${INFERENCE_DIR}/cache/cache
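
Because the ESM2 and DiffDock steps take a while, it may be worth failing early when an input file is missing. A small guard like the following could go near the top of the script, right after protein_path and ligand_path are set. `require_files` is a hypothetical helper name of my own, not something DiffDock provides.

```bash
# Return non-zero with a message if any given file is missing or empty.
require_files() {
    for f in "$@"; do
        if [ ! -s "${f}" ]; then
            echo "ERROR: input file not found or empty: ${f}" >&2
            return 1
        fi
    done
}

# Hypothetical usage inside run_diffdock.sh:
#   require_files "${protein_path}" "${ligand_path}" || exit 1
```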

Incidentally, it apparently also accepts mol2 format, and can even run with a SMILES string written directly as the ligand. See the official explanation on GitHub for details.

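The calculation finishes in roughly 5 to 10 minutes, and the ranked results appear in the results directory. In general, use the rank1 result: load the input PDB file (here 7vu6_A_noHET.pdb) together with rank1_reverseprocess.pdb in PyMOL or a similar viewer. For the number shown in rank1_confidence, the closer it is to 0, the more plausible the pose (the scores are usually negative, and a higher score means higher confidence).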
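
To compare the confidence values without opening each file, the numbers can be pulled out of the file names. This is a sketch that assumes the ranked SDF files are named like rank1_confidence-0.56.sdf; adjust the pattern if your output looks different. `list_confidences` is a hypothetical helper name.

```bash
# Print "<rank><TAB><confidence><TAB><path>" for every ranked SDF found under
# a results directory, sorted by rank.
# ASSUMPTION: files are named rank<N>_confidence<value>.sdf
list_confidences() {
    find "$1" -name 'rank*_confidence*.sdf' | while IFS= read -r path; do
        name="$(basename "${path}")"
        rank="${name#rank}"            # strip the leading "rank"
        rank="${rank%%_*}"             # keep only the number before "_"
        conf="${name#*_confidence}"    # strip everything up to "_confidence"
        conf="${conf%.sdf}"            # strip the extension
        printf '%s\t%s\t%s\n' "${rank}" "${conf}" "${path}"
    done | sort -n -k1,1
}

# Hypothetical usage:
#   list_confidences ./results | head -n 5
```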
