Installing AlphaFold3 on Fedora 40

Last updated at 2025-03-25. Posted at 2025-03-24.

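Overview

I first tried to install AlphaFold3 on Ubuntu, which I am not used to, and failed, so I switched to Fedora, which I use every day. On Fedora 40, AlphaFold3 runs without errors on the default Python 3.12.8 installed through dnf, which keeps the setup simple. This article is my installation log for AlphaFold3 on Fedora 40.

I have kept every step from the very beginning, including the OS installation and the NVIDIA driver installation. The article doubles as my own notes for setting up a new PC.

I mostly followed the GitHub repository below, but several things changed between Fedora 39 and Fedora 40. Incidentally, the installation did not work on Fedora 41.
Alphafold3-Fedora-install

I also referred to this excellent article by a far more experienced author.
AlphaFold3 (ver. 3.0.1) インストール [AlphaFold3 (ver. 3.0.1) Installation]

I also comment on how to create a user-provided CCD and on a problem with RDKit.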

Test environment

CPU: Intel Core i9-7960X
GPU: RTX 4080 SUPER
NVIDIA driver: 570.133.07
Python: 3.12.8
CUDA: 12.8
gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3)
OS: Fedora 40 Server Edition
Storage: 4 TB M.2 SSD (not an HDD)
RAM: 32 GB
AlphaFold 3.0.0

Installing Fedora

Download Fedora 40 Server Edition from the RIKEN mirror site and create the installation media.

wget https://ftp.riken.jp/Linux/fedora/releases/40/Server/x86_64/iso/Fedora-Server-dvd-x86_64-40-1.14.iso

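Use balenaEtcher to write the ISO to the installation media.

The Fedora 40 Server Edition installer allocates only a minimal amount of disk space by default, so be sure to specify the disk space manually. Otherwise there will not be enough room to store the AlphaFold3 databases.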

sudo dnf update -y
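
Before downloading the databases, it may help to confirm that the target filesystem really has the space you allocated. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function name `has_enough_space` and the 1 TiB threshold are illustrative assumptions, not figures from the AlphaFold3 documentation.

```python
import shutil


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    return usage.free >= required_bytes


# Hypothetical threshold: replace it with the size you actually need.
required = 1 * 1024**4  # 1 TiB, an assumed placeholder
print(has_enough_space("/", required))
```

Pointing the check at the directory where the databases will live (for example `/opt` in this article) tests the correct filesystem when `/opt` is on its own volume.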

Environment setup

On Fedora 40, the root password is not set by default, so the su command cannot be used. Set a password first with the following command.

sudo passwd root

Install the following packages.

sudo dnf groupinstall "Development Tools" -y
sudo dnf install cmake-data -y
sudo dnf install gcc-c++ cmake -y
sudo dnf install boost-devel -y
sudo dnf install zstd -y
sudo dnf install python -y
sudo dnf install pip -y
python -m pip install --upgrade pip
python -m pip install matplotlib
python -m pip install pandas
sudo dnf install boost-numpy3 python3-numpy python3-numpy-f2py python3-numpy-doc -y

Installing the NVIDIA driver and CUDA

sudo dnf install kernel-devel kernel-headers dkms -y
sudo dnf install xorg-x11-server-devel libglvnd-devel pkgconf-pkg-config acpid -y
cd /opt
sudo wget https://jp.download.nvidia.com/XFree86/Linux-x86_64/570.133.07/NVIDIA-Linux-x86_64-570.133.07.run
sudo chmod 755 NVIDIA-Linux-x86_64-570.133.07.run
# Reboot once here: sudo reboot
sudo ./NVIDIA-Linux-x86_64-570.133.07.run

My impression is that the right NVIDIA driver version depends on the hardware.

After rebooting, check the result with nvidia-smi.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:B3:00.0 Off |                  N/A |
| 31%   34C    P0             36W /  320W |       0MiB /  16376MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Next, install CUDA 12.8.

# Install CUDA
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
sudo dnf -y install cuda-toolkit-12-8

# Add the following lines to .bashrc to set the paths
# Add the CUDA Toolkit to PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Next, reload .bashrc and check that nvcc is on the path
source ~/.bashrc
nvcc --version

# The output should look like this
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
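
If you want to check the toolkit version from a script rather than by eye, the release number can be pulled out of the `nvcc --version` text. This is a small sketch; the helper name `parse_nvcc_release` is made up for illustration, and the sample text is a shortened copy of the output shown above.

```python
import re
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Extract the CUDA release (for example '12.8') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.8, V12.8.93
"""
print(parse_nvcc_release(sample))  # prints 12.8
```

In practice the text would come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.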

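Installing HMMER

From here on, the installation procedure is almost the same as in the reference article.

Before running the commands below, become the root user with su.
The reference article creates a directory named biotools in the home directory and installs AlphaFold3 there, but here I install everything under /opt.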

HMMER_DIR="/opt/hmmer"
cd /opt
wget http://eddylab.org/software/hmmer/hmmer-3.4.tar.gz
tar zxvf hmmer-3.4.tar.gz
cd hmmer-3.4
./configure --prefix=${HMMER_DIR}
make -j8
make -j8 install
cd easel
make install

Installing AlphaFold3

cd /opt
git clone https://github.com/google-deepmind/alphafold3.git
mv alphafold3 3.0.0
mkdir alphafold3
mv 3.0.0 alphafold3
ALPHAFOLD3DIR="/opt/alphafold3/3.0.0"

cd ${ALPHAFOLD3DIR} 
mkdir public_databases 
chmod +x fetch_databases.sh 
./fetch_databases.sh

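./fetch_databases.sh takes a long time, so if you are setting up the machine remotely, run it as nohup ./fetch_databases.sh & and check on it about 40 minutes later.

When the commands above are run as root, public_databases is created under /root, so move it.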

su
sudo mv /root/public_databases /opt/alphafold3/3.0.0

Model setup

Place the model file, obtained by applying through the Google form, in /opt.

zstd -d af3.bin.zst
# Move the model files
mkdir models
mv af3* models
mv models /opt/alphafold3/3.0.0/

Configuration

cd ${ALPHAFOLD3DIR}
python3 -m venv .venv
source .venv/bin/activate

Install the required modules with pip.
Be careful with this pip install: some of the pinned packages are not available for every Python version.

python -m pip install absl-py==2.1.0 chex==0.1.87 dm-haiku==0.0.13 dm-tree==0.1.8 \
    filelock==3.16.1 "jax[cuda12]==0.4.34" jax-cuda12-pjrt==0.4.34 \
    jax-triton==0.2.0 jaxlib==0.4.34 jaxtyping==0.2.34 jmp==0.0.4 \
    ml-dtypes==0.5.0 numpy==2.1.3 nvidia-cublas-cu12==12.6.3.3 \
    nvidia-cuda-cupti-cu12==12.6.80 nvidia-cuda-nvcc-cu12==12.6.77 \
    nvidia-cuda-runtime-cu12==12.6.77 nvidia-cudnn-cu12==9.5.1.17 \
    nvidia-cufft-cu12==11.3.0.4 nvidia-cusolver-cu12==11.7.1.2 \
    nvidia-cusparse-cu12==12.5.4.2 nvidia-nccl-cu12==2.23.4 \
    nvidia-nvjitlink-cu12==12.6.77 opt-einsum==3.4.0 pillow==11.0.0 \
    rdkit==2024.3.5 scipy==1.14.1 tabulate==0.9.0 toolz==1.0.0 \
    tqdm==4.67.0 triton==3.1.0 typeguard==2.13.3 \
    typing-extensions==4.12.2 zstandard==0.23.0
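
Because the pinned packages depend on the Python version, a quick check inside the virtual environment can catch a mismatch before a long install fails. This is only a sketch: `python_matches` is a hypothetical helper, and 3.12 is simply the version this article was tested with.

```python
import sys


def python_matches(expected_major: int, expected_minor: int, version=None) -> bool:
    """Return True if the interpreter's major.minor equals the expected pair."""
    if version is None:
        version = sys.version_info
    return (version[0], version[1]) == (expected_major, expected_minor)


if not python_matches(3, 12):
    print(f"Warning: running Python {sys.version_info[0]}.{sys.version_info[1]}, "
          "but this article was verified on 3.12.")
```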

Then run the following commands.

python -m pip install --no-deps .
.venv/bin/build_data
python run_alphafold.py --help

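If the help command prints the list of options, the setup is complete.
My impression is that the setup is easier on Fedora 40 because the default Python can be used.

Running a calculation

Change the permissions of the files under /opt.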

cd /opt
sudo chmod -R 755 alphafold3  
sudo chmod -R 755 hmmer

Calculations can be run from any directory.
Here I create a new directory named af3 under the home directory /home/odyssey and run from there.
Leave the root session with exit first.

cd
mkdir af3
cd af3
vi AF3_script.sh

For the run script AF3_script.sh, I use the AF3_script.sh from this GitHub repository.
Change the file paths as follows.

ALPHAFOLD3DIR="/opt/alphafold3/3.0.0"
HMMER3_BINDIR="/opt/hmmer/bin"
DB_DIR="/opt/alphafold3/3.0.0/public_databases"
MODEL_DIR="/opt/alphafold3/3.0.0/models"
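
A wrong path in these four variables is an easy mistake to make, so it can be worth checking them before launching a job. The sketch below assumes the directory layout used in this article; `missing_paths` is a hypothetical helper name.

```python
import os
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose path does not exist as a directory."""
    return [name for name, path in paths.items() if not os.path.isdir(path)]


# Paths as configured in this article; adjust them to your own layout.
paths = {
    "ALPHAFOLD3DIR": "/opt/alphafold3/3.0.0",
    "HMMER3_BINDIR": "/opt/hmmer/bin",
    "DB_DIR": "/opt/alphafold3/3.0.0/public_databases",
    "MODEL_DIR": "/opt/alphafold3/3.0.0/models",
}
for name in missing_paths(paths):
    print(f"Missing directory for {name}: {paths[name]}")
```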

Use the following JSON file as input.

{
    "name": "1ABC",
    "modelSeeds": [1],
    "sequences": [
        {
            "protein": {
                "id": ["A"],
                "sequence": "MTTGLSTAGAQDIGRSSVRPYLEECTRRFQEMFDRHVVTRPTKVELTDAELREVIDDCNAAVAPLGKTVSDERWISYVGVVLWSQSPRHIKDMEAFKAVCVLNCVTFVWDDMDPALHDFGLFLPQLRKICEKYYGPEDAEVAYEAARAFVTSDHMFRDSPIKAALCTTSPEQYFRFRVTDIGVDFWMKMSYPIYRHPEFTEHAKTSLAARMTTRGLTIVNDFYSYDREVSLGQITNCFRLCDVSDETAFKEFFQARLDDMIEDIECIKAFDQLTQDVFLDLIYGNFVWTTSNKRYKTAVNDVNSRIQAAALEHHHHHH"
            }
        },
        {
            "ligand": {
              "id": ["B","C", "D"],
              "ccdCodes": ["MG"]
            }
        },
        {
            "ligand": {
              "id": ["E"],
              "ccdCodes": ["PPV"]
            }
        }
    ],
    "dialect": "alphafold3",
    "version": 1
}
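
When many inputs of this shape are needed, generating them from a script avoids hand-editing mistakes such as duplicated chain IDs. The following is a rough sketch rather than an official tool: `build_input` is a hypothetical helper, it always assigns the protein to chain "A" as in the example above, and the sequence passed at the bottom is a truncated placeholder.

```python
import json
from typing import Dict, List


def build_input(name: str, seeds: List[int], protein_seq: str,
                ligands: Dict[str, List[str]]) -> dict:
    """Build an input dict in the same layout as the JSON example above.

    `ligands` maps a CCD code to the chain IDs that carry it,
    e.g. {"MG": ["B", "C", "D"], "PPV": ["E"]}.
    """
    sequences = [{"protein": {"id": ["A"], "sequence": protein_seq}}]
    for ccd_code, chain_ids in ligands.items():
        sequences.append({"ligand": {"id": chain_ids, "ccdCodes": [ccd_code]}})
    # Every chain ID must be unique across all entities.
    all_ids = [i for entry in sequences for body in entry.values() for i in body["id"]]
    if len(all_ids) != len(set(all_ids)):
        raise ValueError(f"duplicate chain IDs: {all_ids}")
    return {"name": name, "modelSeeds": seeds, "sequences": sequences,
            "dialect": "alphafold3", "version": 1}


# "MTTGLSTAGAQD" is a shortened placeholder, not the full sequence.
example = build_input("1ABC", [1], "MTTGLSTAGAQD", {"MG": ["B", "C", "D"], "PPV": ["E"]})
print(json.dumps(example, indent=4))
```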

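If no error appears within about 10 seconds of starting the calculation, the run has started successfully.
When running AF3_script.sh, it is better to widen the terminal; otherwise the progress bar keeps printing on new lines.

When the calculation finishes, an output directory is created.

Notes

The RTX 4080 Super and the RTX A4000 run at almost the same speed.
The RTX A4000 and the RTX 4000 Ada also show little difference in speed.
Installing two GPUs does not make a job faster, so it is better to use one GPU per job.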
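
One way to apply the one-GPU-per-job rule on a multi-GPU machine is to give each job its own `CUDA_VISIBLE_DEVICES` value. The sketch below only builds the per-job environments; `assign_gpus` and the job file names are hypothetical, and launching the actual command is left to you.

```python
import os
from typing import Dict, List, Tuple


def assign_gpus(jobs: List[str], n_gpus: int) -> List[Tuple[str, Dict[str, str]]]:
    """Pair each job with an environment that exposes exactly one GPU (round-robin)."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    plan = []
    for index, job in enumerate(jobs):
        env = dict(os.environ)  # copy so the parent environment stays untouched
        env["CUDA_VISIBLE_DEVICES"] = str(index % n_gpus)
        plan.append((job, env))
    return plan


for job, env in assign_gpus(["job_a.json", "job_b.json", "job_c.json"], n_gpus=2):
    print(job, "-> GPU", env["CUDA_VISIBLE_DEVICES"])
```

Each `env` can be passed to `subprocess.Popen(..., env=env)` together with your own run command. Jobs that share a GPU index should run one after another, not at the same time.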

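How to create a user-provided CCD

To dock an arbitrary ligand, specify a user-provided CCD as shown below. If the molecule is not neutral, charges must be specified, but only integer values are accepted. Here the ligand name is set to "LIG".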

{
    "name": "1ABC",
    "modelSeeds": [1,2,3,4,5],
    "sequences": [
        {
            "protein": {
                "id": ["A"],
                "sequence": "MTTGLSTAGAQDIGRSSVRPYLEECTRRFQEMFDRHVVTRPTKVELTDAELREVIDDCNAAVAPLGKTVSDERWISYVGVVLWSQSPRHIKDMEAFKAVCVLNCVTFVWDDMDPALHDFGLFLPQLRKICEKYYGPEDAEVAYEAARAFVTSDHMFRDSPIKAALCTTSPEQYFRFRVTDIGVDFWMKMSYPIYRHPEFTEHAKTSLAARMTTRGLTIVNDFYSYDREVSLGQITNCFRLCDVSDETAFKEFFQARLDDMIEDIECIKAFDQLTQDVFLDLIYGNFVWTTSNKRYKTAVNDVNSRIQAAALEHHHHHH"
            }
        },
        {
            "ligand": {
              "id": ["B","C", "D"],
              "ccdCodes": ["MG"]
            }
        },
        {
            "ligand": {
              "id": ["E"],
              "ccdCodes": ["PPV"]
            }
        },
        {
            "ligand": {
              "id": "F",
              "ccdCodes": ["LIG"]
            }
        }
    ],
    "userCCD": "data_LIG\n#\n_chem_comp.id LIG\n_chem_comp.name 'carbocation'\n_chem_comp.type non-polymer\n_chem_comp.formula 'C20 H33'\n_chem_comp.mon_nstd_parent_comp_id ?\n_chem_comp.pdbx_synonyms ?\n_chem_comp.formula_weight 273.2577\n#\nloop_\n_chem_comp_atom.comp_id\n_chem_comp_atom.atom_id\n_chem_comp_atom.type_symbol\n_chem_comp_atom.charge\n_chem_comp_atom.pdbx_leaving_atom_flag\n_chem_comp_atom.pdbx_model_Cartn_x_ideal\n_chem_comp_atom.pdbx_model_Cartn_y_ideal\n_chem_comp_atom.pdbx_model_Cartn_z_ideal\nLIG C00 C 0 N -1.969 1.725 0.208\nLIG C01 C 0 N -2.190 0.370 -0.217\nLIG C02 C 1 N 0.774 0.240 0.196\nLIG C03 C 0 N -1.947 -0.874 0.522\nLIG C04 C 0 N 0.481 -0.489 1.549\nLIG C05 C 0 N -0.994 -0.949 1.714\nLIG H06 H 0 N -2.970 -0.819 0.991\nLIG H07 H 0 N -1.451 -0.348 2.508\nLIG H08 H 0 N -1.019 -1.983 2.076\nLIG C09 C 0 N -0.758 2.271 0.503\nLIG H10 H 0 N -0.794 3.334 0.745\nLIG C11 C 0 N 0.625 1.750 0.300\nLIG H12 H 0 N 0.966 2.243 -0.623\nLIG H13 H 0 N 1.270 2.175 1.078\nLIG C14 C 0 N 1.496 -1.654 1.554\nLIG H15 H 0 N 1.668 -2.032 2.566\nLIG H16 H 0 N 1.129 -2.495 0.955\nLIG C17 C 0 N 2.739 -1.061 0.895\nLIG H18 H 0 N 3.464 -1.821 0.595\nLIG H19 H 0 N 3.254 -0.374 1.576\nLIG C20 C 0 N 2.157 -0.308 -0.310\nLIG H21 H 0 N 1.922 -1.078 -1.061\nLIG C22 C 0 N -2.139 -2.026 -0.503\nLIG H23 H 0 N -2.459 -2.920 0.038\nLIG C24 C 0 N -3.275 -1.462 -1.374\nLIG H25 H 0 N -3.322 -1.922 -2.362\nLIG H26 H 0 N -4.244 -1.612 -0.888\nLIG C27 C 0 N -2.965 0.042 -1.450\nLIG H28 H 0 N -2.258 0.260 -2.269\nLIG H29 H 0 N -3.819 0.703 -1.617\nLIG C30 C 0 N -0.898 -2.363 -1.328\nLIG H31 H 0 N -0.045 -2.617 -0.696\nLIG H32 H 0 N -1.110 -3.226 -1.963\nLIG H33 H 0 N -0.596 -1.540 -1.985\nLIG C34 C 0 N -3.192 2.634 0.115\nLIG H35 H 0 N -4.093 2.145 0.492\nLIG H36 H 0 N -3.021 3.529 0.715\nLIG H37 H 0 N -3.376 2.948 -0.916\nLIG C38 C 0 N 3.086 0.696 -1.007\nLIG H39 H 0 N 3.151 1.606 -0.394\nLIG C40 C 0 N 2.547 1.054 -2.397\nLIG H41 H 0 N 3.119 1.869 -2.847\nLIG H42 H 0 N 2.630 0.184 
-3.058\nLIG H43 H 0 N 1.492 1.349 -2.388\nLIG C44 C 0 N 4.503 0.134 -1.149\nLIG H45 H 0 N 4.979 -0.032 -0.180\nLIG H46 H 0 N 4.485 -0.821 -1.688\nLIG H47 H 0 N 5.132 0.823 -1.718\nLIG H48 H 0 N 0.036 -0.082 -0.552\nLIG C49 C 0 N 0.796 0.392 2.773\nLIG H50 H 0 N 0.135 1.263 2.838\nLIG H51 H 0 N 0.646 -0.195 3.684\nLIG H52 H 0 N 1.830 0.748 2.776\n#\nloop_\n_chem_comp_bond.atom_id_1\n_chem_comp_bond.atom_id_2\n_chem_comp_bond.value_order\n_chem_comp_bond.pdbx_aromatic_flag\nC00 C01 SING N\nC00 C09 DOUB N\nC00 C34 SING N\nC01 C03 SING N\nC01 C27 SING N\nC02 C04 SING N\nC02 C11 SING N\nC02 C20 SING N\nC02 H48 SING N\nC03 C05 SING N\nC03 H06 SING N\nC03 C22 SING N\nC04 C05 SING N\nC04 C14 SING N\nC04 C49 SING N\nC05 H07 SING N\nC05 H08 SING N\nC09 H10 SING N\nC09 C11 SING N\nC11 H12 SING N\nC11 H13 SING N\nC14 H15 SING N\nC14 H16 SING N\nC14 C17 SING N\nC17 H18 SING N\nC17 H19 SING N\nC17 C20 SING N\nC20 H21 SING N\nC20 C38 SING N\nC22 H23 SING N\nC22 C24 SING N\nC22 C30 SING N\nC24 H25 SING N\nC24 H26 SING N\nC24 C27 SING N\nC27 H28 SING N\nC27 H29 SING N\nC30 H31 SING N\nC30 H32 SING N\nC30 H33 SING N\nC34 H35 SING N\nC34 H36 SING N\nC34 H37 SING N\nC38 H39 SING N\nC38 C40 SING N\nC38 C44 SING N\nC40 H41 SING N\nC40 H42 SING N\nC40 H43 SING N\nC44 H45 SING N\nC44 H46 SING N\nC44 H47 SING N\nC49 H50 SING N\nC49 H51 SING N\nC49 H52 SING N\n#\n",
    "dialect": "alphafold3",
    "version": 1
}
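
The userCCD value is a single JSON string in which every line break of the CCD text is written as the escape sequence \n. If the CCD text is available as ordinary multi-line text, Python's `json.dumps` produces that escaping automatically. The fragment below is a shortened, hypothetical CCD used only to show the effect.

```python
import json

# A shortened, hypothetical CCD fragment; a real one lists every atom and bond.
ccd_text = "data_LIG\n#\n_chem_comp.id LIG\n_chem_comp.type non-polymer\n#\n"

payload = {"userCCD": ccd_text, "dialect": "alphafold3", "version": 1}
encoded = json.dumps(payload)

# The encoded text holds the two characters '\' and 'n', not raw line breaks.
print("\n" in encoded)  # prints False
print(json.loads(encoded)["userCCD"] == ccd_text)  # prints True
```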

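To dock an arbitrary molecule, first build it in a tool such as GaussView and save it as a PDB file.
Then use the script below, pdb2ccd.py, which calls Open Babel to create the CCD file. The script only handles C and H, so adapt it as needed, especially the bond-detection part. Note that the molecule name, formula, and formula weight are also hard-coded in create_ccd.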

#!/usr/bin/env python3
"""
Convert all PDB files in a directory to AlphaFold3 CCD format.
First uses OpenBabel to convert PDB to CIF, then converts CIF to CCD.

Usage: python pdb2ccd.py input_directory output_directory [atom_id charge]

Examples:
    python pdb2ccd.py ./pdb_files ./ccd_files
    python pdb2ccd.py ./pdb_files ./ccd_files C10 1

Arguments:
    input_directory: Directory containing PDB files
    output_directory: Directory where CCD files will be saved
    atom_id: (Optional) Atom ID to assign a charge to (e.g., C10)
    charge: (Optional) Charge value to assign to the atom (e.g., 1)

Requirements:
- OpenBabel (obabel command must be available in the system path)
- Python 3.6+

"""

import sys
import os
import re
import subprocess
import glob
import shutil
from pathlib import Path

def run_obabel(pdb_file, cif_file):
    """
    Convert PDB to CIF using OpenBabel.
    """
    try:
        # Check if obabel is available
        try:
            # Test if obabel command is available
            version_cmd = ["obabel", "-V"]
            result = subprocess.run(version_cmd, check=True, capture_output=True, text=True)
        except FileNotFoundError:
            print("ERROR: OpenBabel (obabel command) is not found in the system path.")
            print("Please install OpenBabel or make sure it's in your PATH.")
            return False
        except subprocess.CalledProcessError:
            print("ERROR: OpenBabel is installed but returned an error when checking version.")
            return False
            
        # Run OpenBabel command
        cmd = ["obabel", pdb_file, "-O", cif_file]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        
        # Check if the CIF file was actually created and has content
        if not os.path.exists(cif_file) or os.path.getsize(cif_file) == 0:
            print(f"ERROR: OpenBabel failed to create a valid CIF file for {pdb_file}.")
            print("The command appeared to succeed but no output file was created.")
            return False
            
        return True
    except subprocess.CalledProcessError as e:
        print(f"ERROR: Failed to convert {pdb_file} to {cif_file} using OpenBabel:")
        print(f"Command: {' '.join(cmd)}")
        print(f"Exit code: {e.returncode}")
        print(f"stdout: {e.stdout}")
        print(f"stderr: {e.stderr}")
        return False
    except Exception as e:
        print(f"ERROR: Unexpected error while converting {pdb_file} with OpenBabel: {e}")
        return False

def parse_cif_file(cif_file):
    """Parse a CIF file and extract atom information."""
    atoms = []
    ligand_id = ""
    
    with open(cif_file, 'r') as f:
        lines = f.readlines()
        
        # Extract the ligand ID from chemical name common
        for line in lines:
            if "_chemical_name_common" in line:
                # Extract text within single quotes
                match = re.search(r"'([^']*)'", line)
                if match:
                    # Remove .pdb extension if present
                    name = match.group(1).replace(".pdb", "")
                    ligand_id = name
                break
        
        # Find atom site loop
        atom_loop_start = False
        for line in lines:
            line = line.strip()
            
            if "loop_" in line:
                atom_loop_start = True
                continue
                
            if atom_loop_start and "_atom_site_label" in line:
                continue
            elif atom_loop_start and "_atom_site_type_symbol" in line:
                continue
            elif atom_loop_start and "_atom_site_fract_x" in line:
                continue
            elif atom_loop_start and "_atom_site_fract_y" in line:
                continue
            elif atom_loop_start and "_atom_site_fract_z" in line:
                continue
            elif atom_loop_start and "_atom_site_occupancy" in line:
                continue
            elif atom_loop_start and line and not line.startswith("#") and not line.startswith("loop_"):
                # Parse atom line
                parts = line.split()
                if len(parts) >= 6:  # Ensure line has enough parts
                    atom_label = parts[0]
                    atom_type = parts[1]
                    x = float(parts[2])
                    y = float(parts[3])
                    z = float(parts[4])
                    
                    # Extract the element and number
                    element_match = re.match(r'([A-Za-z]+)(\d+)', atom_label)
                    if element_match:
                        element = element_match.group(1)
                        number = int(element_match.group(2))
                        atoms.append({
                            "label": atom_label,
                            "element": element,
                            "number": number,
                            "x": x,
                            "y": y,
                            "z": z
                        })
            elif atom_loop_start and (not line or line.startswith("loop_") or line.startswith("#")):
                # End of atom loop
                break
    
    return ligand_id, atoms

def create_ccd(ligand_id, atoms, output_file, charge_atom_id=None, charge_value=0):
    """Create a CCD file from the parsed atoms."""
    # Using fixed values as specified
    mol_name = "carbocation"
    formula = "C20 H33"
    formula_weight = 273.2577
    
    with open(output_file, 'w') as f:
        # Header
        f.write(f"data_LIG\\n")
        f.write("#\\n")
        f.write(f"_chem_comp.id LIG\\n")
        f.write(f"_chem_comp.name '{mol_name}'\\n")
        f.write(f"_chem_comp.type non-polymer\\n")
        f.write(f"_chem_comp.formula '{formula}'\\n")
        f.write(f"_chem_comp.mon_nstd_parent_comp_id ?\\n")
        f.write(f"_chem_comp.pdbx_synonyms ?\\n")
        f.write(f"_chem_comp.formula_weight {formula_weight}\\n")
        f.write("#\\n")
        
        # Atoms loop
        f.write("loop_\\n")
        f.write("_chem_comp_atom.comp_id\\n")
        f.write("_chem_comp_atom.atom_id\\n")
        f.write("_chem_comp_atom.type_symbol\\n")
        f.write("_chem_comp_atom.charge\\n")
        f.write("_chem_comp_atom.pdbx_leaving_atom_flag\\n")
        f.write("_chem_comp_atom.pdbx_model_Cartn_x_ideal\\n")
        f.write("_chem_comp_atom.pdbx_model_Cartn_y_ideal\\n")
        f.write("_chem_comp_atom.pdbx_model_Cartn_z_ideal\\n")
        
        for atom in atoms:
            element = atom["element"]
            atom_id = f"{element}{atom['number']:02d}"
            
            # Assign charge if this atom matches the specified atom_id
            charge = charge_value if atom_id == charge_atom_id else 0
            
            f.write(f"LIG {atom_id} {element} {charge} N {atom['x']:.3f} {atom['y']:.3f} {atom['z']:.3f}\\n")
        
        # Bonds section: write the loop header first; the bonds themselves
        # are inferred from interatomic distances further below
        f.write("#\\n")
        f.write("loop_\\n")
        f.write("_chem_comp_bond.atom_id_1\\n")
        f.write("_chem_comp_bond.atom_id_2\\n")
        f.write("_chem_comp_bond.value_order\\n")
        f.write("_chem_comp_bond.pdbx_aromatic_flag\\n")
        
        # Generate bonds based on distance with specific criteria
        for i in range(len(atoms)):
            for j in range(i+1, len(atoms)):
                atom_i = atoms[i]
                atom_j = atoms[j]
                
                # Calculate distance between atoms
                dx = atom_i["x"] - atom_j["x"]
                dy = atom_i["y"] - atom_j["y"]
                dz = atom_i["z"] - atom_j["z"]
                distance = (dx**2 + dy**2 + dz**2)**0.5
                
                # Apply specific bond criteria based on element types and distances
                atom_id_1 = f"{atom_i['element']}{atom_i['number']:02d}"
                atom_id_2 = f"{atom_j['element']}{atom_j['number']:02d}"
                
                # C-H bonds: distance <= 1.2 Å → SING
                if (atom_i["element"] == "C" and atom_j["element"] == "H" and distance <= 1.2) or \
                   (atom_i["element"] == "H" and atom_j["element"] == "C" and distance <= 1.2):
                    f.write(f"{atom_id_1} {atom_id_2} SING N\\n")
                
                # C-C bonds: distance <= 1.4 Å → DOUB, distance <= 1.7 Å → SING
                elif atom_i["element"] == "C" and atom_j["element"] == "C":
                    if distance <= 1.4:
                        f.write(f"{atom_id_1} {atom_id_2} DOUB N\\n")
                    elif distance <= 1.7:
                        f.write(f"{atom_id_1} {atom_id_2} SING N\\n")
        
        f.write("#\\n")

def process_pdb_file(pdb_file, temp_dir, output_dir, charge_atom_id=None, charge_value=0):
    """
    Process a single PDB file: convert to CIF, then to CCD.
    """
    try:
        # Create base filename without extension
        base_name = os.path.basename(pdb_file)
        base_name_no_ext = os.path.splitext(base_name)[0]
        
        # Define temporary CIF file path
        temp_cif = os.path.join(temp_dir, f"{base_name_no_ext}.cif")
        
        # Define output CCD file path
        output_ccd = os.path.join(output_dir, f"{base_name_no_ext}.ccd")
        
        # Step 1: Convert PDB to CIF using OpenBabel
        if not run_obabel(pdb_file, temp_cif):
            print(f"Failed to convert {pdb_file} to CIF format. Skipping.")
            return False
        
        # Step 2: Parse the CIF file
        ligand_id, atoms = parse_cif_file(temp_cif)
        
        # Use the base name as ligand_id if not found in the file
        if not ligand_id:
            ligand_id = base_name_no_ext
            
        # Make sure the ligand_id is no more than 3 characters + number
        # If it's longer, truncate it to first 3 chars
        if len(ligand_id) > 3 and not any(char.isdigit() for char in ligand_id):
            ligand_id = ligand_id[:3]
        
        # Step 3: Create the CCD file
        create_ccd(ligand_id, atoms, output_ccd, charge_atom_id, charge_value)
        print(f"Successfully converted {pdb_file} to CCD format: {output_ccd}")
        
        return True
        
    except Exception as e:
        print(f"Error processing {pdb_file}: {e}")
        return False

def main():
    # Check command line arguments
    if len(sys.argv) < 3 or len(sys.argv) > 5:
        print("Usage: python pdb2ccd.py input_directory output_directory [atom_id charge]")
        print("Example: python pdb2ccd.py ./pdb_files ./ccd_files C10 1")
        sys.exit(1)
    
    input_dir = sys.argv[1]
    output_dir = sys.argv[2]
    
    # Check for optional charge arguments
    charge_atom_id = None
    charge_value = 0
    
    if len(sys.argv) == 5:
        charge_atom_id = sys.argv[3]
        try:
            charge_value = int(sys.argv[4])
        except ValueError:
            print(f"Error: Charge value must be an integer. Got '{sys.argv[4]}'")
            sys.exit(1)
        print(f"Will assign charge {charge_value} to atom {charge_atom_id}")
    
    # Check if input directory exists
    if not os.path.isdir(input_dir):
        print(f"Error: Input directory {input_dir} does not exist")
        sys.exit(1)
    
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Create temporary directory for CIF files
    temp_dir = os.path.join(output_dir, "temp_cif")
    os.makedirs(temp_dir, exist_ok=True)
    
    # Get all PDB files in the input directory
    pdb_files = glob.glob(os.path.join(input_dir, "*.pdb"))
    
    if not pdb_files:
        print(f"No PDB files found in {input_dir}")
        sys.exit(0)
    
    print(f"Found {len(pdb_files)} PDB files to process")
    
    # Process each PDB file
    success_count = 0
    for pdb_file in pdb_files:
        if process_pdb_file(pdb_file, temp_dir, output_dir, charge_atom_id, charge_value):
            success_count += 1
    
    # Clean up temporary files
    print("Cleaning up temporary CIF files...")
    try:
        # Remove each CIF file individually to ensure they're all deleted
        for cif_file in glob.glob(os.path.join(temp_dir, "*.cif")):
            os.remove(cif_file)
        
        # Then remove the directory
        os.rmdir(temp_dir)
        print("Temporary files successfully removed.")
    except Exception as e:
        print(f"Warning: Could not completely remove temporary files: {e}")
    
    print(f"Processing complete. Successfully converted {success_count} out of {len(pdb_files)} PDB files.")

if __name__ == "__main__":
    main()
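
Since the bond-detection part is what usually needs adapting for other elements, one option is to move the distance thresholds into a lookup table. The sketch below is not part of pdb2ccd.py: `BOND_RULES` and `classify_bond` are hypothetical names, and only the C-H and C-C limits are taken from the script above. Limits for any other element pair would be your own assumptions.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Upper distance limits in angstroms, checked in order. The C-H and C-C values
# mirror pdb2ccd.py; add entries for other element pairs as needed.
BOND_RULES: Dict[FrozenSet[str], List[Tuple[float, str]]] = {
    frozenset(["C", "H"]): [(1.2, "SING")],
    frozenset(["C"]): [(1.4, "DOUB"), (1.7, "SING")],
}


def classify_bond(elem_a: str, elem_b: str, distance: float) -> Optional[str]:
    """Return the bond order for an element pair, or None if the atoms are not bonded."""
    rules = BOND_RULES.get(frozenset([elem_a, elem_b]), [])
    for limit, order in rules:
        if distance <= limit:
            return order
    return None


print(classify_bond("C", "C", 1.34))  # prints DOUB
print(classify_bond("H", "C", 1.09))  # prints SING
print(classify_bond("C", "C", 2.50))  # prints None
```

Using a frozenset as the key makes the lookup independent of atom order, so the C-H rule covers both C-H and H-C pairs.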

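Next, use the script below, ccd2json.py, to insert the CCD information into the JSON file. Occasionally a line break ends up at the end of the CCD information, so always check the result by eye. For the code below to work, the place in the JSON file where the CCD should be written must contain the string "paste here".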

#!/usr/bin/env python3
"""
Integrate CCD files into a JSON template.

This script reads CCD files from a directory and inserts their content into 
a JSON template at the location marked with "paste here". The output is saved
as a new JSON file with a name based on the original JSON and PDB file names.

Usage: python ccd2json.py json_template ccd_directory output_directory

Arguments:
    json_template: Path to the template JSON file containing "paste here" marker
    ccd_directory: Directory containing CCD files to be integrated
    output_directory: Directory where the generated JSON files will be saved

Example:
    python ccd2json.py template.json ./ccd_files ./output_json
"""

import sys
import os
import glob

def read_file_as_text(file_path):
    """Read a file and return its content as text."""
    try:
        with open(file_path, 'r') as f:
            return f.read()
    except Exception as e:
        print(f"ERROR: Could not read the file '{file_path}': {e}")
        return None

def integrate_ccd_into_json(json_template_text, ccd_content, output_path):
    """
    Integrate CCD content into the JSON template text and save to a new file.
    Simply replaces 'paste here' with the CCD content.
    """
    # Replace the marker with CCD content
    if "paste here" not in json_template_text:
        print(f"WARNING: Could not find 'paste here' marker in the JSON template.")
        return False
    
    modified_text = json_template_text.replace("paste here", ccd_content)
    
    # Save the modified text to a new file
    try:
        with open(output_path, 'w') as f:
            f.write(modified_text)
        return True
    except Exception as e:
        print(f"ERROR: Could not write to output file '{output_path}': {e}")
        return False

def main():
    if len(sys.argv) != 4:
        print("Usage: python ccd2json.py json_template ccd_directory output_directory")
        sys.exit(1)
    
    json_template_path = sys.argv[1]
    ccd_directory = sys.argv[2]
    output_directory = sys.argv[3]
    
    # Check if the template exists
    if not os.path.isfile(json_template_path):
        print(f"ERROR: JSON template file '{json_template_path}' does not exist.")
        sys.exit(1)
    
    # Check if the CCD directory exists
    if not os.path.isdir(ccd_directory):
        print(f"ERROR: CCD directory '{ccd_directory}' does not exist.")
        sys.exit(1)
    
    # Create output directory if it doesn't exist
    os.makedirs(output_directory, exist_ok=True)
    
    # Read JSON template as text
    json_template_text = read_file_as_text(json_template_path)
    if json_template_text is None:
        sys.exit(1)
    
    json_basename = os.path.basename(json_template_path)
    json_name_without_ext = os.path.splitext(json_basename)[0]
    
    # Get all CCD files
    ccd_files = glob.glob(os.path.join(ccd_directory, "*.ccd"))
    if not ccd_files:
        print(f"ERROR: No CCD files found in '{ccd_directory}'.")
        sys.exit(1)
    
    success_count = 0
    
    # Process each CCD file
    for ccd_path in ccd_files:
        ccd_basename = os.path.basename(ccd_path)
        ccd_name_without_ext = os.path.splitext(ccd_basename)[0]
        
        # Read CCD content
        ccd_content = read_file_as_text(ccd_path)
        if ccd_content is None:
            continue
        
        # Generate output filename
        output_filename = f"{json_name_without_ext}_{ccd_name_without_ext}.json"
        output_path = os.path.join(output_directory, output_filename)
        
        # Integrate and save
        print(f"Processing: {ccd_basename} -> {output_filename}")
        if integrate_ccd_into_json(json_template_text, ccd_content, output_path):
            success_count += 1
            print(f"SUCCESS: Created {output_filename}")
        else:
            print(f"FAILED: Could not create {output_filename}")
    
    print(f"\nSummary: Successfully processed {success_count} out of {len(ccd_files)} CCD files.")

if __name__ == "__main__":
    main()
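
The visual check for a stray line break can be backed up with an automatic one, because a raw line break inside a JSON string makes the file unparseable. The following is a minimal sketch; `check_af3_json` is a hypothetical helper, and it checks only the points mentioned in this article.

```python
import json
from typing import Tuple


def check_af3_json(text: str) -> Tuple[bool, str]:
    """Parse generated JSON text and confirm the userCCD field looks usable."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # A raw line break inside the userCCD string is reported here.
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, dict):
        return False, "top-level JSON value is not an object"
    ccd = data.get("userCCD", "")
    if "paste here" in ccd or not ccd.startswith("data_"):
        return False, "userCCD is missing or still contains the placeholder"
    return True, "ok"


print(check_af3_json('{"userCCD": "data_LIG\\n#\\n"}'))
```

Running it on each file written by ccd2json.py (read with `open(path).read()`) flags broken outputs before they reach AlphaFold3.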

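In docking simulations, moving both the protein and the ligand at the same time is considered difficult. A common workaround is to prepare a conformer library for the ligand, which in effect lets the ligand move.
In docking with AlphaFold3, RDKit performs the conformer search, so the user does not need to prepare a conformer library.
However, RDKit's conformer search is poor: it generates many ligands with incorrect stereochemistry, and those are the ligands that get docked.
At present (end of March 2025), the practical approach seems to be to increase the number of random seeds (modelSeeds in the input JSON), run a large number of docking predictions, and select the ligands with the correct stereochemistry.

Increasing the number of seeds produces many directories named seed-${X}_sample-${Y}. The following command gathers the .cif files from them.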

for X in {1..5}; do for Y in {0..4}; do cp seed-${X}_sample-${Y}/model.cif model_${X}${Y}.cif; done; done
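
The shell loop above hard-codes seeds 1 to 5 and samples 0 to 4, and a name such as model_110.cif becomes ambiguous once seeds reach two digits. As an alternative sketch, the same collection can be done in Python for whatever directories exist. `collect_models` is a hypothetical helper that assumes the seed-X_sample-Y/model.cif layout described in this article and writes model_X_Y.cif with an underscore between seed and sample.

```python
import re
import shutil
from pathlib import Path
from typing import List


def collect_models(run_dir: str, dest_dir: str) -> List[Path]:
    """Copy seed-X_sample-Y/model.cif files into dest_dir as model_X_Y.cif."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for sample_dir in sorted(Path(run_dir).glob("seed-*_sample-*")):
        match = re.fullmatch(r"seed-(\d+)_sample-(\d+)", sample_dir.name)
        source = sample_dir / "model.cif"
        if match is None or not source.is_file():
            continue  # skip unrelated directories and incomplete samples
        target = dest / f"model_{match.group(1)}_{match.group(2)}.cif"
        shutil.copy(source, target)
        copied.append(target)
    return copied
```

For example, `collect_models(".", "collected")` run inside the output directory gathers every available model into a collected subdirectory.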
