Problems I ran into when running AlphaFold3 on Genkai

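AlphaFold3 is amazing

AlphaFold3 is so accurate it is almost unnerving. I installed it on our lab workstation following the official GitHub instructions, and even on a weak PC with an RTX 3060 Ti a single prediction finishes in 30 to 40 minutes, which is seriously impressive. Still, with software this good, of course I wanted to try it on the most powerful machine at the university. Boys will be boys.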

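Installing on Genkai (Kyushu University's supercomputer)

I built a Singularity container from the Docker file on our lab workstation and transferred it to Genkai. I created an environment with conda and installed CUDA 12.8 and Streamlit. The model and the databases are stored on the fast storage. Of course, I confirmed that the permissions keep other users from seeing them. I got as far as verifying that everything runs.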
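The permission check mentioned above can be automated. The following is a minimal sketch, assuming a POSIX file system where access is controlled by the usual owner/group/other mode bits; the function names are my own and the path in the usage note is hypothetical. It does not account for ACLs, which a shared storage system may also use.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any permission bits on path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def find_exposed(root: str) -> list[str]:
    """Walk root and collect every directory or file that group/others can access."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, n) for n in filenames]
        for path in candidates:
            # Symbolic links are skipped; their own mode bits are not meaningful.
            if os.path.islink(path):
                continue
            if not is_private(path):
                exposed.append(path)
    return exposed
```

For example, `find_exposed("/path/to/af3_models")` (a hypothetical location) should return an empty list if the model weights are readable only by their owner.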

But I keep getting an error

Roughly 2 runs out of 10 fail with an error like this:

Error code.out
Detected /root/af_input/input_data.json is an AlphaFold 3 JSON since the top-level is not a list.
2025-02-16 00:31:19.426260: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
Traceback (most recent call last):
  File "/alphafold3_venv/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 885, in backends
    backend = _init_backend(platform)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/alphafold3_venv/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 971, in _init_backend
    backend = registration.factory()
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/alphafold3_venv/lib/python3.11/site-packages/jax/_src/xla_bridge.py", line 671, in factory
    return xla_client.make_c_api_client(plugin_name, updated_options, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/alphafold3_venv/lib/python3.11/site-packages/jaxlib/xla_client.py", line 200, in make_c_api_client
    return _xla.get_c_api_client(plugin_name, options, distributed_client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: No visible GPU devices.

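Why? Whether the error appears or not really seems to come down to luck...
Then I found the following thread:
https://stackoverflow.com/questions/69812260/a100-tensorflow-gpu-error-failed-call-to-cuinit-cuda-error-not-initialized-i
The error is probably caused by MIG (Multi-Instance GPU), which partitions a GPU. When I run AF3 I have one H100 all to myself, but I suspect that some of the GPUs on Genkai's b nodes are MIG-enabled, and that the error occurs when my job lands on one of them. When I use the c nodes, where none of the boards are MIG-enabled, the error indeed does not occur, so this is my working theory for now. If anyone knows the real cause, please let me know.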
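One way to test this theory would be to record what the GPU looks like for every job and compare failed runs with successful ones. The sketch below is only an illustration under stated assumptions: the failure markers are copied from the log shown earlier, the function names are my own, and I assume that `nvidia-smi -L` lists MIG instances as indented lines beginning with `MIG`. I have not confirmed how Genkai actually configures its nodes.

```python
# Failure signatures copied from the job log shown earlier in this article.
FAILURE_MARKERS = (
    "CUDA_ERROR_NOT_INITIALIZED",
    "No visible GPU devices",
)


def hit_cuinit_failure(log_text: str) -> bool:
    """Return True if the job log contains one of the known failure markers."""
    return any(marker in log_text for marker in FAILURE_MARKERS)


def mig_device_lines(nvidia_smi_l_output: str) -> list[str]:
    """Return the lines of `nvidia-smi -L` output that describe MIG instances.

    Assumption: MIG instances appear as indented lines that start with "MIG".
    """
    return [
        line.strip()
        for line in nvidia_smi_l_output.splitlines()
        if line.strip().startswith("MIG ")
    ]
```

If the job script saved the output of `nvidia-smi -L` next to each job log, these two functions would be enough to tabulate whether failures coincide with jobs that saw MIG instances.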

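Making it easy to use for wet-lab researchers too....

I am building a web app with Streamlit that writes the JSON input file for AF3. If this turns out to be the cause of the error, I will cry....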

Streamlit.py

import json
import os
import re
import subprocess

import streamlit as st


st.title('AlphaFold3 on Genkai')
st.text('Run AlphaFold3 on Genkai.')

st.subheader('Job name')
st.text('Give the job a name. Only alphanumeric characters are allowed. '
        'Try not to reuse a name you have used before.')

jobname = st.text_input('Job name')

st.subheader('Protein Sequence')
st.text('Enter the amino acid sequences of the proteins. Do not use line breaks or spaces. '
        'Do not enter peptides in this section.')
n_proteins = int(st.radio(
    'How many amino acid sequences do you want to enter?',
    ['1', '2', '3', '4', '5']
))
# One text area per protein chain (replaces the repeated if/elif branches).
protein_seqs = [st.text_area(f'Protein Sequence {i}') for i in range(1, n_proteins + 1)]

st.subheader('Ligand SmilesCode')
st.text('Enter the SMILES code of each ligand. You can look it up in PubChem or ChemDraw. '
        'Do not use line breaks or spaces. Enter peptides in this section.')

n_ligands = int(st.radio(
    'How many ligands do you want to enter?',
    ['0', '1', '2', '3', '4', '5']
))
# One text area per ligand; none are shown when 0 is selected.
ligand_smiles = [st.text_area(f'Ligand SmilesCode {i}') for i in range(1, n_ligands + 1)]

st.subheader('Thoroughness')
option_seed = st.radio(
    'Choose the thoroughness (1 is the roughest and fastest, 5 is the most thorough; the default is 3).',
    ['1', '2', '3', '4', '5'],
    index=2,  # preselect '3' so the advertised default is the actual default
)

# Initial JSON definition
input_data = {
    "name": jobname,
    "sequences": []
}

# Proteins get chain IDs A-E and ligands get F-J, as in the original branches.
# Whitespace and line breaks are removed in case they slipped into the text areas.
for chain_id, seq in zip('ABCDE', protein_seqs):
    input_data["sequences"].append({"protein": {"id": chain_id, "sequence": "".join(seq.split())}})

for chain_id, smiles in zip('FGHIJ', ligand_smiles):
    input_data["sequences"].append({"ligand": {"id": chain_id, "smiles": "".join(smiles.split())}})

# Thoroughness N is mapped to N seeds (1..N). The previous single-element list [N]
# always ran exactly one seed, so the setting had no effect on how much was sampled.
input_data["modelSeeds"] = list(range(1, int(option_seed) + 1))
input_data["dialect"] = "alphafold3"
input_data["version"] = 1

# Paths are defined as variables
input_dir = "/home/alphafold3/af_input"
output_dir_base = "/home/alphafold3/af_output"
run_script_path = "/home/alphafold3/run.sh"

st.subheader('Run')

if st.button('Run AlphaFold3'):
    if not jobname:
        st.error("Job name cannot be empty.")
    elif not re.fullmatch(r'[A-Za-z0-9]+', jobname):
        # Enforce the alphanumeric-only rule stated in the form.
        st.error("Job name may contain only alphanumeric characters.")
    else:
        with open(os.path.join(input_dir, 'input_data.json'), 'w') as f:
            json.dump(input_data, f, indent=4)

        # Submit the job script to the batch system.
        result = subprocess.run(["pjsub", run_script_path], capture_output=True, text=True)

        # Show standard output; standard error is shown below if submission fails.
        st.text("Standard Output:")
        for line in result.stdout.splitlines():
            st.text(line)

        if result.returncode != 0:
            st.error(f"Failed to transfer the data to Genkai: {result.stderr}")
        else:
            st.success("The input data was transferred to Genkai. Use the pjstat command to confirm "
                       "that the job has been submitted. Also check run.sh.<number>.out as needed.")
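The JSON-building part of an app like this can also be pulled out into a plain function, which makes it possible to check the generated input without starting Streamlit or submitting a job. The following is a rough sketch, not the app's actual code: `build_af3_input` and its parameters are names I made up, the chain IDs follow the A-E / F-J convention used in the script, and mapping a thoroughness of N to seeds 1..N is my own assumption about what the setting should mean.

```python
import json
import string

PROTEIN_IDS = "ABCDE"  # chain IDs used for proteins in the script
LIGAND_IDS = "FGHIJ"   # chain IDs used for ligands in the script


def build_af3_input(name, proteins, ligands, n_seeds=3):
    """Build an AlphaFold 3 input dict from raw form values.

    Assumption: thoroughness is expressed as the number of seeds (1..n_seeds).
    """
    if not 1 <= len(proteins) <= len(PROTEIN_IDS):
        raise ValueError("between 1 and 5 protein sequences are required")
    if len(ligands) > len(LIGAND_IDS):
        raise ValueError("at most 5 ligands are supported")

    sequences = []
    for chain_id, seq in zip(PROTEIN_IDS, proteins):
        # Remove stray whitespace and line breaks, then require letters only.
        cleaned = "".join(seq.split()).upper()
        if not cleaned or not set(cleaned) <= set(string.ascii_uppercase):
            raise ValueError(f"protein {chain_id} is empty or has non-letter characters")
        sequences.append({"protein": {"id": chain_id, "sequence": cleaned}})

    for chain_id, smiles in zip(LIGAND_IDS, ligands):
        sequences.append({"ligand": {"id": chain_id, "smiles": "".join(smiles.split())}})

    return {
        "name": name,
        "sequences": sequences,
        "modelSeeds": list(range(1, n_seeds + 1)),
        "dialect": "alphafold3",
        "version": 1,
    }
```

For instance, `json.dumps(build_af3_input("job1", ["MKTAY"], ["CCO"]), indent=4)` yields the text that would be written to `input_data.json`, and a sequence containing digits raises `ValueError` before anything is submitted.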
