
BLAST search with Python

Posted at 2023-12-19, last updated at 2023-12-23

Installation

Install Biopython with conda or pip (either one is enough):

conda install biopython
pip install biopython

Imports

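import warnings  # used by the short-query warning in the search step

from Bio import BiopythonWarning  # warning category used in the search step
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW
from Bio import SeqIO
from Bio import SearchIO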

Loading the sequence

filename = './protein.faa'

# Parse the FASTA file and keep the sequence (a Seq object) of the first record
seq = list(SeqIO.parse(filename, "fasta"))[0].seq
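To make clear what that line extracts, here is a dependency-free sketch that returns the same thing as a plain string: the sequence of the first record in a FASTA file. The function name `first_fasta_sequence` is hypothetical, and the sketch assumes a standard FASTA layout (each record starts with a `>` header line). It is only an illustration; `SeqIO.parse` remains the approach used in this article.

```python
from pathlib import Path


def first_fasta_sequence(path):
    """Return the sequence of the first FASTA record in `path` as a str.

    Hypothetical helper that mimics list(SeqIO.parse(path, "fasta"))[0].seq,
    except that it returns a plain string instead of a Seq object.
    """
    chunks = []
    in_first_record = False
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if in_first_record:
                break  # the second header marks the end of the first record
            in_first_record = True
            continue
        if in_first_record and line:
            chunks.append(line)  # sequence lines may be wrapped; join them
    if not in_first_record:
        raise ValueError(f"no FASTA record found in {path}")
    return "".join(chunks)
```

For example, `first_fasta_sequence('./protein.faa')` would give the first protein sequence as one unwrapped string.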

Running the search

If the query sequence is short (fewer than 31 residues), the search parameters are adjusted automatically.

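# These values mirror the short-query workaround inside Biopython's
# NCBIWWW.qblast, which was written there for blastn (nucleotide) searches.
if len(seq) < 31:
    expect = 1000
    word_size = 7
    nucl_reward = 1  # nucleotide match reward; has no role in a blastp search
    filter = None  # kept from Biopython's code; not passed to qblast below
    lcase_mask = None  # likewise not passed to qblast below
    warnings.warn(
        '"SHORT_QUERY_ADJUST" is incorrectly implemented (by NCBI) for blastn.'
        " We bypass the problem by manually adjusting the search parameters."
        " Thus, results may slightly differ from web page searches.",
        BiopythonWarning)
else:
    # Normal-length query: E-value 10, other parameters left to NCBI
    expect = 10
    word_size = None
    nucl_reward = None

# Submit the query to NCBI BLAST over the network (this can take a while)
result = NCBIWWW.qblast("blastp",            # protein query vs. protein database
                        "refseq_protein",    # NCBI RefSeq protein database
                        seq,
                        format_type='XML',
                        hitlist_size=50,     # keep at most 50 hits
                        expect=expect,
                        word_size=word_size,
                        nucl_reward=nucl_reward)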
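The if/else branch in the search step can also be written as a small function that returns keyword arguments for `NCBIWWW.qblast`, which makes the 31-residue threshold easy to test on its own. This is a sketch: the helper name `short_query_params` is hypothetical, the values are copied from the branch in this article, and it raises a plain `UserWarning` instead of `BiopythonWarning` so that it runs without Biopython. `filter` and `lcase_mask` are left out because the original branch sets them to `None` and never passes them to `qblast`.

```python
import warnings


def short_query_params(query_length, threshold=31):
    """Return qblast keyword arguments chosen from the query length.

    Hypothetical helper reproducing the if/else branch in this article.
    """
    if query_length < threshold:
        # Short query: relaxed E-value and fixed word size, as in the article
        warnings.warn(
            "Short query: using expect=1000, word_size=7, nucl_reward=1.",
            UserWarning,
        )
        return {"expect": 1000, "word_size": 7, "nucl_reward": 1}
    # Normal-length query: keep the defaults used in the article
    return {"expect": 10, "word_size": None, "nucl_reward": None}
```

With this helper, the call could read `NCBIWWW.qblast("blastp", "refseq_protein", seq, format_type='XML', hitlist_size=50, **short_query_params(len(seq)))`.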

Note: the object returned by `qblast` cannot be parsed as XML as is, so it is first saved to an XML file and then read back as XML.

Displaying the results

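# Save the raw XML text returned by NCBI
with open('result_raw.xml', 'w', encoding='UTF-8') as f:
    f.write(result.getvalue())

# Read it back as a single QueryResult (one query with its hits)
result_view = SearchIO.read("./result_raw.xml", "blast-xml")

print(list(result_view)[0:5])  # the first five hits
print(result_view[0])          # first hit
print(result_view[0][0])       # first HSP of the first hit
print(result_view[0][0][0])    # first HSP fragment of that HSP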
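The indexes `result_view[0]`, `result_view[0][0]` and `result_view[0][0][0]` walk down the hierarchy of the BLAST XML: query result, hit, HSP, fragment. As a stdlib-only alternative to `SearchIO` (swapped in here only to make that structure visible; it is not the method used in this article), the sketch below reads the saved file with `xml.etree.ElementTree` and lists each hit with the E-value and bit score of its first HSP. It assumes the standard NCBI BLAST XML element names (`Hit`, `Hit_id`, `Hit_def`, `Hit_hsps`, `Hsp`, `Hsp_evalue`, `Hsp_bit-score`); the function name `summarize_hits` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_hits(path, limit=5):
    """Return (hit_id, description, evalue, bit_score) for the first hits.

    Values come from the first <Hsp> of each <Hit>, which corresponds to
    result_view[i][0] in SearchIO. Hypothetical helper for illustration.
    """
    root = ET.parse(path).getroot()  # raises ET.ParseError if the XML is malformed
    rows = []
    for hit in root.iter("Hit"):
        hsp = hit.find("Hit_hsps/Hsp")
        if hsp is None:
            continue  # skip hits without any recorded HSP
        rows.append((
            hit.findtext("Hit_id"),
            hit.findtext("Hit_def"),
            float(hsp.findtext("Hsp_evalue")),
            float(hsp.findtext("Hsp_bit-score")),
        ))
        if len(rows) >= limit:
            break
    return rows
```

For the file saved above, `for row in summarize_hits("./result_raw.xml"): print(row)` would print one tuple per hit, in the order NCBI returned them.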
