More than 1 year has passed since last update.

pythonでpairwise alignment(ペアワイズアラインメント)

Last updated at 2024-02-22Posted at 2023-12-19

pythonでペアワイズアラインメントを行う方法を記します。

追記
複数の配列ついてに総当たり形式で、全組み合わせのアラインメントする方法を記事にしました。
https://qiita.com/kujira_0120/items/2de5d08925aa7a80e15e

ペアワイズアラインメント

バイオインフォマティクスにおいて、シーケンスアラインメントとは、DNAやRNA、タンパク質の配列（一次構造）の類似した領域を特定できるように並べたもので、機能的、構造的、あるいは進化的な配列の関係性を知る手がかりを与える。
ペアワイズシーケンスアラインメントは、2配列間でのアラインメントで、部分的、あるいは全体の類似性を詳しく調べるときに用いる。
Wikipediaより

アラインメントを行うWebアプリもあります。以下は一例です。

pythonで実装

ライブラリのインストール

conda install biopython
pip install biopython

公式のドキュメント
https://biopython.org/docs/1.75/api/Bio.Align.html

ライブラリのインポート

from Bio import Align
from Bio import SeqIO
from Bio import SearchIO

配列を読み込む

Fastqファイルのパスを入力。FastqはNCBIのサイトなどからダウンロードできる。
（Fastqを使わず配列を手入力でも可能。）

file1 = './[fastqファイル].faa'
file2 = './[fastqファイル].faa'

seq_a = list(SeqIO.parse(file1, "fasta"))[0].seq
seq_b = list(SeqIO.parse(file2, "fasta"))[0].seq

パラメーターの設定、実行

aligner = Align.PairwiseAligner()

# アラインメントパラメータを設定
aligner.match_score = 2
aligner.mismatch_score = -1

aligner.mode = "global" #ローカルの場合は"local"
alignments = aligner.align(seq_a, seq_b)
alignment = alignments[0]

print(alignment)

実行例

target            0 MQQAG--LTLMA----VAVCV--A--------FQ------------------TS-----E
                  0 |------|--------||-----|--------|-------------------||-----|
query             0 M----SRL----RRYEVA---LEAEEEIYWGCF-YFFPWLRMWRRERSPMSPTSQRLSLE

target           21 AI--LPM---------------A----SS--CCTEV--------SHHVSGRLLERVSSCS
                 60 |---||----------------|----||--|||----------|-----|||-|-----
query            48 A-PSLP-LRSWHPWNKTKQKQEALPLPSSTSCCT--QLYRQPLPS-----RLL-R-----

target           50 -I-----QR-ADGDCD-L-AAVI-LHVK--RRRICISP---H--NRT-LKQWMRASEVKK
                120 -|-----|--|||||--|-|-|--||----||----|----|--||--|-----|-----
query            93 RIVHMELQ-EADGDC-HLQA-V-VLH--LARR----S-VCVHPQNR-SL-----A-----

target           92 NGR--ENVCS--GKK--Q---PSR------KDRKGHTTRKHRTRGTHRHEA--SR-----
                180 --|--|------||---|---||-------|--|-------------------|------
query           131 --RWLE----RQGK-RLQGTVPS-LNLVLQK--K-----------------MYS-NPQQQ

target          130 - 130
                240 - 241
query           163 N 164

スコアの表示

import numpy as np

length = alignment.shape[1]

difference = np.array(alignment.aligned)[0][:, 1] - np.array(alignment.aligned)[0][:, 0]
identity = np.sum(difference)
ratio = identity / length*100

print(f'Score:    {alignments.score}')
print(f'Length:   {length}')
print(f'Identity: {ratio:.1f}% ({identity}/{length})')

実行例

Score:    106.0
Length:   241
Identity: 22.0% (53/241)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up