More than 1 year has passed since last update.

総当たりで全組み合わせのpairwise alignmentを行う

Last updated at 2024-02-21Posted at 2024-02-21

はじめに

Pythonでpairwise alignment(ペアワイズアラインメント)を実装する方法はこちらに記しました。

複数の配列があり、その中でどの配列同士の相同性が高いか検証したい。という目的のもと、2つの配列間のアラインメントを総当たり形式で全組み合わせ行う。その結果をヒートマップで可視化する。

出力例はこんな感じ。

実装

インポート

from Bio import Align
from Bio import SeqIO
from Bio import SearchIO
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

関数の定義

アライメントの結果はresults（辞書）およびdf_results（データフレーム）に格納する。

def alignment (x, y, sequences, labels, results, df_result):

    aligner = Align.PairwiseAligner()

    # アラインメントパラメータを設定
    aligner.match_score = 2
    aligner.mismatch_score = -1
    aligner.mode = "global" #ローカルの場合は"local"

    seq_a = sequences[x]
    seq_b = sequences[y]
    
    alignments = aligner.align(seq_a, seq_b)
    alignment = alignments[0]
   

    length = alignment.shape[1]
    difference = np.array(alignment.aligned)[0][:, 1] - np.array(alignment.aligned)[0][:, 0]
    identity = np.sum(difference)
    ratio = identity / length*100

    dic_temp = {
        "a_label": labels[x],
        "b_label": labels[y],
        "score": alignments.score,
        "length": length,
        "identity": ratio
        }

    results.append(dic_temp)
    df_result.iloc[x,y] = ratio
    
    return results, df_result

配列とラベルの準備

アラインメントを行いたい配列（塩基orアミノ酸どちらもOK）をリスト型でsequencesに入れる。
配列のラベル（遺伝子名など）をlabelsに入れる。ここの順番が対応しているように。

sequences = #ここに配列を入れる
labels = #ここにラベル名（遺伝子名など）を入れる

関数の実行

for文でループして全通りのアラインメントの実行と結果の記録を行う。

num = len(sequences)
combinations = [(x, y) for x in range(num) for y in range(num) if y > x]

results = []
df_result = pd.DataFrame(columns=labels, index=labels)

for x, y in combinations:
    name_x = labels[x]
    name_y = labels[y]
           
    results, df_result = alignment(x, y, sequences, labels, results, df_result)

df_result.fillna('-')
df_result = df_result.astype(float)

結果の表示

fig = plt.figure(figsize=(10,9))
ax = fig.add_subplot(1,1,1) 

sns.heatmap(df_result,
            cmap='Reds',
            annot=True, 
            #fmt="1.1f", #表示する桁数を指定する
            annot_kws={"fontsize":8}
           )

plt.show()

出力結果例

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up