Ancestral Sequence Reconstruction with the Phylogenetic Analysis Tool "IQ-TREE"

Posted at 2021-09-09

Overview

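Ancestral sequence reconstruction (ASR) is a technique in molecular evolution that infers ancestral gene sequences from the homologous genes of organisms descended from a common ancestor. ASR has helped clarify the mechanisms and dynamics of gene (protein) evolution.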

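This article describes how to run ancestral sequence reconstruction with IQ-TREE, a phylogenetic analysis tool, and provides a Python script that automates the procedure.
Note: this was verified with IQ-TREE v2.x and does not work with v1.x.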

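Procedure and Results

This section covers the input data, the command, and the output data for ancestral sequence reconstruction with IQ-TREE.
Note: IQ-TREE (v2.x) must already be installed (see the IQ-TREE official site).

1. Input data

  • Tree file
    A species tree of the infraorder Simiiformes (Newick format) is used as the sample.
    Giving the internal nodes identifiers makes later interpretation and processing easier,
    so serial IDs 'N000, N001 ... N0XX' are assigned to the internal nodes as shown in the figure.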
simiiformes.nwk
((((Saimiri_boliviensis,Cebus_capucinus)N003,Callithrix_jacchus)N002,Aotus_nancymaae)N001,((((((Pan_paniscus,Pan_troglodytes)N009,Homo_sapiens)N008,Gorilla_gorilla)N007,Pongo_abelii)N006,Nomascus_leucogenys)N005,((Rhinopithecus_roxellana,Rhinopithecus_bieti)N011,(Chlorocebus_sabaeus,(((Macaca_mulatta,Macaca_fascicularis)N015,Macaca_nemestrina)N014,(Papio_anubis,(Cercocebus_atys,Mandrillus_leucophaeus)N017)N016)N013)N012)N010)N004)N000;

[Figure: simiiformes_nodeid_nwk.png (sample tree with internal node IDs)]

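  • Aligned CDS sequence file
    The sample is an aligned FASTA file holding the CDS sequences (given here as amino-acid sequences)
    of each organism in the tree, which are the targets of ancestral sequence reconstruction.
    The organism names in the tree and the sequence names in the FASTA headers must match exactly, in both number and spelling.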
cds_align.fasta
>Rhinopithecus_bieti
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDVLLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPTPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQIS-----PFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Aotus_nancymaae
MGSENSALKSYTLKEPAFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVPQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-GHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIHQDYYNTLLQTGNPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQVWPREPCDAVKSQCTTLDVGE-SSWDDCEPGSLDTKVNPEGGITATKPVTSGKQKPIPALLPLTEES-TTWKSSLPQKTSLLQSGDDPDQIKPPKVSSQEKPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEVAEGEAEGW-EEGELNWE--DNNW
>Callithrix_jacchus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALDTLSSAEVCAGIYDILLALTFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECDGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGNPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-LSWDDCEPGSLDTKVNLGAGITATKPVTSGKQKPIPALLPLTEES-TTWQSRLSQKTSLVQSRDDPDQIKPPKVSSQERPLKISSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVSNKDDVSPVMQFSSKFAAAEIAEGEAEGW-EEGELNWE--DNNW
>Macaca_mulatta
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLLDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLP------QNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Cebus_capucinus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALTFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTALPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPISGLPDVKNSSEDSENFPSSSKKS-EDWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSGEQKPIPALLPFTEES-TTWKSSVPGKTGLVQSRDDPDQIKSPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEITEGEAEGW-EEGELNWE--DNNW
>Saimiri_boliviensis
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVAQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHV-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQRSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDIKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLNTKVNPGGGITATKPVTSGEQKLIPALLPFTEES-TTWKSSLPQKTSLVQSRDDPDHIKPPKVSSQERPLKVPSELGLGEEFTIQVRKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAAITEGEAEGW-EEGELNWE--DSNW
>Cercocebus_atys
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPGLFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Mandrillus_leucophaeus
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK----------------------------------------------------------------VLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNSLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTLGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Chlorocebus_sabaeus
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFRSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Papio_anubis
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLS-------------------------------------------------------------QPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Macaca_fascicularis
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSMEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Macaca_nemestrina
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Rhinopithecus_roxellana
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Gorilla_gorilla
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQDQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPELVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pan_troglodytes
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Homo_sapiens
MGSENSALKSYTLREPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNTDLSLEDSPMCVVCSHHSQISPILENPFSSIFPKCFFSGSTPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKISLVQRGDDADQIEPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pan_paniscus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFF---------------------------FLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENKTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Nomascus_leucogenys
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQILPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCNTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSFVQRGNDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pongo_abelii
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQILPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDDPDQIKLPKVSSQGRPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
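
Because the tree and the alignment must use the same names, it can help to check them before running IQ-TREE. The following is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes unquoted Newick labels without spaces; the toy tree and FASTA text are for illustration only and are not the article's files.

```python
import re


def newick_leaf_names(newick: str) -> set[str]:
    """Return leaf labels, i.e. names that directly follow '(' or ','.

    Assumes unquoted labels without spaces. Internal node IDs follow ')'
    and are therefore not captured.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def fasta_names(fasta_text: str) -> set[str]:
    """Return the first word of every non-empty '>' header line."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def compare_names(newick: str, fasta_text: str) -> tuple[set[str], set[str]]:
    """Return (names only in the tree, names only in the FASTA)."""
    tree, fasta = newick_leaf_names(newick), fasta_names(fasta_text)
    return tree - fasta, fasta - tree


# Toy data for illustration only (note the misspelled FASTA header)
toy_tree = "((Pan_paniscus,Pan_troglodytes)N001,Homo_sapiens)N000;"
toy_fasta = ">Pan_paniscus\nMGS\n>Pan_troglodytes\nMGS\n>Homo_sapien\nMGS\n"

only_tree, only_fasta = compare_names(toy_tree, toy_fasta)
print("Only in tree: ", only_tree)
print("Only in FASTA:", only_fasta)
```

Both sets being empty means the names agree; anything listed points to a missing or misspelled entry.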

2. IQ-TREE command

Ancestral sequence reconstruction on the input data above can be run with the following command.

iqtree --ancestral -s cds_align.fasta -te simiiformes.nwk -m LG+I+G4 -asr-min 0.95 --prefix iqtree_asr -T 4 --seed 0 --redo

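Command options used

Option       Description
--ancestral  Run mode: ancestral sequence reconstruction
-s           Input alignment file
-te          Input tree file
-m           Substitution model (if omitted, the best-fit model is selected automatically)
-asr-min     Probability threshold for ancestral states (sites below the threshold are filled with a gap)
--prefix     Prefix for the output files
-T           Number of CPU threads used for processing
--seed       Random seed (specify when reproducibility is needed)
--redo       Overwrite the results even if a previous run exists

Note: see iqtree --help for details on the command options.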

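3. Output data

IQ-TREE writes the reconstruction results to the file "iqtree_asr.state" in the form shown below.
Each row gives the inferred ancestral amino acid (State column) for one site (Site column) of one tree node (Node column).
The State column holds the amino acid whose posterior probability is at least the threshold given with the -asr-min option.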

iqtree_asr.state
Node    Site    State   p_A p_R p_N p_D p_C p_Q p_E p_G p_H p_I p_L p_K p_M p_F p_P p_S p_T p_W p_Y p_V
N015    1   M   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N015    2   G   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N015    3   S   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000
~~~ (omitted) ~~~
N011    750 N   0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N011    751 N   0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N011    752 W   0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000
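
Since the p_* columns hold the full posterior distribution, the states can be re-derived with a different threshold without rerunning IQ-TREE. The following is a minimal sketch, not IQ-TREE's own logic. It assumes the file is tab-delimited with the header shown above, that the reported state is the most probable amino acid, and that sites below the threshold should be written as '-', as described in this article. The names `call_states` and `toy_row` and the toy data are hypothetical.

```python
import csv
import io

AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # order of the p_* columns in the header above


def call_states(state_text: str, min_prob: float) -> dict[str, str]:
    """Rebuild one sequence per node from .state text, masking uncertain sites."""
    lines = [ln for ln in state_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    seqs: dict[str, str] = {}
    for row in reader:
        # Pick the amino acid with the highest posterior probability
        best_aa = max(AA_ORDER, key=lambda aa: float(row[f"p_{aa}"]))
        best_p = float(row[f"p_{best_aa}"])
        # Rows are assumed to be in site order within each node
        seqs[row["Node"]] = seqs.get(row["Node"], "") + (best_aa if best_p >= min_prob else "-")
    return seqs


def toy_row(node: str, site: int, probs: dict[str, float]) -> str:
    """Format one toy row; `probs` maps amino acid -> probability (others are 0)."""
    values = [f"{probs.get(aa, 0.0):.5f}" for aa in AA_ORDER]
    return "\t".join([node, str(site), max(probs, key=probs.get), *values])


# Toy data for illustration only, not real IQ-TREE output
toy_state = "\n".join([
    "# toy state file",
    "Node\tSite\tState\t" + "\t".join(f"p_{aa}" for aa in AA_ORDER),
    toy_row("N001", 1, {"M": 1.0}),
    toy_row("N001", 2, {"I": 0.6, "T": 0.4}),
    toy_row("N002", 1, {"M": 1.0}),
    toy_row("N002", 2, {"T": 0.97, "I": 0.03}),
])

print(call_states(toy_state, min_prob=0.95))  # site 2 of N001 is masked
print(call_states(toy_state, min_prob=0.50))  # site 2 of N001 is called as I
```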

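Extracting the inferred ancestral amino acids (State column) for each tree node (Node column) from the file above
and concatenating them in site order yields the inferred ancestral sequence of each node, as shown below.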

iqtree_asr.fasta
>N001
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPINSKKHIQQDYYNTLLQTG-PFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSG-QKPIPALLPLTEES-TTWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEI-EGEAEGWEEEGELNWE--DNNW
>N002
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLAL-FLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPINSKKHIQQDYYNTLLQTG-PFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSG-QKPIPALLPLTEES-TTWKSSLPQKTSLVQSRDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEI-EGEAEGWEEEGELNWE--DNNW
>N003
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLAL-FLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSGEQKPIPALLPFTEES-TTWKSSLPQKTSLVQSRDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N004
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCS-HSQISPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N005
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQI-PILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDD-DQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N006
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQI-PILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDD-DQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N007
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N008
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N009
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N010
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDS-NFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N011
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N012
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N013
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N014
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N015
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N016
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N017
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
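
Once ancestral sequences are available, one simple way to inspect them is to list the positions where an ancestor differs from a descendant. The following is a minimal sketch with a hypothetical helper name. It assumes both sequences come from the same alignment and skips columns where either holds '-'. The example uses the first 20 residues of N008 and Homo_sapiens from the samples above.

```python
def list_substitutions(ancestor: str, descendant: str) -> list[str]:
    """List differences between two aligned sequences in the form 'K14R'.

    Positions are 1-based alignment columns. Columns where either sequence
    has '-' are skipped because they hold no comparable residue.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must come from the same alignment")
    return [
        f"{a}{pos}{d}"
        for pos, (a, d) in enumerate(zip(ancestor, descendant), start=1)
        if a != d and "-" not in (a, d)
    ]


# First 20 residues of N008 and Homo_sapiens, copied from the samples above
n008_prefix = "MGSENSALKSYTLKEPPFTL"
human_prefix = "MGSENSALKSYTLREPPFTL"

print(list_substitutions(n008_prefix, human_prefix))  # ['K14R']
```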

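Automation script for ancestral sequence reconstruction

I wrote a Python script that runs the whole ancestral sequence reconstruction workflow automatically.
It uses Biopython, which must be installed beforehand (pip install biopython).
The command below runs ① assignment of identifier IDs to the internal tree nodes, ② ancestral sequence reconstruction with IQ-TREE, and ③ extraction of the ancestral sequences from the results.

Command to run "auto_asr_script.py"
# Argument 1: Newick tree file, argument 2: aligned CDS FASTA file, argument 3: output directory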
python auto_asr_script.py simiiformes.nwk cds_align.fasta output
auto_asr_script.py
import csv
import subprocess as sp
import sys
from collections import defaultdict
from pathlib import Path

from Bio import Phylo


def main(nwk_tree_file: Path, fasta_aln_file: Path, outdir: Path):
    """Ancestral sequence reconstruction automated script

    Args:
        nwk_tree_file (Path): Newick format tree file path
        fasta_aln_file (Path): Fasta format alignment file path
        outdir (Path): Output directory
    """
    outdir.mkdir(exist_ok=True)

    # Add node id to tree
    nwk_nodeid_tree_file = outdir / "user_tree.nwk"
    add_serial_node_id(nwk_tree_file, nwk_nodeid_tree_file)

    # Ancestral sequence reconstruction using iqtree
    prefix = outdir / "iqtree_asr"
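    iqtree_asr_cmd = [
        "iqtree", "--ancestral", "-s", str(fasta_aln_file), "-te", str(nwk_nodeid_tree_file),
        "-m", "LG+I+G4", "-asr-min", "0.95", "--prefix", str(prefix),
        "-T", "4", "--seed", "0", "--redo",
    ]
    # Pass an argument list (no shell) and stop if IQ-TREE exits with an error
    sp.run(iqtree_asr_cmd, check=True)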

    # Write asr fasta file from iqtree state file
    asr_state_file = str(prefix) + ".state"
    asr_fasta_file = str(prefix) + ".fasta"
    write_asr_fasta_file(asr_state_file, asr_fasta_file)


def add_serial_node_id(nwk_tree_infile: str, nwk_tree_outfile: str) -> None:
    """Add serial node id(N001, N002...N00X) in each internal tree node

    Args:
        nwk_tree_infile (str): Input newick file path
        nwk_tree_outfile (str): Output newick file path
    """
    tree = Phylo.read(nwk_tree_infile, "newick")
    for cnt, node in enumerate(tree.get_nonterminals()):
        node.name = f"N{cnt:03d}"
    Phylo.write(tree, nwk_tree_outfile, "newick", plain=True)


def write_asr_fasta_file(asr_state_infile: str, asr_fasta_outfile: str) -> None:
    """Write asr fasta file from iqtree asr state file

    Args:
        asr_state_infile (str): asr state file path
        asr_fasta_outfile (str): asr fasta file path
    """
    node_name2asr_seq = defaultdict(str)
    with open(asr_state_infile) as f:
        reader = csv.reader(f, delimiter="\t")
        for row in reader:
            # Skip header row
            if row[0].startswith("#") or row[0].startswith("Node"):
                continue
            # Record node_name & asr_seq
            node_name, aa_pos, asr_aa, prob_list = row[0], row[1], row[2], row[3:]
            node_name2asr_seq[node_name] += asr_aa

    with open(asr_fasta_outfile, "w") as f:
        for node_name, asr_seq in sorted(node_name2asr_seq.items()):
            f.write(f">{node_name}\n{asr_seq}\n")


if __name__ == "__main__":
    args = sys.argv
    nwk_tree_file, fasta_aln_file, outdir = Path(args[1]), Path(args[2]), Path(args[3])

    main(nwk_tree_file, fasta_aln_file, outdir)

References

IQ-TREE Command Reference
Ancestral Sequence Reconstruction - Wikipedia

Other CLI tools for ancestral sequence reconstruction (there appear to be many more)
FastML Source Code
raxml-ng - GitHub
MEGA - Molecular Evolutionary Genetics Analysis
