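Overview
Ancestral sequence reconstruction (ASR) is a method in molecular evolution that infers ancestral gene sequences from the homologous genes of organisms descended from a common ancestor. ASR has been used to shed light on the mechanisms and dynamics of gene (protein) evolution.
This article describes how to run ASR with IQ-TREE, a phylogenetic analysis tool, and provides a Python script that automates the workflow.
※ Tested with IQ-TREE v2.x; the commands below do not work with v1.x.
Procedure and results
This section describes the input data, the IQ-TREE command, and the output data for ASR with IQ-TREE.
※ IQ-TREE (v2.x) must already be installed (IQ-TREE official website).
1. Input data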
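- Phylogenetic tree file
A Newick-format phylogenetic tree of species in the infraorder Simiiformes (simians) is used as the sample.
Assigning IDs to the internal nodes makes later interpretation and processing of the results easier, so serial IDs 'N000, N001, ... N0XX' are assigned to the internal nodes, as shown in the figure.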
((((Saimiri_boliviensis,Cebus_capucinus)N003,Callithrix_jacchus)N002,Aotus_nancymaae)N001,((((((Pan_paniscus,Pan_troglodytes)N009,Homo_sapiens)N008,Gorilla_gorilla)N007,Pongo_abelii)N006,Nomascus_leucogenys)N005,((Rhinopithecus_roxellana,Rhinopithecus_bieti)N011,(Chlorocebus_sabaeus,(((Macaca_mulatta,Macaca_fascicularis)N015,Macaca_nemestrina)N014,(Papio_anubis,(Cercocebus_atys,Mandrillus_leucophaeus)N017)N016)N013)N012)N010)N004)N000;
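- Aligned sequence file
An aligned FASTA file of the sequences of each organism in the tree (the targets of ASR) is used as the sample. Although the file is named cds_align.fasta, it contains amino-acid sequences (the translated CDS), which is why a protein substitution model (LG) is used in the command below.
The organism names in the tree and the sequence headers in the FASTA file must match exactly, both in number and in spelling.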
>Rhinopithecus_bieti
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDVLLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPTPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQIS-----PFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Aotus_nancymaae
MGSENSALKSYTLKEPAFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVPQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-GHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIHQDYYNTLLQTGNPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQVWPREPCDAVKSQCTTLDVGE-SSWDDCEPGSLDTKVNPEGGITATKPVTSGKQKPIPALLPLTEES-TTWKSSLPQKTSLLQSGDDPDQIKPPKVSSQEKPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEVAEGEAEGW-EEGELNWE--DNNW
>Callithrix_jacchus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALDTLSSAEVCAGIYDILLALTFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECDGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGNPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-LSWDDCEPGSLDTKVNLGAGITATKPVTSGKQKPIPALLPLTEES-TTWQSRLSQKTSLVQSRDDPDQIKPPKVSSQERPLKISSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVSNKDDVSPVMQFSSKFAAAEIAEGEAEGW-EEGELNWE--DNNW
>Macaca_mulatta
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLLDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLP------QNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Cebus_capucinus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALTFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTALPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPISGLPDVKNSSEDSENFPSSSKKS-EDWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSGEQKPIPALLPFTEES-TTWKSSVPGKTGLVQSRDDPDQIKSPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEITEGEAEGW-EEGELNWE--DNNW
>Saimiri_boliviensis
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVAQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHV-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQRSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDIKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLNTKVNPGGGITATKPVTSGEQKLIPALLPFTEES-TTWKSSLPQKTSLVQSRDDPDHIKPPKVSSQERPLKVPSELGLGEEFTIQVRKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAAITEGEAEGW-EEGELNWE--DSNW
>Cercocebus_atys
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPGLFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Mandrillus_leucophaeus
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK----------------------------------------------------------------VLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNSLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTLGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Chlorocebus_sabaeus
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFRSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Papio_anubis
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLS-------------------------------------------------------------QPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Macaca_fascicularis
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSMEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Macaca_nemestrina
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Rhinopithecus_roxellana
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Gorilla_gorilla
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQDQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPELVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pan_troglodytes
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Homo_sapiens
MGSENSALKSYTLREPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNTDLSLEDSPMCVVCSHHSQISPILENPFSSIFPKCFFSGSTPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKISLVQRGDDADQIEPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pan_paniscus
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFF---------------------------FLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENKTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Nomascus_leucogenys
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQILPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCNTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSFVQRGNDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>Pongo_abelii
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQILPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDDPDQIKLPKVSSQGRPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
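Since a mismatch between the tree's tip names and the FASTA headers is an easy mistake to make, it can help to check the two input files before running IQ-TREE. The following is a minimal, dependency-free sketch; the helper names (`newick_tip_names`, `fasta_ids`, `check_names`) are hypothetical, and it assumes unquoted Newick labels without spaces and takes the first word of each `>` line as the sequence name.

```python
import re
from collections import Counter


def newick_tip_names(newick: str) -> list[str]:
    """Return leaf labels, i.e. labels that directly follow '(' or ','.

    Assumption: labels are unquoted and contain no spaces, '(', ')', ',', ':'
    or ';'. Internal-node labels follow ')' and are therefore not matched.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def fasta_ids(fasta_text: str) -> list[str]:
    """Return the first word of every non-empty '>' header line."""
    return [line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()]


def check_names(newick: str, fasta_text: str) -> dict[str, list[str]]:
    """Compare tree tips with FASTA headers; all lists empty means they match."""
    tips, ids = newick_tip_names(newick), fasta_ids(fasta_text)
    return {
        "only_in_tree": sorted(set(tips) - set(ids)),
        "only_in_fasta": sorted(set(ids) - set(tips)),
        "duplicated_in_tree": sorted(k for k, n in Counter(tips).items() if n > 1),
        "duplicated_in_fasta": sorted(k for k, n in Counter(ids).items() if n > 1),
    }
```

For the sample data, `check_names(Path("simiiformes.nwk").read_text(), Path("cds_align.fasta").read_text())` (with `from pathlib import Path`) should return four empty lists, since the 19 tip names and the 19 FASTA headers are identical.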
2. IQ-TREE command
ASR on the input data above can be run with the following command:
iqtree --ancestral -s cds_align.fasta -te simiiformes.nwk -m LG+I+G4 -asr-min 0.95 --prefix iqtree_asr -T 4 --seed 0 --redo
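Options used
Option | Description |
---|---|
--ancestral | Run mode: ancestral sequence reconstruction |
-s | Input alignment file |
-te | Input tree file (the topology is fixed; no tree search is performed) |
-m | Substitution model (if omitted, the best-fit model is selected automatically) |
-asr-min | Minimum posterior probability for an ancestral state (sites below the threshold are filled with a gap) |
--prefix | Prefix for output files |
-T | Number of CPU threads |
--seed | Random seed (set it when reproducibility is needed) |
--redo | Overwrite existing results from a previous run |
※ For details on each option, see `iqtree --help`.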
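When the command is driven from Python (as in the script later in this article), passing it as an argument list avoids shell quoting problems with paths that contain spaces. Below is a minimal sketch; `build_iqtree_asr_cmd` and `find_iqtree` are hypothetical helpers, and the assumption that the IQ-TREE 2 executable may be installed as `iqtree2` instead of `iqtree` depends on how IQ-TREE was installed.

```python
import shutil
from typing import Optional, Sequence


def find_iqtree(candidates: Sequence[str] = ("iqtree", "iqtree2")) -> Optional[str]:
    """Return the first candidate executable found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_iqtree_asr_cmd(aln: str, tree: str, prefix: str, exe: str = "iqtree",
                         model: str = "LG+I+G4", asr_min: float = 0.95,
                         threads: int = 4, seed: int = 0) -> list[str]:
    """Assemble the ASR command shown above as an argument list (no shell)."""
    return [exe, "--ancestral", "-s", aln, "-te", tree, "-m", model,
            "-asr-min", str(asr_min), "--prefix", prefix,
            "-T", str(threads), "--seed", str(seed), "--redo"]
```

The list can then be run with `subprocess.run(cmd, check=True)`, where `check=True` raises `CalledProcessError` if IQ-TREE exits with an error instead of silently continuing.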
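3. Output data
IQ-TREE writes the ASR results to the file `iqtree_asr.state`, as shown below.
Each line gives, for one tree node (Node column) and one alignment site (Site column), the inferred ancestral amino acid (State column), followed by the posterior probability of each of the 20 amino acids (p_A to p_V columns).
The State column contains the amino acid whose posterior probability is at or above the threshold given to the `-asr-min` option; if no amino acid reaches the threshold, a gap (`-`) is written instead.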
Node Site State p_A p_R p_N p_D p_C p_Q p_E p_G p_H p_I p_L p_K p_M p_F p_P p_S p_T p_W p_Y p_V
N015 1 M 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N015 2 G 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N015 3 S 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000
... (omitted) ...
N011 750 N 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N011 751 N 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
N011 752 W 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000
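Because every row carries the full posterior distribution, the file can also be used to see which sites fell below the threshold. A minimal sketch, assuming the tab-separated layout shown above (comment lines starting with `#`, then a `Node Site State p_A ... p_V` header), which is also what the script later in this article assumes; `parse_state` and `uncertain_sites` are hypothetical names.

```python
import csv


def parse_state(text: str) -> list[dict]:
    """Parse the text of an IQ-TREE .state file into one dict per row.

    Blank lines and lines starting with '#' are skipped; the first remaining
    line is used as the header (Node, Site, State, p_A, ..., p_V).
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        # Keep only the posterior columns, keyed by amino acid ("A", "R", ...)
        probs = {k[2:]: float(v) for k, v in rec.items() if k and k.startswith("p_")}
        rows.append({"node": rec["Node"], "site": int(rec["Site"]),
                     "state": rec["State"], "probs": probs})
    return rows


def uncertain_sites(rows: list[dict], threshold: float = 0.95) -> list[tuple]:
    """Return (node, site, best_aa, best_p) where the top posterior < threshold."""
    out = []
    for r in rows:
        best_aa, best_p = max(r["probs"].items(), key=lambda kv: kv[1])
        if best_p < threshold:
            out.append((r["node"], r["site"], best_aa, best_p))
    return out
```

The default `threshold=0.95` mirrors the `-asr-min 0.95` used above, so the returned sites should correspond to the positions written as `-` in the State column.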
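By extracting **the inferred ancestral amino acids (State column) for each tree node (Node column)** from this file, the inferred ancestral sequence at each internal node can be obtained, as shown below.
The root N000 does not appear in the output. This is most likely because IQ-TREE handles the input tree as unrooted: an unrooted binary tree of these 19 species has only 17 internal nodes (N001 to N017), so the root node of the input tree is dropped.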
>N001
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPINSKKHIQQDYYNTLLQTG-PFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSG-QKPIPALLPLTEES-TTWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEI-EGEAEGWEEEGELNWE--DNNW
>N002
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLAL-FLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPINSKKHIQQDYYNTLLQTG-PFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSG-QKPIPALLPLTEES-TTWKSSLPQKTSLVQSRDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEI-EGEAEGWEEEGELNWE--DNNW
>N003
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLAL-FLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSVRDPASIPPEEMSPEFTTLPECHGHARDAFSFGILVESLLTILNKQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPVRVVCSQHSQISPILENPFSSIFPKCFFSGNMPIN-KKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNSSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPGSLDTKVNPGGGITATKPVTSGEQKPIPALLPFTEES-TTWKSSLPQKTSLVQSRDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPNKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N004
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCS-HSQISPILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N005
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQI-PILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDD-DQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N006
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQI-PILENPFSSIFPKCFFSGSMPINSKKHIQQDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQRGDD-DQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N007
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N008
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N009
MGSENSALKSYTLKEPPFTLPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEVDGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGHLTHNNVCLSSVFVSEDGHWKLGGMETVCKVSQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPKCRPALCTLLSHDFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPYLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSPMRVVCSHHSQISPILENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTSEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPRETCDDVKSQCTTLDVEE-SSWDDCEPSSLDTKVNPGGGITATKPVTSGEQKPIPALLSLTEES-MPWKSSLPQKTSLVQRGDDADQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDDVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N010
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDS-NFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N011
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLKSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYYTLLQTGDPFSQSIKFPINGLSDIKNTLEDSENFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLATNVNPGDGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQITPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLVLPELRTE--MVPKKDEVSSVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N012
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N013
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N014
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N015
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWHREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQNGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEICPMMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N016
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
>N017
MGSENSALKSYTLKEPPFILPSGLAVYPAVLQDGKFASVFVYKRENEDKVNKAAKHLKTLRHPCLLRFLSCTVEADGIHLVTERVQPLEVALETLSSAEVCAGIYDILLALIFLHDRGQLTHNNVCLSSVFVSEDGHWKLGAMETVCKISQATPEFLRSIQSIRDPASIPPEEMSPEFTTLPECHGHARDAFSFGTLVESLLTILNEQVSADVLSSFQQTLHSTLLNPIPNCRPALCTLLSHEFFRNDFLEVVNFLKSLTLKSEEEKTEFFKFLLDRVSCLSEELIASRLVPLLLNQLVFAEPVAVKSFLPHLLGPKK-DHA-QGETPCLLSPALFQSRVIPVLLQLFEVHEEHVRMVLLSHIEAYVEHFTQEQLKKVILPQVLLGLRDTSDSIVAITLHSLAVLVSLLGPEVVVGGERTKIFKRTAPSFTKNIDLSLEDSSMRVVCSKHSQISPVLENPFSSIFPKCFFSGSMPINSKKHIQRDYYNTLLQTGDPFSQPIKFPINGLSDVKNTLEDSKNFPSSSKKS-EEWPDWSEPEEP-ENQTVNIQIWPREPCDAVKSQCTTLDMEE-SSWDDCEPSNLDTKVNPGGGITATKSVTSGEQKPIPALLPLTEES-MPWKSSLPQKTSLVQSGDDPDQIKPPKVSSQERPLKVPSELGLGEEFTIQVKKKPVKDPEMDWFADMIPEIKPSAAFLILPELRTE--MVPKKDEVSPVMQFSSKFAAAEITEGEAEGWEEEGELNWE--DNNW
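One way to look at these results is to compare a node with one of its children (in the tree above, N002 is a child of N001) and list the substitutions along that branch. A minimal sketch with hypothetical helper names; column numbers refer to alignment columns, not to residue positions in any one species' protein. By default it skips columns containing `-`, because a `-` in an ancestral sequence from this pipeline marks a site below the `-asr-min` threshold rather than an inferred deletion.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; name = first word of the header."""
    seqs: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line  # supports sequences wrapped over several lines
    return seqs


def branch_changes(parent: str, child: str, skip_gaps: bool = True) -> list[str]:
    """List differences as '<parent aa><column><child aa>', e.g. 'K2R'."""
    if len(parent) != len(child):
        raise ValueError("sequences must have the same aligned length")
    changes = []
    for col, (a, b) in enumerate(zip(parent, child), start=1):
        if a == b or (skip_gaps and "-" in (a, b)):
            continue
        changes.append(f"{a}{col}{b}")
    return changes
```

For example, `seqs = read_fasta(Path("output/iqtree_asr.fasta").read_text())` followed by `branch_changes(seqs["N001"], seqs["N002"])` lists the confidently reconstructed changes on the N001 to N002 branch.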
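Automated ASR script
I wrote a Python script that runs the whole ASR workflow automatically.
It uses Biopython, so install it beforehand (`pip install biopython`).
Running the command below performs ① assignment of serial IDs to the internal nodes of the tree, ② ASR with IQ-TREE, and ③ extraction of the ancestral sequences from the results (written to `<outdir>/iqtree_asr.fasta`).
# Arg 1: Newick tree, Arg 2: aligned FASTA file (amino-acid sequences), Arg 3: output directory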
python auto_asr_script.py simiiformes.nwk cds_align.fasta output
import csv
import subprocess as sp
import sys
from collections import defaultdict
from pathlib import Path

from Bio import Phylo


def main(nwk_tree_file: Path, fasta_aln_file: Path, outdir: Path) -> None:
    """Ancestral sequence reconstruction automated script

    Args:
        nwk_tree_file (Path): Newick format tree file path
        fasta_aln_file (Path): Fasta format alignment file path
        outdir (Path): Output directory
    """
    outdir.mkdir(parents=True, exist_ok=True)

    # Add node id to tree
    nwk_nodeid_tree_file = outdir / "user_tree.nwk"
    add_serial_node_id(nwk_tree_file, nwk_nodeid_tree_file)

    # Ancestral sequence reconstruction using iqtree
    # (argument list instead of shell=True so paths with spaces are safe;
    #  check=True stops the script if iqtree exits with an error)
    prefix = outdir / "iqtree_asr"
    iqtree_asr_cmd = [
        "iqtree", "--ancestral",
        "-s", str(fasta_aln_file),
        "-te", str(nwk_nodeid_tree_file),
        "-m", "LG+I+G4",
        "-asr-min", "0.95",
        "--prefix", str(prefix),
        "-T", "4",
        "--seed", "0",
        "--redo",
    ]
    sp.run(iqtree_asr_cmd, check=True)

    # Write asr fasta file from iqtree state file
    asr_state_file = Path(f"{prefix}.state")
    asr_fasta_file = Path(f"{prefix}.fasta")
    write_asr_fasta_file(asr_state_file, asr_fasta_file)


def add_serial_node_id(nwk_tree_infile: Path, nwk_tree_outfile: Path) -> None:
    """Add serial node id (N000, N001...N0XX) to each internal tree node

    Args:
        nwk_tree_infile (Path): Input newick file path
        nwk_tree_outfile (Path): Output newick file path
    """
    tree = Phylo.read(nwk_tree_infile, "newick")
    # get_nonterminals() returns internal nodes in preorder (root first)
    for cnt, node in enumerate(tree.get_nonterminals()):
        node.name = f"N{cnt:03d}"
    # plain=True writes topology and node names only (no branch lengths/support)
    Phylo.write(tree, nwk_tree_outfile, "newick", plain=True)


def write_asr_fasta_file(asr_state_infile: Path, asr_fasta_outfile: Path) -> None:
    """Write asr fasta file from iqtree asr state file

    Args:
        asr_state_infile (Path): asr state file path
        asr_fasta_outfile (Path): asr fasta file path
    """
    node_name2asr_seq = defaultdict(str)
    with open(asr_state_infile) as f:
        reader = csv.reader(f, delimiter="\t")
        for row in reader:
            # Skip blank lines, comment lines and the header row.
            # Compare the header exactly: IQ-TREE's default internal node
            # names ("Node1", "Node2", ...) also start with "Node".
            if not row or row[0].startswith("#") or row[0] == "Node":
                continue
            # Columns: Node, Site, State, p_A ... p_V
            # Rows are ordered by site, so appending State builds the sequence
            node_name, asr_aa = row[0], row[2]
            node_name2asr_seq[node_name] += asr_aa

    with open(asr_fasta_outfile, "w") as f:
        for node_name, asr_seq in sorted(node_name2asr_seq.items()):
            f.write(f">{node_name}\n{asr_seq}\n")


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit(f"Usage: python {sys.argv[0]} <tree.nwk> <alignment.fasta> <outdir>")
    nwk_tree_file, fasta_aln_file, outdir = map(Path, sys.argv[1:4])
    main(nwk_tree_file, fasta_aln_file, outdir)
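`add_serial_node_id` relies on Biopython's `get_nonterminals()`, which returns internal nodes in preorder by default. In a Newick string, preorder over internal nodes is simply the order of the opening parentheses, so the same numbering can be reproduced without Biopython. The sketch below is a hypothetical helper (`label_internal_nodes`), not part of the script above; it assumes unquoted labels and replaces any existing internal labels or support values while keeping branch lengths.

```python
import re


def label_internal_nodes(newick: str, fmt: str = "N{:03d}") -> str:
    """Give every internal node a serial label in preorder.

    Each '(' takes the next number and its matching ')' writes it out.
    Assumes unquoted labels without '(', ')', ',', ':' or ';'.
    """
    # Drop existing internal labels/support values (text right after ')'),
    # keeping any ':branch_length' that follows
    newick = re.sub(r"\)[^(),:;]+", ")", newick)
    out, stack, counter = [], [], 0
    for ch in newick:
        if ch == "(":
            stack.append(counter)  # preorder index of this internal node
            counter += 1
            out.append(ch)
        elif ch == ")":
            out.append(")" + fmt.format(stack.pop()))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `label_internal_nodes("((A,B),C);")` gives `"((A,B)N001,C)N000;"`, and applying it to the sample tree reproduces the N000 to N017 labels shown in the input-data section.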
References
[IQ-TREE Command Reference](http://www.iqtree.org/doc/Command-Reference#ancestral-sequence-reconstruction)
Ancestral Sequence Reconstruction - Wikipedia
Other CLI tools for ASR (there appear to be many more)
FastML Source Code
raxml-ng - GitHub
MEGA - Molecular Evolutionary Genetics Analysis