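Motivation
I used OrthoFinder2 to select the single-copy orthologs needed to build a phylogenetic tree.
OrthoFinder2 saves these as FASTA files with names like OG0000001.fa in the Single_Copy_Orthologue_Sequences folder.
The title of each sequence in these FASTA files is the ID of the protein itself.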
>BDT62344.1
MENNSHDNVINTPFIDDDNVKNDIFINTSDDDDNNNSDNDSKGNSNNNSSNSSSSSSSSN
SSTTSEDIDNIDLDDYKSVLLLCKEEKIYNNQKNERVENKQFYPNKKKRKLNNIESPQPQ
PSLLSRLPTLSSSSSSSSSSSPSSPPPPSSPTLSVPPPSDDQPPLLITNCKEVVDIIYRH
ERQVMSDSELYLGYASTKMPKEFVECLLMFRDEMTLILQEYARLRRSQQLLGNFNTNSIN
YADEMVKKMIDIILRIDKNSMSVDKYRDVVREALYIYHLVVSKMTGPKHLKRLRTPELHF
DFCMLIALLSHNVENVKNISTKYRVSSIIQFVSALDCQWYLSVVPSILSVFNRSNSICHA
LSFSYMRHAQVNITLCLVFALTETNTTNLVLGVILYLFPESLESIKTNELIKDESLIIPL
CRKIKEHLRSQWIDRMDITNAAYLLLGSCTDGLDTIKVFKKHNYDKKIVAVSIMIQQKLK
QLNQSICFH
>BDT62474.1
MFDHFDPEHFLIVYSPIAFLSIITRNFALIREMSLQDDLYLSTSSDEIEDDNESEEEEDD
DDDYDDNGDSVNDKDDDDDVEFDPDKTNSTVLLTHVPRKGTYVSRIAHDESDSRGCSKSV
ISDTDSGSQTSSVSQSNEHRWRMSLRHQRKRFRHNNFDHHVRTTSPQMVVLSQPMNVLGA
LANEERTAMEVAEITLTQGSLAMSEPDREALLTFYREIKTIISAYLSLVHIQRTLNNYNT
NSINYPEGMVKKILSIIRQIPRHEMSMEKYNVVCRDALFLYYTIITRMTGPKHSKRLRTP
YWQFYFCGVLAMLVNDVPVASDLTVSGKETSLVQFASAVGNPAYQTAVHDISSVYNSSYS
VYKALGLSRSQLTDANMVLAILSARNTHLSDRKPRTMAQSALLYRNPDLIDRMRASGLVQ
DESSLGSTSRAVAAQLRVAGVSNQTLDDASHFLHGSYNQEGVTLRCFGSGQRDLKTVAAS
VLVTEDLRRRIRTTW
>BDV49871.1
MKYIFSDDDDDDNSTISNSNSSSNKSSDNSDIEDDEDDDNDEDLDNYQSILLMSKNDTNH
IIEKKFEHDNSIHSNNIDNKNIPFCSSFDMKETLEIKHDNTSKKNITYEYDSSNNRKDTE
EGQEDKRGPLLITNPKEVVDEILREEKKTSSETIIGYALSKYSKESVESLSMFKEEIILI
LQMYAKMKCRQQLLGNFNTNSTNYSEEMVKKMIEIISRIDKNSMALDKYRDVVREALYIY
HLVVSKMTGPKHLKRLRTPDLHFDFCVIVALLAHNIEVNKSNFSKYRATSLIQFVSALDC
QWYLSVVPNILSVFNSSSSICQIIGFSALKHAKINIILCLLFNIFEKKPNNLVLATILYL
YPESLNVIKQNGLIRDEASVIALSKKIKIHLDDYWIYQSDIHCAANLLIGQWTEGLNIIK
LFKKQHYDKKVIAISLLVQQRLKQLNQTVEYY
Before running the phylogenetic analysis, I needed to change each sequence name to the name of its source species (= the basename of the input FASTA file).
>MelaMJNV
MENNSHDNVINTPFIDDDNVKNDIFINTSDDDDNNNSDNDSKGNSNNNSSNSSSSSSSSN
SSTTSEDIDNIDLDDYKSVLLLCKEEKIYNNQKNERVENKQFYPNKKKRKLNNIESPQPQ
PSLLSRLPTLSSSSSSSSSSSPSSPPPPSSPTLSVPPPSDDQPPLLITNCKEVVDIIYRH
ERQVMSDSELYLGYASTKMPKEFVECLLMFRDEMTLILQEYARLRRSQQLLGNFNTNSIN
YADEMVKKMIDIILRIDKNSMSVDKYRDVVREALYIYHLVVSKMTGPKHLKRLRTPELHF
DFCMLIALLSHNVENVKNISTKYRVSSIIQFVSALDCQWYLSVVPSILSVFNRSNSICHA
LSFSYMRHAQVNITLCLVFALTETNTTNLVLGVILYLFPESLESIKTNELIKDESLIIPL
CRKIKEHLRSQWIDRMDITNAAYLLLGSCTDGLDTIKVFKKHNYDKKIVAVSIMIQQKLK
QLNQSICFH
>MelaPMNV
MFDHFDPEHFLIVYSPIAFLSIITRNFALIREMSLQDDLYLSTSSDEIEDDNESEEEEDD
DDDYDDNGDSVNDKDDDDDVEFDPDKTNSTVLLTHVPRKGTYVSRIAHDESDSRGCSKSV
ISDTDSGSQTSSVSQSNEHRWRMSLRHQRKRFRHNNFDHHVRTTSPQMVVLSQPMNVLGA
LANEERTAMEVAEITLTQGSLAMSEPDREALLTFYREIKTIISAYLSLVHIQRTLNNYNT
NSINYPEGMVKKILSIIRQIPRHEMSMEKYNVVCRDALFLYYTIITRMTGPKHSKRLRTP
YWQFYFCGVLAMLVNDVPVASDLTVSGKETSLVQFASAVGNPAYQTAVHDISSVYNSSYS
VYKALGLSRSQLTDANMVLAILSARNTHLSDRKPRTMAQSALLYRNPDLIDRMRASGLVQ
DESSLGSTSRAVAAQLRVAGVSNQTLDDASHFLHGSYNQEGVTLRCFGSGQRDLKTVAAS
VLVTEDLRRRIRTTW
>MellatMJNV
MKYIFSDDDDDDNSTISNSNSSSNKSSDNSDIEDDEDDDNDEDLDNYQSILLMSKNDTNH
IIEKKFEHDNSIHSNNIDNKNIPFCSSFDMKETLEIKHDNTSKKNITYEYDSSNNRKDTE
EGQEDKRGPLLITNPKEVVDEILREEKKTSSETIIGYALSKYSKESVESLSMFKEEIILI
LQMYAKMKCRQQLLGNFNTNSTNYSEEMVKKMIEIISRIDKNSMALDKYRDVVREALYIY
HLVVSKMTGPKHLKRLRTPDLHFDFCVIVALLAHNIEVNKSNFSKYRATSLIQFVSALDC
QWYLSVVPNILSVFNSSSSICQIIGFSALKHAKINIILCLLFNIFEKKPNNLVLATILYL
YPESLNVIKQNGLIRDEASVIALSKKIKIHLDDYWIYQSDIHCAANLLIGQWTEGLNIIK
LFKKQHYDKKVIAISLLVQQRLKQLNQTVEYY
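The header replacement shown above can be sketched in a few lines of Python. This is only a minimal illustration and not the code of the script described below: the function name `rename_fasta_headers` and the `id_to_species` dictionary are hypothetical, and the sketch assumes the protein ID is the first whitespace-delimited token of each header line.

```python
def rename_fasta_headers(fasta_text, id_to_species):
    """Return FASTA text with each header ID replaced by a species name.

    id_to_species is a hypothetical dict such as {"BDT62344.1": "MelaMJNV"}.
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            tokens = line[1:].split()
            protein_id = tokens[0] if tokens else ""
            # Keep the original ID when no species is known for it.
            out_lines.append(">" + id_to_species.get(protein_id, protein_id))
        else:
            # Sequence lines are passed through unchanged.
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

Unknown IDs are left as they are, so a missing entry in the mapping stays visible in the output rather than being dropped silently.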
rename_single_copy_orthologs.py
Requirements
Usage
The output directory must be created in advance.
./rename_single_copy_orthologs.py
usage: rename_single_copy_orthologs.py [-h] --input DIR --output DIR -t ORTHOGROUPS -s SINGLE_COPY_ORTHOLOGUES

Rename single-copy orthologue sequences identified by OrthoFinder2

optional arguments:
  -h, --help            show this help message and exit
  --input DIR, -i DIR, --in DIR
                        Single_Copy_Orthologue_Sequences directory
  --output DIR, -o DIR, --out DIR
                        output directory
  -t ORTHOGROUPS, --orthogroups ORTHOGROUPS
                        Orthogroups.tsv
  -s SINGLE_COPY_ORTHOLOGUES, --single_copy_orthologues SINGLE_COPY_ORTHOLOGUES
                        Orthogroups_SingleCopyOrthologues.txt
It is slightly awkward in places, such as having to specify more than one table, but it works without problems, so I am leaving it as is.
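To illustrate why two tables are involved, here is a rough sketch of how a protein-ID-to-species mapping might be built from them. It is not the script's actual implementation. The function name `build_id_to_species` is hypothetical, and the sketch assumes that Orthogroups.tsv is tab-separated with the orthogroup ID in the first column, one column per species named in the header, and comma-separated protein IDs in each cell, and that Orthogroups_SingleCopyOrthologues.txt lists one orthogroup ID per line.

```python
import csv
import io


def build_id_to_species(orthogroups_tsv, single_copy_txt):
    """Map protein IDs in single-copy orthogroups to species column names.

    Both arguments are the text contents of the two tables (assumed layout;
    see the lead-in above).
    """
    # Orthogroup IDs to keep, one per non-empty line.
    single_copy = {ln.strip() for ln in single_copy_txt.splitlines() if ln.strip()}

    reader = csv.reader(io.StringIO(orthogroups_tsv), delimiter="\t")
    header = next(reader)
    species_names = header[1:]  # column 0 is the orthogroup ID

    mapping = {}
    for row in reader:
        if not row or row[0] not in single_copy:
            continue
        for species, cell in zip(species_names, row[1:]):
            # A cell may hold several comma-separated IDs in general.
            for protein_id in cell.split(","):
                protein_id = protein_id.strip()
                if protein_id:
                    mapping[protein_id] = species
    return mapping
```

In this sketch the first table supplies the species for each protein ID, while the second only restricts the rows to single-copy orthogroups.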