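Using efetch from Biopython's Bio.Entrez module, you can retrieve sequence records and related information from NCBI databases through Python.
For details, see the official documentation: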
https://biopython.org/docs/1.75/api/Bio.Entrez.html
Here I ran the code in Google Colaboratory and summarize the output obtained with each setting.
Installing Biopython
Install Biopython as follows (in Colaboratory, the leading ! runs the line as a shell command).
!pip install biopython
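Basic usage
After installing Biopython, you can retrieve a sequence record as follows.
# import the module
from Bio import Entrez
# NCBI asks for a contact email address; replace it and the ID with real values.
Entrez.email = "xxxx@xxxxx.com"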
handle = Entrez.efetch(db="protein", id="GenBank_id", rettype="gb", retmode="text")
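Under the hood, Entrez.efetch sends an HTTP request to NCBI's EFetch endpoint, passing db, id, rettype and retmode as query parameters. As a rough sketch of what that request looks like, the function below only builds the equivalent URL with the standard library and sends nothing. The helper name build_efetch_url is hypothetical, and Biopython adds parameters of its own (for example tool and email), so the real request is not identical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# NCBI's EFetch endpoint
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype=None, retmode=None):
    """Build (but do not send) an EFetch URL. Hypothetical helper for illustration."""
    params = {"db": db, "id": ids}
    # Optional parameters are added only when given, like efetch's keyword arguments
    if rettype is not None:
        params["rettype"] = rettype
    if retmode is not None:
        params["retmode"] = retmode
    return EFETCH_BASE + "?" + urlencode(params)


url = build_efetch_url("protein", "PWL87550.1", rettype="gb", retmode="text")
print(url)
# The query string parses back into the same parameters
print(parse_qs(urlsplit(url).query))
```

Pasting such a URL into a browser is a quick way to preview what a given rettype/retmode combination returns before writing any Python.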
As a concrete example, run the following code for an Escherichia coli flavin reductase (PWL87550.1). The result is a typical GenBank-format record.
Entrez.email = "xxxx@xxxxx.com"
handle = Entrez.efetch(db="protein", id="PWL87550.1", rettype="gb", retmode="text")
print(handle.read())
This produces the following output.
LOCUS PWL87550 202 aa linear ENV 23-MAY-2018
DEFINITION MAG: flavin reductase [Escherichia coli].
ACCESSION PWL87550
VERSION PWL87550.1
DBLINK BioProject: PRJNA397219
BioSample: SAMN08295014
DBSOURCE accession QAMB01000025.1
# (rest omitted)
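To save the record, fetch it and write the text to a file. handle.read() consumes the handle, so calling it again after the print above returns an empty string; fetch the record again (or keep the result of the first read in a variable) before writing.
handle = Entrez.efetch(db="protein", id="PWL87550.1", rettype="gb", retmode="text")
with open('PWL87550.1.gb', 'w') as f:
    f.write(handle.read())
handle.close()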
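The handle returned by efetch behaves like a file: read() returns everything that is left, and a second read() returns an empty string. The sketch below shows this with io.StringIO standing in for the real handle (an assumption made so it runs offline), and saves the text from a single read.

```python
import io
import os
import tempfile

# Stand-in for the handle from Entrez.efetch; any file-like object behaves the same way.
# The content is the first line of the GenBank output shown above.
handle = io.StringIO("LOCUS PWL87550 202 aa linear ENV 23-MAY-2018\n")

text = handle.read()    # first read returns everything
second = handle.read()  # the stream is exhausted, so this is ''
print(repr(second))

# Write the stored string instead of calling handle.read() again
path = os.path.join(tempfile.mkdtemp(), "PWL87550.1.gb")
with open(path, "w") as f:
    f.write(text)

with open(path) as f:
    saved = f.read()
print(saved == text)
```

Keeping the result in a variable also avoids a second request to NCBI when you want to both print and save.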
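Changing rettype and retmode returns different kinds of information or file formats.
For the valid combinations, see the rettype/retmode table for EFetch in NCBI's E-utilities documentation.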
Accessing the protein database
retmode="text", no rettype
The simplest case is the following.
Entrez.email = "xxxx@xxxxx.com"
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text")
print(handle.read())
The output is in ASN.1 format (see NCBI's ASN.1 page):
https://www.ncbi.nlm.nih.gov/Structure/asn1.html
Seq-entry ::= set {
id id 1,
class nuc-prot,
descr {
source {
org {
taxname "Escherichia coli",
db {
{
db "taxon",
tag id 562
}
},
orgname {
name binomial {
genus "Escherichia",
species "coli"
},
attrib "specified",
mod {
{
subtype isolate,
subname "CIM:MAG 560"
# (rest omitted)
retmode="text", rettype="gb"
Run the following code:
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="gb")
print(handle.read())
The output is in GenBank (gb) format.
LOCUS PWL87550 202 aa linear ENV 23-MAY-2018
DEFINITION MAG: flavin reductase [Escherichia coli].
ACCESSION PWL87550
VERSION PWL87550.1
DBLINK BioProject: PRJNA397219
BioSample: SAMN08295014
DBSOURCE accession QAMB01000025.1
KEYWORDS ENV; Metagenome Assembled Genome; MAG.
SOURCE Escherichia coli (human gut metagenome)
# (rest omitted)
retmode="text", rettype="gp"
Example input
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="gp")
print(handle.read())
As above, the record is returned in the GenBank flat-file format (GenPept, for proteins). The output is omitted.
retmode="xml", rettype="gp"
Example input
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="gp")
print(handle.read())
Example output. The record is returned as XML (as bytes, hence the b'...' prefix). XML may be easier to work with when extracting specific fields.
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">\n
<GBSet>\n
<GBSeq>\n
\n
<GBSeq_locus>PWL87550</GBSeq_locus>\n
<GBSeq_length>202</GBSeq_length>\n
<GBSeq_moltype>AA</GBSeq_moltype>\n
<GBSeq_topology>linear</GBSeq_topology>\n
<GBSeq_division>ENV</GBSeq_division>\n
<GBSeq_update-date>23-MAY-2018</GBSeq_update-date>\n
<GBSeq_create-date>23-MAY-2018</GBSeq_create-date>\n
<GBSeq_definition>MAG: flavin reductase [Escherichia coli]</GBSeq_definition>\n
# (rest omitted)
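As one way to pull fields out of this XML without extra dependencies, the sketch below uses the standard library's xml.etree.ElementTree on a trimmed copy of the output above (DOCTYPE removed, closing tags added so it parses on its own). In real use you would pass the bytes returned by handle.read(); Biopython's own Entrez.read() can also parse many NCBI XML results.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the retmode="xml", rettype="gp" output above
gbseq_xml = b"""<?xml version="1.0" encoding="UTF-8" ?>
<GBSet>
  <GBSeq>
    <GBSeq_locus>PWL87550</GBSeq_locus>
    <GBSeq_length>202</GBSeq_length>
    <GBSeq_moltype>AA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_definition>MAG: flavin reductase [Escherichia coli]</GBSeq_definition>
  </GBSeq>
</GBSet>"""

root = ET.fromstring(gbseq_xml)
records = []
# One GBSeq element per record; fetching several IDs gives several elements
for seq in root.findall("GBSeq"):
    records.append({
        "locus": seq.findtext("GBSeq_locus"),
        "length": int(seq.findtext("GBSeq_length")),
        "definition": seq.findtext("GBSeq_definition"),
    })
print(records)
```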
retmode="xml", rettype="gpc"
Example input
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="gpc")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd">\n
<INSDSet>\n
<INSDSeq>\n
\n
<INSDSeq_locus>PWL87550</INSDSeq_locus>\n
<INSDSeq_length>202</INSDSeq_length>\n
<INSDSeq_moltype>AA</INSDSeq_moltype>\n
<INSDSeq_topology>linear</INSDSeq_topology>\n
<INSDSeq_division>ENV</INSDSeq_division>\n
<INSDSeq_update-date>23-MAY-2018</INSDSeq_update-date>\n
<INSDSeq_create-date>23-MAY-2018</INSDSeq_create-date>\n
<INSDSeq_definition>MAG: flavin reductase [Escherichia coli]</INSDSeq_definition>\n
<INSDSeq_primary-accession>PWL87550</INSDSeq_primary-accession>\n
<INSDSeq_accession-version>PWL87550.1</INSDSeq_accession-version>\n
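The gpc output carries the same fields as the gp output, only with the element prefix INSDSeq instead of GBSeq. A sketch that handles both by deriving the prefix from the root tag; the function name summarize is hypothetical, and the inputs are trimmed, closed copies of the two outputs.

```python
import xml.etree.ElementTree as ET


def summarize(xml_bytes):
    """Return (locus, length) from either GBSet (gp) or INSDSet (gpc) XML."""
    root = ET.fromstring(xml_bytes)
    prefix = root.tag[: -len("Set")]  # "GBSet" -> "GB", "INSDSet" -> "INSD"
    seq = root.find(prefix + "Seq")
    return seq.findtext(prefix + "Seq_locus"), int(seq.findtext(prefix + "Seq_length"))


# Trimmed copies of the gp and gpc outputs, closed so they parse on their own
gb = (b"<GBSet><GBSeq><GBSeq_locus>PWL87550</GBSeq_locus>"
      b"<GBSeq_length>202</GBSeq_length></GBSeq></GBSet>")
insd = (b"<INSDSet><INSDSeq><INSDSeq_locus>PWL87550</INSDSeq_locus>"
        b"<INSDSeq_length>202</INSDSeq_length></INSDSeq></INSDSet>")

print(summarize(gb))
print(summarize(insd))
```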
retmode="xml", rettype="ipg"
Example input
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="ipg")
print(handle.read())
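Example output. This gives the nucleotide record and genome assembly the protein comes from (here QAMB01000025.1 and GCA_003149995.1).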
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<IPGReportSet xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://www.ncbi.nlm.nih.gov/data_specs/schema/other/seq_report/IPGReportSet.xsd">\n\n
<IPGReport ipg="188447752" product_acc="PWL87550.1">\n
<Product accver="PWL87550.1" name="flavin reductase" taxid="562" slen="202" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria"/>\n
<ProteinList>\n
<Protein accver="PWL87550.1" source="INSDC" name="flavin reductase" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" priority="0">\n
<CDSList>\n
<CDS accver="QAMB01000025.1" start="220007" stop="220615" strand="+" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" assembly="GCA_003149995.1"/></CDSList></Protein></ProteinList>\n
<Statistics prot_count="1" nuc_count="1" assmb_count="1"/></IPGReport></IPGReportSet>\n'
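A sketch for extracting the source nucleotide accession, assembly and coordinates from the IPG report with the standard library. The input is the output above with the schema attributes on the root element dropped; in real use you would pass the bytes from handle.read().

```python
import xml.etree.ElementTree as ET

# The IPG output above, with the schema attributes on the root element dropped
ipg_xml = b"""<?xml version="1.0" encoding="UTF-8" ?>
<IPGReportSet>
<IPGReport ipg="188447752" product_acc="PWL87550.1">
<Product accver="PWL87550.1" name="flavin reductase" taxid="562" slen="202" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria"/>
<ProteinList>
<Protein accver="PWL87550.1" source="INSDC" name="flavin reductase" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" priority="0">
<CDSList>
<CDS accver="QAMB01000025.1" start="220007" stop="220615" strand="+" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" assembly="GCA_003149995.1"/></CDSList></Protein></ProteinList>
<Statistics prot_count="1" nuc_count="1" assmb_count="1"/></IPGReport></IPGReportSet>"""

root = ET.fromstring(ipg_xml)
cds_records = []
# Each CDS element names a nucleotide record (and assembly) encoding the protein
for cds in root.iter("CDS"):
    cds_records.append((
        cds.get("accver"),
        cds.get("assembly"),
        int(cds.get("start")),
        int(cds.get("stop")),
        cds.get("strand"),
    ))
print(cds_records)
```

As a consistency check, 220007..220615 spans 609 nt, i.e. 203 codons: 202 amino acids plus a stop codon, matching slen="202".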
retmode="text", rettype="ft"
Example input
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="ft")
print(handle.read())
The feature table is output:
>Feature gb|PWL87550.1|
1 202 Protein
product flavin reductase
8 159 Region
region Flavin_Reduct
note Flavin reductase like domain
db_xref CDD:426345
169 202 Region
region rubredoxin_SM
note Rubredoxin, Small Modular nonheme iron binding domain containing a [Fe(SCys)4] center, present in rubrerythrin and nigerythrin and detected either N- or C-terminal to such proteins as flavin reductase, NAD(P)H-nitrite reductase, and ...
db_xref CDD:238371
171 171 Site
174 174
189 189
192 192
site_type other
note iron binding site [ion binding]
db_xref CDD:238371
1 202 CDS
product flavin reductase
transl_table 11
protein_id gb|PWL87550.1||gnl|WGS:QAMB|DBY14_05325
inference COORDINATES: similar to AA sequence:RefSeq:WP_009288844.1
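A minimal parser sketch for this feature table, run on an excerpt of the output above. In the raw NCBI text the columns are tab-separated; splitting on any whitespace handles both that and the collapsed layout shown here. Assumptions: positions are plain integers (partial markers such as <1 are not handled) and qualifier names contain no spaces. The function name parse_feature_table is hypothetical.

```python
def parse_feature_table(text):
    """Parse NCBI feature-table text into a list of feature dicts (hypothetical helper)."""
    features = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the ">Feature ..." header
        if not parts or parts[0].startswith(">"):
            continue
        if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
            start, stop = int(parts[0]), int(parts[1])
            if len(parts) >= 3:
                # "start stop key" opens a new feature
                current = {"key": parts[2], "locations": [(start, stop)], "qualifiers": {}}
                features.append(current)
            elif current is not None:
                # "start stop" alone adds another interval to the current feature
                current["locations"].append((start, stop))
        elif current is not None:
            # "name value ..." is a qualifier of the current feature
            current["qualifiers"].setdefault(parts[0], []).append(" ".join(parts[1:]))
    return features


# Excerpt of the rettype="ft" output above
ft_text = """>Feature gb|PWL87550.1|
1 202 Protein
product flavin reductase
8 159 Region
region Flavin_Reduct
note Flavin reductase like domain
db_xref CDD:426345
171 171 Site
174 174
189 189
192 192
site_type other
note iron binding site [ion binding]
db_xref CDD:238371
"""

features = parse_feature_table(ft_text)
for f in features:
    print(f["key"], f["locations"], f["qualifiers"])
```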
retmode="text", rettype="acc"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="acc")
print(handle.read())
Output: only the accession.version is returned.
PWL87550.1
retmode="text", rettype="fasta"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="fasta")
print(handle.read())
Example output, in FASTA format:
>PWL87550.1 MAG: flavin reductase [Escherichia coli]
MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILS
TDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTA
VLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL
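Biopython's SeqIO.parse(handle, "fasta") is the usual way to read this output. To show what the format amounts to, here is a dependency-free sketch (parse_fasta is a hypothetical name) applied to the text above: a header line starting with >, followed by sequence lines that are joined together.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """>PWL87550.1 MAG: flavin reductase [Escherichia coli]
MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILS
TDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTA
VLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL
"""

records = parse_fasta(fasta_text)
for header, seq in records:
    accession = header.split()[0]
    print(accession, len(seq))
```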
retmode="xml", rettype="fasta"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="fasta")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">\n<TSeqSet>\n<TSeq>\n <TSeq_seqtype value="protein"/>\n
<TSeq_accver>PWL87550.1</TSeq_accver>\n
<TSeq_sid>gnl|WGS:QAMB|DBY14_05325</TSeq_sid>\n
<TSeq_taxid>562</TSeq_taxid>\n
<TSeq_orgname>Escherichia coli</TSeq_orgname>\n
<TSeq_defline>MAG: flavin reductase [Escherichia coli]</TSeq_defline>\n
<TSeq_length>202</TSeq_length>\n
<TSeq_sequence>MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILSTDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTAVLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL</TSeq_sequence>\n
</TSeq>\n\n
</TSeqSet>\n'
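In the TSeq XML, the sequence type is stored as an attribute while the other fields are element text. A sketch that reads a trimmed copy of the output above with the standard library and checks the declared length against the sequence itself:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the TSeq output above (DOCTYPE removed)
tseq_xml = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="protein"/>
  <TSeq_accver>PWL87550.1</TSeq_accver>
  <TSeq_taxid>562</TSeq_taxid>
  <TSeq_length>202</TSeq_length>
  <TSeq_sequence>MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILSTDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTAVLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL</TSeq_sequence>
</TSeq>
</TSeqSet>"""

root = ET.fromstring(tseq_xml)
tseq = root.find("TSeq")
accver = tseq.findtext("TSeq_accver")
seqtype = tseq.find("TSeq_seqtype").get("value")  # an attribute, not element text
sequence = tseq.findtext("TSeq_sequence")
declared_length = int(tseq.findtext("TSeq_length"))

# Sanity check: the declared length should match the sequence
print(accver, seqtype, declared_length, len(sequence) == declared_length)
```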
retmode="text", rettype="seqid"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="seqid")
print(handle.read())
Example output
Seq-id ::= genbank {
accession "PWL87550" ,
version 1 }
Seq-id ::= general {
db "WGS:QAMB" ,
tag
str "DBY14_05325" }
Seq-id ::= gi 1391407525
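The seqid output lists every identifier of the record as ASN.1 text; the GI number here (1391407525) is the same value that appears as the UID in the docsum and uilist outputs. For a quick lookup, a regex-based sketch can pull out the GI and accession.version. It is not a real ASN.1 parser and assumes the layout shown here.

```python
import re

# The rettype="seqid" output above
seqid_text = """Seq-id ::= genbank {
accession "PWL87550" ,
version 1 }
Seq-id ::= general {
db "WGS:QAMB" ,
tag
str "DBY14_05325" }
Seq-id ::= gi 1391407525"""

# GI number from the "Seq-id ::= gi N" line
gi_match = re.search(r"Seq-id ::= gi (\d+)", seqid_text)
gi = int(gi_match.group(1)) if gi_match else None

# accession and version from the block that contains them
acc_match = re.search(r'accession "([^"]+)"\s*,\s*version (\d+)', seqid_text)
accession_version = f"{acc_match.group(1)}.{acc_match.group(2)}" if acc_match else None

print(gi, accession_version)
```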
retmode="xml", rettype="docsum"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="docsum")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">\n
<eSummaryResult>\n\n
<DocSum>\n
\t<Id>1391407525</Id>\n
\t<Item Name="Caption" Type="String">PWL87550</Item>\n
\t<Item Name="Title" Type="String">flavin reductase [Escherichia coli]</Item>\n
\t<Item Name="Extra" Type="String">gi|1391407525|gb|PWL87550.1||gnl|WGS:QAMB|DBY14_05325[1391407525]</Item>\n
\t<Item Name="Gi" Type="Integer">1391407525</Item>\n
\t<Item Name="CreateDate" Type="String">2018/05/23</Item>\n
\t<Item Name="UpdateDate" Type="String">2018/05/23</Item>\n
\t<Item Name="Flags" Type="Integer">0</Item>\n
\t<Item Name="TaxId" Type="Integer">562</Item>\n
\t<Item Name="Length" Type="Integer">202</Item>\n
\t<Item Name="Status" Type="String">live</Item>\n
\t<Item Name="ReplacedBy" Type="String"></Item>\n
\t<Item Name="Comment" Type="String"><![CDATA[ ]]></Item>\n
\t<Item Name="AccessionVersion" Type="String">PWL87550.1</Item>\n
</DocSum>\n</eSummaryResult>\n'
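As the DOCTYPE shows, rettype="docsum" returns ESummary-style XML, where each field is an Item with Name and Type attributes. A sketch that turns it into a dict and converts Type="Integer" values to int; the function name docsum_to_dict is hypothetical and the input is a trimmed copy of the output above.

```python
import xml.etree.ElementTree as ET


def docsum_to_dict(xml_bytes):
    """Map each DocSum Item Name to its value, converting Type="Integer" to int."""
    root = ET.fromstring(xml_bytes)
    results = []
    for docsum in root.findall("DocSum"):
        entry = {"Id": docsum.findtext("Id")}
        for item in docsum.findall("Item"):
            value = item.text or ""  # empty items such as ReplacedBy have no text
            if item.get("Type") == "Integer" and value:
                value = int(value)
            entry[item.get("Name")] = value
        results.append(entry)
    return results


# Trimmed copy of the docsum output above (DOCTYPE removed, some Items omitted)
docsum_xml = b"""<?xml version="1.0" encoding="UTF-8" ?>
<eSummaryResult>
<DocSum>
  <Id>1391407525</Id>
  <Item Name="Caption" Type="String">PWL87550</Item>
  <Item Name="Title" Type="String">flavin reductase [Escherichia coli]</Item>
  <Item Name="TaxId" Type="Integer">562</Item>
  <Item Name="Length" Type="Integer">202</Item>
  <Item Name="Status" Type="String">live</Item>
  <Item Name="ReplacedBy" Type="String"></Item>
  <Item Name="AccessionVersion" Type="String">PWL87550.1</Item>
</DocSum>
</eSummaryResult>"""

summaries = docsum_to_dict(docsum_xml)
print(summaries[0])
```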
retmode="txt", rettype="uilist"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="txt", rettype="uilist")
print(handle.read())
Example output
1391407525
retmode="xml", rettype="uilist"
Example input
handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="uilist")
print(handle.read())
Example output
b'<?xml version="1.0" ?>\n
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20020605//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20020605/uilist.dtd">\n
<IdList>\n
<Id>1391407525</Id>\n</IdList>\n'
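rettype="uilist" returns only the numeric UIDs: as plain lines in text mode, or as Id elements inside IdList in XML mode. A small sketch that reads both forms, using the two outputs above as input (function names are hypothetical):

```python
import xml.etree.ElementTree as ET


def uids_from_text(text):
    """Text-mode uilist: whitespace-separated UIDs."""
    return [int(tok) for tok in text.split()]


def uids_from_xml(xml_bytes):
    """XML-mode uilist: <IdList><Id>...</Id></IdList>."""
    root = ET.fromstring(xml_bytes)
    return [int(el.text) for el in root.iter("Id")]


# The two uilist outputs above (XML with the DOCTYPE removed)
text_out = "1391407525\n"
xml_out = b'<?xml version="1.0" ?>\n<IdList>\n<Id>1391407525</Id>\n</IdList>\n'

print(uids_from_text(text_out), uids_from_xml(xml_out))
```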
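Accessing the nucleotide database
Use db="nuccore". The examples below pass retmode="txt"; NCBI's documentation lists the value as "text", but "txt" produced the outputs shown here.
Example input, without rettype: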
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt")
print(handle.read())
The output is in ASN.1 format; it is omitted here.
retmode="txt", rettype="acc"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="acc")
print(handle.read())
Example output
NR_024570.1
retmode="txt", rettype="fasta"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="fasta")
print(handle.read())
Example output
>NR_024570.1 Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence
AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG
CAGCTTGCTGCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGA
TAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGCACAAAGAGGGGGACCTTAGGGCCTCTT
GCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAG
# (rest omitted)
retmode="xml", rettype="fasta"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="fasta")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">\n<TSeqSet>\n<TSeq>\n
<TSeq_seqtype value="nucleotide"/>\n
<TSeq_accver>NR_024570.1</TSeq_accver>\n
<TSeq_taxid>562</TSeq_taxid>\n
<TSeq_orgname>Escherichia coli</TSeq_orgname>\n
<TSeq_defline>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</TSeq_defline>\n
<TSeq_length>1450</TSeq_length>\n
<TSeq_sequence>AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGCACAAAGAGGGGGACCTTAGGGCCTCTTGCCATCGGATGTGCCCAGATGGGAT
# (rest omitted)
retmode="txt", rettype="seqid"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="seqid")
print(handle.read())
Example output
Seq-id ::= other {
accession "NR_024570" ,
version 1 }
Seq-id ::= gi 219722938
retmode="txt", rettype="gb"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="gb")
print(handle.read())
Example output
LOCUS NR_024570 1450 bp rRNA linear BCT 11-MAR-2019
DEFINITION Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence.
ACCESSION NR_024570
VERSION NR_024570.1
DBLINK Project: 33175
BioProject: PRJNA33175
KEYWORDS RefSeq.
SOURCE Escherichia coli
ORGANISM Escherichia coli
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
Enterobacteriaceae; Escherichia.
REFERENCE 1
AUTHORS Cilia,V., Lafay,B. and Christen,R.
TITLE Sequence heterogeneities among 16S ribosomal RNA sequences, and
their effect on phylogenetic analyses at the species level
JOURNAL Mol. Biol. Evol. 13 (3), 451-461 (1996)
# (rest omitted)
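The LOCUS line at the top of GenBank output packs the name, length, molecule type, topology, division and date into one line; protein records (as in the PWL87550 example) have no molecule type field. A sketch that splits it by reading the fixed trailing fields from the right. The function name parse_locus_line is hypothetical, it assumes whitespace-separated fields as shown in this article, and Bio.SeqIO.read(handle, "genbank") is the more robust choice in practice.

```python
def parse_locus_line(line):
    """Split a LOCUS line into named fields, reading the trailing ones from the right."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    info = {
        "name": tokens[1],
        "length": int(tokens[2]),
        "unit": tokens[3],  # "bp" for nucleotides, "aa" for proteins
        "topology": tokens[-3],
        "division": tokens[-2],
        "date": tokens[-1],
    }
    # Nucleotide records have a molecule type between the unit and the topology
    middle = tokens[4:-3]
    info["moltype"] = middle[0] if middle else None
    return info


nuc_line = "LOCUS NR_024570 1450 bp rRNA linear BCT 11-MAR-2019"
prot_line = "LOCUS PWL87550 202 aa linear ENV 23-MAY-2018"
print(parse_locus_line(nuc_line))
print(parse_locus_line(prot_line))
```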
retmode="xml", rettype="gb"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="gb")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">\n<GBSet>\n <GBSeq>\n\n <GBSeq_locus>NR_024570</GBSeq_locus>\n <GBSeq_length>1450</GBSeq_length>\n <GBSeq_strandedness>single</GBSeq_strandedness>\n <GBSeq_moltype>rRNA</GBSeq_moltype>\n <GBSeq_topology>linear</GBSeq_topology>\n <GBSeq_division>BCT</GBSeq_division>\n <GBSeq_update-date>11-MAR-2019</GBSeq_update-date>\n <GBSeq_create-date>07-JAN-2009</GBSeq_create-date>\n <GBSeq_definition>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</GBSeq_definition>\n <GBSeq_primary-accession>NR_024570</GBSeq_primary-accession>\n <GBSeq_accession-version>NR_024570.1</GBSeq_accession-version>\n <GBSeq_other-seqids>\n <GBSeqid>ref|NR_024570.1|</GBSeqid>\n <GBSeqid>gi|219722938</GBSeqid>\n </GBSeq_other-seqids>\n <GBSeq_project>PRJNA33175</GBSeq_project>\n <GBSeq_keywords>\n <GBKeyword>RefSeq</GBKeyword>\n </GBSeq_keywords>\n <GBSeq_source>Escherichia coli</GBSeq_source>\n <GBSeq_organism>Escherichia coli</GBSeq_organism>\n <GBSeq_taxonomy>Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Escherichia</GBSeq_taxonomy>\n <GBSeq_references>\n
# (rest omitted)
retmode="xml", rettype="gbc"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="gbc")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd">\n<INSDSet>\n <INSDSeq>\n\n <INSDSeq_locus>NR_024570</INSDSeq_locus>\n <INSDSeq_length>1450</INSDSeq_length>\n <INSDSeq_strandedness>single</INSDSeq_strandedness>\n <INSDSeq_moltype>rRNA</INSDSeq_moltype>\n <INSDSeq_topology>linear</INSDSeq_topology>\n <INSDSeq_division>BCT</INSDSeq_division>\n <INSDSeq_update-date>11-MAR-2019</INSDSeq_update-date>\n <INSDSeq_create-date>07-JAN-2009</INSDSeq_create-date>\n <INSDSeq_definition>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</INSDSeq_definition>\n <INSDSeq_primary-accession>NR_024570</INSDSeq_primary-accession>\n <INSDSeq_accession-version>NR_024570.1</INSDSeq_accession-version>\n <INSDSeq_other-seqids>\n <INSDSeqid>ref|NR_024570.1|</INSDSeqid>\n <INSDSeqid>gi|219722938</INSDSeqid>\n </INSDSeq_other-seqids>\n <INSDSeq_project>PRJNA33175</INSDSeq_project>\n <INSDSeq_keywords>\n <INSDKeyword>RefSeq</INSDKeyword>\n </INSDSeq_keywords>\n <INSDSeq_source>Escherichia coli</INSDSeq_source>\n <INSDSeq_organism>Escherichia coli</INSDSeq_organism>\n
# (rest omitted)
retmode="txt", rettype="gbwithparts"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="gbwithparts")
print(handle.read())
Example output
LOCUS NR_024570 1450 bp rRNA linear BCT 11-MAR-2019
DEFINITION Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence.
ACCESSION NR_024570
VERSION NR_024570.1
DBLINK Project: 33175
BioProject: PRJNA33175
KEYWORDS RefSeq.
SOURCE Escherichia coli
ORGANISM Escherichia coli
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
Enterobacteriaceae; Escherichia.
# (rest omitted)
retmode="txt", rettype="fasta_cds_na"
Example input
handle = Entrez.efetch(db="nuccore", id='HQ828986.1', retmode="txt", rettype="fasta_cds_na")
print(handle.read())
Example output
>lcl|HQ828986.1_cds_AEM44333.1_1 [protein=putative NDP-hexose 2, 3-dehydratase] [protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]
ATGTCACCCAAGCTCGCGGCGGCGACCGTACTGCGGCCACGGCAGGACGCGGGCCCCGCCGGCAGGATCG
CCCGCTCCGCGGCCACGGCCGAGGGCGCGCACCTGGCTCCGGGCGAGTTCCACGCGTGGTTCGCCCAGCG
GCGCGCGGCTCATTCCTTCCGCGTCGACCGCATCCCCTTCGCGGAGCTGGAGGGCTGGTCCACCCAGCGG
GACACCGGCAACCTGGTGCACCGCAGCGGGCGGTTCTTCAGCGTCGAGGGCCTGGCGGTGAGCGTGGACG
GCGGCCCGGACGGCTGGCACCAGCCCGTCATCCGCCAGCCGGAGACCGGCATTCTGGGCATCCTCGTCAA
GGAGTTCGACGGCGTCCTGCACTGCCTGATGCAGGCCAAGATG
# (rest omitted) The DNA sequences of the CDSs are output in FASTA format.
retmode="txt", rettype="fasta_cds_aa"
Example input
handle = Entrez.efetch(db="nuccore", id='HQ828986.1', retmode="txt", rettype="fasta_cds_aa")
print(handle.read())
Example output
>lcl|HQ828986.1_prot_AEM44333.1_1 [protein=putative NDP-hexose 2, 3-dehydratase] [protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]
MSPKLAAATVLRPRQDAGPAGRIARSAATAEGAHLAPGEFHAWFAQRRAAHSFRVDRIPFAELEGWSTQR
DTGNLVHRSGRFFSVEGLAVSVDGGPDGWHQPVIRQPETGILGILVKEFDGVLHCLMQAKMEPGNPNLLQ
LSPTVQATRSNYTKVHRGADVKYIEYFTRPERGAVLADVLQSEHGSWFLHKHNRNMIVETTGDVPPDDDF
RWLTLGQIAELLRLDNVVNMDARTVLSCVPRTAAQCEPAALHSDAELRAWLTEARARHDVHAERVPLAGL
PGWARDDSSIHHLEGRYFEVVAASVQAGSREVTSWTQPLIRPRGRGVVAFLTRRINGVPHLLAHARTEGG
# (rest omitted) The amino acid sequences of the CDSs are output in FASTA format.
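Both fasta_cds_na and fasta_cds_aa put the CDS annotations into [key=value] pairs on the header line, and the protein_id (AEM44333.1 here) links each nucleotide CDS to its protein record. A small sketch that splits such a header; the function name parse_cds_header is hypothetical, and it assumes values contain no ].

```python
import re


def parse_cds_header(header):
    """Split a fasta_cds_na/fasta_cds_aa header into its ID and [key=value] fields."""
    header = header.lstrip(">")
    seq_id = header.split()[0]
    # Each [key=value] pair becomes one dict entry
    fields = dict(re.findall(r"\[(\w+)=([^\]]*)\]", header))
    return seq_id, fields


# Header line from the fasta_cds_na output above
header = (">lcl|HQ828986.1_cds_AEM44333.1_1 [protein=putative NDP-hexose 2, 3-dehydratase] "
          "[protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]")
seq_id, fields = parse_cds_header(header)
print(seq_id)
print(fields["protein_id"], fields["location"])
```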
retmode="xml", rettype="docsum"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="docsum")
print(handle.read())
Example output
b'<?xml version="1.0" encoding="UTF-8" ?>\n
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">\n
<eSummaryResult>\n
\n
<DocSum>\n
\t<Id>219722938</Id>\n
\t<Item Name="Caption" Type="String">NR_024570</Item>\n
\t<Item Name="Title" Type="String">Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</Item>\n
\t<Item Name="Extra" Type="String">gi|219722938|ref|NR_024570.1|[219722938]</Item>\n
\t<Item Name="Gi" Type="Integer">219722938</Item>\n
\t<Item Name="CreateDate" Type="String">1996/03/29</Item>\n
\t<Item Name="UpdateDate" Type="String">2019/03/11</Item>\n
\t<Item Name="Flags" Type="Integer">512</Item>\n
\t<Item Name="TaxId" Type="Integer">562</Item>\n
\t<Item Name="Length" Type="Integer">1450</Item>\n
\t<Item Name="Status" Type="String">live</Item>\n
\t<Item Name="ReplacedBy" Type="String"></Item>\n
\t<Item Name="Comment" Type="String"><![CDATA[ ]]></Item>\n
\t<Item Name="AccessionVersion" Type="String">NR_024570.1</Item>\n
</DocSum>\n</eSummaryResult>\n'
retmode="txt", rettype="uilist"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="txt", rettype="uilist")
print(handle.read())
Example output
219722938
retmode="xml", rettype="uilist"
Example input
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="uilist")
print(handle.read())
Example output
b'<?xml version="1.0" ?>\n
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20020605//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20020605/uilist.dtd">\n
<IdList>\n
<Id>219722938</Id>\n
</IdList>\n'