
Summary of the output returned by Biopython's efetch

Posted at 2022-11-17

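Biopython's efetch function lets you retrieve sequence records and related information from Python.
See the official documentation for details.

https://biopython.org/docs/1.75/api/Bio.Entrez.html
Everything here was run on Google Colaboratory; the output data for each setting are summarized below.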

Installing Biopython

Install biopython as follows.

!pip install biopython

Basic usage

Once biopython is installed, a sequence record can be fetched as follows.

# Import the module
from Bio import Entrez

# Replace the e-mail address and the id with real values.
Entrez.email = "xxxx@xxxxx.com"
handle = Entrez.efetch(db="protein", id="GenBank_id", rettype="gb", retmode="text")

As a concrete example, run the following code for a flavin reductase from Escherichia coli.
A typical GenBank-format record is downloaded.

Entrez.email = "xxxx@xxxxx.com"
handle = Entrez.efetch(db="protein", id="PWL87550.1", rettype="gb", retmode="text")
print(handle.read())

This produces the following output.

LOCUS       PWL87550                 202 aa            linear   ENV 23-MAY-2018
DEFINITION  MAG: flavin reductase [Escherichia coli].
ACCESSION   PWL87550
VERSION     PWL87550.1
DBLINK      BioProject: PRJNA397219
            BioSample: SAMN08295014
DBSOURCE    accession QAMB01000025.1
# (remainder omitted)

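To save the record to a file, use the following code. The handle can be read only once, so run this on a freshly fetched handle rather than one already consumed by print(handle.read()).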

with open('PWL87550.1.gb','w') as f:
    f.write(handle.read())
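
A handle returned by Entrez.efetch behaves like a file object that is exhausted after one read, so printing it and then writing handle.read() would save an empty file. The following is a minimal sketch of one way around this: read the text into a variable once and reuse it. Here io.StringIO stands in for the real handle so the example runs without network access, and show_and_save is a hypothetical helper name, not part of Biopython.

```python
import io
import os
import tempfile


def show_and_save(handle, path):
    """Read the handle once, then print and save the same text."""
    text = handle.read()  # a second handle.read() would return ""
    print(text)
    with open(path, "w") as f:
        f.write(text)
    return text


# Stand-in for the handle returned by Entrez.efetch (assumption: offline demo).
fake_handle = io.StringIO("LOCUS       PWL87550                 202 aa\n")
out_path = os.path.join(tempfile.mkdtemp(), "PWL87550.1.gb")

saved = show_and_save(fake_handle, out_path)
leftover = fake_handle.read()  # the handle is already exhausted here
```

With a real record, the same helper could be called as show_and_save(handle, 'PWL87550.1.gb') right after the efetch call.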

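Changing rettype and retmode returns different information and file formats.
For details, see NCBI's table of valid rettype and retmode values for efetch.

Accessing the protein database

retmode="text", rettype omitted

The simplest case is the following.

Entrez.email = "xxxx@xxxxx.com"
handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text")
print(handle.read())

The output is in ASN.1 format.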
https://www.ncbi.nlm.nih.gov/Structure/asn1.html

Seq-entry ::= set {
  id id 1,
  class nuc-prot,
  descr {
    source {
      org {
        taxname "Escherichia coli",
        db {
          {
            db "taxon",
            tag id 562
          }
        },
        orgname {
          name binomial {
            genus "Escherichia",
            species "coli"
          },
          attrib "specified",
          mod {
            {
              subtype isolate,
              subname "CIM:MAG 560"
# (remainder omitted)

retmode="text", rettype="gb"

Running the following code:

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="gb")
print(handle.read())

returns the record in GenBank (gb) format.

LOCUS       PWL87550                 202 aa            linear   ENV 23-MAY-2018
DEFINITION  MAG: flavin reductase [Escherichia coli].
ACCESSION   PWL87550
VERSION     PWL87550.1
DBLINK      BioProject: PRJNA397219
            BioSample: SAMN08295014
DBSOURCE    accession QAMB01000025.1
KEYWORDS    ENV; Metagenome Assembled Genome; MAG.
SOURCE      Escherichia coli (human gut metagenome)
# (remainder omitted)

retmode="text", rettype="gp"

Input example

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="gp")
print(handle.read())

As above, the record is downloaded in GenBank flat-file format. The output is omitted.

retmode="xml", rettype="gp"

Input example

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="gp")
print(handle.read())

Output example: XML is returned, as shown below. This format may be easier to use for extracting information.

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">\n
<GBSet>\n
  <GBSeq>\n
\n
    <GBSeq_locus>PWL87550</GBSeq_locus>\n
    <GBSeq_length>202</GBSeq_length>\n
    <GBSeq_moltype>AA</GBSeq_moltype>\n
    <GBSeq_topology>linear</GBSeq_topology>\n
    <GBSeq_division>ENV</GBSeq_division>\n
    <GBSeq_update-date>23-MAY-2018</GBSeq_update-date>\n
    <GBSeq_create-date>23-MAY-2018</GBSeq_create-date>\n
    <GBSeq_definition>MAG: flavin reductase [Escherichia coli]</GBSeq_definition>\n

# (remainder omitted)
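
To illustrate why the XML form is convenient for extracting fields, here is a minimal sketch that pulls a few values out of a GBSet document with the standard library. The XML is a trimmed excerpt of the output above, with the DOCTYPE dropped and closing tags added so that it is well-formed; the variable names are illustrative. Biopython's own Entrez.read can also parse XML handles; xml.etree.ElementTree is used here only so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the efetch output (closing tags added for well-formedness).
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"  ?>
<GBSet>
  <GBSeq>
    <GBSeq_locus>PWL87550</GBSeq_locus>
    <GBSeq_length>202</GBSeq_length>
    <GBSeq_moltype>AA</GBSeq_moltype>
    <GBSeq_definition>MAG: flavin reductase [Escherichia coli]</GBSeq_definition>
  </GBSeq>
</GBSet>
"""

root = ET.fromstring(xml_bytes)
record = root.find("GBSeq")  # first (and here only) record in the set

locus = record.findtext("GBSeq_locus")
length = int(record.findtext("GBSeq_length"))
definition = record.findtext("GBSeq_definition")
print(locus, length, definition)
```

With a live query, the bytes returned by handle.read() could be passed to ET.fromstring in place of xml_bytes.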

retmode="xml", rettype="gpc"

Input example

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="gpc")
print(handle.read())

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd">\n
<INSDSet>\n
  <INSDSeq>\n
\n
    <INSDSeq_locus>PWL87550</INSDSeq_locus>\n
    <INSDSeq_length>202</INSDSeq_length>\n
    <INSDSeq_moltype>AA</INSDSeq_moltype>\n
    <INSDSeq_topology>linear</INSDSeq_topology>\n
    <INSDSeq_division>ENV</INSDSeq_division>\n
    <INSDSeq_update-date>23-MAY-2018</INSDSeq_update-date>\n
    <INSDSeq_create-date>23-MAY-2018</INSDSeq_create-date>\n
    <INSDSeq_definition>MAG: flavin reductase [Escherichia coli]</INSDSeq_definition>\n
    <INSDSeq_primary-accession>PWL87550</INSDSeq_primary-accession>\n
    <INSDSeq_accession-version>PWL87550.1</INSDSeq_accession-version>\n
  

retmode="xml", rettype="ipg"

Input example

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="xml", rettype="ipg")
print(handle.read())

Output example. The ID of the source genome can be obtained from this report.

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<IPGReportSet xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://www.ncbi.nlm.nih.gov/data_specs/schema/other/seq_report/IPGReportSet.xsd">\n\n
<IPGReport  ipg="188447752" product_acc="PWL87550.1">\n
<Product  accver="PWL87550.1" name="flavin reductase" taxid="562" slen="202" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria"/>\n
<ProteinList>\n
<Protein  accver="PWL87550.1" source="INSDC" name="flavin reductase" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" priority="0">\n
<CDSList>\n
<CDS  accver="QAMB01000025.1" start="220007" stop="220615" strand="+" taxid="562" org="Escherichia coli" kingdom_taxid="2" kingdom="Bacteria" assembly="GCA_003149995.1"/></CDSList></Protein></ProteinList>\n
<Statistics  prot_count="1" nuc_count="1" assmb_count="1"/></IPGReport></IPGReportSet>\n'
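
As a sketch of how the source genome could be read from an IPG report, the snippet below collects the assembly accession and coordinates from each CDS element. The XML is a trimmed copy of the output above (the namespace attributes and several other attributes are removed for brevity), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the IPG report shown above.
ipg_xml = b"""<?xml version="1.0" encoding="UTF-8"  ?>
<IPGReportSet>
<IPGReport  ipg="188447752" product_acc="PWL87550.1">
<ProteinList>
<Protein  accver="PWL87550.1" source="INSDC" name="flavin reductase" taxid="562">
<CDSList>
<CDS  accver="QAMB01000025.1" start="220007" stop="220615" strand="+" assembly="GCA_003149995.1"/></CDSList></Protein></ProteinList>
</IPGReport></IPGReportSet>
"""

root = ET.fromstring(ipg_xml)

# One tuple per CDS: (assembly, nucleotide accession, start, stop, strand).
locations = [
    (
        cds.get("assembly"),
        cds.get("accver"),
        int(cds.get("start")),
        int(cds.get("stop")),
        cds.get("strand"),
    )
    for cds in root.iter("CDS")
]
print(locations)
```

For proteins shared by many genomes, the report may list several CDS elements, which is why the result is kept as a list.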

retmode="text", rettype="ft"

Input example

handle = Entrez.efetch(db="protein", id='PWL87550.1', retmode="text", rettype="ft")
print(handle.read())

A feature table is output.

>Feature gb|PWL87550.1|
1	202	Protein
			product	flavin reductase
8	159	Region
			region	Flavin_Reduct
			note	Flavin reductase like domain
			db_xref	CDD:426345
169	202	Region
			region	rubredoxin_SM
			note	Rubredoxin, Small Modular nonheme iron binding domain containing a [Fe(SCys)4] center, present in rubrerythrin and nigerythrin and detected either N- or C-terminal to such proteins as flavin reductase, NAD(P)H-nitrite reductase, and ...
			db_xref	CDD:238371
171	171	Site
174	174
189	189
192	192
			site_type	other
			note	iron binding site [ion binding]
			db_xref	CDD:238371
1	202	CDS
			product	flavin reductase
			transl_table	11
			protein_id	gb|PWL87550.1||gnl|WGS:QAMB|DBY14_05325
			inference	COORDINATES: similar to AA sequence:RefSeq:WP_009288844.1

retmode="text", rettype="acc"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="acc")
print(handle.read())

Output: only the ID is returned.

PWL87550.1

retmode="text", rettype="fasta"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="fasta")
print(handle.read())

Output example: the record is returned in FASTA format.

>PWL87550.1 MAG: flavin reductase [Escherichia coli]
MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILS
TDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTA
VLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL
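
FASTA text like this can be split into a header and a sequence with a few lines of Python. The sketch below uses only the standard library so it runs offline; parse_fasta is a hypothetical helper, and in practice Biopython's Bio.SeqIO.parse(handle, "fasta") covers the same task.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The output shown above, copied as-is.
fasta_text = """>PWL87550.1 MAG: flavin reductase [Escherichia coli]
MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILS
TDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTA
VLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL
"""

records = parse_fasta(fasta_text)
header, sequence = records[0]
accession = header.split()[0]  # first token of the header line
print(accession, len(sequence))
```

The sequence length obtained this way can be compared with the 202 aa reported in the LOCUS line of the GenBank output.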

retmode="xml", rettype="fasta"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="fasta")
print(handle.read())

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">\n<TSeqSet>\n<TSeq>\n  <TSeq_seqtype value="protein"/>\n  
<TSeq_accver>PWL87550.1</TSeq_accver>\n  
<TSeq_sid>gnl|WGS:QAMB|DBY14_05325</TSeq_sid>\n  
<TSeq_taxid>562</TSeq_taxid>\n  
<TSeq_orgname>Escherichia coli</TSeq_orgname>\n  
<TSeq_defline>MAG: flavin reductase [Escherichia coli]</TSeq_defline>\n
<TSeq_length>202</TSeq_length>\n
<TSeq_sequence>MSYDKNVLFNISYGLYILSANSSNADNACVINTLSQITSSPDTVSVTVNKKNFTNEMIKKTKKLNVSILSTDADFELIKRFGFQSGRDIDKFVGFKDMYRSQNGIYFINKGANSFISVDIDEIIDFDTHNMFVGHITDTAVLSDKESLTYSYYQNNIKPKQNENKKTGYVCTVCGYIHESDTLPDDFVCPVCKHGASAFVKL</TSeq_sequence>\n
</TSeq>\n\n
</TSeqSet>\n'

retmode="text", rettype="seqid"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="seqid")
print(handle.read())

Output example

Seq-id ::= genbank {
  accession "PWL87550" ,
  version 1 }
Seq-id ::= general {
  db "WGS:QAMB" ,
  tag
    str "DBY14_05325" }
Seq-id ::= gi 1391407525

retmode="xml", rettype="docsum"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="docsum")
print(handle.read())

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">\n
<eSummaryResult>\n\n
<DocSum>\n
\t<Id>1391407525</Id>\n
\t<Item Name="Caption" Type="String">PWL87550</Item>\n
\t<Item Name="Title" Type="String">flavin reductase [Escherichia coli]</Item>\n
\t<Item Name="Extra" Type="String">gi|1391407525|gb|PWL87550.1||gnl|WGS:QAMB|DBY14_05325[1391407525]</Item>\n
\t<Item Name="Gi" Type="Integer">1391407525</Item>\n
\t<Item Name="CreateDate" Type="String">2018/05/23</Item>\n
\t<Item Name="UpdateDate" Type="String">2018/05/23</Item>\n
\t<Item Name="Flags" Type="Integer">0</Item>\n
\t<Item Name="TaxId" Type="Integer">562</Item>\n
\t<Item Name="Length" Type="Integer">202</Item>\n
\t<Item Name="Status" Type="String">live</Item>\n
\t<Item Name="ReplacedBy" Type="String"></Item>\n
\t<Item Name="Comment" Type="String"><![CDATA[  ]]></Item>\n
\t<Item Name="AccessionVersion" Type="String">PWL87550.1</Item>\n
</DocSum>\n</eSummaryResult>\n'
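
Each Item element in a docsum carries its field name and type as attributes, so the summary maps naturally onto a dictionary. The following is a minimal sketch of that conversion using the standard library. The XML is an excerpt of the output above with the DOCTYPE removed and only some Items kept, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Excerpt of the docsum output shown above.
docsum_xml = b"""<?xml version="1.0" encoding="UTF-8"  ?>
<eSummaryResult>
<DocSum>
  <Id>1391407525</Id>
  <Item Name="Caption" Type="String">PWL87550</Item>
  <Item Name="Title" Type="String">flavin reductase [Escherichia coli]</Item>
  <Item Name="TaxId" Type="Integer">562</Item>
  <Item Name="Length" Type="Integer">202</Item>
  <Item Name="Status" Type="String">live</Item>
  <Item Name="ReplacedBy" Type="String"></Item>
  <Item Name="AccessionVersion" Type="String">PWL87550.1</Item>
</DocSum>
</eSummaryResult>
"""

root = ET.fromstring(docsum_xml)

summary = {}
for item in root.iter("Item"):
    value = item.text or ""  # empty elements have text None
    if item.get("Type") == "Integer" and value:
        value = int(value)
    summary[item.get("Name")] = value

print(summary["AccessionVersion"], summary["Length"])
```

Converting Integer items to int is a design choice of this sketch; keeping every value as a string would work equally well.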

retmode="text", rettype="uilist"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="text", rettype="uilist")
print(handle.read())

Output example

1391407525

retmode="xml", rettype="uilist"

Input example

handle = Entrez.efetch(db="protein", id="PWL87550.1", retmode="xml", rettype="uilist")
print(handle.read())

Output example

b'<?xml version="1.0" ?>\n
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20020605//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20020605/uilist.dtd">\n
<IdList>\n
    <Id>1391407525</Id>\n</IdList>\n'

Accessing the nucleotide database

Set db="nuccore".
Input example, with rettype omitted.

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text")
print(handle.read())

The output is in ASN.1 format. The output example is omitted.

retmode="text", rettype="acc"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="acc")
print(handle.read())

Output example

NR_024570.1

retmode="text", rettype="fasta"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="fasta")
print(handle.read())

Output example

>NR_024570.1 Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence
AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG
CAGCTTGCTGCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGA
TAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGCACAAAGAGGGGGACCTTAGGGCCTCTT
GCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAG
# (remainder omitted)

retmode="xml", rettype="fasta"

Input example

```
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="fasta")
print(handle.read())
```

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">\n<TSeqSet>\n<TSeq>\n  
<TSeq_seqtype value="nucleotide"/>\n  
<TSeq_accver>NR_024570.1</TSeq_accver>\n  
<TSeq_taxid>562</TSeq_taxid>\n  
<TSeq_orgname>Escherichia coli</TSeq_orgname>\n  
<TSeq_defline>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</TSeq_defline>\n  
<TSeq_length>1450</TSeq_length>\n  
<TSeq_sequence>AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGCACAAAGAGGGGGACCTTAGGGCCTCTTGCCATCGGATGTGCCCAGATGGGAT
# (remainder omitted)

retmode="text", rettype="seqid"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="seqid")
print(handle.read())

Output example

Seq-id ::= other {
  accession "NR_024570" ,
  version 1 }
Seq-id ::= gi 219722938

retmode="text", rettype="gb"

Input example

```
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="gb")
print(handle.read())
```

Output example

LOCUS       NR_024570               1450 bp    rRNA    linear   BCT 11-MAR-2019
DEFINITION  Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence.
ACCESSION   NR_024570
VERSION     NR_024570.1
DBLINK      Project: 33175
            BioProject: PRJNA33175
KEYWORDS    RefSeq.
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1
  AUTHORS   Cilia,V., Lafay,B. and Christen,R.
  TITLE     Sequence heterogeneities among 16S ribosomal RNA sequences, and
            their effect on phylogenetic analyses at the species level
  JOURNAL   Mol. Biol. Evol. 13 (3), 451-461 (1996)
# (remainder omitted)

retmode="xml", rettype="gb"

Input example

```
handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="gb")
print(handle.read())
```

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">\n<GBSet>\n  <GBSeq>\n\n    <GBSeq_locus>NR_024570</GBSeq_locus>\n    <GBSeq_length>1450</GBSeq_length>\n    <GBSeq_strandedness>single</GBSeq_strandedness>\n    <GBSeq_moltype>rRNA</GBSeq_moltype>\n    <GBSeq_topology>linear</GBSeq_topology>\n    <GBSeq_division>BCT</GBSeq_division>\n    <GBSeq_update-date>11-MAR-2019</GBSeq_update-date>\n    <GBSeq_create-date>07-JAN-2009</GBSeq_create-date>\n    <GBSeq_definition>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</GBSeq_definition>\n    <GBSeq_primary-accession>NR_024570</GBSeq_primary-accession>\n    <GBSeq_accession-version>NR_024570.1</GBSeq_accession-version>\n    <GBSeq_other-seqids>\n      <GBSeqid>ref|NR_024570.1|</GBSeqid>\n      <GBSeqid>gi|219722938</GBSeqid>\n    </GBSeq_other-seqids>\n    <GBSeq_project>PRJNA33175</GBSeq_project>\n    <GBSeq_keywords>\n      <GBKeyword>RefSeq</GBKeyword>\n    </GBSeq_keywords>\n    <GBSeq_source>Escherichia coli</GBSeq_source>\n    <GBSeq_organism>Escherichia coli</GBSeq_organism>\n    <GBSeq_taxonomy>Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Escherichia</GBSeq_taxonomy>\n    <GBSeq_references>\n  
# (remainder omitted)

retmode="xml", rettype="gbc"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="gbc")
print(handle.read())

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n<!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN" "https://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.dtd">\n<INSDSet>\n  <INSDSeq>\n\n    <INSDSeq_locus>NR_024570</INSDSeq_locus>\n    <INSDSeq_length>1450</INSDSeq_length>\n    <INSDSeq_strandedness>single</INSDSeq_strandedness>\n    <INSDSeq_moltype>rRNA</INSDSeq_moltype>\n    <INSDSeq_topology>linear</INSDSeq_topology>\n    <INSDSeq_division>BCT</INSDSeq_division>\n    <INSDSeq_update-date>11-MAR-2019</INSDSeq_update-date>\n    <INSDSeq_create-date>07-JAN-2009</INSDSeq_create-date>\n    <INSDSeq_definition>Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</INSDSeq_definition>\n    <INSDSeq_primary-accession>NR_024570</INSDSeq_primary-accession>\n    <INSDSeq_accession-version>NR_024570.1</INSDSeq_accession-version>\n    <INSDSeq_other-seqids>\n      <INSDSeqid>ref|NR_024570.1|</INSDSeqid>\n      <INSDSeqid>gi|219722938</INSDSeqid>\n    </INSDSeq_other-seqids>\n    <INSDSeq_project>PRJNA33175</INSDSeq_project>\n    <INSDSeq_keywords>\n      <INSDKeyword>RefSeq</INSDKeyword>\n    </INSDSeq_keywords>\n    <INSDSeq_source>Escherichia coli</INSDSeq_source>\n    <INSDSeq_organism>Escherichia coli</INSDSeq_organism>\n  
# (remainder omitted)

retmode="text", rettype="gbwithparts"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="gbwithparts")
print(handle.read())

Output example

LOCUS       NR_024570               1450 bp    rRNA    linear   BCT 11-MAR-2019
DEFINITION  Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence.
ACCESSION   NR_024570
VERSION     NR_024570.1
DBLINK      Project: 33175
            BioProject: PRJNA33175
KEYWORDS    RefSeq.
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
            Enterobacteriaceae; Escherichia.
# (remainder omitted)

retmode="text", rettype="fasta_cds_na"

Input example

handle = Entrez.efetch(db="nuccore", id='HQ828986.1', retmode="text", rettype="fasta_cds_na")
print(handle.read())

Output example

>lcl|HQ828986.1_cds_AEM44333.1_1 [protein=putative NDP-hexose 2, 3-dehydratase] [protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]
ATGTCACCCAAGCTCGCGGCGGCGACCGTACTGCGGCCACGGCAGGACGCGGGCCCCGCCGGCAGGATCG
CCCGCTCCGCGGCCACGGCCGAGGGCGCGCACCTGGCTCCGGGCGAGTTCCACGCGTGGTTCGCCCAGCG
GCGCGCGGCTCATTCCTTCCGCGTCGACCGCATCCCCTTCGCGGAGCTGGAGGGCTGGTCCACCCAGCGG
GACACCGGCAACCTGGTGCACCGCAGCGGGCGGTTCTTCAGCGTCGAGGGCCTGGCGGTGAGCGTGGACG
GCGGCCCGGACGGCTGGCACCAGCCCGTCATCCGCCAGCCGGAGACCGGCATTCTGGGCATCCTCGTCAA
GGAGTTCGACGGCGTCCTGCACTGCCTGATGCAGGCCAAGATG
# (remainder omitted; the DNA sequences of the CDSs are output in FASTA format)

retmode="text", rettype="fasta_cds_aa"

Input example

handle = Entrez.efetch(db="nuccore", id='HQ828986.1', retmode="text", rettype="fasta_cds_aa")
print(handle.read())

Output example

>lcl|HQ828986.1_prot_AEM44333.1_1 [protein=putative NDP-hexose 2, 3-dehydratase] [protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]
MSPKLAAATVLRPRQDAGPAGRIARSAATAEGAHLAPGEFHAWFAQRRAAHSFRVDRIPFAELEGWSTQR
DTGNLVHRSGRFFSVEGLAVSVDGGPDGWHQPVIRQPETGILGILVKEFDGVLHCLMQAKMEPGNPNLLQ
LSPTVQATRSNYTKVHRGADVKYIEYFTRPERGAVLADVLQSEHGSWFLHKHNRNMIVETTGDVPPDDDF
RWLTLGQIAELLRLDNVVNMDARTVLSCVPRTAAQCEPAALHSDAELRAWLTEARARHDVHAERVPLAGL
PGWARDDSSIHHLEGRYFEVVAASVQAGSREVTSWTQPLIRPRGRGVVAFLTRRINGVPHLLAHARTEGG
# (remainder omitted; the amino acid sequences of the CDSs are output in FASTA format)
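
The header lines of the fasta_cds_na and fasta_cds_aa output carry bracketed key=value fields such as protein_id and location. The following is a minimal sketch for pulling these into a dictionary with a regular expression. parse_cds_header is a hypothetical helper, and it assumes that field values never contain a closing bracket.

```python
import re

# Header line copied from the fasta_cds_aa output above.
header = (
    ">lcl|HQ828986.1_prot_AEM44333.1_1 "
    "[protein=putative NDP-hexose 2, 3-dehydratase] "
    "[protein_id=AEM44333.1] [location=complement(<1..1400)] [gbkey=CDS]"
)


def parse_cds_header(line):
    """Split a CDS FASTA header into its sequence id and bracketed fields."""
    seq_id = line[1:].split(" ", 1)[0]  # text between ">" and the first space
    # Each field looks like [name=value]; values are assumed to contain no "]".
    fields = dict(re.findall(r"\[(\w+)=([^\]]*)\]", line))
    return seq_id, fields


seq_id, fields = parse_cds_header(header)
print(seq_id, fields["protein_id"], fields["location"])
```

The protein_id recovered this way could then be used as the id for a separate query against the protein database.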

retmode="xml", rettype="docsum"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="docsum")
print(handle.read())

Output example

b'<?xml version="1.0" encoding="UTF-8"  ?>\n
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20041029//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">\n
<eSummaryResult>\n
\n
<DocSum>\n
\t<Id>219722938</Id>\n
\t<Item Name="Caption" Type="String">NR_024570</Item>\n
\t<Item Name="Title" Type="String">Escherichia coli strain U 5/41 16S ribosomal RNA, partial sequence</Item>\n
\t<Item Name="Extra" Type="String">gi|219722938|ref|NR_024570.1|[219722938]</Item>\n
\t<Item Name="Gi" Type="Integer">219722938</Item>\n
\t<Item Name="CreateDate" Type="String">1996/03/29</Item>\n
\t<Item Name="UpdateDate" Type="String">2019/03/11</Item>\n
\t<Item Name="Flags" Type="Integer">512</Item>\n
\t<Item Name="TaxId" Type="Integer">562</Item>\n
\t<Item Name="Length" Type="Integer">1450</Item>\n
\t<Item Name="Status" Type="String">live</Item>\n
\t<Item Name="ReplacedBy" Type="String"></Item>\n
\t<Item Name="Comment" Type="String"><![CDATA[  ]]></Item>\n
\t<Item Name="AccessionVersion" Type="String">NR_024570.1</Item>\n
</DocSum>\n</eSummaryResult>\n'

retmode="text", rettype="uilist"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="text", rettype="uilist")
print(handle.read())

Output example

219722938

retmode="xml", rettype="uilist"

Input example

handle = Entrez.efetch(db="nuccore", id='NR_024570', retmode="xml", rettype="uilist")
print(handle.read())

Output example

b'<?xml version="1.0" ?>\n
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20020605//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20020605/uilist.dtd">\n
<IdList>\n
    <Id>219722938</Id>\n
</IdList>\n'