More than 5 years have passed since last update.

Extracting taxonomy information from GenBank data in XML format


Script for registered sequences

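The following script reads GenBank sequence records in XML format and extracts their taxon information. For each record it writes the accession, the taxon ID (the "taxon:NNNN" qualifier value), and the taxonomic lineage to a tab-separated file.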


import xml.etree.ElementTree as ET

# Parse the GenBank XML (root element GBSet, one GBSeq per record)
tree = ET.parse("./gene_file.xml")
root = tree.getroot()

out = ""  # accumulated tab-separated output
for seq in root.findall('GBSeq'):
    # findtext() returns the default instead of failing when an element is missing
    accession = seq.findtext('GBSeq_accession-version', default="")
    taxon = seq.findtext('GBSeq_taxonomy', default="")
    taxon_id_out = ""  # reset per record so a value never leaks into the next record
    # Walk feature table -> features -> qualifiers, looking for "taxon:NNNN"
    for qual in seq.findall('GBSeq_feature-table/GBFeature/GBFeature_quals/GBQualifier'):
        taxon_id = qual.findtext('GBQualifier_value')
        if taxon_id is not None and 'taxon:' in taxon_id:
            taxon_id_out = taxon_id
            break  # the first taxon db_xref (normally on the source feature) is enough
    out += accession + "\t" + taxon_id_out + "\t" + taxon + "\n"

with open("out10.taxon.txt", mode='w') as f:
    f.write(out)
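The script above works on a file. To check the parsing logic without downloading data, the same extraction can be wrapped in a function that takes an XML string. What follows is a minimal sketch under stated assumptions. The function name `extract_taxon_rows` and the `SAMPLE_XML` record (including the placeholder accession `XX000000.1` and the shortened lineage) are hypothetical and exist only for illustration. Unlike the original script, this sketch also checks that `GBQualifier_name` is `db_xref`, so that free text which happens to contain "taxon:" is not picked up. It also splits `GBSeq_taxonomy` on `;` into a list of ranks.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, heavily trimmed GBSet/GBSeq record for illustration only;
# real GenBank XML contains many more elements per record.
SAMPLE_XML = """<GBSet>
  <GBSeq>
    <GBSeq_accession-version>XX000000.1</GBSeq_accession-version>
    <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata</GBSeq_taxonomy>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_quals>
          <GBQualifier>
            <GBQualifier_name>organism</GBQualifier_name>
            <GBQualifier_value>Homo sapiens</GBQualifier_value>
          </GBQualifier>
          <GBQualifier>
            <GBQualifier_name>db_xref</GBQualifier_name>
            <GBQualifier_value>taxon:9606</GBQualifier_value>
          </GBQualifier>
        </GBFeature_quals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>"""


def extract_taxon_rows(xml_text: str) -> List[Tuple[str, str, List[str]]]:
    """Return (accession, taxon db_xref, lineage ranks) for each GBSeq record."""
    root = ET.fromstring(xml_text)
    rows = []
    for seq in root.findall("GBSeq"):
        accession = seq.findtext("GBSeq_accession-version", default="")
        lineage_text = seq.findtext("GBSeq_taxonomy", default="")
        # The lineage is a single ";"-separated string; split it into ranks
        lineage = [rank.strip() for rank in lineage_text.split(";") if rank.strip()]
        taxon_xref = ""  # stays empty if the record has no taxon db_xref
        path = "GBSeq_feature-table/GBFeature/GBFeature_quals/GBQualifier"
        for qual in seq.iterfind(path):
            # Only db_xref qualifiers can carry the "taxon:NNNN" cross-reference
            if qual.findtext("GBQualifier_name") == "db_xref":
                value = qual.findtext("GBQualifier_value", default="")
                if value.startswith("taxon:"):
                    taxon_xref = value
                    break
        rows.append((accession, taxon_xref, lineage))
    return rows


if __name__ == "__main__":
    for accession, taxon_xref, lineage in extract_taxon_rows(SAMPLE_XML):
        print(accession, taxon_xref, " > ".join(lineage), sep="\t")
```

Running it prints one tab-separated line per record. Returning a list of tuples instead of building one large string keeps the parsing testable, and the caller still decides how to write the file.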

Why I wrote this

Parsing the flat-file format is tedious and raises exceptions all over the place, so I tried reading and extracting the data from XML instead.
