Biopython Tutorial and Cookbook: Japanese Translation (4.8)

Posted at 2020-08-13

4.8 Adding SeqRecord objects

Go to 4.7

You can add SeqRecord objects together, giving a new SeqRecord.
What matters here is that any per-letter annotations common to both records are also concatenated, all the features are preserved (with their locations adjusted), and any other annotation the two records share, such as the id, name, and description, is kept as well.

For an example with per-letter annotation, we’ll use the first record in a FASTQ file. Chapter 5 explains the SeqIO functions:

>>> from Bio import SeqIO
>>> record = next(SeqIO.parse("example.fastq", "fastq"))
>>> len(record)
25
>>> print(record.seq)
CCCTTCTTGTCTTCAGCGTTTCTCC

>>> print(record.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 26, 23, 23]

Suppose this was Roche 454 data, and that other information leads you to think the TTT should be only TT.
We can make a new edited record by first slicing the SeqRecord before and after the “extra” third T:

>>> left = record[:20]
>>> print(left.seq)
CCCTTCTTGTCTTCAGCGTT
>>> print(left.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26]
>>> right = record[21:]
>>> print(right.seq)
CTCC
>>> print(right.letter_annotations["phred_quality"])
[26, 26, 23, 23]

Now add the two parts together:

>>> edited = left + right
>>> len(edited)
24
>>> print(edited.seq)
CCCTTCTTGTCTTCAGCGTTCTCC

>>> print(edited.letter_annotations["phred_quality"])
[26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22, 26, 26, 26, 26,
26, 26, 23, 23]

Easy and intuitive? We hope so! You can shorten this to a single line:

>>> edited = record[:20] + record[21:]
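
As a quick sanity check that needs neither Biopython nor the FASTQ file, the same edit can be modeled with a plain string and a plain list, using the sequence and quality values printed above. This is only a sketch of the bookkeeping involved; `drop_position` is a hypothetical helper, not a Biopython function.

```python
# Plain-Python model of the edit above; this is not Biopython code.
seq = "CCCTTCTTGTCTTCAGCGTTTCTCC"
qual = [26, 26, 18, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 22,
        26, 26, 26, 26, 26, 26, 26, 23, 23]


def drop_position(letters, scores, index):
    """Remove one position from a sequence and from its per-letter scores."""
    if len(letters) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    return (letters[:index] + letters[index + 1:],
            scores[:index] + scores[index + 1:])


# Index 20 is the third T of the TTT run (0-based).
edited_seq, edited_qual = drop_position(seq, qual, 20)
print(edited_seq)
print(len(edited_seq), len(edited_qual))
```

Slicing the sequence and the scores with the same indices is what keeps them aligned, which is the property the SeqRecord slicing and addition above rely on.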

Now, for an example with features, we’ll use a GenBank file. Suppose you have a circular genome:

>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")

>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=['Project:58037'])

>>> len(record)
9609
>>> len(record.features)
41
>>> record.dbxrefs
['Project:58037']

>>> record.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']

You can shift the origin like this:

>>> shifted = record[2000:] + record[:2000]

>>> shifted
SeqRecord(seq=Seq('GATACGCAGTCATATTTTTTACACAATTCTCTAATCCCGACAAGGTCGTAGGTC...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.',
dbxrefs=[])

>>> len(shifted)
9609

Note that this isn’t perfect: some annotation, such as the database cross references and the annotations dictionary, and one of the features (the source feature) have been lost:

>>> len(shifted.features)
40
>>> shifted.dbxrefs
[]
>>> shifted.annotations.keys()
[]
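
To see why the source feature disappears, it may help to model the slicing rule in plain Python. The sketch below assumes that a slice keeps only the features lying entirely inside it and shifts their coordinates, and that addition offsets the second record's features by the length of the first. Features are represented as hypothetical `(name, start, end)` tuples; only the `source` span mirrors the record above, and the gene coordinates are made up for illustration.

```python
# Plain-Python model of feature handling; this is not Biopython code.
# Coordinates are 0-based and end-exclusive, like Python slices.

def slice_features(features, start, stop):
    """Keep features fully inside [start, stop), shifted so the slice starts at 0."""
    return [(name, s - start, e - start)
            for name, s, e in features
            if start <= s and e <= stop]


def shift_origin(features, length, cut):
    """Model record[cut:] + record[:cut] for a circular sequence."""
    first = slice_features(features, cut, length)
    second = slice_features(features, 0, cut)
    offset = length - cut  # the second part is appended after the first
    return first + [(name, s + offset, e + offset) for name, s, e in second]


features = [
    ("source", 0, 9609),              # spans the whole record
    ("gene_before_cut", 100, 400),    # made-up coordinates
    ("gene_after_cut", 3000, 3500),   # made-up coordinates
    ("gene_across_cut", 1900, 2100),  # made-up coordinates
]

shifted_features = shift_origin(features, 9609, 2000)
for feature in shifted_features:
    print(feature)
```

In this model the `source` feature fits in neither slice, so it is dropped, and so would any feature that straddles the cut point. In the GenBank example only one feature is lost, so no real feature crosses position 2000 there.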

This is because the SeqRecord slicing step is cautious about what annotation it preserves (erroneously propagating annotation can cause major problems).
If you want to keep the database cross references or the annotations dictionary, you must copy them explicitly:

>>> shifted.dbxrefs = record.dbxrefs[:]
>>> shifted.annotations = record.annotations.copy()
>>> shifted.dbxrefs
['Project:58037']
>>> shifted.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords', 'references',
'accessions', 'data_file_division', 'date', 'organism', 'gi']
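
One detail worth knowing: both `record.dbxrefs[:]` and `record.annotations.copy()` are shallow copies. The sketch below uses plain Python containers as stand-ins for the two attributes (the values are placeholders) to show that nested lists inside the dictionary remain shared with the original, while `copy.deepcopy` gives a fully independent copy.

```python
import copy

# Stand-ins for record.dbxrefs and record.annotations (placeholder values).
dbxrefs = ["Project:58037"]
annotations = {"source": "example", "accessions": ["NC_005816"]}

dbxrefs_copy = dbxrefs[:]                  # new list, same string items
shallow = annotations.copy()               # new dict, nested values shared
deep = copy.deepcopy(annotations)          # new dict, nested values copied

dbxrefs_copy.append("Example:1")           # original list is unaffected
shallow["accessions"].append("EDITED")     # also changes the original's list
deep["accessions"].append("DEEP_ONLY")     # original is unaffected

print(dbxrefs)
print(annotations["accessions"])
print(deep["accessions"])
```

If you only read the copied annotations, the shallow copy is enough; if you plan to modify nested values such as lists, a deep copy avoids changing the original record by accident.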

Also note that in an example like this, you should probably change the record identifiers, since the NCBI references refer to the original unmodified sequence.
