
Using DFAST, the microbial genome annotation tool provided by DDBJ, on Google Colaboratory

Posted at 2022-09-22

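DFAST is a microbial genome annotation tool provided by DDBJ.

The browser version is normally sufficient, but DFAST can also be run on Google Colaboratory.

This article describes how to do so.
Because the output files are large, I recommend storing the input files on Google Drive and writing the output to Google Drive as well.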

Installing DFAST

Installing with conda is the simplest option.
https://anaconda.org/bioconda/dfast

Copy the following code into the first cell and run it.

#@title installing miniconda and dfast
%%bash
MINICONDA_INSTALLER_SCRIPT=Miniconda3-4.5.4-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

#installing dfast
conda install -y -c bioconda dfast
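
Before moving on, it can help to confirm that the install cell actually put `dfast` on the PATH. The following is a minimal sketch to run in a separate Python cell; `find_executable` is a helper name introduced here for illustration and is not part of DFAST.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Check for the dfast command installed by the previous cell.
dfast_path = find_executable("dfast")
if dfast_path is None:
    print("dfast was not found on PATH; re-run the installation cell.")
else:
    print("dfast found at", dfast_path)
```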

Next, install the reference databases.
Enter the following code in the next cell and run it.

#@title reference database install
!dfast_file_downloader.py --protein dfast
!dfast_file_downloader.py --cdd Cog --hmm TIGR

Optionally, run the help command to check the available options.

#@title Show DFAST help
!dfast -h

Running DFAST

This is the case where you upload a file to the Colab session and run DFAST on it.
Using subprocess to launch the command works well.

import subprocess

# Build the command
cmd = 'dfast '
cmd += '--genome file_name.fasta '  # input file name
cmd += '--out folder_name'  # output folder name

# Run dfast
test = subprocess.Popen(cmd, shell=True, text=True,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
outs, errs = test.communicate()

# Append stdout and stderr to the log file
with open('Dfast_log.txt', 'a') as f:
  f.write(outs)
  f.write("\n")
  f.write(errs)
  f.write("\n")
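
As a variation on the cell above, the command can be passed as an argument list instead of a single shell string, which avoids quoting problems when a path contains spaces and makes the exit status easy to check. This is only a sketch: `build_dfast_command` and `run_and_log` are helper names introduced here, and only the `--genome` and `--out` options used in this article are assumed.

```python
import subprocess


def build_dfast_command(genome_path, out_dir):
    """Return the dfast invocation as an argument list (no shell involved)."""
    return ["dfast", "--genome", genome_path, "--out", out_dir]


def run_and_log(cmd, log_path):
    """Run cmd, append its stdout and stderr to log_path, and return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "a") as f:
        f.write(result.stdout + "\n" + result.stderr + "\n")
    return result.returncode


# Hypothetical usage (file and folder names are placeholders):
# code = run_and_log(build_dfast_command("file_name.fasta", "folder_name"),
#                    "Dfast_log.txt")
# if code != 0:
#     print("dfast exited with status", code)
```

A non-zero return code means the command failed, in which case the log file is the first place to look.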

Running on Google Drive

Place the input file on Google Drive, run DFAST on it, and write the output back to Google Drive.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Prefix file_name.fasta and folder_name from the example above with the Google Drive path, then run the command as before.
The following is an example of the resulting command.

# Build the command
cmd = 'dfast '
cmd += '--genome /content/drive/MyDrive/file_name.fasta '  # input file name
cmd += '--out /content/drive/MyDrive/folder_name'  # output folder name
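
A mistyped Drive path, or a Drive that has not been mounted yet, is an easy mistake to make at this step. The sketch below checks that the input file exists before DFAST is launched; `check_genome_path` is a helper name introduced here for illustration.

```python
from pathlib import Path


def check_genome_path(genome_path):
    """Return genome_path as a Path if it is an existing file, else raise."""
    path = Path(genome_path)
    if not path.is_file():
        raise FileNotFoundError(f"Genome file not found: {path}")
    return path


# Hypothetical usage with the placeholder path from the example above:
# check_genome_path('/content/drive/MyDrive/file_name.fasta')
```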

Running several files in one go

To run DFAST on every file in a given folder, use the following steps.
First, specify the folder and collect the file names.

import os

# Folder on Google Drive that holds the genomes. Keep the trailing "/".
DRIVE = '/content/drive/MyDrive/'

# List the files in DRIVE
files = os.listdir(DRIVE)

# Keep only the files whose name ends with .fasta
files3 = []
for i in files:
    if i.endswith('.fasta'):
        files3.append(i)
print(files3)
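
If the genomes use more than one extension, the same listing can be written with pathlib. This is a sketch under assumptions: the suffix set is illustrative, and `list_fasta_files` is a helper name introduced here. Note that the `file_name[:-6]` slice used later in this article assumes the extension is exactly `.fasta`, so other suffixes would need a different way of deriving the output folder name.

```python
from pathlib import Path

# Assumption: illustrative suffixes; adjust to your own file naming.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def list_fasta_files(folder):
    """Return the sorted names of files in folder that have a FASTA suffix."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )


# Hypothetical usage:
# files3 = list_fasta_files('/content/drive/MyDrive/')
```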

Defining a function that runs DFAST

def dfast_run(file_name):
  cmd = 'dfast '
  cmd += '--genome ' + DRIVE + file_name
  cmd += ' --out ' + DRIVE + file_name[:-6]  # [:-6] strips the ".fasta" extension
  # Print cmd to check it before running
  print(cmd)
  test = subprocess.Popen(cmd, shell=True, text=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  outs, errs = test.communicate()
  # Append stdout and stderr to the log file on Drive
  with open(DRIVE + "Dfast_log.txt", 'a') as f:
    f.write(outs)
    f.write("\n")
    f.write(errs)
    f.write("\n")
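
The fixed-length slice `[:-6]` only works for names ending in `.fasta`. As a sketch of an extension-independent alternative, the input path and output folder can be derived with pathlib; `dfast_paths` is a helper name introduced here, and POSIX-style paths are assumed, as on Colab.

```python
from pathlib import PurePosixPath


def dfast_paths(drive_dir, file_name):
    """Return (genome_path, out_dir) for a genome file inside drive_dir."""
    genome = PurePosixPath(drive_dir) / file_name
    # with_suffix("") drops only the final suffix, e.g. ".fasta" or ".fa".
    out_dir = genome.with_suffix("")
    return str(genome), str(out_dir)


# Hypothetical usage inside a function like dfast_run:
# genome, out_dir = dfast_paths(DRIVE, file_name)
```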

Running the analysis

# files3 holds the list of FASTA file names.
for i in files3:
  dfast_run(i)
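
If the Colab session disconnects partway through a long batch, it may be convenient to rerun only the genomes that have no output folder yet. The sketch below is one way to do that; `pending_genomes` is a helper name introduced here. It assumes the output folder is named after the input file minus its extension, as in `dfast_run`, and an existing folder does not guarantee that the earlier run finished.

```python
import os


def pending_genomes(drive_dir, fasta_names):
    """Return the names whose output folder (name minus extension) is absent."""
    pending = []
    for name in fasta_names:
        out_dir = os.path.join(drive_dir, os.path.splitext(name)[0])
        if not os.path.isdir(out_dir):
            pending.append(name)
    return pending


# Hypothetical usage with the names defined earlier in this article:
# for name in pending_genomes(DRIVE, files3):
#     dfast_run(name)
```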
