I wanted to map protein or nucleotide sequences onto a genome with BLAST and convert the resulting hits into BED format.
blast2bed.py
Usage
$ ./blast2bed.py
usage: blast2bed.py [-h] -i INPUT [-o OUTPUT] [-s SCORE] [-e EVALUE]
convert BLASTN/BLASTX/TBLASTX output into BED format
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        tab-separated blast output (required) with "-outfmt [6|7]"
  -o OUTPUT, --output OUTPUT
                        output BED format file (default: stdout)
  -s SCORE, --score SCORE
                        score (default: 0)
  -e EVALUE, --evalue EVALUE
                        E-value threshold (default: 1e-30)
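The help text above has the shape argparse generates. Below is a hypothetical argparse definition that would give the same interface; it is a reconstruction from the help text, not the actual source of blast2bed.py, and `build_parser` / `parse_cli` are illustrative names. The script above prints the full help when run with no arguments. argparse does not do that on its own for a required option (it only prints an error), so the sketch prints the help explicitly in that case. The integer type for `-s` is an assumption.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI shown in the help text.
    parser = argparse.ArgumentParser(
        prog="blast2bed.py",
        description="convert BLASTN/BLASTX/TBLASTX output into BED format",
    )
    parser.add_argument("-i", "--input", required=True,
                        help='tab-separated blast output (required) with "-outfmt [6|7]"')
    parser.add_argument("-o", "--output", default=None,
                        help="output BED format file (default: stdout)")
    # Assumption: the score is an integer written into the BED score column.
    parser.add_argument("-s", "--score", type=int, default=0,
                        help="score (default: 0)")
    parser.add_argument("-e", "--evalue", type=float, default=1e-30,
                        help="E-value threshold (default: 1e-30)")
    return parser


def parse_cli(argv: list) -> argparse.Namespace:
    parser = build_parser()
    if not argv:
        # Mimic the behaviour shown above: no arguments -> print full help.
        parser.print_help()
        sys.exit(1)
    return parser.parse_args(argv)
```

Call it as `parse_cli(sys.argv[1:])` from the script's entry point. On Python 3.10 and later, argparse labels the section `options:` instead of `optional arguments:`.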
tblastx -query WSSV.fa -subject CoBV.fa -outfmt 6 -out WSSV.CoBV.tblastx.out
WSSV.CoBV.tblastx.out
NC_003225.3     LC741431.1      70.242  289     86      0       163798  164664  210172  211038  0.0     443
NC_003225.3     LC741431.1      59.459  259     105     0       254436  255212  9995    9219    0.0     348
NC_003225.3     LC741431.1      66.341  205     69      0       227273  227887  165779  166393  0.0     347
NC_003225.3     LC741431.1      65.753  219     75      0       16089   15433   85251   85907   0.0     343
NC_003225.3     LC741431.1      60.889  225     88      0       305641  306315  52386   51712   0.0     328
NC_003225.3     LC741431.1      63.636  209     76      0       304672  305298  53370   52744   0.0     315
NC_003225.3     LC741431.1      75.723  173     42      0       176995  177513  221910  222428  0.0     313
NC_003225.3     LC741431.1      59.193  223     91      0       302092  302760  55860   55192   0.0     303
NC_003225.3     LC741431.1      57.282  206     88      0       116435  115818  137829  138446  0.0     279
...
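Each data row above has the twelve default columns of `-outfmt 6`: query id, subject id, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, and bit score. `-outfmt 7` writes the same rows with additional `#` comment lines. The following is a minimal sketch of reading either form and applying an E-value cutoff like `-e`. `BlastHit` and `read_hits` are hypothetical names, and treating the threshold as inclusive (`<=`) is an assumption about how blast2bed.py compares it.

```python
from typing import Iterable, Iterator, NamedTuple


class BlastHit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def read_hits(lines: Iterable[str], max_evalue: float = 1e-30) -> Iterator[BlastHit]:
    """Yield hits from -outfmt 6/7 text whose E-value is <= max_evalue."""
    for line in lines:
        line = line.rstrip("\n")
        # -outfmt 7 adds '#' comment lines; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")  # BLAST writes these columns tab-separated
        hit = BlastHit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                       int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                       float(f[10]), float(f[11]))
        if hit.evalue <= max_evalue:  # assumption: inclusive threshold
            yield hit
```

For tblastx, the length column counts amino acids, while the start and end columns are nucleotide positions. In the first row, 289 × 3 = 867 = 164664 - 163798 + 1.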
./blast2bed.py -i WSSV.CoBV.tblastx.out -o WSSV.CoBV.bed -e 1e-10
WSSV.CoBV.bed
NC_003225.3     163798  164664  +       0       LC741431.1
NC_003225.3     254436  255212  +       0       LC741431.1
NC_003225.3     227273  227887  +       0       LC741431.1
NC_003225.3     16089   15433   -       0       LC741431.1
NC_003225.3     305641  306315  +       0       LC741431.1
NC_003225.3     304672  305298  +       0       LC741431.1
NC_003225.3     176995  177513  +       0       LC741431.1
NC_003225.3     302092  302760  +       0       LC741431.1
NC_003225.3     116435  115818  -       0       LC741431.1
NC_003225.3     164633  163806  -       0       LC741431.1
NC_003225.3     20141   19605   -       0       LC741431.1
NC_003225.3     303439  303864  +       0       LC741431.1
NC_003225.3     21670   22155   +       0       LC741431.1
NC_003225.3     10802   11305   +       0       LC741431.1
...
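In the output above, reverse-strand hits keep the BLAST query start larger than the end (for example `16089 15433 -`), and the columns are ordered chrom, start, end, strand, score, name. Standard BED6 instead expects chrom, chromStart, chromEnd, name, score, strand, with a 0-based chromStart no greater than chromEnd. Tools such as bedtools typically report lines with start > end as malformed. The following is a minimal sketch of emitting standard BED6 for a hit on the query sequence. `blast_to_bed6` is a hypothetical name. Taking the strand from the query coordinate order mirrors what the example output appears to do, and the 0-1000 score range follows the UCSC BED definition.

```python
def blast_to_bed6(chrom: str, qstart: int, qend: int, name: str, score: int = 0) -> str:
    """Return one BED6 line for a BLAST hit on the query sequence.

    BLAST coordinates are 1-based and inclusive, and qstart > qend marks a
    hit on the reverse strand. BED uses 0-based, half-open intervals with
    chromStart <= chromEnd, so the pair is sorted and the start shifted by 1.
    """
    if not 0 <= score <= 1000:
        raise ValueError("BED score must be between 0 and 1000")
    strand = "+" if qstart <= qend else "-"
    lo, hi = min(qstart, qend), max(qstart, qend)
    return "\t".join([chrom, str(lo - 1), str(hi), name, str(score), strand])
```

With this sketch, the row `NC_003225.3 16089 15433 - 0 LC741431.1` would become `NC_003225.3 15432 16089 LC741431.1 0 -`.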
Several similar scripts already exist, which says a lot about how many people have struggled with this before.

