I want to align protein or nucleotide sequences to a genome with BLAST and convert the results to BED format.
blast2bed.py
Usage
$ ./blast2bed.py
usage: blast2bed.py [-h] -i INPUT [-o OUTPUT] [-s SCORE] [-e EVALUE]

convert BLASTN/BLASTX/TBLASTX output into BED format

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        tab-separated blast output (required) with "-outfmt [6|7]"
  -o OUTPUT, --output OUTPUT
                        output BED format file (default: stdout)
  -s SCORE, --score SCORE
                        score (default: 0)
  -e EVALUE, --evalue EVALUE
                        E-value threshold (default: 1e-30)
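The help text above looks like standard argparse output, so the command-line interface could be sketched roughly as follows. The option names, help strings, and defaults are copied from the usage message; the function name build_parser and the float types are assumptions, not taken from the actual blast2bed.py source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="blast2bed.py",
        description="convert BLASTN/BLASTX/TBLASTX output into BED format",
    )
    parser.add_argument(
        "-i", "--input", required=True,
        help='tab-separated blast output (required) with "-outfmt [6|7]"',
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="output BED format file (default: stdout)",
    )
    # Assumption: the score is numeric; the help text does not say how it is used.
    parser.add_argument(
        "-s", "--score", type=float, default=0,
        help="score (default: 0)",
    )
    # Assumption: the threshold is parsed as a float such as 1e-10.
    parser.add_argument(
        "-e", "--evalue", type=float, default=1e-30,
        help="E-value threshold (default: 1e-30)",
    )
    return parser
```

Calling build_parser().parse_args(["-i", "WSSV.CoBV.tblastx.out", "-e", "1e-10"]) would then give a namespace with input, output, score, and evalue attributes.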
$ tblastx -query WSSV.fa -subject CoBV.fa -outfmt 6 -out WSSV.CoBV.tblastx.out
WSSV.CoBV.tblastx.out
NC_003225.3 LC741431.1 70.242 289 86 0 163798 164664 210172 211038 0.0 443
NC_003225.3 LC741431.1 59.459 259 105 0 254436 255212 9995 9219 0.0 348
NC_003225.3 LC741431.1 66.341 205 69 0 227273 227887 165779 166393 0.0 347
NC_003225.3 LC741431.1 65.753 219 75 0 16089 15433 85251 85907 0.0 343
NC_003225.3 LC741431.1 60.889 225 88 0 305641 306315 52386 51712 0.0 328
NC_003225.3 LC741431.1 63.636 209 76 0 304672 305298 53370 52744 0.0 315
NC_003225.3 LC741431.1 75.723 173 42 0 176995 177513 221910 222428 0.0 313
NC_003225.3 LC741431.1 59.193 223 91 0 302092 302760 55860 55192 0.0 303
NC_003225.3 LC741431.1 57.282 206 88 0 116435 115818 137829 138446 0.0 279
...
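The twelve columns above are unlabeled. With the default "-outfmt 6" layout they are conventionally named qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. A minimal sketch of reading one line into named fields follows; the names Hit and parse_hit are hypothetical helpers introduced only for illustration.

```python
from collections import namedtuple

# Conventional column order of BLAST+ "-outfmt 6" (default field set).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Parse one tab- or space-separated outfmt 6 line into a Hit."""
    f = line.split()
    return Hit(
        f[0], f[1],                      # query and subject identifiers
        float(f[2]),                     # percent identity
        int(f[3]), int(f[4]), int(f[5]), # alignment length, mismatches, gap opens
        int(f[6]), int(f[7]),            # query start and end
        int(f[8]), int(f[9]),            # subject start and end
        float(f[10]), float(f[11]),      # E-value and bit score
    )


hit = parse_hit(
    "NC_003225.3 LC741431.1 65.753 219 75 0 16089 15433 85251 85907 0.0 343"
)
print(hit.qseqid, hit.qstart, hit.qend, hit.evalue)
```

In this hit qstart is larger than qend, which means the alignment lies on the reverse strand of the query.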
$ ./blast2bed.py -i WSSV.CoBV.tblastx.out -o WSSV.CoBV.bed -e 1e-10
WSSV.CoBV.bed
NC_003225.3 163798 164664 + 0 LC741431.1
NC_003225.3 254436 255212 + 0 LC741431.1
NC_003225.3 227273 227887 + 0 LC741431.1
NC_003225.3 16089 15433 - 0 LC741431.1
NC_003225.3 305641 306315 + 0 LC741431.1
NC_003225.3 304672 305298 + 0 LC741431.1
NC_003225.3 176995 177513 + 0 LC741431.1
NC_003225.3 302092 302760 + 0 LC741431.1
NC_003225.3 116435 115818 - 0 LC741431.1
NC_003225.3 164633 163806 - 0 LC741431.1
NC_003225.3 20141 19605 - 0 LC741431.1
NC_003225.3 303439 303864 + 0 LC741431.1
NC_003225.3 21670 22155 + 0 LC741431.1
NC_003225.3 10802 11305 + 0 LC741431.1
...
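Judging from the example, each row holds the query ID, query start, query end, strand, score, and subject ID. A minimal sketch that reproduces this layout is shown below. It is not the actual blast2bed.py source: the function name blast_to_bed, the rule that hits are kept when the E-value is at or below the threshold, the strand rule (minus when qstart is greater than qend), and writing the score value unchanged are all assumptions inferred from the sample input and output.

```python
def blast_to_bed(lines, evalue_max=1e-30, score=0):
    """Yield tab-separated rows in the column layout of the example output.

    Assumptions (not confirmed by the original script): a hit is kept when
    evalue <= evalue_max, the strand is '+' when qstart <= qend, and `score`
    is copied as-is into the score column.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # "-outfmt 7" comment lines
            continue
        f = line.split()
        if len(f) < 12:                       # skip truncated or placeholder lines
            continue
        qseqid, sseqid = f[0], f[1]
        qstart, qend, evalue = int(f[6]), int(f[7]), float(f[10])
        if evalue > evalue_max:
            continue
        strand = "+" if qstart <= qend else "-"
        yield "\t".join(map(str, (qseqid, qstart, qend, strand, score, sseqid)))


example = [
    "NC_003225.3\tLC741431.1\t70.242\t289\t86\t0\t163798\t164664\t210172\t211038\t0.0\t443",
    "NC_003225.3\tLC741431.1\t65.753\t219\t75\t0\t16089\t15433\t85251\t85907\t0.0\t343",
]
for row in blast_to_bed(example, evalue_max=1e-10):
    print(row)
```

The UCSC BED definition uses a 0-based start, requires the start to be no larger than the end, and places name, score, and strand in columns 4 to 6. Rows in the layout above may therefore need their coordinates swapped and their columns reordered before strict BED parsers accept them.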
Several similar scripts already exist. They are a reminder of how much effort those who came before put into this problem.