genometools gt: determining the ORFs of transcripts

Run through singularity instead of installing

The conda package is v1.2.1, whereas the singularity image on the NIG supercomputer (遺伝研スパコン) is 1.6.2, so gt is run through singularity rather than installed. (2022/09/05)
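Because every command in these notes repeats the full image path, a small wrapper can keep the later commands short. This is only a sketch: the names GT_IMAGE and gt() are hypothetical and introduced here for illustration, and the image path is copied from the commands in these notes.

```shell
# Path to the container image, copied from the commands in these notes.
GT_IMAGE='/usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0'

# Hypothetical wrapper: forwards all arguments to gt inside the container.
gt() {
  singularity exec "$GT_IMAGE" gt "$@"
}

# Example: gt cds -help
```

With the wrapper defined in the current shell, "gt -version" should report the version of the image rather than that of any conda installation.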


singularity exec /usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0 gt cds --help
Usage: /usr/local/bin/gt cds [option ...] [GFF3_file]
Add CDS (coding sequence) features to exon features given in GFF3 file.

-minorflen      set the minimum length an open reading frame (ORF) must have to
              be added as a CDS feature (measured in amino acids)
              default: 64
-startcodon     require than an ORF must begin with a start codon
              default: no
-finalstopcodon require that the final ORF must end with a stop codon
              default: no
-seqfile        set the sequence file from which to take the sequences
              default: undefined
-encseq         set the encoded sequence indexname from which to take the
              sequences
              default: undefined
-seqfiles       set the sequence files from which to extract the features
              use '--' to terminate the list of sequence files 
-matchdesc      search the sequence descriptions from the input files for the
              desired sequence IDs (in GFF3), reporting the first match
              default: no
-matchdescstart exactly match the sequence descriptions from the input files for
              the desired sequence IDs (in GFF3) from the beginning to the
              first whitespace
              default: no
-usedesc        use sequence descriptions to map the sequence IDs (in GFF3) to
              actual sequence entries.
              If a description contains a sequence range (e.g.,
              III:1000001..2000000), the first  part is used as sequence ID
              ('III') and the first range position as offset ('1000001')
              default: no
-regionmapping  set file containing sequence-region to sequence file mapping
              default: undefined
-v              be verbose
              default: no
-o              redirect output to specified file
              default: undefined
-gzip           write gzip compressed output file
              default: no
-bzip2          write bzip2 compressed output file
              default: no
-force          force writing to output file
              default: no
-help           display help and exit
-version        display version information and exit

File format for option '-regionmapping':

The file supplied to option -regionmapping defines a ``mapping''.  A mapping
maps the `sequence-region` entries given in the 'GFF3_file' to a sequence file
containing the corresponding sequence. Mappings can be defined in one of the
following two forms:

  mapping = {
    chr1  = "hs_ref_chr1.fa.gz",
    chr2  = "hs_ref_chr2.fa.gz"
  }

  function mapping(sequence_region)
    return "hs_ref_"..sequence_region..".fa.gz"
  end

The first form defines a Lua (http://www.lua.org) table named ``mapping''
which maps each sequence region to the corresponding sequence file.
The second one defines a Lua function ``mapping'', which has to return the
sequence file name when it is called with the `sequence_region` as argument.
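The table form described above can be generated from the seqids found in a GFF3 file. The following is a minimal sketch, not something that was run as part of these notes: make_region_mapping is a hypothetical helper, and the file naming (prefix + seqid + suffix, as in the hs_ref_ example from the help text) is an assumption about how per-chromosome files are named. Keys are written in Lua's bracket form so that seqids which are not valid Lua identifiers (for example ones containing a dot) still give a valid table.

```shell
# make_region_mapping GFF3_FILE PREFIX SUFFIX
# Prints a Lua table named "mapping" with one entry per seqid in the GFF3.
# Assumes seqids contain no double quotes or backslashes.
make_region_mapping() {
  printf 'mapping = {\n'
  # Column 1 of every feature line, each seqid once, in first-seen order.
  awk -F '\t' '!/^#/ && NF >= 9 && !($1 in seen) { seen[$1] = 1; print $1 }' "$1" |
    while IFS= read -r seqid; do
      # A trailing comma after the last entry is allowed in Lua tables.
      printf '  ["%s"] = "%s%s%s",\n' "$seqid" "$2" "$seqid" "$3"
    done
  printf '}\n'
}

# Example: make_region_mapping out.sorted.gff3 hs_ref_ .fa.gz > mapping.lua
```

The resulting file would be passed with -regionmapping in place of -seqfile. Whether gt cds accepts it for a given data set has not been checked here.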

singularity exec /usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0 gt gff3 -sortlines -tidy out.gff3 > out.sorted.gff3
singularity exec /usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0 gt cds -startcodon -finalstopcodon -seqfile "$GENOME" -o out.sorted.orf.gff3 out.sorted.gff3


The gt cds step failed with the following error:
/usr/local/bin/gt cds: error: no mapping rule given and no MD5 tags present in the query seqid "chrX" -- no mapping can be defined
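Before choosing a mapping option, it can help to confirm that the seqids in the GFF3 (such as chrX) do appear in the FASTA headers. The sketch below is a plain awk check that does not involve gt at all. missing_seqids is a hypothetical helper, and it assumes the FASTA ID is the header text between ">" and the first whitespace, which is the part the help text says -matchdescstart compares against.

```shell
# missing_seqids GFF3_FILE FASTA_FILE
# Prints each GFF3 seqid that is not the first word of any FASTA header.
missing_seqids() {
  awk -F '\t' '
    # First file (FASTA): remember the ID of every header line.
    FNR == NR {
      if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id); seen[id] = 1 }
      next
    }
    # Second file (GFF3): report unknown seqids once each.
    !/^#/ && NF >= 9 && !($1 in seen) && !($1 in reported) {
      reported[$1] = 1
      print $1
    }
  ' "$2" "$1"
}

# Example: missing_seqids out.sorted.gff3 "$GENOME"
```

Empty output means every seqid has an exactly matching FASTA ID. Any seqid that is printed would need a different mapping rule or a renamed header.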

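Re-run the gt cds command with the -matchdesc option added, so that seqids such as chrX are looked up in the sequence descriptions of the FASTA file: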

-matchdesc      search the sequence descriptions from the input files for the
              desired sequence IDs (in GFF3), reporting the first match
              default: no
singularity exec /usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0 gt gff3 -sortlines -tidy out.gff3 > out.sorted.gff3
singularity exec /usr/local/biotools/g/genometools-genometools:1.6.2--py39h95ed972_0 gt cds -matchdesc -startcodon -finalstopcodon -seqfile "$GENOME" -o out.sorted.orf.gff3 out.sorted.gff3
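After the re-run, a quick way to see whether any ORFs were added is to count the CDS features in the output. This is a minimal sketch using awk only. count_features is a hypothetical helper, and it assumes the standard GFF3 layout in which the feature type is the third tab-separated column.

```shell
# count_features GFF3_FILE TYPE
# Prints how many feature lines in the GFF3 have the given type in column 3.
count_features() {
  awk -F '\t' -v type="$2" '!/^#/ && $3 == type { n++ } END { print n + 0 }' "$1"
}

# Example: compare exon and CDS counts in the gt cds output.
# count_features out.sorted.orf.gff3 exon
# count_features out.sorted.orf.gff3 CDS
```

A CDS count of 0 would suggest that no ORF passed the -minorflen, -startcodon and -finalstopcodon filters, or that the sequences were not mapped as intended.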



