
Building SRA Toolkit

More than 5 years have passed since the last update.

It's an auspicious day today, and the red "US government shutting down" alert looks lovely against NCBI's signature blue.

Today I'm building the SRA Toolkit, the culprit behind "nagging SRA".



  1. Go to NCBI SRA.

  2. Download sra_sdk-2.3.3-4.tar.gz.

  3. Microwave it to thaw it out (i.e. unpack the tarball).

  4. Read README-build.txt, README-config.txt and friends.

  5. Read them and somehow make it work.

It does not just work out.

> wget "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.3.3-4/sra_sdk-2.3.3-4.tar.gz"
> tar xzf sra_sdk-2.3.3-4.tar.gz
> cd sra_sdk-2.3.3-4/
> ls
# => Makefile README README-WINDOWS.txt README-build.txt README-config.txt build/ configuration-assistant.perl* doc/ interfaces/ libs/ tools/
> make help
# Before initial build, run 'make OUTDIR=<dir> out' from
# the project root to set the output directory of your builds.
#
# To select a compiler, run 'make <comp>' where
# comp = { GCC VC++ CLANG }.
#
# For hosts that support cross-compilation ( only Macintosh today ),
# you can run 'make <arch>' where arch = { i386 x86_64 sparc32 sparc64 }.
#
# To set a build configuration, run 'make <config>' where
# config = { debug profile release static dynamic }.
> make OUTDIR=/home/inutano/local/sratoolkit out
# current build is linux static rel x86_64 build using gcc tools
# output target directory is '/home/inutano/local/sratoolkit/linux/gcc/stat/x86_64/rel'
> make
# fun make time
> ls /home/inutano/local/sratoolkit/bin64/
# abi-dump@ cache-mgr.2.3.3* illumina-load.2@ nenctool.2.3.3* sff-dump.2@ sra-stat@ vdb-config.2.3.3* vdb-lock.2@ abi-dump.2@ fastq-dump@ illumina-load.2.3.3* nencvalid@ sff-dump.2.3.3* sra-stat.2@ vdb-copy@ vdb-lock.2.3.3* abi-dump.2.3.3* fastq-dump.2@ kar@ nencvalid.2@ sff-load@ sra-stat.2.3.3* vdb-copy.2@ vdb-passwd@ abi-load@ fastq-dump.2.3.3* kar.2@ nencvalid.2.3.3* sff-load.2@ srapath@ vdb-copy.2.3.3* vdb-passwd.2@ abi-load.2@ fastq-load@ kar.2.3.3* prefetch@ sff-load.2.3.3* srapath.2@ vdb-decrypt@ vdb-passwd.2.3.3* abi-load.2.3.3* fastq-load.2@ kdbmeta@ prefetch.2@ sra-kar@ srapath.2.3.3* vdb-decrypt.2@ vdb-unlock@ align-info@ fastq-load.2.3.3* kdbmeta.2@ prefetch.2.3.3* sra-kar.2@ srf-load@ vdb-decrypt.2.3.3* vdb-unlock.2@ align-info.2@ helicos-load@ kdbmeta.2.3.3* rcexplain@ sra-kar.2.3.3* srf-load.2@ vdb-dump@ vdb-unlock.2.3.3* align-info.2.3.3* helicos-load.2@ latf-load@ rcexplain.2@ sra-pileup@ srf-load.2.3.3* vdb-dump.2@ vdb-validate@ bam-load@ helicos-load.2.3.3* latf-load.2@ rcexplain.2.3.3* sra-pileup.2@ test-sra@ vdb-dump.2.3.3* vdb-validate.2@ bam-load.2@ illumina-dump@ latf-load.2.3.3* sam-dump@ sra-pileup.2.3.3* test-sra.2@ vdb-encrypt@ vdb-validate.2.3.3* bam-load.2.3.3* illumina-dump.2@ ncbi/ sam-dump.2@ sra-sort@ test-sra.2.3.3* vdb-encrypt.2@ cache-mgr@ illumina-dump.2.3.3* nenctool@ sam-dump.2.3.3* sra-sort.2@ vdb-config@ vdb-encrypt.2.3.3* cache-mgr.2@ illumina-load@ nenctool.2@ sff-dump@ sra-sort.2.3.3* vdb-config.2@ vdb-lock@
> perl configuration-assistant.perl
# ==========================================
# Welcome to the SRA Toolkit Configuration Script.
# SRA toolkit documentation:
# http://www.ncbi.nlm.nih.gov/Traces/sra/std
# ==========================================
#
# cwd = '/lustre1/home/inutano/local/src/sra_sdk-2.3.3-4'
#
# checking for fastq-dump (local build)... (/home/inutano/local/linux/gcc/stat/x86_64/rel/bin/fastq-dump: found)... yes
# checking for sam-dump (local build)... (/home/inutano/local/linux/gcc/stat/x86_64/rel/bin/sam-dump: found)... yes
# checking for vdb-config (local build)... (/home/inutano/local/linux/gcc/stat/x86_64/rel/bin/vdb-config: found)... yes
#
# Reading configuration
# refseq/servers: not found
# refseq/volumes: not found
# refseq/paths: '/home/inutano/ncbi/refseq': exists
#
# repository: found
# repository/site: not found
#
# repository/remote: found
# repository/remote/protected/CGI/resolver-cgi: 'http://www.ncbi.nlm.nih.gov/Traces/names/names.cgi'
# repository/remote/main/NCBI/apps/refseq/volumes/refseq: 'refseq'
# repository/remote/main/NCBI/apps/wgs/volumes/fuseWGS: 'wgs'
# repository/remote/main/NCBI/apps/sra/volumes/fuse1000: 'sra-instant/reads/ByRun/sra'
# repository/remote/main/NCBI/disabled: 'no'
# repository/remote/main/NCBI/root: 'http://ftp-trace.ncbi.nlm.nih.gov/sra'
# repository/user: found
# repository/user/main/public/apps/refseq/volumes/refseq: 'refseq'
# repository/user/main/public/root: '/home/inutano/ncbi': exists
# repository/user/main/public/cache-enabled: 'true'
# repository/user/main/public/apps/sra/volumes/sraFlat: 'sra'
# repository/user/main/public/apps/wgs/volumes/wgsFlat: 'wgs'
#
# Configuration is correct
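The session above just takes the defaults that make reports (linux static rel x86_64, gcc). Going by the `make help` text, the compiler and build configuration are picked with separate make targets after `make OUTDIR=<dir> out` and before the actual `make`. Below is a minimal bash sketch of that order, assuming the targets behave as the help text describes; `sra_build` and its argument layout are hypothetical names of mine, not something the toolkit provides, and I have only run the default path shown above.

```shell
# Hypothetical wrapper around the setup targets listed by 'make help'.
# Run it from the unpacked source root (e.g. sra_sdk-2.3.3-4/).
# Usage: sra_build OUTDIR [COMPILER] [CONFIG...]
#   COMPILER: GCC, VC++ or CLANG (default: GCC)
#   CONFIG:   zero or more of debug, profile, release, static, dynamic
sra_build() {
  if [ "$#" -lt 1 ]; then
    echo "usage: sra_build OUTDIR [COMPILER] [CONFIG...]" >&2
    return 2
  fi
  local outdir="$1" comp="${2:-GCC}" cfg
  shift $(( $# >= 2 ? 2 : 1 ))   # whatever is left are build configs

  # Validate everything before touching the filesystem or running make.
  case "$comp" in
    GCC|VC++|CLANG) ;;
    *) echo "unknown compiler: $comp" >&2; return 2 ;;
  esac
  for cfg in "$@"; do
    case "$cfg" in
      debug|profile|release|static|dynamic) ;;
      *) echo "unknown config: $cfg" >&2; return 2 ;;
    esac
  done

  mkdir -p "$outdir" || return 1
  make OUTDIR="$outdir" out || return 1   # 'make help': do this before the first build
  make "$comp" || return 1                # select the compiler
  for cfg in "$@"; do
    make "$cfg" || return 1               # select build configuration(s)
  done
  make                                    # the actual build
}
```

For example, `sra_build "$HOME/sratoolkit" GCC dynamic` would ask for a dynamically linked gcc build instead of the default static one. The arguments are checked up front so that a typo does not leave a half-configured output directory behind.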

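Unlike a typical build, where you choose the install location with configure --prefix=, OUTDIR receives the build output itself (subdirectories such as linux/gcc/stat/x86_64/rel, plus bin64), so it is best to create a dedicated directory somewhere easy to manage and point OUTDIR at it. When I pointed it at local, it made a mess under local, so don't do that.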
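Since OUTDIR gets the whole build tree rather than a clean install prefix, one way to keep things tidy is to give it its own directory and put only its `bin64` on `PATH`. A minimal bash sketch, assuming the `bin64` layout from the listing above; `sra_use` is a hypothetical helper, and it only checks the three tools that configuration-assistant.perl looks for (fastq-dump, sam-dump, vdb-config).

```shell
# Hypothetical helper: sanity-check a finished SRA Toolkit build under OUTDIR
# and prepend its bin64 directory to PATH in the current shell.
# Usage: sra_use OUTDIR   (call it directly, not inside $(...), or PATH is lost)
sra_use() {
  local bindir="$1/bin64" tool missing=0
  for tool in fastq-dump sam-dump vdb-config; do
    if [ -x "$bindir/$tool" ]; then      # -x follows the bin64 symlinks
      echo "$tool: found"
    else
      echo "$tool: missing" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1       # leave PATH alone if anything is missing

  case ":$PATH:" in
    *":$bindir:"*) ;;                    # already on PATH, nothing to do
    *) PATH="$bindir:$PATH"; export PATH ;;
  esac
}
```

With the OUTDIR from the session, that would be `sra_use /home/inutano/local/sratoolkit`, for example from `~/.bashrc`. Calling it twice does not add a duplicate PATH entry.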

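Still, fastq-dump takes a frightfully long time on large compressed files, and I wonder if anything can be done about that. I'd love for one of the speed-obsessed folks out there to fix it.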