Detecting variants with DeepVariant
- Setting up the DeepVariant runtime environment
  - Running docker pull
  - Downloading the model that matches the DeepVariant version
- Running DeepVariant
  - make_examples
  - call_variants
  - postprocess_variants
- A script that runs the DeepVariant steps in sequence
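Preparing to run DeepVariant
Create a base directory for running DeepVariant and move into it.
Under that directory, create a directory named common and place the reference genome and the BED file of the target regions in it.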
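The layout described above could be set up roughly as follows. This is only a sketch: the function name `prepare_workdir` is a hypothetical helper, and the base path is whatever you choose.

```shell
# Sketch: create <base>/common and print the base directory.
# prepare_workdir is a hypothetical helper name, not part of DeepVariant.
prepare_workdir() {
  local base="$1"
  mkdir -p "${base}/common" || return 1
  printf '%s\n' "${base}"
}
```

For example, `cd "$(prepare_workdir "${HOME}/deepvariant")"` creates the directories and moves into the base directory; the reference and BED files then go under common.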
Getting the DeepVariant Docker image
BIN_VERSION="0.7.2"
MODEL_VERSION="0.7.2"
docker pull gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}"
$ deepvariant_prep.sh
Getting the model that matches the DeepVariant version being run
First, list the models available for that version.
$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil gsutil ls -R gs://deepvariant/models/DeepVariant/0.7.2/
A newer version of gsutil (4.38) is available than the version you are
running (4.30). A detailed log of gsutil release changes is available
at https://pub.storage.googleapis.com/gsutil_ReleaseNotes.txt if you
would like to read them before updating.
gs://deepvariant/models/DeepVariant/0.7.2/:
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/:
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.data-00000-of-00001
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.index
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.meta
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/:
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.data-00000-of-00001
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.index
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.meta
**The model to use depends on the type of data being analyzed**
- Whole Exome
  - DeepVariant-inception_v3-0.7.2+data-wes_standard
- Whole Genome
  - DeepVariant-inception_v3-0.7.2+data-wgs_standard
The data analyzed here is whole-exome data, so DeepVariant-inception_v3-0.7.2+data-wes_standard is used.
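The choice between the two models can be written down as a small lookup. The following is a minimal sketch; the function name `model_name_for` and the short keys `wes` and `wgs` are assumptions made for illustration, while the model directory names are the ones listed above.

```shell
# Sketch: map the data type to the model directory name listed above.
# model_name_for and the keys wes/wgs are hypothetical names.
MODEL_VERSION="0.7.2"

model_name_for() {
  case "$1" in
    wes) echo "DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard" ;;
    wgs) echo "DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard" ;;
    *)   echo "unknown data type: $1 (expected wes or wgs)" >&2; return 1 ;;
  esac
}

# Whole-exome data is used in this article.
MODEL_NAME="$(model_name_for wes)"
echo "${MODEL_NAME}"
```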
#### Downloading the model
$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil gsutil cp -r gs://deepvariant/models/DeepVariant/0.7.2 /work
The downloaded files are owned by root, so change their ownership.
$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil chown -R ${UID}:`id -g ${USER}` /work/
Variant calling with DeepVariant
make_examples: setting the parameters
Set the variables used by the commands and scripts that follow.
export BIN_VERSION="0.7.2"
export MODEL_VERSION="0.7.2"
export MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"
export REFERENCE="hg38_Minimum.fa" # This file should be located in ${DEEP_VARIANT_WORK}/common
export BED="SSHumAllExonV7-S31285117.hg38.bed"
# Sample dependent variables
export SAMPLE="sample_name"
export SAMPLE_DIR="data"
# This directory should be set as your work folder
export DEEP_VARIANT_WORK="/path/to/deepvariant/base/directory"
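Variable | Description | Notes |
---|---|---|
REFERENCE | File name of the reference genome placed under ./common/ | Must be indexed beforehand |
BED | File name of the target-region BED file placed under ./common/ | |
SAMPLE | Name of the sample BAM file, without the .bam extension | Must be indexed beforehand |
SAMPLE_DIR | Directory under the base directory where the SAMPLE file is located | |
DEEP_VARIANT_WORK | Base directory for running DeepVariant | |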
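Before starting a run, it can help to confirm that the files described in the table are in place. The following is a minimal sketch: `check_inputs` is a hypothetical helper, and the index file names (`.fai` next to the reference, `.bam.bai` next to the BAM) are assumptions based on the default output names of `samtools faidx` and `samtools index`; adjust them if your indexes are named differently.

```shell
# Sketch: verify that the inputs and their indexes exist.
# check_inputs is a hypothetical helper; index names assume samtools defaults.
check_inputs() {
  local work="$1" reference="$2" bed="$3" sample_dir="$4" sample="$5"
  local missing=0 f
  for f in \
    "${work}/common/${reference}" \
    "${work}/common/${reference}.fai" \
    "${work}/common/${bed}" \
    "${work}/${sample_dir}/${sample}.bam" \
    "${work}/${sample_dir}/${sample}.bam.bai"
  do
    if [ ! -e "${f}" ]; then
      echo "missing: ${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

With the variables above exported, `check_inputs "${DEEP_VARIANT_WORK}" "${REFERENCE}" "${BED}" "${SAMPLE_DIR}" "${SAMPLE}" || exit 1` stops before any Docker container is started if something is missing.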
Running make_examples
docker run -v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "/work/common/${REFERENCE}" \
--reads "/work/${SAMPLE_DIR}/${SAMPLE}.bam" \
--regions "/work/common/${BED}" \
--examples "/work/${SAMPLE_DIR}/${SAMPLE}.record.gz"
#### Log output
2019-04-03 05:21:25.328891: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring:
I0403 05:21:25.329046 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:21:25.334273 140708677289728 make_examples.py:1080] Preparing inputs
2019-04-03 05:21:25.336864: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring:
I0403 05:21:25.336949 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:21:25.347007 140708677289728 make_examples.py:996] Common contigs are [u'chr1', u'chr10', u'chr11', u'chr12', u'chr13', u'chr14', u'chr15', u'chr16', u'chr17', u'chr18', u'chr19', u'chr2', u'chr20', u'chr21', u'chr22', u'chr3', u'chr4', u'chr5', u'chr6', u'chr7', u'chr8', u'chr9', u'chrX', u'chrY']
I0403 05:21:25.349242 140708677289728 genomics_reader.py:213] Reading /work/common/SSHumAllExonV7-S31285117.hg38.bed with NativeBedReader
I0403 05:22:32.101823 140708677289728 make_examples.py:1086] Writing examples to /work/data/hogehoge.record.gz
2019-04-03 05:22:32.222396: I third_party/nucleus/io/sam_reader.cc:561] Setting HTS_OPT_BLOCK_SIZE to 134217728
2019-04-03 05:22:32.302555: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring:
I0403 05:22:32.302896 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:22:32.400052 140708677289728 make_examples.py:1119] Task 0: 0 candidates (0 examples) [0.30s elapsed]
I0403 05:25:13.678653 140708677289728 make_examples.py:1119] Task 0: 101 candidates (112 examples) [161.28s elapsed]
I0403 05:27:40.961692 140708677289728 make_examples.py:1119] Task 0: 202 candidates (217 examples) [147.28s elapsed]
I0403 05:28:17.405148 140708677289728 make_examples.py:1134] Writing MakeExamplesRunInfo to /work/data/hogehoge.record.gz.run_info.pbtxt
I0403 05:28:17.434967 140708677289728 make_examples.py:1137] Found 225 candidate variants
I0403 05:28:17.435264 140708677289728 make_examples.py:1138] Created 240 examples
### Running call_variants
export MODEL="/work/${MODEL_VERSION}/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt"
export CALL_VARIANTS_OUTPUT="/work/${SAMPLE_DIR}/call_variants_${SAMPLE}_output.tfrecord.gz"
echo "Do call_variants"
docker run -v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples /work/${SAMPLE_DIR}/${SAMPLE}.record.gz \
--checkpoint "${MODEL}"
#### Log output
I0403 05:28:20.776186 140458173974272 call_variants.py:292] Set KMP_BLOCKTIME to 0
2019-04-03 05:28:20.791628: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2 FMA
2019-04-03 05:28:20.801085: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
I0403 05:28:20.821276 140458173974272 modeling.py:357] Initializing model with random parameters
W0403 05:28:20.824039 140458173974272 tf_logging.py:125] Using temporary folder as model directory: /tmp/tmplfaD4H
I0403 05:28:20.824404 140458173974272 tf_logging.py:115] Using config: {'_save_checkpoints_secs': 1000, '_num_ps_replicas': 0, '_keep_checkpoint_max': 100000, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbe5ed52d90>, '_model_dir': '/tmp/tmplfaD4H', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
I0403 05:28:20.824896 140458173974272 call_variants.py:350] Writing calls to /work/data/call_variants_hogehoge_output.tfrecord.gz
I0403 05:28:20.843836 140458173974272 tf_logging.py:115] self.input_read_threads=8
W0403 05:28:20.844739 140458173974272 tf_logging.py:125] From /tmp/Bazel.runfiles_ou_hpA/runfiles/com_google_deepvariant/deepvariant/data_providers.py:342: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
I0403 05:28:20.863727 140458173974272 tf_logging.py:115] self.input_map_threads=48
W0403 05:28:20.864315 140458173974272 tf_logging.py:125] From /tmp/Bazel.runfiles_ou_hpA/runfiles/com_google_deepvariant/deepvariant/data_providers.py:348: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
I0403 05:28:20.880060 140458173974272 tf_logging.py:115] Calling model_fn.
I0403 05:28:23.425726 140458173974272 tf_logging.py:115] Done calling model_fn.
I0403 05:28:24.582459 140458173974272 tf_logging.py:115] Graph was finalized.
I0403 05:28:24.585422 140458173974272 tf_logging.py:115] Restoring parameters from /work/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt
I0403 05:28:25.377002 140458173974272 tf_logging.py:115] Running local_init_op.
I0403 05:28:25.413260 140458173974272 tf_logging.py:115] Done running local_init_op.
I0403 05:28:25.697412 140458173974272 tf_logging.py:115] Reloading EMA...
I0403 05:28:25.698592 140458173974272 tf_logging.py:115] Restoring parameters from /work/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt
I0403 05:28:28.323101 140458173974272 call_variants.py:368] Processed 1 examples in 1 batches [749.696 sec per 100]
I0403 05:28:28.392738 140458173974272 call_variants.py:370] Done evaluating variants
### Running postprocess_variants (creating the VCF)
export FINAL_OUTPUT_VCF="/work/${SAMPLE_DIR}/${SAMPLE}.vcf.gz"
docker run \
-v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "/work/common/${REFERENCE}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
Log output from postprocess_variants
2019-04-03 05:28:31.406822: I deepvariant/postprocess_variants.cc:88] Read from: /work/data/call_variants_hogehoge_output.tfrecord.gz
2019-04-03 05:28:31.408427: I deepvariant/postprocess_variants.cc:97] Done reading: /work/data/call_variants_hogehoge_output.tfrecord.gz. #entries in single_site_calls = 240
2019-04-03 05:28:31.408586: I deepvariant/postprocess_variants.cc:101] Total #entries in single_site_calls = 240
2019-04-03 05:28:31.408599: I deepvariant/postprocess_variants.cc:103] Start SortSingleSiteCalls
2019-04-03 05:28:31.408779: I deepvariant/postprocess_variants.cc:105] Done SortSingleSiteCalls
I0403 05:28:31.409559 139765684602624 postprocess_variants.py:621] Writing output to VCF file: /work/data/hogehoge.vcf.gz
I0403 05:28:31.411158 139765684602624 genomics_writer.py:163] Writing /work/data/hogehoge.vcf.gz with NativeVcfWriter
I0403 05:28:31.414691 139765684602624 postprocess_variants.py:626] 1 variants written.
Changing the owner of the output files
This is a stopgap for the fact that files created inside Docker end up owned by root.
docker run -it \
-v "${DEEP_VARIANT_WORK}":/work \
docker.io/camil/gsutil chown -R ${UID}:`id -g ${USER}` /work/${SAMPLE_DIR}/
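To confirm that the ownership change took effect, one option is to list anything that still belongs to another user. This is a small sketch; `list_foreign_files` is a hypothetical helper name.

```shell
# Sketch: list files under a directory that are not owned by the current user.
# list_foreign_files is a hypothetical helper name.
list_foreign_files() {
  find "$1" ! -user "$(id -u)"
}
```

Running `list_foreign_files "${DEEP_VARIANT_WORK}/${SAMPLE_DIR}"` should print nothing once every file belongs to you.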
The compressed VCF file with the analysis results is written to the following path:
./${SAMPLE_DIR}/${SAMPLE}.vcf.gz
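A quick sanity check on the output is to count the variant records, that is, the lines that do not start with `#`. The following is a minimal sketch using only gzip and grep; `count_variants` is a hypothetical helper name.

```shell
# Sketch: count the non-header lines in a gzip-compressed VCF.
# count_variants is a hypothetical helper name.
count_variants() {
  gzip -dc "$1" | grep -vc '^#'
}
```

Called as `count_variants "${DEEP_VARIANT_WORK}/${SAMPLE_DIR}/${SAMPLE}.vcf.gz"`, the result can be compared with the "variants written" line in the postprocess_variants log.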
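# Shell script for running the variant analysis
The variant analysis steps above are collected into the script below.
Set the variables REFERENCE, BED, SAMPLE, SAMPLE_DIR, and DEEP_VARIANT_WORK appropriately, and the script runs variant detection with DeepVariant.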
#!/bin/bash
export BIN_VERSION="0.7.2"
export MODEL_VERSION="0.7.2"
export MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"
export REFERENCE="hg38_Minimum.fa" # This file should be located in ${DEEP_VARIANT_WORK}/common
export BED="SSHumAllExonV7-S31285117.hg38.bed"
# Sample dependent variables
export SAMPLE="hogehoge"
export SAMPLE_DIR="data"
# This directory should be set as your work folder
export DEEP_VARIANT_WORK="/workdir/maikeda/deepvariant"
# make_examples
echo "Do make_examples"
docker run -v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/make_examples \
--mode calling \
--ref "/work/common/${REFERENCE}" \
--reads "/work/${SAMPLE_DIR}/${SAMPLE}.bam" \
--regions "/work/common/${BED}" \
--examples "/work/${SAMPLE_DIR}/${SAMPLE}.record.gz"
# call_variants
export MODEL="/work/${MODEL_VERSION}/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt"
export CALL_VARIANTS_OUTPUT="/work/${SAMPLE_DIR}/call_variants_${SAMPLE}_output.tfrecord.gz"
echo "Do call_variants"
docker run -v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/call_variants \
--outfile "${CALL_VARIANTS_OUTPUT}" \
--examples /work/${SAMPLE_DIR}/${SAMPLE}.record.gz \
--checkpoint "${MODEL}"
# postprocess_variants
echo "Do postprocess_variants"
export FINAL_OUTPUT_VCF="/work/${SAMPLE_DIR}/${SAMPLE}.vcf.gz"
docker run \
-v ${DEEP_VARIANT_WORK}:/work \
gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/postprocess_variants \
--ref "/work/common/${REFERENCE}" \
--infile "${CALL_VARIANTS_OUTPUT}" \
--outfile "${FINAL_OUTPUT_VCF}"
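The script above does not check exit status, so a later step still starts even if an earlier one failed. One way to guard against that is to run each step through a small wrapper. This is a sketch under that assumption, not part of the original script; `run_step` is a hypothetical helper name.

```shell
# Sketch: announce a step, run it, and report failure with a non-zero status.
# run_step is a hypothetical helper name.
run_step() {
  local name="$1"
  shift
  echo "Do ${name}"
  if ! "$@"; then
    echo "${name} failed" >&2
    return 1
  fi
}
```

Each `docker run` command could then be written as `run_step make_examples docker run ... || exit 1`, with the arguments left unchanged, so the script stops at the first failing step.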
That's all for this time.