Running variant calling with DeepVariant

Posted at 2019-04-03

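Detecting variants with DeepVariant

  • Obtaining the DeepVariant runtime environment

    • Running docker pull
    • Downloading the model that matches the DeepVariant version
  • Running DeepVariant

    • make_examples
    • call_variants
    • postprocess_variants
  • A script that runs DeepVariant as a single sequence of steps

Preparing to run DeepVariant

Create a base directory for running DeepVariant and move into it.

Create a common directory under this base directory and place the reference and the BED file of target regions there.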
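
A minimal sketch of this preparation. The base directory name deepvariant_work is hypothetical and not taken from the original setup; the data directory matches the SAMPLE_DIR value used later in this article.

```shell
# Hypothetical base directory; replace with your own location.
DEEP_VARIANT_WORK="${DEEP_VARIANT_WORK:-${PWD}/deepvariant_work}"

# common/ holds the reference and the BED file; data/ holds the sample BAM.
mkdir -p "${DEEP_VARIANT_WORK}/common" "${DEEP_VARIANT_WORK}/data"

# Work from the base directory for the remaining steps.
cd "${DEEP_VARIANT_WORK}"
```

After this, copy the reference (with its index) and the BED file into common/, and the BAM file (with its index) into data/.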

Getting the DeepVariant Docker image

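deepvariant_prep.sh
# DeepVariant release to use; the model version must match it
BIN_VERSION="0.7.2"
MODEL_VERSION="0.7.2"
# Pull the DeepVariant Docker image for that release
docker pull gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}"
$ bash deepvariant_prep.sh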

Getting the model that matches the DeepVariant version to run

First, check which models are available for this version

$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil gsutil ls -R gs://deepvariant/models/DeepVariant/0.7.2/
A newer version of gsutil (4.38) is available than the version you are
running (4.30). A detailed log of gsutil release changes is available
at https://pub.storage.googleapis.com/gsutil_ReleaseNotes.txt if you
would like to read them before updating.
gs://deepvariant/models/DeepVariant/0.7.2/:

gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/:
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.data-00000-of-00001
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.index
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt.meta

gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/:
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.data-00000-of-00001
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.index
gs://deepvariant/models/DeepVariant/0.7.2/DeepVariant-inception_v3-0.7.2+data-wgs_standard/model.ckpt.meta

The model to use depends on the type of data being analyzed:

  • Whole Exome
    • DeepVariant-inception_v3-0.7.2+data-wes_standard
  • Whole Genome
    • DeepVariant-inception_v3-0.7.2+data-wgs_standard

The data analyzed here is whole exome data, so DeepVariant-inception_v3-0.7.2+data-wes_standard is used.
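
The choice above can be written down as a small helper, which avoids mixing up the two model names later. This is only a sketch: the function name model_name_for and the short keys wes and wgs are made up for illustration, while the model names are the ones listed by gsutil above.

```shell
MODEL_VERSION="0.7.2"

# Hypothetical helper: map a data type (wes or wgs) to the model directory name.
model_name_for() {
  case "$1" in
    wes) echo "DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard" ;;
    wgs) echo "DeepVariant-inception_v3-${MODEL_VERSION}+data-wgs_standard" ;;
    *)   echo "unknown data type: $1" >&2; return 1 ;;
  esac
}

# Whole exome data is used in this article.
MODEL_NAME="$(model_name_for wes)"
echo "${MODEL_NAME}"
```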

Downloading the model

$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil gsutil cp -r gs://deepvariant/models/DeepVariant/0.7.2 /work

The downloaded files are owned by root, so change their ownership

$ docker run -it -v "${PWD}":/work docker.io/camil/gsutil chown -R ${UID}:`id -g ${USER}` /work/
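
Before moving on, it may be worth confirming that the three checkpoint files shown in the listing above actually arrived. A minimal sketch, assuming the download was run from the current directory; the function name checkpoint_complete is hypothetical.

```shell
# Succeeds only if all three files of a checkpoint prefix exist.
checkpoint_complete() {
  local prefix="$1" suffix status=0
  for suffix in data-00000-of-00001 index meta; do
    if [ ! -f "${prefix}.${suffix}" ]; then
      echo "missing: ${prefix}.${suffix}" >&2
      status=1
    fi
  done
  return "${status}"
}

# Host-side path of the WES model after the download above.
MODEL_PREFIX="${PWD}/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt"
if checkpoint_complete "${MODEL_PREFIX}"; then
  echo "model files are in place"
fi
```

The prefix (the path ending in model.ckpt, without a suffix) is the same form that is passed to --checkpoint later.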

Variant calling with DeepVariant

make_examples: setting the parameters

Set the variables that the later commands rely on

export BIN_VERSION="0.7.2"
export MODEL_VERSION="0.7.2"
# Whole exome data is analyzed here, so the wes_standard model is selected
export MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"
export REFERENCE="hg38_Minimum.fa" # This file should be located in ${DEEP_VARIANT_WORK}/common
export BED="SSHumAllExonV7-S31285117.hg38.bed" # This file should also be located in ${DEEP_VARIANT_WORK}/common
# Sample dependent variables
export SAMPLE="sample_name" # Name of the BAM file without the .bam extension
export SAMPLE_DIR="data"
# This directory should be set as your work folder
export DEEP_VARIANT_WORK="/path/to/base/directory/for/running/deepvariant"
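| Variable | Description | Notes |
| --- | --- | --- |
| REFERENCE | File name of the reference genome placed under ./common/ | Create its index beforehand |
| BED | File name of the BED file of target regions placed under ./common/ | |
| SAMPLE | Name of the sample's BAM file (without the .bam extension) | Create its index beforehand |
| SAMPLE_DIR | Directory under the base directory in which the SAMPLE file is located | |
| DEEP_VARIANT_WORK | Base directory for running DeepVariant | |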
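
A missing index is easy to overlook, so a pre-flight check over these variables can save a failed run. The sketch below is hypothetical (check_inputs is not part of DeepVariant) and assumes the usual index names produced by samtools: REFERENCE.fai for the reference, and either SAMPLE.bam.bai or SAMPLE.bai for the BAM file.

```shell
# Returns 0 when every input exists; otherwise prints each missing path to stderr.
check_inputs() {
  local work="$1" reference="$2" bed="$3" sample_dir="$4" sample="$5"
  local bam="${work}/${sample_dir}/${sample}.bam"
  local missing=0 f
  for f in "${work}/common/${reference}" "${work}/common/${reference}.fai" \
           "${work}/common/${bed}" "${bam}"; do
    if [ ! -f "${f}" ]; then
      echo "missing: ${f}" >&2
      missing=1
    fi
  done
  # Assumption: the BAM index is named either sample.bam.bai or sample.bai.
  if [ ! -f "${bam}.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
    echo "missing BAM index for: ${bam}" >&2
    missing=1
  fi
  return "${missing}"
}

# Example call once the variables above are set:
#   check_inputs "${DEEP_VARIANT_WORK}" "${REFERENCE}" "${BED}" "${SAMPLE_DIR}" "${SAMPLE}" || exit 1
```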

Running make_examples

docker run -v ${DEEP_VARIANT_WORK}:/work \
  gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/make_examples \
  --mode calling   \
  --ref "/work/common/${REFERENCE}"   \
  --reads "/work/${SAMPLE_DIR}/${SAMPLE}.bam" \
  --regions "/work/common/${BED}" \
  --examples "/work/${SAMPLE_DIR}/${SAMPLE}.record.gz"

Log output during processing

2019-04-03 05:21:25.328891: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring: 
I0403 05:21:25.329046 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:21:25.334273 140708677289728 make_examples.py:1080] Preparing inputs
2019-04-03 05:21:25.336864: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring: 
I0403 05:21:25.336949 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:21:25.347007 140708677289728 make_examples.py:996] Common contigs are [u'chr1', u'chr10', u'chr11', u'chr12', u'chr13', u'chr14', u'chr15', u'chr16', u'chr17', u'chr18', u'chr19', u'chr2', u'chr20', u'chr21', u'chr22', u'chr3', u'chr4', u'chr5', u'chr6', u'chr7', u'chr8', u'chr9', u'chrX', u'chrY']
I0403 05:21:25.349242 140708677289728 genomics_reader.py:213] Reading /work/common/SSHumAllExonV7-S31285117.hg38.bed with NativeBedReader
I0403 05:22:32.101823 140708677289728 make_examples.py:1086] Writing examples to /work/data/hogehoge.record.gz
2019-04-03 05:22:32.222396: I third_party/nucleus/io/sam_reader.cc:561] Setting HTS_OPT_BLOCK_SIZE to 134217728
2019-04-03 05:22:32.302555: W third_party/nucleus/io/sam_reader.cc:525] Unrecognized SAM header type, ignoring: 
I0403 05:22:32.302896 140708677289728 genomics_reader.py:213] Reading /work/data/hogehoge.bam with NativeSamReader
I0403 05:22:32.400052 140708677289728 make_examples.py:1119] Task 0: 0 candidates (0 examples) [0.30s elapsed]
I0403 05:25:13.678653 140708677289728 make_examples.py:1119] Task 0: 101 candidates (112 examples) [161.28s elapsed]
I0403 05:27:40.961692 140708677289728 make_examples.py:1119] Task 0: 202 candidates (217 examples) [147.28s elapsed]
I0403 05:28:17.405148 140708677289728 make_examples.py:1134] Writing MakeExamplesRunInfo to /work/data/hogehoge.record.gz.run_info.pbtxt
I0403 05:28:17.434967 140708677289728 make_examples.py:1137] Found 225 candidate variants
I0403 05:28:17.435264 140708677289728 make_examples.py:1138] Created 240 examples
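
When the log is long, the two summary lines at the end are the quickest thing to check. A small sketch, assuming the make_examples output was saved to a hypothetical file make_examples.log (for example by appending `2>&1 | tee make_examples.log` to the docker run command); the function name is also made up.

```shell
# Print the summary lines ("Found N candidate variants", "Created N examples")
# from a saved make_examples log, stripping the timestamp prefix.
summarize_make_examples_log() {
  grep -E 'Found [0-9]+ candidate variants|Created [0-9]+ examples' "$1" \
    | sed -E 's/^.*\] //'
}

# Hypothetical log file name; adjust to wherever the log was saved.
LOG_FILE="make_examples.log"
if [ -f "${LOG_FILE}" ]; then
  summarize_make_examples_log "${LOG_FILE}"
fi
```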

Running call_variants

export MODEL="/work/${MODEL_VERSION}/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt"
export CALL_VARIANTS_OUTPUT="/work/${SAMPLE_DIR}/call_variants_${SAMPLE}_output.tfrecord.gz"

echo "Do call_variants"
docker run -v ${DEEP_VARIANT_WORK}:/work \
    gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
    /opt/deepvariant/bin/call_variants \
    --outfile "${CALL_VARIANTS_OUTPUT}" \
    --examples /work/${SAMPLE_DIR}/${SAMPLE}.record.gz \
    --checkpoint "${MODEL}"

Log output during processing

I0403 05:28:20.776186 140458173974272 call_variants.py:292] Set KMP_BLOCKTIME to 0
2019-04-03 05:28:20.791628: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2 FMA
2019-04-03 05:28:20.801085: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
I0403 05:28:20.821276 140458173974272 modeling.py:357] Initializing model with random parameters
W0403 05:28:20.824039 140458173974272 tf_logging.py:125] Using temporary folder as model directory: /tmp/tmplfaD4H
I0403 05:28:20.824404 140458173974272 tf_logging.py:115] Using config: {'_save_checkpoints_secs': 1000, '_num_ps_replicas': 0, '_keep_checkpoint_max': 100000, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fbe5ed52d90>, '_model_dir': '/tmp/tmplfaD4H', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
I0403 05:28:20.824896 140458173974272 call_variants.py:350] Writing calls to /work/data/call_variants_hogehoge_output.tfrecord.gz
I0403 05:28:20.843836 140458173974272 tf_logging.py:115] self.input_read_threads=8
W0403 05:28:20.844739 140458173974272 tf_logging.py:125] From /tmp/Bazel.runfiles_ou_hpA/runfiles/com_google_deepvariant/deepvariant/data_providers.py:342: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
I0403 05:28:20.863727 140458173974272 tf_logging.py:115] self.input_map_threads=48
W0403 05:28:20.864315 140458173974272 tf_logging.py:125] From /tmp/Bazel.runfiles_ou_hpA/runfiles/com_google_deepvariant/deepvariant/data_providers.py:348: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
I0403 05:28:20.880060 140458173974272 tf_logging.py:115] Calling model_fn.
I0403 05:28:23.425726 140458173974272 tf_logging.py:115] Done calling model_fn.
I0403 05:28:24.582459 140458173974272 tf_logging.py:115] Graph was finalized.
I0403 05:28:24.585422 140458173974272 tf_logging.py:115] Restoring parameters from /work/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt
I0403 05:28:25.377002 140458173974272 tf_logging.py:115] Running local_init_op.
I0403 05:28:25.413260 140458173974272 tf_logging.py:115] Done running local_init_op.
I0403 05:28:25.697412 140458173974272 tf_logging.py:115] Reloading EMA...
I0403 05:28:25.698592 140458173974272 tf_logging.py:115] Restoring parameters from /work/0.7.2/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt
I0403 05:28:28.323101 140458173974272 call_variants.py:368] Processed 1 examples in 1 batches [749.696 sec per 100]
I0403 05:28:28.392738 140458173974272 call_variants.py:370] Done evaluating variants

Running postprocess_variants (creating the VCF)

export FINAL_OUTPUT_VCF="/work/${SAMPLE_DIR}/${SAMPLE}.vcf.gz"
docker run \
    -v ${DEEP_VARIANT_WORK}:/work \
    gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
    /opt/deepvariant/bin/postprocess_variants \
    --ref "/work/common/${REFERENCE}" \
    --infile "${CALL_VARIANTS_OUTPUT}" \
    --outfile "${FINAL_OUTPUT_VCF}"

Log output from postprocess_variants

2019-04-03 05:28:31.406822: I deepvariant/postprocess_variants.cc:88] Read from: /work/data/call_variants_hogehoge_output.tfrecord.gz
2019-04-03 05:28:31.408427: I deepvariant/postprocess_variants.cc:97] Done reading: /work/data/call_variants_hogehoge_output.tfrecord.gz. #entries in single_site_calls = 240
2019-04-03 05:28:31.408586: I deepvariant/postprocess_variants.cc:101] Total #entries in single_site_calls = 240
2019-04-03 05:28:31.408599: I deepvariant/postprocess_variants.cc:103] Start SortSingleSiteCalls
2019-04-03 05:28:31.408779: I deepvariant/postprocess_variants.cc:105] Done SortSingleSiteCalls
I0403 05:28:31.409559 139765684602624 postprocess_variants.py:621] Writing output to VCF file: /work/data/hogehoge.vcf.gz
I0403 05:28:31.411158 139765684602624 genomics_writer.py:163] Writing /work/data/hogehoge.vcf.gz with NativeVcfWriter
I0403 05:28:31.414691 139765684602624 postprocess_variants.py:626] 1 variants written.

Changing the owner of the output files

This is a stopgap for the fact that files created inside Docker end up owned by root

docker run -it \
    -v "${DEEP_VARIANT_WORK}":/work \
    docker.io/camil/gsutil chown -R ${UID}:`id -g ${USER}` /work/${SAMPLE_DIR}/

The analysis result is written as a compressed VCF file at the following path:
./${SAMPLE_DIR}/${SAMPLE}.vcf.gz
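
A quick sanity check on the result is to count the records in the compressed VCF and compare the number with the "variants written" line in the postprocess_variants log. A minimal sketch using only gzip and awk; the function name count_vcf_records is hypothetical, and the fallback values data and sample_name are placeholders.

```shell
# Count non-header records in a gzip-compressed VCF.
count_vcf_records() {
  gzip -dc "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Path on the host, relative to the base directory.
VCF="./${SAMPLE_DIR:-data}/${SAMPLE:-sample_name}.vcf.gz"
if [ -f "${VCF}" ]; then
  count_vcf_records "${VCF}"
fi
```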

Shell script for running the variant analysis

The variant analysis steps above are collected in the script below.
Set the variables REFERENCE, BED, SAMPLE, SAMPLE_DIR, and DEEP_VARIANT_WORK appropriately to run variant detection with DeepVariant.

deepvariant.sh
#!/bin/bash
export BIN_VERSION="0.7.2"
export MODEL_VERSION="0.7.2"

# Whole exome data is analyzed here, so the wes_standard model is selected
export MODEL_NAME="DeepVariant-inception_v3-${MODEL_VERSION}+data-wes_standard"

export REFERENCE="hg38_Minimum.fa" # This file should be located in ${DEEP_VARIANT_WORK}/common
export BED="SSHumAllExonV7-S31285117.hg38.bed"

# Sample dependent variables
export SAMPLE="hogehoge"
export SAMPLE_DIR="data"

# This directory should be set as your work folder
export DEEP_VARIANT_WORK="/workdir/maikeda/deepvariant"

# make_examples
echo "Do make_examples"
docker run -v ${DEEP_VARIANT_WORK}:/work \
  gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/make_examples \
  --mode calling   \
  --ref "/work/common/${REFERENCE}"   \
  --reads "/work/${SAMPLE_DIR}/${SAMPLE}.bam" \
  --regions "/work/common/${BED}" \
  --examples "/work/${SAMPLE_DIR}/${SAMPLE}.record.gz"

# call_variants
export MODEL="/work/${MODEL_VERSION}/DeepVariant-inception_v3-0.7.2+data-wes_standard/model.ckpt"
export CALL_VARIANTS_OUTPUT="/work/${SAMPLE_DIR}/call_variants_${SAMPLE}_output.tfrecord.gz"

echo "Do call_variants"
docker run -v ${DEEP_VARIANT_WORK}:/work \
    gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
    /opt/deepvariant/bin/call_variants \
    --outfile "${CALL_VARIANTS_OUTPUT}" \
    --examples /work/${SAMPLE_DIR}/${SAMPLE}.record.gz \
    --checkpoint "${MODEL}"

# postprocess_variants
echo "Do postprocess_variants"
export FINAL_OUTPUT_VCF="/work/${SAMPLE_DIR}/${SAMPLE}.vcf.gz"
docker run \
    -v ${DEEP_VARIANT_WORK}:/work \
    gcr.io/deepvariant-docker/deepvariant:"${BIN_VERSION}" \
    /opt/deepvariant/bin/postprocess_variants \
    --ref "/work/common/${REFERENCE}" \
    --infile "${CALL_VARIANTS_OUTPUT}" \
    --outfile "${FINAL_OUTPUT_VCF}"
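
As written, the script moves on to the next step even when an earlier docker run fails. One possible way to stop at the first failure is sketched below. The run_step helper is hypothetical and not part of the original script.

```shell
# Hypothetical helper: announce a step, run the command, and report failure.
run_step() {
  local name="$1"
  shift
  echo "Do ${name}"
  if ! "$@"; then
    echo "${name} failed; stopping" >&2
    return 1
  fi
}

# Usage sketch: wrap each docker run from the script, for example
#   run_step make_examples docker run -v "${DEEP_VARIANT_WORK}":/work ... || exit 1
# Demonstration with a harmless command:
run_step demo true
```

A simpler alternative is to put `set -e` near the top of the script, which makes bash exit as soon as any command returns a non-zero status.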

That's all for this time 😄
