
Genome scaffolding with the 3D-DNA pipeline

Last updated at 2021-11-04. Posted at 2021-11-04.

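Purpose

Scaffold genome contigs using Hi-C reads.
Installing the tools took several steps, so I am writing them down as a memo.
Run it using a Singularity container.
There were few explanations in Japanese¹, so I hope this adds at least a little to the available references.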

Reference URL

Genome scaffolding with the Juicer, JuiceBox, 3D-DNA pipeline

The way the commands are written there was extremely helpful.

Tools used

Juicer: aligns the Hi-C reads
3D-DNA: scaffolds based on the alignments produced by Juicer
JuiceBox: edits how the scaffolds are joined while viewing the contact map

Preparation

Installing Juicer

Building the Singularity container

$ qlogin -l s_vmem=20G -l mem_req=20G
$ singularity build juicer-1.0.13.sif docker://aidenlab/juicer:latest

Test that the build worked

$ singularity exec juicer-1.0.13.sif juicer.sh -h
Usage: juicer.sh [-g genomeID] [-d topDir] [-s site] [-a about] [-R end]
                 [-S stage] [-p chrom.sizes path] [-y restriction site file]
                 [-z reference genome file] [-D Juicer scripts directory]
                 [-b ligation] [-t threads] [-r] [-h] [-x]
* [genomeID] must be defined in the script, e.g. "hg19" or "mm10" (default
  "hg19"); alternatively, it can be defined using the -z command
* [topDir] is the top level directory (default
  "/home/mikasaka/tools")
     [topDir]/fastq must contain the fastq files
     [topDir]/splits will be created to contain the temporary split files
     [topDir]/aligned will be created for the final alignment
* [site] must be defined in the script, e.g.  "HindIII" or "MboI"
  (default "MboI")
* [about]: enter description of experiment, enclosed in single quotes
* [stage]: must be one of "merge", "dedup", "final", "postproc", "early", "alignonly", .
    -Use "merge" when alignment has finished but the merged_sort file has not
     yet been created.
    -Use "dedup" when the files have been merged into merged_sort but
     merged_nodups has not yet been created.
    -Use "final" when the reads have been deduped into merged_nodups but the
     final stats and hic files have not yet been created.
    -Use "postproc" when the hic files have been created and only
     postprocessing feature annotation remains to be completed.
    -Use "early" for an early exit, before the final creation of the stats and
     hic files
* [chrom.sizes path]: enter path for chrom.sizes file
* [restriction site file]: enter path for restriction site file (locations of
  restriction sites in genome; can be generated with the script
  misc/generate_site_positions.py)
* [reference genome file]: enter path for reference sequence file, BWA index
  files must be in same directory
* [Juicer scripts directory]: set the Juicer directory,
  which should have scripts/ references/ and restriction_sites/ underneath it
  (default /aidenlab)
* [ligation junction]: use this string when counting ligation junctions
* [threads]: number of threads when running BWA alignment
* -x: exclude fragment-delimited maps from hic file creation
* -h: print this help and exit

If the help is displayed, it is OK.

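Installing 3D-DNA

Create a conda virtual environment named 3d-dna

$ conda create -n 3d-dna # python 2.7.5 was reported at this point (probably the system Python, since no packages were requested)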
$ conda activate 3d-dna

Install the dependencies via conda

$ conda install numpy # Python 3.9.7 was installed
$ conda install -c conda-forge scipy
$ conda install -c conda-forge matplotlib
$ conda install -c bioconda java-jdk
$ conda install -c conda-forge parallel

Installing lastz

$ git clone https://github.com/lastz/lastz.git
$ cd lastz/

In make-include.mak, change the installDir line to installDir = ${HOME}/tools/lastz/bin
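The same edit can be scripted. This is only a minimal sketch: it assumes that make-include.mak defines the variable on lines starting with `installDir` (I have not checked every lastz release), and the function name set_lastz_install_dir is made up for this example.

```shell
# Hypothetical helper: rewrite every "installDir = ..." line in a makefile include.
# Assumption: the file defines installDir on lines that start with "installDir".
set_lastz_install_dir() {
    local mak_file="$1"
    # The single quotes keep ${HOME} literal, so make expands it later, not the shell.
    sed 's|^installDir *=.*|installDir = ${HOME}/tools/lastz/bin|' "$mak_file" > "$mak_file.tmp" &&
        mv "$mak_file.tmp" "$mak_file"
}

# Usage, run inside the lastz/ directory:
#   set_lastz_install_dir make-include.mak
#   grep '^installDir' make-include.mak
```

Writing to a temporary file and renaming it avoids `sed -i`, whose syntax differs between GNU and BSD sed.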

$ cd src/
$ make
$ make install
(the install destination of each binary is printed)
$ make test
(nothing in particular is printed)

Installing 3D-DNA itself

$ git clone https://github.com/theaidenlab/3d-dna.git

Add the following to the run script

export PATH=$PATH:/home/mikasaka/tools/lastz/bin
export PATH=$PATH:/home/mikasaka/tools/3d-dna
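Before launching a long job, it can help to confirm that the PATH settings actually took effect. The following is a sketch only. The function name missing_tools is made up, and the list of tools in the usage comment is my guess based on the packages installed above. It also assumes that run-asm-pipeline.sh has its executable bit set, because `command -v` only finds executables.

```shell
# Print the name of every command in the argument list that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage, after the export lines in the run script:
#   missing_tools lastz run-asm-pipeline.sh java parallel awk || echo "fix PATH first"
```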

Installing JuiceBox

Needed for visualizing the contact map and editing the scaffolds

Download the desktop version from here:
https://github.com/aidenlab/Juicebox/wiki/Download

Running

Working directory layout

juicer            working directory
juicer/fastq      put the Hi-C fastq files here (symbolic links are OK)
juicer/reference  put the assembly and the bwa index here
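The layout above can be created with a small helper. This is a sketch under assumptions: the function name setup_juicer_dir and all input paths are hypothetical, and the assembly is linked as reference/genome.fasta only because that is the name used in the commands in this article. As far as I know, Juicer looks for fastq files with _R1 and _R2 in their names, so the links keep the original file names.

```shell
# Hypothetical helper: create juicer/fastq and juicer/reference under a top
# directory, then symlink the assembly and the Hi-C fastq files into place.
# Usage: setup_juicer_dir TOP_DIR ASSEMBLY_FASTA FASTQ...
setup_juicer_dir() {
    local top="$1" assembly="$2" fq
    shift 2
    mkdir -p "$top/fastq" "$top/reference" || return 1
    # Absolute link targets keep working no matter where the link is read from.
    ln -sfn "$(readlink -f "$assembly")" "$top/reference/genome.fasta" || return 1
    for fq in "$@"; do
        ln -sfn "$(readlink -f "$fq")" "$top/fastq/$(basename "$fq")" || return 1
    done
}

# Usage (example paths only):
#   setup_juicer_dir juicer /path/to/assembly.fasta /path/to/lib_R1.fastq.gz /path/to/lib_R2.fastq.gz
```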

Creating the bwa index

Example command

$ bwa index juicer/reference/genome.fasta

Creating the chrom.sizes file

Example command

$ singularity exec /usr/local/biotools/b/bioawk:1.0--hed695b0_5 \
    bioawk -c fastx '{print $name"\t"length($seq)}' \
    juicer/reference/genome.fasta > juicer/chrom.sizes

Place chrom.sizes directly under juicer.
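If bioawk is not available, the same two-column file (sequence name, tab, length) can be produced with plain awk. This is a sketch, not the method used in this article. The function name fasta_to_chrom_sizes is made up, and it assumes an uncompressed FASTA with Unix line endings.

```shell
# Print "name<TAB>length" for every record of a (possibly multi-line) FASTA file.
fasta_to_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($1, 2)                  # header up to the first whitespace, without ">"
            len = 0
            next
        }
        { len += length($0) }                     # add up the sequence lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Usage:
#   fasta_to_chrom_sizes juicer/reference/genome.fasta > juicer/chrom.sizes
```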

Hi-C alignment (Juicer)

Example command

$ singularity exec juicer-1.0.13.sif juicer.sh \
      -d juicer \
      -z juicer/reference/genome.fasta \
      -s none \
      -p juicer/chrom.sizes \
      -t 8

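-d: working directory (the result files are written here)
-z: reference genome FASTA file (the BWA index files must be in the same directory)
-s none: Omni-C uses DNase, so there are no restriction sites; specify none
-p: path to the chrom.sizes file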
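Since the alignment takes a long time, a quick check of the inputs before starting can save a failed run. The sketch below rests on a few assumptions: check_juicer_inputs is a made-up name, the index suffixes are the five files that `bwa index` writes next to the FASTA, and the _R1/_R2 naming of the fastq files is what I understand Juicer to expect.

```shell
# Hypothetical pre-flight check for the juicer.sh call above.
# Usage: check_juicer_inputs TOP_DIR REFERENCE_FASTA CHROM_SIZES
check_juicer_inputs() {
    local top="$1" ref="$2" sizes="$3" ext mate problems=0
    # bwa index writes these five files next to the reference FASTA.
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing BWA index file: $ref.$ext" >&2
            problems=1
        fi
    done
    if [ ! -s "$sizes" ]; then
        echo "missing or empty chrom.sizes: $sizes" >&2
        problems=1
    fi
    # Assumption: both mates must be present with _R1 / _R2 in the file name.
    for mate in _R1 _R2; do
        if ! ls "$top"/fastq/*"$mate"*.fastq* > /dev/null 2>&1; then
            echo "no fastq with $mate in its name under $top/fastq" >&2
            problems=1
        fi
    done
    return "$problems"
}

# Usage:
#   check_juicer_inputs juicer juicer/reference/genome.fasta juicer/chrom.sizes && echo "inputs look OK"
```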

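Building the contact map and scaffolding (3D-DNA)

A very large number of files are produced, so create a separate working directory named 3d-dna, place the run script under it, and run it there.

Example command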

$ bash /installdir/3d-dna/run-asm-pipeline.sh \
                        genome.fasta \
                        juicer/aligned/merged_nodups.txt

Results

a) .fasta files
“FINAL” – chromosome-length scaffolds; # this is the scaffolded result FASTA
“final” – input with all the misjoin correction introduced;
b) .hic files
“FINAL” – after the addition of gaps to the chromosome-length assembly (built on request with --build-gapped-map option);

~.final.* are symbolic links to ~.rawchrom.*

(Optional) Editing with Juicebox

Reference: https://www.dnazoo.org/methods

~.final.hic (or ~.rawchrom.hic)
~.final.assembly (or ~.rawchrom.assembly)
Loading these into JuiceBox displays the contact map.

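(Optional) Finalizing with the 3D-DNA pipeline

After editing the scaffolds in Juicebox in the previous step, run 3D-DNA's run-asm-pipeline-post-review.sh.

Running it as is overwrites the existing output, so copy the whole 3D-DNA working directory (and rename the original directory) to keep the previous version.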
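One way to keep the previous version is a guarded copy. This is a minimal sketch: backup_workdir and the directory names in the usage comment are made up for illustration. It relies on `cp -a` to copy recursively while keeping symbolic links as links.

```shell
# Hypothetical helper: copy a working directory to a backup location,
# refusing to overwrite an existing backup.
backup_workdir() {
    local src="$1" dst="$2"
    if [ -e "$dst" ]; then
        echo "backup target already exists: $dst" >&2
        return 1
    fi
    # -a copies recursively and keeps symlinks (such as the .mnd.txt link) as links.
    cp -a "$src" "$dst"
}

# Usage (example names only):
#   backup_workdir 3d-dna 3d-dna.before_review
```

Because symbolic links are copied as links, a link with a relative target can dangle if the backup is placed at a different directory depth than the original.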

Example command

$ bash run-asm-pipeline-post-review.sh --sort-output \
            -s seal \
            -i 500 \
            -r result.donefinal.review.assembly \
            ../juicer.fasta \
            ../juicer/aligned/merged_nodups.txt

-s: stage (I do not fully understand this option yet; to be added later)
-r: the assembly reviewed in Juicebox
-i: contigs shorter than this value are not included

Notes

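genome.mnd.txt in the 3D-DNA working directory is a symbolic link to juicer/aligned/merged_nodups.txt, so moving or renaming the juicer directory leaves the link dangling.
In that case, unlink it and recreate it with ln -s using the correct path.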
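The unlink and ln -s steps can be combined into one call with `ln -sfn`. A sketch, with relink as a made-up function name and the paths in the usage comment as examples only:

```shell
# Hypothetical helper: point an existing (possibly dangling) symlink at a new target.
# Usage: relink LINK NEW_TARGET
relink() {
    local link="$1" target="$2"
    if [ ! -e "$target" ]; then
        echo "new target does not exist: $target" >&2
        return 1
    fi
    # -s: symbolic, -f: replace the old link, -n: do not follow the old link if it points to a directory
    ln -sfn "$(readlink -f "$target")" "$link"
}

# Usage (example paths only):
#   relink 3d-dna/genome.mnd.txt /new/location/juicer/aligned/merged_nodups.txt
```

Checking the target first avoids replacing one dangling link with another.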

¹ This article is written in Japanese, which I appreciated: https://qiita.com/awieeeee/items/21dd85f9848e3613a12f
