More than 5 years have passed since last update.

DESeq2を使った正規化

Posted at 2017-12-27

RのパッケージであるDESeq2の使い方を紹介したいと思います。

この項ではSRP052999のリードカウントデータを用いて説明していきます。
データの取得からマッピング・リードカウントの算出までの流れは
pfastq-dump
STAR
で解説しています。

DESeq2のインストール

source("https://bioconductor.org/biocLite.R")
biocLite("DESeq2")

サンプルシートの作成

サンプルの情報をタブ区切りのテキストにまとめておきます。

sample_info.txt

SRR_id	GSM_id	sample_name
SRR1781884	GSM1232104	Adult_BM_HSC
SRR1781885	GSM1232105	Adult_BM_HSC
SRR1781886	GSM1232106	Adult_BM_LMPP
SRR1781887	GSM1232107	Adult_BM_LMPP
SRR1781888	GSM1232108	Adult_BM_CLP
SRR1781889	GSM1232109	Adult_BM_CLP
SRR1781890	GSM1232110	Adult_BM_proB
SRR1781891	GSM1232111	Adult_BM_proB

実行

Rで以下のスクリプトを実行します。

setwd("PATH-TO-COUNT_FILEs&sample_info.txt") #作業ディレクトリの変更

# 作成したサンプルシートの読み取り
sample_db_name <- "sample_info.txt"
sample_db <- read.table(sample_db_name,sep="\t",header=T)
get_sampleInfo <- function( sampleID, sample_db ) {
  sub_ids <- which( sample_db$sample_id==sampleID )
  sample_info <- as.character( sample_db[sub_ids,]$title )
  return( sample_info )
}

# 作成したサンプルシート上のサンプルIDから、
# カウントデータにアクセスして、データフレームにまとめる
loop=0
count_summary=NULL

for( sample_id in sample_db$SRR_id ) {
  if (loop==0) {    hoge=read.table(paste(sample_id,".ReadsPerGene.out.tab",sep=""),sep="\t",header=F,row.names=1,skip=4)
    count_summary=hoge[,1,drop=FALSE]
    loop=loop+1
  } else {
  hoge=read.table(paste(sample_id,".ReadsPerGene.out.tab",sep=""),sep="\t",header=F,row.names=1,skip=4)
  count_summary=cbind(count_summary,hoge[,1,drop=FALSE])
  }
}
colnames(count_summary)=sample_db$SRR_id
count_summary1 <- sapply(count_summary,as.integer)
rownames(count_summary1) <- rownames(count_summary)
sample_IDs <- colnames(count_summary1)
sample_info <- as.character(lapply(sample_IDs,get_sampleInfo,sample_db))
sample_type_list <- unique(sample_info)
group = as.factor(sample_db$sample_name)

# DESeq2の実行
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = count_summary1,colData=sample_db,design= ~ sample_name)
dds <- DESeq(dds)
rld <- rlog(dds, blind=FALSE) #正規化後のNormalized countを取得 
mat <- assay(rld)
mat <- mat - rowMeans(mat) 
mat <- data.frame(GeneName=rownames(mat),mat)
write.table(mat,file="normalized_count.txt",sep="\t",quote=F,row.names = F)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up