##参考
Robinson MD, McCarthy DJ, Smyth GK (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics, 26(1), 139-140. doi: 10.1093/bioinformatics/btp616.
McCarthy DJ, Chen Y, Smyth GK (2012). “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.” Nucleic Acids Research, 40(10), 4288-4297. doi: 10.1093/nar/gks042.
Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010). “Gene ontology analysis for RNA-seq: accounting for selection bias.” Genome Biology, 11, R14.
##準備:edgeRで発現変動解析
###サンプルデータを使って解析
suppressPackageStartupMessages(library(edgeR))
suppressPackageStartupMessages(library(goseq))
suppressPackageStartupMessages(library(org.Hs.eg.db))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(patchwork))
### DE analysis ###
table.summary = read.table(system.file("extdata", "Li_sum.txt", package = "goseq"),
sep = "\t", header = TRUE, stringsAsFactors = FALSE)
counts = table.summary[,-1]
rownames(counts) = table.summary[,1]
grp = factor(rep(c("Control", "Treated"),times = c(4, 3)))
summarized = DGEList(counts,lib.size = colSums(counts), group = grp)
disp = estimateCommonDisp(summarized)
disp$common.dispersion
tested = exactTest(disp)
topTags(tested)
###edgeRの解析結果(の一部)
logFC | logCPM | PValue | FDR | |
---|---|---|---|---|
ENSG00000127954 | 11.557868 | 6.680748 | 2.574972e-80 | 1.274766e-75 |
ENSG00000151503 | 5.398963 | 8.499530 | 1.781732e-65 | 4.410322e-61 |
ENSG00000096060 | 4.897600 | 9.446705 | 7.983756e-60 | 1.317479e-55 |
###発現変動遺伝子 (DEGs) をラベル
- 1 for DEGs, 0 for not DEGs
genes = as.integer(p.adjust(tested$table$PValue[tested$table$logFC != 0],
method = "BH") < 0.05)
genes = as.integer(p.adjust(tested$table$PValue[tested$table$logFC != 0], method = "BH") < 0.05)
names(genes) = row.names(tested$table[tested$table$logFC != 0,])
table(genes)
##goseqで遺伝子オントロジー解析
###PWF (Probability Weighting Function)を得る
### GO analysis ###
pwf = nullp(genes, "hg19", "ensGene")
head(pwf)
GO.wall = goseq(pwf, "hg19", "ensGene")
head(GO.wall)
###goseqの解析結果(の一部)
category | over_represented_pvalue | under_represented_pvalue | numDEInCat | numInCat | term | ontology |
---|---|---|---|---|---|---|
GO:0005737 | 9.843483e-10 | 1 | 1954 | 8980 | cytoplasm | CC |
##ggplot2で解析結果を可視化
###可視化に使うカラムを選択
-
term
はGO termの名前 -
numInCat
はそのGO termに属するすべての遺伝子の数 -
numDEInCat
はそのGO termに属する発現変動遺伝子の数
res <- GO.wall %>%
dplyr::select(term, over_represented_pvalue, numInCat, numDEInCat, ontology)
head(res)
###go_plot関数を定義
-
p value
のトップ10を選択
go_plot <- function(x){
x %>%
top_n(10, wt = -over_represented_pvalue) %>%
mutate(hitsPerc = numDEInCat * 100 / numInCat) %>%
ggplot(aes(x = -log10(over_represented_pvalue),
y = reorder(term, -over_represented_pvalue),
colour = hitsPerc,
size = numDEInCat)) +
geom_point() +
guides(size = guide_legend(order = 1, reverse = TRUE),
colour = guide_colourbar(order = 2)) +
labs(x = "-log10 (p value)", y = "GO term", colour = "Hits (%)", size = "Count") +
theme_grey(base_size = 20)
}
###オントロジーごとに分けてプロット
BP <- res %>%
filter(ontology == "BP")
MF <- res %>%
filter(ontology == "MF")
CC <- res %>%
filter(ontology == "CC")
p1 <- go_plot(BP) + labs(title = "Biological Processes")
p2 <- go_plot(MF) + labs(title = "Molecular Functions")
p3 <- go_plot(CC) + labs(title = "Cellular Components")
p1 / p2 / p3
##プロット結果
-
p value
順に並べた -
Count
はGO termに属する発現変動遺伝子の数 -
Hits
はGO termに属するすべての遺伝子のうちの発現変動遺伝子の割合