Reference
- An Introduction to the ‘spls’ Package, Version 1.0
- Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection
library
install.packages("spls")
Updating HTML index of packages in '.Library'
Making 'packages.html' ...
done
library(spls)
Sparse Partial Least Squares (SPLS) Regression and
Classification (version 2.2-3)
library(ggplot2)
data
data(yeast)
xはクロマチンの免疫沈降、yは発現サイクル?
help(yeast)
yeast {spls} | R Documentation |
Yeast Cell Cycle Dataset
Description
This is the Yeast Cell Cycle dataset used in Chun and Keles (2010).
Usage
data(yeast)
Format
A list with two components:
- x
-
ChIP-chip data. A matrix with 542 rows and 106 columns.
- y
-
Cell cycle gene expression data. A matrix with 542 rows and 18 columns.
Details
Matrix y
is cell cycle gene expression data (Spellman et al., 1998)
of 542 genes from an α factor based experiment.
Each column corresponds to mRNA levels
measured at every 7 minutes during 119 minutes (a total of 18 measurements).
Matrix x
is the chromatin immunoprecipitation on chip (ChIP-chip) data of
Lee et al. (2002) and it contains the binding information for 106
transcription factors. See Chun and Keles (2010) for more details.
Source
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thomson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, and Young RA (2002), "Transcriptional regulatory networks in Saccharomyces cerevisiae", Science, Vol. 298, pp. 799–804.
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, and Futcher B (1998), "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hydrization", Molecular Biology of the Cell, Vol. 9, pp. 3273–3279.
References
Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.
Examples
data(yeast)
yeast$x[1:5,1:5]
yeast$y[1:5,1:5]
yeast$x[1:5,1:5]
ABF1_YPD | ACE2_YPD | ADR1_YPD | ARG80_YPD | ARG81_YPD | |
---|---|---|---|---|---|
21 | -0.2722730 | 0.21932294 | 0.9238359567 | -0.4755756 | -0.10389318 |
41 | 0.1691280 | 0.53831198 | 0.0097604993 | -0.3219534 | -0.19750606 |
71 | -0.1388962 | 0.02636382 | 0.0877516229 | -0.2234093 | 0.10307741 |
78 | -0.2865169 | -0.31409427 | -0.0454998435 | 0.3262217 | 0.27757502 |
102 | -0.4950561 | -0.14827419 | 0.0002987512 | -0.2179458 | -0.02539585 |
plot(yeast$x[,1])
yeast$y[1:5,1:5]
alpha0 | alpha7 | alpha14 | alpha21 | alpha28 | |
---|---|---|---|---|---|
1 | -0.36 | -0.42 | 0.29 | -0.14 | -0.19 |
2 | 1.04 | 0.19 | 0.47 | -1.03 | -0.63 |
5 | -0.30 | -0.45 | 0.75 | 0.37 | 0.27 |
8 | -0.46 | 0.12 | -0.06 | -0.76 | -0.70 |
9 | -1.35 | -0.86 | -0.22 | -0.38 | -0.65 |
plot(yeast$y[,1])
↓ 変数が542個もある・・
str(yeast)
List of 2
$ x: num [1:542, 1:106] -0.272 0.169 -0.139 -0.287 -0.495 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:542] "21" "41" "71" "78" ...
.. ..$ : chr [1:106] "ABF1_YPD" "ACE2_YPD" "ADR1_YPD" "ARG80_YPD" ...
$ y: num [1:542, 1:18] -0.36 1.04 -0.3 -0.46 -1.35 -2.06 -1.61 -0.07 0.11 0.15 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:542] "1" "2" "5" "8" ...
.. ..$ : chr [1:18] "alpha0" "alpha7" "alpha14" "alpha21" ...
SPLS
set.seed(1)
cv <- cv.spls(yeast$x, yeast$y, eta = seq(0.1,0.9,0.1), K = c(5:10))
eta = 0.1
eta = 0.2
eta = 0.3
eta = 0.4
eta = 0.5
eta = 0.6
eta = 0.7
eta = 0.8
eta = 0.9
Optimal parameters: eta = 0.6, K = 8
f <- spls(yeast$x, yeast$y, eta = cv$eta.opt, K = cv$K.opt)
print(f)
Sparse Partial Least Squares for multivariate responses
----
Parameters: eta = 0.6, K = 8, kappa = 0.5
PLS algorithm:
pls2 for variable selection, simpls for model fitting
SPLS chose 56 variables among 106 variables
Selected variables:
ACE2_YPD ARG80_YPD ARG81_YPD ASH1_YPD AZF1_YPD
BAS1_YPD CBF1_YPD CHA4_YPD CRZ1_YPD FHL1_YPD
FKH1_YPD FKH2_YPD FZF1_YPD GAT1_YPD GAT3_YPD
GCN4_YPD GCR2_YPD GLN3_YPD HAA1_YPD HAP2_YPD
HAP5_YPD HIR1_YPD HIR2_YPD IME4_YPD INO4_YPD
A1..MATA1._YPD MBP1_YPD MCM1_YPD MET4_YPD MSN2_YPD
NDD1_YPD NRG1_YPD PHD1_YPD PHO2_YPD PUT3_YPD
RCS1_YPD REB1_YPD RFX1_YPD RIM101_YPD RME1_YPD
RTG1_YPD RTG3_YPD SIP4_YPD SOK2_YPD STB1_YPD
STE12_YPD STP2_YPD SWI4_YPD SWI5_YPD SWI6_YPD
THI2_YPD YAP1_YPD YAP6_YPD YAP7_YPD YFL044C_YPD
YJL206C_YPD
coef.f <- coef(f)
coef.f[1:5,1:5]
alpha0 | alpha7 | alpha14 | alpha21 | alpha28 | |
---|---|---|---|---|---|
ABF1_YPD | 0.0000000 | 0.000000000 | 0.00000000 | 0.0000000000 | 0.000000000 |
ACE2_YPD | 0.0874325 | 0.068452293 | 0.01374781 | -0.0002541969 | -0.033302624 |
ADR1_YPD | 0.0000000 | 0.000000000 | 0.00000000 | 0.0000000000 | 0.000000000 |
ARG80_YPD | -0.0486881 | -0.019092797 | 0.02063442 | 0.0300421634 | 0.007925553 |
ARG81_YPD | -0.0168849 | 0.009465868 | 0.06353825 | 0.0541704059 | 0.006978985 |
plot.spls(f, yvar=1 )
coefplot.spls( f, nwin=c(2,2), xvar=c(1:4) )