
R3(41) on "W.a.t.m.i. statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari

Posted at 2021-11-06

R3(References on References on References) on "What are the most important statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (39)

R3(0) on "What are the most important statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari
https://qiita.com/kaizen_nagoya/items/a8eac9afbf16d2188901

What are the most important statistical ideas of the past 50 years?
Andrew Gelman, Aki Vehtari
https://arxiv.org/abs/2012.00174

References 41

Efron, B. and Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press.

References on 41

41.1

Abu-Mostafa, Y. 1995. Hints. Neural Computation, 7, 639–671.

References on 41.1

41.1.1

Abu-Mostafa, Y. 1989. The Vapnik-Chervonenkis dimension: Information versus complexity in learning. Neural Comp. 1, 312-317.

41.1.2

Abu-Mostafa, Y. 1990. Learning from hints in neural networks. J. Complex. 6, 192-198.

41.1.3

Abu-Mostafa, Y. 1993a. Hints and the VC dimension. Neural Comp. 5, 278-288.

41.1.4

Abu-Mostafa, Y. 1993b. A method for learning from hints. In Advances in Neural Information Processing Systems, S. Hanson et al., eds., Vol. 5, pp. 73-80. Morgan Kaufmann, San Mateo, CA.

41.1.5

Abu-Mostafa, Y. 1995. Financial market applications of learning from hints. In Neural Networks in the Capital Markets, A. Refenes, ed., pp. 221-232. Wiley, London, UK.

41.1.6

Akaike, H. 1969. Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 21, 243-247.

41.1.7

Al-Mashouq, K., and Reed, I. 1991. Including hints in training neural networks. Neural Comp. 3, 418-427.

41.1.8

Amaldi, E. 1991. On the complexity of training perceptrons. In Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN '91), T. Kohonen, K. Makisara, O. Simula, and J. Kangas, eds., pp. 55-60. North Holland, Amsterdam.

41.1.9

Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. 1989. Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 929-965.

41.1.10

Cataltepe, Z., and Abu-Mostafa, Y. 1994. Estimating learning performance using hints. In Proceedings of the 1993 Connectionist Models Summer School, M. Mozer et al., eds., pp. 380-386. Erlbaum, Hillsdale, NJ.

41.1.11

Cover, T., and Thomas, J. 1991. Elements of Information Theory. Wiley-Interscience, New York.

41.1.12

Duda, R., and Hart, P. 1973. Pattern Classification and Scene Analysis. John Wiley, New York.

41.1.13

Fyfe, W. 1992. Invariance hints and the VC dimension. Ph.D. thesis, Computer Science Department, Caltech (Caltech-CS-TR-92-20).

41.1.14

Hecht-Nielsen, R. 1990. Neurocomputing. Addison-Wesley, Reading, MA.

41.1.15

Hertz, J., Krogh, A., and Palmer, R. 1991. Introduction to the Theory of Neural Computation, Lecture Notes, Vol. 1. Santa Fe Institute Studies in the Sciences of Complexity.

41.1.16

Hinton, G. 1987. Learning translation invariant recognition in a massively parallel network. Proc. Conf. Parallel Architectures and Languages Europe, 1-13.

41.1.17

Hinton, G., Williams, C., and Revow, M. 1992. Adaptive elastic models for hand-printed character recognition. In Advances in Neural Information Processing Systems, J. Moody, S. Hanson, and R. Lippmann, eds., Vol. 4, pp. 512-519. Morgan Kaufmann, San Mateo, CA.

41.1.18

Hu, M. 1962. Visual pattern recognition by moment invariants. IRE Trans. Inform. Theory IT, 179-187.

41.1.19

Judd, J. S. 1990. Neural Network Design and the Complexity of Learning. MIT Press, Cambridge, MA.

41.1.20

Leen, T. 1995. From data distributions to regularization in invariant learning. Neural Comp. (to appear).

41.1.21

Malkiel, B. 1973. A Random Walk Down Wall Street. W. W. Norton, New York.

41.1.22

McClelland, J., and Rumelhart, D. 1988. Explorations in Parallel Distributed Processing. MIT Press, Cambridge, MA.

41.1.23

Minsky, M., and Papert, S. 1988. Perceptrons, expanded edition. MIT Press, Cambridge, MA.

41.1.24

Moody, J. 1992. The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In Advances in Neural Information Processing Systems, J. Moody, S. Hanson, and R. Lippmann, eds., Vol. 4, pp. 847-854. Morgan Kaufmann, San Mateo, CA.

41.1.25

Moody, J., and Wu, L. 1994. Statistical analysis and forecasting of high frequency foreign exchange rates. In Proceedings of Neural Networks in the Capital Markets, Y. Abu-Mostafa et al., eds.

41.1.26

Omlin, C., and Giles, C. L. 1992. Training second-order recurrent neural networks using hints. Machine Learning: Proceedings of the Ninth International Conference, ML-92, D. Sleeman and P. Edwards, eds., Morgan Kaufmann, San Mateo, CA.

41.1.27

Poggio, T., and Vetter, T. 1992. Recognition and structure from one 2D model view: Observations on prototypes, object classes and symmetries. AI Memo No. 1347, Massachusetts Institute of Technology.

41.1.28

Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning internal representations by error propagation. In Parallel Distributed Processing, D. Rumelhart et al., eds., Vol. 1, pp. 318-362. MIT Press, Cambridge, MA.

41.1.29

Suddarth, S., and Holden, A. 1991. Symbolic neural systems and the use of hints for developing complex systems. Int. J. Machine Studies, 35, 291.

41.1.30

Vapnik, V., and Chervonenkis, A. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Appl. 16, 264-280.

41.1.31

Weigend, A., and Rumelhart, D. 1991. Generalization through minimal networks with application to forecasting. In Proceedings INTERFACE '91: Computing Science and Statistics (23rd Symposium), E. Keramidas, ed., pp. 362-370. Interface Foundation of North America.

41.1.32

Weigend, A., Huberman, B., and Rumelhart, D. 1990. Predicting the future: A connectionist approach. Int. J. Neural Syst. 1, 193-209.

41.1.33

Weigend, A., Rumelhart, D., and Huberman, B. 1991. Generalization by weight elimination with application to forecasting. In Advances in Neural Information Processing Systems, R. Lippmann, J. Moody, and D. Touretzky, eds., Vol. 3, pp. 875-882. Morgan Kaufmann, San Mateo, CA.

41.1.34

Wismer, D., and Chattergy, R. 1978. Introduction to Nonlinear Optimization. North Holland, Amsterdam.

41.2

Achanta, R., and Hastie, T. 2015. Telugu OCR Framework using Deep Learning. Tech. rept., Statistics Department, Stanford University.

41.3

Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. Pages 267–281 of: Second International Symposium on Information Theory (Tsahkadsor, 1971). Akadémiai Kiadó, Budapest.

41.4

Anderson, T. W. 2003. An Introduction to Multivariate Statistical Analysis. Third edn. Wiley Series in Probability and Statistics. Wiley-Interscience.

41.5

Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. 2012. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.

41.6

Becker, R., Chambers, J., and Wilks, A. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth and Brooks/Cole.

41.7

Bellhouse, D. R. 2004. The Reverend Thomas Bayes, FRS: A biography to celebrate the tercentenary of his birth. Statist. Sci., 19(1), 3–43. With comments and a rejoinder by the author.

41.8

Bengio, Y., Courville, A., and Vincent, P. 2013. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

41.9

Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B, 57(1), 289–300.

41.10

Benjamini, Y., and Yekutieli, D. 2005. False discovery rate-adjusted multiple confidence intervals for selected parameters. J. Amer. Statist. Assoc., 100(469), 71–93.

41.11

Berger, J. O. 2006. The case for objective Bayesian analysis. Bayesian Anal., 1(3), 385–402 (electronic).

41.12

Berger, J. O., and Pericchi, L. R. 1996. The intrinsic Bayes factor for model selection and prediction. J. Amer. Statist. Assoc., 91(433), 109–122.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. 2010 (June). Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy).
Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. 2013. Valid post-selection inference. Ann. Statist., 41(2), 802–837.
Berkson, J. 1944. Application of the logistic function to bio-assay. J. Amer. Statist. Assoc., 39(227), 357–365.
Bernardo, J. M. 1979. Reference posterior distributions for Bayesian inference. J. Roy. Statist. Soc. Ser. B, 41(2), 113–147. With discussion.
Birch, M. W. 1964. The detection of partial association. I. The 2×2 case. J. Roy. Statist. Soc. Ser. B, 26(2), 313–324.
Bishop, C. 1995. Neural Networks for Pattern Recognition. Clarendon Press, Oxford.
Boos, D. D., and Serfling, R. J. 1980. A note on differentials and the CLT and LIL for statistical functions, with application to M-estimates. Ann. Statist., 8(3), 618–624.
Boser, B., Guyon, I., and Vapnik, V. 1992. A training algorithm for optimal margin classifiers. In: Proceedings of COLT II.
Breiman, L. 1996. Bagging predictors. Mach. Learn., 24(2), 123–140.
Breiman, L. 1998. Arcing classifiers (with discussion). Annals of Statistics, 26, 801–849.
Breiman, L. 2001. Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J., Olshen, R. A., and Stone, C. J. 1984. Classification and Regression Trees. Wadsworth Statistics/Probability Series. Wadsworth Advanced Books and Software.
Carlin, B. P., and Louis, T. A. 1996. Bayes and Empirical Bayes Methods for Data Analysis. Monographs on Statistics and Applied Probability, vol. 69. Chapman & Hall.
Carlin, B. P., and Louis, T. A. 2000. Bayes and Empirical Bayes Methods for Data Analysis. Second edn. Texts in Statistical Science. Chapman & Hall/CRC.
Chambers, J. M., and Hastie, T. J. (eds). 1993. Statistical Models in S. Chapman & Hall Computer Science Series. Chapman & Hall.
Cleveland, W. S. 1981. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. Amer. Statist., 35(1), 54.
Cox, D. R. 1958. The regression analysis of binary sequences. J. Roy. Statist. Soc. Ser. B, 20, 215–242.
Cox, D. R. 1970. The Analysis of Binary Data. Methuen's Monographs on Applied Probability and Statistics. Methuen & Co.
Cox, D. R. 1972. Regression models and life-tables. J. Roy. Statist. Soc. Ser. B, 34(2), 187–220.
Cox, D. R. 1975. Partial likelihood. Biometrika, 62(2), 269–276.
Cox, D. R., and Hinkley, D. V. 1974. Theoretical Statistics. Chapman & Hall.
Cox, D. R., and Reid, N. 1987. Parameter orthogonality and approximate conditional inference. J. Roy. Statist. Soc. Ser. B, 49(1), 1–39. With a discussion.
Crowley, J. 1974. Asymptotic normality of a new nonparametric statistic for use in organ transplant studies. J. Amer. Statist. Assoc., 69(348), 1006–1011.
de Finetti, B. 1972. Probability, Induction and Statistics. The Art of Guessing. John Wiley & Sons, London-New York-Sydney.
Dembo, A., Cover, T. M., and Thomas, J. A. 1991. Information-theoretic inequalities. IEEE Trans. Inform. Theory, 37(6), 1501–1518.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39(1), 1–38.
Diaconis, P., and Ylvisaker, D. 1979. Conjugate priors for exponential families. Ann. Statist., 7(2), 269–281.

DiCiccio, T., and Efron, B. 1992. More accurate confidence intervals in exponential families. Biometrika, 79(2), 231–245.
Donoho, D. L. 2015. 50 years of data science. R-bloggers. www.r-bloggers.com/50-years-of-data-science-by-david-donoho/.
Edwards, A. W. F. 1992. Likelihood. Expanded edn. Johns Hopkins University Press. Revised reprint of the 1972 original.
Efron, B. 1967. The two sample problem with censored data. Pages 831–853 of: Proc. 5th Berkeley Symp. Math. Statist. and Prob., Vol. 4. University of California Press.
Efron, B. 1975. Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Statist., 3(6), 1189–1242. With discussion and a reply by the author.
Efron, B. 1977. The efficiency of Cox's likelihood function for censored data. J. Amer. Statist. Assoc., 72(359), 557–565.
Efron, B. 1979. Bootstrap methods: Another look at the jackknife. Ann. Statist., 7(1), 1–26.
Efron, B. 1982. The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 38. Society for Industrial and Applied Mathematics (SIAM).
Efron, B. 1983. Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Amer. Statist. Assoc., 78(382), 316–331.
Efron, B. 1985. Bootstrap confidence intervals for a class of parametric problems. Biometrika, 72(1), 45–58.
Efron, B. 1986. How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc., 81(394), 461–470.
Efron, B. 1987. Better bootstrap confidence intervals. J. Amer. Statist. Assoc., 82(397), 171–200. With comments and a rejoinder by the author.
Efron, B. 1988. Logistic regression, survival analysis, and the Kaplan–Meier curve. J. Amer. Statist. Assoc., 83(402), 414–425.
Efron, B. 1993. Bayes and likelihood calculations from confidence intervals. Biometrika, 80(1), 3–26.
Efron, B. 1998. R. A. Fisher in the 21st Century (invited paper presented at the 1996 R. A. Fisher Lecture). Statist. Sci., 13(2), 95–122. With comments and a rejoinder by the author.
Efron, B. 2004. The estimation of prediction error: Covariance penalties and cross-validation. J. Amer. Statist. Assoc., 99(467), 619–642. With comments and a rejoinder by the author.
Efron, B. 2010. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Institute of Mathematical Statistics Monographs, vol. 1. Cambridge University Press.
Efron, B. 2011. Tweedie's formula and selection bias. J. Amer. Statist. Assoc., 106(496), 1602–1614.
Efron, B. 2014a. Estimation and accuracy after model selection. J. Amer. Statist. Assoc., 109(507), 991–1007.
Efron, B. 2014b. Two modeling strategies for empirical Bayes estimation. Statist. Sci., 29(2), 285–301.
Efron, B. 2015. Frequentist accuracy of Bayesian estimates. J. Roy. Statist. Soc. Ser. B, 77(3), 617–646.

Efron, B. 2016. Empirical Bayes deconvolution estimates. Biometrika, 103(1), 1–20.
Efron, B., and Feldman, D. 1991. Compliance as an explanatory variable in clinical trials. J. Amer. Statist. Assoc., 86(413), 9–17.
Efron, B., and Gous, A. 2001. Scales of evidence for model selection: Fisher versus Jeffreys. Pages 208–256 of: Model Selection. IMS Lecture Notes Monograph Series, vol. 38. Beachwood, OH: Institute of Mathematical Statistics. With discussion and a rejoinder by the authors.
Efron, B., and Hinkley, D. V. 1978. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika, 65(3), 457–487. With comments and a reply by the authors.
Efron, B., and Morris, C. 1972. Limiting the risk of Bayes and empirical Bayes estimators. II. The empirical Bayes case. J. Amer. Statist. Assoc., 67, 130–139.
Efron, B., and Morris, C. 1977. Stein's paradox in statistics. Scientific American, 236(5), 119–127.
Efron, B., and Petrosian, V. 1992. A simple test of independence for truncated data with applications to redshift surveys. Astrophys. J., 399(Nov), 345–352.
Efron, B., and Stein, C. 1981. The jackknife estimate of variance. Ann. Statist., 9(3), 586–596.
Efron, B., and Thisted, R. 1976. Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63(3), 435–447.
Efron, B., and Tibshirani, R. 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, vol. 57. Chapman & Hall.
Efron, B., and Tibshirani, R. 1997. Improvements on cross-validation: The .632+ bootstrap method. J. Amer. Statist. Assoc., 92(438), 548–560.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. 2004. Least angle regression. Annals of Statistics, 32(2), 407–499. With discussion and a rejoinder by the authors.
Finney, D. J. 1947. The estimation from individual records of the relationship between dose and quantal response. Biometrika, 34(3/4), 320–334.
Fisher, R. A. 1915. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.
Fisher, R. A. 1925. Theory of statistical estimation. Math. Proc. Cambridge Phil. Soc., 22(7), 700–725.
Fisher, R. A. 1930. Inverse probability. Math. Proc. Cambridge Phil. Soc., 26(10), 528–535.
Fisher, R. A., Corbet, A., and Williams, C. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol., 12, 42–58.
Fithian, W., Sun, D., and Taylor, J. 2014. Optimal inference after model selection. ArXiv e-prints, Oct.
Freund, Y., and Schapire, R. 1996. Experiments with a new boosting algorithm. Pages 148–156 of: Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, San Francisco.
Freund, Y., and Schapire, R. 1997. A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139.
Friedman, J. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Friedman, J., and Popescu, B. 2005. Predictive Learning via Rule Ensembles. Tech. rept., Stanford University.
Friedman, J., Hastie, T., and Tibshirani, R. 2000. Additive logistic regression: a statistical view of boosting (with discussion). Annals of Statistics, 28, 337–407.
Friedman, J., Hastie, T., and Tibshirani, R. 2009. glmnet: Lasso and elastic-net regularized generalized linear models. R package version 1.1-4.
Friedman, J., Hastie, T., and Tibshirani, R. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Geisser, S. 1974. A predictive approach to the random effect model. Biometrika, 61, 101–107.
Gerber, M., and Chopin, N. 2015. Sequential quasi Monte Carlo. J. Roy. Statist. Soc. B, 77(3), 509–580. With discussion; doi: 10.1111/rssb.12104.
Gholami, S., Janson, L., Worhunsky, D. J., Tran, T. B., Squires, Malcolm, I., Jin, L. X., Spolverato, G., Votanopoulos, K. I., Schmidt, C., Weber, S. M., Bloomston, M., Cho, C. S., Levine, E. A., Fields, R. C., Pawlik, T. M., Maithel, S. K., Efron, B., Norton, J. A., and Poultsides, G. A. 2015. Number of lymph nodes removed and survival after gastric cancer resection: An analysis from the US Gastric Cancer Collaborative. J. Amer. Coll. Surg., 221(2), 291–299.
Good, I., and Toulmin, G. 1956. The number of new species, and the increase in population coverage, when a sample is increased. Biometrika, 43, 45–63.
Hall, P. 1988. Theoretical comparison of bootstrap confidence intervals. Ann. Statist., 16(3), 927–985. With discussion and a reply by the author.
Hampel, F. R. 1974. The influence curve and its role in robust estimation. J. Amer. Statist. Assoc., 69, 383–393.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. 1986. Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons.
Harford, T. 2014. Big data: A big mistake? Significance, 11(5), 14–19.
Hastie, T., and Loader, C. 1993. Local regression: automatic kernel carpentry (with discussion). Statistical Science, 8, 120–143.
Hastie, T., and Tibshirani, R. 1990. Generalized Additive Models. Chapman and Hall.
Hastie, T., and Tibshirani, R. 2004. Efficient quadratic regularization for expression arrays. Biostatistics, 5(3), 329–340.
Hastie, T., Tibshirani, R., and Friedman, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edn. Springer Series in Statistics. Springer.
Hastie, T., Tibshirani, R., and Wainwright, M. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall, CRC Press.
Hoeffding, W. 1952. The large-sample power of tests based on permutations of observations. Ann. Math. Statist., 23, 169–192.
Hoeffding, W. 1965. Asymptotically optimal tests for multinomial distributions. Ann. Math. Statist., 36(2), 369–408.
Hoerl, A. E., and Kennard, R. W. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
Huber, P. J. 1964. Robust estimation of a location parameter. Ann. Math. Statist., 35, 73–101.

Jaeckel, L. A. 1972. Estimating regression coefficients by minimizing the dispersion of the residuals. Ann. Math. Statist., 43, 1449–1458.
James, W., and Stein, C. 1961. Estimation with quadratic loss. Pages 361–379 of: Proc. 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. I. University of California Press.
Jansen, L., Fithian, W., and Hastie, T. 2015. Effective degrees of freedom: a flawed metaphor. Biometrika, 102(2), 479–485.
Javanmard, A., and Montanari, A. 2014. Confidence intervals and hypothesis testing for high-dimensional regression. J. of Machine Learning Res., 15, 2869–2909.
Jaynes, E. 1968. Prior probabilities. IEEE Trans. Syst. Sci. Cybernet., 4(3), 227–241.
Jeffreys, H. 1961. Theory of Probability. Third edn. Clarendon Press.
Johnson, N. L., and Kotz, S. 1969. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Co.
Johnson, N. L., and Kotz, S. 1970a. Distributions in Statistics: Continuous Univariate Distributions, 1. Houghton Mifflin Co.
Johnson, N. L., and Kotz, S. 1970b. Distributions in Statistics: Continuous Univariate Distributions, 2. Houghton Mifflin Co.
Johnson, N. L., and Kotz, S. 1972. Distributions in Statistics: Continuous Multivariate Distributions. John Wiley & Sons.
Kaplan, E. L., and Meier, P. 1958. Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc., 53(282), 457–481.
Kass, R. E., and Raftery, A. E. 1995. Bayes factors. J. Amer. Statist. Assoc., 90(430), 773–795.
Kass, R. E., and Wasserman, L. 1996. The selection of prior distributions by formal rules. J. Amer. Statist. Assoc., 91(435), 1343–1370.
Kuffner, R., Zach, N., Norel, R., Hawe, J., Schoenfeld, D., Wang, L., Li, G., Fang, L., Mackey, L., Hardiman, O., Cudkowicz, M., Sherman, A., Ertaylan, G., Grosse-Wentrup, M., Hothorn, T., van Ligtenberg, J., Macke, J. H., Meyer, T., Schölkopf, B., Tran, L., Vaughan, R., Stolovitzky, G., and Leitner, M. L. 2015. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat Biotech, 33(1), 51–57.
LeCun, Y., and Cortes, C. 2010. MNIST Handwritten Digit Database. http://yann.lecun.com/exdb/mnist/.
LeCun, Y., Bengio, Y., and Hinton, G. 2015. Deep learning. Nature, 521(7553), 436–444.
Lee, J., Sun, D., Sun, Y., and Taylor, J. 2016. Exact post-selection inference, with application to the Lasso. Annals of Statistics, 44(3), 907–927.
Lehmann, E. L. 1983. Theory of Point Estimation. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons.
Leslie, C., Eskin, E., Cohen, A., Weston, J., and Noble, W. S. 2003. Mismatch string kernels for discriminative protein classification. Bioinformatics, 1, 1–10.
Liaw, A., and Wiener, M. 2002. Classification and regression by randomForest. R News, 2(3), 18–22.
Liberman, M. 2015 (April). "Reproducible Research and the Common Task Method". Simons Foundation Frontiers of Data Science Lecture, April 1, 2015; video available.

Lockhart, R., Taylor, J., Tibshirani, R., and Tibshirani, R. 2014. A significance test for the lasso. Annals of Statistics, 42(2), 413–468. With discussion and a rejoinder by the authors.
Lynden-Bell, D. 1971. A method for allowing for known observational selection in small samples applied to 3CR quasars. Mon. Not. Roy. Astron. Soc., 155(1), 95–118.
Mallows, C. L. 1973. Some comments on Cp. Technometrics, 15(4), 661–675.
Mantel, N., and Haenszel, W. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst., 22(4), 719–748.
Mardia, K. V., Kent, J. T., and Bibby, J. M. 1979. Multivariate Analysis. Academic Press.
McCullagh, P., and Nelder, J. 1983. Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman & Hall.
McCullagh, P., and Nelder, J. 1989. Generalized Linear Models. Second edn. Monographs on Statistics and Applied Probability. Chapman & Hall.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys., 21(6), 1087–1092.
Miller, Jr, R. G. 1964. A trustworthy jackknife. Ann. Math. Statist., 35, 1594–1605.
Miller, Jr, R. G. 1981. Simultaneous Statistical Inference. Second edn. Springer Series in Statistics. New York: Springer-Verlag.
Nesterov, Y. 2013. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125–161.
Neyman, J. 1937. Outline of a theory of statistical estimation based on the classical theory of probability. Phil. Trans. Roy. Soc., 236(767), 333–380.
Neyman, J. 1977. Frequentist probability and frequentist statistics. Synthese, 36(1), 97–131.
Neyman, J., and Pearson, E. S. 1933. On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. Roy. Soc. A, 231(694-706), 289–337.
Ng, A. 2015. Neural Networks. http://deeplearning.stanford.edu/wiki/index.php/Neural_Networks. Lecture notes.
Ngiam, J., Chen, Z., Chia, D., Koh, P. W., Le, Q. V., and Ng, A. 2010. Tiled convolutional neural networks. Pages 1279–1287 of: Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., and Culotta, A. (eds), Advances in Neural Information Processing Systems 23. Curran Associates, Inc.
O'Hagan, A. 1995. Fractional Bayes factors for model comparison. J. Roy. Statist. Soc. Ser. B, 57(1), 99–138. With discussion and a reply by the author.
Park, T., and Casella, G. 2008. The Bayesian lasso. J. Amer. Statist. Assoc., 103(482), 681–686.
Pearson, K. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Phil. Mag., 50(302), 157–175.
Pritchard, J., Stephens, M., and Donnelly, P. 2000. Inference of population structure using multilocus genotype data. Genetics, 155(June), 945–959.
Quenouille, M. H. 1956. Notes on bias in estimation. Biometrika, 43, 353–360.
R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Ridgeway, G. 2005. Generalized boosted models: A guide to the gbm package. Available online.
Ridgeway, G., and MacDonald, J. M. 2009. Doubly robust internal benchmarking and false discovery rates for detecting racial bias in police stops. J. Amer. Statist. Assoc., 104(486), 661–668.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press.
Robbins, H. 1956. An empirical Bayes approach to statistics. Pages 157–163 of: Proc. 3rd Berkeley Symposium on Mathematical Statistics and Probability, vol. I. University of California Press.
Rosset, S., Zhu, J., and Hastie, T. 2004. Margin maximizing loss functions. In: Thrun, S., Saul, L., and Schölkopf, B. (eds), Advances in Neural Information Processing Systems 16. MIT Press.
Rubin, D. B. 1981. The Bayesian bootstrap. Ann. Statist., 9(1), 130–134.
Savage, L. J. 1954. The Foundations of Statistics. John Wiley & Sons; Chapman & Hall.
Schapire, R. 1990. The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R., and Freund, Y. 2012. Boosting: Foundations and Algorithms. MIT Press.
Scheffé, H. 1953. A method for judging all contrasts in the analysis of variance. Biometrika, 40(1-2), 87–110.
Schölkopf, B., and Smola, A. 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press.
Schwarz, G. 1978. Estimating the dimension of a model. Ann. Statist., 6(2), 461–464.
Senn, S. 2008. A note concerning a selection "paradox" of Dawid's. Amer. Statist., 62(3), 206–210.
Soric, B. 1989. Statistical "discoveries" and effect-size estimation. J. Amer. Statist. Assoc., 84(406), 608–610.
Spevack, M. 1968. A Complete and Systematic Concordance to the Works of Shakespeare. Vol. 1–6. Georg Olms Verlag.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. 2014. Dropout: a simple way to prevent neural networks from overfitting. J. of Machine Learning Res., 15, 1929–1958.
Stefanski, L., and Carroll, R. J. 1990. Deconvoluting kernel density estimators. Statistics, 21(2), 169–184.
Stein, C. 1956. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Pages 197–206 of: Proc. 3rd Berkeley Symposium on Mathematical Statistics and Probability, vol. I. University of California Press.
Stein, C. 1981. Estimation of the mean of a multivariate normal distribution. Ann. Statist., 9(6), 1135–1151.
Stein, C. 1985. On the coverage probability of confidence sets based on a prior distribution. Pages 485–514 of: Sequential Methods in Statistics. Banach Center Publications, vol. 16. PWN, Warsaw.
Stigler, S. M. 2006. How Ronald Fisher became a mathematical statistician. Math. Sci. Hum. Math. Soc. Sci., 176(176), 23–30.

Stone, M. 1974. Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. B, 36, 111–147. With discussion and a reply by the author.
Storey, J. D., Taylor, J., and Siegmund, D. 2004. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. Roy. Statist. Soc. B, 66(1), 187–205.
Tanner, M. A., and Wong, W. H. 1987. The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc., 82(398), 528–550. With discussion and a reply by the authors.
Taylor, J., Loftus, J., and Tibshirani, R. 2015. Tests in adaptive regression via the Kac-Rice formula. Annals of Statistics, 44(2), 743–770.
Thisted, R., and Efron, B. 1987. Did Shakespeare write a newly-discovered poem? Biometrika, 74(3), 445–455.
Tibshirani, R. 1989. Noninformative priors for one parameter of many. Biometrika, 76(3), 604–608.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B, 58(1), 267–288.
Tibshirani, R. 2006. A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics, 7(Mar), 106.
Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., and Tibshirani, R. 2012. Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. B, 74.
Tibshirani, R., Tibshirani, R., Taylor, J., Loftus, J., and Reid, S. 2016. selectiveInference: Tools for Post-Selection Inference. R package version 1.1.3.
Tukey, J. W. 1958. "Bias and confidence in not-quite large samples" in Abstracts of Papers. Ann. Math. Statist., 29(2), 614.
Tukey, J. W. 1960. A survey of sampling from contaminated distributions. Pages 448–485 of: Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (I. Olkin et al., eds.). Stanford University Press.
Tukey, J. W. 1962. The future of data analysis. Ann. Math. Statist., 33, 1–67.
Tukey, J. W. 1977. Exploratory Data Analysis. Behavioral Science Series. Addison-Wesley.
van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. 2014. On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3), 1166–1202.
Vapnik, V. 1996. The Nature of Statistical Learning Theory. Springer.
Wager, S., Wang, S., and Liang, P. S. 2013. Dropout training as adaptive regularization. Pages 351–359 of: Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds), Advances in Neural Information Processing Systems 26. Curran Associates, Inc.
Wager, S., Hastie, T., and Efron, B. 2014. Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J. of Machine Learning Res., 15, 1625–1651.
Wahba, G. 1990. Spline Models for Observational Data. SIAM.
Wahba, G., Lin, Y., and Zhang, H. 2000. GACV for support vector machines. Pages 297–311 of: Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D. (eds), Advances in Large Margin Classifiers. MIT Press.
Wald, A. 1950. Statistical Decision Functions. John Wiley & Sons; Chapman & Hall.

Wedderburn, R. W. M. 1974. Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika, 61(3), 439–447.
Welch, B. L., and Peers, H. W. 1963. On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. B, 25, 318–329.
Westfall, P., and Young, S. 1993. Resampling-based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley Series in Probability and Statistics. Wiley-Interscience.
Xie, M., and Singh, K. 2013. Confidence distribution, the frequentist distribution estimator of a parameter: A review. Int. Statist. Rev., 81(1), 3–39. With discussion.
Ye, J. 1998. On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc., 93(441), 120–131.
Zhang, C.-H., and Zhang, S. 2014. Confidence intervals for low-dimensional parameters with high-dimensional data. J. Roy. Statist. Soc. B, 76(1), 217–242.
Zou, H., Hastie, T., and Tibshirani, R. 2007. On the "degrees of freedom" of the lasso. Ann. Statist., 35(5), 2173–2192.

Reference materials

Fundamentals for Data Scientists (2)
https://qiita.com/kaizen_nagoya/items/8b2f27353a9980bf445c

Iwanami Sugaku Jiten (Encyclopedic Dictionary of Mathematics): a good deal, with two editions on one CD
https://qiita.com/kaizen_nagoya/items/1210940fe2121423d777

Iwanami Sugaku Jiten (Encyclopedic Dictionary of Mathematics)
https://qiita.com/kaizen_nagoya/items/b37bfd303658cb5ee11e

Ann's Room (Mathematics learned from people's names: Iwanami Sugaku Jiten), English (24)
https://qiita.com/kaizen_nagoya/items/e02cbe23b96d5fb96aa1

<This article is a personal opinion based on my own past experience. It has no relation to the organization I currently belong to or to my work there.>

Thank you very much for reading to the end.

If you found this article useful, please press the like icon 💚 and follow me.
