R3 on "What are the most important statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (7)

Posted at 2021-10-03

R3 (References on References on References) on "What are the most important statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (4)

R3 on "What are the most important statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (0)
https://qiita.com/kaizen_nagoya/items/a8eac9afbf16d2188901

What are the most important statistical ideas of the past 50 years?
Andrew Gelman, Aki Vehtari
https://arxiv.org/abs/2012.00174

References

7

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444.

BIBLIOGRAPHY on 7

7.1

Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013b). Translating embeddings for modeling multi-relational data. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2787–2795. Curran Associates, Inc. 479
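This entry is the TransE model: a knowledge-graph triple (head, relation, tail) is scored by how close head + relation lands to tail in embedding space. A minimal NumPy sketch of that scoring idea (the function name and toy dimensions are my own, illustrative choices, not from the paper):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score for a triple (h, r, t):
    distance between (head + relation) and the tail embedding.
    Lower score = more plausible triple."""
    return np.linalg.norm(h + r - t, ord=1)  # the paper uses L1 or L2 distance

# Toy usage with random 50-dimensional embeddings (illustrative only).
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(transe_score(h, r, t))
```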

References on 7.1

[1] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008.
[2] A. Bordes, X. Glorot, J. Weston, and Y. Bengio. A semantic matching energy function for learning with multi-relational data. Machine Learning, 2013.
[3] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases. In Proceedings of the 25th Annual Conference on Artificial Intelligence (AAAI), 2011.
[4] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[5] R. A. Harshman and M. E. Lundy. Parafac: parallel factor analysis. Computational Statistics & Data Analysis, 18(1):39–72, Aug. 1994.
[6] R. Jenatton, N. Le Roux, A. Bordes, G. Obozinski, et al. A latent factor model for highly multi-relational data. In Advances in Neural Information Processing Systems (NIPS 25), 2012.
[7] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st Annual Conference on Artificial Intelligence (AAAI), 2006.
[8] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 26), 2013.
[9] G. Miller. WordNet: a Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.
[10] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems (NIPS 22), 2009.
[11] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi- relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
[12] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: scalable machine learning for linked data. In Proceedings of the 21st international conference on World Wide Web (WWW), 2012.
[13] A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2008.
[14] R. Socher, D. Chen, C. D. Manning, and A. Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. In Advances in Neural Information Processing Systems (NIPS 26), 2013.
[15] I. Sutskever, R. Salakhutdinov, and J. Tenenbaum. Modelling relational data using bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems (NIPS 22), 2009.
[16] J. Weston, A. Bordes, O. Yakhnenko, and N. Usunier. Connecting language and knowledge bases with embedding models for relation extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.
[17] J. Zhu. Max-margin nonparametric latent feature models for link prediction. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

7.2

Bornschein, J. and Bengio, Y. (2015). Reweighted wake-sleep. In ICLR’2015, arXiv:1406.2751. 690

7.3

Bornschein, J., Shabanian, S., Fischer, A., and Bengio, Y. (2015). Training bidirectional Helmholtz machines. Technical report, arXiv:1506.03877. 690

7.4

Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In COLT ’92: Proceedings of the fifth annual workshop on Computational learning theory, pages 144–152, New York, NY, USA. ACM. 17, 139
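This is the original maximum-margin (support vector) training algorithm. The hard-margin problem it solves can be written as the quadratic program below; this is the standard textbook form, not a quote from the paper:

```math
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n
```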

7.5

Bottou, L. (1998). Online algorithms and stochastic approximations. In D. Saad, editor, Online Learning in Neural Networks. Cambridge University Press, Cambridge, UK. 292
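This entry covers online learning by stochastic approximation: update the parameters after each example, with a step size that decays over time. A minimal sketch for least squares (the function name and decay schedule are my own, illustrative choices):

```python
import numpy as np

def sgd_least_squares(X, y, lr0=0.1, epochs=5, seed=0):
    """Online SGD for least squares: one example per update,
    with a decaying step size in the Robbins-Monro style."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            lr = lr0 / (1 + 0.01 * t)          # decaying learning rate
            grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5*(x·w − y)^2
            w -= lr * grad
    return w

# Toy usage: recover a linear relation from noisy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
print(sgd_least_squares(X, y))
```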

7.6

Bottou, L. (2011). From machine learning to machine reasoning. Technical report, arXiv:1102.1808. 394, 396

7.7

Bottou, L. (2015). Multilayer neural networks. Deep Learning Summer School. 434

Bottou, L. and Bousquet, O. (2008). The tradeoffs of large scale learning. In NIPS’2008. 279, 292

7.8

Boulanger-Lewandowski, N., Bengio, Y., and Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML’12. 682

7.9

Boureau, Y., Ponce, J., and LeCun, Y. (2010). A theoretical analysis of feature pooling in vision algorithms. In Proc. International Conference on Machine learning (ICML’10). 339

7.10

Boureau, Y., Le Roux, N., Bach, F., Ponce, J., and LeCun, Y. (2011). Ask the locals: multi-way local pooling for image recognition. In Proc. International Conference on Computer Vision (ICCV’11). IEEE. 339

7.11

Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59, 291–294. 499

7.12

Bourlard, H. and Wellekens, C. (1989). Speech pattern discrimination and multi-layered perceptrons. Computer Speech and Language, 3, 1–19. 454

7.13

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, New York, NY, USA. 91

7.14

Brady, M. L., Raghavan, R., and Slawny, J. (1989). Back-propagation fails to separate where perceptrons succeed. IEEE Transactions on Circuits and Systems, 36, 665–674. 282

7.15

Brakel, P., Stroobandt, D., and Schrauwen, B. (2013). Training energy-based models for time-series imputation. Journal of Machine Learning Research, 14, 2771–2797. 671, 695

7.16

Brand, M. (2003). Charting a manifold. In NIPS’2002, pages 961–968. MIT Press. 160, 516

7.17

Breiman, L. (1994). Bagging predictors. Machine Learning, 24(2), 123–140. 253
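This entry introduces bagging: fit one model per bootstrap resample of the training set and aggregate their predictions (averaging for regression, voting for classification). A minimal sketch with a linear base learner (the choice of base learner here is illustrative, not from the paper):

```python
import numpy as np

def bagged_linear_predictions(X, y, X_test, n_models=25, seed=0):
    """Bagging: fit one base model per bootstrap resample of the
    training data, then aggregate by averaging the predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        preds.append(X_test @ w)
    return np.mean(preds, axis=0)                 # aggregate by averaging

# Toy usage on synthetic regression data.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
print(bagged_linear_predictions(X, y, X[:3]))
```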

7.18

Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, CA. 142

7.19

Bridle, J. S. (1990). Alphanets: a recurrent ‘neural’ network architecture with a hidden Markov model interpretation. Speech Communication, 9(1), 83–92. 182

7.20

Briggman, K., Denk, W., Seung, S., Helmstaedter, M. N., and Turaga, S. C. (2009). Maximin affinity learning of image segmentation. In NIPS’2009, pages 1865–1873. 353

7.21

Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J. D., Mercer, R. L., and Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85. 19

7.22

Brown, P. F., Pietra, V. J. D., DeSouza, P. V., Lai, J. C., and Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18, 467–479. 458
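This entry factors n-gram probabilities through word classes, which shrinks the parameter count from roughly O(V²) toward O(C²) for a vocabulary of V words and C classes. For bigrams, with c(w) denoting the class of word w, the standard factorization is:

```math
p(w_i \mid w_{i-1}) \approx p\bigl(c(w_i) \mid c(w_{i-1})\bigr)\; p\bigl(w_i \mid c(w_i)\bigr)
```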

7.23

Bryson, A. and Ho, Y. (1969). Applied optimal control: optimization, estimation, and control. Blaisdell Pub. Co. 221

7.24

Bryson, Jr., A. E. and Denham, W. F. (1961). A steepest-ascent method for solving optimum programming problems. Technical Report BR-1303, Raytheon Company, Missile and Space Division. 221

7.25

Buciluă, C., Caruana, R., and Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541. ACM. 443

7.26

Burda, Y., Grosse, R., and Salakhutdinov, R. (2015). Importance weighted autoencoders. arXiv preprint arXiv:1509.00519. 695
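This entry tightens the variational lower bound by averaging k importance-weighted samples from the encoder q; k = 1 recovers the standard ELBO, and the bound becomes tighter as k grows:

```math
\mathcal{L}_k(x) = \mathbb{E}_{z_1,\dots,z_k \sim q(z\mid x)}
\left[\log \frac{1}{k}\sum_{i=1}^{k} \frac{p(x, z_i)}{q(z_i \mid x)}\right] \le \log p(x)
```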

7.27

Cai, M., Shi, Y., and Liu, J. (2013). Deep maxout neural networks for speech recognition. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 291–296. IEEE. 190

7.28

Carreira-Perpiñan, M. A. and Hinton, G. E. (2005). On contrastive divergence learning. In R. G. Cowell and Z. Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS’05), pages 33–40. Society for Artificial Intelligence and Statistics. 609
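This entry analyzes contrastive divergence (CD-k), which approximates the RBM log-likelihood gradient by replacing the intractable model expectation with one taken after only k Gibbs-sampling steps started from the data. In the usual shorthand, the weight update is:

```math
\Delta w_{ij} \propto \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{k}
```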

7.29

Caruana, R. (1993). Multitask connectionist learning. In Proc. 1993 Connectionist Models Summer School, pages 372–379. 241

7.30

Cauchy, A. (1847). Méthode générale pour la résolution de systèmes d’équations simultanées. In Compte rendu des séances de l’académie des sciences, pages 536–538. 81, 221

7.31

Cayton, L. (2005). Algorithms for manifold learning. Technical Report CS2008-0923, UCSD. 160

7.32

Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 15. 100

7.33

Chapelle, O., Weston, J., and Schölkopf, B. (2003). Cluster kernels for semi-supervised learning. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15 (NIPS’02), pages 585–592, Cambridge, MA. MIT Press. 240

7.34

Chapelle, O., Schölkopf, B., and Zien, A., editors (2006). Semi-Supervised Learning. MIT Press, Cambridge, MA. 240, 539

7.35

Chellapilla, K., Puri, S., and Simard, P. (2006). High Performance Convolutional Neural Networks for Document Processing. In Guy Lorette, editor, Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule (France). Université de Rennes 1, Suvisoft. http://www.suvisoft.com. 22, 23, 440

7.36

Chen, B., Ting, J.-A., Marlin, B. M., and de Freitas, N. (2010). Deep learning of invariant spatio-temporal features from video. NIPS*2010 Deep Learning and Unsupervised Feature Learning Workshop. 354

7.37

Chen, S. F. and Goodman, J. T. (1999). An empirical study of smoothing techniques for language modeling. Computer, Speech and Language, 13(4), 359–393. 457, 468

7.38

Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., and Temam, O. (2014a). DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, pages 269–284. ACM. 446

7.39

Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274. 25

7.40

Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., et al. (2014b). DaDianNao: A machine-learning supercomputer. In Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on, pages 609–622. IEEE. 446

7.41

Chilimbi, T., Suzue, Y., Apacible, J., and Kalyanaraman, K. (2014). Project Adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). 442

7.42

Cho, K., Raiko, T., and Ilin, A. (2010). Parallel tempering is efficient for learning restricted Boltzmann machines. In IJCNN’2010. 601, 612

7.43

Cho, K., Raiko, T., and Ilin, A. (2011). Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines. In ICML’2011, pages 105–112. 670

7.44

Cho, K., van Merriënboer, B., Gulcehre, C., Bougares, F., Schwenk, H., and Bengio, Y. (2014a). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014). 390, 469, 470
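This entry introduced the gated recurrent unit (GRU) used inside the RNN encoder-decoder. In one common notation (bias terms omitted for brevity), with update gate z_t and reset gate r_t:

```math
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1})\\
r_t &= \sigma(W_r x_t + U_r h_{t-1})\\
\tilde{h}_t &= \tanh\bigl(W x_t + U\,(r_t \odot h_{t-1})\bigr)\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```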

7.45

Cho, K., Van Merriënboer, B., Bahdanau, D., and Bengio, Y. (2014b). On the properties of neural machine translation: Encoder-decoder approaches. ArXiv e-prints, abs/1409.1259. 407

7.46

Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2014). The loss surface of multilayer networks. 282, 283

7.47

Chorowski, J., Bahdanau, D., Cho, K., and Bengio, Y. (2014). End-to-end continuous speech recognition using attention-based recurrent NN: First results. arXiv:1412.1602. 455

7.48

Chrisman, L. (1991). Learning recursive distributed representations for holistic computation. Connection Science, 3(4), 345–366. 468

7.49

Christianson, B. (1992). Automatic Hessians by reverse accumulation. IMA Journal of Numerical Analysis, 12(2), 135–150. 220

7.50

Chrupala, G., Kadar, A., and Alishahi, A. (2015). Learning language through pictures. arXiv 1506.03694. 407

7.51

Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS’2014 Deep Learning Workshop, arXiv 1412.3555. 407, 455

7.52

Chung, J., Gülçehre, Ç., Cho, K., and Bengio, Y. (2015a). Gated feedback recurrent neural networks. In ICML’15. 407

7.53

Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A., and Bengio, Y. (2015b). A recurrent latent variable model for sequential data. In NIPS’2015. 694

Reference Materials

Basics for Data Scientists (2)
https://qiita.com/kaizen_nagoya/items/8b2f27353a9980bf445c

Iwanami Mathematical Dictionary: a good deal, with two editions on one CD
https://qiita.com/kaizen_nagoya/items/1210940fe2121423d777

Iwanami Mathematical Dictionary
https://qiita.com/kaizen_nagoya/items/b37bfd303658cb5ee11e

Anne's Room (Mathematics Learned from People's Names: Iwanami Mathematical Dictionary), English (24)
https://qiita.com/kaizen_nagoya/items/e02cbe23b96d5fb96aa1

<This article is a personal opinion based on my own past experience. It has nothing to do with the organization I currently belong to or my work there.>


Thank you very much for reading to the last sentence.

Please press the like icon 💚 and follow me for your happy life.
