R3 (References on References on References) on "W.a.t.m.i. (What are the most important) statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (20)
R3 on "W.a.t.m.i. (What are the most important) statistical ideas of the past 50 years?" Andrew Gelman, Aki Vehtari (0)
https://qiita.com/kaizen_nagoya/items/a8eac9afbf16d2188901
What are the most important statistical ideas of the past 50 years?
Andrew Gelman, Aki Vehtari
https://arxiv.org/abs/2012.00174
References
20
Buntine, W. L., and Weigend, A. S. (1991). Bayesian back-propagation. Complex Systems 5, 603–643.
References on 20
20.1
Experiments on Learning by Back Propagation.
D. Plaut, S. Nowlan, Geoffrey E. Hinton
Computer Science
1986
References on 20.1
20.1.1
Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks
D. Mackay
Computer Science
1995
20.1.2
Practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks are described.
20.1.3
Bayesian learning for neural networks
Geoffrey E. Hinton, R. Neal
Computer Science
1995
20.1.4
Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
20.1.5
A Practical Bayesian Framework for Backpropagation Networks
D. Mackay
Mathematics, Computer Science
Neural Computation
1992
20.1.6
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks that automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.
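To make the "Occam's razor" flavour of this framework concrete, here is a minimal sketch of the textbook evidence-approximation update, applied to Bayesian linear regression (where the Gaussian approximation is exact) rather than to a neural network. The variable names alpha, beta, and gamma follow the usual convention and the synthetic data are my own, so this illustrates the idea only and is not MacKay's code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: a noisy straight line, fitted with a deliberately over-flexible cubic model
N = 30
x = rng.uniform(-1, 1, N)
y = 0.5 * x - 0.3 + 0.1 * rng.standard_normal(N)
Phi = np.vander(x, 4, increasing=True)                     # features 1, x, x^2, x^3

alpha, beta = 1.0, 1.0                                     # prior precision, noise precision
for _ in range(50):
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi  # posterior precision of the weights
    m = beta * np.linalg.solve(A, Phi.T @ y)               # posterior mean of the weights
    lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)
    gamma = np.sum(lam / (lam + alpha))                    # effective number of well-determined parameters
    alpha = gamma / (m @ m)                                # evidence-maximising re-estimates
    beta = (N - gamma) / np.sum((y - Phi @ m) ** 2)

print("posterior mean weights:", np.round(m, 3))           # higher-order terms shrink towards 0
print("gamma:", round(gamma, 2), " alpha:", round(alpha, 2), " beta:", round(beta, 1))
```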
20.1.7
Bayesian training of backpropagation networks by the hybrid Monte-Carlo method
R. Neal
Computer Science
1992
20.1.8
It is shown that Bayesian training of backpropagation neural networks can feasibly be performed by the Hybrid Monte Carlo method, and the method has been applied to a test problem, demonstrating that it can produce good predictions, as well as an indication of the uncertainty of these predictions.
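Neal's hybrid (Hamiltonian) Monte Carlo training samples network weights by simulating Hamiltonian dynamics on the log-posterior. Below is a minimal, generic sketch of one such sampler applied to a toy 2-D Gaussian "posterior" standing in for a network's weight posterior; the function names (`log_post`, `hmc_sample`), step size, and trajectory length are illustrative choices, not Neal's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(w):
    """Toy log-posterior: a standard 2-D Gaussian stands in for the posterior over weights."""
    return -0.5 * np.dot(w, w)

def grad_log_post(w):
    return -w

def hmc_sample(w, eps=0.1, n_leapfrog=20):
    """One hybrid Monte Carlo transition: leapfrog integration, then accept/reject."""
    p = rng.standard_normal(w.shape)                     # fresh momentum
    w_new, p_new = w.copy(), p.copy()
    p_new = p_new + 0.5 * eps * grad_log_post(w_new)     # initial half step for momentum
    for _ in range(n_leapfrog):
        w_new = w_new + eps * p_new                      # full step for position
        p_new = p_new + eps * grad_log_post(w_new)       # full step for momentum
    p_new = p_new - 0.5 * eps * grad_log_post(w_new)     # trim the last momentum step back to a half step
    # Metropolis accept/reject on the change in total energy H = -log_post + |p|^2 / 2
    h_old = -log_post(w) + 0.5 * np.dot(p, p)
    h_new = -log_post(w_new) + 0.5 * np.dot(p_new, p_new)
    return w_new if rng.random() < np.exp(h_old - h_new) else w

w = np.zeros(2)
samples = []
for _ in range(2000):
    w = hmc_sample(w)
    samples.append(w)
samples = np.array(samples)
print(samples.mean(axis=0), samples.std(axis=0))         # should be near [0, 0] and [1, 1]
```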
20.1.9
Issues in Bayesian Analysis of Neural Network Models
Peter Müller, D. Insua
Computer Science, Medicine
Neural Computation
1998
20.1.10
A very efficient Markov chain Monte Carlo scheme is suggested for inference and prediction with fixed-architecture feedforward neural networks and extended to the variable architecture case, providing a data-driven procedure to identify sensible architectures.
20.1.11
Bayesian Back-Propagation
Wray L. Buntine, A. Weigend
Mathematics, Computer Science
Complex Syst.
1991
20.1.12
Connectionist feed-forward networks, trained with backpropagation, can be used both for nonlinear regression and for (discrete one-of-C) classification. This paper presents approximate Bayesian…
20.1.13
How to Implement A Priori Information: A Statistical Mechanics Approach
J. C. Lemm
Physics, Mathematics
1998
20.1.14
A new general framework is presented for implementing complex a priori knowledge, having in mind especially situations where the number of available training data is small compared to the complexity…
20.1.15
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
J. Bridle
Computer Science
NATO Neurocomputing
1989
20.1.16
Two modifications are explained: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity of feed-forward non-linear networks with multiple outputs.
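The "normalised exponential (softmax)" mentioned above has a one-line definition; a small NumPy sketch follows (the function name and the max-subtraction trick are my own choices, not taken from the paper).

```python
import numpy as np

def softmax(z):
    """Normalised exponential: maps a vector of network outputs to probabilities summing to 1."""
    z = np.asarray(z, dtype=float)
    z = z - z.max()           # subtract the max for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

# Example: three output units
print(softmax([2.0, 1.0, 0.1]))   # approx. [0.659, 0.242, 0.099]
```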
20.1.17
Practical Confidence and Prediction Intervals
T. Heskes
Computer Science
NIPS
1996
20.1.18
This work proposes a new method to compute prediction intervals that is better than existing methods with regard to extrapolation and interpolation in data regimes with a limited amount of data, and yields prediction intervals whose actual confidence levels are closer to the desired confidence levels.
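As a rough illustration of prediction intervals for a nonlinear regressor (not Heskes's exact procedure), the sketch below combines the spread of a bootstrap ensemble (model uncertainty) with a crude residual noise estimate; the polynomial model, the 1.96 normal quantile, and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D regression data with additive noise
x = np.linspace(-1, 1, 60)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)

def fit_poly(xs, ys, degree=5):
    return np.polyfit(xs, ys, degree)

# Bootstrap ensemble: refit the same model on resampled data sets
coefs = []
for _ in range(200):
    idx = rng.integers(0, x.size, x.size)
    coefs.append(fit_poly(x[idx], y[idx]))

x_test = np.linspace(-1, 1, 5)
preds = np.array([np.polyval(c, x_test) for c in coefs])           # shape (200, 5)

mean = preds.mean(axis=0)
model_var = preds.var(axis=0)                                      # ensemble spread: model uncertainty
noise_var = np.mean((np.polyval(fit_poly(x, y), x) - y) ** 2)      # crude noise variance estimate

# Approximate 95% prediction interval: model uncertainty plus noise
half = 1.96 * np.sqrt(model_var + noise_var)
for xt, m, h in zip(x_test, mean, half):
    print(f"x = {xt:+.2f}  prediction = {m:+.3f}  interval = [{m - h:+.3f}, {m + h:+.3f}]")
```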
20.1.19
Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging
Dirk Ormoneit, Volker Tresp
Mathematics, Computer Science
NIPS
1995
20.1.20
Two regularization methods are compared which can be used to improve the generalization capabilities of Gaussian mixture density estimates and Breiman's "bagging", which recently has been found to produce impressive results for classification networks.
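A minimal sketch of the "network averaging"/bagging idea applied to density estimation: fit a mixture model on several bootstrap resamples and average the estimated densities. It uses scikit-learn's GaussianMixture purely as a stand-in for the neural density estimators in the paper; the data and all parameter choices are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Synthetic 1-D data drawn from a two-component mixture
data = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(1, 1.0, 150)]).reshape(-1, 1)

# Bagging: fit a mixture model on each bootstrap resample, then average the densities
models = []
for _ in range(20):
    boot = data[rng.integers(0, len(data), len(data))]
    models.append(GaussianMixture(n_components=2, random_state=0).fit(boot))

grid = np.linspace(-4, 4, 9).reshape(-1, 1)
densities = np.array([np.exp(m.score_samples(grid)) for m in models])
bagged = densities.mean(axis=0)                                    # averaged ("bagged") density estimate

for xg, d in zip(grid.ravel(), bagged):
    print(f"x = {xg:+.1f}  bagged density = {d:.3f}")
```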
20.1.21
Recurrent neural networks can be trained to be maximum a posteriori probability classifiers
S. Santini, A. Bimbo
Mathematics, Computer Science
Neural Networks
1995
This paper proves that supervised learning algorithms used to train recurrent neural networks have an equilibrium point when the network implements a maximum a posteriori probability (MAP)…
20.1.22
An inhibitory weight initialization improves the speed and quality of recurrent neural networks learning
J. Draye, Davor Pavisic, G. Cheron, G. Libert
Computer Science
Neurocomputing
1997
20.1.23
The effect that a negative initial weight distribution has on the learning time and learning quality of recurrent neural networks is explored, and a statistical analysis of the neural transformation is offered to show mathematically that a negative initial weight distribution has a great positive impact on the network behavior.
20.1.24
From data-to dynamics: predicting chaotic time series by hierarchical Bayesian neural nets
Takashi Matsumoto, H. Hamagishi, J. Sugi, M. Saito
Mathematics
1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227)
1998
20.1.25
A hierarchical Bayesian algorithm was used to make predictions of chaotic time series data generated by the Rossler system which is a continuous dynamical system. The scheme infers a nonlinear…
20.1.26
Experiments in predicting the German stock index DAX with density estimating neural networks
Dirk Ormoneit, R. Neuneier
Economics, Computer Science
IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering (CIFEr)
1996
20.1.27
It is claimed that for nontrivial target distributions, density estimating networks should lead to improved predictions, because the latter are capable of embodying more complex probability models for the target noise.
20.1.28
Estimation of Conditional Densities: A Comparison of Neural Network Approaches
R. Neuneier, F. Hergert, W. Finnoff, Dirk Ormoneit
Computer Science
1994
In recent years, neural networks have been successfully used to attack a wide variety of difficult nonlinear regression and classification tasks and their effectiveness, particularly when the…
20.1.29
NAR time-series prediction: a Bayesian framework and an experiment
M. Crucianu, Crucianu Uhry, J. A. D. Beauville, R. Boné
Computer Science
ESANN
1998
The Bayesian framework is extended to Multi-Layer Perceptron models of Non-linear Auto-Regressive time-series and some common simplifications are discussed.
20.1.30
Multilayer feedforward networks are universal approximators
K. Hornik, M. Stinchcombe, H. White
Mathematics, Computer Science
Neural Networks
1989
20.1.31
Learning internal representations by error propagation
D. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
Computer Science, Mathematics
1986
This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion
20.1.32
Intervalles de confiance pour les séries NAR
Travaux des Journées Francophones sur l'Apprentissage
1998
The learning procedure can discover appropriate weights in their kind of network, as well as determine an optimal schedule for varying the nonlinearity of the units during a search.
20.2
A statistical approach to learning and generalization in layered neural networks
E. Levin, Naftali Tishby, S. Solla
Computer Science
COLT 1989
1989
This paper presents a general statistical description of the problem of learning from examples, which is posed as an optimization problem: a search in the network parameter space for a network that minimizes an additive error function of the statistically independent examples.
20.3
Learning curves in large neural networks
H. Seung, H. Sompolinsky, Naftali Tishby
Computer Science
COLT '91
1991
Two model perceptrons, with weights that are constrained to be discrete, that exhibit sudden learning are discussed, and a general classification of generalization curves in models of realizable rules is proposed.
20.4
Predicting the Future: a Connectionist Approach
A. Weigend, B. Huberman, D. Rumelhart
Computer Science
Int. J. Neural Syst.
1990
Since the ultimate goal is accuracy in the prediction, it is found that sigmoid networks trained with the weight-elimination algorithm outperform traditional nonlinear statistical approaches.
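The weight-elimination algorithm referred to here adds a saturating penalty on each weight, so that small weights are driven to zero while the cost of large weights levels off. A small sketch of the usual form of that penalty and its gradient (λ and w0 are the scale hyperparameters; the specific values are illustrative, not taken from the paper):

```python
import numpy as np

def weight_elimination_penalty(w, lam=1e-3, w0=1.0):
    """Weight-elimination penalty lam * sum (w_i^2/w0^2) / (1 + w_i^2/w0^2).

    Small weights are pushed towards zero ("eliminated"), while the cost of a
    large weight saturates near lam, unlike plain weight decay which grows without bound.
    """
    u = (w / w0) ** 2
    penalty = lam * np.sum(u / (1.0 + u))
    grad = lam * (2.0 * w / w0 ** 2) / (1.0 + u) ** 2   # added to the usual error gradient
    return penalty, grad

w = np.array([0.01, 0.5, 3.0])
p, g = weight_elimination_penalty(w)
print(p, g)
```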
20.5
A statistical approach to learning and generalization in layered neural networks
E. Levin, Naftali Tishby, S. Solla
Computer Science
COLT '89
1989
The proposed formalism is applied to the problems of selecting an optimal architecture and the prediction of learning curves and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.
20.6
What Size Net Gives Valid Generalization?
E. Baum, D. Haussler
Mathematics, Computer Science
Neural Computation
1989
It is shown that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ε of future test examples drawn from the same distribution.
20.7
Supervised Learning of Probability Distributions by Neural Networks
E. Baum, F. Wilczek
Computer Science
NIPS
1987
We propose that the back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the…
20.8
A structural learning algorithm with forgetting of link weights
M. Ishikawa
Computer Science
International 1989 Joint Conference on Neural Networks
1989
A novel learning algorithm is proposed, called the structural learning algorithm, which generates a skeletal structure of a network: a network in which a minimum number of links and a minimum number of hidden units are actually used, which solves the first difficulty of trial and error.
20.9
Refinement of approximate domain theories by knowledge-based neural networks
G. Towell, J. Shavlik, M. Noordewier
Computer Science
AAAI 1990
1990
The KBANN system relaxes this constraint through the use of empirical learning methods to refine approximately correct knowledge, used to determine the structure of an artificial neural network and the weights on its links, thereby making the knowledge accessible for modification by neural learning.
20.10
A new error criterion for posterior probability estimation with neural nets
A. El-Jaroudi, J. Makhoul
Computer Science
1990 IJCNN International Joint Conference on Neural Networks
1990
An error criterion for training is introduced which improves the performance of neural nets as posterior probability estimators, as compared to using least squares, and is similar to the Kullback-Leibler information measure.
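To see why a relative-entropy style criterion behaves differently from least squares when the targets are class probabilities, here is a tiny numerical comparison; the cross_entropy function below is the standard criterion, used as an illustration rather than the paper's exact formulation.

```python
import numpy as np

def squared_error(p_true, p_est):
    return np.sum((p_true - p_est) ** 2)

def cross_entropy(p_true, p_est):
    """Kullback-Leibler-style criterion (equal to KL divergence up to the entropy of p_true)."""
    return -np.sum(p_true * np.log(p_est))

p_true = np.array([1.0, 0.0])          # one-of-two target
for p1 in (0.6, 0.9, 0.99):
    p_est = np.array([p1, 1.0 - p1])
    print(f"p_est = {p1:.2f}  squared error = {squared_error(p_true, p_est):.4f}  "
          f"cross-entropy = {cross_entropy(p_true, p_est):.4f}")
```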
20.11
Minimum complexity density estimation
A. Barron, T. Cover
Mathematics, Computer Science
IEEE Trans. Inf. Theory
1991
An index of resolvability is proved to bound the rate of convergence of minimum complexity density estimators as well as the information-theoretic redundancy of the corresponding total description length to demonstrate the statistical effectiveness of the minimum description-length principle as a method of inference.
20.12
Learning internal representations by error propagation
D. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
Computer Science, Mathematics
1986
This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion
20.13
Unknown Attribute Values in Induction
J. R. Quinlan
Computer Science
ML
1989
This paper compares the effectiveness of several approaches to the development and use of decision tree classifiers as measured by their performance on a collection of datasets.
20.14
Probabilistic reasoning in intelligent systems
J. Pearl
Computer Science
1988
The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic.
20.15
Statistical Decision Theory and Bayesian Analysis
J. Berger
Mathematics
1988
An overview of statistical decision theory, which emphasizes the use and application of the philosophical ideas and mathematical structure of decision theory. The text assumes a knowledge of basic…
20.16
Statistical Decision Theory and Bayesian Analysis (New York, Springer-Verlag, 1985)
1985
20.17
A Decision Theoretic Generalization of the PAC Learning Model and Its Application to Some Feed-forward Neural Networks
Information and Control
1991
20.18
A Decision Theoretic Generalization of the PAC Learning Model and Its Application to Some Feed-forward Neural Networks
Information and Control, to appear (1991)
1991
20.19
Learning Classification Trees
Technical Report FIA-91-30, RIACS and NASA Ames Research Center, Moffett Field, CA, 1991; submitted to Proceedings of the Third International Workshop on Artificial Intelligence and Statistics
1991
20.20
Minimum Complexity Density Estimation
IEEE Transactions on Information Theory
1991
20.21
What Size Net Gives Valid Generalization?
Neural Computation
1989
20.22
Estimation and Inference by Compact Encoding
Journal of the Royal Statistical Society B
1987
20.23
Stochastic Complexity
Journal of the Royal Statistical Society B
1987
A Practical Bayesian Framework for Backprop Networks
submitted to Neural Computation (1991)
A Theory of Learning Classification Rules
1991
Calculating Second Derivatives on Feedforward Networks
submitted (1991)
1991
Generalised Performance of Bayes Optimal Classification Algorithm for Learning a Perceptron
COLT'91: 1991 Workshop on Computational Learning Theory
1991
Learning Classification Trees, Technical Report FIA-91-30, RIACS and NASA Ames Research Center
submitted to Proceedings of the Third International Workshop on Artificial Intelligence and Statistics
1991
Learning Curves in Large Neural Networks
COLT '91: Workshop on Computational Learning Theory
1991
31
Note on Generalization, Regularization, and Architecture Selection in Non-linear Learning Systems
Proceedings of the IEEE Workshop on Neural Networks for Signal Processing (Los Alamitos, CA, IEEE Computer Society, 1991)
1991
Soft Competitive Adaption (Doctoral dissertation)
Technical Report CMU-CS-91-126 from the School of Computer Science
1991
Stone-Weierstrass Theorem and Its Application to Neural Networks
IEEE Transactions on Neural Networks
1991
Unifying Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
COLT'91: Workshop on Computational Learning Theory (San Mateo, CA, Morgan Kaufmann, 1991)
1991
A New Error Criterion for Posterior Probability Estimation with Neural Nets
International Joint Conference on Neural Networks
1990
Bayesian Methods and Entropy in Economics and Econometrics
Maximum Entropy and Bayesian Methods, edited by W. T. Grandy, Jr. and L. Schlick (Norwell, MA, Kluwer, 1990)
1990
41
Equivalence Proofs for Multi-layer Perceptron Classifiers and the Bayesian Discrimination Function
Proceedings of the 1990 Connectionist Models Summer School, edited by David S. Touretzky, Jeffrey L. Elman, Terrence J. Sejnowski, and Geoffrey E. Hinton (San Mateo, CA, Morgan Kaufmann, 1990)
1990
Generalised Additive Models (London, Chapman and Hall, 1990)
1990
Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling
Journal of the American Statistical Association, 85(412) (1990) 972-985
1990
Objective Bayesianism and Geometry
Maximum Entropy and Bayesian Methods, edited by P. F. Fougere (Norwell, MA, Kluwer, 1990)
1990
Optimal Brain Damage
589 in Advances in Neural Information Processing Systems 2 (NIPS*89)
1990
Refinement of Approximate Domain Theories by Knowledge-based Neural Networks
Eighth National Conference on Artificial Intelligence
1990
The Stone-Weierstrass Theorem and Its Application to Neural Networks
IEEE Transactions on Neural Networks
1990
51
A Statistical Approach to Learning and Generalization in Layered Neural Networks
COLT '89: Second Workshop on Computational Learning Theory
1989
A Structural Learning Algorithm with Forgetting of Link Weights, Technical Report TR-90-7, Electrotechnical Laboratory
International Joint Conference on Neural Networks
1989
Bayesian Statistics (New York, Wiley, 1989)
1989
Generalization and Network Design Strategies
Technical Report CRG-TR-89-4, Department of Computer Science, University of Toronto, Toronto, M5S 1A4, Canada (1989)
1989
Generalised Linear Models
1989
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
F. Fougelman-Soulie and J. Herault, editors, Neuro-computing: Algorithms, Architectures and Applications (New York, Springer-Verlag, 1989)
1989
Wild, Nonlinear Regression
1989
A Unified Framework for Connectionist Systems
Biological Cybernetics, 59 (1988) 109-120
1988
61
Probabilistic Reasoning in Intelligent Systems (Los Altos, CA, Morgan Kaufmann, 1988)
1988
Hierarchical Bayesian Analysis Using Monte Carlo Integration: Computing Posterior Distributions When There Are Many Possible Models
The Statistician, 36 (1987) 211-219
1987
Supervised Learning of Probability Distributions by Neural Networks
Neural Information Processing Systems (NIPS)
1987
A Framework for Comparing Alternative Formalisms for Plausible Reasoning
pages 210-214 in Fifth National Conference on Artificial Intelligence, Philadelphia, PA (1986)
1986
Experiments on Learning by Back-propagation
1986
Stochastic Complexity
Journal of the Royal Statistical Society B
1986
Advanced Econometrics
1985
Estimation of Dependencies Based on Empirical Data
1982
71
Estimation of Dependencies Based on Empirical Data (New York, 1982)
1982
A Practical Bayesian Framework for Backprop Networks
Soft Competitive Adaption (Doctoral dissertation, Carnegie Mellon University, 1991); available as Technical Report CMU-CS-91-126 from the School of Computer Science
Reference materials (References)
Data Scientist Basics (2)
Iwanami Dictionary of Mathematics: a good deal, with two editions included on the CD
Iwanami Dictionary of Mathematics
Ann's room (learning mathematics from people's names: Iwanami Dictionary of Mathematics), English (24)
<This article is a personal impression based on my own past experience. It has no relation to the organization I currently belong to or to my work there.>
Document history
ver. 0.01 first draft, 20211017
ver. 0.02 thank-you note added, 20230503
Thank you very much for reading to the last sentence.
Please press the like icon 💚 and follow me for your happy life.