
In the previous chapter it was hinted, by means of a simple example, that to obtain a good representation of the process being modelled one needs to estimate the model complexity, the parameters and the noise characteristics. It was also mentioned that it is beneficial to incorporate a priori knowledge so as to mitigate the ill-conditioned nature of the learning problem. If we follow these prescriptions, we are far more likely to obtain a model that generalises well.
This chapter will briefly review the classical approaches to learning and generalisation in the neural networks field. Aside from regularisation with noise and committees of estimators, most of the standard methods fall into two broadly overlapping categories: penalised likelihood methods and predictive assessment methods. Penalised likelihood methods involve placing a penalty term either on the model dimension or on the smoothness of the response (Hinton, 1987; Le Cun et al., 1990; Poggio and Girosi, 1990).
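To make the penalised likelihood idea concrete, a schematic form of such a criterion (the notation here is generic rather than taken from the works cited above) is

\[
E(\mathbf{w}) \;=\; -\ln p(\mathcal{D} \mid \mathbf{w}) \;+\; \lambda\, \Omega(\mathbf{w}),
\]

where the first term measures the fit to the data \(\mathcal{D}\), \(\Omega(\mathbf{w})\) is a penalty such as the weight-decay term \(\|\mathbf{w}\|^{2}\) that favours smooth responses (or, alternatively, a count of the effective number of parameters that penalises model dimension), and \(\lambda\) is a regularisation coefficient controlling the trade-off between fit and complexity.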
Predictive assessment strategies, such as cross-validation, jackknife or bootstrap methods (Ripley, 1996; Stone, 1974; Stone, 1978; Wahba and Wold, 1969), typically entail dividing the training data set into M distinct subsets. The model is then trained on M - 1 of the subsets and its performance is validated on the omitted subset, and the procedure is repeated for each of the M subsets. This form of predictive assessment is often used to set the penalty parameters in the penalised likelihood formulations.
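The M-fold procedure just described can be summarised in a few lines of code. The sketch below is purely illustrative and is not part of the thesis: it uses a polynomial model fitted by least squares as a stand-in for a neural network, and the function name m_fold_cross_validation and the data are hypothetical.

```python
import numpy as np

def m_fold_cross_validation(x, y, degree, M=5, seed=0):
    """Estimate the out-of-sample error of a polynomial model of the given
    degree by M-fold cross-validation (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    folds = np.array_split(indices, M)
    errors = []
    for k in range(M):
        # Hold out fold k for validation; train on the remaining M - 1 folds.
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(M) if j != k])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        predictions = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((predictions - y[val_idx]) ** 2))
    # Average validation error over the M repetitions.
    return np.mean(errors)

if __name__ == "__main__":
    # Noisy data from a cubic process; compare candidate model complexities.
    x = np.linspace(-1.0, 1.0, 60)
    y = x ** 3 - 0.5 * x + 0.1 * np.random.default_rng(1).standard_normal(60)
    for degree in (1, 3, 9):
        print(degree, m_fold_cross_validation(x, y, degree))
```

In such a scheme the degree (or, more generally, the penalty parameter) with the smallest cross-validated error would be retained.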
These methods tend to lack a general and rigorous framework for incorporating a priori knowledge into the modelling process. Furthermore, they do not provide suitable foundations for the study of generalisation in sequential learning. To surmount these limitations, the Bayesian learning paradigm will be adopted in this thesis. This approach will allow us to incorporate a priori knowledge into the modelling process and to compute, jointly and within a probabilistic framework, the model parameters,
noise characteristics, model structure and regularisation coefficients. It will also allow
us to do this sequentially.
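Schematically, and with generic notation that is not tied to any particular model, the joint treatment referred to above targets a posterior distribution over all of the unknowns,

\[
p(\boldsymbol{\theta}, k, \sigma^{2}, \lambda \mid \mathcal{D}) \;\propto\;
p(\mathcal{D} \mid \boldsymbol{\theta}, k, \sigma^{2})\,
p(\boldsymbol{\theta} \mid \lambda, k)\,
p(\sigma^{2})\, p(\lambda)\, p(k),
\]

where \(\boldsymbol{\theta}\) denotes the model parameters, \(k\) the model structure, \(\sigma^{2}\) the noise characteristics and \(\lambda\) the regularisation coefficients; the prior \(p(\boldsymbol{\theta} \mid \lambda, k)\) is the natural place in which to encode a priori knowledge. Because such a posterior can be updated each time new data arrive, it also provides the foundation for the sequential treatment mentioned above.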