What is Overfitting?

Overfitting in mathematics and statistics is one of the most common tasks consisting in attempts to fit a “model” to a set of training data, so as to be able to make reliable predictions on generally untrained data. In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfitting has poor predictive performance, as it overreacts to minor fluctuations in the training data. The potential for overfitting depends not only on the number of parameters and data but also the conformability of the model structure with the data shape, and the magnitude of model error compared to the expected level of noise or error in the data. Even when the fitted model does not have an excessive number of parameters, it is to be expected that the fitted relationship will appear to perform less well on a new data set than on the data set used for fitting. In particular, the value of the coefficient of determination will shrink relative to the original training data.