Monday, April 19, 2010

Bias Vs Variance Dilemma

The bias versus variance dilemma arises in almost every kind of data modelling method. As per Wikipedia, the variance of a random variable is the expected value of its squared deviation from its mean. In other words, it measures how widely the values of the variable are spread around the expected value, weighted by their probabilities. The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. A very good article on this topic is available here. Without repeating much of that content, I will simply highlight the key points that make this topic easier to understand.

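  1. Var(x) = E(x^2) - [E(x)]^2, where E(.) denotes the expected value. This can be rewritten as

    E(x^2) = Var(x) + [E(x)]^2

    If we replace x by e (the approximation error of an estimator), the above equation becomes

    E(e^2) = Var(e) + [E(e)]^2
    MSE = Var(e) + Bias^2

    Here E(e^2) is the mean square error (MSE), E(e) is the bias and, for a fixed true value, Var(e) is the variance of the estimator.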

    Hence, for a given mean square error, there is a trade-off between the variance and the bias: if one increases, the other must decrease.

  2. The complexity of an estimator model is expressed in terms of its number of parameters. The effect of complexity on the bias and the variance of the estimator is explained here. In brief, it can be said that
    1. A low-complexity model leads to low variance but large bias.
    2. A highly complex model leads to low bias but large variance.

  3. Large variance implies that the estimator is too sensitive to the particular data set. Hence the model generalizes poorly: it performs excellently on the design (or training) data but poorly on test data. This is the case of overfitting. Large variance is observed in complex models with a large number of parameters (over-parameterization). Note that because of its high complexity, the model fits the design data set very well. Hence, the error at individual points in that data set is very low, and the model has a low bias.

  4. Large bias implies that the model is too simple, so very few data points lie on the regression curve. Hence, the errors at individual points (in a given data set) are high, leading to a large MSE. But since such a simple model is only weakly influenced by the individual points, its performance does not differ much across data sets. Hence, it has a low variance. The performance remains about the same over design and test data sets.
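
The four points above can be checked numerically. The following is only a minimal sketch under assumptions that are not part of the original discussion: the target is a sine curve, the noise is Gaussian, and the "model" is a piecewise-constant fit whose k parameters are the mean outputs in k equal-width bins on [0, 1). All names (true_f, fit_bins, predict, bias_variance) are hypothetical and chosen for illustration. The error is measured against the noise-free target, so MSE = Var + Bias^2 holds without an extra noise term.

```python
import math
import random

def true_f(x):
    # Assumed target function (illustrative choice, not from the post).
    return math.sin(2 * math.pi * x)

def fit_bins(xs, ys, k):
    # Piecewise-constant model with k parameters: mean of y in each of k bins.
    sums = [0.0] * k
    counts = [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback value for bins with no data
    return [s / c if c else overall for s, c in zip(sums, counts)]

def predict(params, x):
    k = len(params)
    return params[min(int(x * k), k - 1)]

def bias_variance(k, n_points=30, n_trials=500, noise=0.3, seed=0):
    # Fit the k-parameter model on many independent training sets and
    # measure bias^2, variance and MSE of its predictions on a fixed grid.
    rng = random.Random(seed)
    grid = [(i + 0.5) / 50 for i in range(50)]
    preds = [[] for _ in grid]
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
        params = fit_bins(xs, ys, k)
        for j, x in enumerate(grid):
            preds[j].append(predict(params, x))
    bias2 = var = mse = 0.0
    for j, x in enumerate(grid):
        p = preds[j]
        mean_p = sum(p) / len(p)
        bias2 += (mean_p - true_f(x)) ** 2                    # [E(e)]^2
        var += sum((v - mean_p) ** 2 for v in p) / len(p)     # Var(e)
        mse += sum((v - true_f(x)) ** 2 for v in p) / len(p)  # E(e^2)
    m = len(grid)
    return bias2 / m, var / m, mse / m

for k in (1, 5, 30):
    b, v, mse = bias_variance(k)
    print(f"k={k:2d}  bias^2={b:.4f}  variance={v:.4f}  mse={mse:.4f}")
```

With these assumed settings, k = 1 plays the role of the too-simple model of point 4, k = 30 (as many parameters as data points) plays the over-parameterized model of point 3, and an intermediate k sits between the two extremes.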

2 comments:

Unknown said...

So a fully trained (or a good model) will have small bias with large variance. Doesn't sound good.

Small bias is understood but variance w.r.to the trained ones (or with the pre-defined set) doesn't give large variance.

Swagat said...

@Srinivas: A good model should have a low bias and a low variance. Usually, it is difficult to achieve very low values for both bias and variance at the same time. Hence, one needs to make a trade-off.

The model should be neither too complex (a large number of parameters) nor too simple (very few parameters). The best model lies somewhere in between.

During over-fitting, where a given network has been heavily trained on a pre-defined data set, the error would be small for this set. However, if you take a new data set (not used in the training), the error will be large. This is not desirable. We want our model to give reasonable predictions for new data points.

 