🏆 Bias – Variance Tradeoff
‘As a Data Scientist – should I be a specialist or generalist? After all, data science is an ocean!’
This was the question on my mind during the first semester of my Master’s in Analytics, as the professors introduced a plethora of new terminology in every class and I tried to work out what I should focus on.
The answer that I have figured out over the course of these eight long months is – you need to hit the ‘sweet spot’ and be both!
Interestingly, the Bias-Variance tradeoff follows the same principle, i.e. your predictive model should hit a sweet spot between being too specific to your training data and being too general.
Let’s start with defining the two terms:
Bias – how much the average model, taken over all training sets, differs from the desired ‘true’ model, i.e. how well the algorithm is able to model the problem. High bias means the model is too simple to capture the underlying pattern, so it predicts poorly even on the data it was trained on. This problem is called ‘Underfitting’.
Variance – how much the models estimated on different training sets differ from each other, i.e. how much the predictions (and the accuracy) change when the training data changes. High variance leads to a problem called ‘Overfitting’.
The main goal of your predictive model is to minimize the expected error on test data (unseen data). To do this, an ideal model should have low bias and low variance.
Expected squared error = Bias^2 + Variance + Noise (irreducible)
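To make the decomposition concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not part of the discussion above: the ‘true’ function is a sine curve, the models are polynomials fitted with NumPy, and the helper name `bias_variance` is hypothetical. It draws many training sets, fits one model per set, and then measures how far the average prediction is from the truth (bias) and how much the individual predictions scatter around that average (variance).

```python
import numpy as np

def true_fn(x):
    # Assumed 'true' model for this illustration only.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_sets=200, n_train=30, noise_sd=0.3, seed=0):
    """Estimate bias^2, variance and noise for polynomial fits of one degree."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # Draw a fresh noisy training set and fit one model to it.
        x = rng.uniform(0, 1, n_train)
        y = true_fn(x) + rng.normal(0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    # Bias^2: squared gap between the average model and the true function.
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of the individual models around the average model.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance, noise_sd ** 2

results = {degree: bias_variance(degree) for degree in (1, 3, 9)}
for degree, (b, v, n) in results.items():
    print(f"degree={degree}: bias^2={b:.4f} variance={v:.4f} noise={n:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the largest bias, while the degree 9 polynomial should show the largest variance. The noise term stays the same whatever model you pick, which is why it is called irreducible.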
Problem – As model complexity increases, the bias decreases, but the model becomes too specific to the data it is trained on (it overfits), so the variance increases. Low training error combined with high test error may indicate high variance.
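A quick way to see this symptom is to compare training and test error as the model gets more complex. The sketch below is only an illustration under assumed settings: synthetic sine data, a small training set of 15 points, and polynomial degree standing in for ‘model complexity’. The helper names `make_data` and `mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    # Synthetic data: an assumed sine signal plus Gaussian noise.
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(1000)    # stands in for unseen data

def mse(coefs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

errors = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print(f"degree={degree}: train MSE={errors[degree][0]:.3f} "
          f"test MSE={errors[degree][1]:.3f}")
```

The training error can only go down as the degree goes up, because each higher-degree polynomial includes the lower-degree ones as special cases. The test error has no such guarantee, and a widening gap between the two is the warning sign described above.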
Tradeoff – Models with lower bias tend to have higher variance, and vice versa. Some ways to manage this:
- Use more data.
- Choose the sampling strategy carefully and understand how the data was sampled.
- Use cross-validation techniques.
- Use regularization (it penalizes a highly complex model).
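The last two points work well together: cross-validation can be used to choose how strongly to regularize. Here is a rough sketch using only NumPy. It assumes ridge (L2) regularization on polynomial features and 5-fold cross-validation, which is just one of many possible setups, and the helpers `poly_features`, `ridge_fit` and `cv_mse` are hypothetical names written for this example.

```python
import numpy as np

def poly_features(x, degree):
    # Columns are 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution; the intercept column is not penalized.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cv_mse(x, y, degree, lam, k=5, seed=0):
    """Average validation MSE over k folds for one penalty strength."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(poly_features(x[train], degree), y[train], lam)
        pred = poly_features(x[val], degree) @ w
        scores.append(np.mean((pred - y[val]) ** 2))
    return float(np.mean(scores))

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 40)

lambdas = [1e-6, 1e-4, 1e-2, 1.0]
scores = {lam: cv_mse(x, y, degree=9, lam=lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print(scores)
print("penalty with the lowest cross-validated error:", best_lam)
```

A larger penalty shrinks the coefficients toward zero, trading a little extra bias for lower variance. Cross-validation then picks the penalty that does best on data each model did not see during fitting.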
– Archit Shorey