Underfitting
What is Underfitting?
Underfitting is a condition in which a machine learning model is too simple to capture the underlying patterns in the data. It often arises when the model has too few parameters or when the training data is not representative of the problem space. An underfit model performs poorly on both the training data and new, unseen data, because it fails to capture the essential trends and structure of the training set; this failure mode is known as high bias. In practice, an underfit model struggles to make accurate predictions or decisions because it has not learned enough from the data it was given. Addressing underfitting typically means increasing the model's complexity, using more relevant features, or improving the quality of the training data.
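The high-bias signature of underfitting is easy to see on synthetic data: a model that is too simple has a large error even on the very data it was trained on. A minimal sketch, assuming a made-up quadratic dataset, compares a straight-line fit (too simple) with a quadratic fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a clearly nonlinear (quadratic) trend.
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(scale=0.3, size=x.size)

def training_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    preds = np.polyval(coeffs, x)
    return float(np.mean((preds - y) ** 2))

mse_linear = training_mse(1)     # a straight line: too simple for this data
mse_quadratic = training_mse(2)  # complex enough to capture the curve

print(f"linear MSE:    {mse_linear:.3f}")
print(f"quadratic MSE: {mse_quadratic:.3f}")
# The linear model's error stays large even on its own training data --
# the signature of underfitting (high bias).
```

The key observation is that the linear model's *training* error is large: no amount of extra training data fixes that, because the model family itself cannot represent the curve.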
Examples
- Predicting House Prices: Imagine using a linear regression model to predict house prices based on just one feature, such as square footage. If the model ignores other important factors like location, number of bedrooms, and age of the house, it will likely underfit, leading to inaccurate price predictions.
- Classifying Emails: If you use a simple decision tree with very few branches to classify emails as spam or not spam, the model might miss important patterns in the data such as the presence of certain keywords or the email's sender. This would result in many misclassifications.
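The house-price example above can be sketched numerically. In this illustration (the data is hypothetical and generated for the demo), price depends on both square footage and bedroom count, so a least-squares fit that sees only square footage underfits:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical synthetic housing data: price depends on both
# square footage and number of bedrooms.
sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)
price = 150 * sqft + 20_000 * bedrooms + rng.normal(scale=10_000, size=n)

def fit_mse(features):
    """Least-squares fit with an intercept; return training MSE."""
    X = np.column_stack([np.ones(n), *features])
    beta, *_ = np.linalg.lstsq(X, price, rcond=None)
    return float(np.mean((X @ beta - price) ** 2))

mse_one = fit_mse([sqft])             # ignores bedrooms: underfits
mse_both = fit_mse([sqft, bedrooms])  # uses both relevant features

print(f"sqft only:       {mse_one:,.0f}")
print(f"sqft + bedrooms: {mse_both:,.0f}")
```

Adding the omitted feature is one of the standard remedies for underfitting mentioned below: the model family becomes expressive enough to capture the part of the signal it was previously forced to ignore.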
Additional Information
- Underfitting is the opposite of overfitting, in which a model is so complex that it captures noise in the training data and generalizes poorly.
- Techniques to combat underfitting include adding more features, choosing a more complex model, and improving the quality of the training data.