Scaling up Machine Learning: A Deep Dive into Feature Scaling
The Impact of Feature Scaling on Model’s Performance
In this tutorial, you’ll walk through feature scaling, a crucial and challenging step in feature engineering that can make or break your model’s accuracy and efficiency.
Throughout this tutorial, you’ll cover the following:
- What Feature Scaling Is and Why It Matters
- Methods of Feature Scaling
- Best Practices and Selection Criteria
What is Feature Scaling?
Feature scaling is a data preprocessing technique used in machine learning to standardize or normalize the range of the independent variables, or features, of a dataset, bringing every feature onto a similar scale.
Why Does It Matter?
Feature scaling brings all features into a similar range, ensuring that each feature contributes equally to training and no single feature dominates the others.
Let's assume you have a dataset with three features: Salary, Age, and IQ Level. Salary ranges from 30,000 to 80,000, Age ranges from 20 to 40, and IQ Level ranges from 90 to 120. If you use these features without scaling them, your machine learning algorithm might give far more importance to the Salary feature, simply because it has a much larger range and variance than the other two. This could lead to a biased and inaccurate model, so you should scale these features for better results. The short sketch below makes the problem concrete.
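Here is a minimal sketch, using made-up values for Salary, Age, and IQ Level (the variable names are purely illustrative), that shows how the unscaled Salary column dominates a Euclidean distance calculation of the kind used by distance-based models such as k-nearest neighbors:
import numpy as np
from sklearn.preprocessing import StandardScaler
# two hypothetical people described by (Salary, Age, IQ Level)
person_a = np.array([30_000.0, 20.0, 120.0])
person_b = np.array([80_000.0, 40.0, 90.0])
# without scaling, the distance is driven almost entirely by Salary
print("Unscaled distance:", np.linalg.norm(person_a - person_b))
# after scaling, every feature contributes on a comparable scale
X_demo = StandardScaler().fit_transform(np.array([person_a, person_b]))
print("Scaled distance:", np.linalg.norm(X_demo[0] - X_demo[1]))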
With this example in mind, you now have a better understanding of feature scaling and why it is important.
Methods of Feature Scaling:
Moving on, let's discuss a few techniques of feature scaling.
1. Standardization:
Standardization is also known as z-score normalization.
This technique transforms the features to have a mean of 0 and a standard deviation of 1. It performs best when the distribution of the data is Gaussian (normal).
The formula for standardization is:
z = (x − μ) / σ
where μ is the mean of the feature and σ is its standard deviation.
from sklearn.preprocessing import StandardScaler
# create a scaler instance and fit it to the features
scaler = StandardScaler()
X_transformed = scaler.fit_transform(X)
Let’s witness the impact of Standard Scaling:
As you can see, the mean of the distribution is very close to 0 and the standard deviation is 1. The outliers in fare are still visible after scaling, because standardization shifts and rescales the values but does not remove extreme points.
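If you would rather confirm this numerically than visually, a quick check like the one below (assuming X_transformed is the NumPy array returned by the scaler above) prints each feature's mean and standard deviation:
import numpy as np
# after standardization, every feature's mean should be ~0 and its standard deviation ~1
print("Means:", X_transformed.mean(axis=0).round(4))
print("Standard deviations:", X_transformed.std(axis=0).round(4))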
2. Normalization:
This technique transforms the features into a specific range, typically between 0 and 1, while preserving the shape of the original distribution and the relationships between values. Let's discuss a few normalization techniques:
ⅰ. Min-Max Scaling:
This method scales each feature into the range 0 to 1: the minimum value maps to 0, the maximum value maps to 1, and every other value becomes a decimal in between. It is useful when the data doesn't follow a normal distribution.
The formula for Min-Max Scaling is:
X_scaled = (X − X_min) / (X_max − X_min)
# normalizing the features with Min-Max scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_transformed = scaler.fit_transform(X)
Let's visually analyze how it impacts the data:
As you can observe, both features have been scaled into the same 0 to 1 range.
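Again, as a quick numerical check (assuming X_transformed is the array returned by MinMaxScaler above), you can verify that each feature now spans exactly 0 to 1:
# after min-max scaling, each feature's minimum is 0 and its maximum is 1
print("Minimums:", X_transformed.min(axis=0))
print("Maximums:", X_transformed.max(axis=0))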
ⅱ. Robust Scaling:
The Robust Scaler works well when outliers are present in the data. Because it uses the median (x̃) and the interquartile range (Q3 − Q1), both of which are insensitive to extreme values, outliers have very little influence on the resulting scale.
The formula for Robust Scaling is:
X_scaled = (X − median) / (Q3 − Q1)
from sklearn.preprocessing import RobustScaler
# scale the features using the median and interquartile range
scaler = RobustScaler()
X_transformed = scaler.fit_transform(X)
Let’s visualize the changes after scaling:
As you can see, the data is scaled using the median and the interquartile range. The outliers are still visible in the scaled data, but they do not affect the scaling of the other values.
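To see this robustness in action, here is a small sketch on a made-up column containing one extreme value; min-max scaling squashes the ordinary values toward 0, while robust scaling keeps them spread out:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler
# a single feature with one extreme outlier (1000)
x = np.array([[10.0], [12.0], [14.0], [16.0], [1000.0]])
print("Min-max scaled:", MinMaxScaler().fit_transform(x).ravel().round(3))
print("Robust scaled:", RobustScaler().fit_transform(x).ravel().round(3))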
ⅲ. Max-Abs Scaler:
Maximum absolute scaling divides each feature by its maximum absolute value, so the scaled values fall within the range -1 to 1. This makes it a good choice when the features contain both positive and negative values.
The MaxAbs Scaler also works very well on sparse data, where most of the observations are 0, because it neither shifts nor centers the data and therefore preserves sparsity.
The formula for the MaxAbs Scaler is:
X_scaled = X / max(|X|)
from sklearn.preprocessing import MaxAbsScaler
# scale each feature by its maximum absolute value
scaler = MaxAbsScaler()
X_transformed = scaler.fit_transform(X)
Here the scaled values end up between 0 and 1 only because every value in the example data is positive; with negative values present they would span -1 to 1. And since the same, non-sparse data was used for every scaling technique, the sparsity advantage isn't visible here, but this technique really shines on sparse data.
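As a small illustration of the sparse case (using a made-up SciPy sparse matrix), MaxAbsScaler rescales the non-zero entries into the -1 to 1 range without touching the zeros, so the data stays sparse:
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler
# a mostly-zero matrix with both positive and negative entries
X_sparse = csr_matrix([[0.0, -4.0, 0.0],
                       [0.0, 0.0, 2.0],
                       [8.0, 0.0, 0.0]])
X_scaled = MaxAbsScaler().fit_transform(X_sparse)
print(X_scaled.toarray())  # entries now lie between -1 and 1, and the zeros stay zero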
Best Practices for Feature Scaling:
Machine learning revolves around experimentation: the more hands-on practice you get, the better your understanding of the field becomes.
Some datasets are hard to read at a glance, so don't limit yourself to a single preprocessing technique. As a rough guide based on the methods above, standardization suits roughly Gaussian features, min-max scaling suits data that needs a fixed 0 to 1 range, robust scaling suits data with outliers, and max-abs scaling suits sparse data. Beyond that, try several scalers and compare the results on your model; this trial-and-error approach is sketched below.
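One practical way to turn this trial-and-error idea into code is to compare several scalers inside a cross-validated pipeline and keep whichever performs best. The snippet below is only a sketch: it assumes you already have a numeric feature matrix X and a target y, and it uses logistic regression purely as an example model.
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler
# swap different scalers into the same pipeline and let cross-validation pick the winner
pipe = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression(max_iter=1000))])
param_grid = {"scaler": [StandardScaler(), MinMaxScaler(), RobustScaler(), MaxAbsScaler()]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print("Best scaler:", search.best_params_["scaler"])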
Summary
In this tutorial, you learned what scaling is, why it matters, and how to choose the best scaler for your data.