Contents

- 1 Introduction
- 2 Feature Scaling in Machine Learning
- 3 Feature Scaling Techniques
- 3.1 Standardization
- 3.2 Min-Max Normalization
- 3.3 MaxAbs Scaler
- 3.4 Robust-Scaler

- 4 Sklearn Feature Scaling Examples
- 4.1 About Dataset
- 4.2 Importing Necessary Libraries
- 4.3 Loading Dataset
- 4.4 Regression without Feature Scaling
- 4.5 Applying Sklearn StandardScaler
- 4.6 Applying Sklearn MinMaxScaler
- 4.7 Apply RobustScaler in Sklearn
- 4.8 Summary

## Introduction

In this tutorial, we will go through various options of feature scaling in the Sklearn library – StandardScaler, MinMaxScaler, RobustScaler, and MaxAbsScaler. We will understand the formulae of these techniques in brief and then go through practical examples of the implementation of each of them for easy understanding of the beginners.

**Feature Scaling in Machine Learning**

Feature Scaling is used to normalize the data features of our dataset so that all features are brought to a common scale. This is a very important data preprocessing step before building any machine learning model, otherwise, the resulting model will produce underwhelming results.

To understand why feature scaling is necessary let us take an example, suppose you have several independent features like age, employee salary, and height(in feet). Here the possible values of these features lie within the range (21–100 Years), (25,000–1,50,000 INR), and (4.5 – 7 feet) respectively. As you can see each feature has its own range and when these numbers are fed to the model during the training process, the model will not understand the skewness in the data range. This is because it does not understand years, salary, height all it will see are numbers varying across a big range and all this will result in a bad model.

Feature Scaling will help to bring these vastly different ranges of values within the same range. For example, values of years, salary, height can be normalized in the range from (0,1) and thus giving a more quality input to the ML model.

**Also Read –**Why and How to do Feature Scaling in Machine Learning

**Feature Scaling Techniques**

**Standardization**

Standardization is a useful method to scales independent variables so that it has a distribution with 0 mean value and variance equals 1. However, Standard Scaler is not a good option if our datapoints aren’t normally distributed i.e they do not follow Gaussian distribution.

In Sklearn standard scaling is applied using StandardScaler() function of sklearn.preprocessing module.

**Min-Max Normalization**

In Min-Max Normalization, for any given feature, the minimum value of that feature gets transformed to 0 while the maximum value will transform to 1 and all other values are normalized between 0 and 1. This method however has a drawback as it is sensitive to outliers.

In Sklearn Min-Max scaling is applied using MinMaxScaler() function of sklearn.preprocessing module.

**MaxAbs Scaler**

In MaxAbs-Scaler each feature is scaled by using its maximum value. At first, the absolute maximum value of the feature is found and then the feature values are divided with it. Just like MinMaxScaler MaxAbs Scaler are also sensitive to outliers.

In Sklearn MaxAbs-Scaler is applied using MaxAbsScaler() function of sklearn.preprocessing module.

**Robust-Scaler**

Robust-Scaler is calculated by using the interquartile range(IQR), here, IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). It can handle outlier data points as well.

In Sklearn Robust-Scaler is applied using RobustScaler() function of sklearn.preprocessing module.

**Sklearn Feature Scaling Examples**

In this section, we shall see examples of Sklearn feature scaling techniques of StandardScaler, MinMaxScaler, RobustScaler, and MaxAbsScaler. For this purpose, we will do regression on the housing dataset, and first, see results without feature scaling and then compare the results by applying feature scaling.

**About Dataset**

The dataset is a California housing dataset that contains various features of the house like its location, age, no. of rooms, house value, etc. The problem statement is to predict the house value given other independent feature variables in the dataset. It contains 20433 rows and 9 columns.

**Importing Necessary Libraries**

To start with let us load all the required libraries required for our examples.

In[1]:

import pandas as pdimport numpy as npfrom sklearn.preprocessing import StandardScaler,MinMaxScaler,MaxAbsScaler,RobustScalerfrom sklearn.model_selection import train_test_splitfrom sklearn.neighbors import KNeighborsRegressorfrom sklearn.metrics import accuracy_scorefrom sklearn import preprocessing

**Loading Dataset**

Next, we load the dataset in a data frame and drop the non-numerical feature ocean_proximity. The top 10 rows of the dataset are then observed.

In[2]:

#reading the datasetdf=pd.read_csv(r"C:\Users\Veer Kumar\Downloads\MLK internship\FeatureScaling\housing.csv")df.drop(['ocean_proximity'],axis=1,inplace=True)df.head(10)

Out[2]:

longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | |
---|---|---|---|---|---|---|---|---|---|

0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |

1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |

2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |

3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |

4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |

5 | -122.25 | 37.85 | 52.0 | 919.0 | 213.0 | 413.0 | 193.0 | 4.0368 | 269700.0 |

6 | -122.25 | 37.84 | 52.0 | 2535.0 | 489.0 | 1094.0 | 514.0 | 3.6591 | 299200.0 |

7 | -122.25 | 37.84 | 52.0 | 3104.0 | 687.0 | 1157.0 | 647.0 | 3.1200 | 241400.0 |

8 | -122.26 | 37.84 | 42.0 | 2555.0 | 665.0 | 1206.0 | 595.0 | 2.0804 | 226700.0 |

9 | -122.25 | 37.84 | 52.0 | 3549.0 | 707.0 | 1551.0 | 714.0 | 3.6912 | 261100.0 |

**Regression without Feature Scaling**

Let us first create the regression model with KNN without applying feature scaling. It can be seen that the accuracy of the regression model is mere 24% without feature scaling.

In [3]:

# Train Test SplitX=df.iloc[:,:-1]y=df.iloc[:,[7]]X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)# Creating Regression Modelclf = KNeighborsRegressor()clf.fit(X_train, y_train)# Accuracy on Tesing Dataclf.predict(X_test)score=clf.score(X_test,y_test)print("Accuracy for our testing dataset without Feature scaling is : {:.3f}%".format(score*100) )

Out[3]:

Accuracy for our testing dataset without Feature scaling is : 24.722%

**Applying Sklearn StandardScaler**

Let us now create the regression model by applying the standard scaler during data preprocessing.

First, the dataset is split into train and test. Then a StandardScaler object is created using which the training dataset is fit and transformed and with the same object, the test dataset is also transformed.

In [4]:

# Train Test SplitX=df.iloc[:,:-1]y=df.iloc[:,[7]]X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)#Creating StandardScaler Objectscaler = preprocessing.StandardScaler()X_train = scaler.fit_transform(X_train)X_test = scaler.transform(X_test)#Seeing the scaled values of X_trainX_train.head()

Out[4]:

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|---|

776 | -1.277402 | 0.948735 | -0.765048 | -0.812050 | -0.880144 | -0.898309 | -0.827732 | 0.249804 |

1969 | -0.653503 | 1.379387 | -1.638653 | 0.357171 | 0.156937 | 0.245374 | 0.323227 | 0.111380 |

20018 | 0.150079 | -0.633443 | 1.855769 | -0.181906 | -0.279729 | -0.427070 | -0.267947 | 0.133652 |

8548 | 0.579322 | -0.820683 | 0.426233 | -0.533359 | -0.545525 | -0.818004 | -0.615851 | 1.691745 |

9847 | -1.397190 | 1.262362 | 0.029139 | -0.351683 | -0.585869 | -0.583267 | -0.563534 | 0.502220 |

Now that the standard scaler is applied, let us now train the regression model and check its accuracy. It can be seen that the accuracy of the model is now an impressive 98.419%

In [5]:

# Creating Regression Modelmodel=KNeighborsRegressor()model.fit(X_train,y_train)# Accuracy on Tesing Datay_test_hat=model.predict(X_test)score=model.score(X_test,y_test)print("Accuracy for our testing dataset using Standard Scaler is : {:.3f}%".format(score*100) )

Out[5]:

Accuracy for our testing dataset using Standard Scaler is : 98.419%

**Applying Sklearn MinMaxScaler**

Just like earlier, a MinMaxScaler object is created using which the training dataset is fit and transformed and with the same object, the test dataset is transformed.

In [6]:

# Train Test SplitX=df.iloc[:,:-1]y=df.iloc[:,[7]]X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)#Creating MinMax Objectmm = preprocessing.MinMaxScaler()X_train = mm.fit_transform(X_train)X_test = mm.transform(X_test)#Seeing the scaled values of X_trainX_train.head()

Out[6]:

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
---|---|---|---|---|---|---|---|---|---|

0 | 0.211155 | 0.567481 | 0.784314 | 0.022331 | 0.019863 | 0.008941 | 0.020556 | 0.539668 | 0.902266 |

1 | 0.212151 | 0.565356 | 0.392157 | 0.180503 | 0.171477 | 0.067210 | 0.186976 | 0.538027 | 0.708247 |

2 | 0.210159 | 0.564293 | 1.000000 | 0.037260 | 0.029330 | 0.013818 | 0.028943 | 0.466028 | 0.695051 |

3 | 0.209163 | 0.564293 | 1.000000 | 0.032352 | 0.036313 | 0.015555 | 0.035849 | 0.354699 | 0.672783 |

4 | 0.209163 | 0.564293 | 1.000000 | 0.041330 | 0.043296 | 0.015752 | 0.042427 | 0.230776 | 0.674638 |

Now this scaled data is used for creating the regression model and again it can be seen that the accuracy of the model is quite good at 98.55%

In [7]:

# Creating Regression Modelmodel=KNeighborsRegressor() model.fit(X_train,y_train)# Accuracy on Tesing Datay_test_hat=model.predict(X_test) score=model.score(X_test,y_test)print("Accuracy for our testing dataset using MinMax Scaler is : {:.3f}%".format(score*100) )

Out [7]:

Accuracy for our testing dataset using MinMax Scaler is : 98.559%

**Applying MaxAbsScaler in Sklearn**

Create a MaxAbsScaler object followed by applying the fit_transform method on the training dataset and then transform the test dataset with the same object.

In [8]:

# Train Test SplitX=df.iloc[:,:-1]y=df.iloc[:,[7]]X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)#Creating MaxAbsScaler Objectmab=MaxAbsScaler()X_train = mab.fit_transform(X_train)X_test = mab.transform(X_test)

Next, we create the KNN regression model using the scaled data and it can be seen that the test accuracy is 99.38%

In [9]:

# Creating Regression Modelmodel=KNeighborsRegressor() model.fit(X_train,y_train)# Accuracy on Tesing Datay_test_hat=model.predict(X_test) score=model.score(X_test,y_test)print("Accuracy for our testing dataset using MinMax Scaler is : {:.3f}%".format(score*100) )

Out[9]:

Accuracy for our testing dataset using MaxAbs Scaler is : 99.382%

**Apply RobustScaler in Sklearn**

Create a RobustScaler object followed by applying the fit_transform method on the training dataset and then transform the test dataset with the same object.

In [10]:

# Train Test SplitX=df.iloc[:,:-1]y=df.iloc[:,[7]]X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)#Creating RobustScaler Objectrob =RobustScaler()X_train = rob.fit_transform(X_train)X_test = rob.transform(X_test)

Finally, we create the regression model and test the accuracy which turns out to be 98.295%

In [11]:

# Creating Regression Modelmodel=KNeighborsRegressor() model.fit(X_train,y_train)# Accuracy on Tesing Datay_test_hat=model.predict(X_test) score=model.score(X_test,y_test)print("Accuracy for our testing dataset using MinMax Scaler is : {:.3f}%".format(score*100) )

Out[11]:

Accuracy for our testing dataset using Robust Scaler is : 98.295%

**Summary**

From the below observation, it is quite evident that feature scaling is a very important step of data preprocessing before creating the ML model. Without feature scaling the accuracy was very poor and after different feature scaling techniques were applied the test accuracy became above 98%.

Type of Scaling | Test_Accuracy |
---|---|

No Feature Scaling | 24.722% |

StandardScaler | 98.419% |

MinMaxScaler | 98.559% |

MaxAbsScaler | 99.382% |

RobustScaler | 98.295% |

## FAQs

### Which is better StandardScaler or MinMaxScaler? ›

StandardScaler follows Standard Normal Distribution (SND). Therefore, it makes mean = 0 and scales the data to unit variance. MinMaxScaler scales all the data features in the range [0, 1] or else in the range [-1, 1] if there are negative values in the dataset.

### What is feature scaling Sklearn? ›

Feature Scaling or Standardization: It is **a step of Data Pre Processing that is applied to independent variables or features of data**. It basically helps to normalize the data within a particular range. Sometimes, it also helps in speeding up the calculations in an algorithm. Package Used: sklearn.preprocessing.

### What is StandardScaler in Sklearn preprocessing? ›

Python sklearn StandardScaler() function

Python sklearn library offers us with StandardScaler() function to **standardize the data values into a standard format**. Syntax: object = StandardScaler() object. fit_transform(data) According to the above syntax, we initially create an object of the StandardScaler() function.

### What is the function of Robustscaler? ›

Scale features using statistics that are robust to outliers. This Scaler **removes the median and scales the data according to the quantile range** (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile).

### What is the purpose of MinMaxScaler? ›

**Transform features by scaling each feature to a given range**. This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

### What is the use of StandardScaler in machine learning? ›

In Machine Learning, StandardScaler is used **to resize the distribution of values** so that the mean of the observed values is 0 and the standard deviation is 1.

### Why do we use StandardScaler in Python? ›

StandardScaler **removes the mean and scales each feature/variable to unit variance**. This operation is performed feature-wise in an independent way. StandardScaler can be influenced by outliers (if they exist in the dataset) since it involves the estimation of the empirical mean and standard deviation of each feature.

### Which scaler is best to use? ›

**Robust Scaler**- Robust scaler is one of the best-suited scalers for outlier data sets. It scales the data according to the interquartile range. The interquartile range is the middle range where most of the data points exist.

### What is MinMaxscaler in Python? ›

The MinMaxscaler is **a type of scaler that scales the minimum and maximum values to be 0 and 1 respectively**. While the StandardScaler scales all values between min and max so that they fall within a range from min to max.

### Why is scaling used in machine learning? ›

Scaling the target value is a good idea in regression modelling; scaling of the data **makes it easy for a model to learn and understand the problem**. Scaling of the data comes under the set of steps of data pre-processing when we are performing machine learning algorithms in the data set.

### How do you use MinMaxScaler on a data frame? ›

**How to scale Pandas DataFrame columns with the scikit-learn MinMaxScaler in Python**

- df = pd. DataFrame({
- "A" : [0, 1, 2, 3, 4],
- "B" : [25, 50, 75, 100, 125]})
- min_max_scaler = MinMaxScaler()
- print(df)
- df[["A", "B"]] = min_max_scaler. fit_transform(df[["A", "B"]])
- print(df)

### What is data scaling in machine learning? ›

It is performed during the data pre-processing to handle highly varying magnitudes or values or units. If feature scaling is not done, then a machine learning algorithm tends to weigh greater values, higher and consider smaller values as the lower values, regardless of the unit of the values.

### Which scaler is best for outliers? ›

Robust Scaler algorithms scale features that are robust to outliers. The method it follows is almost similar to the MinMax Scaler but it uses the interquartile range (rather than the min-max used in MinMax Scaler).

### Should I remove outliers before scaling? ›

Removal of outliers creates a normal distribution in some of my variables, and makes transformations for the other variables more effective. Therefore, it seems that **removal of outliers before transformation is the better option**.

### What is MaxAbsScaler? ›

MaxAbsScaler(*, copy=True)[source] **Scale each feature by its maximum absolute value**. This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.

### Why is feature scaling important? ›

Feature scaling through standardization (or Z-score normalization) **can be an important preprocessing step for many machine learning algorithms**. Standardization involves rescaling the features such that they have the properties of a standard normal distribution with a mean of zero and a standard deviation of one.

### What is the importance of scaling? ›

To scale means **you are able to take on the increased workload in a cost-effective manner and meet the demands of your business without suffering or overstretching**. It's about getting a comfortable handle on the increased workload, customers or users and then delivering.

### Does scaling remove outliers? ›

By scaling data according to the quantile range rather than the standard deviation, **it reduces the range of your features while keeping the outliers in**.

### What is standard scaler sklearn? ›

StandardScaler is **the industry's go-to algorithm**. 🙂 StandardScaler standardizes a feature by subtracting the mean and then scaling to unit variance. Unit variance means dividing all the values by the standard deviation. StandardScaler does not meet the strict definition of scale I introduced earlier.

### How do you normalize data with different scales? ›

**Three obvious approaches are:**

- Standardizing the variables (subtract mean and divide by stddev ). ...
- Re-scaling variables to the range [0,1] by subtracting min(variable) and dividing by max(variable) . ...
- Equalize the means by dividing each value by mean(variable) .

### Why do we standardize data in machine learning? ›

Data standardization is the process of rescaling the attributes so that they have mean as 0 and variance as 1. The ultimate goal to perform standardization is **to bring down all the features to a common scale without distorting the differences in the range of the values**.

### Can we use StandardScaler on categorical features? ›

The continuous variables need to be scaled, but at the same time, a couple of categorical variables are also of integer type. **Applying StandardScaler would result in undesired effects**. On the flip side, the StandardScaler would scale the integer based categorical variables, which is also not what we want.

### What is the difference between normalization and standardization? ›

...

Difference between Normalization and Standardization.

S.NO. | Normalization | Standardization |
---|---|---|

8. | It is a often called as Scaling Normalization | It is a often called as Z-Score Normalization. |

### How do you use standard scaling in Python? ›

To apply standard scaling with Python, you can **use the StandardScaler class from the sklearn.** **preprocessing module**. You need to call the fit_transform() method from the StandardScaler class and pass it your Pandas Dataframe containing the features you want scaled.

### Is MinMaxScaler sensitive to outliers? ›

**Both StandardScaler and MinMaxScaler are very sensitive to the presence of outliers**.

### What are the types of scaling techniques? ›

The comparative scales can further be divided into the following four types of scaling techniques: **(a) Paired Comparison Scale, (b) Rank Order Scale, (c) Constant Sum Scale, and (d) Q-sort Scale**.

### Should I normalize or standardize data? ›

**If you see a bell-curve in your data then standardization is more preferable**. For this, you will have to plot your data. If your dataset has extremely high or low values (outliers) then standardization is more preferred because usually, normalization will compress these values into a small range.

### When should I use MIN-MAX scaler? ›

**CONCLUSION**

- Use MinMaxScaler() if you are transforming a feature, its non distorting.
- Use RobustScaler() if you have outliers, this scaler will reduce the effect the influece of outliers.
- Use StandardScaler for relatively Normal Distribution.

### How do you normalize using MinMaxScaler? ›

...

**We can then normalize any value, like 18.8, as follows:**

- y = (x – min) / (max – min)
- y = (18.8 – (-10)) / (30 – (-10))
- y = 28.8 / 40.
- y = 0.72.

### How do you normalize data in machine learning? ›

Normalization techniques in Machine Learning

The most widely used types of normalization in machine learning are: **Min-Max Scaling – Subtract the minimum value from each column's highest value and divide by the range**. Each new column has a minimum value of 0 and a maximum value of 1.

### Do all machine learning algorithms require feature scaling? ›

The machine learning algorithms that **do not require feature scaling** is mostly non-linear ML algorithms such as Decision trees, Random Forest, AdaBoost, Naïve Bayes, etc.

### Do I need feature scaling? ›

The two reasons that support the need for scaling are:

**Scaling the features makes the flow of gradient descent smooth and helps algorithms quickly reach the minima of the cost function**. Without scaling features, the algorithm may be biased toward the feature which has values higher in magnitude.

### What are the two standard feature scaling techniques? ›

The most common techniques of feature scaling are **Normalization and Standardization**.

### Should I normalize data for linear regression? ›

All the linear models but **linear regression actually require normalization**. Lasso, Ridge and Elastic Net regressions are powerful models, but they require normalization because the penalty coefficients are the same for all the variables.

### What is MIN-MAX scaler formula? ›

Min-max scaling is similar to z-score normalization in that it will replace every value in a column with a new value using a formula. In this case, that formula is: **m = (x -x _{min}) / (x_{max} -x_{min})**

### How do you use MIN-MAX normalization in Python? ›

The min-max approach (often called normalization) rescales the feature to a hard and fast range of [0,1] by subtracting the minimum value of the feature then dividing by the range. We can apply the min-max scaling in Pandas **using the .** **min() and .** **max() methods**.

### Is feature scaling necessary for linear regression? ›

We need to perform Feature Scaling when we are dealing with Gradient Descent Based algorithms (Linear and Logistic Regression, Neural Network) and Distance-based algorithms (KNN, K-means, SVM) as these are very sensitive to the range of the data points.

### What is the maximum value for feature scaling? ›

All the features now have a minimum value of 0 and a maximum value of **1**.

### What is difference between feature scaling and feature selection? ›

In my knowledge: **Feature scaling is to standardize and normalize data.** **Feature selection is to optimize for best features**. In python, feature scaling is enough to get good accuracy %. But in MATLAB, Normalization concept in feature scaling is required with optimization to get good accuracy %.

### When should I use MIN MAX scaler? ›

**CONCLUSION**

- Use MinMaxScaler() if you are transforming a feature, its non distorting.
- Use RobustScaler() if you have outliers, this scaler will reduce the effect the influece of outliers.
- Use StandardScaler for relatively Normal Distribution.

### Is MinMaxScaler standardized? ›

Unlike Normalization, standardization maintains useful information about outliers and makes the algorithm less sensitive to them in contrast to min-max scaling, which scales the data to a limited range of values.

### Should I normalize or standardize data? ›

**If you see a bell-curve in your data then standardization is more preferable**. For this, you will have to plot your data. If your dataset has extremely high or low values (outliers) then standardization is more preferred because usually, normalization will compress these values into a small range.

### Which scaling technique should be used if the data has outliers? ›

One approach to standardizing input variables in the presence of outliers is to ignore the outliers from the calculation of the mean and standard deviation, then use the calculated values to scale the variable. This is called **robust standardization or robust data scaling**.

### Is scaling necessary in machine learning? ›

**Scaling the target value is a good idea in regression modelling**; scaling of the data makes it easy for a model to learn and understand the problem. Scaling of the data comes under the set of steps of data pre-processing when we are performing machine learning algorithms in the data set.

### How do you normalize using MinMaxScaler? ›

...

**We can then normalize any value, like 18.8, as follows:**

- y = (x – min) / (max – min)
- y = (18.8 – (-10)) / (30 – (-10))
- y = 28.8 / 40.
- y = 0.72.

### Does scaling remove outliers? ›

By scaling data according to the quantile range rather than the standard deviation, **it reduces the range of your features while keeping the outliers in**.

### How do you calculate min/max scaling? ›

A Min-Max scaling is typically done via the following equation: **Xsc=X−XminXmax−Xmin**.

### How do you normalize data using MIN-MAX in Python? ›

The min-max approach (often called normalization) rescales the feature to a hard and fast range of [0,1] by subtracting the minimum value of the feature then dividing by the range. We can apply the min-max scaling in Pandas **using the .** **min() and .** **max() methods**.

### Which normalization is best? ›

Normalization Technique | Formula | When to Use |
---|---|---|

Clipping | if x > max, then x' = max. if x < min, then x' = min | When the feature contains some extreme outliers. |

Log Scaling | x' = log(x) | When the feature conforms to the power law. |

Z-score | x' = (x - μ) / σ | When the feature distribution does not contain extreme outliers. |

### What is the difference between normalization and scaling? ›

The difference is that: **in scaling, you're changing the range of your data, while**. **in normalization, you're changing the shape of the distribution of your data**.

### What is the best way to normalize data? ›

**How to use the normalization formula**

- Calculate the range of the data set. ...
- Subtract the minimum x value from the value of this data point. ...
- Insert these values into the formula and divide. ...
- Repeat with additional data points.

### Should I remove outliers before scaling? ›

Removal of outliers creates a normal distribution in some of my variables, and makes transformations for the other variables more effective. Therefore, it seems that **removal of outliers before transformation is the better option**.

### Is feature scaling necessary for linear regression? ›

We need to perform Feature Scaling when we are dealing with Gradient Descent Based algorithms (Linear and Logistic Regression, Neural Network) and Distance-based algorithms (KNN, K-means, SVM) as these are very sensitive to the range of the data points.

### What are the two standard feature scaling techniques? ›

The most common techniques of feature scaling are **Normalization and Standardization**.