Best Evaluation Metrics for Regression Model

Jiyaudeenmeerasa
5 min read · Sep 30, 2021

In machine learning, there are standard ways to evaluate the performance of both regression and classification models.

Here we look at the performance metrics for regression. The following metrics are commonly used to validate regression models.

1. MAE: Mean Absolute Error

2. MSE: Mean Squared Error

3. RMSE: Root Mean Squared Error

4. R squared

5. Adjusted R squared

These metrics are used to validate model performance when the dependent variable is continuous, which is the case that regression models handle.

Let's look at the metrics one by one, with a simple example for each to make them easy to understand.

MAE (Mean Absolute Error): Take the well-known regression example of house price prediction. The absolute difference between the actual and the predicted house value is called the absolute error. MAE is the mean of these absolute errors over all observations, and the formula looks like this:

MAE = (1/n) × Σ |actual_i - predicted_i|

That is, sum all the absolute errors and divide by the number of observations.

MAE is one of the important metrics for evaluating a regression model, but it has a disadvantage: the MAE function is not differentiable where the error is zero, which makes it awkward to use as a loss function with gradient-based optimizers such as gradient descent. The next metric, Mean Squared Error, avoids this issue.
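The MAE calculation can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the house prices are made-up values, and the helper name `mean_absolute_error` is chosen here for clarity.

```python
import numpy as np

# Hypothetical house prices in thousands (illustrative values only).
actual = np.array([200.0, 150.0, 320.0, 275.0])
predicted = np.array([210.0, 140.0, 300.0, 280.0])

def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Absolute errors are 10, 10, 20 and 5, so the mean is 45 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)  # 11.25
```

Because the result is in the same unit as the target, an MAE of 11.25 here reads directly as "off by 11.25 thousand on average".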

Mean Squared Error: Mean Squared Error is the average of the squared differences between the actual and predicted outputs. Squaring means that positive and negative errors cannot cancel each other out and that the function is differentiable everywhere, but it also means that a single outlier in the data set can inflate the value dramatically. The formula is:

MSE = (1/n) × Σ (actual_i - predicted_i)²
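To see how strongly an outlier affects MSE, here is a small sketch using NumPy. The data are made-up house prices, and the helper name `mean_squared_error` and the outlier value are assumptions for illustration only.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical house prices in thousands (illustrative values only).
actual = np.array([200.0, 150.0, 320.0, 275.0])
predicted = np.array([210.0, 140.0, 300.0, 280.0])

# Squared errors are 100, 100, 400 and 25, so the mean is 625 / 4.
mse_clean = mean_squared_error(actual, predicted)
print(mse_clean)  # 156.25

# Append one badly mispredicted house (an error of 200) as an outlier.
actual_out = np.append(actual, 900.0)
predicted_out = np.append(predicted, 700.0)

# The single outlier contributes 200**2 = 40000 to the sum of squares.
mse_outlier = mean_squared_error(actual_out, predicted_out)
print(mse_outlier)  # 8125.0
```

One extra observation moves the MSE from 156.25 to 8125.0, which is the outlier sensitivity described above.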

Root Mean Squared Error:

As the name suggests, RMSE is simply the square root of the mean squared error.

  • The output value is in the same unit as the target variable, which makes the loss easy to interpret.
  • It is less robust to outliers than MAE.

To compute RMSE, apply NumPy's square root function to the MSE.

RMSE is one of the most widely used evaluation metrics, and it is a common choice when working with deep learning techniques.
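The square-root step can be sketched as follows. It assumes the same made-up house prices as an illustration and uses `np.sqrt` on top of a plain NumPy MSE.

```python
import numpy as np

# Hypothetical house prices in thousands (illustrative values only).
actual = np.array([200.0, 150.0, 320.0, 275.0])
predicted = np.array([210.0, 140.0, 300.0, 280.0])

# MSE first: the mean of the squared errors (625 / 4 = 156.25).
mse = np.mean((actual - predicted) ** 2)

# RMSE is the square root of MSE, which brings the value back
# to the unit of the target (thousands, in this example).
rmse = np.sqrt(mse)
print(rmse)  # 12.5
```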

R Squared:

R2 measures how well a regression model fits the data. It normally lies between 0 and 1. The formula is:

R2 = 1 - SSR/SST

SSR = Sum of squared residuals

SST = Total sum of squares (the total variation in the output)

How are SSR and SST calculated?

SSR: the sum of the squared differences between the predicted output and the actual output over the data set.

SST: the sum of the squared differences between the mean of the output and the actual output over the data set.

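R2 is also called the coefficient of determination. It generally lies between 0 and 1, and is often expressed as 0% to 100%.

0: the model explains none of the variation in the output (it does no better than always predicting the mean)

1: the model explains all of the variation in the output (a perfect fit)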
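As a rough sketch, R2 can be computed from SSR and SST using the standard definition R2 = 1 - SSR/SST. The helper name `r2_score` and the house prices are assumptions made for this example.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SSR / SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ssr = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ssr / sst

# Hypothetical house prices in thousands (illustrative values only).
actual = np.array([200.0, 150.0, 320.0, 275.0])
predicted = np.array([210.0, 140.0, 300.0, 280.0])

r2 = r2_score(actual, predicted)
print(r2)  # about 0.96

# Two reference points: a perfect prediction and "always predict the mean".
r2_perfect = r2_score(actual, actual)
r2_mean = r2_score(actual, np.full_like(actual, actual.mean()))
```

Here `r2_perfect` is 1 and `r2_mean` is 0, which matches the two endpoints listed above.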

What is the problem with R2?

R2 is a reliable measure of performance only as long as the set of features stays the same.

Whenever a new independent variable is added to the existing data, R2 on the training data increases (or at best stays the same), even when the variable is not significant, which gives a misleading picture.

R2 never decreases when features are added. Many people therefore assume that adding a new feature always makes the model better, even when the variable is not significant. Adjusted R2 was introduced to overcome this issue: it penalizes each additional input variable, so only features that improve the fit enough will raise it.
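This behaviour can be checked with a small experiment. The sketch below assumes an ordinary least squares fit evaluated on the same data it was trained on. The synthetic data, the seed, and the helper name `fit_r2` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic data: y depends on x1 only; noise_feature is unrelated to y.
x1 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)
noise_feature = rng.normal(size=n)

def fit_r2(features, target):
    """Fit least squares with an intercept and return the training R2."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ssr = np.sum(residuals ** 2)
    sst = np.sum((target - target.mean()) ** 2)
    return 1.0 - ssr / sst

r2_base = fit_r2(x1.reshape(-1, 1), y)
r2_with_noise = fit_r2(np.column_stack([x1, noise_feature]), y)

# The second value is never lower than the first, even though the
# added feature carries no information about y.
print(r2_base, r2_with_noise)
```

The increase is usually tiny, but it is never negative for a least squares fit on the training data, which is why R2 alone cannot tell whether a new feature is worth keeping.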

What is Adjusted R2?

Adjusted R2 overcomes this issue with R2. Whenever a new feature is added to the existing data set, R2 increases without checking whether the newly added feature is significant. Adjusted R2 takes both R2 and the number of input variables into account, so it rises only when a new variable improves the fit by more than would be expected by chance.

Adjusted R2 is always less than or equal to R2.

Adjusted R2 = 1 - (1 - R2)(N - 1)/(N - p - 1)

R2 = sample R2

p = number of predictors

N = total sample size

Adjusted R2 can be negative when R2 is at or close to zero.
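The formula translates directly into a small helper. This is a sketch only: the function name `adjusted_r2` and the sample numbers (N = 21, p = 2) are assumptions picked to keep the arithmetic easy to follow.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2)(N - 1) / (N - p - 1)."""
    denominator = n_samples - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / denominator

# With R2 = 0.5, N = 21 and p = 2: 1 - 0.5 * 20 / 18, roughly 0.444.
adj_mid = adjusted_r2(0.5, n_samples=21, n_predictors=2)

# With R2 = 0, the same N and p give 1 - 20 / 18, which is negative.
adj_zero = adjusted_r2(0.0, n_samples=21, n_predictors=2)

print(adj_mid, adj_zero)
```

In both cases the adjusted value is below the plain R2, and the second case shows how it drops below zero when R2 itself is zero.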

Which is better: R2 or Adjusted R2?

Adjusted R2 is the better choice and should be considered whenever input variables are being selected for a model, especially a regression model. Consider two models:

  • Model 1 uses input variables X1, X2, and X3 to predict Y1.

  • Model 2 uses input variables X1 and X2 to predict Y1.

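Which model should be used? The R-squared and adjusted R-squared values of both models are:

  • Model 1: R-squared = 0.5923, adjusted R-squared = 0.4231
  • Model 2: R-squared = 0.5612, adjusted R-squared = 0.3512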

Comparing the R-squared of the two models, Model 1 appears to be the better model, as it has greater explanatory power (0.5923 in Model 1 vs. 0.5612 in Model 2).

Comparing the adjusted R-squared of the two models shows that the input variable X3 does contribute to explaining the output variable Y1 (0.4231 in Model 1 vs. 0.3512 in Model 2).

As such, Model 1 should be used, as the additional X3 input variable contributes to explaining the output variable Y1.
