What is R2 and Adjusted R2?
R2 is used to measure the performance of a regression model. Normally R2 lies between 0 and 1. The formula below is used to calculate R2:
R2 = 1 - SSR/SST
SSR = Sum of squared residuals
SST = Total sum of squares (proportional to the total variance of the output)
How to identify the SSR and SST?
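SSR: The sum of the squared differences between the actual output and the predicted output, taken over every observation in the data set.
SST: The sum of the squared differences between the actual output and the mean of the output, taken over every observation in the data set.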
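As a rough illustration of these two sums, here is a minimal sketch in plain Python. It assumes the standard definition R2 = 1 - SSR/SST; the function name r_squared and the sample values are made up for this example and are not taken from any data set in the article.

```python
def r_squared(actual, predicted):
    """Return R2 = 1 - SSR/SST for two equal-length sequences of numbers."""
    mean_actual = sum(actual) / len(actual)
    # SSR: sum of squared residuals (actual output minus predicted output)
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # SST: total sum of squares (actual output minus mean of the output)
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ssr / sst


# Hypothetical values chosen only to keep the arithmetic easy to follow
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

# Here SSR = 1.0 and SST = 20.0, so R2 is about 0.95
print(r_squared(actual, predicted))
```

A model that always predicts the mean of the output has SSR equal to SST, so this function returns 0 for it.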
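R2 is also called the coefficient of determination. Generally R2 lies between 0 and 1, which can also be read as 0% to 100% of the variance in the output explained by the model.
0: the model explains none of the variance (it does no better than always predicting the mean)
1: the model explains all of the variance (a perfect fit)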
What is the problem in R2?
R2 is a reliable measure of model performance only as long as the set of features in the data set stays the same.
Whenever a new independent variable is added to the existing data, R2 either increases or stays the same, even if the variable has no real relationship with the output. This can give a misleading picture of the model.
R2 never decreases when a feature is added. Because of this, many people assume that adding a new feature always makes the model perform better, even when the variable is not significant.
Adjusted R2 was introduced to overcome this issue. It penalizes the number of input variables, so it rises only when a new variable improves the fit by more than would be expected by chance.
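The behaviour described above can be checked with a small experiment. The sketch below assumes NumPy is available and fits ordinary least squares with an intercept using np.linalg.lstsq; the helper fit_r2, the synthetic data, and the random seed are all assumptions made for illustration.

```python
import numpy as np


def fit_r2(X, y):
    """Fit least squares with an intercept and return R2 on the training data."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    ssr = np.sum(residuals ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - ssr / sst


rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)  # y depends on x1 only
noise = rng.normal(size=n)         # a feature unrelated to y

r2_base = fit_r2(x1.reshape(-1, 1), y)
r2_with_noise = fit_r2(np.column_stack([x1, noise]), y)

# The second value is never smaller than the first, although
# the added feature carries no information about y.
print(r2_base, r2_with_noise)
```

The exact numbers depend on the random data, but on the training set the R2 of the larger model cannot fall below the R2 of the smaller one.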
What is Adjusted R2?
Adjusted R2 addresses this issue in R2. Whenever a new feature is added to the existing data set, R2 increases (or at best stays the same) without checking whether the newly added feature is significant.
Adjusted R2 takes R2 and corrects it for the number of input variables relative to the sample size.
Adjusted R2 will always be less than or equal to the R2 value.
Adjusted R2 = 1 - (1 - R²)(N - 1)/(N - p - 1)
R² = Sample R2
p = Number of predictors (input variables)
N = Total sample size
Adjusted R2 can be negative when R2 is zero or close to zero, because the penalty for the number of predictors can push the value below zero.
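The formula can be written as a short Python function. This is only a sketch: the name adjusted_r2 and the input values are hypothetical and chosen for illustration.

```python
def adjusted_r2(r2, n, p):
    """Return 1 - (1 - R2)(N - 1)/(N - p - 1).

    r2: sample R2, n: total sample size, p: number of predictors.
    """
    if n - p - 1 <= 0:
        raise ValueError("sample size must be larger than p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical inputs: 30 samples and 3 predictors
print(adjusted_r2(0.60, n=30, p=3))  # a little below 0.60
print(adjusted_r2(0.05, n=30, p=3))  # below zero
```

With a reasonably high R2 the adjustment is small, while an R2 near zero combined with several predictors gives a negative value.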
Which is better: R2 or Adjusted R2?
Adjusted R2 is the better choice whenever input variables are being selected for a model, especially when comparing regression models that use different numbers of input variables. For example:
- Model 1 uses input variables X1, X2, and X3 to predict Y1.
- Model 2 uses input variables X1 and X2 to predict Y1.
Which model should be used? The R-squared and adjusted R-squared values of both models are compared below:
Comparing the R-squared of Model 1 and Model 2, the R-squared suggests that Model 1 is the better model, as it carries greater explanatory power (0.5923 in Model 1 vs. 0.5612 in Model 2). On its own this is not conclusive, because R-squared never decreases when a variable is added.
Comparing the adjusted R-squared of Model 1 and Model 2, the adjusted R-squared is also higher for Model 1 (0.4231 in Model 1 vs. 0.3512 in Model 2), which indicates that the input variable X3 contributes to explaining the output variable Y1.
As such, Model 1 should be used, as the additional X3 input variable contributes to explaining the output variable Y1.