Machine Learning A-Z: Part 2 – Regression (Evaluating Regression Models Performance)
R Squared Intuition
Simple Linear Regression
R Squared
Ordinary least squares fits the line by minimizing SUM (y_i – ŷ_i)².
SS_res = SUM (y_i – ŷ_i)²
res = residual (sum of squares)
SS_tot = SUM (y_i – y_avg)²
tot = total (sum of squares)
R² = 1 – SS_res / SS_tot
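A minimal numeric sketch of these definitions; the data and predictions are made up for illustration:

```python
import numpy as np

# Toy data: actual values y_i and predictions ŷ_i from some fitted model
y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.8])

ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)
```

SS_tot is the error of the naive model that always predicts the average, so R² reads as the fraction of that baseline error the regression removes.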
Adjusted R² (Adjusted R Squared)
R² = 1 – SS_res / SS_tot
y = b0 + b1 * x1 (Simple Linear Regression)
y = b0 + b1 * x1 + b2 * x2 (Multiple Linear Regression)
Fitting minimizes SS_res.
R² – goodness of fit (greater is better)
Problem:
y = b0 + b1 * x1 + b2 * x2 (+ b3 * x3)
Adding a regressor x3 can only lower (or leave unchanged) the minimized SS_res, so R² will never decrease, even if x3 has no real explanatory power.
R² = 1 – SS_res / SS_tot
Adj R² = 1 – (1 – R²) * (n – 1) / (n – p – 1)
p – number of regressors
n – sample size
Adj R² penalizes adding regressors: the (n – 1) / (n – p – 1) factor grows with p, so Adj R² only rises when a new variable improves R² enough to offset the penalty.
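The formula above, applied to a hypothetical scenario where a third regressor barely moves R²:

```python
# Adjusted R² formula from the notes
def adjusted_r2(r2, n, p):
    # Adj R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Suppose a 3rd regressor nudges R² from 0.900 to 0.902 on n = 50 samples
before = adjusted_r2(0.900, n=50, p=2)
after = adjusted_r2(0.902, n=50, p=3)
print(before, after)   # the tiny R² gain does not offset the penalty
```

Here Adj R² goes down even though R² went up, which is exactly the behavior that makes it a better model-comparison metric than plain R².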
1. Pros and cons of each regression model
https://www.superdatascience.com/wp-content/uploads/2017/02/Regression-Pros-Cons.pdf
2. How do I know which model to choose for my problem?
1) Figure out whether your problem is linear or non-linear.
– linear:
– only one feature: Simple Linear Regression
– several features: Multiple Linear Regression
– non-linear:
– Polynomial Regression
– SVR
– Decision Tree
– Random Forest
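To sketch why the linear vs. non-linear distinction matters, here is synthetic quadratic data fitted with a straight line and with a degree-2 polynomial (numpy only; the data, noise level, and degrees are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 40)
y = x ** 2 + rng.normal(0, 0.5, size=40)   # a clearly non-linear relationship

def r2(y, y_pred):
    # R² = 1 - SS_res / SS_tot, as defined earlier in these notes
    return 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Straight-line (degree 1) fit vs. polynomial (degree 2) fit
line = np.polyval(np.polyfit(x, y, 1), x)
curve = np.polyval(np.polyfit(x, y, 2), x)
print(r2(y, line), r2(y, curve))
```

The polynomial fit scores a markedly higher R² on this data, which is the signal that a linear model is the wrong choice here.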
3. How can I improve each of these models?
=> In Part 10 – Model Selection
a. The parameters that are learnt from the data during training, for example the coefficients in Linear Regression.
b. The hyperparameters:
– not learnt from the data
– fixed values, chosen before training, that sit inside the model equations.
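A small illustration of the distinction, using np.polyfit on toy data; here the polynomial degree plays the role of a hyperparameter (real models have hyperparameters such as the number of trees in a Random Forest):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                  # exact line y = 1 + 2x

# Hyperparameter: the degree is fixed before fitting, not learnt.
degree = 1

# Parameters: the coefficients b1, b0 are learnt from the data
# by least squares during the fit.
b1, b0 = np.polyfit(x, y, degree)
print(b0, b1)
```

Changing `degree` changes which model is fitted; the coefficients are whatever the fit learns for that choice. Tuning such hyperparameters systematically is the topic of Part 10 – Model Selection.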
https://www.superdatascience.com/wp-content/uploads/2017/02/Regularization.pdf