Machine Learning A-Z: Part 3 – Classification (Logistic Regression)

Linear Regression

– Simple:
y = b0 + b1 * x1

– Multiple:
y = b0 + b1 * x1 + … + bn * xn

Logistic Regression

Sigmoid Function:
p = 1 / (1 + e^(-y))

ln(p / (1 - p)) = b0 + b1 * x1

y: Actual DV [dependent variable]
p^: Predicted probability [p_hat]
y^: Predicted DV
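
A quick numeric check of the two formulas above (a minimal sketch; the coefficient values are illustrative, not from the course):

```python
import numpy as np

b0, b1, x1 = -4.0, 0.1, 52.0   # illustrative coefficients and one feature value
y_lin = b0 + b1 * x1           # the linear part: b0 + b1 * x1 = 1.2

p = 1 / (1 + np.exp(-y_lin))   # sigmoid: p = 1 / (1 + e^(-y))
logit = np.log(p / (1 - p))    # ln(p / (1 - p)) recovers b0 + b1 * x1

print(p, logit)                # ~0.7685 and 1.2
```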

Implementation

Python
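
A minimal sketch of the Python implementation, assuming scikit-learn; the toy data and variable names below are illustrative, not the course templates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: one feature (e.g. age), binary DV (e.g. purchased yes/no)
X = np.array([[18], [22], [25], [30], [35], [40], [45], [50], [55], [60]])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

classifier = LogisticRegression()
classifier.fit(X, y)

# p^: predicted probability, via the sigmoid applied to b0 + b1 * x1
p_hat = classifier.predict_proba([[42]])[:, 1]
# y^: predicted DV, i.e. p^ thresholded at 0.5
y_hat = classifier.predict([[42]])
print(p_hat, y_hat)
```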

R

Templates

Python

R

Machine Learning A-Z: Part 2 – Regression (Evaluating Regression Models Performance)

R Squared Intuition

Simple Linear Regression

R Squared

SUM (yi - yi^)^2 -> min

SSres = SUM (yi - yi^)^2
res = residual

SStot = SUM (yi - yavg)^2
tot = total

R^2 = 1 - SSres / SStot

Adjusted R^2 (R Squared)

R^2 = 1 - SSres / SStot

y = b0 + b1 * x1

y = b0 + b1 * x1 + b2 * x2

SSres -> min

R^2 – Goodness of fit (greater is better)

Problem:
y = b0 + b1 * x1 + b2 * x2 (+ b3 * x3)

SSres -> min

R^2 will never decrease when a regressor is added: at worst the fit can set b3 = 0 and keep the previous SSres, so even a meaningless extra variable cannot lower R^2.

R^2 = 1 - SSres / SStot

Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
p – number of regressors
n – sample size
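
A quick numeric sketch of both formulas (toy numbers, not course data):

```python
import numpy as np

# Toy data: actual values and predictions from a 2-regressor model (p = 2)
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.6])
n, p = len(y), 2

ss_res = np.sum((y - y_hat) ** 2)      # SSres = SUM (yi - yi^)^2
ss_tot = np.sum((y - y.mean()) ** 2)   # SStot = SUM (yi - yavg)^2

r2 = 1 - ss_res / ss_tot               # R^2 = 1 - SSres / SStot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adj_r2)                      # Adj R^2 < R^2: it penalizes extra regressors
```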

1. Pros and cons of each regression model

https://www.superdatascience.com/wp-content/uploads/2017/02/Regression-Pros-Cons.pdf

2. How do I know which model to choose for my problem?

1) Figure out whether your problem is linear or non-linear.

– linear:
  – only one feature: Simple Linear Regression
  – several features: Multiple Linear Regression

– non-linear:
  – Polynomial Regression
  – SVR
  – Decision Tree
  – Random Forest

3. How can I improve each of these models?

=> In Part 10 – Model Selection

Two types of values shape each model:

a. The parameters that are learnt, for example the coefficients in Linear Regression.

b. The hyperparameters:
– not learnt from the data
– fixed values chosen before training that sit inside the model equations (e.g. a regularization parameter).
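
A small sketch of the distinction, assuming scikit-learn (Ridge regression is used here only because its regularization parameter makes the hyperparameter visible):

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

# alpha is a hyperparameter: fixed before training, not learnt from the data
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are parameters: learnt from the data during fit()
print(model.coef_, model.intercept_)
```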

https://www.superdatascience.com/wp-content/uploads/2017/02/Regularization.pdf

Machine Learning A-Z: Part 2 – Regression (Random Forest Regression)

Random Forest Intuition

Ensemble Learning

STEP 1: Pick at random K data points from the Training set.

STEP 2: Build the Decision Tree associated to these K data points.

STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.

STEP 4: For a new data point, make each one of your Ntree trees predict the value of Y for the data point in question, and assign the new data point the average across all of the predicted Y values.

e.g. a wild guessing game with a jar of jellybeans:
the average of many wild guesses tends to land close to the true count.

Random Forest Regression

Python
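
A minimal sketch, assuming scikit-learn and toy 1-feature data (values illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training set: one feature, one continuous target
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# n_estimators is Ntree from STEP 3; each tree is built on a random sample (STEPS 1-2)
regressor = RandomForestRegressor(n_estimators=300, random_state=0)
regressor.fit(X, y)

# STEP 4: the prediction is the average of the 300 trees' individual predictions
print(regressor.predict([[6.5]]))
```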

R

Machine Learning A-Z: Part 2 – Regression (Decision Tree Regression)

Decision Tree Intuition

CART (Classification and Regression Trees)
– Classification Trees
– Regression Trees

Splitting data into segments.
Split 1: X1 < 20
Split 2: X2 < 200
Split 3: X2 < 170
Split 4: X1 < 40

Decision Tree Regression

Python
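
A minimal sketch, assuming scikit-learn; the thresholds the fitted tree learns play the role of Split 1–4 above (toy data, illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 2-feature data; the tree partitions the (X1, X2) plane into segments
X = np.array([[10, 150], [15, 220], [30, 160], [35, 210],
              [50, 140], [55, 230], [25, 190], [45, 175]], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.5, 4.5])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# The prediction is the average of y over the segment (leaf) the point falls into
print(regressor.predict([[22, 180]]))
```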

R

Machine Learning A-Z: Part 2 – Regression (SVR)

Support Vector Regression (SVR)

Python
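
A minimal sketch, assuming scikit-learn and toy data (values illustrative). Note that sklearn's SVR does not scale features itself, so X and y are standardized first and the prediction is transformed back:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

# Toy 1-feature training data
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# Standardize both X and y (SVR is sensitive to feature scales)
sc_X, sc_y = StandardScaler(), StandardScaler()
X_s = sc_X.fit_transform(X)
y_s = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel='rbf')  # RBF kernel for a non-linear fit
regressor.fit(X_s, y_s)

# Predict at x = 6.5 in scaled space, then invert the scaling
pred_s = regressor.predict(sc_X.transform([[6.5]]))
print(sc_y.inverse_transform(pred_s.reshape(-1, 1)))
```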

R
