technical-knockout.com : Machine Learning

Machine Learning A-Z: Part 3 – Classification (Kernel SVM Intuition)

Kernel SVM Intuition

Data Type:
– Linearly Separable
– Not Linearly Separable => Kernel SVM

A Higher-Dimensional Space

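Mapping to a higher dimension: data that is not linearly separable in its original space can become linearly separable after being mapped into a space with more dimensions.

[1D Space] (x₁)
f = x₁ – 5 (shifts the data so that the point x₁ = 5 moves to 0)

[2D Space] (x₁, x₂)
f = (x₁ – 5)² (the squared value becomes a second coordinate x₂; in the (x₁, x₂) plane a straight line can separate the classes)

[3D Space] (x₁, x₂, z)
Similarly, 2D data can be mapped into 3D by adding a new dimension z, where a plane (hyperplane) separates the classes.
=> Mapping to a higher-dimensional space can be highly compute-intensive.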
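Below is a minimal Python sketch of the 1D case. It assumes a hypothetical toy dataset in which the class-1 points sit around x = 5 and class-0 points lie on both sides. No single threshold on x separates the two classes. After the mapping (x – 5)², a single threshold on the new dimension does. The helper `separable_by_threshold` is an illustrative name and does not come from any library.

```python
# Toy 1D data (hypothetical): class 1 clusters around x = 5, class 0 lies on both sides.
xs = [1, 2, 3, 4.5, 5, 5.5, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1, 0, 0, 0]


def separable_by_threshold(values, labels):
    """True if one threshold puts all class-1 values on one side and all class-0 values on the other."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    # Either every class-1 value is below every class-0 value, or the reverse.
    return max(ones) < min(zeros) or max(zeros) < min(ones)


# In 1D the class-1 points sit between class-0 points, so no threshold on x works.
print(separable_by_threshold(xs, labels))  # False

# Add a second dimension x2 = (x1 - 5)^2 and look for a threshold on x2 instead.
x2s = [(x - 5) ** 2 for x in xs]
print(separable_by_threshold(x2s, labels))  # True
```

Class-1 points map to x₂ ≤ 0.25 and class-0 points to x₂ ≥ 4. In the (x₁, x₂) plane, any horizontal line x₂ = t with t between 0.25 and 4 therefore separates the classes. Read back on the original axis, that line corresponds to the two cut points x₁ = 5 ± √t.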

The Gaussian RBF Kernel

Types of Kernel Functions

– Gaussian RBF Kernel
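K(x, lⁱ) = e^(-‖x – lⁱ‖² / (2σ²))
(x: data point, lⁱ: landmark, both vectors; σ controls how quickly K falls from 1 at the landmark toward 0 far away from it)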

– Sigmoid Kernel
K(X, Y) = tanh(γ · XᵀY + r)

– Polynomial Kernel
K(X, Y) = (γ · XᵀY + r)^d, γ > 0

http://mlkernels.readthedocs.io/en/latest/kernels.html
(http://mlkernels.readthedocs.io/en/latest/kernelfunctions.html)
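The three kernels above can be written directly as plain Python functions. This is a sketch: the function names and default parameter values are assumptions chosen for illustration. For comparison, scikit-learn's `SVC` writes the RBF kernel as e^(-γ‖x – x'‖²), which matches the formula above with γ = 1/(2σ²). The last lines check the "kernel trick" that makes kernels cheaper than explicit mapping. For 2D inputs, the polynomial kernel with γ = 1, r = 0, d = 2 gives the same number as a dot product in an explicit 3D feature space, without ever building that space.

```python
import math


def dot(x, y):
    """X^T Y for two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, l, sigma=1.0):
    """Gaussian RBF: e^(-||x - l||^2 / (2 * sigma^2)); equals 1 when x is the landmark l."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    """Sigmoid: tanh(gamma * X^T Y + r)."""
    return math.tanh(gamma * dot(x, y) + r)


def polynomial_kernel(x, y, gamma=1.0, r=0.0, d=2):
    """Polynomial: (gamma * X^T Y + r)^d, with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be > 0")
    return (gamma * dot(x, y) + r) ** d


def phi(v):
    """Explicit 2D -> 3D mapping whose dot product equals (X^T Y)^2."""
    return (v[0] ** 2, math.sqrt(2) * v[0] * v[1], v[1] ** 2)


x, y = (1.0, 2.0), (3.0, 4.0)
print(rbf_kernel(x, x))              # 1.0
print(polynomial_kernel(x, y, d=2))  # 121.0
print(dot(phi(x), phi(y)))           # approximately 121.0 (floating-point rounding)
```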

Implementation

Python

R

Machine Learning A-Z: Part 3 – Classification (SVM Intuition)

Support Vector Machine (SVM)

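Classification: how to classify a newly added data point.

Maximum Margin: the widest margin achievable, where the margin is the sum of the distances from the boundary to the closest point of each class.
Support Vectors: the data points (vectors) closest to the boundary; they alone determine the maximum margin.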

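Maximum Margin Hyperplane (Maximum Margin Classifier): the decision boundary, in the middle of the margin
Positive Hyperplane: the parallel boundary passing through the support vector(s) of the positive class
Negative Hyperplane: the parallel boundary passing through the support vector(s) of the negative class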
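The sketch below makes the three hyperplanes concrete. It uses a hand-picked hyperplane w·x + b = 0 on hypothetical toy data; w, b and the points are illustrative assumptions, and nothing is trained here. Under the usual scaling convention, the positive and negative hyperplanes are w·x + b = +1 and w·x + b = -1. The support vectors are the points lying exactly on those two hyperplanes, and the margin width is 2/‖w‖.

```python
import math

# Hand-picked hyperplane w·x + b = 0 (hypothetical values; nothing is trained here).
w = (1.0, 1.0)
b = -3.0


def decision(x):
    """w·x + b: positive on the positive side of the boundary, negative on the other."""
    return w[0] * x[0] + w[1] * x[1] + b


# Hypothetical toy data: (point, label) with labels +1 / -1.
points = [
    ((2, 2), 1), ((3, 3), 1), ((4, 2), 1),
    ((1, 1), -1), ((0, 0), -1), ((0, 1), -1),
]

# Every point lies on or beyond its own class's hyperplane: y * (w·x + b) >= 1.
assert all(y * decision(x) >= 1 for x, y in points)

# Support vectors lie exactly on the positive (+1) or negative (-1) hyperplane.
support_vectors = [x for x, y in points if math.isclose(y * decision(x), 1.0)]

# Distance between the positive and negative hyperplanes.
margin = 2 / math.hypot(*w)


def classify(x):
    """Assign a new point to the side of the boundary it falls on."""
    return 1 if decision(x) >= 0 else -1


print(support_vectors)                     # [(2, 2), (1, 1)]
print(margin)                              # approximately 1.414 (the square root of 2)
print(classify((4, 4)), classify((0, 2)))  # 1 -1
```

For this toy set, √2 is also the distance between the closest pair of opposite-class points, (1, 1) and (2, 2). No separating line can have a wider margin than that, so this hand-picked boundary is the maximum margin hyperplane here. In practice an optimizer finds w and b; scikit-learn's `SVC(kernel='linear')` is one example.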

Implementation

Python

R

Machine Learning A-Z: Part 3 – Classification (K-Nearest Neighbors)

K-NN Intuition

How do we decide whether a new data point belongs to category 1 or category 2?
K-NN assigns it to a category based on the categories of its nearest neighbors.

STEP 1: Choose the number K of neighbors

STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance

STEP 3: Among these K neighbors, count the number of data points in each category

STEP 4: Assign the new data point to the category where you counted the most neighbors
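The four steps map onto a few lines of plain Python. This minimal sketch runs on hypothetical toy data and makes the following assumptions:
– `knn_classify` is an illustrative name.
– K = 5 is an assumed default (STEP 1).
– Distances use `math.dist`, which requires Python 3.8 or later.
– Ties are not handled specially: `Counter.most_common` returns the tied category that was counted first.

```python
import math
from collections import Counter


def knn_classify(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points.

    train: list of (point, category) pairs; each point is a tuple of coordinates.
    """
    # STEP 2: sort by Euclidean distance to the new point and keep the K nearest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], new_point))[:k]
    # STEP 3: count how many of the K neighbors fall in each category.
    counts = Counter(category for _, category in neighbors)
    # STEP 4: assign the category with the most neighbors.
    return counts.most_common(1)[0][0]


# Hypothetical toy data: category 1 near (1.5, 1.5), category 2 near (5.5, 5.5).
train = [
    ((1, 1), 1), ((1, 2), 1), ((2, 1), 1), ((2, 2), 1),
    ((5, 5), 2), ((5, 6), 2), ((6, 5), 2), ((6, 6), 2),
]
print(knn_classify(train, (2, 3), k=5))  # 1
print(knn_classify(train, (5, 4), k=5))  # 2
```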

Euclidean Distance

2 points:
P₁(x₁, y₁)
P₂(x₂, y₂)

Euclidean Distance between P₁ and P₂ = √((x₂ – x₁)² + (y₂ – y₁)²)
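Here is the formula as a small Python function, checked on the hypothetical points P₁(1, 2) and P₂(4, 6). Their differences are 3 and 4, so the distance is √(9 + 16) = 5. The function name is illustrative. The standard-library `math.dist` (Python 3.8+) gives the same value and also works in more than two dimensions.

```python
import math


def euclidean_distance(p1, p2):
    """Distance between P1(x1, y1) and P2(x2, y2): sqrt((x2 - x1)^2 + (y2 - y1)^2)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


print(euclidean_distance((1, 2), (4, 6)))  # 5.0
print(math.dist((1, 2), (4, 6)))           # 5.0
```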

Implementation

Python

R

Machine Learning A-Z: Part 3 – Classification (Logistic Regression)

Linear Regression

– Simple:
y = b₀ + b₁ * x₁

– Multiple:
y = b₀ + b₁ * x₁ + … + b_n * x_n
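In the simple case, b₀ and b₁ are usually found by ordinary least squares, that is, by minimizing SUM (y_i – ŷ_i)². Below is a minimal sketch of the closed-form solution on hypothetical data; the function name is illustrative. The multiple case is typically solved with a linear-algebra routine such as `numpy.linalg.lstsq` rather than by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing SUM (y_i - (b0 + b1 * x_i))^2 (ordinary least squares)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2 * x1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 1.0 2.0
```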

Logistic Regression

Sigmoid Function:
p = 1 / (1 + e^-y)

Substituting y = b₀ + b₁ * x₁ and solving for y gives the log-odds (logit) form:
ln(p / (1 – p)) = b₀ + b₁ * x₁

y: actual DV [dependent variable]
p̂ [p_hat]: predicted probability (that y = 1)
ŷ [y_hat]: predicted DV (obtained by thresholding p̂)
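The small sketch below ties y, p̂ and ŷ together. It uses hypothetical coefficients b₀ = -4 and b₁ = 2; in practice these are estimated from data, e.g. by maximum likelihood. The 0.5 cut-off that turns p̂ into ŷ is a common default and is assumed here; the model itself does not fix it.

```python
import math


def sigmoid(y):
    """p = 1 / (1 + e^-y): squashes any real y into a probability in (0, 1)."""
    return 1 / (1 + math.exp(-y))


# Hypothetical coefficients; in practice they are estimated from data.
b0, b1 = -4.0, 2.0


def predict_proba(x1):
    """p_hat: probability that the DV is 1, given x1."""
    return sigmoid(b0 + b1 * x1)


def predict(x1, threshold=0.5):
    """y_hat: 1 if p_hat reaches the (assumed) threshold, else 0."""
    return 1 if predict_proba(x1) >= threshold else 0


p_hat = predict_proba(3.0)
# The log-odds recover the linear part: ln(p / (1 - p)) = b0 + b1 * x1 = 2.0
print(math.log(p_hat / (1 - p_hat)))  # approximately 2.0
print(predict(1.0), predict(3.0))     # 0 1
```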

Implementation

Python

R

Templates

Python

R

Machine Learning A-Z: Part 2 – Regression (Evaluating Regression Models Performance)

R Squared Intuition

Simple Linear Regression

R Squared

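SUM (y_i – ŷ_i)² -> min (simple linear regression picks the line that minimizes this sum of squared residuals)

SS_res = SUM (y_i – ŷ_i)²
res = residual

SS_tot = SUM (y_i – y_avg)²
tot = total

R² = 1 – SS_res / SS_tot
(the closer R² is to 1, the better the model fits compared with always predicting y_avg)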
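The formulas above translate directly into Python. This sketch uses hypothetical actual values `ys` and predictions `y_hats`, for example from a fitted line, and the function name is illustrative. scikit-learn provides the same metric as `sklearn.metrics.r2_score`.

```python
def r_squared(ys, y_hats):
    """R^2 = 1 - SS_res / SS_tot."""
    y_avg = sum(ys) / len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # residual sum of squares
    ss_tot = sum((y - y_avg) ** 2 for y in ys)                      # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical data: y_avg = 6, SS_tot = 20, SS_res = 5.
ys = [3, 5, 7, 9]
y_hats = [3, 4, 7, 11]
print(r_squared(ys, y_hats))  # 0.75
```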

Adjusted R² (R Squared)

R² = 1 – SS_res / SS_tot

y = b₀ + b₁ * x₁

y = b₀ + b₁ * x₁ + b₂ * x₂

SS_res -> Min

R² – Goodness of fit (greater is better)

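Problem:
y = b₀ + b₁ * x₁ + b₂ * x₂ (+ b₃ * x₃)

SS_res -> Min

When a new regressor x₃ is added, the minimized SS_res can only stay the same or go down (at worst the fit sets b₃ = 0), so R² will never decrease, even if x₃ is useless.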

R² = 1 – SS_res / SS_tot

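Adj R² = 1 – (1 – R²) * (n – 1) / (n – p – 1)
p – number of regressors
n – sample size

The factor (n – 1) / (n – p – 1) grows with p, so adding a regressor raises Adj R² only if it improves R² enough to outweigh this penalty.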
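The sketch below applies the Adj R² formula to a hypothetical scenario. Adding a weak fourth regressor nudges R² from 0.900 up to 0.901, yet Adj R² goes down because of the (n – 1) / (n – p – 1) penalty. The function name and the numbers are illustrative assumptions.

```python
def adjusted_r_squared(r2, n, p):
    """Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1); n = sample size, p = number of regressors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 50
adj_3 = adjusted_r_squared(0.900, n, p=3)  # 3 regressors
adj_4 = adjusted_r_squared(0.901, n, p=4)  # a 4th, nearly useless regressor added
print(round(adj_3, 4), round(adj_4, 4))  # 0.8935 0.8922
```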

1. Pros and cons of each regression model

https://www.superdatascience.com/wp-content/uploads/2017/02/Regression-Pros-Cons.pdf

2. How do I know which model to choose for my problem?

1) Figure out whether your problem is linear or non-linear.

– linear:
  – only one feature: Simple Linear Regression
  – several features: Multiple Linear Regression

– non-linear:
  – Polynomial Regression
  – SVR
  – Decision Tree
  – Random Forest

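3. How can I improve each of these models?

=> In Part 10 – Model Selection, by tuning each model's hyperparameters.

Each model has two types of parameters:

a. The parameters that are learnt, for example the coefficients in Linear Regression.

b. The hyperparameters:
– not learnt
– fixed values inside the model equations (for example the regularization parameter λ or the penalty parameter C)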

https://www.superdatascience.com/wp-content/uploads/2017/02/Regularization.pdf
