Data Analysis and Visualization: Election Analysis
Reference
Huffpost Pollster
http://elections.huffingtonpost.com/pollster
Requests: HTTP for Humans
http://docs.python-requests.org/en/latest/
StringIO and cStringIO – Work with text buffers using file-like API
https://pymotw.com/2/StringIO/
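The two references above point at a common pattern: fetch CSV text over HTTP with Requests, then wrap it in a file-like buffer so a CSV parser can read it. A minimal sketch of the buffer half of that pattern (no network call; the CSV text below is a made-up stand-in for a `requests.get(url).text` response, and the column names are hypothetical, not the actual Pollster schema):

```python
import csv
import io

# Stand-in for text fetched with Requests, e.g. requests.get(url).text
csv_text = "pollster,start_date,approve\nGallup,2015-01-05,46\nYouGov,2015-01-07,44\n"

# io.StringIO gives the string a file-like API, so csv.DictReader
# (or pandas.read_csv) can consume it without writing a temp file
buffer = io.StringIO(csv_text)
rows = list(csv.DictReader(buffer))

for row in rows:
    print(row["pollster"], row["approve"])
```

(The pymotw link covers the Python 2 `StringIO` module; in Python 3 the same file-like API lives in `io.StringIO`.)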
Data Analysis and Visualization: Titanic Project
Reference
Titanic: Machine Learning from Disaster
https://www.kaggle.com/c/titanic-gettingStarted
Color Maps: matplotlib
http://matplotlib.org/users/colormaps.html
Machine Learning A-Z: Part 3 – Classification (Bayes’ Theorem)
Bayes’ Theorem
Machine1:
Wrenches: m1, m1, m1 …
Machine2:
Wrenches: m2, m2, m2 …
What’s the probability of producing a defective wrench?
P(A|B) = (P(B|A) * P(A)) / P(B)
Machine1: 30 wrenches/hr
Machine2: 20 wrenches/hr
Out of all produced parts:
We can SEE that 1% are defective
Out of all defective parts:
We can SEE that 50% came from Mach1
And 50% came from Mach2
Question:
What is the probability that a part produced by Mach2 is defective?
-> P(Mach1)=30/50=0.6
-> P(Mach2)=20/50=0.4
-> P(Defect)=1%
-> P(Mach1|Defect)=50%
-> P(Mach2|Defect)=50%
-> P(Defect|Mach2)=?
P(Defect|Mach2)
= (P(Mach2|Defect) * P(Defect)) / P(Mach2)
= (0.5 * 0.01) / 0.4
= 0.0125
= 1.25%
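The derivation above can be checked by plugging the stated probabilities straight into Bayes’ theorem in Python:

```python
# Probabilities stated above
p_mach2 = 20 / 50            # P(Mach2) = 0.4 (20 of 50 wrenches/hr)
p_defect = 0.01              # P(Defect) = 1%
p_mach2_given_defect = 0.5   # P(Mach2|Defect) = 50%

# Bayes' theorem: P(Defect|Mach2) = P(Mach2|Defect) * P(Defect) / P(Mach2)
p_defect_given_mach2 = p_mach2_given_defect * p_defect / p_mach2

print(p_defect_given_mach2)  # 0.0125, i.e. 1.25%
```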
ex)
– 1000 wrenches
– 400 came from Mach2
– 1% have a defect = 10
– of them 50% came from Mach2 = 5
– % defective parts from Mach2 = 5/400 = 1.25%
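The same 1.25% falls out of the counting argument; a quick sanity check in Python:

```python
total = 1000
from_mach2 = 400
defective = int(total * 0.01)          # 1% of 1000 -> 10 defective wrenches
defective_from_mach2 = defective // 2  # 50% of defects came from Mach2 -> 5

rate = defective_from_mach2 / from_mach2
print(rate)  # 0.0125 -> matches the Bayes result of 1.25%
```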
Obvious question:
If the items are labeled, why couldn’t we just count the number of defective wrenches that came from Mach2 and divide by the total number that came from Mach2?
We could, and it would give the same 1.25%; Bayes’ theorem earns its keep when direct counting is impractical and only the aggregate rates (overall defect rate, share of defects per machine) are known.
Quick exercise:
P(Defect|Mach1)
= (P(Mach1|Defect) * P(Defect)) / P(Mach1)
= (0.5 * 0.01) / 0.6
≈ 0.0083
≈ 0.83%
Naive Bayes Classifier Intuition
P(A|B) = (P(B|A) * P(A)) / P(B)
P(Walks|X) = (P(X|Walks) * P(Walks)) / P(X)
#4 = (#3 * #1) / #2
#1: Prior Probability
#2: Marginal Likelihood
#3: Likelihood
#4: Posterior Probability
P(Drives|X) = (P(X|Drives) * P(Drives)) / P(X)
P(Walks|X) vs P(Drives|X)
#1: P(Walks)
= Number of Walkers / Total Observations
= 10/30
#2: P(X)
= Number of Similar Observations / Total Observations
= 4/30
#3: P(X|Walks)
= Number of Similar Observations Among those who Walk / Total number of Walkers
= 3/10
#4: P(Walks|X)
= (3/10 * 10/30) / (4/30)
= 0.75
= 75%
P(Drives|X)
= (1/20 * 20/30) / (4/30)
= 0.25
= 25%
P(Walks|X) > P(Drives|X)
= 0.75 > 0.25
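Plugging the counts above into Bayes’ theorem in Python (10 walkers, 20 drivers, 4 observations similar to X, of which 3 walk and 1 drives):

```python
total = 30
walkers, drivers = 10, 20
similar = 4                      # observations similar to the new point X
similar_walk, similar_drive = 3, 1

# P(Walks|X)  = P(X|Walks)  * P(Walks)  / P(X)
p_walks_x = (similar_walk / walkers) * (walkers / total) / (similar / total)
# P(Drives|X) = P(X|Drives) * P(Drives) / P(X)
p_drives_x = (similar_drive / drivers) * (drivers / total) / (similar / total)

print(p_walks_x, p_drives_x)  # 0.75 0.25
```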
Naive Bayes Classifier Intuition (Challenge Reveal)
P(Drives|X) = (P(X| Drives) * P(Drives)) / P(X)
#1: P(Drives)
= Number of Drivers / Total Observations
= 20/30
#2: P(X)
= Number of Similar Observations / Total Observations
= 4/30
#3: P(X|Drives)
= Number of Similar Observations Among those who Drive / Total number of Drivers
= 1/20
#4: P(Drives|X)
= (1/20 * 20/30) / (4/30)
= 0.25
= 25%
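Both posteriors share the same denominator P(X), so to pick the winning class only the numerators need comparing; this shortcut is standard in Naive Bayes implementations (it is not spelled out in the notes above, just a consequence of the shared P(X)):

```python
total = 30

# Numerator of P(Walks|X):  P(X|Walks)  * P(Walks)
num_walks = (3 / 10) * (10 / total)
# Numerator of P(Drives|X): P(X|Drives) * P(Drives)
num_drives = (1 / 20) * (20 / total)

# P(X) divides both terms equally, so the larger numerator wins
prediction = "Walks" if num_walks > num_drives else "Drives"
print(prediction)  # Walks
```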
Reference
https://www.slideshare.net/KojiKosugi/ss-50740386?next_slideshow=3