Data Analysis and Visualization: Election Analysis
Reference
Huffpost Pollster
http://elections.huffingtonpost.com/pollster
Requests: HTTP for Humans
http://docs.python-requests.org/en/latest/
StringIO and cStringIO – Work with text buffers using file-like API
https://pymotw.com/2/StringIO/
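The two references above point at a common pattern: fetch CSV text over HTTP with Requests, then wrap it in a file-like buffer so a CSV parser can read it. A minimal sketch of the buffer half of that pattern (no network call; the CSV text below is a made-up stand-in for a `requests.get(url).text` response, and the column names are hypothetical, not the actual Pollster schema):

```python
import csv
import io

# Stand-in for text fetched with Requests, e.g. requests.get(url).text
csv_text = "pollster,start_date,approve\nGallup,2015-01-05,46\nYouGov,2015-01-07,44\n"

# io.StringIO gives the string a file-like API, so csv.DictReader
# (or pandas.read_csv) can consume it without writing a temp file
buffer = io.StringIO(csv_text)
rows = list(csv.DictReader(buffer))

for row in rows:
    print(row["pollster"], row["approve"])
```

(The pymotw link covers the Python 2 `StringIO` module; in Python 3 the same file-like API lives in `io.StringIO`.)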
Data Analysis and Visualization: Titanic Project
Reference
Titanic: Machine Learning from Disaster
https://www.kaggle.com/c/titanic-gettingStarted
Color Maps: matplotlib
http://matplotlib.org/users/colormaps.html
Machine Learning A-Z: Part 3 – Classification (Bayes’ Theorem)
Bayes’ Theorem
Machine1:
Wrenches: m1, m1, m1 …
Machine2:
Wrenches: m2, m2, m2 …
What’s the probability of producing a defective wrench?
P(A|B) = (P(B|A) * P(A)) / P(B)
Machine1: 30 wrenches/hr
Machine2: 20 wrenches/hr
Out of all produced parts:
We can SEE that 1% are defective
Out of all defective parts:
We can SEE that 50% came from Mach1
And 50% came from Mach2
Question:
What is the probability that a part produced by Mach2 is defective?
-> P(Mach1)=30/50=0.6
-> P(Mach2)=20/50=0.4
-> P(Defect)=1%
-> P(Mach1|Defect)=50%
-> P(Mach2|Defect)=50%
-> P(Defect|Mach2)=?
P(Defect|Mach2)
= (P(Mach2|Defect) * P(Defect)) / P(Mach2)
= (0.5 * 0.01) / 0.4
= 0.0125
= 1.25%
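The derivation above can be checked by plugging the stated probabilities straight into Bayes’ theorem in Python:

```python
# Probabilities stated above
p_mach2 = 20 / 50            # P(Mach2) = 0.4 (20 of 50 wrenches/hr)
p_defect = 0.01              # P(Defect) = 1%
p_mach2_given_defect = 0.5   # P(Mach2|Defect) = 50%

# Bayes' theorem: P(Defect|Mach2) = P(Mach2|Defect) * P(Defect) / P(Mach2)
p_defect_given_mach2 = p_mach2_given_defect * p_defect / p_mach2

print(p_defect_given_mach2)  # 0.0125, i.e. 1.25%
```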
ex)
– 1000 wrenches
– 400 came from Mach2
– 1% have a defect = 10
– of them 50% came from Mach2 = 5
– % defective parts from Mach2 = 5/400 = 1.25%
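The same 1.25% falls out of the counting argument; a quick sanity check in Python:

```python
total = 1000
from_mach2 = 400
defective = int(total * 0.01)          # 1% of 1000 -> 10 defective wrenches
defective_from_mach2 = defective // 2  # 50% of defects came from Mach2 -> 5

rate = defective_from_mach2 / from_mach2
print(rate)  # 0.0125 -> matches the Bayes result of 1.25%
```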
Obvious question:
If the items are labeled, why couldn’t we just count the number of defective wrenches that came from Mach2 and divide by the total number that came from Mach2?
We could, and it would give the same 1.25%; Bayes’ theorem earns its keep when direct counting is impractical and only the aggregate rates (overall defect rate, share of defects per machine) are known.
Quick exercise:
P(Defect|Mach1)
= (P(Mach1|Defect) * P(Defect)) / P(Mach1)
= (0.5 * 0.01) / 0.6
≈ 0.0083
≈ 0.83%
Naive Bayes Classifier Intuition
P(A|B) = (P(B|A) * P(A)) / P(B)
P(Walks|X) = (P(X|Walks) * P(Walks)) / P(X)
#4 = (#3 * #1) / #2
#1: Prior Probability
#2: Marginal Likelihood
#3: Likelihood
#4: Posterior Probability
P(Drives|X) = (P(X|Drives) * P(Drives)) / P(X)
P(Walks|X) vs P(Drives|X)
#1: P(Walks)
= Number of Walkers / Total Observations
= 10/30
#2: P(X)
= Number of Similar Observations / Total Observations
= 4/30
#3: P(X|Walks)
= Number of Similar Observations Among those who Walk / Total number of Walkers
= 3/10
#4: P(Walks|X)
= (3/10 * 10/30) / (4/30)
= 0.75
= 75%
P(Drives|X)
= (1/20 * 20/30) / (4/30)
= 0.25
= 25%
P(Walks|X) > P(Drives|X)
= 0.75 > 0.25
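Plugging the counts above into Bayes’ theorem in Python (10 walkers, 20 drivers, 4 observations similar to X, of which 3 walk and 1 drives):

```python
total = 30
walkers, drivers = 10, 20
similar = 4                      # observations similar to the new point X
similar_walk, similar_drive = 3, 1

# P(Walks|X)  = P(X|Walks)  * P(Walks)  / P(X)
p_walks_x = (similar_walk / walkers) * (walkers / total) / (similar / total)
# P(Drives|X) = P(X|Drives) * P(Drives) / P(X)
p_drives_x = (similar_drive / drivers) * (drivers / total) / (similar / total)

print(p_walks_x, p_drives_x)  # 0.75 0.25
```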
Naive Bayes Classifier Intuition (Challenge Reveal)
P(Drives|X) = (P(X| Drives) * P(Drives)) / P(X)
#1: P(Drives)
= Number of Drivers / Total Observations
= 20/30
#2: P(X)
= Number of Similar Observations / Total Observations
= 4/30
#3: P(X|Drives)
= Number of Similar Observations Among those who Drive / Total number of Drivers
= 1/20
#4: P(Drives|X)
= (1/20 * 20/30) / (4/30)
= 0.25
= 25%
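Both posteriors share the same denominator P(X), so to pick the winning class only the numerators need comparing; this shortcut is standard in Naive Bayes implementations (it is not spelled out in the notes above, just a consequence of the shared P(X)):

```python
total = 30

# Numerator of P(Walks|X):  P(X|Walks)  * P(Walks)
num_walks = (3 / 10) * (10 / total)
# Numerator of P(Drives|X): P(X|Drives) * P(Drives)
num_drives = (1 / 20) * (20 / total)

# P(X) divides both terms equally, so the larger numerator wins
prediction = "Walks" if num_walks > num_drives else "Drives"
print(prediction)  # Walks
```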
Reference
https://www.slideshare.net/KojiKosugi/ss-50740386?next_slideshow=3