Naive Bayes is a classifier just like K Nearest Neighbors. The Naive Bayes algorithm applies the popular Bayes Theorem (used to calculate conditional probability) given by the formula:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Here’s a great explanation to read if you haven’t come across the theorem or don’t understand it yet: https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/.
The ML Chops series
 Linear Regression
 K Nearest Neighbors
 Naive Bayes (this article)
 Support Vector Machine
 K Means
Let’s consider the data we used in the last post on KNNs:
height (ft)  weight (kg)  sex
6.3          50.2         Male
5.9          79.7         Female
5.1          61.4         Female
5.6          47.1         Male
5.1          59.8         Female
In Naive Bayes, we calculate the probabilities of an input feature set being in each of the classes in the data and return the class with the highest probability as the predicted output.
Our goal with the data in the table above is to determine whether an individual is Male or Female. Given the height and weight of a person, e.g. [height, weight] -> [5.8, 82.1], a Naive Bayes classifier calculates the probabilities of the person being a Male and a Female, e.g. [(“Male”, 7.92248128417e-103), (“Female”, 0.00355626444241)], then returns the class with the highest probability (in this example, “Female”).
How do we find the probabilities for each class?
You guessed right! Bayes formula.
Let’s put the formula into context for better understanding:
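Probability that a person is Male:
$$P(\text{Male} \mid \text{height \& weight}) = \frac{P(\text{Male})\,P(\text{height} \mid \text{Male})\,P(\text{weight} \mid \text{Male})}{P(\text{height \& weight})}$$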
Substitute Female for Male in the formula and you have the probability that a person is female.
Let’s explain terms in the equation briefly:
 P(Male | height & weight) is the probability that a person is Male given their height and weight (better put: given all the features provided in the data). This is what we’re looking for.
 P(Male) is the probability of selecting a Male person from the data.
 P(height | Male) and P(weight | Male) correspond to P(B | A) in the first formula. P(height | Male) is the probability of observing a given height for a person who is Male (likewise for weight). Essentially, we want to find the percentage of Males with the same height as the person we’re classifying. This is not feasible with our data, however, because both height and weight are continuous, so an exact match is unlikely. Besides, counting would be very costly with a large amount of training data (we’d have to run through the data every time to count the number of people with the same height and weight as the person being classified). Thankfully, we have the Probability Density Function (PDF) to help us with this. We’ll use the PDF to determine both P(height | Male) and P(weight | Male) in a bit.
 P(height & weight), or better put P(all features), is the marginal probability. For our classification it isn’t really useful, because it’s the same denominator in every class probability. We’re interested in finding the class with the highest probability, not the actual probability figure (0.9, for instance). Since we divide by it in every class calculation, leaving it out doesn’t change which class comes out on top.
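To make that last point concrete: the denominator is the same positive number for every class, so removing it rescales all the class values equally and leaves their ranking untouched. Writing c for a class (Male or Female), a notation introduced here purely for illustration:

$$\arg\max_{c} \frac{P(c)\,P(\text{height} \mid c)\,P(\text{weight} \mid c)}{P(\text{height \& weight})} = \arg\max_{c} P(c)\,P(\text{height} \mid c)\,P(\text{weight} \mid c)$$

The values we compare are therefore scores proportional to the class probabilities rather than the probabilities themselves.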
P(Class)
The probability of selecting a person from a given class is the simplest calculation to perform. From the data table, we can see that there are 5 samples. 2 are Male, thus P(Male) = 2/5. And 3 are Female, thus P(Female) = 3/5.
The PDF
The PDF can be computed using the following formula:
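$$P(\text{weight} \mid \text{Female}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\text{weight} - \mu)^2}{2\sigma^2}\right)$$
Here μ and σ² are the mean and variance of the weights of the Females in the data. Replace Female with Male and/or weight with height to calculate the other PDFs.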
In doing this, we assume:
 Each feature is independent of the others within a class (i.e. height is independent of weight, for instance). This is what lets us multiply P(height | Male) by P(weight | Male).
 The values of each feature (i.e. heights, weights) are normally distributed within a class. This is what lets us use the PDF.
These assumptions are rarely completely true for a given data set. As such, we’re being “naive” by making them.
Code
First things first! The data.
For convenience, I’m using 3 arrays:
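A minimal sketch of those three arrays using plain Python lists. The names `heights`, `weights` and `sexes` are assumptions chosen for illustration; index i in each list refers to the same person in the table.

```python
# Training data from the table above, one list per column.
# The variable names are illustrative assumptions.
heights = [6.3, 5.9, 5.1, 5.6, 5.1]        # feet
weights = [50.2, 79.7, 61.4, 47.1, 59.8]   # kilograms
sexes = ["Male", "Female", "Female", "Male", "Female"]

# Sanity check: the three lists must line up sample by sample.
assert len(heights) == len(weights) == len(sexes)

for height, weight, sex in zip(heights, weights, sexes):
    print(f"{height} ft, {weight} kg -> {sex}")
```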


Next, let’s find P(Class) for Male and Female:
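One way this could look, assuming the class labels live in a list called `sexes` (a name picked for this sketch). P(Class) is simply the class count divided by the total number of samples.

```python
from collections import Counter

# Class labels from the data table (variable name is an assumption).
sexes = ["Male", "Female", "Female", "Male", "Female"]

# P(Class) = (number of samples in the class) / (total number of samples)
counts = Counter(sexes)
priors = {label: count / len(sexes) for label, count in counts.items()}

print("P(Male) =", priors["Male"])      # 2 of 5 samples
print("P(Female) =", priors["Female"])  # 3 of 5 samples
```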


PDFs
We need to find the various means and variances required to compute the PDFs:
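A sketch of that step using Python’s standard `statistics` module. The helper `values_for` and the `stats` dictionary are hypothetical names, and I’m assuming the population variance (dividing by n, via `pvariance`) rather than the sample variance (dividing by n - 1); the two give noticeably different results on a data set this small.

```python
from statistics import fmean, pvariance

heights = [6.3, 5.9, 5.1, 5.6, 5.1]
weights = [50.2, 79.7, 61.4, 47.1, 59.8]
sexes = ["Male", "Female", "Female", "Male", "Female"]


def values_for(values, labels, label):
    """Return the entries of `values` whose matching label equals `label`."""
    return [v for v, l in zip(values, labels) if l == label]


# For each class, store (mean, population variance) per feature.
stats = {}
for label in ("Male", "Female"):
    class_heights = values_for(heights, sexes, label)
    class_weights = values_for(weights, sexes, label)
    stats[label] = {
        "height": (fmean(class_heights), pvariance(class_heights)),
        "weight": (fmean(class_weights), pvariance(class_weights)),
    }

for label, features in stats.items():
    for name, (mean, variance) in features.items():
        print(f"{label} {name}: mean={mean:.4f}, variance={variance:.4f}")
```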


Now to the PDF formula in code…
Let’s define a function, as we’ll use it several times:
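Here is a minimal sketch of such a function. The name `gaussian_pdf` and its parameter names are assumptions; it is a direct transcription of the normal density formula, taking the mean and variance as arguments.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Normal probability density at x for the given mean and variance."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)


# Example: density of a 5.8 ft height under the Male height distribution,
# using mean 5.95 and population variance 0.1225 from the two Male samples.
print(gaussian_pdf(5.8, 5.95, 0.1225))
```

Note that the result is a density, not a probability, so values above 1 are possible when the variance is small (as in the example above).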


Predict
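A self-contained sketch of the prediction step. The function names (`gaussian_pdf`, `class_score`, `predict`) are hypothetical, and the population variance is an assumption. Each class score is P(Class) multiplied by the PDF of every feature, with the shared denominator left out, so the printed values are unnormalised.

```python
import math
from statistics import fmean, pvariance

heights = [6.3, 5.9, 5.1, 5.6, 5.1]
weights = [50.2, 79.7, 61.4, 47.1, 59.8]
sexes = ["Male", "Female", "Female", "Male", "Female"]


def gaussian_pdf(x, mean, variance):
    """Normal probability density at x for the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def class_score(label, height, weight):
    """P(label) * P(height | label) * P(weight | label), denominator omitted."""
    class_heights = [h for h, s in zip(heights, sexes) if s == label]
    class_weights = [w for w, s in zip(weights, sexes) if s == label]
    prior = len(class_heights) / len(sexes)
    return (
        prior
        * gaussian_pdf(height, fmean(class_heights), pvariance(class_heights))
        * gaussian_pdf(weight, fmean(class_weights), pvariance(class_weights))
    )


def predict(height, weight):
    """Return the highest-scoring class together with all class scores."""
    scores = {label: class_score(label, height, weight) for label in set(sexes)}
    return max(scores, key=scores.get), scores


label, scores = predict(5.8, 82.1)
print("P(Male | height & weight) =", scores["Male"])
print("P(Female | height & weight) =", scores["Female"])
print("class =", label)
```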


Output:
P(Male | height & weight) = 7.92248128417e-103
P(Female | height & weight) = 0.00355626444241
class = Female
Putting everything together, we have:
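One possible way to bundle the steps into a single reusable piece is sketched below. The class `GaussianNaiveBayes` and its methods are hypothetical names and an assumed design, not necessarily how the ML Chops repo structures it. It works for any number of numeric features and again assumes the population variance.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


class GaussianNaiveBayes:
    """Illustrative Gaussian Naive Bayes (names and layout are assumptions)."""

    def fit(self, rows, labels):
        # Group the feature rows by class label.
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        # P(Class) for every class.
        self.priors = {c: len(r) / len(rows) for c, r in grouped.items()}
        # One (mean, population variance) pair per feature column, per class.
        self.stats = {
            c: [(fmean(col), pvariance(col)) for col in zip(*r)]
            for c, r in grouped.items()
        }
        return self

    @staticmethod
    def _pdf(x, mean, variance):
        return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

    def scores(self, row):
        """Unnormalised P(class) * product of P(feature | class), per class."""
        result = {}
        for c, prior in self.priors.items():
            score = prior
            for x, (mean, variance) in zip(row, self.stats[c]):
                score *= self._pdf(x, mean, variance)
            result[c] = score
        return result

    def predict(self, row):
        scores = self.scores(row)
        return max(scores, key=scores.get)


# Each row is [height (ft), weight (kg)], in the same order as the table.
X = [[6.3, 50.2], [5.9, 79.7], [5.1, 61.4], [5.6, 47.1], [5.1, 59.8]]
y = ["Male", "Female", "Female", "Male", "Female"]

model = GaussianNaiveBayes().fit(X, y)
print(model.scores([5.8, 82.1]))
print("class =", model.predict([5.8, 82.1]))
```

This sketch would fail with a division by zero if every sample in a class shared the same value for a feature (variance 0), which doesn’t happen with the data here.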
Don’t forget to check out the ML Chops repo for a more robust and efficient implementation: https://github.com/nicholaskajoh/ML_Chops/tree/master/naivebayes.
If you have any questions, concerns or suggestions, don’t hesitate to comment! 👍