Continuing from the previous article...
What is Naïve Bayes?
It is called "naïve" because it assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Let us take an example. A fruit can be judged to be an orange if its colour is orange, its shape is round, and its diameter is about 2.5 inches. Naïve Bayes treats each of these characteristics as contributing independently to the probability that the fruit is an orange, even though in reality the characteristics depend on each other.
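Formally, for a class C and features x_1, ..., x_n, the independence assumption lets us write the posterior in the same notation used later in this article:
P(C | x_1, ..., x_n) = P(x_1 | C) * ... * P(x_n | C) * P(C) / P(x_1, ..., x_n)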
Naïve Bayes is practical when there are many data points; with little data, the model can be biased. Even though the model is simple, it can outperform some more advanced models on the same task.
Naïve Bayes algorithm
I think the most straightforward way to see how the algorithm works is through an example. Cricket is a popular game in some countries. I will use a data set that contains the pitch condition and the target variable, batting first (many believe that if the pitch is grassy, it is better to bowl first). Now we need to classify whether a team chooses to bat first based on the pitch condition (the team needs to win the coin toss first).
Steps:
- Make a frequency table of the classes.
- Create a likelihood table: the probability of each class given a feature value.
- Calculate the posterior probability for each class using the Naïve Bayes equation.
Data Table
Pitch Condition | Batting First |
---|---|
Dead | Yes |
Grassy | No |
Dusty | Yes |
Dead | Yes |
Grassy | No |
Grassy | Yes |
Dead | Yes |
Dusty | Yes |
Grassy | No |
Dusty | Yes |
Dusty | No |
Dead | Yes |
Grassy | No |
Dead | No |
Dead | No |
Frequency Table
Pitch type | No | Yes |
---|---|---|
Dead | 2 | 4 |
Grassy | 4 | 1 |
Dusty | 1 | 3 |
Total | 7 | 8 |
Likelihood Table
Pitch type | No | Yes | P(Pitch type) |
---|---|---|---|
Dead | 2 | 4 | 6/15 = 0.40 |
Grassy | 4 | 1 | 5/15 = 0.33 |
Dusty | 1 | 3 | 4/15 = 0.27 |
Total | 7 | 8 | 15 |
P(Class) | 7/15 = 0.47 | 8/15 = 0.53 | |
Now the question:
A team will choose to bat first if the pitch is dusty. Is this statement correct or not?
To solve, let us calculate the posterior probability.
P(Yes | Dusty) = P(Dusty | Yes) * P(Yes) / P(Dusty) = (3/8 * 8/15) / (4/15) = 0.75
So, there is a 75% probability of batting first if the pitch is dusty.
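The same numbers can be reproduced with a few lines of plain Python; this is only a sketch to verify the hand calculation above.
# raw data from the table above: (pitch condition, batting first)
data = [
    ("Dead", "Yes"), ("Grassy", "No"), ("Dusty", "Yes"), ("Dead", "Yes"),
    ("Grassy", "No"), ("Grassy", "Yes"), ("Dead", "Yes"), ("Dusty", "Yes"),
    ("Grassy", "No"), ("Dusty", "Yes"), ("Dusty", "No"), ("Dead", "Yes"),
    ("Grassy", "No"), ("Dead", "No"), ("Dead", "No"),
]
total = len(data)                                                  # 15 samples
yes_rows = [row for row in data if row[1] == "Yes"]                # 8 samples
p_yes = len(yes_rows) / total                                      # P(Yes) = 8/15
p_dusty = sum(row[0] == "Dusty" for row in data) / total           # P(Dusty) = 4/15
p_dusty_given_yes = sum(row[0] == "Dusty" for row in yes_rows) / len(yes_rows)  # 3/8
posterior = p_dusty_given_yes * p_yes / p_dusty
print(round(posterior, 2))
# 0.75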
Naïve Bayes uses a similar process to predict the probability of each class based on several features. The algorithm is primarily used for text categorization and for problems with more than two classes.
Naïve Bayes in Python
The most popular library that includes the Naïve Bayes algorithm is scikit-learn. There are three types of Naïve Bayes models in the scikit-learn library.
- Gaussian
- Multinomial
- Bernoulli
We have to choose the model based on the type of data (for example, Gaussian for continuous features, Multinomial for count data, Bernoulli for binary features). I am not going to explain them one by one here.
Example Code:
from sklearn.naive_bayes import GaussianNB
import numpy as np

# feature matrix (two numeric features per sample) and target classes
X = np.array([[8, 2], [3, 6], [5, 1], [2, 0], [4, 3], [-3, 0], [-2, 1], [3, 1], [-2, 4], [5, 7], [-1, 1]])
y = np.array([1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2])

# fit a Gaussian Naïve Bayes model and predict the class of three new points
model = GaussianNB()
model.fit(X, y)
pred = model.predict([[5, 2], [1, 4], [-1, 3]])
print(pred)
# [2 2 3]
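If you also want the class probabilities rather than only the predicted labels, GaussianNB exposes predict_proba:
probs = model.predict_proba([[5, 2], [1, 4], [-1, 3]])  # one row of class probabilities per point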
Things to consider:
- Remove correlated features, since highly correlated features are effectively counted twice.
- Be careful when selecting features and pay attention to data preprocessing.
- Apply smoothing (see the sketch after this list).
- Transform continuous features so that they are approximately normally distributed (for the Gaussian model).
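The following is a minimal sketch of the last two points, using made-up toy arrays: the alpha parameter of scikit-learn's MultinomialNB applies Laplace/Lidstone smoothing so unseen feature values do not get zero probability, and PowerTransformer can push skewed continuous features closer to a normal distribution before GaussianNB is fitted.
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.preprocessing import PowerTransformer
import numpy as np

# smoothing: alpha=1.0 adds one pseudo-count per feature, so a feature value
# never seen in the training data still gets a non-zero probability
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 2], [0, 0, 4]])  # toy count features
y_counts = np.array([0, 1, 0, 1])
smoothed_model = MultinomialNB(alpha=1.0).fit(X_counts, y_counts)

# transforming a skewed continuous feature towards a normal distribution
X_cont = np.array([[0.1], [0.4], [1.2], [3.5], [9.0], [27.0]])  # toy skewed feature
y_cont = np.array([0, 0, 0, 1, 1, 1])
X_normal = PowerTransformer().fit_transform(X_cont)  # Yeo-Johnson by default
gaussian_model = GaussianNB().fit(X_normal, y_cont)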
Let us look at the advantages and disadvantages of the Naïve Bayes algorithm.
Advantages:
- Easy and fast.
- Works well with multiclass problems.
- Good for categorical variables.
Disadvantages:
- Works best when there are many data points.
- The training data set must contain every category of each categorical feature; otherwise we have to use smoothing methods.
- Difficult to apply in real-life situations, since the algorithm assumes the predictors are independent.
*This article was written by @nuwan, a member of @qualitia_cdev.