How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ?


The prediction algorithms in the domain of machine learning are all about the probability. The probability is how we represent the likelihood for an event to occur. It is a value between 0 to 1 with 0 meaning NOT going to happen and 1 meaning WILL happen. The numbers closer to 0 represent events less likely to happen while on the opposite, numbers approaching to 1 meaning more likely to happen.

Venn Diagram

The Venn Diagram (see below), is the vivid representation of the probability.

Venn-Diagram How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

Venn-Diagram

  • S is the sum of the total events in the domain.
  • P(A): the probability of event A (green)
  • P(B): the probability of event B (blue)
  • tex_f7e0fa785f5809d77a7c69e4eeda4248 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql : both A and B occur
  • P(A|B): if B happens, the probability of event A

There is another diagram which may express it: the Probability tree.

probability-tree How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

probability-tree

P(A|B) is the likelihood when B already occurs. So it can be also represented by the ratio between probability of both A and B and the event B:

tex_bd81d3e826fbf6bf2ff16ad94686cf89 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

This can also be expressed in the following form, to be integrated in Naive Bayes formula.

tex_7506f3b91bdb2643e9b43e01cfe70566 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

Naive Bayes

Now comes my favorite formula. The Naive Bayes obtains P(B|A) if we know P(A|B), P(A) and P(B). From the above, similarly, we have:

tex_a7d86b99a81ef9a5f0aeb71b3302eed0 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql .

From the probability tree (above), we can transform into the following calculations:

Bayes_theorem_visualisation-1024x768 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

Bayes Theorem

What do we have? It is obvious that:

tex_2dca02f0e08e4b47167698dba4e98502 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql
tex_21dc93804f7f2ce91094ffeb79e9e474 How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

Therefore, we prove that P(B|A) can be computed if we know P(A|B), P(B) and P(B). The Naive Bayes defines such relations:

tex_7b30c8a11667e2240d984c272e9402ab How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

where:

  • P(B) is the Prior probability
  • P(A) is also the Prior probability and the normalized constant
  • P(A|B) is the probability of A when B occurs, also known as the likelihood function
  • P(B|A) is the probability of B when A occurs, is the post probability (the value we ask for)
  • P(A|B)/P(A) is the standardised likelihood
naive-bayes How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ? machine learning sql

naive-bayes

with the P(A|B)/P(A), we can further interpret:

  • Larger than 1: the probability of the event is larger than the event that has already occurred.
  • Smaller than 1: the probability of the event is smaller than the event that has already occurred.
  • Equal to 1: the probability of the event is the same as the event that has already occurred.

Case Study via MySQL

We can use Naive Bayes to make predictions based on the given data. For example, given the following MySQL table:

mysql> select * from apple;
+-------+--------+-------+
| color | weight | sweet |
+-------+--------+-------+
|     3 |      4 |     1 |
|     2 |      3 |     1 |
|     0 |      3 |     0 |
|     3 |      2 |     0 |
|     1 |      4 |     0 |
+-------+--------+-------+
5 rows in set (0.00 sec)

sweet=1 means the apple is sweet and 0 otherwise. If we want to predict whether (color = 3, weight = 3) is sweet or not, we can compute each probability (sweet or not) and compare their probability:

P(sweet=1) is the probability of apples being sweet in all instances, we can the following SQL (via nested SQL statements) to obtain such value:

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=1) as T1, (select count(1) as y from apple) as T2;
+--------+
| x/y    |
+--------+
| 0.4000 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=1) as T1, (select count(1) as y from apple) as T2;
+--------+
| x/y    |
+--------+
| 0.4000 |
+--------+
1 row in set (0.00 sec)

P(color=3|sweet) is the probability of sweet apples that have color=3.

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=1 and weight=3) as T1, (select count(1) as y from apple where weight=3) as T2;
+--------+
| x/y    |
+--------+
| 0.5000 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=1 and weight=3) as T1, (select count(1) as y from apple where weight=3) as T2;
+--------+
| x/y    |
+--------+
| 0.5000 |
+--------+
1 row in set (0.00 sec)

Similarly, P(weight=3|sweet) can be computed via SQL:

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=1 and weight=3) as T1, (select count(1) as y from apple where sweet=1) as T2;
+--------+
| x/y    |
+--------+
| 0.5000 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=1 and weight=3) as T1, (select count(1) as y from apple where sweet=1) as T2;
+--------+
| x/y    |
+--------+
| 0.5000 |
+--------+
1 row in set (0.00 sec)

Therefore, the probability of color=3,weight=3 apple being sweet is 0.4*0.5*0.5=0.1. Next, we need to compute the probability of this apple being (NOT sweet):

P(plain) = 0.6

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=0) as T1, (select count(1) as y from apple) as T2;
+--------+
| x/y    |
+--------+
| 0.6000 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=0) as T1, (select count(1) as y from apple) as T2;
+--------+
| x/y    |
+--------+
| 0.6000 |
+--------+
1 row in set (0.00 sec)

P(color=3|plain) = 0.3333

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=0 and color=3) as T1, (select count(1) as y from apple where sweet=0) as T2;
+--------+
| x/y    |
+--------+
| 0.3333 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=0 and color=3) as T1, (select count(1) as y from apple where sweet=0) as T2;
+--------+
| x/y    |
+--------+
| 0.3333 |
+--------+
1 row in set (0.00 sec)


P(weight=3|plain)
= 0.3333

1
2
3
4
5
6
7
mysql> select x/y from (select count(1) as x from apple where sweet=0 and weight=3) as T1, (select count(1) as y from apple where sweet=0) as T2;
+--------+
| x/y    |
+--------+
| 0.3333 |
+--------+
1 row in set (0.00 sec)
mysql> select x/y from (select count(1) as x from apple where sweet=0 and weight=3) as T1, (select count(1) as y from apple where sweet=0) as T2;
+--------+
| x/y    |
+--------+
| 0.3333 |
+--------+
1 row in set (0.00 sec)

Therefore, the probability of this apple being NOT sweet is 0.6*0.3333*0.3333=0.06666 smaller than 0.1, therefore, this apple is likely to be sweet.

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
1420 words
Last Post: C/C++ Linear Regression Tutorial Using Gradient Descent
Next Post: How to Make SQL Insert Statement Simply Faster?

The Permanent URL is: How to Use Naive Bayes to Make Prediction (Demonstration via SQL) ?

Leave a Reply