The K Nearest Neighbor Algorithm (Prediction) Demonstration by MySQL

The K Nearest Neighbor (KNN) Algorithm is well known by its simplicity and robustness in the domain of data mining and machine learning. It is actually a method based on the statistics. It can be easily described as the following diagram.

K Nearest Neighbor Algorithm

Known blue squares and red triangles, you are asked to predict if the green circle belongs to the squares or the triangles. If we choose K=3, then the minimal distances (closest) to the green circle are 2 triangles and 1 square, therefore, the circle belongs to the triangles. However, if we choose K=5, then we have 3 squares and 2 triangles, which will vote the cirlce to the squares group.

Using KNN as Prediction Algorithm Demonstration by MySQL

By the similar principle, KNN can be used to make predictions by averaging (or with weights by distance) the closest candidates. For example, if we have the following data (MySQL table test1):

mysql> select * from test1;
+------+------+
| x    | y    |
+------+------+
|    1 |   23 |
|  1.2 |   17 |
|  3.2 |   12 |
|    4 |   27 |
|  5.1 |    8 |
+------+------+
5 rows in set (0.04 sec)

mysql> select * from test1;
+------+------+
| x    | y    |
+------+------+
|    1 |   23 |
|  1.2 |   17 |
|  3.2 |   12 |
|    4 |   27 |
|  5.1 |    8 |
+------+------+
5 rows in set (0.04 sec)

We want to predict when x=6.5, the value of y. First of all, for simplicity, we choose K=2, we need to find two points that have the shortest distance to 6.5, so the SQL will be:

mysql> select x,y from test1 order by abs(6.5-x) limit 2;
+------+------+
| x    | y    |
+------+------+
|  5.1 |    8 |
|    4 |   27 |
+------+------+
2 rows in set (0.00 sec)

mysql> select x,y from test1 order by abs(6.5-x) limit 2;
+------+------+
| x    | y    |
+------+------+
|  5.1 |    8 |
|    4 |   27 |
+------+------+
2 rows in set (0.00 sec)

Now, the task is just to average their corresponding y values. So using a nested SQL statement is trivial:

mysql> select avg(y) as predicted from (select y from test1 order by abs(6.5-x) limit 2) as KNN;
+-----------+
| predicted |
+-----------+
|      17.5 |
+-----------+
1 row in set (0.00 sec)

mysql> select avg(y) as predicted from (select y from test1 order by abs(6.5-x) limit 2) as KNN;
+-----------+
| predicted |
+-----------+
|      17.5 |
+-----------+
1 row in set (0.00 sec)

KNN is lazy, meaning that it does not train dataset immediately (actually there is no training). When the input is given to the KNN model, it will look for K nearest neighbours at real time, which is slow and inefficient if the dataset is large.

Algorithms, Blockchain and Cloud

The K Nearest Neighbor Algorithm (Prediction) Demonstration by MySQL

Using KNN as Prediction Algorithm Demonstration by MySQL

Other KNN Tutorials

Leave a Reply

Using KNN as Prediction Algorithm Demonstration by MySQL

Other KNN Tutorials

Related posts:

Leave a Reply