The K Nearest Neighbor Algorithm (Prediction) Demonstration by MySQL


The K Nearest Neighbor (KNN) Algorithm is well known by its simplicity and robustness in the domain of data mining and machine learning. It is actually a method based on the statistics. It can be easily described as the following diagram.

KNN The K Nearest Neighbor Algorithm (Prediction) Demonstration by MySQL machine learning math sql

K Nearest Neighbor Algorithm

Known blue squares and red triangles, you are asked to predict if the green circle belongs to the squares or the triangles. If we choose K=3, then the minimal distances (closest) to the green circle are 2 triangles and 1 square, therefore, the circle belongs to the triangles. However, if we choose K=5, then we have 3 squares and 2 triangles, which will vote the cirlce to the squares group.

Using KNN as Prediction Algorithm Demonstration by MySQL

By the similar principle, KNN can be used to make predictions by averaging (or with weights by distance) the closest candidates. For example, if we have the following data (MySQL table test1):

1
2
3
4
5
6
7
8
9
10
11
mysql> select * from test1;
+------+------+
| x    | y    |
+------+------+
|    1 |   23 |
|  1.2 |   17 |
|  3.2 |   12 |
|    4 |   27 |
|  5.1 |    8 |
+------+------+
5 rows in set (0.04 sec)
mysql> select * from test1;
+------+------+
| x    | y    |
+------+------+
|    1 |   23 |
|  1.2 |   17 |
|  3.2 |   12 |
|    4 |   27 |
|  5.1 |    8 |
+------+------+
5 rows in set (0.04 sec)

We want to predict when x=6.5, the value of y. First of all, for simplicity, we choose K=2, we need to find two points that have the shortest distance to 6.5, so the SQL will be:

1
2
3
4
5
6
7
8
mysql> select x,y from test1 order by abs(6.5-x) limit 2;
+------+------+
| x    | y    |
+------+------+
|  5.1 |    8 |
|    4 |   27 |
+------+------+
2 rows in set (0.00 sec)
mysql> select x,y from test1 order by abs(6.5-x) limit 2;
+------+------+
| x    | y    |
+------+------+
|  5.1 |    8 |
|    4 |   27 |
+------+------+
2 rows in set (0.00 sec)

Now, the task is just to average their corresponding y values. So using a nested SQL statement is trivial:

1
2
3
4
5
6
7
mysql> select avg(y) as predicted from (select y from test1 order by abs(6.5-x) limit 2) as KNN;
+-----------+
| predicted |
+-----------+
|      17.5 |
+-----------+
1 row in set (0.00 sec)
mysql> select avg(y) as predicted from (select y from test1 order by abs(6.5-x) limit 2) as KNN;
+-----------+
| predicted |
+-----------+
|      17.5 |
+-----------+
1 row in set (0.00 sec)

KNN is lazy, meaning that it does not train dataset immediately (actually there is no training). When the input is given to the KNN model, it will look for K nearest neighbours at real time, which is slow and inefficient if the dataset is large.

Other KNN Tutorials

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
517 words
Last Post: How to Define Inheritable Record/Structure in Delphi (Object Pascal)?
Next Post: C/C++ Linear Regression Tutorial Using Gradient Descent

The Permanent URL is: The K Nearest Neighbor Algorithm (Prediction) Demonstration by MySQL

Leave a Reply