In the field of machine learning and data mining, gradient descent is a simple but effective optimization algorithm, commonly used to fit models to linearly related data. It can be used to make predictions from a large number of known data points, for example, predicting heights given weights. This tutorial will guide you through gradient descent via C/C++ code samples.
Tutorial Data Set
For simplicity, we will use the following small X/Y series for demonstration, so you can easily adapt it to real problems:
x    y
1    1
2    3
4    3
3    2
5    5
Gradient Descent
There are many linear regression algorithms, and gradient descent is one of the simplest. For linear regression, you assume the data satisfies a linear relation, for example:

y = B0 + B1 * x
So, our task is to find the 'optimal' B0 and B1 such that the 'prediction' gives an acceptable accuracy. Let's start with a value of 0 for both B0 and B1.
So if we calculate the error for the first point (x = 1, y = 1), the prediction is p = B0 + B1 * x = 0, and therefore error = p - y = -1. Now we can use the error for gradient descent to update the weights. Since it is just one point, we can say that B0 is accountable for all the error. So we can tune B0 in this manner:

B0(t+1) = B0(t) - alpha * error
Where B0(t+1) is the updated version of the coefficient that we will use on the next training instance (the second point), and alpha is the learning rate; let's pick a small number, 0.01. Plugging in these values, we have:
B0(t+1) = 0 - 0.01 * (-1) = 0.01
Now, for B1, it is slightly different, as we need to take into account the value of x, so the formula becomes:

B1(t+1) = B1(t) - alpha * error * x
Let's plug in the values and we have:

B1(t+1) = 0 - 0.01 * (-1) * 1 = 0.01
Now, we can iteratively repeat the process on the next few instances until we reach the end, which is called one pass (an epoch). We can go back to the beginning of the training instances as many times as we want to improve the accuracy of the model.
We can iterate this process 20 times, that is, 4 epochs. The process can be represented using the following C/C++ code.
double x[] = {1, 2, 4, 3, 5};
double y[] = {1, 3, 3, 2, 5};
double b0 = 0;
double b1 = 0;
double alpha = 0.01;
for (int i = 0; i < 20; i++) {
    int idx = i % 5;                 // cycle through the 5 training instances
    double p = b0 + b1 * x[idx];     // prediction with current coefficients
    double err = p - y[idx];         // prediction error
    b0 = b0 - alpha * err;           // update intercept B0
    b1 = b1 - alpha * err * x[idx];  // update slope B1 (scaled by x)
}
So, if we print the values of b0, b1 and error, what do we get?
B0 = 0.01, B1 = 0.01, err = -1
B0 = 0.0397, B1 = 0.0694, err = -2.97
B0 = 0.066527, B1 = 0.176708, err = -2.6827
B0 = 0.0805605, B1 = 0.218808, err = -1.40335
B0 = 0.118814, B1 = 0.410078, err = -3.8254
B0 = 0.123526, B1 = 0.414789, err = -0.471107
B0 = 0.143994, B1 = 0.455727, err = -2.0469
B0 = 0.154325, B1 = 0.497051, err = -1.0331
B0 = 0.157871, B1 = 0.507687, err = -0.354521
B0 = 0.180908, B1 = 0.622872, err = -2.3037
B0 = 0.18287, B1 = 0.624834, err = -0.196221
B0 = 0.198544, B1 = 0.656183, err = -1.56746
B0 = 0.200312, B1 = 0.663252, err = -0.176723
B0 = 0.198411, B1 = 0.65755, err = 0.190068
B0 = 0.213549, B1 = 0.733242, err = -1.51384
B0 = 0.214081, B1 = 0.733774, err = -0.0532087
B0 = 0.227265, B1 = 0.760141, err = -1.31837
B0 = 0.224587, B1 = 0.749428, err = 0.267831
B0 = 0.219858, B1 = 0.735242, err = 0.472871
B0 = 0.230897, B1 = 0.790439, err = -1.10393
Plotting the error over the iterations, we can see that it gradually becomes smaller, and our final model is:

y = 0.230897 + 0.790439 * x
If we make predictions using the given training instances (each point in our training data set), we get a Root Mean Square Error (RMSE) of 0.72.
This per-instance update scheme is a form of stochastic gradient descent; the method is normally used when there is a very large amount of data, and it works quite well.
–EOF (The Ultimate Computing & Technology Blog) —