R Tutorial – Using R to Fit Linear Model – Predict Weight over Height


In a previous post, we showed the C# code that processes raw data of 10K rows of gender, height and corresponding weight. Coding it from scratch takes a while, even in C#, but it is useful for understanding the model behind the fit. In this tutorial, we show the relevant R commands and functions to fit the same linear model (weight over height). R is designed for statistical analysis, which makes it well suited to this task.

Data Preparation

We still use the same data: the CSV file containing the 10K records.

[Figure: data-training-set]

Now, we need to read in the CSV in R using read.csv, like this:

data = read.csv("data.csv", header=T);

Since the first line of the CSV holds the column labels, we pass header=T (T is shorthand for TRUE) so that it is read as column names rather than as data. The result is a data frame (a table with named columns, not a matrix), and you can use the dim function to check its size:

dim(data)
[1] 10000     5

The nrow and ncol functions return the number of rows and columns respectively.

nrow(data)
[1] 10000
ncol(data)
[1] 5
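As a quick check of what read.csv returns, the small sketch below reads a few made-up rows from an inline string instead of a file. The column names and values are hypothetical, since the real header of data.csv is not shown here.

```r
# Hypothetical miniature CSV; the real data.csv has 10,000 rows and 5 columns.
csv_text = "Gender,HeightCm,WeightKg
Male,180.2,85.1
Female,162.5,58.3
Male,175.0,78.9"

# read.csv accepts a 'text' argument in place of a file name.
sample_data = read.csv(text = csv_text, header = TRUE)

class(sample_data)   # "data.frame"
dim(sample_data)     # 3 rows, 3 columns
str(sample_data)     # column names and types at a glance
```

Because the result is a data frame, columns can be selected either by position (sample_data[,2]) or by name (sample_data$HeightCm).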

Now we need to separate the data into two sets: the training data set and the verification data set. A common mistake among data-mining learners is to train the machine learning algorithm on a data set and then make predictions on that same data set, which normally yields misleadingly good results.

We can split the data by using even and odd indices.

training = data[seq(1, nrow(data), 2), ]
verification = data[seq(2, nrow(data), 2), ]

Each contains half of the records:

dim(training)
[1] 5000    5
dim(verification)
[1] 5000    5
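The odd/even split is simple and deterministic. A random split is a common alternative. The sketch below is only an illustration on a small stand-in data frame (the names demo, demo_training and demo_verification are made up, and the seed is arbitrary), not the split used for the results in this tutorial.

```r
set.seed(42)  # arbitrary seed so the split is reproducible

# Stand-in data frame; replace with the real 'data' read from the CSV.
demo = data.frame(id = 1:10, value = rnorm(10))

# Pick half of the row indices at random for training.
train_idx = sample(nrow(demo), size = nrow(demo) / 2)

demo_training = demo[train_idx, ]
demo_verification = demo[-train_idx, ]   # every row not chosen for training

nrow(demo_training)       # 5
nrow(demo_verification)   # 5
```

Negative indices in R drop the listed rows, so the two sets never overlap.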

Now, let’s further extract these data into variables.

training_weight = training[,5]
training_height = training[,4]
verification_weight = verification[,5]
verification_height = verification[,4]

The weight data is located in the fifth column and the height data in the fourth column (in R, indices start at one, not zero).

Linear Fit using lm() function in R

Now, we can use the lm() function to fit the linear model. Here weight and height are assumed to hold the training vectors (training_weight and training_height):

fit = lm(weight~height)

We are basically constructing the linear model: weight = k * height + b. What we get is:

 fit

Call:
lm(formula = weight ~ height)

Coefficients:
(Intercept)       height  
   -158.101        1.372  

That means the model is: weight = 1.372 * height - 158.101. Next, we can plot the data together with the fitted line.
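Rather than copying the numbers from the printed output, the coefficients can be read from the fitted object with coef(). The sketch below uses synthetic stand-in data generated from the line above plus noise (the seed, sample size and noise level are arbitrary assumptions), so its estimates will be close to, but not identical to, the values shown.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Synthetic stand-in data: the fitted line above plus random noise.
height = runif(200, min = 150, max = 195)
weight = 1.372 * height - 158.101 + rnorm(200, sd = 5)

fit = lm(weight ~ height)

b = coef(fit)[["(Intercept)"]]   # intercept of the fitted line
k = coef(fit)[["height"]]        # slope of the fitted line

cat("weight =", round(k, 3), "* height +", round(b, 3), "\n")
```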

plot(height, weight, col='blue',xlab='height (cm)',ylab='weight (kg)')
abline(fit,col='red')

This plots the points and draws the fitted line:

[Figure: r-predict-weight-over-height-lm-function]

How good is the model?

We can make predictions on the verification set using the linear model and measure how accurate they are:

pred_weight = 1.372 * verification_height - 158.101
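The line above hard-codes the rounded coefficients. As an alternative, R's predict() function applies a fitted model to new data directly. The following is a minimal sketch on synthetic stand-in data (the seed and the generated values are assumptions). Note that the column name in newdata has to match the variable name used in the formula.

```r
set.seed(2)  # arbitrary seed, for reproducibility only

# Synthetic stand-in for the training data.
height = runif(100, min = 150, max = 195)
weight = 1.372 * height - 158.101 + rnorm(100, sd = 5)
fit = lm(weight ~ height)

# New heights to predict for; the column must be called 'height'.
new_heights = data.frame(height = c(160, 170, 180))
pred = predict(fit, newdata = new_heights)

# predict() agrees with applying the coefficients by hand.
manual = coef(fit)[["(Intercept)"]] + coef(fit)[["height"]] * new_heights$height
all.equal(unname(pred), manual)   # TRUE
```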

And the mean error is:

mean(pred_weight-verification_weight)
[1] -0.07128337

RMSE (root mean square error) is: $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - W_i)^2}$ where P is the predicted weight, W is the verification weight and n is the number of predictions.

sqrt(sum((pred_weight-verification_weight)^2)/length(pred_weight))
[1] 5.545324

A reasonably good linear fit is achieved, with an RMSE of about 5.5 and a mean error of about -0.07.
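The two measures answer different questions: a mean error close to zero says the predictions are not systematically too high or too low, while the RMSE describes the typical size of an individual error. Both can be wrapped in small helper functions. The names rmse and mean_error below are hypothetical helpers, not part of base R, and the numbers are a tiny made-up example that is easy to verify by hand.

```r
# Hypothetical helper functions (not part of base R).
rmse = function(predicted, actual) sqrt(mean((predicted - actual)^2))
mean_error = function(predicted, actual) mean(predicted - actual)

# Made-up predictions and observations.
p = c(70, 80, 90)
w = c(72, 79, 94)

mean_error(p, w)   # (-2 + 1 - 4) / 3 = -1.666667
rmse(p, w)         # sqrt((4 + 1 + 16) / 3) = sqrt(7) = 2.645751
```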

Separate males from the females

The above model does not distinguish between males and females. You could fit a separate model for each group in the same way, after splitting the data with the which function, which returns the indices of the elements for which a condition is TRUE.

male=data[which(data[,1]=="Male"), ]
female=data[which(data[,1]=="Female"), ]
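Once the rows are separated, each subset can be passed to lm() on its own. The sketch below shows the idea on synthetic stand-in data. The column names (Gender, height, weight) and all generating numbers are made up for illustration and say nothing about the real data set.

```r
set.seed(3)  # arbitrary seed, for reproducibility only
n = 100

# Synthetic stand-in data with a gender label and two different trends.
demo = data.frame(
  Gender = rep(c("Male", "Female"), each = n),
  height = c(rnorm(n, mean = 175, sd = 7), rnorm(n, mean = 162, sd = 6))
)
demo$weight = ifelse(demo$Gender == "Male",
                     1.1 * demo$height - 108,
                     1.0 * demo$height - 100) + rnorm(2 * n, sd = 4)

# Split by gender, as in the commands above.
male = demo[which(demo$Gender == "Male"), ]
female = demo[which(demo$Gender == "Female"), ]

# Fit one linear model per group.
fit_male = lm(weight ~ height, data = male)
fit_female = lm(weight ~ height, data = female)

coef(fit_male)
coef(fit_female)
```

Passing data = male lets the formula refer to columns by name, so no intermediate vectors are needed.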

–EOF (The Ultimate Computing & Technology Blog) —
