Welcome to the second Machine Learning and Artificial Intelligence Tutorial! In this tutorial, we’ll be learning a simple Machine Learning algorithm called Linear Regression. Linear Regression is mostly used to predict continuous values such as future stock prices, weather, traffic etc. Back in high school, we studied the equation y = mx + b, which describes a straight line: m is the slope and b is the intercept (where the line crosses the Y-axis). The goal is to find the line that best fits our data points. You already have x, so you need to find m and b; once you have x, m and b, you can compute y.
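To make "finding m and b" concrete, here is a minimal sketch that computes them with the standard least-squares formulas in plain NumPy. The five points are made up for illustration and are not the tutorial’s dataset; the helper name predict is also just an assumption of this sketch.

```python
import numpy as np

# Made-up points that lie roughly on a straight line (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Least-squares estimates of the slope (m) and the intercept (b).
m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b = y.mean() - m * x.mean()

def predict(x_new):
    """Hypothetical helper: evaluate the fitted line y = m*x + b."""
    return m * x_new + b

print("m =", m, "b =", b)
print("prediction for x = 6:", predict(6.0))
```

This is the same kind of line the library fits for us later; we just won’t have to write the formulas ourselves.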

As we are lazy programmers, we’ll be using some Python libraries to simplify our life because why not? 😉


#1 Importing Libraries:

NumPy: NumPy will help us work with arrays of data.
Pandas: Pandas will let us import, read and handle our datasets.
Matplotlib: We’ll use Matplotlib to visualize our graphs and data points. Matplotlib is one of the most widely used Python libraries for data visualization.
Sk-Learn: Sk-Learn (scikit-learn, imported as sklearn) is the most important library in this code, and in the future we’ll be using it to do the hard work for us. It comes with almost all the Machine Learning algorithms that we’ll be learning in this tutorial series.

from sklearn.linear_model import LinearRegression: As you may have guessed from the name, the LinearRegression class contains our linear regression algorithm.

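from sklearn.model_selection import train_test_split: we’ll use train_test_split to split our data into two parts, a training set and a test set. (Older scikit-learn releases exposed this function from sklearn.cross_validation; that module has since been removed.) We’ll talk about splitting when we actually split the data.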
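Put together, the import section might look like the sketch below. The aliases np, pd and plt are common conventions rather than something this tutorial prescribes, and train_test_split is imported from sklearn.model_selection, which is where recent scikit-learn releases keep it.

```python
import numpy as np                 # arrays
import pandas as pd                # reading and handling the dataset
import matplotlib.pyplot as plt    # plotting

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
```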

#2 Importing The Dataset:

We’ll use the read_csv function to import our dataset. Let’s first understand the dataset, then we’ll talk about X and y.

We have 30 entries in our dataset, indexed 0-29 because indexing in Python starts at 0. The first column is the index of the entries, the second column is years of experience and the third column is the salary package. Let’s imagine we have to offer the right salary package to a job candidate based on years of experience. To do that, we’ll build a Machine Learning model that predicts the right salary package.

We’ll predict Salary(y) using Years of Experience(X), so we make two different NumPy arrays X and y. The X contains years of experience and y contains salary.
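Here is a minimal sketch of loading the data and building X and y. To keep it runnable without the real file, it reads a few made-up rows from an in-memory string; the column names YearsExperience and Salary are assumptions, so check them against the actual CSV. With the real dataset you would pass the file path to read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the CSV file: made-up rows, NOT the tutorial's real data.
csv_text = """YearsExperience,Salary
1.1,39000
2.0,43500
3.2,57000
4.5,61000
5.9,81000
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# X: every column except the last (years of experience), kept 2-D for scikit-learn.
X = dataset.iloc[:, :-1].values
# y: the last column (salary), as a 1-D array.
y = dataset.iloc[:, -1].values

print(X.shape, y.shape)
```

X is kept two-dimensional (one row per employee, one column per feature) because scikit-learn estimators expect the features in that shape.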

Visualizing Our Data!
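A scatter plot of the raw data could be drawn roughly like this. The points are made up for illustration, and the Agg backend is selected only so the sketch runs without a display; in an interactive session you can remove that line and call plt.show() at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for years of experience and salary.
X = np.array([[1.1], [2.0], [3.2], [4.5], [5.9]])
y = np.array([39000, 43500, 57000, 61000, 81000])

fig, ax = plt.subplots()
ax.scatter(X.ravel(), y, color="red")
ax.set_title("Salary vs Experience")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
```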

As you can see, our dataset is linear. Now the fun part starts! We’ll split the dataset into training and test sets, then fit Linear Regression to the training data.

#3 Splitting The Data into Training and Test set:

We’ll split the data into X_train, X_test, y_train and y_test. We use X_train and y_train to train our model, and X_test and y_test to check how good the predictions made by our machine learning model are.

We pass our array X(years of experience) and y(salary) to make the split.

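test_size: the share of the data held out for testing (a fraction between 0 and 1, or an absolute number of samples). The test set should be smaller than the training set so the model has enough data to learn from; the right size depends on how much data you have and how good it is.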

random_state: the data is shuffled before it is split, and random_state is the seed for that shuffle. I set it to 0 so we get the same split, and therefore the same result, on every run. You can play around with it to see what it does!
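The split itself can be sketched as follows. The 30 rows are synthetic, and test_size=0.2 is only an assumed value for illustration since the tutorial’s exact setting isn’t shown here. The second call demonstrates that the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 synthetic rows standing in for the real dataset.
X = np.arange(1, 31, dtype=float).reshape(-1, 1)
y = 25000 + 9000 * X.ravel()

# test_size=0.2 is an assumed value; random_state=0 fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same seed, same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), "training rows,", len(X_test), "test rows")
print("reproducible:", np.array_equal(X_test, X_test2))
```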

#4 Fitting Linear Regression to the Training Set:

We’ll create a LinearRegression object (the regressor) and call its fit() function with X_train and y_train to train our model.
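A minimal sketch of the fitting step, using a tiny made-up training set that lies exactly on the line y = 10000x + 20000 so the learned m and b are easy to check by eye. The variable name regressor is just a convention.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on y = 10000 * x + 20000.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([30000.0, 40000.0, 50000.0, 60000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

m = regressor.coef_[0]      # learned slope
b = regressor.intercept_    # learned intercept
print("m =", m, "b =", b)
```

After fit(), the slope and intercept from y = mx + b are available as coef_ and intercept_.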

#5 Predicting the y:

To predict y, we’ll use Linear Regression’s predict() function. It takes X_test and returns the predicted salaries, which we store in y_pred so that later we can compare them with the real values in y_test.
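The prediction step might look like this sketch. The data is made up and perfectly linear, so the predictions should land on the same line the model was trained on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on y = 10000 * x + 20000.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([30000.0, 40000.0, 50000.0, 60000.0])
X_test = np.array([[5.0], [6.0]])
y_test = np.array([70000.0, 80000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicted salaries for the unseen test rows.
y_pred = regressor.predict(X_test)

for real, predicted in zip(y_test, y_pred):
    print("real:", real, "predicted:", predicted)
```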

#6 Accuracy!:


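Well, we got a score of 97%! The score function returns R squared (the coefficient of determination): R² = 1 - u/v, where u = ((y_true - y_pred) ** 2).sum() and v = ((y_true - y_true.mean()) ** 2).sum(). A score of 1.0 means the predictions match the real values perfectly. If you like, use pen and paper to work out R squared yourself. 😉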
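If you’d rather let Python do the pen-and-paper work, this sketch computes R squared by hand and compares it with what score() returns. The numbers are made up, so the resulting score will not be the 97% reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, slightly noisy data (illustration only).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_train = np.array([31000.0, 39000.0, 52000.0, 58000.0, 71000.0, 80000.0])
X_test = np.array([[7.0], [8.0]])
y_test = np.array([88000.0, 101000.0])

regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# R squared by hand: 1 - (residual sum of squares / total sum of squares).
u = ((y_test - y_pred) ** 2).sum()
v = ((y_test - y_test.mean()) ** 2).sum()
r2_manual = 1 - u / v

# The same quantity as reported by scikit-learn.
r2_sklearn = regressor.score(X_test, y_test)

print("manual:", r2_manual, "sklearn:", r2_sklearn)
```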

#7 Let’s Visualize our Graph, Line and predicted values!:
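One possible way to draw the final graph is sketched below: training points, test points and the fitted line on the same axes. The data and colours are made up for illustration, and the Agg backend is used only so the sketch runs without a display; in an interactive session, remove that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data (illustration only).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_train = np.array([31000.0, 39000.0, 52000.0, 58000.0, 71000.0, 80000.0])
X_test = np.array([[7.0], [8.0]])
y_test = np.array([88000.0, 101000.0])

regressor = LinearRegression().fit(X_train, y_train)

# Draw the fitted line across the whole range of x values.
X_all = np.vstack([X_train, X_test])

fig, ax = plt.subplots()
ax.scatter(X_train.ravel(), y_train, color="red", label="training data")
ax.scatter(X_test.ravel(), y_test, color="green", label="test data")
ax.plot(X_all.ravel(), regressor.predict(X_all), color="blue", label="best fit line")
ax.set_title("Salary vs Experience")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
ax.legend()
```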

Get The Code Here!

Download the dataset from here.

Hope you liked the tutorial! If you face any problems, comment down below! 🙂