Welcome to the second Machine Learning and Artificial Intelligence Tutorial! In this tutorial, we’ll learn a simple Machine Learning algorithm called Linear Regression. Linear Regression is mostly used to predict continuous values such as future stock prices, weather, traffic, etc. Back in high school, we studied the equation *y = mx + b*, which describes a straight line through points plotted on the X-axis and Y-axis. Our goal is to find the best fit line for our data: you already have *x*, so you need to find *m* (the slope) and *b* (the intercept). Once you have x, m and b, you can compute y.
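As a minimal sketch of what "finding m and b" means, the snippet below computes the least-squares slope and intercept directly with NumPy. The sample points are made up for illustration and are not taken from the tutorial's dataset.

```python
import numpy as np

# Hypothetical sample points that lie exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares slope: co-variation of x and y divided by the variation of x
m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
# The best fit line passes through the point (mean of x, mean of y)
b = y.mean() - m * x.mean()

print(m, b)  # slope and intercept of the best fit line

# With m and b known, y can be computed for any new x
y_new = m * 6.0 + b
```

This is the same calculation a linear regression library performs for a single feature, just written out by hand.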

As we are lazy programmers, we’ll use some Python libraries to simplify our lives, because why not? 😉

## #1 Importing Libraries:

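```python
# importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# train_test_split lives in sklearn.model_selection
# (old releases exposed it as sklearn.cross_validation, which has been removed)
from sklearn.model_selection import train_test_split
```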

**NumPy:** NumPy will help us make arrays of data.

**Pandas:** Pandas will let us import, read and handle our datasets.

**Matplotlib:** We’ll use Matplotlib to visualize our graphs and data points. Matplotlib is one of the most powerful libraries for data visualization.

**Sk-Learn:** Sk-Learn is the most important library in this code, and in the future we’ll be using Sk-Learn to do the hard work for us. The Sk-Learn library comes with almost all the necessary Machine Learning algorithms that we’ll be learning in this tutorial series.

**from sklearn.linear_model import LinearRegression:** As you have already realized from its name, LinearRegression contains our linear regression algorithm.

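**from sklearn.model_selection import train_test_split:** We’ll use train_test_split to split our data into two parts, a training set and a test set. (Older scikit-learn releases exposed it as sklearn.cross_validation, which has since been removed.) We’ll talk about splitting when we split the data.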

## #2 Importing The Dataset:

```python
# Importing the dataset
df = pd.read_csv('Salary_Data.csv')
X = np.array(df.iloc[:, :-1].values)
y = np.array(df.iloc[:, 1].values)
```

We’ll use the read_csv function to import our dataset. Let’s first understand the dataset, then we’ll talk about X and y.

We have 30 entries in our dataset, indexed 0-29 because indexing in Python starts at 0. The first column is the index of the entries, the second column is years of experience, and the third column is the salary package. Let’s imagine we have to offer the right salary package to a job candidate based on years of experience. Rather than working it out by hand, we’ll build a Machine Learning model that predicts the right salary package.
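To see what the two `iloc` slices in the snippet above return, here is a small sketch on a three-row stand-in DataFrame. The column names and values are assumptions made for illustration; only the slicing expressions come from the tutorial.

```python
import pandas as pd

# Hypothetical three-row stand-in for Salary_Data.csv (names and values are made up)
df = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000.0, 45000.0, 50000.0],
})

X = df.iloc[:, :-1].values  # every column except the last -> 2-D array
y = df.iloc[:, 1].values    # second column only -> 1-D array

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
```

The feature array is kept two-dimensional (rows by features) because scikit-learn estimators expect that layout, while the target can stay one-dimensional.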

We’ll predict **Salary (y)** using **Years of Experience (X)**, so we make two different NumPy arrays, X and y. *X* contains years of experience and *y* contains salary.

## Visualizing Our Data!

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('Salary_Data.csv')

X = np.array(df.iloc[:, :-1].values)
y = np.array(df.iloc[:, 1].values)

# Scatter plot of salary against years of experience
plt.scatter(X, y)
plt.show()
```

As you can see, our dataset is linear. Now the fun part starts! We’ll split the dataset into training and test sets, then we’ll fit the training data with Linear Regression.

## #3 Splitting The Data into Training and Test set:

```python
# Splitting the Data into the Training and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```

The split gives us four arrays: X_train, X_test, y_train and y_test. We use X_train and y_train to train our model, and X_test and y_test to check the accuracy of the predictions made by our machine learning model.

We pass our arrays X (years of experience) and y (salary) to make the split.

**test_size:** the fraction of the data held out for testing; 0.3 keeps 30% of the entries for the test set. The test set should be smaller than the training set for the sake of good performance, and the right size sometimes depends on how large and how clean your dataset is.

**random_state:** the data is shuffled while splitting, and random_state seeds that shuffle. I set it to 0 so we get the same split, and therefore the same result, every time. You can play around with it to see what it does!
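The effect of both parameters can be checked with a short sketch. The arrays below are synthetic stand-ins with 30 samples (an assumption that mirrors the dataset's size), not the salary data itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 30 samples with one feature
X_demo = np.arange(30, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

# Two calls with the same random_state produce the identical split
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

print(len(Xa_train), len(Xa_test))       # 21 training rows, 9 test rows
print(np.array_equal(Xa_test, Xb_test))  # True: the split is reproducible
```

Changing `random_state` to another integer gives a different, but equally reproducible, shuffle.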

## #4 Fitting Linear Regression to the Training Set:

```python
# Fitting Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)
```

We’ll use the **fit()** function to fit the regressor to the training set, passing X_train and y_train to train our model.
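After fitting, the learned slope and intercept (the m and b of y = mx + b) are available on the model as `coef_` and `intercept_`. The sketch below uses made-up points that lie exactly on y = 2x + 1, so the recovered values are easy to check; the names `X_demo`, `y_demo` and `demo_regressor` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line y = 2x + 1
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([3.0, 5.0, 7.0, 9.0])

demo_regressor = LinearRegression()
demo_regressor.fit(X_demo, y_demo)

m = demo_regressor.coef_[0]     # slope, one entry per feature
b = demo_regressor.intercept_   # intercept
print(m, b)  # approximately 2.0 and 1.0
```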

## #5 Predicting the y:

```python
# Predicting the Test set results
y_pred = regressor.predict(X_test)
```

To predict y, we’ll use Linear Regression’s **predict()** function. The predict function takes X_test and predicts the corresponding salaries. The predicted values are stored in y_pred, so later we can compare y_test and y_pred.
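The same `predict()` call also works for a single new value, as long as the input keeps the two-dimensional (rows by features) shape. A minimal sketch, assuming a toy model trained on made-up numbers rather than the salary data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line y = 1000x + 500
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = 1000.0 * X_demo.ravel() + 500.0

demo_regressor = LinearRegression().fit(X_demo, y_demo)

# predict() expects a 2-D array, even for one sample
new_years = np.array([[5.0]])
predicted = demo_regressor.predict(new_years)

print(predicted.shape)  # (1,): one prediction per input row
print(predicted[0])     # approximately 5500.0
```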

## #6 Accuracy!:

```python
accuracy = regressor.score(X_test, y_test)
print(accuracy)
```

0.97409934072135107

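Well, we got a score of about 0.97, or 97%! The score function returns the R squared value (the coefficient of determination), defined as 1 - u/v, where u is the residual sum of squares `((y_true - y_pred) ** 2).sum()` and v is the total sum of squares `((y_true - y_true.mean()) ** 2).sum()`. If you can, use pen and paper to work out R squared. 😉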
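The R squared value that `score()` returns can also be reproduced by hand. The sketch below, using made-up data rather than the salary dataset, computes the residual sum of squares u and the total sum of squares v, then compares 1 - u/v with the library's result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data, roughly following y = 2x
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

demo_regressor = LinearRegression().fit(X_demo, y_demo)
y_hat = demo_regressor.predict(X_demo)

u = ((y_demo - y_hat) ** 2).sum()          # residual sum of squares
v = ((y_demo - y_demo.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

# The manual value matches what score() reports
print(np.isclose(r2_manual, demo_regressor.score(X_demo, y_demo)))  # True
```

A value close to 1 means the line explains almost all of the variation in y; a value near 0 means it does little better than always predicting the mean.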

## #7 Let’s Visualize our Graph, Line and predicted values!:

```python
# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')  # Training data points
plt.scatter(X_test, y_test, color = 'green')  # Test data points
plt.plot(X_train, regressor.predict(X_train), color = 'blue')  # our Line
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
```

## Get The Code Here!

```python
# Linear Regression

# importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# sklearn.model_selection replaces the removed sklearn.cross_validation module
from sklearn.model_selection import train_test_split

# Importing the dataset
df = pd.read_csv('Salary_Data.csv')
X = np.array(df.iloc[:, :-1].values)
y = np.array(df.iloc[:, 1].values)

# Splitting the Data set into the Training set and Test set
# (test_size=0.3 to match the split used in step #3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fitting Linear Regression to the Training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.scatter(X_test, y_test, color = 'green')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
```

Download the dataset from here.

Hope you like the tutorial! If you face any problems, comment down below! 🙂