Linear Regression is one of the most fundamental algorithms in Machine Learning. It is the door to the magical world ahead, but before going further with the algorithm, let's have a look at the life cycle of a Machine Learning model.
This diagram explains building a Machine Learning model from scratch, taking the same model further with hyperparameter tuning to improve its accuracy, and then deciding the deployment strategy for that model.
Once deployed, logging and monitoring frameworks are set up to generate reports and dashboards based on the project's requirements.
Coming back to the algorithm: Linear Regression is one of the most fundamental and most widely known Machine Learning algorithms.
The building blocks of Linear Regression are covered step by step below, starting with the data.
This dataset records the amount spent on advertising through different channels: TV, radio, and newspaper. The goal is to predict how the spend on each channel affects sales, and whether there is a way to optimize those sales.
# necessary Imports
import pandas as pd
import matplotlib.pyplot as plt
import pickle
%matplotlib inline
data = pd.read_csv('Advertising.csv') # Reading the data file
data.head() # checking the first five rows from the dataset
What are the features? The advertising spend on TV, radio, and newspaper.
What is the response? Sales.
data.info() # printing the summary of the dataframe
Now, let's visualize the relationship between the features and the target column.
# visualize the relationship between the features and the response using scatterplots
fig, axs = plt.subplots(1, 3, sharey=True)
data.plot(kind='scatter', x='TV', y='sales', ax=axs[0], figsize=(16, 8))
data.plot(kind='scatter', x='radio', y='sales', ax=axs[1])
data.plot(kind='scatter', x='newspaper', y='sales', ax=axs[2])
Simple Linear Regression is a method for predicting a quantitative response using a single feature ('input variable'). The mathematical equation is:
y = β₀ + β₁x
What do these terms represent?
β₀ and β₁ are the model coefficients: β₀ is the intercept and β₁ is the coefficient (slope) for the feature x. To create a model, we must 'learn' the values of these coefficients. And once we have the values of these coefficients, we can use the model to predict Sales!
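As a concrete illustration of 'learning' β₀ and β₁, here is a minimal sketch that fits a simple linear regression of sales on the TV spend alone, using scikit-learn's LinearRegression. This snippet is an illustration added here (not part of the original walkthrough) and assumes the Advertising data has already been loaded into data as shown above.
from sklearn.linear_model import LinearRegression

# simple linear regression: sales ~ TV
X_tv = data[['TV']] # scikit-learn expects a 2-D feature matrix
y = data.sales

simple_lm = LinearRegression()
simple_lm.fit(X_tv, y)

print(simple_lm.intercept_) # estimated β₀ (intercept)
print(simple_lm.coef_) # estimated β₁ (coefficient for TV)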
Till now, we have created the model based on only one feature. Now, we'll include multiple features and create a model to see the relationship between those features and the label column. This is called Multiple Linear Regression.
y = β₀ + β₁x₁ + … + βₙxₙ
Each x represents a different feature, and each feature has its own coefficient. In this case:
y = β₀ + β₁ × TV + β₂ × radio + β₃ × newspaper
Let's estimate these coefficients, first with scikit-learn and then with statsmodels.
# create X and y
feature_cols = ['TV', 'radio', 'newspaper']
X = data[feature_cols]
y = data.sales
from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(X, y)
# print intercept and coefficients
print(lm.intercept_)
print(lm.coef_)
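To make this output easier to read, the coefficients can be paired with their feature names (a small illustrative snippet, not part of the original code):
# pair each feature name with its estimated coefficient
print(list(zip(feature_cols, lm.coef_)))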
How do we interpret these coefficients? The coefficient for newspaper spend is negative, which means that the money spent on newspaper advertisements is not contributing positively to sales.
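Once the coefficients are learned, the fitted model can be used to predict sales for a new advertising budget, and, since pickle was imported at the top, the model can also be saved to disk for later use (for example, the deployment step mentioned in the life cycle above). This is a minimal sketch; the budget values and the file name are made up purely for illustration.
import pickle
import pandas as pd

# predict sales for a hypothetical advertising budget (illustrative values)
new_budget = pd.DataFrame({'TV': [100], 'radio': [25], 'newspaper': [10]})
print(lm.predict(new_budget))

# save the fitted model and load it back
with open('linear_model.pkl', 'wb') as f:
    pickle.dump(lm, f)

with open('linear_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict(new_budget)) # same prediction as before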
A lot of the information we have been reviewing piece-by-piece is available in the model summary output:
import statsmodels.formula.api as smf

lm = smf.ols(formula='sales ~ TV + radio + newspaper', data=data).fit()
lm.conf_int()
lm.summary()
What are the things to be learned from this summary? Among other things, it reports the R-squared of the fit, the coefficient estimates with their standard errors, t-statistics and p-values, and the 95% confidence interval for each coefficient.
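These values can also be pulled out programmatically instead of being read off the printed table; the fitted statsmodels results object exposes them as attributes (a small sketch, assuming lm is the fitted results object from the cell above):
print(lm.rsquared) # R-squared of the fit
print(lm.params) # estimated coefficients, including the intercept
print(lm.pvalues) # p-value for each coefficient
print(lm.conf_int()) # 95% confidence intervals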
Recommended Reading: How to start learning Python Programming
If you like my posts, please follow me to read my latest posts on programming and technology.