In this blog we will learn how to prepare data for machine learning algorithms and how to fit models to it. We cover the main types of machine learning algorithms and walk through the whole process, from reading and cleaning the data to fitting and evaluating a model.
Types of Machine Learning Algorithms:
1. Supervised Learning
How it works: These algorithms use a target / outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we learn a function that maps inputs to the desired outputs. Training continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Linear Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
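To make the idea concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier (KNN, one of the examples above). The tiny dataset and its meaning (hours studied and slept, pass/fail) are made up for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# predictors (independent variables): [hours studied, hours slept]; made-up data
X = [[1, 4], [2, 5], [3, 6], [8, 7], [9, 8], [10, 6]]
# target (dependent variable): 0 = fail, 1 = pass
y = [0, 0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)  # learn a mapping from inputs to outputs

# predict the target for two new, unseen students
predictions = model.predict([[2, 4], [9, 7]])
print(predictions)  # [0 1]
```

Each new point gets the majority label of its 3 nearest training points, which is why the first student is predicted to fail and the second to pass.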
2. Unsupervised Learning
How it works: In these algorithms there is no target or outcome variable to predict or estimate. They are used to cluster a population into different groups, for example to segment customers into groups for targeted interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.
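A minimal K-means sketch with scikit-learn, assuming a small made-up customer table with two numeric columns (annual spend and visits per month) and no target column. The cluster numbers (0 or 1) are arbitrary labels, so only the grouping itself matters.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical customer data: [annual spend, visits per month]; note there is no target
X = np.array([[100, 1], [120, 2], [110, 1],
              [900, 10], [950, 12], [920, 11]])

# ask for 2 groups; n_init runs K-means several times and keeps the best result
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index assigned to each customer

print(labels)                   # low spenders share one label, high spenders the other
print(kmeans.cluster_centers_)  # the average customer of each group
```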
3. Reinforcement Learning
How it works: Here the machine (the agent) is trained to make specific decisions. It is exposed to an environment in which it trains itself continually through trial and error, learning from past experience and trying to capture the best possible knowledge to make accurate business decisions. Reinforcement learning problems are usually formulated as a Markov Decision Process (MDP).
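The trial-and-error loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the text above does not name a specific one). The corridor environment, the step() helper and the values of alpha, gamma and epsilon are all hypothetical choices for illustration: the agent starts in state 0 and receives a reward of 1 when it reaches state 4.

```python
import random

# hypothetical environment: a corridor of 5 states, reward 1 for reaching state 4
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right


def step(state, action):
    """Hypothetical helper: apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


random.seed(0)
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action] value table

for episode in range(200):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))  # explore: try a random action
        else:
            best = max(Q[state])                # exploit: best known action,
            a = random.choice([i for i, q in enumerate(Q[state]) if q == best])  # ties broken randomly
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q toward reward + discounted value of the best next action
        target = reward if done else reward + gamma * max(Q[next_state])
        Q[state][a] += alpha * (target - Q[state][a])
        state = next_state

# greedy action the agent has learned in states 0 to 3
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

Ties between equal Q-values are broken at random so that the untrained agent does not keep choosing the same action and wander for a long time before its first reward.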
Reading Data
Before training any model, we first need to prepare the data. The steps below show how.
Before starting any ML project, first import a few basic libraries that are used in almost every ML project:
## Import libraries ##
import pandas as pd # pandas data frames
import numpy as np # used for scientific calculations
import csv # read and write CSV files
import matplotlib.pyplot as plt # used to show data in visual form
##
First, read the dataset file, which may be given in any format such as CSV or Excel (use pd.read_excel for Excel files):
df = pd.read_csv("csvfilename.csv")
Next, get some information about the data with a few lines of code:
# display the first five rows
df.head()
# column names, data types and non-null counts
df.info()
df.columnname.unique() # show the unique values of a column
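A self-contained version of these steps, assuming a small made-up CSV held in memory: io.StringIO stands in for csvfilename.csv, and the column names age, city and income are hypothetical.

```python
import io

import pandas as pd

# made-up CSV text; the empty income value in the third row becomes NaN
csv_text = """age,city,income
25,Pune,30000
32,Delhi,45000
47,Pune,
51,Mumbai,80000
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())         # first rows of the data
df.info()                # column types and non-null counts (income shows 3 non-null)
print(df.city.unique())  # ['Pune' 'Delhi' 'Mumbai']
```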
Data cleaning
Data cleaning is the process of removing or replacing unnecessary or missing data in a dataset. Below are the steps we need to perform for data cleaning.
First, check the dataset for null values:
df.isnull().sum()
This displays every column together with the number of NaN (null) values it contains.
Then either remove the rows that contain NaN values, or replace the NaN values with another value such as the mean of the column.
df = df.dropna() # option 1: remove all rows that contain NaN values
df = df.fillna(value) # option 2: replace NaN values with a value, e.g. the column mean
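A minimal sketch of both cleaning options on a small made-up DataFrame. Note that dropna() and fillna() return a new DataFrame by default, so the result has to be assigned to a variable.

```python
import numpy as np
import pandas as pd

# hypothetical data with one missing value in each column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51],
    "income": [30000, 45000, np.nan, 80000],
})

print(df.isnull().sum())       # number of NaN values per column

dropped = df.dropna()          # option 1: drop every row that contains a NaN
filled = df.fillna(df.mean())  # option 2: replace NaN with each column's mean

print(dropped.shape)           # (3, 2)
print(filled)                  # the missing age becomes 36.0, the mean of 25, 32 and 51
```

Dropping rows is simplest but loses data; filling with the mean keeps every row at the cost of inventing a plausible value.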
Data processing
In this step we prepare the data for the machine learning algorithms. If separate training and testing datasets are not given, we need to split the data into training and testing sets.
Import the following function to split the data:
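from sklearn.model_selection import train_test_split
x = df.drop('target_column_name', axis=1) # features (independent variables)
y = df.target_column_name # target to predict (dependent variable)
# split the data: test_size=0.5 keeps half of the rows for testing, random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)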
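A runnable version of the split, assuming a small made-up DataFrame whose target column is named target (a hypothetical name standing in for target_column_name):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical dataset: two feature columns and a target column
df = pd.DataFrame({
    "feature_1": range(10),
    "feature_2": range(10, 20),
    "target": [0, 1] * 5,
})

x = df.drop("target", axis=1)  # independent variables
y = df["target"]               # dependent variable

# shuffle and split; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
print(x_train.shape, x_test.shape)  # (5, 2) (5, 2)
```

The rows of x_train and y_train stay aligned after shuffling, so each feature row keeps its own target value.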
Fit into the ML model
The next step is to fit the model on the training data and evaluate it on the testing data. Remember that scikit-learn models accept only numeric values, so before fitting, convert every string column to numeric values, or drop the column if it is not important for prediction.
To drop columns that are not needed:
x_train1 = x_train.drop(['columnname1','columnname2'], axis=1)
x_test1 = x_test.drop(['columnname1','columnname2'], axis=1) # drop the same columns from the test data
When you drop columns or convert string values to numeric, make sure the training data still contains the feature columns needed for prediction, and apply exactly the same changes to the testing data.
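One common way to convert a string column to numbers is one-hot encoding with pd.get_dummies. This sketch assumes a hypothetical city column; the reindex step gives the testing data exactly the same columns as the training data, which the model requires.

```python
import pandas as pd

# hypothetical train/test features with one string column ("city")
x_train = pd.DataFrame({"age": [25, 32, 47], "city": ["Pune", "Delhi", "Pune"]})
x_test = pd.DataFrame({"age": [51, 29], "city": ["Delhi", "Mumbai"]})

# one-hot encode: each city becomes its own 0/1 numeric column
x_train_num = pd.get_dummies(x_train, columns=["city"], dtype=int)
x_test_num = pd.get_dummies(x_test, columns=["city"], dtype=int)

# align test columns with training columns: a city unseen in training ("Mumbai")
# is dropped, and a training city missing from the test data is filled with 0
x_test_num = x_test_num.reindex(columns=x_train_num.columns, fill_value=0)

print(x_train_num.columns.tolist())  # ['age', 'city_Delhi', 'city_Pune']
print(x_test_num)
```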
Now let's see how to fit models on the training data and evaluate them on the testing data.
1. Linear Regression
# importing required libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# read the train and test datasets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
# if only one CSV file is given, split it with train_test_split as shown in the
# Data processing section instead of reading two separate files
print(train_data.head())
# shape of the dataset
print('\nShape of training data :',train_data.shape)
print('\nShape of testing data :',test_data.shape)
# separate the independent and target variable on training data
x_train = train_data.drop(columns=['income'],axis=1)
y_train = train_data['income']
# separate the independent and target variable on testing data
x_test = test_data.drop(columns=['income'],axis=1)
y_test = test_data['income']
model = LinearRegression()
# fit the model with the training data
model.fit(x_train,y_train)
# coefficients of the trained model
print('\nCoefficient of model :', model.coef_)
# intercept of the model
print('\nIntercept of model',model.intercept_)
# predict the target on the training dataset
predict_train = model.predict(x_train)
print('\nIncome on training data',predict_train)
# Root Mean Squared Error on training dataset
rmse_train = mean_squared_error(y_train,predict_train)**(0.5)
print('\nRMSE on train dataset : ', rmse_train)
# predict the target on the testing dataset
predict_test = model.predict(x_test)
print('\nIncome on test data',predict_test)
# Root Mean Squared Error on testing dataset
rmse_test = mean_squared_error(y_test,predict_test)**(0.5)
print('\nRMSE on test dataset : ', rmse_test)
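Because train.csv and test.csv are not included with this post, here is a self-contained sketch of the same workflow on synthetic data (an assumption: the target follows y = 3*x1 - 2*x2 + 5 plus a little noise), so you can check that the fitted coefficients come out close to the true ones:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# synthetic data: two features, target = 3*x1 - 2*x2 + 5 + noise (std 0.5)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(0, 0.5, size=200)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(x_train, y_train)

# RMSE = square root of the mean squared error
rmse_test = mean_squared_error(y_test, model.predict(x_test)) ** 0.5
print("Coefficients:", model.coef_)     # close to [3, -2]
print("Intercept:", model.intercept_)   # close to 5
print("RMSE on test data:", rmse_test)  # close to the noise level of 0.5
```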
2. Logistic Regression
# importing required libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
print(train_data.head())
# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)
# separate the independent and target variable on training data
x_train = train_data.drop(columns=['Income'],axis=1)
y_train = train_data['Income']
# separate the independent and target variable on testing data
x_test = test_data.drop(columns=['Income'],axis=1)
y_test = test_data['Income']
model = LogisticRegression()
# fit the model with the training data
model.fit(x_train,y_train)
# coefficients of the trained model
print('Coefficient of model :', model.coef_)
# intercept of the model
print('Intercept of model',model.intercept_)
# predict the target on the train dataset
predict_train = model.predict(x_train)
print('Target on train data',predict_train)
# Accuracy Score on train dataset
accuracy_train = accuracy_score(y_train,predict_train)
print('accuracy_score on train dataset : ', accuracy_train)
# predict the target on the test dataset
predict_test = model.predict(x_test)
print('Target on test data',predict_test)
# Accuracy Score on test dataset
accuracy_test = accuracy_score(y_test,predict_test)
print('accuracy_score on test dataset : ', accuracy_test)
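The same workflow as a self-contained sketch on synthetic data (an assumption: the class is 1 when x1 + x2 > 10, otherwise 0), since train-data.csv and test-data.csv are not provided:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic binary target: 1 when x1 + x2 > 10, otherwise 0
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = (X[:, 0] + X[:, 1] > 10).astype(int)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(x_train, y_train)

# accuracy = fraction of rows whose predicted class matches the true class
accuracy_train = accuracy_score(y_train, model.predict(x_train))
accuracy_test = accuracy_score(y_test, model.predict(x_test))
print("accuracy_score on train dataset :", accuracy_train)
print("accuracy_score on test dataset :", accuracy_test)
```

Because the two classes are separated by a straight line here, a linear model like logistic regression should score close to 1 on both sets; real data is rarely this clean.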
Other algorithms, such as Decision Tree, Random Forest and KNN, can be fitted in the same way; try them yourself.
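A minimal sketch of that idea, assuming scikit-learn's make_classification to generate a made-up dataset: every model exposes the same fit() and predict() methods, so trying another algorithm only means swapping the model object.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# made-up classification dataset with 5 numeric features
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)  # same fit() call for every algorithm
    scores[name] = accuracy_score(y_test, model.predict(x_test))
    print(name, "test accuracy:", scores[name])
```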