In this blog we will learn how to analyze Amazon reviews using train and test data:
---
---
The multilayer perceptron (MLP) has a wide range of classification and regression applications in many fields, such as pattern recognition and voice recognition. The choice of architecture has a great impact on how well these networks converge. In this post we apply an MLP to the Amazon review data and train it to classify the reviews.
# Introduction
Here we will analyze the positive and negative reviews in the Amazon dataset and check the accuracy on the train and test data.
---
# Part I - Data preparation
---
This part covers steps such as importing, reading, cleaning, and splitting the data.
Data Source
Importing Libraries:
###
import gzip
import itertools
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
%matplotlib inline
###
Reviews into Pandas DataFrame
Here we first parse the data set, which is provided in gzip format, using the parse_gz() method, and then convert it into a DataFrame using the convert_to_DataFrame() method.
Code to unzip the file:
It unzips the file and then converts it into a DataFrame.
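import ast

def parse_gz(file_path):
    # Read the gzip file line by line; each line holds one review as a dict literal
    with gzip.open(file_path, 'rb') as g:
        for line in g:
            # literal_eval parses the literal without executing arbitrary code
            yield ast.literal_eval(line.decode('utf-8'))

def convert_to_DataFrame(file_path):
    # Collect the parsed reviews keyed by row index, then build the DataFrame
    df = {}
    for i, d in enumerate(parse_gz(file_path)):
        df[i] = d
    return pd.DataFrame.from_dict(df, orient='index')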
We are going to classify Amazon product reviews as positive or negative. Amazon reviews carry a star rating (1 star, 2 stars, etc.), which is given in the overall column. We will use that rating to check our predictions.
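The star rating can be turned into a positive/negative label in more than one way. The sketch below shows one possibility and is not necessarily the preparation used for this post: it assumes that ratings of 4 and above count as positive (1.0) and everything below as negative (0.0), and it runs on a small made-up DataFrame. The column name review_in_float matches the one used in the split step; the threshold and the helper name rating_to_label are assumptions.

```python
import pandas as pd

# Made-up reviews standing in for the real data set.
sample_data = pd.DataFrame({
    "reviewText": ["Love it", "Works fine", "Average at best",
                   "Stopped working", "Terrible"],
    "overall": [5.0, 4.0, 3.0, 2.0, 1.0],
})


def rating_to_label(overall, threshold=4.0):
    """Return 1.0 for ratings at or above the (assumed) threshold, else 0.0."""
    return 1.0 if overall >= threshold else 0.0


# Hypothetical label column derived from the star rating.
sample_data["review_in_float"] = sample_data["overall"].apply(rating_to_label)

print(sample_data[["overall", "review_in_float"]])
```

A different threshold, or dropping 3-star reviews as neutral, would be equally reasonable choices depending on how strict the positive class should be.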
Now we move on to splitting the data. The complete loading and cleaning steps, which produce the sports_data DataFrame used below, are not covered in this post.
Split data:
x_train, x_test, y_train, y_test = train_test_split(sports_data.reviewText, sports_data.review_in_float, random_state=0)
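When no test_size is passed, train_test_split holds out 25% of the rows for testing, and a fixed random_state makes the split reproducible. The sketch below illustrates this on a made-up eight-row DataFrame; the toy data and variable names are assumptions, not the real sports_data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the prepared review data.
toy = pd.DataFrame({
    "reviewText": [f"review number {i}" for i in range(8)],
    "review_in_float": [1.0, 0.0] * 4,
})

# Same call shape as in the post: default test_size, fixed random_state.
x_tr, x_te, y_tr, y_te = train_test_split(
    toy.reviewText, toy.review_in_float, random_state=0
)

print(len(x_tr), len(x_te))  # 6 2
```

If the classes are imbalanced, passing stratify with the label column is one option for keeping the class proportions similar in both parts.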
How to use CountVectorizer()
It converts the review text from strings into integer token counts.
cv = CountVectorizer()
X_traincv = cv.fit_transform(x_train)
X_testcv = cv.transform(x_test)
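To see what the vectorizer produces, here is a minimal sketch on three made-up sentences. It shows that the vocabulary is learned from the training text only, so a word that appears only in the test text (here "value") is ignored by transform. The sentences and variable names are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["good product good price", "bad product"]
test_docs = ["good value"]

toy_cv = CountVectorizer()
train_counts = toy_cv.fit_transform(train_docs)  # learn vocabulary and count
test_counts = toy_cv.transform(test_docs)        # reuse the learned vocabulary

# Vocabulary terms ordered by their column index.
print(sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get))
# ['bad', 'good', 'price', 'product']
print(train_counts.toarray())
# [[0 2 1 1]
#  [1 0 0 1]]
print(test_counts.toarray())
# [[0 1 0 0]]
```

This is why fit_transform is called on the training text and only transform on the test text: both matrices must share the same columns.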
After this, we fit the data to the model.
Here we fit it to the MLP classifier.
# Import the MLP classifier libraries
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix
# Train the model
mlp = MLPClassifier()
mlp.fit(X_traincv, y_train)
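Called without arguments, MLPClassifier uses a single hidden layer of 100 units and at most 200 training iterations. Since the architecture affects convergence, it can help to set these values explicitly. The sketch below is one way to do that, on made-up reviews; the layer size, iteration limit, and toy data are illustrative assumptions rather than tuned values.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Made-up reviews: 1 = positive, 0 = negative.
toy_texts = [
    "great product works well", "love it excellent quality",
    "excellent value great buy", "terrible product broke fast",
    "awful quality do not buy", "broke quickly terrible value",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier chained so both are fitted on the same text.
toy_model = make_pipeline(
    CountVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
toy_model.fit(toy_texts, toy_labels)

toy_predictions = toy_model.predict(["great quality", "terrible"])
print(toy_predictions)
```

Fixing random_state makes the weight initialization repeatable, which is useful when comparing different layer sizes.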
# Predict the target on the train dataset
pred_train = mlp.predict(X_traincv)
pred_train
# Accuracy score on the train dataset
accur_train = accuracy_score(y_train, pred_train)
print('accuracy_score on train dataset : ', accur_train)
# Predict the target on the test dataset
predictions = mlp.predict(X_testcv)
# Confusion matrix comparing the true and predicted test labels
cnf = confusion_matrix(y_test, predictions)
cnf
# Classification report with precision, recall and F1 score per class
print(classification_report(y_test, predictions))
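To read the confusion matrix, it helps to try it on a handful of made-up labels. In scikit-learn, rows correspond to the true classes and columns to the predicted classes, both in sorted label order. The label lists below are illustrative assumptions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

true_labels = [1, 1, 1, 0, 0, 1]
predicted_labels = [1, 0, 1, 0, 1, 1]

toy_cnf = confusion_matrix(true_labels, predicted_labels)
toy_acc = accuracy_score(true_labels, predicted_labels)

print(toy_cnf)
# [[1 1]
#  [1 3]]
print(toy_acc)  # 4 of 6 predictions are correct
```

Here the first row says that of the two truly negative reviews, one was predicted negative and one positive; the second row says that of the four truly positive reviews, one was missed and three were found.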