
Analyze and Visualize (EDA) the Expected CTC Dataset | Data Analysis Assignment Help

realcode4you

Requirement Details

Question-1

You have been given comprehensive data of expected CTC.

Your task is to analyse and visualize this data, and to check whether you can build a model on it.


Please perform the following steps:

  1. Perform EDA on the data

  2. Split the data into train and test (70:30)

  3. Build regression models (at least 2 different models)

  4. Interpret model results



Question-2

This dataset is for subject classification. It contains two columns, "Questions" and "Subjects". Analyse and visualize the data.


Please perform the following steps:

  • Language Detection

  • Named Entity Recognition

  • Data pre-processing

  • Extract Uni-Gram features (retain only 500 columns) by performing pre-processing

  • Split the data into train and test (70:30)

  • Build classification models (at least 2 different models)

  • Interpret model results


Need to Do:

1a) Perform EDA on the data

1b) Split the data into train and test (70:30)

1c) Build regression models

1d) Interpret model results


--------------------------------------------

2a) Language Detection

2b) Named Entity Recognition

2c) Data Pre-Processing

2d) Extract Uni-Gram features

2e) Split the data into train and test (70:30)

2f) Build classification models

2g) Interpret model results


Question-1

Import libraries

Importing respective libraries and packages:
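The import cell itself is not shown in this post. A minimal sketch of what it could look like, assuming only the steps used later on (pandas for loading, matplotlib for plots, scikit-learn for splitting and modelling); the exact set of libraries in the original notebook is an assumption:

```python
# Assumed imports for the steps in this walkthrough (not the original cell).
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless machines
import matplotlib.pyplot as plt
from sklearn import linear_model, metrics
from sklearn.model_selection import train_test_split

print("pandas", pd.__version__, "| numpy", np.__version__)
```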


Reading the respective data

d1 = pd.read_excel('/content/expected_ctc.xlsx')
d1.head()

Observation:

Here we read the Excel dataset with pandas and display the first records with head(), which shows the top 5 rows by default. The read_excel() function reads data from an Excel file. If your data is in CSV format, use read_csv() instead.
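For the CSV case, here is a small sketch of read_csv() and head(). The inline CSV text and its values are made up for illustration so the snippet runs without the real file:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Total_Experience,Current_CTC,Expected_CTC
1,300000,400000
3,600000,750000
5,900000,1100000
8,1500000,1800000
12,2200000,2600000
15,3000000,3500000
"""

# read_csv accepts a file path or any file-like object.
d1 = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; pass n for a different count.
print(d1.head())
print(d1.head(2))
```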


Checking missing data

Observation:

The highest number of null values is in three columns: Passing_Year_Of_PHD, PHD_Specialization and University_PHD.
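A check like this can be sketched with isnull().sum(). The toy DataFrame below is an assumption that only reuses the column names mentioned above; the values are invented:

```python
import numpy as np
import pandas as pd

# Toy data (invented values) using column names from the observation above.
df = pd.DataFrame({
    "Total_Experience": [2, 5, 8, 12],
    "Passing_Year_Of_PHD": [np.nan, np.nan, 2015, np.nan],
    "PHD_Specialization": [None, None, "Chemistry", None],
    "University_PHD": [None, None, "Delhi", None],
})

# Count of missing values per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)

# Share of missing values per column, in percent.
missing_pct = (df.isnull().mean() * 100).round(1)

summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
print(summary.sort_values("missing", ascending=False))
```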



Correlation matrix



1a) Perform EDA on the data (6.0 pts)


#heatmap visualization

Observation:

Passing_Year_of_graduation and Passing_Year_of_PG are highly correlated (1), and Expected_CTC and Total_Experience are also highly positively correlated (0.82), as are Current_CTC and Total_Experience.
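A correlation matrix and heatmap of this kind can be sketched as follows. The data is synthetic and only mimics the relationship described above, and plain matplotlib is used here as an assumption (the original plot may have been drawn with another library such as seaborn):

```python
import matplotlib
matplotlib.use("Agg")  # render to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: CTC grows with experience, plus noise (invented numbers).
rng = np.random.default_rng(0)
experience = rng.integers(0, 25, size=200)
current_ctc = 100_000 * experience + rng.normal(0, 200_000, size=200)
expected_ctc = 1.2 * current_ctc + rng.normal(0, 100_000, size=200)
df = pd.DataFrame({
    "Total_Experience": experience,
    "Current_CTC": current_ctc,
    "Expected_CTC": expected_ctc,
})

# Pearson correlation between all numeric columns.
corr = df.select_dtypes("number").corr()
print(corr.round(2))

# Heatmap of the correlation matrix.
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```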



# Department of employees


Observation:

The Marketing and Sales departments have the highest number of employees.


The automobile industry has the highest number of employees.
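Counts like these can be obtained with value_counts(). In this sketch the column names "Department" and "Industry" and all values are assumptions, since the original cell is not shown:

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "Department": ["Marketing", "Sales", "Marketing", "HR", "Marketing", "Sales"],
    "Industry": ["Automobile", "Automobile", "Retail", "Automobile", "IT", "Retail"],
})

# Number of employees per category, sorted from most to least frequent.
dept_counts = df["Department"].value_counts()
industry_counts = df["Industry"].value_counts()

print(dept_counts)
print(industry_counts)
print("Largest department:", dept_counts.idxmax())
print("Largest industry:", industry_counts.idxmax())
```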



Both follow the same distribution.

We have encoded all the categorical variables as numerical values.
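The encoding method is not shown in the post. One simple option, given here as an assumption rather than the author's approach, is to map each category to an integer code with pandas:

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "Department": ["Marketing", "Sales", "Marketing", "HR"],
    "Industry": ["Automobile", "Retail", "IT", "Automobile"],
    "Total_Experience": [3, 7, 1, 10],
})

encoded = df.copy()

# Replace every non-numeric column with integer category codes.
# Categories are ordered alphabetically; missing values would become -1.
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded)
```

Integer codes imply an ordering between categories, which linear models will treat as meaningful; one-hot encoding (for example pd.get_dummies) is a common alternative when that is not wanted.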



1b) Split the data into train and test (70:30) (1.0 pts)


Observation:

Here the dataset is split 70:30, i.e. 70 percent of the rows for training and 30 percent for testing. We use the train_test_split function from sklearn to split the dataset.
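A minimal sketch of the 70:30 split, using synthetic data with assumed column names. The random_state value is arbitrary and only makes the split reproducible:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded dataset (invented numbers).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Total_Experience": rng.integers(0, 25, size=100),
    "Current_CTC": rng.normal(1_000_000, 300_000, size=100),
    "Expected_CTC": rng.normal(1_200_000, 350_000, size=100),
})

# Features are every column except the target.
X = df.drop(columns="Expected_CTC")
y = df["Expected_CTC"]

# test_size=0.3 keeps 30% of the rows for testing, 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```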


1c) Build regression models


Code Analysis:

from sklearn import datasets, linear_model, metrics

# create the linear regression object
reg = linear_model.LinearRegression()

# train the model on the training set
reg.fit(X_train, y_train)

Observation:

This is the model-building step: we create the linear regression model and fit it to the training data.
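As a starting point for the remaining steps (a second model and interpreting results), here is a hedged sketch that fits two regressors and compares them on the test set. The data is synthetic, and the choice of a random forest as the second model and of R² and MAE as metrics are assumptions, not the author's solution:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data in which expected CTC depends on current CTC (invented).
rng = np.random.default_rng(42)
n = 300
experience = rng.uniform(0, 25, n)
current_ctc = 100_000 * experience + rng.normal(0, 50_000, n)
expected_ctc = 1.2 * current_ctc + rng.normal(0, 50_000, n)

X = pd.DataFrame({"Total_Experience": experience, "Current_CTC": current_ctc})
y = pd.Series(expected_ctc, name="Expected_CTC")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(f"{name}: R2={scores[name]['r2']:.3f}, MAE={scores[name]['mae']:.0f}")

# Linear coefficients show the change in the target per unit of each feature.
print(dict(zip(X.columns, models["linear_regression"].coef_)))
```

R² close to 1 means the model explains most of the variance in the test target, while MAE gives the typical error in the same units as the CTC.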




You can do the remaining parts yourself to improve your skills. If you need help, leave a comment in the comment section below.


You can also reach me by email:


realcode4you@gmail.com




© 2023 IT Services provided by Realcode4you.com
