Analyze and Visualize (EDA) the Expected CTC Dataset

Requirement Details

Question-1

You have been given comprehensive data of expected CTC.

Your task is to analyse this data and to check whether you can build a model from it.

Please perform the following steps:

Perform EDA on the data
Split the data into train and test (70:30)
Build regression models (at least 2 different models)
Interpret model results

Question-2

This dataset is for subject classification. It includes two columns, "Questions" and "Subjects". Perform analysis on the data.

Please perform the following steps:

Language Detection
Named Entity Recognition
Data pre-processing
Extract uni-gram features (retain only 500 columns) after pre-processing
Split the data into train and test (70:30)
Build classification models (at least 2 different models)
Interpret model results

Need to Do:

1a) Perform EDA on the data

1b) Split the data into train and test (70:30)

1c) Build regression models

1d) Interpret model results

--------------------------------------------

2a) Language Detection

2b) Named Entity Recognition

2c) Data Pre-Processing

2d) Extract Uni-Gram features

2e) Split the data into train and test (70:30)

2f) Build classification models

2g) Interpret model results

Question-1

Import libraries

Importing respective libraries and packages:

Reading the respective data

d1 = pd.read_excel('/content/expected_ctc.xlsx')
d1.head()  # display the first 5 records

Observation:

Here we read the Excel dataset with pandas and display the records with head(), which shows the first 5 rows by default. The read_excel() function reads data from an Excel file. If your data is in CSV format, use read_csv() instead.
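As a minimal sketch of the read_excel()/read_csv() choice described above, the helper below picks the pandas reader from the file extension. The helper name load_table() and the tiny CSV written to a temporary folder are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import tempfile

import pandas as pd


def load_table(path):
    """Hypothetical helper: choose the pandas reader from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".xlsx", ".xls"):
        return pd.read_excel(path)  # needs an Excel engine such as openpyxl
    if ext == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {ext}")


# Illustrative data only; the real file is expected_ctc.xlsx.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    with open(csv_path, "w") as fh:
        fh.write("Total_Experience,Expected_CTC\n")
        fh.write("1,100\n2,200\n3,300\n4,400\n5,500\n6,600\n")
    sample = load_table(csv_path)

first_rows = sample.head()  # head() returns the first 5 rows by default
print(first_rows)
```

Calling head(n) with a number, for example sample.head(3), returns that many rows instead of the default 5.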

Checking missing data

Observation:

The columns Passing_Year_Of_PHD, PHD_Specialization and University_PHD have the highest number of null values.
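The missing-data check could look roughly like the sketch below. The column names follow the article, but the values are made up for illustration, and the exact code used in the original notebook is not shown in the post.

```python
import numpy as np
import pandas as pd

# Illustrative data only: column names follow the article, values are made up.
d1 = pd.DataFrame({
    "Total_Experience": [1, 5, 10, 3],
    "Passing_Year_Of_PHD": [np.nan, np.nan, 2010, np.nan],
    "PHD_Specialization": [None, None, "Chemistry", None],
    "University_PHD": [None, None, "Delhi", None],
})

# Count nulls per column and sort so the worst columns come first.
null_counts = d1.isnull().sum().sort_values(ascending=False)

# Share of missing values per column, as a percentage.
null_pct = (d1.isnull().mean() * 100).round(1)

print(null_counts)
print(null_pct)
```

Looking at the percentage as well as the raw count helps when deciding whether a column is worth imputing or should be dropped.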

Correlation matrix

1a) Perform EDA on the data (6.0 pts)

#heatmap visualization

Observation:

Passing_Year_of_graduation and Passing_Year_of_PG are highly correlated (1). Expected_CTC and Total_Experience are also highly positively correlated (0.82), as are Current_CTC and Total_Experience.
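The numbers behind a correlation heatmap can be inspected directly. The sketch below computes the Pearson correlation matrix for the numeric columns and ranks the column pairs by absolute correlation. The data is made up for illustration, so the values will not match the 0.82 reported above, and the plotting call itself is left out.

```python
import pandas as pd

# Illustrative data only; the real values come from expected_ctc.xlsx.
d1 = pd.DataFrame({
    "Total_Experience": [1, 3, 5, 8, 12, 20],
    "Current_CTC": [300, 500, 900, 1400, 2000, 3500],
    "Expected_CTC": [350, 600, 1000, 1700, 2500, 4000],
    "Department": ["HR", "Sales", "Sales", "Marketing", "HR", "Sales"],
})

# Pearson correlation over the numeric columns only.
corr = d1.select_dtypes(include="number").corr()

# Collect each pair of distinct columns once.
cols = list(corr.columns)
pairs = {
    (a, b): float(corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
}

# Rank the pairs by absolute correlation, strongest first.
ranked = sorted(pairs.items(), key=lambda kv: abs(kv[1]), reverse=True)
for (a, b), r in ranked:
    print(f"{a} vs {b}: {r:.2f}")
```

The corr DataFrame is what a heatmap function would take as input; the ranked list is a quick way to read off the strongest relationships without the plot.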

# Department of employees

Observation:

The Marketing and Sales departments have the highest number of employees.

The automobile industry has the highest number of employees.

Both follow the same distribution.

We have encoded all the categorical variables as numerical values.
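The post does not say which encoding was used, so the sketch below is only one possible approach: it counts employees per department and then converts every non-numeric column to integer category codes. The data and the choice of category codes are assumptions for illustration.

```python
import pandas as pd

# Illustrative data only; values are made up.
d1 = pd.DataFrame({
    "Department": ["Marketing", "Sales", "Sales", "HR", "Marketing", "Sales"],
    "Industry": ["Automobile", "Automobile", "Retail", "Telecom", "Automobile", "Retail"],
    "Total_Experience": [2, 4, 6, 8, 10, 12],
})

# Employees per department, largest group first.
dept_counts = d1["Department"].value_counts()
print(dept_counts)

# Encode every non-numeric column as integer category codes.
# Categories are ordered alphabetically, so HR=0, Marketing=1, Sales=2.
encoded = d1.copy()
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        encoded[col] = encoded[col].astype("category").cat.codes

print(encoded)
```

Integer codes imply an ordering between categories, which a linear model will treat as meaningful. One-hot encoding with pd.get_dummies() is a common alternative when that ordering is not wanted.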

1b) Split the data into train and test (70:30) (1.0 pts)

Observation:

Here the dataset is split 70:30, meaning 70 percent of the data is used for training and 30 percent for testing. We use the sklearn train_test_split method to split the dataset.
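A 70:30 split with train_test_split can be sketched as follows. The synthetic X and y stand in for the encoded features and the Expected_CTC target, and random_state=42 is an arbitrary choice made here so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the target column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# test_size=0.3 keeps 30 percent of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```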

1c) Build regression models

Code Analysis:

from sklearn import linear_model, metrics

# create the linear regression object
reg = linear_model.LinearRegression()

# train the model on the training set
reg.fit(X_train, y_train)

Observation:

This is the model-building step: we create the regression model and fit it to the training data.
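Since the assignment asks for at least two regression models and an interpretation of the results, the remaining steps might look like the sketch below. The choice of RandomForestRegressor as the second model, the R² and RMSE metrics, and the synthetic data are all assumptions made here; the post itself only shows the linear regression fit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the Expected_CTC target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Two different regression models, compared on the same split.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # train on the 70 percent split
    pred = model.predict(X_test)     # predict on the held-out 30 percent
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(name, results[name])
```

A higher R² and a lower RMSE on the test set indicate a better fit. Comparing test scores with the same metrics on the training set is a simple way to spot overfitting.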

You can complete the remaining parts yourself to improve your skills. Leave a comment in the comment section below if you need help.

You can also reach me by email:

realcode4you@gmail.com

RealCode4You
