Requirement Details
Question-1
You have been given comprehensive data of expected CTC.
Your task is to analyse this data and figures, and to check whether you can build a model.
Please perform the following steps:
Perform EDA on the data
Split the data into train and test (70:30)
Build regression models (at least 2 different models)
Interpret model results
Question-2
This dataset is for subject classification. It includes two columns, "Questions" and "Subjects". Perform analysis on the data and figures.
Please perform the following steps:
Language Detection
Named Entity Recognition
Data pre-processing
Extract Uni-Gram features (retain only 500 columns) by performing pre-processing
Split the data into train and test (70:30)
Build classification models (at least 2 different models)
Interpret model results
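The uni-gram extraction, 70:30 split and two-classifier steps listed above could be sketched with scikit-learn roughly as follows. This is only an illustration under stated assumptions: the questions and subject labels are invented, and CountVectorizer, MultinomialNB and LogisticRegression are assumed choices that the task itself does not name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for the "Questions" and "Subjects" columns.
questions = [
    "What is the unit of force?",
    "How does gravity affect a falling object?",
    "What is the speed of light in vacuum?",
    "Define kinetic energy of a moving body.",
    "Why does friction produce heat?",
    "What is the function of the cell membrane?",
    "How do plants perform photosynthesis?",
    "What carries oxygen in human blood?",
    "Define the role of enzymes in digestion.",
    "Why do cells divide by mitosis?",
]
subjects = ["Physics"] * 5 + ["Biology"] * 5

# Pre-processing (lowercasing, English stop-word removal) plus uni-gram
# counts, keeping at most 500 columns.
vectorizer = CountVectorizer(lowercase=True, stop_words="english",
                             ngram_range=(1, 1), max_features=500)
X = vectorizer.fit_transform(questions)

# 70:30 split, stratified so both subjects appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, subjects, test_size=0.30, random_state=42, stratify=subjects)

# Two different classifiers, compared on test accuracy.
classifiers = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))
    print(name, accuracies[name])
```

The sketch follows the order of the task (features first, then split). Fitting the vectorizer on the training part only would keep test vocabulary out of the features.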
Need to Do:
1a) Perform EDA on the data
1b) Split the data into train and test (70:30)
1c) Build regression models
1d) Interpret model results
--------------------------------------------
2a) Language Detection
2b) Named Entity Recognition
2c) Data Pre-Processing
2d) Extract Uni-Gram features
2e) Split the data into train and test (70:30)
2f) Build classification models
2g) Interpret model results
Question-1
Import libraries
Importing respective libraries and packages:
Reading the respective data
d1 = pd.read_excel('/content/expected_ctc.xlsx')
d1.head()  # display the top 5 records
Observation:
Here we read the Excel dataset using pandas and display records with head(), which shows the top 5 records by default. The read_excel() function reads data from an Excel file. If your data is in CSV format, use read_csv() instead.
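As a minimal sketch of the same read-and-preview pattern with read_csv(), the example below loads a small in-memory CSV so that it runs without the original Excel file. The column names and values are invented for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: made-up columns and values.
csv_text = """Total_Experience,Department,Current_CTC,Expected_CTC
2,Sales,400000,500000
5,Marketing,700000,900000
8,HR,1000000,1250000
12,Sales,1500000,1900000
3,Marketing,450000,560000
15,Engineering,2100000,2600000
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first 5 records by default
print(df.shape)    # (rows, columns)
```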
Checking missing data
Observation:
The highest number of null values is found in three columns: Passing_Year_Of_PHD, PHD_Specialization and University_PHD.
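A missing-value check of this kind can be sketched as below. The small DataFrame is invented for illustration, and only some of the column names mirror the ones mentioned in the observation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the values are made up.
df = pd.DataFrame({
    "Total_Experience": [2, 5, 8, 12],
    "Passing_Year_Of_PHD": [np.nan, np.nan, 2015, np.nan],
    "University_PHD": [None, None, "Delhi", None],
})

# Count of nulls per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)

# Share of nulls per column, as a percentage.
missing_pct = (df.isnull().mean() * 100).round(1)

print(missing)
print(missing_pct)
```

Looking at the percentage as well as the raw count helps when deciding whether a column is worth imputing or should be dropped.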
Correlation matrix
1a) Perform EDA on the data (6.0 pts)
#heatmap visualization
Observation:
Passing_Year_of_graduation and Passing_Year_of_PG are highly correlated (1), and Expected_CTC and Total_Experience are also highly positively correlated (0.82), as are Current_CTC and Total_Experience.
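A correlation matrix like the one described can be computed as sketched here. The data is synthetic (generated so that experience and CTC move together), so the numbers will not match the 0.82 reported above. The 0.8 threshold for a "strong" pair is also an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: CTC grows with experience, plus noise.
rng = np.random.default_rng(0)
experience = rng.uniform(0, 25, size=200)
df = pd.DataFrame({
    "Total_Experience": experience,
    "Current_CTC": 300000 + 90000 * experience + rng.normal(0, 200000, size=200),
    "Unrelated_Score": rng.normal(size=200),
})
df["Expected_CTC"] = df["Current_CTC"] * 1.2 + rng.normal(0, 100000, size=200)

# Pearson correlation between the numeric columns.
corr = df.select_dtypes(include="number").corr()

# Keep the upper triangle only, so that each pair is listed once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(ascending=False)
strong = pairs[pairs.abs() > 0.8]
print(strong)

# To draw the matrix as a heatmap (requires seaborn and matplotlib):
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, cmap="coolwarm")
```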
# Department of employees
Observation:
Marketing and Sales have the highest number of employees among departments.
The Automobile industry has the highest number of employees.
Both follow the same distribution.
We have encoded all the categorical variables as numerical values.
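Counting employees per category and encoding categorical columns might look like the sketch below. The source does not say which encoder was used, so label encoding via pandas category codes is an assumption here, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "Department": ["Marketing", "Sales", "HR", "Marketing", "Sales", "Marketing"],
    "Industry": ["Automobile", "IT", "Automobile", "Retail", "Automobile", "IT"],
    "Total_Experience": [3, 7, 1, 10, 4, 6],
})

# Number of employees per department, largest first.
print(df["Department"].value_counts())

# Replace every non-numeric column with integer codes.
# Codes follow the alphabetical order of the categories.
encoded = df.copy()
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded)
```

Integer codes imply an ordering between categories, which linear models will take literally. One-hot encoding with pd.get_dummies() is a common alternative when no such ordering exists.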
1b) Split the data into train and test (70:30) (1.0 pts)
Observation:
Here the dataset is split 70:30, meaning 70 percent of the data is used for training and 30 percent for testing. We use sklearn's train_test_split function to split the dataset.
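A 70:30 split with train_test_split can be sketched as follows. The arrays are placeholder data, and random_state=42 is an arbitrary choice used only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features (100 rows, 2 columns) and target.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.30 keeps 30% of the rows for testing and 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

print(X_train.shape, X_test.shape)
```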
1c) Build regression models
Code Analysis:
from sklearn import linear_model, metrics
# Create the linear regression object
reg = linear_model.LinearRegression()
# Train the model using the training set
reg.fit(X_train, y_train)
Observation:
This is the model-building part: we create the linear regression model and fit it to the training data.
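For readers who want to go one step past fitting, the sketch below trains two different regression models and compares them on the test set. The data is synthetic, and the choice of a random forest as the second model, along with R2 and RMSE as metrics, is an assumption not confirmed by the source.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the target is roughly linear in two features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 25, size=(300, 2))
y = 300000 + 90000 * X[:, 0] + 20000 * X[:, 1] + rng.normal(0, 50000, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    scores[name] = {"r2": r2_score(y_test, pred), "rmse": rmse}
    print(f"{name}: R2={scores[name]['r2']:.3f}, RMSE={rmse:,.0f}")
```

R2 shows how much of the variance in the target the model explains, while RMSE gives the typical error in the same units as the target, which makes it easier to interpret for CTC values.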
The remaining parts are left for you to do yourself to improve your skills. Let me know, or post in the comment section below, if you need help.
You can also reach me by email:
realcode4you@gmail.com