Aug 18, 2022

K-means Clustering on the Car Dataset to segment the cars into various categories | Sample Paper

Part A:

Domain: Automobile
Context: The data concerns city-cycle fuel consumption in miles per gallon to be predicted in terms of 3 multivalued discrete and 5 continuous attributes.
Data Description:

Project Objective: To understand K-means clustering by applying it to the Car Dataset to segment the cars into various categories.

Steps and Tasks

1. Data Understanding and Exploration

2. Data Preparation and Analysis

Check and print the feature-wise percentage of missing values present in the data and impute with the most suitable approach.
Check for duplicate values in the data and handle them with the most suitable approach.
Plot a Pairplot for all the features.
Visualize a scatter plot for 'wt' and 'disp'. Datapoints should be distinguishable by 'cyl'.
Share insights for Q2.d.
Visualize a scatter plot for 'wt' and 'mpg'. Datapoints should be distinguishable by 'cyl'.
Share insights for Q2.f.
Check for unexpected values in all the features and identify the datapoints with such values.
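The preparation tasks above can be sketched roughly as follows. This is a minimal illustration, not the reference solution: the small DataFrame is a made-up stand-in for the car data (only the column names 'mpg', 'cyl', 'disp' and 'wt' come from the task list), and median imputation and dropping duplicates are assumed choices that may not be the most suitable ones for the real data.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in for the car data (values are invented for illustration).
cars = pd.DataFrame({
    "mpg":  [18.0, 15.0, np.nan, 16.0, 18.0, 32.0],
    "cyl":  [8, 8, 8, 8, 8, 4],
    "disp": [307.0, 350.0, 318.0, 304.0, 307.0, 97.0],
    "wt":   [3504, 3693, 3436, 3433, 3504, 2130],
})

# Feature-wise percentage of missing values.
missing_pct = cars.isna().mean() * 100
print(missing_pct.round(2))

# Assumed imputation strategy: fill each numeric column with its median.
cars_imputed = cars.fillna(cars.median(numeric_only=True))

# Count fully duplicated rows, then drop them (an assumed way of handling duplicates).
n_duplicates = int(cars_imputed.duplicated().sum())
cars_clean = cars_imputed.drop_duplicates().reset_index(drop=True)
print("duplicate rows:", n_duplicates)

# Scatter plot of 'wt' against 'disp', with points colored by 'cyl'.
fig, ax = plt.subplots()
points = ax.scatter(cars_clean["wt"], cars_clean["disp"],
                    c=cars_clean["cyl"], cmap="viridis")
ax.set_xlabel("wt")
ax.set_ylabel("disp")
fig.colorbar(points, ax=ax, label="cyl")
```

The same plotting code covers the 'wt' and 'mpg' scatter plot by swapping the y column. For the pairplot, a library such as seaborn is a common choice.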

3. Clustering

Apply K-Means clustering for 2 to 10 clusters.
Plot a visual and find elbow point.
On the above visual, highlight which are the possible Elbow points.
Train a K-means clustering model once again on the optimal number of clusters.
Add a new feature in the DataFrame which will have labels based upon cluster value.
Plot a visual and color the datapoints based upon clusters.
Pass a new DataPoint and predict which cluster it belongs to.
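One possible way to work through the clustering steps above with scikit-learn is sketched below. The data is synthetic (three invented groups described by 'wt', 'disp' and 'mpg'), the features are standardized before clustering as an assumed preprocessing choice, and `best_k = 3` is a placeholder for whatever elbow point the plot suggests on the real data.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the car data: three invented groups of 40 cars each.
rng = np.random.default_rng(0)
centers = np.array([[2200, 100, 30], [3200, 230, 20], [4200, 350, 14]])
X = np.vstack([c + rng.normal(scale=[150, 15, 1.5], size=(40, 3)) for c in centers])
features = ["wt", "disp", "mpg"]
cars = pd.DataFrame(X, columns=features)

# K-means is distance based, so the features are put on a common scale first.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(cars[features])

# Fit K-means for 2 to 10 clusters and record the within-cluster sum of squares.
inertias = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_

# Assumed elbow point; on real data, read it from the plot below.
best_k = 3

# Elbow plot with the chosen k highlighted.
fig, ax = plt.subplots()
ax.plot(list(inertias), list(inertias.values()), marker="o")
ax.axvline(best_k, color="red", linestyle="--", label="possible elbow")
ax.set_xlabel("number of clusters")
ax.set_ylabel("inertia")
ax.legend()

# Retrain on the chosen number of clusters and store the labels as a new feature.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)
cars["cluster"] = model.labels_

# Color the datapoints by cluster.
fig2, ax2 = plt.subplots()
ax2.scatter(cars["wt"], cars["mpg"], c=cars["cluster"], cmap="viridis")
ax2.set_xlabel("wt")
ax2.set_ylabel("mpg")

# Predict the cluster of a new (invented) datapoint, using the same scaling.
new_car = pd.DataFrame([[2300, 105, 29]], columns=features)
new_label = int(model.predict(scaler.transform(new_car))[0])
print("new car belongs to cluster", new_label)
```

The new datapoint has to go through the same scaler that was fitted on the training data. Otherwise its distances to the cluster centers are not comparable.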

Part B

Domain: Automobile
Context: The purpose is to classify a given silhouette as one of three types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many different angles.
Data Description: The data contains features extracted from the silhouettes of vehicles viewed from different angles. Four "Corgie" model vehicles were used for the experiment: a double-decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This particular combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.
All the features are numeric, i.e., geometric features extracted from the silhouette.
Project Objective: Apply a dimensionality reduction technique (PCA), train a model on the reduced data, and compare the results with a model trained on the original features.

Steps and Tasks

1. Data Understanding and Cleaning

2. Data Preparation

3. Model Building

Train a base Classification model using SVM.
Print Classification metrics for train data.
Apply PCA on the data with 10 components.
Visualize Cumulative Variance Explained with Number of Components.
Draw a horizontal line on the above plot to highlight the threshold of 90%.
Apply PCA on the data. This time, select the minimum number of components that explain 90% or more of the variance.
Train SVM model on components selected from above step.
Print Classification metrics for train data of above model and share insights.
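The model-building steps above might look roughly like this with scikit-learn. The data comes from `make_classification` as a synthetic stand-in for the silhouette features (the feature count and class balance are assumptions), and the SVM uses default parameters, so the printed metrics say nothing about the real dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the vehicle silhouette data: numeric features, three classes.
X, y = make_classification(n_samples=300, n_features=18, n_informative=8,
                           n_redundant=6, n_classes=3, random_state=0)

# Both SVM and PCA are scale sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Base classification model on all features, with metrics on the train data.
base_svm = SVC().fit(X_scaled, y)
print(classification_report(y, base_svm.predict(X_scaled)))

# PCA with 10 components and the cumulative variance explained.
pca10 = PCA(n_components=10).fit(X_scaled)
cum_var = np.cumsum(pca10.explained_variance_ratio_)

# Cumulative variance plot with a horizontal line at the 90% threshold.
fig, ax = plt.subplots()
ax.plot(range(1, 11), cum_var, marker="o")
ax.axhline(0.90, color="red", linestyle="--", label="90% threshold")
ax.set_xlabel("number of components")
ax.set_ylabel("cumulative variance explained")
ax.legend()

# A float n_components makes PCA keep the fewest components whose
# cumulative explained variance passes that fraction.
pca90 = PCA(n_components=0.90, svd_solver="full").fit(X_scaled)
X_pca = pca90.transform(X_scaled)
print("components kept:", pca90.n_components_)

# SVM on the selected components, with metrics on the train data.
pca_svm = SVC().fit(X_pca, y)
print(classification_report(y, pca_svm.predict(X_pca)))
```

Comparing the two reports shows how much train performance is lost, if any, by working with fewer dimensions.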

4. Performance Improvement

Train another SVM on the components out of PCA. Tune the parameters to improve performance.
Share best Parameters observed from above step.
Print Classification metrics for train data of above model and share relative improvement in performance in all the models along with insights.
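A minimal sketch of the tuning step, assuming a cross-validated grid search is an acceptable way to tune the parameters. The data is again synthetic, and the parameter grid is an illustrative guess rather than a recommended search space.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the vehicle silhouette data.
X, y = make_classification(n_samples=300, n_features=18, n_informative=8,
                           n_redundant=6, n_classes=3, random_state=0)

# Scaling and PCA live inside the pipeline so they are refitted on each CV fold.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90, svd_solver="full"),
    SVC(),
)

# Illustrative grid; make_pipeline names the SVC step "svc".
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
    "svc__kernel": ["rbf", "linear"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# Best parameters observed and the metrics of the refitted model on the train data.
best_params = search.best_params_
print(best_params)
print("mean CV accuracy:", round(search.best_score_, 3))
print(classification_report(y, search.predict(X)))
```

The cross-validated score is usually a fairer basis for comparing the models than the train metrics alone, since a tuned SVM can fit the train data very closely.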

To get the solution to the above problem, comment in the comment section below or send your query to:

realcode4you@gmail.com

Here you get plagiarism-free code at an affordable price.

RealCode4You