House-price-Prediction
Import Libraries
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.manifold import MDS
Random Sampling
Random sampling with train_test_split, holding out 30% of the data as the test set and fixing random_state at 42 so the split is reproducible.
# x --> feature set, y --> target variable
x = df1.drop(['id', 'price'], axis=1)
y = df1['price']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)
print('Shapes of training and test set:')
x_train.shape, x_test.shape
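To see what the split produces without loading the housing data, here is a minimal sketch on a made-up table. The frame `demo` and its column values are hypothetical stand-ins for df1; only the train_test_split arguments match the ones used above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df1: 100 rows with an id, two features, and a price.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "id": np.arange(100),
    "sqft_living": rng.integers(500, 4000, size=100),
    "bedrooms": rng.integers(1, 6, size=100),
    "price": rng.integers(100_000, 900_000, size=100),
})

features = demo.drop(["id", "price"], axis=1)
prices = demo["price"]

# Same arguments as above: 30% test split, fixed seed.
f_train, f_test, p_train, p_test = train_test_split(
    features, prices, test_size=0.30, random_state=42
)

# Splitting again with the same random_state selects the same rows.
f_train_again, _, _, _ = train_test_split(
    features, prices, test_size=0.30, random_state=42
)
same_rows = f_train.index.equals(f_train_again.index)

print(f_train.shape, f_test.shape, same_rows)
```

With 100 rows, 70 go to the training set and 30 to the test set, and the fixed seed makes the selection repeatable between runs.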
Stratified Sampling
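target = 'price'
X = df1.drop(target, axis='columns', inplace=False)
Y = df1[target]

# Method 2: keep only rows whose price occurs more than twice, so every
# stratum has enough members to be split between train and test
price_counts = df1[target].value_counts()
df2 = df1[df1[target].isin(price_counts[price_counts > 2].index)]
y2 = df2[target]
# Drop the target from the features so the price does not leak into X2
X2 = df2.drop(columns=target).fillna(0)

X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.33, random_state=42, stratify=y2)

X2_train.shape, X2_test.shape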
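Stratifying on exact prices only works for prices that repeat, which is why rows with rare prices are filtered out above. A common alternative, shown here as a sketch and not as the method used in this post, is to bin the continuous target into quantiles and stratify on the bin label. The frame `demo`, the choice of four bins, and the synthetic prices are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 houses with a skewed, continuous price.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "sqft_living": rng.integers(500, 4000, size=200),
    "price": rng.lognormal(mean=13.0, sigma=0.5, size=200),
})

# Assumption: four quantile bins describe the price distribution well enough.
price_bin = pd.qcut(demo["price"], q=4, labels=False)

X_train, X_test, y_train, y_test = train_test_split(
    demo.drop(columns="price"),
    demo["price"],
    test_size=0.33,
    random_state=42,
    stratify=price_bin,  # stratify on the bin label, not the raw price
)

# Each bin should make up roughly a quarter of both splits.
train_share = price_bin.loc[y_train.index].value_counts(normalize=True).sort_index()
test_share = price_bin.loc[y_test.index].value_counts(normalize=True).sort_index()
print(train_share.round(2).tolist())
print(test_share.round(2).tolist())
```

Because no rows are discarded, this keeps the full dataset while still giving the training and test sets a similar price distribution.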
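K-means (Elbow Method)
The elbow method is a popular technique for choosing the number of clusters. The idea is to run k-means clustering for a range of cluster counts k (say, from 1 to 10) and, for each value, calculate the sum of squared distances from each point to its assigned center (the distortion). The value of k at which the curve bends, so that adding more clusters reduces the distortion only slightly, is taken as a reasonable choice.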
from matplotlib import style
from sklearn.cluster import KMeans
df1 = df1.drop('date', axis = 'columns', inplace = False)
distortions = []
K = range(1, 11)
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(df1)
    distortions.append(kmeanModel.inertia_)
plt.figure(figsize=(8, 2))
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()
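The elbow is easier to recognise on data where the right answer is known. The sketch below runs the same loop on three synthetic, well-separated blobs (generated with make_blobs, not taken from the housing data) and prints the relative drop in distortion for each extra cluster instead of plotting it. The blob centers, `n_init=10`, and the fixed `random_state` are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs (hypothetical, not the housing data).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=42)

distortions = []
K = range(1, 11)
for k in K:
    # n_init and random_state are set explicitly so repeated runs agree.
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(points)
    distortions.append(model.inertia_)

# Relative improvement gained from each extra cluster.
drops = [
    (distortions[i] - distortions[i + 1]) / distortions[i]
    for i in range(len(distortions) - 1)
]
for k, drop in zip(K, drops):
    print(f"k={k} -> k={k + 1}: distortion falls by {drop:.1%}")
```

The drop should be large up to k=3 and much smaller afterwards, which is the bend the plot above is meant to reveal. On real data such as df1 the bend is usually less sharp, and features with large numeric ranges can dominate the distances unless the data is scaled first.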
Dimensionality reduction with PCA on both the original data and the two types of reduced data
# Import libraries
from sklearn.decomposition import PCA
model = PCA()
# Fit the model
model.fit(df1)
# Transform the data
transformed = model.transform(df1)
print('Principal components: ', model.components_)
# PCA variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df1 = scaler.fit_transform(df1)
pca = PCA()
pca.fit_transform(df1)
pca_variance = pca.explained_variance_
plt.bar(range(pca.n_components_), pca_variance)
plt.xlabel('PCA feature')
plt.ylabel('variance')
plt.show()
Intrinsic dimension
PCA can identify the intrinsic dimension of a dataset regardless of how many features the samples have.
Intrinsic dimension = number of PCA features with significant variance.
To choose the intrinsic dimension, try each candidate number of components and keep the one that gives the best accuracy.
# color_list = ['black', 'gray']
pca = PCA(n_components=3)
pca.fit(df1)
transformed = pca.transform(df1)
transformed.shape
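One way to make "significant variance" concrete is to look at the cumulative explained variance ratio and keep the smallest number of components that passes a chosen cutoff. The sketch below does this on synthetic data built from two hidden factors spread across six noisy features. The data, the 95% cutoff, and the variable names are assumptions for illustration, not values taken from the housing dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: six observed features driven by only two hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],  # features 0-2 follow the first factor
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],  # features 3-5 follow the second factor
])
data = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

# Scale first, as in the variance plot above, then fit a full PCA.
scaled = StandardScaler().fit_transform(data)
pca = PCA().fit(scaled)

# Smallest number of components whose cumulative share reaches the cutoff.
cumulative = np.cumsum(pca.explained_variance_ratio_)
threshold = 0.95  # assumed cutoff; adjust to taste
n_significant = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative.round(3))
print("estimated intrinsic dimension:", n_significant)
```

Here the estimate should come out as two, matching the number of hidden factors. On real data the cutoff is a judgment call, which is why comparing downstream accuracy across several values of n_components remains a useful check.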
I hope this helps you understand the basic flow of a data science project. If you face any other issue or need assignment-related help, you can send us your quote request directly and we will help as soon as we can.
You can send your quote request directly by email:
"realcode4you@gmail.com"