Data Description:
Here we apply linear regression to predict the result. The steps below complete the whole task:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
Assigning column names because the data file has no header row:
column_names = ["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT","MEDV"]
Reading Data:
datafile = "housing.data"
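# sep=r"\s+" splits on any run of whitespace; it replaces delim_whitespace=True, which is deprecated in recent pandas
dataFrame = pd.read_csv(datafile, header=None, sep=r"\s+", names=column_names)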
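As a quick check of how a header-less, whitespace-separated file is parsed, here is a minimal sketch that reads two made-up rows from memory instead of from "housing.data". The numeric values are illustrative assumptions, not taken from the real data set.

```python
import io
import pandas as pd

column_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
                "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Two hypothetical rows in the same layout: no header, columns separated by spaces
sample = io.StringIO(
    "0.1 18.0 2.3 0 0.54 6.5 65.2 4.1 1 296.0 15.3 396.9 5.0 24.0\n"
    "0.2  0.0 7.1 0 0.47 6.4 78.9 5.0 2 242.0 17.8 396.9 9.1 21.6\n"
)

# header=None tells pandas the first line is data; names supplies the column labels
sample_df = pd.read_csv(sample, header=None, sep=r"\s+", names=column_names)
print(sample_df.shape)
print(sample_df[["RM", "LSTAT", "MEDV"]])
```

If the shape does not show 14 columns, the separator or the list of names most likely does not match the file.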
Selecting the target and feature variables:
prices = dataFrame['MEDV']
features = dataFrame.drop('MEDV', axis = 1)
Finding the mean of the target variable:
mean = dataFrame['MEDV'].mean()
mean
Finding the median of the target variable:
median = np.median(dataFrame['MEDV'])
median
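Comparing the mean and the median gives a rough idea of skew in the target. The sketch below uses five hypothetical prices (not values from the data set) to show how a single large value pulls the mean above the median.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the last value is deliberately much larger than the rest
toy_prices = pd.Series([10.0, 12.0, 14.0, 16.0, 50.0])

toy_mean = toy_prices.mean()        # sum is 102.0, so the mean is 20.4
toy_median = np.median(toy_prices)  # middle value of the sorted data, 14.0

print(toy_mean, toy_median)
```

A mean well above the median suggests a right-skewed distribution, which is worth keeping in mind when reading regression errors.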
Splitting the data:
# Shuffle and split the data into training (80%) and testing (20%) subsets
X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=10)
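To see what `test_size` and `random_state` do, here is a small sketch on synthetic arrays. The 50-sample data is an assumption made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 hypothetical samples with 2 features each, plus a matching target
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10)

# Repeating the call with the same random_state reproduces the same split
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10)

print(Xa_train.shape, Xa_test.shape)
print(np.array_equal(Xa_test, Xb_test))
```

Fixing `random_state` makes the reported score reproducible. A different seed gives a different split and usually a slightly different score.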
Implementing a linear regression model with ridge (L2) regularization that predicts median house prices (MEDV) from the other variables:
# initialize
from sklearn.linear_model import Ridge
from sklearn import metrics
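Fitting the model:
## Training the model
# The `normalize` argument was removed from Ridge in scikit-learn 1.2, so the features are
# standardized in a pipeline instead. The penalty scale differs from normalize=True,
# so alpha may need retuning and the score can differ slightly.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
ridgeReg = make_pipeline(StandardScaler(), Ridge(alpha=0.05))
ridgeReg.fit(X_train, y_train)
pred_X = ridgeReg.predict(X_test)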
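To make the role of `alpha` concrete, the sketch below compares scikit-learn's `Ridge` with the closed-form ridge solution computed in NumPy. The synthetic data and the true coefficients are assumptions for illustration. The closed form is applied to mean-centered data because `Ridge` does not penalize the intercept.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 100 samples, 3 features, known coefficients plus a little noise
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(100, 3))
y_syn = X_syn @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)
alpha = 0.05

# scikit-learn fit (intercept fitted, no feature scaling)
model = Ridge(alpha=alpha).fit(X_syn, y_syn)

# Closed form on centered data: w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc
Xc = X_syn - X_syn.mean(axis=0)
yc = y_syn - y_syn.mean()
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_syn.shape[1]), Xc.T @ yc)
b = y_syn.mean() - X_syn.mean(axis=0) @ w  # intercept recovered from the means

print(model.coef_, w)
print(model.intercept_, b)
```

Larger values of `alpha` shrink the coefficients toward zero. With `alpha=0` the expression reduces to ordinary least squares.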
Predicted results for the test set:
pred_X
Finding the score (the R^2 coefficient of determination on the test set):
ridgeReg.score(X_test,y_test)
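For a regressor, `score` returns the coefficient of determination R^2. As a minimal sketch of what that number means, the example below computes R^2 by hand on four hypothetical predictions and checks it against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 20.0
r2_manual = 1.0 - ss_res / ss_tot               # 1 - 1.5 / 20 = 0.925

print(r2_manual, r2_score(y_true, y_hat))
```

A value of 1.0 means perfect predictions. A value of 0.0 matches always predicting the mean of the target, and negative values are worse than that baseline.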