Requirement Details
The purpose of the project is to create a simple analytics program with a Tkinter GUI.
This project is a GUI implementation of Case Study 15.5 from the Python textbook. Each section needs user interaction and the ability to restart at any time. The user must also be prevented from performing an invalid action; for example, the user cannot explore the data until the data is loaded. The sections are:
Load the dataset (15.5.1)
Explore the data (15.5.2)
Split the data for training and testing (15.5.4)
Train the data model (15.5.5)
Test the data model (15.5.6)
Visualize the expected vs. predicted (15.5.7)
Create the regression model metrics (15.5.8)
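One way to enforce this ordering is to track which steps have completed and enable only the buttons whose prerequisites are met. The sketch below is a minimal illustration, not the project's actual code: the step names, the Workflow class and the sync_buttons helper are hypothetical, the steps are assumed to run in a strictly linear order, and the button objects are assumed to accept config(state=...) the way a tk.Button does.

```python
# Hypothetical step-gating helper; all names here are illustrative assumptions.
STEPS = ["load", "explore", "split", "train", "test", "visualize", "metrics"]


class Workflow:
    """Tracks completed steps so that each step requires the ones before it."""

    def __init__(self):
        self.completed = 0  # number of leading steps finished so far

    def can_run(self, step):
        # A step is allowed once every step before it has been completed.
        return STEPS.index(step) <= self.completed

    def run(self, step):
        if not self.can_run(step):
            raise RuntimeError(f"Cannot run '{step}' yet")
        # Re-running an earlier step does not undo later progress in this sketch.
        self.completed = max(self.completed, STEPS.index(step) + 1)

    def restart(self):
        # Back to the initial state: only "load" is allowed.
        self.completed = 0


def sync_buttons(workflow, buttons):
    """Enable or disable each widget; `buttons` maps a step name to a widget."""
    for step, button in buttons.items():
        state = "normal" if workflow.can_run(step) else "disabled"
        button.config(state=state)
```

In a Tkinter program, each button's command could call run() for its step and then sync_buttons(), and the restart button could call restart() followed by sync_buttons().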
Problem Statement
In this task I analyse the Boston housing data and build a basic GUI tool with Python Tkinter. The aim is to spot real estate trends in the Boston suburbs, or to predict the sale value of residential property there from the most important factors. The dataset was downloaded from Kaggle. Many new housing developments are built every year, and real estate investors have been putting money into them for decades; some make a profit and some do not. Price prediction helps decide an appropriate price that works for both the buyer and the investor: the investor's main objective is to make a profit, while the price must still be acceptable to the buyer. The predictions should become more accurate as more data becomes available.
Description of the Data
This dataset has 13 columns and 511 records; the columns are listed in the table below.
Attribute - Data Type - Description
CRIM - Numeric - per capita crime rate by town
ZN - Numeric - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - Numeric - proportion of non-retail business acres per town
CHAS - Categorical - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - Numeric - nitric oxides concentration (parts per 10 million)
RM - Numeric - average number of rooms per dwelling
AGE - Numeric - proportion of owner-occupied units built prior to 1940
DIS - Numeric - weighted distances to five employment centres
RAD - Categorical - index of accessibility to radial highways
TAX - Numeric - full-value property-tax rate per $10,000
PTRATIO - Numeric - pupil-teacher ratio by town
LSTAT - Numeric - % lower status of the population
MEDV - Numeric - median value of occupied homes in $1000’s
Data Pre-Processing
Real-world data generally contains issues such as noise, missing values, and inconsistent formats, so it cannot be used directly by machine learning algorithms. Pre-processing cleans the data and makes it suitable for an ML model, which improves both the efficiency and the accuracy of the model.
Removing Null Values:
In our code, null values in a dataset column are handled by filling them with the column median.
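As a minimal sketch of this step, the example below uses a small made-up DataFrame in place of the real dataset (the values are illustrative assumptions). It counts the nulls, fills the RM column with its median, and shows dropna() as an alternative when rows should be removed instead of filled.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real dataset (values are illustrative).
df = pd.DataFrame({
    "RM": [6.5, np.nan, 5.9, 7.1, np.nan],
    "MEDV": [24.0, 21.6, 18.9, 34.7, 20.0],
})

# Count the missing values in each column.
print(df.isnull().sum())

# Replace missing RM values with the median of the non-null RM values.
df["RM"] = df["RM"].fillna(df["RM"].median())

# Alternative: drop any rows that still contain nulls (none remain here).
df = df.dropna()
```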
Feature Selection
This is the next step after pre-processing the dataset. In machine learning, feature selection is the process of reducing the number of input variables when developing a predictive model.
# Dividing the target and feature variables
X = df.drop('MEDV', axis=1)
Y = df['MEDV']
X holds the feature variables and Y is the target variable.
Code Implementation
The code below first imports the libraries used to build the Tkinter GUI and to do the data analytics and data visualization.
The RM column has some missing values, so we fill them with the column median. This is the data pre-processing step.
Next we select the features and the target variable for the machine learning model.
The sklearn train_test_split method is then used to split the dataset into training and test sets.
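The split can be illustrated on its own with a small sketch. The arrays here are synthetic stand-ins rather than the Boston data; test_size=0.2 and random_state=2 match the values used in the project code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 samples with 2 features each (not the Boston dataset).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Features and targets are shuffled together, so each row of X_test still lines up with its value in y_test.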
The next step is to train the model.
Linear Regression is used to train the model. The complete program is:
import tkinter as tk

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from sklearn import linear_model, preprocessing
from sklearn.model_selection import cross_val_score, train_test_split


def refresh():
    # Close the current window and rebuild the GUI from scratch
    root.destroy()
    ml_tkinter_gui()


def ml_tkinter_gui():
    global root
    # tkinter GUI
    root = tk.Tk()

    # Read dataset
    df = pd.read_csv("Boston Real Est.csv")

    # Data processing: fill missing RM values with the column median
    df['RM'] = df['RM'].fillna(df['RM'].median())

    # Dividing the target and feature variables
    X = df.drop('MEDV', axis=1)
    Y = df['MEDV']

    # Normalize the features using MinMaxScaler
    min_max_scaler = preprocessing.MinMaxScaler()
    X_scaled = min_max_scaler.fit_transform(X)

    # Shuffle and split the data
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, Y, test_size=0.2, random_state=2)

    # Train the model on the scaled training split only, so that it
    # matches the scaled test data used for prediction
    regr = linear_model.LinearRegression()
    regr.fit(X_train, y_train)

    # Creating canvas to show the result
    canvas1 = tk.Canvas(root, width=500, height=450)
    canvas1.pack()

    # Accuracy metrics: cross-validation scores on the training data
    def score():
        scores = cross_val_score(regr, X_train, y_train)
        # Format as a string; passing a tuple makes Tk display it as a Tcl list
        label_prediction = tk.Label(root, text=f'Scores: {scores}', bg='orange')
        canvas1.create_window(20, 200, window=label_prediction)

    # Test the data model: predictions for the test split
    def test_score():
        y_pred = regr.predict(X_test)
        label_prediction = tk.Label(root, text=f'Test predictions: {y_pred}', bg='orange')
        canvas1.create_window(475, 300, window=label_prediction)

    # Function to close the window
    def close_window():
        root.destroy()

    # Add the quit button and place it at x=900, y=90 in the window
    quit_button = tk.Button(root, text="Click and Quit", command=close_window, bg='red')
    quit_button.place(x=900, y=90)

    # Creating 'Calculate Score' button, which calls score() above
    button1 = tk.Button(root, text='Calculate Score', command=score, bg='orange')
    canvas1.create_window(20, 100, window=button1)

    # Creating 'Calculate test score' button, which calls test_score() above
    button2 = tk.Button(root, text='Calculate test score', command=test_score, bg='orange')
    canvas1.create_window(300, 100, window=button2)

    # Add the restart button and place it at x=200, y=400 in the window
    restart_button = tk.Button(root, text="Refresh & Restart", command=refresh, bg='green')
    restart_button.place(x=200, y=400)

    # Plot 1st scatter: PTRATIO (x axis) against LSTAT (y axis)
    figure3 = plt.Figure(figsize=(5, 4), dpi=100)
    ax3 = figure3.add_subplot(111)
    ax3.scatter(df['PTRATIO'].astype(float), df['LSTAT'].astype(float), color='r')
    scatter3 = FigureCanvasTkAgg(figure3, root)
    scatter3.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
    ax3.set_xlabel('PTRATIO')
    ax3.set_ylabel('LSTAT')
    ax3.set_title('PTRATIO Vs. LSTAT')

    # Plot 2nd scatter: RM (x axis) against LSTAT (y axis)
    figure4 = plt.Figure(figsize=(5, 4), dpi=100)
    ax4 = figure4.add_subplot(111)
    ax4.scatter(df['RM'].astype(float), df['LSTAT'].astype(float), color='g')
    scatter4 = FigureCanvasTkAgg(figure4, root)
    scatter4.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
    ax4.set_xlabel('RM')
    ax4.set_ylabel('LSTAT')
    ax4.set_title('RM Vs. LSTAT')

    root.mainloop()


if __name__ == '__main__':
    ml_tkinter_gui()
A canvas is created to display the results.
The function 'score' calculates the cross-validation scores of the model on the training data.
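To show what such a score looks like outside the GUI, here is a small sketch on synthetic data. The data, the coefficients and the explicit cv=5 are assumptions made for illustration. For a regressor, cross_val_score uses the estimator's default scorer, which for LinearRegression is R^2, and returns one value per fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic, nearly linear data (illustrative only, not the Boston dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=100)

# Five-fold cross-validation; each entry is the R^2 on one held-out fold.
scores = cross_val_score(LinearRegression(), X, y, cv=5)

print("Fold scores:", scores)
print("Mean R^2:", scores.mean())
```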
The 'Calculate Score' button runs the 'score' function when clicked.
Another function, 'test_score', displays the model's predictions for the test data.
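Predictions on the test data can also be summarized as regression metrics, which is the last section in the requirement list. The sketch below is one possible approach on synthetic data, not the project's code: the data and the choice of R^2, MSE and RMSE are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship plus noise (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Fit on the training split only, then predict the held-out test split.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)             # share of variance explained
mse = mean_squared_error(y_test, y_pred)  # mean squared error
rmse = np.sqrt(mse)                       # same units as the target

print(f"R^2: {r2:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}")
```

In the GUI, these values could be shown in a label in the same way as the cross-validation scores.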
A further function, 'close_window', exits the application.
destroy() is a built-in Tkinter method that closes the window.
The complete code is in the code file (.py). The output of this implementation is shown below:
For any other Tkinter-related help, you can contact us at:
realcode4you@gmail.com