Directions
The main purpose of this assignment is to give you experience creating and visualizing a Decision Tree and sweeping a problem's parameter space, in this case by performing a grid search. A grid search evaluates every combination of the candidate hyperparameter values you supply, which lets you identify the best-performing combination to use when training your model.
Preliminaries
Let's import some common packages:
import numpy as np
import pandas as pd
from sklearn import datasets
Load and Split Iris Data Set
Complete the following:
Load the Iris data set by calling the load_iris() function of the datasets module from sklearn, and name the dictionary-like object that is returned iris.
Call train_test_split() on the features (iris.data) and labels (iris.target) with a test_size of 0.4 (40%) and a random_state of 0. Save the four returned arrays into X_train, X_test, y_train, and y_test, in that order. (Be sure to import the train_test_split() function from sklearn.model_selection first.)
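As a rough guide, the two steps above might look like the sketch below. It assumes the features and labels are read from iris.data and iris.target, which the instructions imply but do not spell out; the final print() is only a sanity check and is not part of the assignment.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# load_iris() returns a dictionary-like Bunch with data, target, feature_names, ...
iris = datasets.load_iris()

# Hold out 40% of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

# Sanity check (not required by the assignment): inspect the split sizes.
print(X_train.shape, X_test.shape)
```

Fixing random_state means every run produces the same split, so later accuracy numbers stay comparable between runs.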
### ENTER CODE HERE ###
---
Create a Single Decision Tree
Complete the following:
(Cell 1:)
Import the DecisionTreeClassifier class from the sklearn.tree module
Create a DecisionTreeClassifier object called tree_clf with a random_state of 42
Fit the DecisionTreeClassifier object on the training data.
(Cell 2:)
Make a prediction on the test data, and name the predicted values output by the model preds.
Measure the performance of the model by computing its accuracy score on the test set. You must import the accuracy_score() function from the sklearn.metrics module. Name the accuracy score value you compute acc_score.
Print the accuracy score to the screen.
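The two cells described above could be sketched roughly as follows. The block repeats the data loading and split so that it runs on its own; in the notebook those variables would already exist from the earlier cell. The print format is an arbitrary choice, not something the assignment prescribes.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Repeated here only so the sketch is self-contained.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

# Cell 1: create the classifier and fit it on the training data.
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

# Cell 2: predict on the held-out test data and score the predictions.
preds = tree_clf.predict(X_test)
acc_score = accuracy_score(y_test, preds)
print(f"Test accuracy: {acc_score:.3f}")
```

Note that accuracy_score() takes the true labels first and the predictions second.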
### ENTER CODE HERE ###
--
### ENTER CODE HERE ###
--
### ENTER CODE HERE ###
--
### ENTER CODE HERE ###
--
Visualize Optimal Decision Tree as Text
Instantiate a new DecisionTreeClassifier object, using the best_params_ attribute of the grid_search_cv object to set the max_depth, max_leaf_nodes, and min_samples_split values found by the grid search, along with a random_state of 42. Then retrain this "optimal" decision tree (optimal only with respect to the few parameters that we swept) on the training data.
Next, use the tree.export_text() function to visualize the "optimal" decision tree. This function takes a trained classifier as its first argument and accepts a list of feature names through its feature_names argument (the feature names are included in the iris object returned from the load_iris() function, as iris.feature_names). The result is a text-based visualization of the decision tree. Note that this function returns a string, so you'll want to print() the result to get it to look right.
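A minimal sketch of these steps follows. To run on its own it has to build a grid_search_cv object, so it assumes a small, hypothetical param_grid and cv=3; neither comes from the assignment, and your own grid search may sweep different values. The variable names optimal_tree and tree_text are also just illustrative.

```python
from sklearn import datasets, tree
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

# ASSUMPTION: this grid and cv=3 are illustrative stand-ins, not the assignment's values.
param_grid = {
    "max_depth": [2, 3, 4],
    "max_leaf_nodes": [3, 4, 5],
    "min_samples_split": [2, 3, 4],
}
grid_search_cv = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=3)
grid_search_cv.fit(X_train, y_train)

# best_params_ is a dict keyed by parameter name, so it can be unpacked directly.
optimal_tree = DecisionTreeClassifier(**grid_search_cv.best_params_, random_state=42)
optimal_tree.fit(X_train, y_train)

# export_text() returns a string; print() renders its line breaks and indentation.
tree_text = tree.export_text(optimal_tree, feature_names=iris.feature_names)
print(tree_text)
```

Unpacking best_params_ with ** avoids copying each value by hand and keeps the retrained tree in sync with whatever the search selected.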
### ENTER CODE HERE ###
--
Visualize Optimal Decision Tree as Image
Use the tree.plot_tree() function to visualize the "optimal" decision tree. Its only required argument is a trained classifier, and it draws a graphical visualization of the decision tree using matplotlib. Pass filled=True to add color to the image.
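The plotting step might be sketched as below. To stay self-contained, the block trains a tree with hypothetical placeholder hyperparameters in place of grid_search_cv.best_params_; these values are not grid search results. The figure size and the optional feature_names and class_names arguments are likewise assumptions added for readability.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0
)

# ASSUMPTION: placeholder values standing in for grid_search_cv.best_params_.
placeholder_params = {"max_depth": 3, "max_leaf_nodes": 4, "min_samples_split": 2}
optimal_tree = DecisionTreeClassifier(**placeholder_params, random_state=42)
optimal_tree.fit(X_train, y_train)

# plot_tree() draws onto a matplotlib Axes; filled=True colors nodes by majority class.
fig, ax = plt.subplots(figsize=(10, 6))
annotations = tree.plot_tree(
    optimal_tree,
    filled=True,
    feature_names=iris.feature_names,      # optional: label splits by feature name
    class_names=list(iris.target_names),   # optional: label nodes by class name
    ax=ax,
)
# In a notebook the figure renders when the cell finishes; in a script, call plt.show().
```

In the notebook, replace placeholder_params with the best_params_ dictionary from your own grid search.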
### ENTER CODE HERE ###
--
Comments