top of page

What is Classification And Regression In Machine Learning? Get Help In Regression And Classification

realcode4you

Classification and Regression

Decision Trees can be used for both.


Classification

  • Spam / not Spam

  • Admit to ICU /not

  • Lend money / deny

  • Intrusion detections


Regression

  • Predict stock returns

  • Pricing a house or a car

  • Weather predictions (temp, rain fall etc)

  • Economic growth predictions

  • Predicting sports scores


Decision Trees

  • The general idea is that we will segment the space into a number of simple regions.

  • The segmentation can be illustrated as a tree

  • The end nodes can have a category (classification) or a continuous number (regression)

  • These methods, while quite simple are very powerful.

Visualizing Classification as a Tree

Metrics

  • Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best splits the set of items.

  • Different algorithms use different metrics for measuring “best"

  • These metrics measure how similar a region or a node is. They are said to measure the impurity of a region.

  • Larger these impurity metrics the larger the “dissimilarity” of a nodes/regions data.

  • Examples: Gini impurity, Entropy, Variance


Algorithms for building Decision Trees

Popular ones include

  • CART (Classification And Regression Tree)

  • CHAID (CHi-squared Automatic Interaction Detector)

  • C4.5

We will focus on CART, that uses the Gini impurity as its impurity measure.




Gini impurity

  • Used by the CART

  • Is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.

Splitting using Gini impurity

  • When splitting, the Gini impurity of the two resulting nodes are combined using a weighted average.

  • With weights being the fraction of data on each node.

  • The CART algorithm simply chooses the right “split” by finding the split that maximizes the “decrease in Gini impurity” - also called the Gini Gain.


Decision trees are prone to ‘overfitting’

  • Decision Tree is a powerful algorithm that can adapt well and capture various patterns in the data

  • If allowed to grow fully, they become over-complex & tend to fit even the noise 10

  • Thus, a fully grown tree may not ‘generalize’ well on test or new unseen data


Pruning

  • Ideally we would like a tree that does not over-fit the given data

  • One popular and simple way to prune a decision tree is by limiting the depth of the tree to avoid over fitting.

  • For example the tree on the right below is generated with a max depth of 2 while the tree on the left has no depth restriction (and hence overfits the data)


Pre-Pruning

  • Stop growing the tree before it grows too big

  • This can be achieved by bounding hyperparameters 12 Hyperparameters in Decision Trees

- max_depth - The maximum depth of the tree. If set to ‘None’, then nodes are

expanded until all leaves are pure. Higher the value, more complex the tree

- min_samples_split - The minimum number of samples required to split a node.

Doesn’t split any node that is smaller than this number. Higher the values, less

complex the tree

- min_samples_leaf - The minimum number of samples required at a leaf node. All

leaf nodes have at least these many data points. Higher the value, less

complex the tree


Hyperparameter Tuning using Grid Search

  • Grid Search is a process of searching the best combination of hyperparameters from a predefined set of values

  • A parameter grid (Hyperparameters and corresponding values) is provided as an input to the Grid-search function

  • It tries all the combinations of the values passed and evaluates the model for each combination

  • It returns the combination of hyperparameter values that works best as per the metric provided for model evaluation

  • GridSearchCV( ) is an implementation of Grid Search with Cross Validation.


Cross Validation

- Cross Validation is a common Machine Learning technique that splits the data into n non-overlapping groups, and runs n experiments:

  • In each experiment, n-1 groups are used to train a model and the model is tested on the left out group.

  • The results are summarized over the n experiments.

- It gives a mechanism that allows us to test a model repeatedly on data that was not used to build the model.

- For Decision Trees, a very common approach is simply to choose the tree with minimum cross validation error.



Post-Pruning: Cost-complexity pruning

- Starting from the Full tree, create a sequence of trees that are sequentially smaller (pruned)

- At each step the algorithm

  • try removing each possible subtree

  • find the ‘relative error decrease per node’ for that subtree - Complexity parameter,

  • And remove the subtree with the minimum

- With the list of subtrees, one usually reverts back to using crossvalidation errors to find the best final pruned tree



Few Additional Thoughts

  • Other impurity measures

  • Regression Trees

  • Pros and Cons


Impurity Measures in Decision Trees



Regression Trees


Pros and Cons of Decision Trees

Pros -

  • Easy to understand and interpret

  • Useful in data exploration as it gives the splitting based on the significance of variables • Not influenced by the outlier/Null values and hence requires less data cleaning. Requires less time and effort during data pre-processing than other algorithms.

  • Can handle both continuous and categorical variables

  • Does not require any underlying assumptions in data. Works with both linearly and nonlinearly related variables.


Cons-

  • A small change in the data-set can result in large change in the structure of the decision tree causing instability in the model.

  • Large trees can be difficult to interpret.

  • Tends to overfit.

Comments


REALCODE4YOU

Realcode4you is the one of the best website where you can get all computer science and mathematics related help, we are offering python project help, java project help, Machine learning project help, and other programming language help i.e., C, C++, Data Structure, PHP, ReactJs, NodeJs, React Native and also providing all databases related help.

Hire Us to get Instant help from realcode4you expert with an affordable price.

USEFUL LINKS

Discount

ADDRESS

Noida, Sector 63, India 201301

Follows Us!

  • Facebook
  • Twitter
  • Instagram
  • LinkedIn

OUR CLIENTS BELONGS TO

  • india
  • australia
  • canada
  • hong-kong
  • ireland
  • jordan
  • malaysia
  • new-zealand
  • oman
  • qatar
  • saudi-arabia
  • singapore
  • south-africa
  • uae
  • uk
  • usa

© 2023 IT Services provided by Realcode4you.com

bottom of page