Introduction to Machine Learning and Data Analysis | Realcode4you

What is machine learning?

1- A set of tools and methods that attempt to extract insight from a record of the observable world and infer patterns.


2- Studying and understanding a phenomenon

  • Make observations and collect the relevant data

  • Model the underlying patterns

  • Use the model to inform our understanding of the phenomenon

  • Make predictions!!!


3- An important feature of any ML method is its ability to learn and improve with experience, i.e. both existing and new data.


ML attempts to answer:

How does learning performance vary with the number of training examples?

Which learning algorithms are most appropriate for various types of learning tasks?


ML draws on concepts and results from:

  • Statistics

  • Artificial intelligence

  • Philosophy

  • Information theory

  • Biology

  • Cognitive science

  • Control theory


Introduction to Machine Learning


Supervised Learning

Infers, from input/output pairs, a function that maps a set of inputs (features, predictors, covariates, independent variables) to an output (response, target, outcome, dependent variable).


The function is inferred from training examples and is then used to map new, unseen examples to predicted outputs.


Goals:

  • Accurately predict unseen cases, i.e. test cases (primary)

  • Understand the relationship between inputs and output (secondary)


Two sub-categories:

  • Regression – a continuous outcome

  • Classification – a categorical/qualitative outcome
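As a minimal sketch of the two sub-categories, assuming a small made-up data set (hours studied, exam score, pass/fail label) that is not part of the original material, a least-squares line can serve as the regression example and a 1-nearest-neighbour rule as the classification example. The function names are illustrative only.

```python
# Hypothetical training pairs: hours studied -> exam score (continuous outcome).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [52.0, 55.0, 61.0, 64.0, 68.0]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(train_x, train_y)


def predict_score(x):
    """Regression: predict a continuous outcome for an unseen input."""
    return intercept + slope * x


# Hypothetical labelled pairs: hours studied -> pass/fail (categorical outcome).
labelled = [(1.0, "fail"), (2.0, "fail"), (4.0, "pass"), (5.0, "pass")]


def predict_label(x):
    """Classification: return the label of the nearest training input (1-NN)."""
    _, nearest_label = min(labelled, key=lambda pair: abs(pair[0] - x))
    return nearest_label


print(predict_score(6.0))   # predicted score for an unseen case
print(predict_label(1.5))   # predicted class for an unseen case
```

Both predictors are built only from the training pairs and then applied to inputs that were not in the training set, which is the primary goal listed above.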


Unsupervised Learning

  1. No distinctions between input(s) and output within a data set.

  2. Attempts to uncover the underlying structure or pattern within a data set.

  3. Can lead to testable hypotheses.

  4. Difficult to know how well you have done.


Two sub-categories:

  • Dimension reduction – representing multi-dimensional data in fewer dimensions, typically 2-D or 3-D, so that it can be visualised.

  • Clustering – grouping of objects based on some similarity measures.
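Clustering can be illustrated with a minimal k-means sketch on one-dimensional data, using absolute distance as the similarity measure. The data, the starting centres and the function name `kmeans_1d` are assumptions made for illustration; the original does not name a specific clustering algorithm.

```python
def kmeans_1d(values, centres, iterations=10):
    """Very small k-means for 1-D data with user-supplied starting centres."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for v in values:
            idx = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[idx].append(v)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical measurements with two visible groups.
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centres, clusters = kmeans_1d(values, centres=[1.0, 9.0])
print(centres)
print(clusters)
```

No output labels are used anywhere: the grouping comes only from how similar the values are to each other.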



Introduction to Data Analysis

What is statistics?

  • Statistics allows us to learn from our data.

  • Data are numbers with context.

  • Data contain information about some group of individuals.

  • A characteristic of an individual is referred to as a variable.


Data Types

Two main types of data:

Qualitative (categorical) – variables that represent qualities and cannot be measured numerically.

  • Nominal – characteristics have no order, e.g. eye colour, gender (male/female).

  • Ordinal – characteristics that are intrinsically ordered, e.g. educational attainment (primary, secondary, tertiary).


Quantitative (numerical)

  • Discrete – able to take only certain distinct values within an allowable range. The allowable range may be finite or infinite, e.g. the outcome of a die roll, the number of students.

  • Continuous – data measured on a scale, able to take on any value within an allowable range, which may be finite or infinite, e.g. body mass, height.


Exploratory Data Analysis (EDA)

  • EDA is the process of describing the data and summarising the main characteristics.

  • Provides some insight into the behaviour of the data.

  • A critical aspect of EDA is outlier detection.

  • Describe graphically and numerically.


Describing Qualitative and Discrete Data

  • Qualitative and discrete (finite) data are typically expressed as count data.

  • For a better perspective, counts are often expressed as a percentage of the total.
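A minimal sketch of this, assuming a made-up sample of eye colours (a nominal variable), counts each category and expresses the counts as percentages of the total:

```python
from collections import Counter

# Hypothetical observations of a nominal variable.
eye_colours = ["brown", "blue", "brown", "green",
               "brown", "blue", "brown", "blue"]

counts = Counter(eye_colours)             # frequency of each category
total = sum(counts.values())
percentages = {colour: 100 * n / total for colour, n in counts.items()}

for colour, n in counts.most_common():
    print(f"{colour}: {n} ({percentages[colour]:.1f}%)")
```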


Describing Quantitative Data

Three aspects are addressed:

  • Measure of Centre: describes how data cluster around a particular value.

  • Measure of Spread: describes the dispersion/variability of data

  • Measure of Shape: describes the distribution (or pattern) of data.


Measure of Centre (Central Tendency)
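The usual measures of centre are the mean, the median and the mode. As a minimal sketch, assuming a small made-up sample, they can be computed with Python's standard `statistics` module. The second sample shows that the mean is pulled towards an extreme value while the median is not.

```python
from statistics import mean, median, mode

# Hypothetical sample.
data = [2, 3, 3, 5, 7, 10]

centre = {
    "mean": mean(data),      # arithmetic average
    "median": median(data),  # middle value of the sorted data
    "mode": mode(data),      # most frequent value
}
print(centre)

# Same sample with the largest value replaced by an extreme one.
with_outlier = [2, 3, 3, 5, 7, 100]
print(mean(with_outlier), median(with_outlier))
```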


Measure of Spread (Dispersion)

Example :
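As a minimal sketch, assuming a small made-up sample, the common measures of spread (range, variance, standard deviation and interquartile range) can be computed as follows. Quartile conventions differ between tools; this sketch uses the default method of `statistics.quantiles`, so other software may report slightly different quartiles.

```python
from statistics import pstdev, pvariance, quantiles, variance

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # max minus min
pop_variance = pvariance(data)         # divides by n
sample_variance = variance(data)       # divides by n - 1
pop_sd = pstdev(data)                  # square root of the population variance

q1, q2, q3 = quantiles(data, n=4)      # quartiles (default 'exclusive' method)
iqr = q3 - q1                          # interquartile range

print(data_range, pop_variance, sample_variance, pop_sd, iqr)
```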


Percentiles and 5-Number Summary

Example :
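The five-number summary consists of the minimum, the first quartile, the median, the third quartile and the maximum. A minimal sketch, assuming a made-up sample and the `inclusive` quartile method of `statistics.quantiles` (one of several conventions in use), is:

```python
from statistics import quantiles

# Hypothetical sample.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# The 25th, 50th and 75th percentiles.
q1, med, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, med, q3, max(data))
print(five_number_summary)

# Other percentiles work the same way, e.g. deciles (10th, 20th, ..., 90th).
deciles = quantiles(data, n=10, method="inclusive")
print(deciles)
```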


Measure of Shape

Two measures:

1. Skewness – a measure of symmetry, or more precisely, the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the centre point.

2. Kurtosis – a measure of tailedness relative to a normal distribution.
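Both measures can be sketched from central moments. The version below uses the simple population (moment-based) formulas, which is an assumption since the original does not say which estimator it uses; statistical packages often apply small-sample corrections and so may give slightly different numbers. Excess kurtosis subtracts 3, the kurtosis of a normal distribution.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (the value for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical samples.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```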


Skewed Distributions


Excess Kurtosis


Data Quality Issues

What are the issues to consider?

Data entry errors

  • Values outside of expected range(s)


Missing values

  • Noted as NA in R

  • More than 20% missing values for a variable is generally a cause for concern.


Outliers

  • Certain descriptive statistics and modelling are sensitive to them

  • Can lead to biased estimates and potentially incorrect findings


Handling Missing Values

Approach 1: Drop any features with missing values

  • Typically not recommended

  • Depends on the % of missing values

  • The is.na(.) function in R can be used to check for missing values


Approach 2: Analyse complete cases only

  • Use the na.omit(.) command in R

  • Important to note % of cases removed


Approach 3: Impute the missing values

  • Mean imputation, regression imputation, k-NN imputation, etc.

  • Recommended only for continuous data
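The approaches above can be sketched in Python on a single continuous variable. This is an illustration under assumptions: the data are made up, `None` stands in for a missing value (the role NA plays in R), and only the simplest option, mean imputation, is shown.

```python
from statistics import mean

# Hypothetical continuous variable with missing entries marked as None.
heights = [170.0, None, 165.0, 180.0, None, 175.0]

# Check for missing values and report the percentage missing.
is_missing = [h is None for h in heights]
pct_missing = 100 * sum(is_missing) / len(heights)

# Approach 2: keep complete cases only, noting how many were removed.
complete = [h for h in heights if h is not None]
n_removed = len(heights) - len(complete)

# Approach 3: mean imputation, replacing each missing value
# with the mean of the observed values.
fill_value = mean(complete)
imputed = [fill_value if h is None else h for h in heights]

print(f"{pct_missing:.1f}% missing, {n_removed} cases removed")
print(imputed)
```

Mean imputation keeps the sample size but shrinks the variability of the variable, which is one reason the percentage of missing values matters.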


Detecting Outliers

  • Check the range, i.e. min to max

  • Visualise the data, e.g. histograms, boxplots, etc.

  • Use thresholds, e.g. flag values below Q_1 − 1.5 × IQR or above Q_3 + 1.5 × IQR, or values with z-scores outside ±3.
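A minimal sketch of the two threshold rules, assuming a made-up sample of 20 values with one extreme entry, is shown below. The quartiles use the default method of `statistics.quantiles`, and the z-scores use the population standard deviation; both are choices made for this illustration.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical sample: 19 typical values and one extreme value.
data = [10, 11, 12, 12, 13, 13, 14, 15, 11, 12,
        12, 13, 13, 14, 12, 13, 12, 13, 14, 102]

# Range check: min to max.
print(min(data), max(data))

# IQR rule: flag values beyond 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower_fence or x > upper_fence]

# z-score rule: flag values more than 3 standard deviations from the mean.
mu = mean(data)
sigma = pstdev(data)
z_outliers = [x for x in data if abs((x - mu) / sigma) > 3]

print(iqr_outliers, z_outliers)
```

Note that the z-score rule needs a reasonably large sample: with very few observations no single value can reach a z-score of 3, because the extreme value inflates the standard deviation itself.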

 

Handling Outliers

Approach 1: Remove them

  • Typically not recommended, in particular with smaller datasets

  • Somewhat acceptable for large datasets

Approach 2: Investigate the source and find out why this has happened

Approach 3: Non-linear data transformation

  • Square-root and log transformations for right-skewed data.
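As a minimal sketch of Approach 3, assuming a made-up right-skewed sample of strictly positive values (the log is undefined for zero or negative values), the effect of each transformation can be checked by comparing moment-based skewness before and after:

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (e.g. incomes in thousands).
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]

sqrt_incomes = [math.sqrt(v) for v in incomes]
log_incomes = [math.log(v) for v in incomes]

print(skewness(incomes))
print(skewness(sqrt_incomes))
print(skewness(log_incomes))
```

For this sample the log transformation compresses the long right tail more strongly than the square root, so it reduces the skewness further.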

© 2023 IT Services provided by Realcode4you.com