Machine Learning
Machine learning is a subset of artificial intelligence (AI).
The goal, according to Arthur Samuel, is to give “computers the ability to learn without being explicitly programmed.”
Tom Mitchell puts it more formally: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
Machine learning explores algorithms that can learn from data and use that knowledge to make predictions on data they have not seen before.
It produces data-driven predictions or decisions by building a model from sample inputs.
Machine Learning Applications: Examples
1- Self-driving Google car (now rebranded as Waymo)
The car takes a real-time view of the road to recognize objects and patterns such as the sky, road signs, and vehicles moving in other lanes.
The self-driving car needs not only to carry out such object recognition, but also to make decisions about navigation.
The car needs to know the rules of driving, have the ability to do object and pattern recognition, and apply these to making decisions in real time. In addition, it needs to keep improving. That is where machine learning comes into play.
2- Optical Character Recognition (OCR)
Humans are good at recognizing handwritten characters, but computers are not.
What we need is a basic set of rules that tells the computer what “A,” “a,” “5,” etc., look like, and then have it make a decision based on pattern recognition.
This happens by showing the computer several versions of a character so that it learns the character, much as a child does through repetition, and then having it go through the recognition process.
Machine Learning Applications: Other Examples
Facebook uses machine learning to personalize each member’s news feed.
Most financial institutions use machine learning algorithms to detect fraud.
Intelligence agencies use machine learning to sift through mounds of information to look for credible threats of terrorism.
Machine Learning Features and Labels
In machine learning, the target to be predicted is called a label.
What statistics calls a variable is called a feature in machine learning.
Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm. Common algorithm types include:
- Supervised learning. When we know the labels on the training examples we are using to learn.
- Unsupervised learning. When we do not know the labels (or even the number of labels or classes) of the training examples we are using for learning.
- Reinforcement learning. When we want to provide feedback to the system based on how it performs on training examples. Robotics is a well-known example.
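The first two settings can be contrasted in a short sketch on R's built-in iris data. This is my own illustration, not from the lecture: the nearest-centroid "classifier" and variable names are assumptions, and reinforcement learning is omitted because it needs an interactive environment.

```r
# A minimal base-R sketch contrasting supervised and unsupervised learning
# on the built-in iris data.
data(iris)
X <- iris[, 1:4]     # features: four flower measurements
y <- iris$Species    # labels: the species of each flower

# Supervised: the labels y are available and used during learning.
# A simple nearest-centroid classifier averages each class's features.
centroids <- aggregate(X, by = list(label = y), FUN = mean)

# Unsupervised: only X is given; k-means must discover the groups itself.
set.seed(1)
groups <- kmeans(X, centers = 3)$cluster
table(groups)
```

The supervised step could not be taken without `y`; the unsupervised step never sees it — that is the entire distinction.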
Association Rules
Which of my products tend to be purchased together?
What do other people like this person tend to like/buy/watch?
- Discover “interesting” relationships among variables in a large database
- Rules of the form “If X is observed, then Y is also observed”
- The definition of “interesting” varies with the algorithm used for discovery
Association rule mining is not a predictive method; it finds similarities and relationships.
Apriori Algorithm - What is it?
Support
Earliest of the association rule algorithms
Frequent itemset: a set of items L that appears together “often enough”:
Formally: it meets a minimum support criterion
Support: the percentage of transactions that contain L
Apriori property: any subset of a frequent itemset is also frequent
A subset appears in every transaction that contains its superset, so it has at least the support of its superset
Confidence
Iteratively grow the frequent itemsets from size 1 to size K (or until we run out of support).
Apriori property tells us how to prune the search space
Frequent itemsets are used to find rules X -> Y with a minimum confidence:
Confidence: among the transactions that contain X, the percentage that also contain Y
Output: the set of all rules X -> Y that meet the minimum support and confidence
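The level-wise growth and pruning described above can be sketched in a few lines of base R. The transaction list, `min_support` threshold, and helper names below are hypothetical illustrations (not the lecture's dataset, and not the `arules` implementation):

```r
# Sketch of one level-wise Apriori pass on hypothetical transactions.
transactions <- list(
  c("Apple", "Juice", "Rice"),
  c("Apple", "Juice"),
  c("Milk", "Juice"),
  c("Apple", "Grapes")
)
min_support <- 0.5  # an itemset must appear in at least half the transactions

# support of an itemset: fraction of transactions containing all its items
support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# Pass 1: frequent single items.
items <- unique(unlist(transactions))
L1 <- items[sapply(items, function(i) support(i) >= min_support)]

# Pass 2: candidate pairs are built only from frequent single items --
# the Apriori property guarantees no other pair can be frequent.
pairs <- combn(L1, 2, simplify = FALSE)
L2 <- Filter(function(p) support(p) >= min_support, pairs)
```

Here `L1` keeps only Apple and Juice (each in 3 of 4 transactions), so only the pair {Apple, Juice} is ever counted at level 2 — the pruning that makes Apriori tractable.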
Lift
Lift: how much more often X and Y appear together than would be expected if they were independent
Association Rules: A Worked Example
Support
Transaction1: {Apple, Juice, Rice, Chicken}
Transaction2: {Apple, Juice, Rice}
Transaction3: {Apple, Juice}
Transaction4: {Apple, Grapes}
Transaction5: {Milk, Juice, Rice, Chicken}
Transaction6: {Milk, Juice, Rice}
Transaction7: {Milk, Juice}
Transaction8: {Milk, Grapes}
Support(Apple) = 4/8
Support(Juice) = 6/8
Support({Apple, Juice}) = 3/8
Confidence
How likely is Juice to be purchased when Apple is purchased? This is expressed as {Apple -> Juice} and measured by the proportion of transactions containing Apple in which Juice also appears. In the transactions above, the confidence of {Apple -> Juice} is 3 out of 4, or 75%.
Confidence {Apple -> Juice} = Support {Apple, Juice} / Support {Apple}
= (3/8) / (4/8)
= 3/4
Lift
Measure 3: Lift. Lift shows how likely Juice is to be purchased when Apple is purchased, while controlling for how popular Juice is.
In the transactions above, the lift of {Apple -> Juice} is 1, which implies no strong association between the items. A lift value greater than 1 means that Juice is likely to be bought if Apple is bought, while a value less than 1 means that Juice is unlikely to be bought if Apple is bought.
Lift {Apple -> Juice} = Support {Apple, Juice} / (Support {Apple} * Support {Juice})
= (3/8) / ((4/8) * (6/8))
= (3/8) / (3/8) = 1
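The three measures can be checked directly in R on the eight transactions above. The `support` helper function is my own sketch, not from the lecture:

```r
# Recomputing support, confidence, and lift for {Apple -> Juice}
# from the eight transactions in the worked example.
transactions <- list(
  c("Apple", "Juice", "Rice", "Chicken"),
  c("Apple", "Juice", "Rice"),
  c("Apple", "Juice"),
  c("Apple", "Grapes"),
  c("Milk", "Juice", "Rice", "Chicken"),
  c("Milk", "Juice", "Rice"),
  c("Milk", "Juice"),
  c("Milk", "Grapes")
)

# support of an itemset: fraction of transactions containing all its items
support <- function(itemset) {
  mean(sapply(transactions, function(t) all(itemset %in% t)))
}

support("Apple")                                          # 4/8 = 0.5
conf <- support(c("Apple", "Juice")) / support("Apple")   # 3/4 = 0.75
lift <- support(c("Apple", "Juice")) /
  (support("Apple") * support("Juice"))                   # exactly 1
```

Because Support{Apple, Juice} = 3/8 equals Support{Apple} × Support{Juice} = (4/8) × (6/8) = 3/8, the lift is exactly 1.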
Computing Confidence and Lift
Suppose we have 1000 credit records:
713 are home_owners, and 527 of those also have good credit.
home_owner -> credit_good has confidence 527/713 = 74%
700 records have good credit, and 527 of those are home_owners.
credit_good -> home_owner has confidence 527/700 = 75%
The lift of these two rules (it is the same in both directions) is
0.527 / (0.700 * 0.713) ≈ 1.056
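The same arithmetic, step by step in R (variable names are my own; rounding to three decimals gives 1.056):

```r
# Lift for the credit-record example: 1000 records,
# 713 home owners, 700 with good credit, 527 with both.
support_both <- 527 / 1000
support_home <- 713 / 1000
support_good <- 700 / 1000

lift <- support_both / (support_home * support_good)
round(lift, 3)   # 1.056
```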
Finally: Find Confidence Rules
If we want confidence > 80%:
IF job_skilled THEN credit_good
Association Rules
First, download the weather data from Blackboard.
Import the data from your computer into R.
Set your working directory to the folder where your data is located.
#R code
setwd("~/data/")   # adjust to the folder that contains weather.csv
weather <- read.csv("weather.csv")
weather
Building the model R: apriori
# load the arules library (install it first if needed: install.packages("arules"))
library(arules)
# find association rules with default settings
rules.all <- apriori(weather)
inspect(rules.all)
# outputs
lhs rhs support confidence lift
[1] {Outlook=Overcast} => {Play.Soccer=Yes} 0.2857143 1.0000000 1.555556
If Outlook is Overcast, then Play.Soccer is Yes, with 100% confidence.
# rules with rhs containing "Play or not" only
rulesPlay <- apriori(weather, parameter = list(minlen=2, supp=0.005, conf=0.8),
  appearance = list(rhs=c("Play.Soccer=No", "Play.Soccer=Yes"), default="lhs"))
inspect(rulesPlay)
# Improve rules quality and appearance (1)
quality(rulesPlay) <- round(quality(rulesPlay), digits=3)
rulesPlay.sorted <- sort(rulesPlay, by="lift")
inspect(rulesPlay.sorted)
Visualizing Association Rules
# Visualizing Association Rules
library(arulesViz)
plot(rulesPlay.sorted)
plot(rulesPlay.sorted, method="graph")
Complete Code
setwd("C:\\Users\\z10095\\Desktop\\data\\")
weather1 <- read.csv("weather.csv")
weather1
# install the "arules" Library
# install.packages("arules")
library(arules)
# find association rules with default settings
rules.all <- apriori(weather1)
inspect(rules.all)
# rules with rhs containing "Play or not" only
rulesPlay <- apriori(weather1, parameter = list(minlen=2, supp=0.005, conf=0.8),
  appearance = list(rhs=c("Play.Soccer=No", "Play.Soccer=Yes"), default="lhs"))
inspect(rulesPlay)
### Quality
## for better comparison, round the quality measures and sort the rules by lift
quality(rulesPlay) <- round(quality(rulesPlay), digits=3)
rulesPlaySorted <- sort(rulesPlay, by = "lift")
inspect(rulesPlaySorted)
is.redundant(rulesPlaySorted)
## redundant rules
inspect(rulesPlaySorted[is.redundant(rulesPlaySorted)])
## non-redundant rules
inspect(rulesPlaySorted[!is.redundant(rulesPlaySorted)])
rulesPlay.pruned = rulesPlaySorted[!is.redundant(rulesPlaySorted)]
inspect(rulesPlay.pruned)
# Visualizing Association Rules
#install.packages("arulesViz")
library(arulesViz)
plot(rulesPlay.pruned)
plot(rulesPlay.pruned, method="graph")