Use the prostate cancer dataset for the following exercises.
1. Read the dataset.
2. Examine the structure of the dataset.
3. Remove the first variable (id) from the dataset.
4. Get the number of Benign (B) cases and Malignant (M) cases. Hint: 'table'
5. Create a normalize function.
6. Using the function created in Question 5, normalize the numeric features in the dataset.
7. Confirm that the normalization worked.
8. Create the training (rows 1 through 65) and test (rows 66 through 100) datasets.
9. Use the knn() function to classify the test data.
10. Evaluate the model performance.
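Questions 5 through 7 rely on min-max normalization, which rescales a feature so that its smallest value becomes 0 and its largest becomes 1. The sketch below shows the idea on a small made-up vector; `toy` and `scaled` are illustrative names and the values are not taken from the prostate cancer data.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical toy vector, not taken from the prostate cancer data.
toy <- c(10, 20, 30, 40, 50)
scaled <- normalize(toy)
print(scaled)  # 0.00 0.25 0.50 0.75 1.00

# Scaling is relative, so multiplying the input by a constant changes nothing.
print(all.equal(normalize(toy), normalize(toy * 100)))  # TRUE
```

This matters for kNN because the classifier works on distances: without rescaling, a feature measured in large units (such as area) would dominate one measured in small units (such as smoothness).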
Code Implementation
##install packages
install.packages("psych")
install.packages('class')
library(class)
library(psych)
#1. Read the dataset
# Adjust the path if Prostate_Cancer.csv is not in the working directory
data <- read.csv("Prostate_Cancer.csv", stringsAsFactors = TRUE)
#2. Examine the structure of the dataset.
str(data)
#3. Remove the first variable(id) from the data set
data <- data[,-1]
head(data)
#4. Get the number of Benign (B) cases and Malignant (M) cases. Hint: 'table'
table(data$diagnosis_result)
#5. Create a normalize function
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
#6. Using the function created in Question 5, normalize the numeric features in the data set.
data.n <- as.data.frame(lapply(data[,2:9], normalize))
#7. Confirm that the normalization worked
summary(data.n)  # every feature should now have Min. 0 and Max. 1
#8. Create the training(1 through 65) and test datasets (66 through 100)
train.data <- data.n[1:65,]
test.data <- data.n[66:100,]
head(train.data)
head(test.data)
train.label <- data[1:65,1]
test.label <- data[66:100,1]
head(train.label)
head(test.label)
#9. Use the knn() function to classify test data
knn.res <- knn(train=train.data, test=test.data, cl=train.label, k=8)
#10. Evaluate the model performance
# Accuracy: percentage of test cases whose predicted label matches the true label
ACC.res <- 100 * sum(test.label == knn.res) / length(test.label)
ACC.res
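A single accuracy figure does not show which class is being misclassified. One common way to look further is a confusion matrix built with `table()`. The sketch below uses two short made-up label vectors, `actual` and `predicted`, which are illustrative only and are not output from the model above; with the real data, `test.label` and `knn.res` would take their place.

```r
# Hypothetical labels for illustration only; not results from the model above.
actual    <- factor(c("B", "B", "M", "M", "M", "B", "M", "M"), levels = c("B", "M"))
predicted <- factor(c("B", "M", "M", "M", "B", "B", "M", "M"), levels = c("B", "M"))

# Rows are the true classes, columns are the predicted classes.
conf.mat <- table(Actual = actual, Predicted = predicted)
print(conf.mat)

# Accuracy is the share of cases on the diagonal, as a percentage.
accuracy <- 100 * sum(diag(conf.mat)) / sum(conf.mat)
print(accuracy)  # 75

# Sensitivity for the malignant class: correctly flagged M cases / all true M cases.
sensitivity.M <- conf.mat["M", "M"] / sum(conf.mat["M", ])
print(sensitivity.M)  # 0.8
```

In a diagnostic setting, a malignant case predicted as benign is usually the costlier error, so the sensitivity for M is worth reporting alongside overall accuracy.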