Exercise 1
Below you will find the R code for a function called train2(), similar to what we have considered in class. You should read (and test) the function, to help complete the following two tasks:
Describe, in words, how this algorithm differs from what we covered in class, then discuss how this might affect the performance of the algorithm. Demonstrate your argument with a coded example that compares this algorithm with the version from class, and point to elements of the output that help illustrate your answer. You can use external resources to help you (just make sure to cite any useful materials).
Notes: If you compare this function to our previous function, I recommend setting the learning rate of both to 0.001. It is fine to copy the code from the seminar solutions (doing so will not count as plagiarism).
Critique the function more generally. What are some of the general constraints or limitations of the implementation? Are there any ways you could improve it further? You should answer this question in words, but you may additionally provide code snippets (or “pseudocode”) to illustrate any suggested improvements (you do not have to incorporate these directly into the provided function).
train2 <- function(X, y, l_rate, m = 0.9, epochs) {
  coefs <- rep(0, ncol(X))
  v <- rep(0, ncol(X))
  for (b in 1:epochs) {
    for (i in sample(1:nrow(X))) {
      row_vec <- as.numeric(X[i,])
      yhat_i <- predict_row(row_vec, coefficients = coefs)
      for(k in 1:length(coefs)) {
        v[k] <- m*v[k] + l_rate*(yhat_i - y[i])*row_vec[k]
        coefs[k] <- coefs[k] - v[k]
      }
    }
    yhat <- apply(X, 1, predict_row, coefficients = coefs)
    MSE_epoch <- MSE(y, yhat)
    NLL_epoch <- NLL(y, yhat)
    message(
      paste0(
        "Iteration ", b ,"/",epochs," | NLL = ", round(NLL_epoch,5),"; MSE = ", round(MSE_epoch,5)
      )
    )
  }
  return(coefs)
}
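train2() calls three helpers, predict_row(), MSE() and NLL(), that are not defined in this sheet. To test the function they need to exist in the workspace. The sketch below is one possible set of definitions, not the seminar's own code: it assumes a logistic model with a binary outcome and an intercept stored as a column of ones in X. The function bodies, the eps clipping argument and the toy data are all assumptions made for illustration.

```r
# Hypothetical helpers: the names match the calls inside train2(), but the
# bodies are assumptions (logistic model, intercept held as a column of ones).

# Predicted probability for a single observation
predict_row <- function(row, coefficients) {
  1 / (1 + exp(-sum(row * coefficients)))
}

# Mean squared error between outcomes and predictions
MSE <- function(ytrue, yhat) {
  mean((ytrue - yhat)^2)
}

# Negative log-likelihood for binary outcomes; predictions are clipped away
# from 0 and 1 so that log() stays finite (eps is an illustrative choice)
NLL <- function(ytrue, yhat, eps = 1e-15) {
  yhat <- pmin(pmax(yhat, eps), 1 - eps)
  -sum(ytrue * log(yhat) + (1 - ytrue) * log(1 - yhat))
}

# Toy data: an intercept column plus one predictor
set.seed(1)
x <- rnorm(200)
X <- cbind(1, x)
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))

# Loss at the all-zero starting coefficients that train2() begins from
yhat0 <- apply(X, 1, predict_row, coefficients = c(0, 0))
MSE(y, yhat0)
NLL(y, yhat0)
```

With definitions like these in scope, a call such as train2(X, y, l_rate = 0.001, epochs = 10) should run on the toy data and print one line of NLL and MSE per epoch.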
Exercise 2
Consider the following research context:
An analyst has a dataset consisting of a continuous outcome variable (y), and a single continuous predictor variable (x)
They are interested in generating the most predictive model they can of the form y=f(x)+λ∗R(f), where f is a linear model
They can adjust λ and include polynomial terms of x up to the 7th order
The analyst does not have access to any in-built cross-validation functions (for example, cv.glmnet())
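Two building blocks implied by this context, expanding x into polynomial terms and splitting the data into folds by hand, can be sketched in base R. This is only a starting point under stated assumptions, not a solution to the task: the helper names poly_features() and make_folds(), the choice of five folds and the simulated data are all hypothetical.

```r
# Toy data standing in for the analyst's two vectors (illustrative only)
set.seed(42)
n <- 100
x <- runif(n, -2, 2)
y <- 1 + 2 * x - x^2 + rnorm(n, sd = 0.5)

# Raw polynomial terms x, x^2, ..., x^degree as the columns of a matrix
# (hypothetical helper; outer() returns a matrix even when degree is 1)
poly_features <- function(x, degree) {
  out <- outer(x, seq_len(degree), "^")
  colnames(out) <- paste0("x", seq_len(degree))
  out
}

# Randomly assign each observation to one of k roughly equal folds
# (hypothetical helper; no cross-validation package involved)
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

X3 <- poly_features(x, degree = 3)
folds <- make_folds(n, k = 5)

# Row indices when fold 1 is held out and the rest are used for fitting
test_idx <- which(folds == 1)
train_idx <- which(folds != 1)

dim(X3)
table(folds)
```

Looping over each fold, each polynomial degree and each candidate value of λ, and averaging the held-out loss, would then give a grid of validation scores to compare.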
Your task is to write a function that:
Takes in an outcome y and explanatory variable x (both are continuous, and supplied as separate vectors)
Identifies the optimal combination of both λ and f in a principled way
Prints, as separate lines:
The optimal model form
The test loss of the optimal model
Returns the final trained model
Your answer should be a single code chunk which contains the function. If you make any notable choices while building this function, you may describe these under the code chunk.
Notes: Your answer will be assessed by running the function on unseen data, which conforms to the description above. Code comments should be used to clearly annotate your function, and to draw attention to any notable features of your implementation.
Exercise 3
We have provided you with a dataset called civil_wars.RData, which records, for each country and every year, whether that country was engaged in a civil war. The data was taken from the replication materials for Kaufman et al. (2019), which focuses on the use of boosted decision trees:
Kaufman AR, Kraft P, Sen M. Improving Supreme Court Forecasting Using Boosted Decision Trees. Political Analysis. 2019;27(3):381-387. doi:10.1017/pan.2018.59
To read this dataset you need to call load("civil_wars.RData"), which will load the data as a variable called civwars into your environment. When you view the data in R (i.e. View(civwars)), you will see the variable descriptions under the variable names. We have removed country names from the dataset intentionally, so that all variables are numerical, and imputed any missing values.
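The behaviour of load() can be checked without the real file. The sketch below saves a small stand-in data frame to a temporary .RData file and reads it back. The two columns and the file name civil_wars_demo.RData are invented for illustration and say nothing about the actual contents of civil_wars.RData.

```r
# Stand-in data frame; the real civwars columns are not reproduced here
civwars <- data.frame(year = c(1990, 1991), war = c(0, 1))

# Save it to a temporary .RData file, then remove it from the workspace
path <- file.path(tempdir(), "civil_wars_demo.RData")
save(civwars, file = path)
rm(civwars)

# load() takes the file name as a character string, restores each object
# under the name it was saved with, and invisibly returns those names
loaded <- load(path)
loaded
str(civwars)
```

Because the object comes back under its saved name, there is no need to assign the result of load() to civwars; doing so would overwrite the data frame with a character vector of names.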
Your task is to build a model to predict civil war. You should use the data however you see fit, and you may use any class of model we have considered (including any already covered in this assignment). This includes gradient descent, LASSO and ridge regression, and K-nearest neighbours (KNN).