All the numbered exercises are from the course book (ESL).
Exercise set 1
Pen-and-paper
Coding
Exercise set 2
Pen-and-paper
Coding
- We consider a data set of 252 observations of an estimated percentage of body fat, together with 13 continuous input variables (age, weight, height and 10 body circumference measurements). You can find the data in edu_bodyfat_both > edu_bodyfat > edu_bodyfat.csv in the unzipped download from this link. The aim is to predict the percentage of body fat (variable "$\texttt{pcfat}$") from the input variables using a linear model with subset selection. More specifically, apply best-subset, forward stepwise and backward stepwise selection, plot the residual sum of squares (RSS) of each method against the number of included predictors, and comment on the results. Is there any clear difference between the methods in terms of RSS? Which predictors appear to be the most important?
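As a starting point, the three selection procedures can be sketched with NumPy alone. This is a minimal sketch, not a reference solution: the function names (`rss`, `best_subset`, `forward`, `backward`) are hypothetical, and the data below is synthetic stand-in data rather than the body fat file. Each function returns a list whose entry $k$ holds the selected column indices and the RSS of the model with $k$ predictors plus an intercept.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """RSS of the least-squares fit using an intercept and the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subset(X, y):
    """For each size k, search all subsets of size k for the smallest RSS."""
    p = X.shape[1]
    path = []
    for k in range(p + 1):
        best = min(combinations(range(p), k), key=lambda c: rss(X, y, c))
        path.append((list(best), rss(X, y, best)))
    return path


def forward(X, y):
    """Start from the intercept and greedily add the predictor that lowers RSS most."""
    p = X.shape[1]
    chosen = []
    path = [([], rss(X, y, []))]
    while len(chosen) < p:
        rest = [j for j in range(p) if j not in chosen]
        j_add = min(rest, key=lambda j: rss(X, y, chosen + [j]))
        chosen = chosen + [j_add]
        path.append((list(chosen), rss(X, y, chosen)))
    return path


def backward(X, y):
    """Start from the full model and greedily drop the predictor that raises RSS least."""
    chosen = list(range(X.shape[1]))
    path = [(list(chosen), rss(X, y, chosen))]
    while chosen:
        j_drop = min(chosen, key=lambda j: rss(X, y, [c for c in chosen if c != j]))
        chosen = [c for c in chosen if c != j_drop]
        path.append((list(chosen), rss(X, y, chosen)))
    return path[::-1]  # reverse so that entry k has k predictors


# Synthetic stand-in data (an assumption for illustration, NOT the body fat data):
# only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=60)

bs = best_subset(X, y)
fw = forward(X, y)
bw = backward(X, y)

for k in range(len(bs)):
    print(k, round(bs[k][1], 3), round(fw[k][1], 3), round(bw[k][1], 3))
```

For the exercise itself, `X` would hold the 13 input columns of the CSV and `y` the `pcfat` column (assuming the file has a header row with that name). Best-subset selection then fits $2^{13} = 8192$ models, which is still feasible. The second element of each path entry gives the RSS values to plot against $k$. By construction the best-subset RSS is never larger than the forward or backward RSS for the same $k$, so the interesting question is how large the gap is.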