# R help for Exercise 3.3 in BSS
# Read the data into a dataframe, give names to the variables, and inspect the data:
firms<-read.table("http://www.math.uio.no/avdc/kurs/STK4900/data/exer3_3.dat")
names(firms)<-c("months","size","type")
firms
# Check that the data correspond to those given in the exercise.
# Attach the dataframe:
attach(firms)
# Compute summary measures for the variables:
summary(firms)
# Make sure that you understand what the summary measures tell you!
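# Optional addition (a sketch, assuming type is stored as a number coding a small set of categories):
# the quartiles that summary() reports for such a variable are not very informative,
# so a frequency table and group-wise summaries of months may be easier to read:
table(type)
tapply(months,type,summary)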
# Make plots (side by side) of months versus each of the other two variables:
# For the numeric covariate size we make a scatterplot, while we make a box plot for the categorical covariate type:
par(mfrow=c(1,2))
plot(size,months)
boxplot(months~type)
par(mfrow=c(1,1))
# What do the plots tell you?
# Do univariate regression analyses of months versus each of the other two variables:
fit1<-lm(months~size)
fit2<-lm(months~type)
summary(fit1)
summary(fit2)
# Which of the two variables, size or type, is more important for explaining the variation in the number of months elapsed?
# Does either variable (alone) have a significant effect?
# The second of the two regression models has a single categorical covariate (type).
# Could we have estimated/tested its effect using another method? Would that give a different conclusion?
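# One possible alternative, sketched here under the assumption that type takes exactly two values,
# is a two-sample t-test with pooled variance (var.equal=TRUE).
# In that case the t-statistic (apart from its sign) and the P-value should agree with those for type in summary(fit2):
t.test(months~type,var.equal=TRUE)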
# Do a regression analysis including both size and type:
fit3<-lm(months~size+type)
summary(fit3)
# What does this model tell you? Does it look better than the best of the two models with only one covariate?
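# One way to make the comparison concrete (a sketch, not the only option) is to compare adjusted R-squared values
# and to run an F-test of the smaller model against the larger one.
# Here fit1 is used as the smaller model for illustration only; use fit2 instead if type turned out to be the better single covariate:
summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared
summary(fit3)$adj.r.squared
anova(fit1,fit3)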
# Try fitting models with an interaction and/or a second-order term for size yourself.
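# A possible starting point (the names fit4 and fit5 are only suggestions, not part of the exercise):
# fit4 adds an interaction between size and type, and fit5 adds a second-order term for size.
# I() is needed so that ^ is read as arithmetic squaring rather than as a formula operator:
fit4<-lm(months~size+type+size:type)
fit5<-lm(months~size+I(size^2)+type)
summary(fit4)
summary(fit5)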