Practical exercise 8
I practical exercise 7 we used Cox regression to study how smoking habits, systolic blood pressure and body mass index influence the total mortality (i.e. mortality due to any cause) for Norwegian males and females. In this exercise, we will check if the assumptions for Cox regression are (reasonably well) fulfilled for the fitted model.
In practical exercise 7 it is explained how you may read the data into R. As in that exercise, we will disregard the 71 individuals who smoke pipe or cigar.
a) Fit a Cox model with sex, smoking habits, systolic blood pressure and body mass index as covariates by the commands:
fit.a=coxph(Surv(agestart,agestop,dead)~factor(sex)+factor(smkgr)+sbp+bmi,data=norw.death)
summary(fit.a)
Give an interpretation of the fitted model and describe the assumptions for this Cox regression fit.
b) We may obtain a model using a penalized smoothing spline for systolic blood pressure by the commands:
fit.b=coxph(Surv(agestart,agestop,dead)~factor(sex)+factor(smkgr)+pspline(sbp)+bmi, data=norw.death)
print(fit.b)
termplot(fit.b,se=T)
[To obtain the plots of the fitted model, you should repeatedly click on the graphical window. Alternatively you may give the command par(mfrow=c(2,2)) before termplot to obtain all plots in one graphical window.]
Perform the commands above and use the results to check if there is a log-linear effect of systolic blood pressure.
c) Check the assumption of log-linearity for body mass index.
If you find that the effect of systolic blood pressure or body mass index is not log-linear, you may either find a "suitable transformation" or create a categorical covariate by grouping the numeric covariate into "sensible groups". For body mass index, a "sensible grouping" is given by
- Below 20.0: Underweight
- 20.0-24.9: Normal
- 25.0-29.9: Overweight
- Over 30.0: Obese
This corresponds to the usual classification, except that we have moved the border between underweight and normal from 18.5 to 20.0 and disregarded the morbidly obese group (bmi above 40) to get a fair number of individuals in the lowest and highest groups.
In the further analysis you should continue with the new covariates.
d) We will then perform a graphical check of proportional hazards for sex. If you work with the original measurements for systolic blood pressure and body mass index, this may be obtained by the commands:
fit.d=coxph(Surv(agestart,agestop,dead)~strata(sex)+factor(smkgr)+sbp+bmi, data=norw.death)
plot(survfit(fit.d,newdata=data.frame(smkgr=1,sbp=130,bmi=25)), fun="cumhaz",lty=1:2,log=T,mark.time=F,xlim=c(40,70))
If you work with grouped or transformed covariates, you should do the obvious modifications of the above commands.
Perform the commands (modified, if needed). Discuss what you may learn from the plot.
e) Make similar graphical checks for proportionality of smoking habits, systolic blood pressure and body mass index.
f) To perform a formal test for proportional hazards for a fitted model (stored in the object fit), we may give the command:
cox.zph(fit,transform='log')
Perform the test for the model you arrived at in question c, and describe what you may conclude from the test.
Also give the command:
plot(cox.zph(fit,transform='log'))
and discuss what the plots tell you. [Before you make the plots, you need to give the command par(mfrow=c(m,n)) for suitable values of m and n so that you get all the plots in one graphical window (the number of plots is equal to the number of estimated regression coefficients).]
g) Finally we perform a graphical check of the global model fit by grouping the individuals according to their prognostic index and comparing a Kaplan-Meier estimate with a model based estimate for each group. If the model to be checked is stored in the object fit, this may be obtained by the commands below. (Body mass index is not recorded for all individuals, so we need to work with the subset of the data where we have information on bmi.)
break.pi=quantile(fit$linear.predictor)
norw.death.reduced=norw.death[!is.na(norw.death$bmi),]
norw.death.reduced$pred.group=cut(fit$linear.predictor,break.pi, include.lowest=T,label=1:4)
plot(survfit(Surv(agestart,agestop,dead)~pred.group,data=norw.death.reduced),mark.time=F,lty=1:4,xlim=c(40,70))
surv.norw=survfit(fit)
for (i in 1:4)
{
mean.pi=mean(fit$linear.predictor[norw.death.reduced$pred.group==i])
lines(surv.norw$time,surv.norw$surv^exp(mean.pi),type="s",lty=i,col="red")
}
Perform the commands for the model you arrived at in question c, and describe what you may learn from the plot.