# Checking
the assumptions in Cox regression:
# methods
and illustration for the melanoma data
# Cox's regression model
assumes that the hazard for an individual with vector of fixed covariates
# x = (x1,x2,
... ,xp)T takes the form: a(t|x) = a0(t)*exp{bTx}. The
model is based on two key assumptions:
# 1) Log-linear effect of
numeric covariates: log a(t|x) = log a0(t) + bTx
#???? This means that one unit's increase in a
numeric covariate xj should have the same effect whatever
#???? the value of xj and whatever
the values of the other covariates (if no interaction effects are included).
# 2) Proportional hazards: a(t|x1)/ a(t|x2) = exp{{bT(x1-x2)}.
#???? This means that the hazard ratio is
independent of time t. In particular if x1 and x2 are equal
#???? except for the j-th component, where x2j
= x1j +1, the hazard rate becomes exp(bj).
# We will
illustrate the model checking methods using the melanoma data.
# The
commands below assume that the melanoma data set has been stored
# in
the dataframe melanoma, that
ulcer is defined as a factor, and that
# the survival library has
been attached.
# We start
out the model:
cox.fit=coxph(Surv(lifetime,status==1)~ulcer+thickn,data=melanoma)
# We first check whether there is a
log-linear effect of the numeric covariate thickness
# A simple method is to make
a categorical variable for thickness, estimate a separate effect
# for each thickness group
and see if we get a fairly linear trend for the estimates:
quantile(melanoma$thickn[melanoma$status==1])
melanoma$thickg=cut(melanoma$thickn,breaks=c(0,2,4,6,20),labels=1:4)
cox.fitg=coxph(Surv(lifetime,status==1)~ulcer+thickg,data=melanoma)
summary(cox.fitg)
th=rep(0,4)
for
(i in 1:4) th[i]=mean(melanoma$thickn[melanoma$thickg==i])
plot(th,c(0,cox.fitg$coef[2:4]),pch=1)
# The plot indicates that the
effect of thickness is not log-linear.
# A more advanced method is
to fit a penalized smoothing spline for the effect of thickness:
cox.fitps=coxph(Surv(lifetime,status==1)~ulcer+pspline(thickn),data=melanoma)
print(cox.fitps)
termplot(cox.fitps)
# (When you use the termplot command, you need to click repeatedly on the
graphical window to see the plots.)
# Both the formal test (from
the print command) and the plot (from the termplot command) indicate that the effect of thickness is not log-linear.
# (As there are only 7
individuals with a survival time of more than 12 years, you should not put much
emphasis on the estimated curve for long survival times.)
# The model check above
indicates that it is better to use log2-thickness, so we define log2-thickness as
a new numeric covariate and continue with the model:
melanoma$log2thick=log(melanoma$thickn,2)
cox.fit2=coxph(Surv(lifetime,status==1)~ulcer+log2thick,data=melanoma)
# We then check for a
log-linear effect of the covariate log2-thickness:
melanoma$log2thick=log(melanoma$thickn,2)
cox.fitlps=coxph(Surv(lifetime,status==1)~ulcer+pspline(log2thick),data=melanoma)
print(cox.fitlps)
termplot(cox.fitlps)
# Now both the formal test
(from the print command) and the plot (from the termplot command) indicate that the effect of log2-thickness
is reasonably log-linear.
# We then check the assumption of proportional hazards
# A simple method is to fit a
stratified Cox model, where we stratify according to the levels of a
categorical covariate,
# and plot the
stratum-specific cumulative baseline hazard estimates on a log-scale.
# If we have proportional
hazards, the curves should be fairly parallel (i.e. have a constant vertical
distance).
# We first check for
ulceration:
cox.fitstru=coxph(Surv(lifetime,status==1)~strata(ulcer)+log2thick,data=melanoma)
plot(survfit(cox.fitstru,newdata=data.frame(log2thick=0)),fun="cumhaz",lty=c(1,2),log=T)
# We then check for thickness:
cox.fitstrth=coxph(Surv(lifetime,status==1)~ulcer+strata(thickg),data=melanoma)
plot(survfit(cox.fitstrth,newdata=data.frame(ulcer=0)),fun="cumhaz",lty=c(1,2,3,4),log=T)
# For both covariates the
curves are fairly parallel, but there is a tendency, especially for
tumor-thickness, that the curves are closer together for large values of t than
for smaller ones.
# We
finally make a formal test for proportionality of the covariates. This is done
by, for each covariate x,
# adding time-dependent
covariate x*log(t), and testing whether the time-dependent covariates are
significant using a score test
cox.zph(cox.fit2,transform='log')
# The test shows that the
effect of tumor-thickness is not proportional.
#We also make plots that give
nonparametric estimates of the (possible) time dependent effect of the
covariates:
par(mfrow=c(1,2))
plot(cox.zph(cox.fit2))
# The plot indicate that the
effect of tumor thickness is decreasing with time t.