Overview
Unlike the parametric models discussed in the section on event history analysis (EHA), Cox regression is semi-parametric: it does not require the researcher to specify a baseline hazard function or to estimate absolute risk. For this reason, Cox regression may be preferred over parametric EHA models when there is no clear theoretical reason for positing a particular baseline hazard. In most cases there is no such strong, clear reason, and the more stringent data assumptions of parametric EHA models are not justified, making Cox models the better choice.
Stata is the preferred software package for Cox regression and survival analysis. Limdep is another statistical package with extensive support for event history models, including Cox models. In Stata, declare the data with the stset command, then execute Cox regression with the stcox command. For ordinary Cox regression in SPSS, select Analyze, Survival, Cox Regression; enter the Time variable; enter the Covariates variable(s); enter the Status variable (the event variable) and click Define Event to specify the value that indicates the event has occurred (e.g., death = 1). In Options you may wish to check 'Display baseline function' to get the time-only effect for comparison with the covariate effect.
Cox regression is a particular model within the broader category of event history analysis. For important related treatment, see the separate discussion of event history methods. See also the separate discussion of Kaplan-Meier survival analysis, a procedure for estimating survival and hazard functions but not covariate effects. See also the life tables procedure, used for descriptive, actuarial studies of duration where time is the only salient variable and censored and uncensored cases do not differ.
The "Categorical Variable Codings" table documents the actual codes applied and is helpful when there is a need to recall what the omitted reference category is.
The example above graphs, for a hypothetical state at the mean of the predictor variables, the cumulative expected probability of a state taking the number of days shown on the X axis before voting to ratify the Constitution. The predictors are VotePct (percent favoring ratification, reflecting tightness of the vote) and Size (small states and medium states vs. the reference category of large states). Note, however, that significance testing may show that a covariate is not significant.
A key assumption of the Cox model is proportional hazards: the ratio of the hazard rates for any two groups (or any two values of a covariate) remains constant over time. The Cox model says nothing about the absolute shape of the curves formed by the two hazard rates over time, only that their ratio will be constant. Note that proportional hazards means that hazards are proportional over time, not that they are the same over time. The slopes of the proportional hazard rates for two groups may be downward, for example, indicating decreased hazard over time. Note also that hazard rates are not hazard ratios, and their respective interpretations differ (the two are confused in a portion of the extant literature using Cox regression).
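The distinction between a changing hazard rate and a constant hazard ratio can be illustrated with a minimal sketch. It assumes the usual Cox form, hazard = baseline hazard times exp(beta * x). The decreasing baseline function and the coefficient below are hypothetical values chosen only for illustration; they are not estimates from the ratification example.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline hazard that declines over time (illustrative assumption).
    return 0.5 / (1.0 + t)

def hazard(t, x, beta):
    # Cox form: h(t | x) = h0(t) * exp(beta * x)
    return baseline_hazard(t) * math.exp(beta * x)

beta = math.log(2.0)  # hypothetical coefficient: group 1 has twice the hazard of group 0
times = [0, 1, 5, 10, 50]

rates_group0 = [hazard(t, 0, beta) for t in times]
rates_group1 = [hazard(t, 1, beta) for t in times]
ratios = [h1 / h0 for h0, h1 in zip(rates_group0, rates_group1)]

for t, h0, h1, r in zip(times, rates_group0, rates_group1, ratios):
    print(f"t={t:>2}  rate0={h0:.4f}  rate1={h1:.4f}  ratio={r:.2f}")
```

Both hazard rates fall as t increases, yet the ratio column stays at exp(beta) because the baseline hazard cancels out of the ratio.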
As illustrated above, the baseline cumulative hazard for the intercept-only model and the cumulative hazard at the mean of the covariates in the full model are shown in the "Survival Table" of SPSS output. This is discussed further below, in the section on statistical output.
Example. Hazard ratios below 1.0 indicate that the greater the covariate, the less the hazard. Hazard ratios above 1.0 indicate that the greater the covariate, the greater the hazard. Thus, in a model of electric generator life given type of ball bearings and given electric load, if bearings = 0 for old style and bearings = 1 for new style, and the hazard ratio for bearings is .06, this means that going from old to new style bearings reduces the hazard of the generator failing to 6% of its former level (a 94% reduction), controlling for load. The hazard ratio of .06 is the proportional change in hazard when the variable bearings increases by 1 unit (i.e., goes from 0 = old style to 1 = new style). If, however, the interval between the lower and upper confidence limits on the bearings hazard ratio included 1.0, we could not be sure at the 95% confidence level that bearings really made a difference. For the same model as illustrated above for the covariate survival function, the covariate hazard function looks like this:
As a second example, for the event "governor reelected = 0, not reelected = 1," and the covariate "Republican state = 0, Democratic state = 1," a hazard ratio of 1.5 would mean that a governor in a Democratic state who has been in office to time t has 1.5 times the hazard of not being reelected at time t + 1, compared to a governor in a Republican state. Put as odds, it is 1.5:1 (or 3:2) that the Democratic-state governor will be the first of the two to experience non-reelection. This is equivalent to saying that there is a 60% (3/5) chance that the duration in office until the event of non-reelection will be shorter for a governor in a Democratic state than for one in a Republican state.
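The arithmetic behind the two examples above can be sketched as follows. The hazard ratio is exp(b) for a coefficient b, its 95% confidence limits are exp(b ± 1.96 × SE), and the chance that the higher-hazard case experiences the event sooner is HR / (1 + HR). The standard error used for the bearings coefficient is a hypothetical value, since the text does not report one, and the function names are illustrative only.

```python
import math

def hazard_ratio_with_ci(coef, se, z=1.96):
    """Return (hazard ratio, lower limit, upper limit) for a Cox coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def prob_event_sooner(hazard_ratio):
    """Probability that the higher-hazard case has the event first: HR / (1 + HR)."""
    return hazard_ratio / (1.0 + hazard_ratio)

# Bearings example: the coefficient is recovered from the hazard ratio of .06 in the text.
bearings_coef = math.log(0.06)
bearings_se = 1.0  # hypothetical standard error, not taken from the text
hr, lower, upper = hazard_ratio_with_ci(bearings_coef, bearings_se)
interval_excludes_one = upper < 1.0 or lower > 1.0
percent_change = (hr - 1.0) * 100.0  # negative values mean reduced hazard

# Governor example: hazard ratio of 1.5.
p_sooner = prob_event_sooner(1.5)

print(f"bearings HR = {hr:.2f}, 95% limits = ({lower:.3f}, {upper:.3f})")
print(f"interval excludes 1.0: {interval_excludes_one}")
print(f"percent change in hazard: {percent_change:.0f}%")
print(f"P(event sooner | HR = 1.5) = {p_sooner:.2f}")
```

With a larger hypothetical standard error the upper limit would rise above 1.0, and the effect of bearings could no longer be distinguished from no effect at the 95% level.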
In the figure above, the null (intercept-only) model had -2LL = 45.104. The full model had -2LL = 32.224, giving a model chi-square (the difference between the two) of 12.88, which is significant at the .012 level. That is, the covariates contribute significantly to explaining the number of days states took to ratify the Constitution, which is the simple example used here.
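The model chi-square above can be reproduced by hand as a likelihood ratio test. The sketch below assumes 4 degrees of freedom, one for each of the four terms in the full model (VotePct, Size(1), Size(2), Rights). That count is an inference from the model description rather than a figure quoted from the output. To stay self-contained it uses a closed-form chi-square tail probability that is valid only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

null_minus2ll = 45.104  # intercept-only model, from the text
full_minus2ll = 32.224  # full model, from the text
df = 4                  # assumption: one df per term in the full model

model_chi_square = null_minus2ll - full_minus2ll
p_value = chi2_sf_even_df(model_chi_square, df)

print(f"model chi-square = {model_chi_square:.2f}, df = {df}, p = {p_value:.3f}")
```

Under that assumption the result agrees with the .012 significance level reported above.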
Rights was coded 1 = heavily involved, 0 = not, so the negative sign of the Rights coefficient in Model 2 signifies reduced hazard (of the event, ratification), which translates into more days until ratification. Likewise, the positive sign of the VotePct covariate in Model 1 signifies increased hazard of ratification, which equates to fewer days until ratification. That is, a closer vote (lower VotePct) and being in the higher Bill of Rights category (1 = heavily involved) both tend to increase days until ratification when considered separately, but when considered together, Rights controls for VotePct.
In the example above, the categorical predictor is Size, referring to whether a state is small, medium, or large. With respect to the example of ratification of the Constitution, a major debate at the Constitutional Convention concerned compromises between large and small states, so it may be of interest to compare states with regard to the hazard function (where the "hazard" is ratification of the Constitution, and time is days until ratification). In the illustration above, we see that the predicted hazard functions were indeed different for small, medium, and large states. (Note that Size was not a significant predictor of duration once other variables were controlled. However, since the data are an enumeration of all 13 original states and not a random sample, significance does not have its normal meaning and relevance.)
The saved DfBeta output above has four DfBetas to represent the four terms in Model 2 of the example used in this module, in the order entered in the model: VotePct, Size(1), Size(2), and Rights. In Model 2, Rights was the only significant predictor. In the "Variables in the Equation" table above, Rights had a parameter coefficient of -4.227. DFB4_1 estimates the change in this coefficient if a given case is removed. Removing SC would have the most effect in a negative direction. Removing NC would have the most effect in a positive direction. A positive direction corresponds to increased risk (of ratification, the status event) and thus fewer days duration to ratification. NC had among the highest durations, so removing it would leave a dataset with fewer days duration on average. However, the highest positive DfBeta does not necessarily correspond to the unit with the highest time score (that would be RI, which is not an outlier by the DfBeta criterion). Rather, DfBeta reflects the effect of a particular variable on duration to event after other variables in the model are controlled.
If a covariate fails the proportional hazards assumption, then for hazard ratios that increase over time for that covariate, relative risk is overestimated (that is, for diverging hazards, coefficient estimates are inflated). For ratios that decrease over time, relative risk is often underestimated (that is, for converging hazards, coefficient estimates are deflated and biased toward zero). "Converging" means that the hazard rates for two groups formed by a covariate factor are tending toward the same rate over time. Correspondingly, standard errors are incorrect and significance tests lose power (Box-Steffensmeier & Zorn, 2001: 972). It is common for a covariate to fail the assumption of proportional hazards, and the implication for estimation should be reported. There are alternative ways to check:
With time series data, one should assume that the data may well be temporally dependent (the value at time t+1 is partly a function of the value at time t). This is related to the autocorrelation problem in time series analysis. In Cox and other EHA models, however, the researcher need not de-trend the data but need only use "robust variance estimation", which refers to algorithms by Lin & Wei (1989) and Huber (1967) that adjust standard errors for time dependency. Robust estimation is available in most Cox and EHA software. It yields the same parameter estimates as standard variance estimation (algorithms assuming independence) but typically higher standard errors. That is, robust estimation increases the possibility that parameters will be found to be non-significant. Robust estimation is recommended for parameter estimation with time-dependent covariates unless the researcher can demonstrate a lack of time dependency in the data (that is, robust estimation should be used most of the time).