Overview
The Generalized Estimating Equations (GEE) module in SPSS extends GZLM even further by providing support for non-independent data, such as repeated measures and clustered data. GEE thus supports repeated measures logistic regression and many other models for time series or correlated data. In SPSS, select Analyze, Generalized Linear Models, Generalized Estimating Equations.
In SPSS, select Analyze, Generalized Linear Models, Generalized Linear Models; under the "Type of Model" tab, select "Gamma with log link"; under the "Response" tab, select the dependent variable (e.g., the survival/duration variable); fill out other tabs as usual.
Loglinear models. Loglinear data are count data (cell counts within tables). Poisson regression models (Poisson distribution, log link function) are therefore loglinear models when all the predictors are factors (have finite levels, like row and column variables in a table) and the model effects are factors and/or interactions among factors. Parameter estimates will be the same as those from Analyze, Loglinear, General in the SPSS menus.
In SPSS, select "Poisson loglinear" in the "Counts" section of the "Type of Model" tab of the GZLM dialog, thus selecting a model for a dependent variable with a Poisson distribution, using a natural log link function. Set the dependent variable (number of pieces of luggage lost, in our example) under the "Response" tab. Under the "Predictors" tab, enter factors and/or covariates as predictors, then click the "Variable" radio button under "Offset" and enter the offset variable name (the variable containing the natural log of number of air miles traveled, in our example). Complete "Model," "Statistics," and other tabs as usual, but note that the offset variable cannot be used as a model effect, since it cannot also be a predictor.
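To illustrate what the offset does, here is a minimal Python sketch (not SPSS output). It assumes a log link, so that log(mean count) = log(exposure) + linear predictor. The function names and the four data rows are hypothetical and chosen only for illustration. The sketch also uses the fact that, in an intercept-only Poisson model with such an offset, the estimated rate is total events divided by total exposure.

```python
import math

def expected_count(exposure, linear_predictor):
    """Poisson mean under a log link with offset log(exposure):
    log(mu) = log(exposure) + linear_predictor."""
    offset = math.log(exposure)  # this is the value stored in the offset variable
    return math.exp(offset + linear_predictor)

def intercept_only_rate(counts, exposures):
    """MLE of exp(b0) for an intercept-only Poisson model with offset
    log(exposure): total events divided by total exposure."""
    return sum(counts) / sum(exposures)

# Hypothetical data: bags lost and air miles (in thousands) for four travelers.
lost = [1, 0, 2, 1]
miles = [10.0, 5.0, 25.0, 10.0]

rate = intercept_only_rate(lost, miles)  # lost bags per thousand miles
b0 = math.log(rate)                      # intercept on the log scale
fitted = [expected_count(m, b0) for m in miles]
print(rate, fitted)
```

Because the offset enters with a fixed coefficient of 1, the model describes a rate per unit of exposure rather than a raw count, which is why the offset variable is not estimated as a predictor.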
Type I analysis is recommended when the researcher is able to stipulate beforehand the order of model predictors, as in a hierarchical design where main effects are specified before first-order interaction effects, first-order interaction effects are specified before second-order interaction effects, and so on. Type I is also used for purely nested models where a first effect is nested within a second effect, the second within a third, etc., and in polynomial regression models where simple terms are specified before higher-order terms (e.g., squared terms). Type I tests may be thought of as tests of sequential effects: if an effect is non-significant by Type I, this means it cannot be assumed to be different from 0 when that effect is entered into a model containing only the effects that precede it in order of entry.
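The sequential logic can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the SPSS computation: it assumes each effect is tested with a likelihood ratio statistic (SPSS may report a different chi-square statistic), and the function name and log-likelihood values are hypothetical.

```python
def sequential_lr_statistics(log_likelihoods):
    """Given log likelihoods for a sequence of nested models, where entry 0 is
    the intercept-only model and entry j adds the j-th effect in order of
    entry, return the likelihood ratio chi-square for each added effect."""
    return [
        -2.0 * (previous - current)
        for previous, current in zip(log_likelihoods, log_likelihoods[1:])
    ]

# Hypothetical log likelihoods: intercept only, then three effects added in order.
lls = [-200.0, -190.0, -188.5, -188.4]
stats = sequential_lr_statistics(lls)
print(stats)  # one statistic per effect, each conditional on the effects before it
```

Changing the order of entry changes which models are compared at each step, which is why Type I results depend on the order of the predictors.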
More generally, the likelihood ratio test may be used to compare the researcher's full model not only against the intercept-only model but against any nested model. The likelihood ratio statistic equals -2*(log likelihood of the intercept-only or other reduced model minus the log likelihood of the researcher's full model), where in SPSS the log likelihood is printed in the "Goodness of Fit" table. One could, for instance, run an intercept-only model, get its log likelihood and note its degrees of freedom, then run one's full model; compute the difference in log-likelihood values times -2; look up this likelihood ratio value in a chi-square table using as degrees of freedom the difference in df between the two models; and get the same value as in the likelihood ratio test in the SPSS "Omnibus Test" table. However, the reduced model need not be the intercept-only model: the same test can compare any two models where one is nested within the other.
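The hand calculation just described can be sketched in Python using only the standard library. The function names and the example log-likelihood values are hypothetical; in practice the log likelihoods would be read from the "Goodness of Fit" tables of the two runs. The chi-square tail probability is computed in closed form for integer degrees of freedom, replacing the lookup in a chi-square table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(ll_reduced, ll_full, df_diff):
    """Return (LR chi-square, p-value) for a reduced model nested in a full model."""
    lr = -2.0 * (ll_reduced - ll_full)
    return lr, chi2_sf(lr, df_diff)

# Hypothetical values: the reduced model has 2 fewer parameters than the full model.
lr, p = likelihood_ratio_test(ll_reduced=-120.0, ll_full=-110.0, df_diff=2)
print(lr, p)
```

A small p-value indicates that the full model fits significantly better than the reduced model it is compared against.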
In the tables below, both the ordinal logistic regression model and the OLS linear regression model are highly significant by the likelihood ratio test. By the AIC, however, the ordinal logistic model fits somewhat better, since the model with the smaller AIC value is the better-fitting one.
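As a rough illustration of the AIC comparison, the following Python sketch assumes the usual definition AIC = -2*(log likelihood) + 2*(number of estimated parameters). The log likelihoods and parameter counts are invented for illustration and are not the values from the tables.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2*LL + 2*k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical (log likelihood, number of parameters) for two competing models.
models = {
    "ordinal_logistic": (-1500.0, 10),
    "linear": (-1520.0, 8),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # model with the smallest AIC
print(scores, best)
```

The penalty term 2*k means that a model with more parameters must improve the log likelihood enough to offset its added complexity.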
In the "Tests of Model Effects" tables below, one comes to generally the same substantive conclusions for both the ordinal logistic and linear regression models, with the exception that race is a significant predictor of political attitudes in the regression model but not in the ordinal logistic model, once other variables in the model are taken into account.
QIC and QICC. For GEE, the "Model summary statistics" option displays a "Goodness of Fit" table with QIC and QICC coefficients instead of GZLM's "Omnibus Test" table (the "Tests of Model Effects" table appears for both GZLM and GEE). QIC is the "quasi-likelihood under independence criterion" and QICC is a corrected version which penalizes for model complexity (that is, it rewards parsimony). Both are adaptations of AIC for repeated measures, where quasi log likelihood is used instead of log likelihood. As with AIC, smaller values indicate better model fit for both QIC and QICC. QIC is used for choosing the best working correlation structure assumption in the "Repeated" tab (simply run the model under different working correlation structure assumptions and choose the assumption with the lowest QIC value). QICC is used for choosing the best subset of predictors (simply run a model and a nested model dropping one of the predictors, then compare QICC coefficients, with lower indicating better fit).
Warnings: Note that if Type I sums of squares are requested in the "Analysis Type" options of the Statistics tab, effect tests will be sensitive to the entry order of the predictors (unlike Type III, which is not sensitive to entry order). Regardless of Type I or III, the Wald test gives overly large standard errors for large coefficients, leading to Type II errors (failing to reject a false null hypothesis, concluding there is no relationship when there is one). For large coefficients, a likelihood ratio test is preferred (this is done as described above using a full model and a reduced model with one variable dropped). Finally, model coefficients are partial coefficients: they will change depending on which variables are included as predictors. Many recommend that non-significant variables be dropped from the model one at a time until all remaining predictors are significant.
As the tables below illustrate, the significance levels for the covariates (here income and education) and the binary variables (sex, belief in the afterlife) are the same as reported above in the "Tests of Model Effects" table. However, the "Parameter Estimates" table also enables the researcher to assess the significance of particular values of the categorical predictors like race, which has three levels, in relation to the reference categories.
The output below is for the OLS linear regression model; it was not available for the ordinal logistic regression model used for comparison above. The OLS regression model treats the dependent variable, political views, as if its category codings 1 to 5 are interval numbers (not multinomial categories as in ordinal logistic regression). Thus the "Estimates" table gives the estimated means on political views for each race, since estimated marginal means were requested for race. The "Individual Test Results" table shows contrasts between each of the first two racial groups and the third, reference category (by default the highest category, here black = 3). Neither contrast is significant for these data. The "Overall Test Results" table shows that race is nonetheless a significant predictor of political views in the linear regression model (the .007 significance level is the same as reported in the "Tests of Model Effects" table).