Overview
Data requirements. In all GLM models, the dependent variable or variables are numeric. The independents may be categorical factors (including both numeric and string types) or quantitative covariates. Data are assumed to come from a random sample for purposes of significance testing. The variance of each dependent variable is assumed to be the same for each cell formed by categories of the factor(s) (this is the homogeneity of variances assumption).
Regression in GLM is simply a matter of entering the independent variables as covariates. If there are sets of dummy variables (ex., Region, which in OLS regression would be translated into dummy variables such as South = 1 or 0), the set variable (ex., Region) is entered as a fixed factor, with no need for the researcher to create dummy variables manually. The b coefficients will be identical whether the regression model is run under ordinary regression (in SPSS, under Analyze, Regression, Linear) or under GLM (in SPSS, under Analyze, General Linear Model, Univariate). However, in GLM the researcher must ask for "Parameter estimates" under the Options button in the GLM dialog, whereas their printing in the Regression procedure is automatic. The R-square from the Regression procedure will equal the partial eta squared for the overall (corrected) model in the GLM regression model.
Anova family. Although regression models may be run easily in GLM, as a practical matter univariate GLM is used primarily to run analysis of variance (ANOVA) and analysis of covariance (ANCOVA) models. Multivariate GLM is used primarily to run multivariate analysis of variance (MANOVA) and multivariate analysis of covariance (MANCOVA) models.
The key statistic in ANOVA is the F-test of difference of group means, testing if the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) are different enough not to have occurred by chance. If the group means do not differ significantly, then it is inferred that the independent variable(s) did not have an effect on the dependent variable. If the F test shows that overall the independent variable(s) is (are) related to the dependent variable, then multiple comparison tests of significance are used to explore just which values of the independent(s) have the most to do with the relationship. If the data involve repeated measures of the same variable, as in before-after or matched pairs tests, the F-test is computed differently from the usual between-groups design, but the inference logic is the same. There is also a large variety of other ANOVA designs for special purposes, all with the same general logic.
Note that analysis of variance tests the null hypothesis that group means do not differ. It is not a test of differences in variances, but rather assumes relative homogeneity of variances. Thus some key ANOVA assumptions are that the groups formed by the independent variable(s) are relatively equal in size and have similar variances on the dependent variable ("homogeneity of variances"). Like regression, ANOVA is a parametric procedure which assumes multivariate normality (the dependent has a normal distribution for each value category of the independent(s)).
Analysis of covariance (ANCOVA) is used to test the main and interaction effects of categorical variables on a continuous dependent variable, controlling for the effects of selected other continuous variables which covary with the dependent. The control variable is called the "covariate." There may be more than one covariate.
One may also perform planned comparisons or post hoc comparisons to see which values of a factor contribute most to the explanation of the dependent. ANCOVA uses built-in regression using the covariates to predict the dependent, then does an ANOVA on the residuals (the actual minus the predicted values of the dependent variable) to see if the factors are still significantly related to the dependent variable after the variation due to the covariates has been removed. In SPSS, select Analyze, General Linear Model, Univariate; enter the dependent variable, the factor(s), and the covariate(s); click the Model button and accept the default, which is Full Factorial (if you select Custom, your model should not include interactions of factors with covariates: that is used beforehand in testing the equality of regressions assumption discussed below in the "Assumptions" section, but not in the ANCOVA model itself). The Full Factorial model contains the intercept, all factor and covariate main effects, and all factor-by-factor interactions. For instance, for three factors A, B, and C, it includes the effects A, B, C, A*B, A*C, B*C, and A*B*C. It does not contain factor-by-covariate interactions. In the pasted syntax, covariates are listed after the WITH keyword of the command and also appear as effects in the /DESIGN statement. The maximum number of covariates SPSS will process is 10.
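The way a full factorial model expands into effects can be sketched in a few lines of Python. This is only an illustration of the enumeration described above, not SPSS's own routine; the function name `full_factorial_terms` and the covariate name `age` are hypothetical.

```python
from itertools import combinations

def full_factorial_terms(factors, covariates=()):
    """List the effects of a full factorial model (intercept omitted).

    Includes every factor main effect, every factor-by-factor interaction
    up to the highest order, and each covariate main effect.
    Factor-by-covariate interactions are deliberately left out.
    """
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    terms.extend(covariates)
    return terms

print(full_factorial_terms(["A", "B", "C"]))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
print(full_factorial_terms(["A", "B"], covariates=["age"]))
# ['A', 'B', 'A*B', 'age']
```

With a covariate present, the list gains only its main effect, which matches the rule that a full factorial ANCOVA model contains no factor-by-covariate terms.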
Linear mixed models (LMM) and their subset cousin, analysis of variance components (VC), perform many of the same functions as analysis of variance under GLM. A comparison of GLM with both LMM and VC, illustrated with data, is found in the section on linear mixed models. While both GLM and LMM accept the use of random effects in models, LMM is preferred for reasons given in the comparison.
The formulas for the t-test (a special case of one-way ANOVA), and for the F-test used in ANOVA, thus reflect three things: the difference in means, group sample sizes, and the group variances. That is, the ANOVA F-test is a function of the variance of the set of group means, the overall mean of all observations, and the variances of the observations in each group weighted for group sample size. Thus, the larger the difference in means, the larger the sample sizes, and/or the lower the variances, the more likely a finding of significance.
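A minimal sketch of the one-way F computation, in plain Python, may make the three ingredients concrete. The function name `one_way_f` and both small data sets are hypothetical, chosen only so that the two examples have identical within-group variances but different spreads of group means.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations

    # Between-groups sum of squares: group means vs. the overall mean,
    # weighted by group sample size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: each score vs. its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Same within-group spread, but group means close together vs. far apart.
close = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]]
far = [[4.0, 5.0, 6.0], [8.0, 9.0, 10.0], [12.0, 13.0, 14.0]]
print(one_way_f(close))  # 3.0
print(one_way_f(far))    # 48.0
```

Widening the gap between group means while holding the within-group variance fixed raises F from 3 to 48 in this toy example.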
In SPSS, select Analyze, Compare Means, One-Way ANOVA; enter the dependent variable in the Dependent list; enter the independent variable as the Factor.
The "Mean" column shows how the mean years of education varies by region. However, the 95% confidence limits for West and for Southeast overlap those for Northeast. We can only be 95% confident that the mean of Southeast differs from the mean of West; we cannot be 95% confident that the mean years of education for West or Southeast differs from that for Northeast.
In SPSS, select Analyze, General Linear Model, Univariate; enter the dependent variable, factor(s), and covariate variables; click Model and select the model type you want (ex., Full Factorial) or click Custom and specify a model (such as Model: gender race gender*race). You may click the Options button and specify means to estimate (gives marginal means for each level of the selected factor when the covariate is at its mean value), and under the Options button you may check that you want Parameter Estimates (which are the GLM-estimated regression coefficients).
The "Parameter Estimates" table is partially shown in the figure above for a model in which years of education is predicted from the factors sex, race, and region, using number of siblings as a covariate. The b coefficients reflect the effect of the factor or covariate on the dependent, years of education. For the covariate sibs, with b = -.240, this indicates that having one additional sibling subtracts .24 years of education from the subject, controlling for other variables in the model. For the dichotomy sex, given that these data are coded 1=male, 2=female, and given a b coefficient of 2.119 for sex=1, this indicates that for two individuals equal on other variables in the model, the male will have 2.119 more years of education than the female. For the factor region, using simple dummy coding, given b = 1.219 for region 1 (Northeast) and given the left-out category is 3=West, we can say that a subject in the Northeast will have 1.219 more years of education than a subject in the West.
Also in the Parameter Estimates table, the "Sig." column shows the significance level of the effect (as usual, <=.05 is considered acceptable to reject the null hypothesis that the effect is not different from 0). The "Partial Eta Squared" column provides an effect size measure for purposes of comparison among effects. The "Observed Power" column shows the power level for the effect (as usual, >=.80 is considered acceptable to have confidence that one has not made a Type II error when accepting a null significance finding for the effect).
This is the usual ANOVA design. There is one set of subjects: the "groups" refer to the subset of subjects associated with each category of the independent variable (in one-way ANOVA) or with each cell formed by multiple categorical independents (in factorial ANOVA). After measurements are taken for each group, analysis of variance is computed to see if the variance on the dependent variable between groups is different from the variance within groups. Just by chance, one would expect the variance between groups to be as large as the variance within groups. If the variance between groups is enough larger than the variance within groups, as measured by the F ratio (discussed below), then it is concluded that the grouping factor (the independent variable(s)) does have a significant effect.
In this example, using the figure above, the rows would be the four classes. The columns would be the four textbooks, administered in each of four periods. In period 1, teacher A would use the first textbook for class 1; in period 2, teacher A would use textbook 2 for class 4; in period 3, teacher A would use textbook 3 for class 3; and in period 4, teacher A would use textbook 4 for class 2. The other three teachers would rotate similarly, according to the design schedule above. In the schedule, each class starts with a different teacher and text, ruling out the chance that results would be attributable to different classes starting with the same treatment. Since no two classes ever have the same textbook in the same period, results cannot be attributed to a period effect either.
The figure above represents a 2x3x2 factorial design where there are treatment and control groups, each with two groups by sex (male, female) who are administered three levels of treatment (noise = low, medium, high) and some interval measurement is taken for each group on some variable (ex., test scores). The figure only shows the design factors. There may be one or more covariates as well, such as age. A full factorial design will model the main effects of the factors noise and sex; the main effect of the covariate age; and the interaction of noise*sex. It will not model noise*age or sex*age.
Thus, in the example above, in RCB Design, there are three blocks, one for each age group, where age group is the blocking factor. Within each block there are all six possible brand-dosage treatments (ex., Brand A, Dosage 2), assigned in random order to subjects within each of the three blocks.
In a typical split-plot repeated measures design, Subjects will be measured on some Score over a number of Trials. Subjects will also be split by some Group variable. In SPSS, Analyze, General Linear Model, Univariate; enter Score as the dependent; enter Trial and Group as fixed factors; enter Subject as a random factor; Press the Model button and choose Custom, asking for the Main effects for Group and Trial, and the interaction effect of Trial*Group; then click the Paste button and modify the /DESIGN statement to also include Subject(Group) to get the Subject-within-Group effect; then select Run All in the syntax window to execute.
In SPSS, Analyze, General Linear Model, Univariate; specify the main factor as fixed or random, then specify the nested factor as random; click the Model button and enter the main effects of the main (not nested) factor(s); click the Paste button and modify the /DESIGN statement to a format such as /DESIGN = mainfactor nestedfactor(mainfactor), signifying the model is the main effect of the fixed factor plus the effects of the random nested factor at each value of the main fixed factor. In the syntax window, Run All. In the resulting ANOVA table, a significant nestedfactor(mainfactor) effect means that the dependent variable varies by the nested factor even within the same level of (controlling for) the main factor.
Put another way, if a random factor is treated as a fixed factor, the researcher opens his or her research up to the charge that the findings pertain only to the particular arbitrary cases studied and findings and inferences might well be quite different if alternative cases had been selected. The purpose of using a random effect model is to avoid these potential criticisms by taking into account the variability of the replications or random effects when computing the error term which forms the denominator of the F test for random effect models.
In the figure above, a between-subjects data design is contrasted with a within-subjects (repeated measures) data design on the same topic: what is the effect of different sign colors on stopping distance in feet? In a between-subjects design, each subject experiences a different treatment (color). In a within-subjects design, each subject experiences all three treatments (colors), and fewer subjects are needed.

Below is a second example which may be implemented on one's spreadsheet so that students can play with the numbers to see different main and interaction effects. This example shows the interaction of learning type (control vs. classroom vs. online) and hours of instruction (low, medium, high). The upper set of lines in the graph is the means, the lower is the standard deviations. Normally the researcher is primarily interested in the set of means. For the means set, the fact that the black control group means line is below and does not cross the others shows that online and classroom education are associated with higher scores for all hours of instruction categories. The fact that the aqua classroom means line crosses the green online means line shows there is an interaction of learning type with hours category. For low hours, online subjects score higher, but for medium and high hours, classroom subjects score higher.

| value of d | % of comparison group below | % of non-overlap |
|---|---|---|
| 0 | 50 | 0 |
| .2 | 58 | 14.7 |
| .4 | 66 | 27.4 |
| .6 | 73 | 38.2 |
| .8 | 79 | 47.4 |
| 1.0 | 84 | 55.4 |
| 1.5 | 93.3 | 70.7 |
| 2.0 | 97.7 | 81.1 |
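
The table entries appear consistent with Cohen's overlap measures for two normal populations with equal variances: the second column with the normal cumulative probability at d, and the third with Cohen's U1 non-overlap formula. The sketch below rests on that assumption; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def percent_below(d):
    """Percent of the comparison group falling below the other group's mean."""
    return 100 * STD_NORMAL.cdf(d)

def percent_non_overlap(d):
    """Cohen's U1: percent of the combined area not shared by the two curves."""
    u2 = STD_NORMAL.cdf(d / 2)
    return 100 * (2 * u2 - 1) / u2

# Print one row per effect size: d, percent below, percent non-overlap.
for d in (0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0):
    print(f"{d:>4}  {percent_below(d):5.1f}  {percent_non_overlap(d):5.1f}")
```

For d = .8, for instance, this gives roughly 79 percent below and 47.4 percent non-overlap, in line with the tabled values.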
If the computed F value is around 1.0, differences in group means are only random variations. If the computed F score is significantly greater than 1, then there is more variation between groups than within groups, from which we infer that the grouping variable does make a difference. Note that the significant difference may be very small for large samples. The researcher should report not only significance, but also strength of association, discussed below.
Example 1, in the figure above, shows a design in which years of education is predicted by the fixed factors sex, race, and region, using number of siblings as a covariate. The F test of significance of effects and the eta-squared measure of effect size both appear in SPSS GLM Univariate in the "Tests of Between-Subjects Effects" table. The first ("Corrected Model") row shows that the overall model is significant (Sig. is printed as .000, that is, p < .001) and the effect size is partial eta² = R² = .103, meaning that the model explains 10.3% of the variance in years of education. The Adjusted R² is a conservative, downward adjustment which penalizes for the number of predictors in the model and is the effect size that should be used when comparing models, though some researchers also report adjusted R² even when not making comparisons. The "Tests of Between-Subjects Effects" table also shows that the factors race and region as well as the covariate "sibs" are significant, but gender is not; and it shows that only the race*sex interaction is significant.
Example 2. An F-ratio of 1.21 with 1 and 8 degrees of freedom corresponds to a significance level of .30, which means that there is a 30% chance that one would find a sample difference of means this large or larger when the unknown real difference is zero. At the .05 significance level cutoff customarily used by social scientists, this is too much chance. That is, the researcher would not reject the null hypothesis that the group means do not differ on the dependent variable being measured.
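The significance level in Example 2 can be checked without statistical tables. With 1 numerator degree of freedom, F equals t squared, so the p-value is the two-tailed area of the t distribution beyond the square root of F. The sketch below integrates the t density numerically with Simpson's rule; the function names and the step count are assumptions made for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def p_value_f_1_df(f_value, df_within, steps=1000):
    """Upper-tail p for F with 1 and df_within d.f. (steps must be even)."""
    t = math.sqrt(f_value)
    h = t / steps
    # Simpson's rule for the area under the t density from 0 to t.
    total = t_pdf(0.0, df_within) + t_pdf(t, df_within)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_within)
    area_0_to_t = total * h / 3
    # Two tails: everything outside (-t, t).
    return 1 - 2 * area_0_to_t

p = p_value_f_1_df(1.21, 8)
print(round(p, 2))  # about 0.30, well above the .05 cutoff
```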
Significance in two-way ANOVA. Toothaker (1993: 69) notes that in two-way ANOVA most researchers set the alpha significance level (ex., .05) at the same level for the two main effects and the interaction effect, but that "when you make this choice, you should realize that the error rate for the whole experiment is approximately three times" alpha. Toothaker therefore recommends setting the error rate at alpha/3 to obtain an overall experimentwise error rate of alpha in two-way ANOVA.
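The "approximately three times alpha" figure can be illustrated numerically. The sketch below assumes the three tests (two main effects and one interaction) are independent, which is a simplification; the function name `experimentwise_error` is hypothetical.

```python
def experimentwise_error(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
unadjusted = experimentwise_error(alpha, 3)      # each test run at .05
adjusted = experimentwise_error(alpha / 3, 3)    # each test run at .05/3

print(round(unadjusted, 4))  # 0.1426, close to 3 * .05 = .15
print(round(adjusted, 4))    # 0.0492, just under the intended .05
```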
F-test assumptions. The F test is less reliable as sample sizes are smaller, group sample sizes are more divergent, and the number of factors increases (see Jaccard, 1998: 81). In the case of unequal variances and unequal group sample sizes, F is conservative if smaller variances are found in groups with smaller samples. If larger variances are found in groups with smaller samples, F is too liberal, with actual Type I error more than indicated by the F test.
The lack of fit F test is a test of the difference of a full vs. reduced model. The reduced model is the researcher's fitted model, which must be a non-full factorial model. The full model to which it is compared is a full factorial model. The error sum of squares for the reduced model is partitioned into sum of squares for pure error (SSPE) and sum of squares for lack of fit (SS(LOF)). Thus SS(LOF) = SSE(Reduced) - SSPE, where SSE(Reduced) is the error sum of squares for the researcher's fitted model (in SPSS, this is found in the Error row of the Sum of Squares column of the "Tests of Between-Subjects Effects" table). The lack of fit test is described further in Khuri (1985) and in Levy & Neill (1990).
In SPSS, for One-Way ANOVA, select Analyze, Compare Means, One-Way ANOVA; click Post Hoc; select the multiple comparison test you want (see below).
In contrast to the t-test, coefficients based on the q-statistic, discussed below, are commonly used for post-hoc comparisons (exploring the data to uncover large differences, without limiting investigation by a priori theory). T-tests may be seen as a special case of one-way ANOVA. The SPSS T-TEST procedure implements t-tests.
This test imposes an extremely small alpha significance level as the number of comparisons becomes large. That is, this method is not recommended when the number of comparisons is large because the power of the test becomes low. Klockars and Sax (1986: 38-39) recommend using a simple .05 alpha rate when there are few comparisons, but using the more stringent Bonferroni-adjusted multiple t-test when the number of planned comparisons is greater than the number of degrees of freedom for between-groups mean square (which is k-1, where k is the number of groups). Nonetheless, researchers still try to limit the number of comparisons, trying to reduce the probability of Type II errors (accepting a false null hypothesis). This test is not recommended when the researcher wishes to perform all possible pairwise comparisons.
If the Bonferroni test is requested, SPSS will print out a table of "Multiple Comparisons" giving the mean difference in the dependent variable between any two groups (ex., differences in test scores for any two educational groups). The significance of this difference is also printed, and an asterisk is printed next to differences significant at the .05 level or better. SPSS supports the Bonferroni test in its GLM and UNIANOVA procedures.
In SPSS, select Analyze, Compare Means, One-Way ANOVA; click Post Hoc; select Tukey. The "Sig." column is the Tukey-corrected significance level.
While the Scheffé test maintains an experimentwise .05 significance level in the face of multiple comparisons, it does so at the cost of a loss in statistical power (more Type II errors may be made -- thinking you do not have a relationship when you do). That is, the Scheffé test is a conservative one (more conservative than Dunn or Tukey, for ex.), not appropriate for planned comparisons but rather restricted to post hoc comparisons. Even for post hoc comparisons, the test is used for complex comparisons and is not recommended for pairwise comparisons due to "an unacceptably high level of Type II errors" (Brown and Melamed, 1990: 35). Toothaker (1993: 28) recommends the Scheffé test only for complex comparisons, or when the number of comparisons is large. The Scheffé test is low in power and thus not preferred for particular comparisons, but it can be used when one wishes to do all or a large number of comparisons. Tukey's HSD is preferred for making all pairwise comparisons among group means, and Scheffé for making all or a large number of other linear combinations of group means.
The Post Hoc procedure in SPSS also outputs a homogeneous subsets table, illustrated below. This table summarizes the post hoc tests in what some find to be a more easily interpretable format. Each subset column lists the levels of the factor race which are not significantly different from each other. Thus for Subset 1, Black and Other are shown not to be significantly different from each other. For Subset 2, Other and White are shown not to be significantly different from each other. The homogeneous subsets table is not available for all post hoc tests.
In SPSS, select Analyze, Compare Means, One-Way ANOVA; click Contrasts; enter the contrasts you want. Any number of contrast tests are possible. If the researcher wishes to omit a group from the comparison, it is simply coded 0.
In SPSS, select Analyze, Compare Means, One-Way ANOVA; select the dependent and the factor (categorical independent); click Contrasts; select Polynomial; set the Degree drop-down list to Linear, Quadratic, or Cubic.
However, ANOVA is robust for small and even moderate departures from homogeneity of variance (Box, 1954). Still, a rule of thumb is that the ratio of largest to smallest group variances should be 3:1 or less. Moore (1995) suggests the more lenient standard of 4:1. When choosing rules of thumb, remember that the more unequal the sample sizes, the smaller the differences in variances which are acceptable. Marked violations of the homogeneity of variances assumption can disrupt the F-test, leading to either over- or under-estimation of the significance level.
In the figure above, a full factorial model is tested, in which years of education is predicted from the fixed factors region, race, and gender, using the covariate number of siblings. Since Levene's test is significant, the researcher concludes that the groups do not have equal variances. Group variances may be examined in the Descriptive Statistics table, illustrated below, by squaring the standard deviation. The largest variance in years of education is 5.891² = 34.70 for males in the Southeast who are "other" in race (nonwhite, nonblack); the smallest variance is 1.773² = 3.14 for Northeast black females, as shown in the partial output below. Since the ratio of the largest to smallest group variance exceeds 10, there is a substantial violation of the assumption of homogeneity of variances. This can be expected to increase Type I errors on F tests in ANOVA (recall Type I errors are false positives, concluding a relationship is significant when it is not). Put another way, if the computed significance for an F test comes out to be .04, the true level is likely to be worse than that, and while the .04 would say the relationship is significant, that conclusion could well be a Type I error for these data.
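The variance-ratio check in this example is simple arithmetic on the reported standard deviations. A minimal sketch follows; the function name `variance_ratio` is hypothetical, and the 3:1 threshold is the rule of thumb cited above, not a formal test.

```python
def variance_ratio(standard_deviations):
    """Ratio of the largest to the smallest group variance."""
    variances = [sd ** 2 for sd in standard_deviations]
    return max(variances) / min(variances)

# Largest and smallest cell standard deviations quoted in the example.
ratio = variance_ratio([5.891, 1.773])
print(round(5.891 ** 2, 2), round(1.773 ** 2, 2))  # 34.7 3.14
print(round(ratio, 2))                             # 11.04
print(ratio > 3)                                   # True: exceeds the 3:1 rule of thumb
```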
Spread vs. Level plots. One may also inspect for homogeneity of variances visually by asking for a spread vs. level plot under the Univariate GLM Options button:
In the plot above, the factors with levels in parentheses are region (3), race (3), and sex (2), jointly giving 18 factor groups corresponding to the 18 dots on the plot. The more the dots are within a narrow band of variances on the Y axis, the greater the homogeneity of variances. Additionally, since the X axis is the means of the factor groups, one can visually inspect to see if there is a trend for variances to increase as means increase or some other pattern. Here there is no such clear pattern, but there is considerable diversity in the variances of the factor groups.
In SPSS, select Analyze, Compare Means, One-Way ANOVA; click Options; select Brown-Forsythe.
In SPSS, select Analyze, General Linear Model, Univariate; click Save; select the residual res_1 for the dependent; click Plots; select Normality Plots.
Balanced ANOVA designs have equal group sizes; unbalanced designs do not. Unbalanced designs require adjustments in how ANOVA is computed. This is done automatically in ANOVA and MANOVA in SPSS. In SAS, unless a recent version has changed it, no correction is made in PROC ANOVA but correction for unequal groups is done in PROC GLM.
Equal group sizes are not assumed by the t or F tests for the overall model. The range tests based on the q statistic do require a common n, but this is derived by computing the harmonic mean of the unequal group n's when differences are small, and by computing the harmonic mean of the two groups being compared when differences are larger.
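The harmonic mean used as the common n can be computed with Python's standard library. The group sizes below are hypothetical, and the sketch shows only the averaging step, not the range test itself.

```python
from statistics import harmonic_mean

group_sizes = [10, 15, 30]  # hypothetical unequal group n's

# One common n for all groups (used when the n's differ only a little).
common_n_all = harmonic_mean(group_sizes)

# A pair-specific n for the two groups being compared (used when n's differ more).
common_n_pair = harmonic_mean([group_sizes[0], group_sizes[1]])

print(round(common_n_all, 2))   # 15.0
print(round(common_n_pair, 2))  # 12.0
```

The harmonic mean is pulled toward the smaller groups (15 here, against an arithmetic mean of about 18.3), which keeps the common n from overstating the information in the smallest group.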
Epsilon. If the researcher wishes to correct the univariate F test, this is done by using Huynh-Feldt or Greenhouse-Geisser epsilon. The closer epsilon is to 1.0, the greater the sphericity. Recall that F is the ratio of between-groups to within-groups mean square variance. The degrees of freedom for between-groups is (k-1), where k = the number of groups. The degrees of freedom for within-groups is k(n-1), where n is the number of cases in each group. To correct F given a finding of lack of sphericity, the researcher multiplies both the between-groups and the within-groups degrees of freedom by the value of epsilon. SPSS supplies Huynh-Feldt epsilon and the more conservative Greenhouse-Geisser epsilon (which in turn is an extension of Box's epsilon, no longer widely used). For more severe departures from sphericity (epsilon < .75), the more conservative Greenhouse-Geisser epsilon is used, while Huynh-Feldt epsilon is used for less severe violations of the sphericity assumption. The researcher rounds degrees of freedom down to the nearest whole number and looks up the critical F value in a table using the corrected degrees of freedom.
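The degrees-of-freedom adjustment can be sketched as follows. The sketch uses the df formulas given in the paragraph above and scales both the numerator and the denominator degrees of freedom by epsilon, which is the usual convention. That choice, the function name `corrected_df`, and the example values (k = 3, n = 8, epsilon = .70) are assumptions for illustration.

```python
import math

def corrected_df(k, n, epsilon):
    """Epsilon-corrected degrees of freedom, rounded down to whole numbers.

    k: number of groups; n: cases per group; epsilon: sphericity estimate.
    """
    df_between = k - 1
    df_within = k * (n - 1)
    return (math.floor(epsilon * df_between), math.floor(epsilon * df_within))

print(corrected_df(3, 8, 1.0))   # (2, 21): no correction when epsilon is 1
print(corrected_df(3, 8, 0.70))  # (1, 14): fewer d.f., so a larger critical F
```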
| Source | SS | df | MS | F |
|---|---|---|---|---|
| between or explained | 64 | 2 | 32 | 9.88 |
| within | 68 | 21 | 3.24 | |
| total | 132 | 23 | | |
SS is the sum of squares (the variation), df the degrees of freedom, MS the mean square (the variance, which is SS/df), and F is the F ratio (which is between MS divided by within MS). As the MS for between-groups is much greater than the MS for within-groups, this table shows the grouping variable does have an effect, as indicated by the F ratio being greater than 1. The grouping variable had three groups (high, medium, low), which is why the between-groups df was (3-1)=2. There are 8 people per group, so the within-groups d.f. is number of groups times one less than the number of people per group: 3*(8-1)=21. These are the df for the numerator and denominator respectively. We look in the F table for the .05 significance level with 2 and 21 d.f., and find the critical F value is 3.47. As the computed F value is considerably more (9.88), we can be 95% confident that the grouping (independent) variable makes a difference in the dependent variable. (In fact, the F is high enough to be significant at the .001 level, and some computer programs will print this out in the ANOVA table along with or instead of the F value).
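The arithmetic behind the one-way table can be reproduced directly from the sums of squares and group sizes quoted in the text. The variable names below are illustrative only.

```python
k = 3            # number of groups (high, medium, low)
n_per_group = 8  # people per group

ss_between, ss_within = 64, 68

df_between = k - 1                 # 2
df_within = k * (n_per_group - 1)  # 21

ms_between = ss_between / df_between  # mean square = SS / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(df_between, df_within)                   # 2 21
print(ms_between, round(ms_within, 2))         # 32.0 3.24
print(round(f_ratio, 2))                       # 9.88
print(ss_between + ss_within, df_between + df_within)  # 132 23 (the total row)
```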
The table for two-way ANOVA is similar, but there are additional rows for the main (dependent on independent) effects and for interaction effects, as well as total explained and residual portions:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Main Effects | 88 | 3 | 29.333 | 18.857 |
| X1 | 24 | 1 | 24 | 15.429 |
| X2 | 64 | 2 | 32 | 20.571 |
| 2-Way Interaction Effects | 16 | 2 | 8 | 5.143 |
| X1 X2 | 16 | 2 | 8 | 5.143 |
| explained | 104 | 5 | 20.8 | 13.371 |
| residual | 28 | 18 | 1.556 | |
| total | 132 | 23 | 5.739 | |
The two-way table is interpreted in the same way, except now there are rows for assessing the between-groups (main effects) variation overall and for each independent, and there are rows for assessing the interaction effects overall and for each interaction (here there is just one interaction, which is thus the same as the overall interaction row). The Explained row now reflects the combined main and interaction effects of the grouping variables, and the Residual is the remaining within-groups variation (the total variation minus the explained variation).
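Every F in the two-way table is the row's mean square divided by the residual mean square. The sketch below recomputes the table from its SS and df columns; the dictionary layout and names are illustrative only.

```python
# (sum of squares, degrees of freedom) for each row of the two-way table
rows = {
    "Main Effects": (88, 3),
    "X1": (24, 1),
    "X2": (64, 2),
    "X1 X2": (16, 2),
    "explained": (104, 5),
}
ss_residual, df_residual = 28, 18
ms_residual = ss_residual / df_residual  # about 1.556

# F for each row: its mean square over the residual mean square.
f_values = {name: (ss / df) / ms_residual for name, (ss, df) in rows.items()}

for name, f in f_values.items():
    print(f"{name:<12} F = {f:.3f}")

# The pieces add up: X1 + X2 + interaction + residual = total variation.
total_ss = rows["X1"][0] + rows["X2"][0] + rows["X1 X2"][0] + ss_residual
print(total_ss)  # 132
```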
The two-way ANOVA table can be interpreted in terms of the difference of mean differences. The F test for either of the main effects in the table above is reflected in the difference between row means or between column means (depending on whether X1 or X2 is the row or column variable) in a table (not shown) where X1 and X2 are independent factors and the cell entries are means on the dependent variable. The F test for the interaction effect is reflected in the difference of these two mean differences.
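A small numeric illustration of the "difference of mean differences" idea follows. The cell means are hypothetical and use a 2x2 layout for simplicity (the table above has three levels of X2); they are not taken from the example data.

```python
# Hypothetical cell means on the dependent variable: rows = X1, columns = X2.
cell_means = {
    ("x1_low", "x2_low"): 10.0, ("x1_low", "x2_high"): 14.0,
    ("x1_high", "x2_low"): 12.0, ("x1_high", "x2_high"): 22.0,
}

# Mean difference due to X2 within each level of X1.
diff_at_x1_low = cell_means[("x1_low", "x2_high")] - cell_means[("x1_low", "x2_low")]
diff_at_x1_high = cell_means[("x1_high", "x2_high")] - cell_means[("x1_high", "x2_low")]

# Interaction: does the X2 difference itself differ across levels of X1?
interaction_contrast = diff_at_x1_high - diff_at_x1_low

# Main effect of X1: difference between the two row means.
row_mean_low = (cell_means[("x1_low", "x2_low")] + cell_means[("x1_low", "x2_high")]) / 2
row_mean_high = (cell_means[("x1_high", "x2_low")] + cell_means[("x1_high", "x2_high")]) / 2
x1_main_difference = row_mean_high - row_mean_low

print(diff_at_x1_low, diff_at_x1_high)  # 4.0 10.0
print(interaction_contrast)             # 6.0
print(x1_main_difference)               # 5.0
```

If the two mean differences were equal, the contrast would be zero and there would be no interaction to test, whatever the size of the main effects.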