
Two-Way ANOVA Example


Notes: This example is from the SPSS 7.5 "Applications Guide" example for the file "cars.sav". The dependent variable is "horse" (horsepower). The factors (independents) are "origin" and "cylinder". The analysis seeks to understand horsepower as a function of the automobile's country of origin and number of cylinders.

To obtain this output:

  1. File, Open, point to cars.sav.
  2. Statistics, General Linear Model, Simple Factorial
  3. In the ANOVA dialog box, select "horse" as the "dependent", and select "origin" and "cylinder" as the "factors."
  4. Click on "origin" and then on the "Define Range" button. Set the range as 1 to 3 (three categories: American, European, Japanese). Repeat for "cylinder," for a range of 1 to 2 (two categories: 4 and 6 cylinder cars).
  5. Click on "OK" to run the analysis of variance.
Comments in blue are by the instructor and are not part of SPSS output.

Horsepower by Origin and Number of Cylinders
Output Created 20 Feb 98 11:07:46
Comments
Input Data Y:\PC\spss95\Cars.sav
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 407
Missing Value Handling Definition of Missing User defined missing values are treated as missing.
Cases Used Statistics for each list of variables are based on the cases with no missing or out-of-range data for any variable in the list.
Syntax ANOVA
VARIABLES=horse
BY origin(1 3) cylinder(1 2)
/MAXORDERS ALL
/METHOD UNIQUE .
Resources Memory Required 708 bytes
Elapsed Time 0:00:00.16

Case Processing Summary(a)

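Only 70% of the cases were included in the analysis. The rest were excluded because they had missing data on one of the three variables or fell outside the ranges defined for the factors (for example, cars that are neither 4- nor 6-cylinder).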

Cases
       Included           Excluded             Total
     N    Percent       N    Percent       N    Percent
   285     70.0%      122     30.0%      407    100.0%
a horsepower by country of origin, number of cylinders


The "Sig." column in the ANOVA table below shows that the main effect for "cylinders" is significant (p < .001), but the main effect for country of "origin" is not (p = .114). The two-way interaction of cylinders*origin, however, is significant at the .002 level. The researcher concludes that number of cylinders is related to horsepower, but that the relationship is not a simple one: it must be interpreted in terms of the joint interaction of cylinders with country of origin.

The "Model" row is also significant, showing that the model in which horsepower is a function of cylinders, origin, and cylinders*origin is significant when taken as a whole. As just seen, however, this does not mean that each model component is significant (origin acting as a main effect is not). The "Model" significance is useful when comparing the fit of multiple models for the same dependent.

ANOVA(a,b)

Unique Method
                                                              Sum of Squares    df  Mean Square        F   Sig.
horsepower
  Main Effects        (Combined)                                   30712.122     3    10237.374   48.616   .000
                      country of origin                              923.295     2      461.648    2.192   .114
                      number of cylinders                          18430.125     1    18430.125   87.522   .000
  2-Way Interactions  country of origin * number of cylinders       2796.816     2     1398.408    6.641   .002
  Model                                                            34284.812     5     6856.962   32.563   .000
  Residual                                                         58751.062   279      210.577
  Total                                                            93035.874   284      327.591

a horsepower by country of origin, number of cylinders
b All effects entered simultaneously
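
As a quick arithmetic check on the ANOVA table, the sketch below recomputes each mean square (sum of squares divided by df) and each F ratio (effect mean square divided by residual mean square) from the printed sums of squares. It is only an illustration in Python, not part of the SPSS output. The names effects, mean_square, and f_ratios are made up for this sketch. The "Sig." values are not recomputed, since they require the F distribution.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
effects = {
    "main effects (combined)": (30712.122, 3),
    "country of origin": (923.295, 2),
    "number of cylinders": (18430.125, 1),
    "origin * cylinders": (2796.816, 2),
    "model": (34284.812, 5),
}
residual_ss, residual_df = 58751.062, 279


def mean_square(ss, df):
    """Mean square = sum of squares / degrees of freedom."""
    return ss / df


# The residual mean square is the denominator of every F ratio in the table.
ms_residual = mean_square(residual_ss, residual_df)

# F = effect mean square / residual mean square.
f_ratios = {
    name: mean_square(ss, df) / ms_residual
    for name, (ss, df) in effects.items()
}

print(f"residual mean square = {ms_residual:.3f}")
for name, f in f_ratios.items():
    print(f"{name:<26} F = {f:7.3f}")
```

The recomputed values should agree with the "Mean Square" and "F" columns to within rounding of the printed figures.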


Testing the Normal Distribution Assumption

The SPSS boxplot option can be used to assess normality. For this example, first limit the dataset to 4- and 6-cylinder cars, as in the ANOVA above. In the Data Editor, select Data, Select Cases, If, then specify cylinder<3 as the criterion (because for the "cylinder" variable, 1 = 4 cylinder and 2 = 6 cylinder). Then get boxplot output by selecting Graphs, Boxplots, Clustered, Define, and set variable = horse, category axis = origin, and define clusters = cylinder.

Explore

Output Created 20 Feb 98 13:20:58
Comments
Input Data Y:\PC\spss95\Cars.sav
Filter cylinder < 3 (FILTER)
Weight <none>
Split File <none>
N of Rows in Working Data File 291
Missing Value Handling Definition of Missing User-defined missing values for dependent variables are treated as missing. User-defined and system missing values for factors are treated as valid data.
Cases Used Statistics are based on cases with no missing values for any dependent variable or factor used.
Syntax EXAMINE
VARIABLES=horse BY origin BY cylinder /PLOT=BOXPLOT/STATISTICS=NONE/NOTOTAL
/MISSING=REPORT.
Resources Elapsed Time 0:00:01.65

country of origin*number of cylinders

Case Processing Summary

Cases
horsepower
                                     Valid            Missing            Total
country of origin  cylinders       N   Percent      N   Percent      N   Percent
American           4 cylinders    69    95.8%       3     4.2%      72   100.0%
                   6 cylinders    73    98.6%       1     1.4%      74   100.0%
European           4 cylinders    64    97.0%       2     3.0%      66   100.0%
                   6 cylinders     4   100.0%       0      .0%       4   100.0%
Japanese           4 cylinders    69   100.0%       0      .0%      69   100.0%
                   6 cylinders     6   100.0%       0      .0%       6   100.0%
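
The cell counts in this summary can be reconciled with the earlier output. The short Python sketch below (an illustration only; the name cells is made up here) sums the valid and missing counts copied from the table and picks out the smallest and largest cells, which matters when judging how balanced the design is.

```python
# (country of origin, cylinders) -> (valid N, missing N), copied from the table above.
cells = {
    ("American", 4): (69, 3),
    ("American", 6): (73, 1),
    ("European", 4): (64, 2),
    ("European", 6): (4, 0),
    ("Japanese", 4): (69, 0),
    ("Japanese", 6): (6, 0),
}

total_valid = sum(valid for valid, _ in cells.values())
total_missing = sum(missing for _, missing in cells.values())

# Cells with the fewest and the most valid cases.
smallest = min(cells, key=lambda cell: cells[cell][0])
largest = max(cells, key=lambda cell: cells[cell][0])

print("valid:", total_valid, "missing:", total_missing,
      "total:", total_valid + total_missing)
print("smallest cell:", smallest, cells[smallest][0])
print("largest cell:", largest, cells[largest][0])
```

The valid total matches the 285 cases included in the ANOVA, and the overall total matches the 291 rows left in the working file after the cylinder < 3 filter.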

The boxplot shows how the dependent variable (here, horsepower) is distributed within each factor cell (here, each combination of country and number of cylinders). The thin horizontal bars at the ends of the vertical whiskers mark the highest and lowest values in the cell that are not flagged as outliers. The dark horizontal line inside the rectangle marks the median (here, median horsepower), not the mean. The rectangle itself spans the middle 50% of the cases (the interquartile range). Ideally, for normal distributions, the rectangle is in the middle of the range, and the median line is in the middle of the rectangle.

If most of the rectangle is on one side or the other of the median line, this indicates the dependent is skewed (not normal) for that group (category). For these data, several of the categories are skewed. Note there are very few cases in the European and Japanese 6-cylinder columns, so skewness is hard to estimate for these categories. Also keep in mind that ANOVA is robust to violations of normality, particularly if one has a large sample and even more so if the factor cells have similar n's.
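
To make the reading of a boxplot concrete, here is a minimal Python sketch of the quantities a standard boxplot draws for one group: the box runs from the first to the third quartile and the line inside it marks the median. The horsepower values are hypothetical and are NOT taken from cars.sav. The helper names box_summary and median_position are made up for this sketch. Quartile conventions differ between packages, and SPSS draws its whiskers to the most extreme non-outlier values rather than to the raw minimum and maximum used here, so the numbers will not exactly match SPSS output.

```python
import statistics


def box_summary(values):
    """Return the five numbers a simple boxplot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def median_position(summary):
    """Where the median sits inside the box: 0.5 means exactly centered."""
    box_height = summary["q3"] - summary["q1"]
    if box_height == 0:
        return 0.5  # degenerate box; treat as centered
    return (summary["median"] - summary["q1"]) / box_height


# Hypothetical horsepower values for two groups (not from cars.sav).
symmetric = [70, 75, 80, 85, 90, 95, 100]
right_skewed = [70, 72, 74, 76, 80, 95, 130]

for label, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    summary = box_summary(data)
    print(label, summary, "median position:", round(median_position(summary), 2))
```

A median position near 0.5 corresponds to a median line in the middle of the rectangle. A value well below 0.5 means most of the rectangle lies above the median line, which suggests a tail toward high values.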


[Figure: clustered boxplot of horsepower by country of origin and number of cylinders]