
Discriminant Function Analysis (Three Groups): SPSS Output

Notes: This example is from the SPSS 7.5 "Applications Guide" example for the file "gss 93 subset.sav". The dependent variable is "race." The independents are agewed, educ, rincom91, sibs, rap, polviews (a 7-point Likert scale from "Extremely liberal" to "Extremely conservative"), and marital.

To obtain this output:

  1. File, Open, point to gss 93 subset.sav.
  2. Statistics, Classify, Discriminant
  3. Select race as the "grouping variable" (the dependent). As independents, select agewed, educ, rincom91, sibs, rap, polviews, and marital. Check "Enter independents together" (i.e., not stepwise).
  4. Click on Statistics and check Univariate ANOVA and Box's M.
  5. Click on Classify and check Compute from group sizes, Summary table, and all plots.
  6. To run, click OK.
Comments in blue are by the instructor and are not part of SPSS output.


Discriminant

First come several blocks of general processing and descriptive statistics information.
Notes
Output Created 03 Mar 98 13:12:51
Comments
Input Data Y:\PC\spss95\GSS93 subset.sav
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 1500
Missing Value Handling Definition of Missing User-defined missing values are treated as missing in the analysis phase.
Cases Used In the analysis phase, cases with no user- or system-missing values for any predictor variable are used. Cases with user-, system-missing, or out-of-range values for the grouping variable are always excluded.
Syntax DISCRIMINANT
/GROUPS=race(1 3)
/VARIABLES=agewed educ rincom91 sibs rap polviews marital
/ANALYSIS ALL
/PRIORS SIZE
/STATISTICS=UNIVF BOXM TABLE
/PLOT=COMBINED SEPARATE MAP
/CLASSIFY=NONMISSING POOLED .
Resources Elapsed Time 0:00:02.20

Analysis Case Processing Summary
Unweighted Cases                                                  N   Percent
--------------------------------------------------------------  ----  -------
Valid                                                            732     48.8
Excluded  Missing or out-of-range group codes                      0       .0
          At least one missing discriminating variable           768     51.2
          Both missing or out-of-range group codes and
          at least one missing discriminating variable             0       .0
          Total                                                  768     51.2
Total                                                           1500    100.0

Group Statistics

                                                                Valid N (listwise)
Race of Respondent  Variable                                  Unweighted  Weighted
------------------  ----------------------------------------  ----------  --------
white               Age When First Married                           623   623.000
                    Highest Year of School Completed                 623   623.000
                    Respondent's Income                              623   623.000
                    Number of Brothers and Sisters                   623   623.000
                    Rap Music                                        623   623.000
                    Think of Self as Liberal or Conservative         623   623.000
                    Marital Status                                   623   623.000
black               Age When First Married                            73    73.000
                    Highest Year of School Completed                  73    73.000
                    Respondent's Income                               73    73.000
                    Number of Brothers and Sisters                    73    73.000
                    Rap Music                                         73    73.000
                    Think of Self as Liberal or Conservative          73    73.000
                    Marital Status                                    73    73.000
other               Age When First Married                            36    36.000
                    Highest Year of School Completed                  36    36.000
                    Respondent's Income                               36    36.000
                    Number of Brothers and Sisters                    36    36.000
                    Rap Music                                         36    36.000
                    Think of Self as Liberal or Conservative          36    36.000
                    Marital Status                                    36    36.000
Total               Age When First Married                           732   732.000
                    Highest Year of School Completed                 732   732.000
                    Respondent's Income                              732   732.000
                    Number of Brothers and Sisters                   732   732.000
                    Rap Music                                        732   732.000
                    Think of Self as Liberal or Conservative         732   732.000
                    Marital Status                                   732   732.000

In the ANOVA table below, the smaller the Wilks's lambda, the more important the independent variable to the discriminant function. Wilks's lambda is significant by the F test for all variables except rincom91 and polviews, which we might consider dropping from the model.
Tests of Equality of Group Means

Wilks' Lambda F df1 df2 Sig.
Age When First Married .992 3.118 2 729 .045
Highest Year of School Completed .990 3.648 2 729 .027
Respondent's Income .996 1.459 2 729 .233
Number of Brothers and Sisters .937 24.567 2 729 .000
Rap Music .946 20.793 2 729 .000
Think of Self as Liberal or Conservative .994 2.221 2 729 .109
Marital Status .981 6.992 2 729 .001
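
For a single variable compared across groups, Wilks' lambda and the F statistic carry the same information: lambda = 1 / (1 + F * df1 / df2). The short Python sketch below (the function name is made up for illustration and is not part of SPSS) recovers the lambda column of the table above from its F column.

```python
def wilks_lambda_from_f(f_value, df1, df2):
    """Univariate Wilks' lambda implied by a one-way ANOVA F statistic."""
    return 1.0 / (1.0 + f_value * df1 / df2)


# F values copied from the "Tests of Equality of Group Means" table.
F_VALUES = {
    "agewed": 3.118,
    "educ": 3.648,
    "rincom91": 1.459,
    "sibs": 24.567,
    "rap": 20.793,
    "polviews": 2.221,
    "marital": 6.992,
}

for name, f_value in F_VALUES.items():
    # df1 = groups - 1 = 2, df2 = cases - groups = 729
    print(name, round(wilks_lambda_from_f(f_value, 2, 729), 3))
```

Because lambda falls as F rises, ranking the variables by small lambda or by large F gives the same order.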

Analysis 1

Box's Test of Equality of Covariance Matrices

Log Determinants
Race of Respondent    Rank  Log Determinant
--------------------  ----  ---------------
white                    7           10.002
black                    7           12.168
other                    7           11.980
Pooled within-groups     7           10.537
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Box's M tests the assumption of homogeneity of covariance matrices. The test is also very sensitive to departures from multivariate normality. In the results below the test is significant, so we conclude that the groups do differ in their covariance matrices, violating an assumption of DA. However, discriminant function analysis is robust even when the homogeneity of covariances assumption is not met, provided the data do not contain important outliers. Also, when n is large, as it is here, even small deviations from homogeneity will be found significant.
Test Results
Box's M 165.317
F Approx. 2.792
df1 56
df2 32394.124
Sig. .000
Tests null hypothesis of equal population covariance matrices.
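
Box's M can be reproduced approximately from the log determinants and group sizes printed above, using the standard definition M = (N - g) ln|S_pooled| - sum of (n_i - 1) ln|S_i|. The Python sketch below is only an illustration (the function names are invented here); since the printed log determinants are rounded to three decimals, the result matches the reported M only approximately.

```python
def box_m(group_sizes, group_log_dets, pooled_log_det):
    """Box's M from group sizes and log determinants of the covariance matrices."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    within = sum((n - 1) * ld for n, ld in zip(group_sizes, group_log_dets))
    return (n_total - n_groups) * pooled_log_det - within


def box_m_df1(n_groups, n_vars):
    """Numerator degrees of freedom of the F approximation to Box's M."""
    return (n_groups - 1) * n_vars * (n_vars + 1) // 2


# Values copied from the Group Statistics and Log Determinants tables
# (white, black, other).
m = box_m([623, 73, 36], [10.002, 12.168, 11.980], 10.537)
print(round(m, 1), box_m_df1(3, 7))
```

The value obtained this way is close to the reported 165.317, and the degrees of freedom match the df1 of 56 in the test results.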

Summary of Canonical Discriminant Functions

The number of discriminant functions computed is the lesser of g - 1 (the number of groups in the dependent minus 1) or k (the number of independent variables). Since the dependent, race, has three groups and there are seven independents, two discriminant functions are computed. The eigenvalues reflect the relative discriminating power of the functions: the first accounts for 87.6% of the between-groups variance explained by the two functions and the second for 12.4%. To attach meaning to the functions (as with factors in factor analysis) we will use the structure matrix later in the output. The Wilks's lambda tests show that the two functions together, and the second function by itself, are significant.
Eigenvalues
Function Eigenvalue % of Variance Cumulative % Canonical Correlation
1 .145(a) 87.6 87.6 .356
2 .021(a) 12.4 100.0 .142
a First 2 canonical discriminant functions were used in the analysis.
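
The columns of the eigenvalue table are tied together by simple formulas: the number of functions is min(g - 1, k), the percent of variance is each eigenvalue's share of the eigenvalue total, and the canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)). A minimal Python sketch follows (function names are illustrative only); because the printed eigenvalues are rounded, the results agree with the table only to within rounding.

```python
import math


def n_functions(n_groups, n_vars):
    """Number of discriminant functions: the lesser of g - 1 and k."""
    return min(n_groups - 1, n_vars)


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def percent_of_variance(eigenvalues):
    """Each eigenvalue as a percentage of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


eigenvalues = [0.145, 0.021]  # rounded values from the Eigenvalues table
print(n_functions(3, 7))
print([round(p, 1) for p in percent_of_variance(eigenvalues)])
print([round(canonical_correlation(ev), 3) for ev in eigenvalues])
```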

Wilks' Lambda
Test of Function(s) Wilks' Lambda Chi-square df Sig.
1 through 2 .856 112.950 14 .000
2 .980 14.783 6 .022
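
The Wilks' lambda for a set of functions is the product of (1 - r squared) over their canonical correlations, and the chi-square is usually obtained from Bartlett's approximation, -(N - 1 - (p + g) / 2) ln(lambda). The Python sketch below assumes that SPSS uses this standard approximation; the function names are invented for illustration, and the inputs are the rounded values printed above, so small discrepancies from the table are expected.

```python
import math


def wilks_lambda(canonical_corrs):
    """Wilks' lambda for the functions whose canonical correlations are given."""
    lam = 1.0
    for r in canonical_corrs:
        lam *= 1.0 - r * r
    return lam


def bartlett_chi_square(lam, n_cases, n_vars, n_groups):
    """Bartlett's chi-square approximation for a Wilks' lambda."""
    return -(n_cases - 1 - (n_vars + n_groups) / 2.0) * math.log(lam)


def chi_square_df(n_vars, n_groups, n_removed):
    """Degrees of freedom after removing the first n_removed functions."""
    return (n_vars - n_removed) * (n_groups - 1 - n_removed)


corrs = [0.356, 0.142]  # canonical correlations from the Eigenvalues table
for removed in range(len(corrs)):
    lam = wilks_lambda(corrs[removed:])
    chi2 = bartlett_chi_square(lam, n_cases=732, n_vars=7, n_groups=3)
    print(round(lam, 3), round(chi2, 1), chi_square_df(7, 3, removed))
```

The first pass tests functions 1 through 2 together; the second tests function 2 alone after the first has been removed.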

The standardized discriminant function coefficients in the table below serve the same purpose as beta weights in multiple regression: they indicate the relative importance of the independent variables in predicting the dependent.
Standardized Canonical Discriminant Function Coefficients

                                          Function
                                              1       2
Age When First Married                     .147    .579
Highest Year of School Completed          -.211   -.003
Respondent's Income                        .071   -.255
Number of Brothers and Sisters             .674    .246
Rap Music                                 -.644    .135
Think of Self as Liberal or Conservative  -.193    .086
Marital Status                             .190   -.716

The structure matrix table below shows the correlations of each variable with each discriminant function. The correlations serve like factor loadings in factor analysis -- that is, by identifying the largest absolute correlations associated with each discriminant function the researcher gains insight into how to name each function.

Structure coefficients vs. standardized discriminant function coefficients. The standardized discriminant function coefficients (above) indicate the partial contribution of each variable to the discriminant function(s), controlling for other independents entered in the equation. The structure coefficients (below) indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions. The standardized discriminant function coefficients should be used to assess each independent variable's unique contribution to the discriminant function.

As the example below shows, it is not easy to assign a meaningful label to each function. The first and most important function has to do with siblings, rap music, education, and political views. The second dimension (function) has to do with age married, marital status, and income. Could these functions be labeled "culture" and "marriage"?
Structure Matrix

                                          Function
                                              1          2
Number of Brothers and Sisters             .675(*)    .270
Rap Music                                 -.626(*)    .112
Highest Year of School Completed          -.260(*)    .099
Think of Self as Liberal or Conservative  -.200(*)    .117
Marital Status                             .232      -.744(*)
Age When First Married                     .105       .581(*)
Respondent's Income                       -.156      -.156(*)
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function


The table below is used to establish the cutting points for classifying cases. The optimal cutting point is the weighted average of the paired values. The cutting points set ranges of the discriminant score to classify cases as white, black, or other. Of course, the computer does the classification automatically, so these values are for informational purposes.
Functions at Group Centroids

                    Function
Race of Respondent       1           2
white                -.158  -6.990E-03
black                 .982       -.219
other                 .738        .565
Unstandardized canonical discriminant functions evaluated at group means

Classification Statistics

The table below simply reports the processing status of the cases.
Classification Processing Summary
Processed                                                  1500
Excluded  Missing or out-of-range group codes                 0
          At least one missing discriminating variable      768
Used in Output                                              732

Prior Probabilities for Groups

                             Cases Used in Analysis
Race of Respondent   Prior   Unweighted   Weighted
white                 .851          623    623.000
black                 .100           73     73.000
other                 .049           36     36.000
Total                1.000          732    732.000
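
Because "Compute from group sizes" was checked (/PRIORS SIZE in the syntax), each prior is simply the group's share of the valid cases. A small Python sketch of that arithmetic, with an illustrative function name:

```python
def priors_from_sizes(group_sizes):
    """Prior probability of each group as its share of the analysed cases."""
    total = sum(group_sizes.values())
    return {group: n / total for group, n in group_sizes.items()}


# Unweighted group sizes from the table above.
priors = priors_from_sizes({"white": 623, "black": 73, "other": 36})
for group, p in priors.items():
    print(group, round(p, 3))
```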


The territorial map below is a plot of the boundaries used for classifying cases into groups based on discriminant function scores. It is obtained by checking "Territorial map" in the "Classify" options of discriminant analysis. For the meaning of the symbols, note the legend below the map. For instance, the string "13" near the top of the map marks the boundary in discriminant space between the territory assigned to group 1 (white) and the territory assigned to group 3 (other).
                                  Territorial Map
Canonical Discriminant
Function 2
       -3.0      -2.0      -1.0        .0       1.0       2.0       3.0
          +---------+---------+---------+---------+---------+---------+
     3.0 +                                               13            +
         I                                                13           I
         I                                                 13          I
         I                                                  13         I
         I                                                   13        I
         I                                                    13       I
     2.0 +          +         +         +         +         +  13      +
         I                                                      13     I
         I                                                       133333I
         I                                                        12222I
         I                                                        12   I
         I                                                       12    I
     1.0 +          +         +         +         +         +    12    +
         I                                                       12    I
         I                                                       12    I
         I                                     *                12     I
         I                                                      12     I
         I                                                      12     I
      .0 +          +         +       * +         +         +  12      +
         I                                        *            12      I
         I                                                     12      I
         I                                                    12       I
         I                                                    12       I
         I                                                    12       I
    -1.0 +          +         +         +         +         +12        +
         I                                                   12        I
         I                                                   12        I
         I                                                   12        I
         I                                                  12         I
         I                                                  12         I
    -2.0 +          +         +         +         +         12         +
         I                                                 12          I
         I                                                 12          I
         I                                                 12          I
         I                                                12           I
         I                                                12           I
    -3.0 +                                                12           +
          +---------+---------+---------+---------+---------+---------+
       -3.0      -2.0      -1.0        .0       1.0       2.0       3.0
                         Canonical Discriminant Function 1



Symbols used in territorial map

Symbol  Group  Label
------  -----  --------------------

   1        1  white
   2        2  black
   3        3  other
   *           Indicates a group centroid
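
The boundaries in the map can be approximated by hand. The Python sketch below assumes that classification uses both functions, the pooled covariance matrix, and the priors computed from group sizes; under those assumptions a case is assigned to the group that maximizes ln(prior) - 0.5 * (squared distance to the group centroid in discriminant space). The function name and layout are illustrative, and this is not SPSS's own code.

```python
import math

# Group centroids (function 1, function 2) and priors, copied from the output.
CENTROIDS = {
    "white": (-0.158, -0.00699),
    "black": (0.982, -0.219),
    "other": (0.738, 0.565),
}
PRIORS = {"white": 0.851, "black": 0.100, "other": 0.049}


def classify(score1, score2):
    """Assign a pair of discriminant scores to the most probable group."""
    best_group, best_value = None, -math.inf
    for group, (c1, c2) in CENTROIDS.items():
        sq_dist = (score1 - c1) ** 2 + (score2 - c2) ** 2
        value = math.log(PRIORS[group]) - 0.5 * sq_dist
        if value > best_value:
            best_group, best_value = group, value
    return best_group


print(classify(0.0, 0.0))   # a point in the middle of the map
print(classify(3.0, -1.0))  # lower right corner of the map
print(classify(3.0, 2.5))   # upper right corner of the map
```

Under this rule even a case lying exactly on the black or other centroid is assigned to white, because the white prior is so large. This agrees with the map, where all three centroids (*) fall to the left of the boundaries.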

Separate-Groups Graphs

The charts below result from checking "Combined-groups" and "Separate-groups" under "Plots" in the "Classify" options of discriminant analysis. Since there are two or more discriminant functions, the charts are scatterplots showing the discriminant scores of the cases on the two discriminant functions. The first three charts show this separately for each of the three race groups, and the fourth chart shows the same information for the combined groups.

Race of respondent = white

Race of respondent = black

Race of respondent = other

All-groups scatter plot

The table below is used to assess how well the discriminant functions work, and whether they work equally well for each group of the dependent variable. Here DA correctly classifies about 85% of the cases, but this is not as good as it seems. DA classifies almost all whites correctly. However, it misclassifies most of the "black" and "other" cases. The seemingly high 85% rate is obtained by classifying nearly everyone as white in a sample that is preponderantly white. This is not a satisfactory discriminant analysis. It would be better to train DA on an analysis set that is balanced in terms of the number of people in each race group.
Classification Results(a)

                                 Predicted Group Membership
          Race of Respondent     white    black    other    Total
Original  Count  white             616        6        1      623
                 black              64        8        1       73
                 other              33        2        1       36
          %      white            98.9      1.0       .2    100.0
                 black            87.7     11.0      1.4    100.0
                 other            91.7      5.6      2.8    100.0
a 85.4% of original grouped cases correctly classified.
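
To see why the 85.4% figure is less impressive than it looks, compare it with the rate obtained by simply calling every case white. The Python sketch below works only from the counts in the table above; the function names are illustrative.

```python
GROUPS = ["white", "black", "other"]
# Rows are actual groups, columns are predicted groups (same order as GROUPS).
COUNTS = [
    [616, 6, 1],
    [64, 8, 1],
    [33, 2, 1],
]


def overall_accuracy(counts):
    """Share of all cases on the diagonal of the classification table."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / sum(sum(row) for row in counts)


def per_group_accuracy(counts, groups):
    """Share of each actual group that was classified correctly."""
    return {g: counts[i][i] / sum(counts[i]) for i, g in enumerate(groups)}


def majority_baseline(counts):
    """Accuracy of assigning every case to the largest actual group."""
    sizes = [sum(row) for row in counts]
    return max(sizes) / sum(sizes)


print(round(100 * overall_accuracy(COUNTS), 1))
print({g: round(100 * a, 1) for g, a in per_group_accuracy(COUNTS, GROUPS).items()})
print(round(100 * majority_baseline(COUNTS), 1))
```

The overall hit rate is only a fraction of a percentage point above the all-white baseline, while the per-group rates for the black and other groups are very low.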