Overview
Advantages of SEM compared to multiple regression include more flexible assumptions (particularly allowing interpretation even in the face of multicollinearity); use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable; the attraction of SEM's graphical modeling interface; the desirability of testing models overall rather than coefficients individually; the ability to test models with multiple dependents; the ability to model mediating variables rather than be restricted to an additive model (in OLS regression the dependent is a function of the Var1 effect plus the Var2 effect plus the Var3 effect, etc.); the ability to model error terms; the ability to test coefficients across multiple between-subjects groups; and the ability to handle difficult data (time series with autocorrelated error, non-normal data, incomplete data). Moreover, where regression is highly susceptible to error of interpretation by misspecification, the SEM strategy of comparing alternative models to assess relative model fit makes it more robust.
SEM is usually viewed as a confirmatory rather than exploratory procedure, using one of three approaches:
SEM is a family of statistical techniques which incorporates and integrates path analysis and factor analysis. In fact, use of SEM software for a model in which each variable has only one indicator is a type of path analysis. Use of SEM software for a model in which each variable has multiple indicators but there are no direct effects (arrows) connecting the variables is a type of factor analysis. Usually, however, SEM refers to a hybrid model with both multiple indicators for each variable (called latent variables or factors), and paths specified connecting the latent variables. Synonyms for SEM are covariance structure analysis, covariance structure modeling, and analysis of covariance structures.
Although these synonyms rightly indicate that analysis of covariance is the focus of SEM, be aware that SEM can also analyze the mean structure of a model. See also partial least squares regression, which is an alternative method of modeling the relationship among latent variables, also generating path coefficients for a SEM-type model, but without SEM's data distribution assumptions. PLS path modeling is sometimes called "soft modeling" because it makes soft or relaxed assumptions about data...
The AMOS interface looks like this (large, initially blank area to draw the path diagram on the right is not shown):
In AMOS, the general process of structural modeling is to use the icons above to draw a circle-and-arrow path diagram, associate the diagram with data (a correlation matrix or raw data), then select Analyze, Calculate Estimates from the menu.
Example. In the AMOS example above, the latent variable PriorAbility is measured by the indicator variables pretest1 and pretest2. The latent variable PostAbility is measured by the indicator variables posttst1 and posttst2. Indicator and other measured variables are depicted as rectangles by convention. Latent variables are depicted as ovals by convention. The e1 to e4 terms are the error terms associated with each indicator variable. The arrows hypothesize that PostAbility is caused by PriorAbility and by the practical performance experience of the exogenous measured variable Perform. The two-headed arrow indicates Perform is thought to be correlated with PriorAbility, which is an exogenous latent variable. As is usual, there is a disturbance or error term, Dist, associated with the endogenous latent variable, PostAbility. The 1's next to certain arrows are the regression weights necessary to set metrics in the model, as discussed below. Also shown is the Data Files window (select File, Data Files from the AMOS menu), showing the associated data file, structur1.sav, which for this example is a correlation matrix with information on n, standard deviations, and means.
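As a rough check on the example model, the free parameters can be counted against the number of distinct variances and covariances among the measured variables. The sketch below is only a reading of the diagram as described in the text (one loading per latent variable fixed to 1, error and disturbance paths fixed to 1, means ignored); the parameter groupings and helper names are illustrative assumptions, not AMOS output.

```python
# Count distinct sample moments and free parameters for the example model.
# The groupings below are an interpretation of the described path diagram.
observed = ["pretest1", "pretest2", "posttst1", "posttst2", "Perform"]


def distinct_moments(k: int) -> int:
    """Number of unique variances and covariances among k observed variables."""
    return k * (k + 1) // 2


free_parameters = {
    "factor loadings (one per latent variable; the other is fixed to 1)": 2,
    "error variances (e1 to e4)": 4,
    "disturbance variance (Dist)": 1,
    "exogenous variances (PriorAbility, Perform)": 2,
    "covariance (PriorAbility with Perform)": 1,
    "structural paths into PostAbility": 2,
}

n_moments = distinct_moments(len(observed))
n_free = sum(free_parameters.values())
degrees_of_freedom = n_moments - n_free

print(f"moments={n_moments}, free parameters={n_free}, df={degrees_of_freedom}")
```

Under these assumptions the model has positive degrees of freedom, which is a necessary (though not sufficient) condition for it to be identified.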
For this example, correlation matrix input looks like this (though note conventional raw data input is possible also and, indeed, is necessary if certain operations such as Data Recode, discussed below, are requested).
Warning: Indicator variables cannot be combined arbitrarily to form latent variables. For instance, combining gender, race, or other demographic variables to form a latent variable called "background factors" would be improper because it would not represent any single underlying continuum of meaning. The confirmatory factor analysis step in SEM is a test of the meaningfulness of latent variables and their indicators, but the researcher may wish to apply traditional tests (ex., Cronbach's alpha) or conduct traditional factor analysis (ex., principal axis factoring).
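A traditional reliability check such as Cronbach's alpha can be sketched in a few lines. The function below uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale); the function name and the scores are hypothetical and serve only to illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of indicator columns.

    items: one list of scores per indicator, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two indicators")
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for two indicators of one latent variable.
pretest1 = [10, 12, 14, 16, 18]
pretest2 = [11, 11, 15, 15, 19]
alpha = cronbach_alpha([pretest1, pretest2])
print(round(alpha, 3))
```

A low alpha for a proposed set of indicators would be one warning sign that they do not represent a single underlying continuum.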
In the illustration below, the AMOS Object Properties window has been opened to the Parameters tab to show 1 entered as the metric for the regression line from the Dist disturbance term to the PostAbility latent variable, whose path is also labeled 1 on the diagram. Object Properties may be opened on any object by right-clicking the object in the diagram, then selecting Object Properties from the context menu. Though the metric of 1 is set automatically, this is where the researcher may constrain any parameter to any value or, alternatively, erase a setting so that the parameter is freely estimated.
Alternatively, one may set the factor variances to 1, thereby effectively obtaining a standardized solution. This alternative is inconsistent with multiple group analysis. Note also that if the researcher does not explicitly set metrics to 1.0 but instead relies on an automatic standardization feature built into some SEM software, one may encounter underidentification error messages -- hence explicitly setting the metric of a reference variable to 1.0 is recommended. See step 2 in the computer output example. Warning: LISREL Version 8 defaulted to setting factor variances to 1 if the user did not set the loading of a reference variable to 1.
Example. In the illustration above, the highest modification indexes have to do with correlated error terms between pre- and post-tests, particularly between pretest1 and posttest1 (error terms 1 and 3). Modification indexes are also presented to suggest adding paths (regression lines), such as from posttest1 to pretest1. However, that would violate chronological logic for these data and the path therefore should not be added. In general, one should have sound theoretical reason for adding paths suggested by MIs.
Likewise, one can have good fit in a misspecified model. One indicator of this occurring is high modification indexes in spite of good fit. High MIs indicate multicollinearity in the model and/or correlated error.
A good fit doesn't mean each particular part of the model fits well. Many equivalent and alternative models may yield as good a fit -- that is, fit indexes rule out bad models but do not prove good models. Also, a good fit doesn't mean the exogenous variables are causing the endogenous variables (for instance, one may get a good fit precisely because one's model accurately reflects that most of the exogenous variables have little to do with the endogenous variables). Also keep in mind that one may get a bad fit not because the structural model is in error, but because of a poor measurement model.
All other things equal, a model with fewer indicators per factor will have a higher apparent fit than a model with more indicators per factor. Fit coefficients which reward parsimony, discussed below, are one way to adjust for this tendency.
There are three ways, listed below, in which the chi-square test may be misleading. For these reasons, many researchers who use SEM believe that with a reasonable sample size (ex., > 200) and good approximate fit as indicated by other fit tests (ex., NNFI, CFI, RMSEA, and others discussed below), the significance of the chi-square test may be discounted and that a significant chi-square is not by itself a reason to modify the model.
Also, when degrees of freedom are large relative to sample size, GFI is biased downward except when the number of parameters (p) is very large. Under these circumstances, Steiger recommends an adjusted GFI (GFI-hat). GFI-hat = p / (p + 2 * F-hat), where F-hat is the population estimate of the minimum value of the discrepancy function, F, computed as F-hat = (chi-square - df) / (n - 1), where df is degrees of freedom and n is sample size. GFI-hat adjusts GFI upwards. Also, GFI tends to be larger as sample size increases; correspondingly, AGFI may underestimate fit for small sample sizes, according to Bollen (1990).
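The GFI-hat formula above can be worked through directly. The sketch below follows the two expressions exactly as given in the text, with p taken as the text defines it; the function names and the input values are hypothetical. It does not floor F-hat at zero, so a chi-square smaller than df would give a value above 1.

```python
def f_hat(chi_square: float, df: int, n: int) -> float:
    """Estimated population discrepancy: (chi-square - df) / (n - 1)."""
    return (chi_square - df) / (n - 1)


def gfi_hat(chi_square: float, df: int, n: int, p: int) -> float:
    """Adjusted GFI as given in the text: p / (p + 2 * F-hat)."""
    return p / (p + 2 * f_hat(chi_square, df, n))


# Hypothetical values: chi-square = 25 on 5 df, n = 201, p = 5.
example_f_hat = f_hat(25.0, 5, 201)
example_gfi_hat = gfi_hat(25.0, 5, 201, 5)
print(round(example_f_hat, 3), round(example_gfi_hat, 4))
```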
BIC is an approximation to the log of a Bayes factor for the model of interest compared to the saturated model. BIC became popular in sociology after Raftery promoted it in the 1980s. See Raftery (1995) on BIC's derivation. Recently, however, the limitations of BIC have been highlighted. See Winship, ed. (1999), on controversies surrounding BIC. BIC uses sample size n to estimate the amount of information associated with a given dataset. A model based on a large n but which has little variance in its variables and/or highly collinear independents may yield misleading model fit using BIC.
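The text gives no formula for BIC. One commonly cited form for comparing a model with the saturated model, assumed here for illustration, is BIC = chi-square - df * ln(n), where negative values favor the model of interest. The function name and the input values below are hypothetical.

```python
import math


def bic_vs_saturated(chi_square: float, df: int, n: int) -> float:
    """Assumed form: chi-square - df * ln(n); negative favors the model."""
    return chi_square - df * math.log(n)


# The same hypothetical chi-square and df at two sample sizes: the penalty
# term df * ln(n) grows with n, so the same chi-square looks better when
# n is larger.
bic_small_n = bic_vs_saturated(25.0, 5, 100)
bic_large_n = bic_vs_saturated(25.0, 5, 1000)
print(round(bic_small_n, 2), round(bic_large_n, 2))
```

Because n enters only through ln(n), the measure treats every additional case as adding information, which is the source of the concern about large samples with little variance or highly collinear independents.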
NNFI close to 1 indicates a good fit. Rarely, some authors have used a cutoff as low as .80, since NNFI (also known as TLI) tends to run lower than GFI. However, more recently, Hu and Bentler (1999) have suggested NNFI >= .95 as the cutoff for a good model fit and this is widely accepted (ex., by Schumacker & Lomax, 2004: 82) as the cutoff. NNFI values below .90 indicate a need to respecify the model.
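The cutoffs above can be applied mechanically once NNFI is computed. The sketch below uses the usual definition of NNFI/TLI from the chi-square/df ratios of the null (independence) model and the model of interest; that formula is not stated in the text and is supplied here as an assumption, as are the function names, the "marginal" label for values between .90 and .95, and the input values.

```python
def nnfi(chi_sq_null: float, df_null: int, chi_sq_model: float, df_model: int) -> float:
    """NNFI (TLI) from the chi-square/df ratios of the null and tested models."""
    null_ratio = chi_sq_null / df_null
    model_ratio = chi_sq_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1)


def interpret_nnfi(value: float) -> str:
    """Apply the cutoffs from the text (.95 good fit, below .90 respecify)."""
    if value >= 0.95:
        return "good fit"
    if value < 0.90:
        return "respecify"
    return "marginal"  # label for the in-between range is an assumption


# Hypothetical values: null model 500 on 10 df, tested model 25 on 5 df.
example_nnfi = nnfi(500.0, 10, 25.0, 5)
print(round(example_nnfi, 3), interpret_nnfi(example_nnfi))
```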
It may be said that RMSEA corrects for model complexity, as shown by the fact that df is in its denominator. However, degrees of freedom is an imperfect measure of model complexity. Since RMSEA computes average lack of fit per degree of freedom, one could have near-zero lack of fit in both a complex and in a simple model and RMSEA would compute to be near zero in both, yet most methodologists would judge the simpler model to be better on parsimony grounds. Therefore model comparisons using RMSEA should be interpreted in the light of the parsimony ratio, which reflects model complexity according to its formula, PR = df(model)/df(maximum possible). Also, RMSEA is normally reported with its confidence interval. In a well-fitting model, the lower limit of the 90% confidence interval is at or very close to 0, while the upper limit is less than .08.
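The point about RMSEA and parsimony can be illustrated numerically. The sketch below assumes the common point-estimate formula RMSEA = sqrt(max(chi-square - df, 0) / (df * (n - 1))), which the text does not state (some programs use n rather than n - 1), together with the parsimony ratio as given above. All names and input values are hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Assumed point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def parsimony_ratio(df_model: int, df_max: int) -> float:
    """PR = df(model) / df(maximum possible)."""
    return df_model / df_max


# A hypothetical model with some misfit.
example_rmsea = rmsea(25.0, 5, 201)

# Two hypothetical models with essentially no lack of fit: a simple one
# (8 df) and a complex one (2 df), out of a maximum of 10 df.
simple = (rmsea(8.0, 8, 201), parsimony_ratio(8, 10))
complex_ = (rmsea(2.0, 2, 201), parsimony_ratio(2, 10))
print(round(example_rmsea, 4), simple, complex_)
```

Both zero-misfit models get the same RMSEA, so only the parsimony ratio distinguishes the simpler model from the more complex one.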
Example. In the example above, the path from Perform to PriorAbility has been made optional (here shown in green, but yellow in AMOS). Therefore the specification search generates two default models, one with and one without the optional arrow. In the output, the original full model with the optional arrow is the one with the larger number of parameters (12). Various fit measures are shown, explained below. By most (but not all) measures, such as AIC, the original model is best-fitting.
The next step is to associate the group names with actual data files under Data, Data Files, as illustrated below, using the Data Files button as usual to add the files, here males.sav and females.sav.
Before testing for measurement invariance across groups, the researcher first checks to see if the model as drawn has acceptable fit for each of the multiple groups (in this example, for males and females). Often the researcher tests one-sample models separately first. For instance one might test the model separately for a male sample and for a female sample. Separate testing provides an overview of how consistent the model results are, but it does not constitute testing for significant differences in the model's parameters between groups. If consistency is found, then the researcher will proceed to multigroup testing. First a baseline chi-square value is derived by computing model fit for the pooled sample of all groups. To accomplish this, one simply selects Analyze, Calculate Estimates. View, Text Output, will reveal (1) the usual overall goodness of fit measures, which should show the model has acceptable fit; and (2) separate regression parameter estimates for each of the groups, and these parameters should be significant for all groups. These findings establish that the given path model is plausible for the multiple groups and set the stage for testing measurement invariance across groups.
First, the researcher can click the "View output path" icon as illustrated below, then alternately select the groups to display the path parameter estimates to verify that they do not differ strongly by group. If the path parameters seem similar, the researcher has reason to suspect that measurement invariance is upheld (or structural invariance in the case of path parameters pertaining to the structural model).
The parameters noted in the illustration above are:
The selection of parameters to constrain corresponds to the researcher's purpose. For instance, in testing measurement invariance for a factor analysis model, the weights to constrain to be equal across groups would be the regression weights from the latent variables (the factors) to the indicator variables.
After selecting the constraint model(s) using Analyze, Multiple-Group Analysis from the AMOS menu, the researcher selects Analyze, Calculate Estimates. In the main AMOS display as illustrated below, each of the models is evaluated. The "OK" next to each model indicates it could be fitted (was not unidentified).
Then the researcher selects View, Text Output, to see the output, including the (excerpted) goodness of fit measures illustrated below. Each measure is shown for the unconstrained model and each constrained model selected under Analyze, Multiple-Group Analysis. In this example, only three such models were requested. The P value for CMIN is the significance of the model chi-square test, which for these models is non-significant, meaning that none of the models is significantly different from the saturated (perfect explanation) model and all are acceptable. By GFI, fit was acceptable for all three, meaning that constraining the groups to be equal still yielded acceptable model fit.
Cheung & Rensvold (2002) examined 20 goodness of fit measures for use when testing for measurement invariance across multiple groups, recommending the use of changes in CFI, Gamma hat (GFI-hat), and McDonald's noncentrality index because these measures were independent of model complexity and sample size and were uncorrelated with model chi-square.
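In practice, CFI is used for invariance testing by comparing its value in the unconstrained model with its value in a constrained model. The sketch below assumes the standard CFI formula and the commonly cited rule of thumb that a drop in CFI of more than .01 signals a lack of invariance; neither the formula nor the threshold appears in the text, and the function names and chi-square values are hypothetical.

```python
def cfi(chi_sq_model: float, df_model: int, chi_sq_base: float, df_base: int) -> float:
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi_sq_model - df_model, 0.0)
    d_base = max(chi_sq_base - df_base, d_model)
    if d_base == 0:
        return 1.0
    return 1.0 - d_model / d_base


def invariance_supported(cfi_unconstrained: float, cfi_constrained: float,
                         max_drop: float = 0.01) -> bool:
    """True when constraining parameters lowers CFI by no more than max_drop."""
    return (cfi_constrained - cfi_unconstrained) >= -max_drop


# Hypothetical multigroup results: baseline 500 on 20 df,
# unconstrained 30 on 6 df, equal-loadings model 36 on 8 df.
cfi_free = cfi(30.0, 6, 500.0, 20)
cfi_equal = cfi(36.0, 8, 500.0, 20)
supported = invariance_supported(cfi_free, cfi_equal)
print(round(cfi_free, 4), round(cfi_equal, 4), supported)
```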