Principles and Practice
of Structural Equation Modeling
Rex B. Kline
NY: Guilford Press, 1998.
ISBN 1-57230-337-9 (pbk.). List $35.00.
Chapter 2: Basic Statistical Concepts
[pp. 15-30 assigned for Week 3, Correlation and Partial
Correlation]
1. What are exogenous and endogenous variables?
Loosely, exogenous variables are
independents and endogenous ones are the dependents in a model. The exogenous
variables in a model have no incoming arrows except error. Endogenous variables
do have arrows incoming from other variables in the model.
2. What is standardization and when, in general, does one
prefer standardization?
Data are standardized by
subtracting the mean and dividing by the standard deviation, making all
variables comparable because all wind up with means of 0 and standard
deviations of 1. Standardized statistical coefficients are ones computed on
standardized data or by algorithms which have the same effect.
You want standardization when you
want variables to be comparable. Suppose the relation of income and conservatism is
as strong as the relation of education and conservatism. A standardized
measure such as correlation will yield the same magnitude coefficient for both
relationships. An unstandardized measure, such as covariance, will not.
Covariance will be larger for the relation involving a larger metric (income)
than for one measured in small units (education). Covariance will also be larger for the variables with
the larger standard deviations.
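The contrast between covariance and correlation can be sketched in a few lines of Python. The data values and variable names below are made up for illustration; only the formulas are standard. Rescaling income from thousands of dollars to dollars multiplies the covariance by 1,000 but leaves the correlation unchanged.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: the same five cases, with income recorded in two metrics.
conservatism = [2.0, 3.0, 5.0, 4.0, 6.0]
income_thousands = [20.0, 35.0, 50.0, 45.0, 70.0]
income_dollars = [x * 1000 for x in income_thousands]

cov_small = covariance(income_thousands, conservatism)
cov_large = covariance(income_dollars, conservatism)
r_small = correlation(income_thousands, conservatism)
r_large = correlation(income_dollars, conservatism)

print(cov_small, cov_large)  # covariance depends on the metric
print(r_small, r_large)      # correlation does not
```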
You do not want
standardization, however, when you need to compare means and variances since,
by definition, standardization gets rid of differences on these bases. ANOVA
and SEM are examples of procedures which want to look at patterns of
covariances and thus need unstandardized data.
3. What is the difference between Pearsonian and
point-biserial correlation?
Pearson’s r is for the correlation
of two interval variables. Point-biserial r is for the correlation of an
interval with a dichotomy.
SPSS computes point-biserial
automatically, using an exact method. Decades ago, in the age of manual
computation, there were separate formulas for approximating point-biserial
correlation and for Pearsonian correlation.
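As a minimal sketch of why no separate routine is needed today: the classic hand-computation formula for point-biserial r gives the same number as Pearson's r applied to a dichotomy coded 0/1. The scores and group codes below are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(groups, scores):
    """Textbook shortcut: (M1 - M0) / s * sqrt(p * q), with s the population SD."""
    n = len(scores)
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / n
    q = 1 - p
    grand_mean = sum(scores) / n
    sd = math.sqrt(sum((s - grand_mean) ** 2 for s in scores) / n)
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / sd * math.sqrt(p * q)

# Hypothetical data: a 0/1 dichotomy and an interval-level score.
group = [0, 0, 0, 1, 1, 1, 1, 0]
score = [3.0, 4.0, 2.0, 6.0, 7.0, 5.0, 8.0, 4.0]

r_pearson = pearson_r(group, score)
r_pb = point_biserial(group, score)
print(r_pearson, r_pb)  # the two values agree
```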
4. Is a criterion variable an independent or dependent?
Dependent. The independent in this
vocabulary is the predictor variable.
5. If y is the dependent and x is the independent, do we
speak of “the regression of x on y” or “the regression of y on x”?
The latter. The “on” variable is
the independent or predictor. It is short for “y predicted on the basis of
knowing x”.
6. What coefficient is the slope of the straight line which
best summarizes a pattern of dots on a scattergraph?
The unstandardized regression
coefficient, which is the change in y for each unit change in x, the predictor.
7. Why is ordinary least-squares regression called that?
Because it is the most common (“ordinary”) type
of regression, and because its criterion for drawing the regression line is to choose the line
that minimizes the sum of squared vertical distances from the points in a scatterplot to
the regression line.
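The least-squares criterion can be illustrated with a short sketch for the one-predictor case. The data are invented; the slope formula (cross-products divided by the sum of squares of x) is the standard closed-form solution. Nudging either coefficient away from the fitted value increases the sum of squared residuals.

```python
def ols_line(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # unstandardized regression coefficient (slope)
    a = my - b * mx      # intercept
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical scatterplot of five points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_line(x, y)
best = sum_squared_residuals(x, y, a, b)
worse_slope = sum_squared_residuals(x, y, a, b + 0.1)
worse_intercept = sum_squared_residuals(x, y, a + 0.1, b)
print(a, b, best, worse_slope, worse_intercept)
```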
8. What is the assumption of independent error in
correlation and regression, and what difference does its violation make?
Residuals (the differences between observed
and predicted values of y) should be random. Particularly when one has time
series data, there is a possibility that error at time 1 will be a large
determinant of error at time 2. This is called autocorrelation.
To the extent this assumption is
violated, the reliability of associated measures of significance of the r or b
coefficients will be compromised. This is a non-trivial problem. If one has
reason to suspect possible autocorrelation, one should test for it.
(Not discussed in Kline): The Durbin-Watson
coefficient, d, is a test for autocorrelation. The value of d ranges from 0
to 4. A value of 2 indicates no autocorrelation; values toward 0 indicate positive
autocorrelation; and values toward 4 indicate negative autocorrelation. For a given level of
significance such as .05, there is an upper and a lower d value limit. If the
computed Durbin-Watson d value for a given series is more than the upper limit
for the case of positive serial correlation, the null hypothesis of no autocorrelation is not
rejected. If the computed d value is less than the lower limit, the null
hypothesis is rejected. If the computed value is in between the two limits, the
result is inconclusive. For the case of negative first-order serial correlation,
d must be more than (4 - d-lower-limit) to reject the null hypothesis.
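A minimal sketch of the Durbin-Watson computation follows, using the standard definition (sum of squared successive differences of the residuals divided by the sum of squared residuals). The two residual series are invented to show the two directions; the upper and lower limits themselves come from published tables and are not reproduced here.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_(t-1))^2) / sum(e_t^2), for residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly: neighbors resemble each other.
drifting = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]
# Hypothetical residuals that flip sign every period.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_positive = durbin_watson(drifting)      # well below 2
d_negative = durbin_watson(alternating)   # well above 2
print(d_positive, d_negative)
```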
9. How might the truncation of the range of a variable (ex.,
by reducing from 7 to 3 the number of scale points by which it is measured)
affect correlation?
It typically leads to attenuation
or lowering of the correlation coefficient.
10. How might correlation be affected by the fact two
variables have differently-shaped underlying distributions (ex., right-skewed
v. left-skewed, or normal v. bipolar)?
This also typically leads to
attenuation. The maximum correlation will be less than 1.
11. How might correlation be affected when data reliability
is low?
This also typically leads to
attenuation.
12. How might nonlinearity in the relationship affect linear
correlation?
Yet another factor where the
measured correlation will be lower than the actual correlation.
13. What is an interaction effect? How are they assessed in
correlation and regression?
An interaction effect exists when the
relationship of two original variables depends on the value of a third, moderator variable. As such the moderator is a
form of control variable. In correlation, interactions are explored through partial
correlation. In regression they are explored through adding cross-product
interaction terms to the model.
14. (Omitted because covered in Tacq). Explain spurious correlation.
A correlation between two variables
is fully spurious if the entire correlation can be explained by a third
variable, control of which will cause the correlation of the first two to be
zero. This happens under two circumstances: (1) the third variable is the
common cause of the first two, which do not actually affect each other but
which covary due to their common parent; or (2) the third variable intervenes
between the first two and all causation must flow through it, with neither
original variable affecting the other directly.
Spurious correlation is checked by
partial correlation, using the third variable as the control. The partial
correlation goes to 0 if the third variable is a full control (a common cause
or an intervening variable). Obviously,
there are many situations in the middle, where a relationship is partly
spurious and partly real. In these circumstances, partial correlation will drop
only part way to 0 compared to the correlation of the two original variables.
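The check described above can be sketched with the standard first-order partial correlation formula. The correlation values are hypothetical, chosen so that the first case is fully spurious and the second only partly so.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical case 1: r_xy equals r_xz * r_yz, so z accounts for all of it.
fully_spurious = partial_correlation(r_xy=0.30, r_xz=0.60, r_yz=0.50)

# Hypothetical case 2: r_xy is larger than z alone can explain.
partly_spurious = partial_correlation(r_xy=0.50, r_xz=0.60, r_yz=0.50)

print(fully_spurious)   # drops to 0
print(partly_spurious)  # drops below .50 but stays above 0
```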
[pp. 30-46 are
assigned for week 4, Regression]
1. What is “the problem of omitted variables” in regression?
The omission of causally important
variables will affect all the regression coefficients with which the omitted
variable(s) is/are related. Such omission is called specification error.
If relevant variables are omitted
from the model, the common variance they share with included variables may be
wrongly attributed to those variables, and the error term is inflated. If
causally irrelevant variables are included in the model, the common variance
they share with included variables may be wrongly attributed to the irrelevant
variables. The more the correlation of the irrelevant variable(s) with other
independents, the greater the standard errors of the regression coefficients
for these independents. Omission and irrelevancy can both affect substantially
the size of the b and beta coefficients. This is one reason why it is better to
use regression to compare the relative fit of two models rather than to seek to
establish the validity of a single model specification.
The specification problem in
regression is analogous to the problem of spuriousness in correlation, where a
given bivariate correlation may be inflated because one has not yet introduced
control variables into the model by way of partial correlation.
Note that when the omitted variable
has a suppressing effect, coefficients in the model may underestimate rather
than overestimate the effect of those variables on the dependent.
2. What is suppression in the context of
regression?
Suppression
is when the beta weight in regression or r in correlation is lower than what it
would be when a control variable is introduced. A control is introduced in regression simply by adding the
omitted variable to the equation. It is introduced in correlation by partial
correlation.
This
is the opposite of the usual situation, where the omission of a variable in
regression leads to a beta for an included variable which overestimates
the effect of that variable. This happens because the causal importance of the
omitted variable is taken on by the included variable.
Suppression
occurs when the omitted variable is positively related to the included
independent and negatively to the included dependent (or vice versa). In this
situation, the covariance of the two included variables is less than it would
be if somehow the third variable had no effect. When suppression is present,
the beta for the included variable underestimates the power of its
effect on the dependent. In fact, a beta may be 0, hiding an important
independent variable’s effect!
When
suppression is suspected, backward elimination is used as an option in stepwise
regression.
3.
What models can structural equation modeling handle that regression can’t?
Regression
can’t handle models which have latent variables, nor can it handle models which
posit that error (residuals) is correlated. Rather, in regression, all
variables are indicator variables and correlated error violates one of its
assumptions.
CHAPTER
THREE: SEM FAMILY TREE
[pp. 47-55 are assigned for path analysis]
1.
In section 3.4 Kline discusses a structural model of delinquency. He notes that
“A standard statistical technique that could be used here is multiple
regression. Two separate analyses could be conducted....” (p. 51). What are
these two analyses and how do they relate to path analysis?
The
model has three exogenous variables (social class, motivation, verbal ability)
and two endogenous variables (achievement, delinquency). The two analyses are
the two regressions, one for each endogenous variable. In these regressions,
one endogenous variable is the dependent and the independents are all the other
variables with arrows going to that endogenous variable. The beta weights from
these regressions are the path coefficients.
2.
What is a disturbance term?
This
is a phrase for residual error, which is (1 - R²). It represents the
sum effect of all unmeasured variables. Some authors, Kline included, reserve
“disturbance term” for SEM and use “residual error” for regression models.
[Chapter
3, pp. 55-66 are assigned for topic 10, factor analysis]
1.
What is a “measurement model”?
A
measurement model is the set of causal specifications posited by the
researcher, often in the form of a circles-and-arrows diagram. Arrows indicate
causal effects thought by the researcher to be possible and lack of arrows, of
course, indicates the posited absence of a causal connection.
2.
How does the measurement model differ in exploratory vs. confirmatory factor
analysis? Explain in terms of Kline, Figures 3.2 and 3.3.
In
EFA the researcher assumes all indicators may be related to all factors and is
looking to see which indicators sort themselves out onto which factors. That is
why Fig. 3.2, for EFA, has an arrow from each factor to every indicator.
In CFA, the researcher posits in advance
which variables are associated with which factors and is looking to see if the
indicators sort as predicted. Thus in Fig. 3.3, each factor has arrows to a
unique subset of indicators.
3.
What does the double-headed arrow between the two factors mean in Kline’s
measurement model figures? What type of factor rotation is implied?
It
indicates correlation of the factors. The normal forms of rotation yield
orthogonal factors. Because factors are here correlated, oblique factor
rotation should be used to conform to the measurement model.
4.
What factor analysis finding indicates convergent validity?
In
CFA, a finding that indicators have high loadings on the predicted factors
indicates convergent validity.
5.
What factor analysis finding indicates discriminant validity?
In
an oblique rotation, the correlation between factors is not so high (ex., >
.85) as to lead one to think the two factors overlap conceptually.
6.
Structural equation modeling combines aspects of path analysis on the one hand
and (a) EFA or (b) CFA?
CFA.
The researcher specifies beforehand the relation between the indicators and the
latent variables (factors). Two or more such models may be evaluated.
CHAPTER
4: DATA PREPARATION AND SCREENING
1.
Why is listwise deletion recommended over pairwise deletion for handling
missing values in SEM?
Listwise
deletion means a case with missing values is ignored in all calculations.
Pairwise means it is ignored only for calculations involving that variable.
However, the pairwise method can result in correlations or covariances which
are outside the range of the possible (see Kline, p. 76). This in turn can lead
to covariance matrices which are singular (aka, non-positive definite),
preventing such math operations as inverting the matrix, because division by
zero will occur. This problem does not occur with listwise deletion. Given that
SEM uses covariance matrices as input, listwise deletion is recommended (or
some form of estimation of missing values, such as substituting mean values).
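A small sketch can show what an "impossible" set of correlations looks like. With three variables, a correlation matrix is usable only if its determinant is positive; when each correlation is computed on a different subset of cases, as in pairwise deletion, nothing guarantees this. The three sets of values below are hypothetical and are not taken from Kline's example.

```python
def corr_determinant(r12, r13, r23):
    """Determinant of the 3 x 3 correlation matrix with these off-diagonals."""
    return 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2

def is_positive_definite(r12, r13, r23):
    """For a 3 x 3 correlation matrix: every |r| < 1 and determinant > 0."""
    in_range = all(abs(r) < 1 for r in (r12, r13, r23))
    return in_range and corr_determinant(r12, r13, r23) > 0

# Hypothetical, mutually consistent correlations.
consistent = is_positive_definite(0.5, 0.4, 0.3)

# Hypothetical pairwise-deletion result: 1 and 2 both correlate .9 with 3,
# yet correlate -.9 with each other, which no single sample could produce.
inconsistent = is_positive_definite(0.9, -0.9, 0.9)

# Variables 1 and 2 perfectly correlated: determinant is 0 (singular).
singular_det = corr_determinant(1.0, 0.5, 0.5)

print(consistent, inconsistent, singular_det)
```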
2.
Why is complete or very high multicollinearity a problem in SEM?
For
the same reason. Complete multicollinearity is assumed to be absent in SEM
because it will result in singular covariance matrices,
which are ones on which one cannot perform certain calculations (ex., matrix
inversion) because division by zero will occur. Very high multicollinearity can
result in matrix entries which approach 0 and while division can occur, results
will be unstable. Hence complete or very high multicollinearity prevents a SEM
solution.
3.
How is high multicollinearity tested?
Inspection
of the correlation matrix reveals only bivariate multicollinearity, for
bivariate correlations > .90. To assess multivariate multicollinearity, one
uses tolerance or VIF.
Tolerance
is 1 - R² for the regression of that independent variable on all the other
independents, ignoring the dependent. There will be as many tolerance
coefficients as there are independents. The higher the intercorrelation of the
independents, the more the tolerance will approach zero. Tolerance is part of
the denominator in the formula for calculating the confidence limits on the b
(partial regression) coefficient. When tolerance is close to 0 there is high
multicollinearity of that variable with other independents and the b and beta
coefficients will be unstable. The more the multicollinearity, the lower the
tolerance and the larger the standard error of the regression coefficients.
Variance-inflation
factor, VIF. VIF is simply the
reciprocal of tolerance. Therefore, when VIF is high there is high
multicollinearity and instability of the b and beta coefficients. VIF and
tolerance are found in the SPSS output section on collinearity statistics. A
table adapted from Fox (1991: 12) shows the inflationary impact on the standard error of the
regression coefficient (b) of the jth independent variable for various levels
of multiple correlation (Rj), tolerance, and VIF.
In that table, 1.0 corresponds to no impact, 2.0 to doubling the standard error,
etc. Standard error is doubled when VIF is 4.0 and tolerance is .25,
corresponding to Rj = .87. This is an arbitrary but common cut-off criterion
for deciding when a given independent variable displays "too much"
multicollinearity.
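The relation among Rj, tolerance, VIF, and the inflation of the standard error can be sketched directly from the definitions above: tolerance is 1 - Rj², VIF is its reciprocal, and the standard error is multiplied by the square root of VIF. The Rj values looped over are arbitrary illustration points, not a reproduction of Fox's table.

```python
import math

def tolerance(r_j):
    """1 - R² for the regression of independent j on the other independents."""
    return 1 - r_j ** 2

def vif(r_j):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r_j)

def se_inflation(r_j):
    """Factor by which the standard error of b_j is multiplied."""
    return math.sqrt(vif(r_j))

for r_j in (0.0, 0.5, 0.87, 0.95):
    print(r_j, round(tolerance(r_j), 3), round(vif(r_j), 2), round(se_inflation(r_j), 2))
```

At Rj = .87 the tolerance is close to .25, VIF is close to 4, and the standard error is roughly doubled, which matches the cut-off described above.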
4.
What are outliers, and how are they detected?
Outliers
are extreme, untypical cases. Often the researcher wants to explain them on
some separate basis from the main model, and therefore wishes to eliminate them
from the analysis.
Kline
(p. 83) notes that alternatively, the researcher may use transforms which tend
to “pull in” outliers. These include square root, logarithmic, and inverse (x =
1/x) transforms.
Univariate
outliers can be spotted by some rule of thumb, such as cases more than 3 standard
deviations from the mean. Multivariate outliers are detected by coefficients
like the Mahalanobis distance or Cook’s distance.
The
leverage statistic, h, also called the hat-value, is available to identify
cases which influence the regression model more than others. The leverage
statistic varies from 0 (no influence on the model) to 1 (completely determines
the model). A rule of thumb is that cases with leverage under .2 are not a
problem, but if a case has leverage over .5, the case has undue leverage and
should be examined for the possibility of measurement error or the need to
model such cases separately.
Cook's
distance, D, is another measure of the influence of a case (see the output
example). Cases with larger D values than the rest of the data are those which
have unusual leverage. Fox (1991: 34) suggests as a cut-off for detecting
influential cases, values of D greater than 4/(n - k - 1), where n is the
number of cases and k is the number of independents.
Studentized
residuals are also used to detect outliers with high leverage. The studentized
residual is also called the deleted studentized residual because its
calculation involves leaving out one case in turn for each of the cases. Other
terms include externally studentized residual or, misleadingly, standardized
residual. In a plot of studentized residuals, one may draw lines at plus and
minus two standard units to highlight cases outside the range where 95% of the
cases normally lie.
Partial
regression plots, also called partial regression leverage plots or added
variable plots, are yet another way of detecting influential sets of cases.
Partial regression plots are a series of bivariate regression plots of the
dependent variable with each of the independent variables in turn. The plots
show cases by number or label instead of dots. One looks for cases which are
outliers on all or many of the plots.
5.
What type of data transforms tend to
“pull in” outliers?
These include square root, logarithmic, and
inverse (x = 1/x) transforms.
6.
What type of data transforms tend to normalize positively skewed data?
The
same ones. For negative skew, use powers.
CHAPTER
5: STRUCTURAL MODELS WITH OBSERVED VARIABLES AND PATH ANALYSIS: I.
FUNDAMENTALS, RECURSIVE MODELS
[pp.
95-125, 150-154 are assigned for path
analysis]
[pp. 95-125]
1.
What is the specification issue in path analysis?
If
relevant causal variables are omitted, then the direct and indirect effects
will not be measured accurately.
2.
Explain parameters and observations in path models, and why a solution is
impossible when there are more parameters than observations. Relate this to the
concept of model identification.
Observations
are coefficients the researcher has for the model. The total of observations is
the total number of coefficients that can be plugged into equations used to
estimate the unknown parameters, such as the path coefficients. Let v be the
number of observed variables in the model. Then the number of observations is
the number of variances and covariances, equal to [v(v+1)/2]. For instance, 4
variables have 4 variances and 6 unique covariances = [4(4+1)/2] = 10
observations.
Parameters
are what can vary in the model. What can vary are the path coefficients for any
arrows, and the variances and covariances of the exogenous variables and the
disturbance terms. Total parameters are equal to [p + e + d], where p is the
number of straight arrows in the model, denoting the paths; e is the number
of variances and covariances of the exogenous variables; and d is the number of
variances and covariances of the disturbance terms.
If
there are 4 variables in the model, p could be 6 (arrows from A to B, C, and D;
from B to C and D; and from C to D). If there is one exogenous variable, then e
= [1(1+1)/2] = 1. If there are three endogenous variables, then d =[3(3+1)/2] =
6. Thus the number of parameters could
be 6 + 1 + 6 = 13. Recall the number of observations for this example is 10.
When there are more things that can vary in the model (parameters) than there
are fixed facts (observations), the model is too complex to be solved. That is,
in this case there are [13 - 10] = 3 more parameters than observations.
Underidentified
models are ones which are not solvable because they have more parameters than
observations. Recursive models are never underidentified. Recursive
models are ones where the researcher assumes covariances of disturbance terms are
all 0, and where all arrows are unidirectional (no feedback loops). In the example above, p might still be 6 (6
arrows) and e might still be 1 (the variance of the one exogenous variable), but since
it is assumed there are no covariances among the three disturbance terms, d
would be 3 (the variances of the three disturbance terms). This is 6 + 1 + 3 =
10, which is the same as the number of observations, so the recursive model
would be identified.
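The counting in the two examples above can be sketched as follows. The function and argument names are illustrative only; the formulas are the ones given in the text, v(v+1)/2 for observations and p + e + d for parameters.

```python
def n_observations(n_variables):
    """Number of variances and unique covariances among the observed variables."""
    return n_variables * (n_variables + 1) // 2

def n_parameters(n_paths, n_exogenous, n_disturbances, free_disturbance_covariances):
    """p + e + d, where d counts disturbance covariances only if they are free."""
    e = n_exogenous * (n_exogenous + 1) // 2
    if free_disturbance_covariances:
        d = n_disturbances * (n_disturbances + 1) // 2
    else:
        d = n_disturbances  # variances only, as assumed in a recursive model
    return n_paths + e + d

observations = n_observations(4)

# Four variables, six arrows, one exogenous variable, three disturbances.
with_covariances = n_parameters(6, 1, 3, free_disturbance_covariances=True)
recursive = n_parameters(6, 1, 3, free_disturbance_covariances=False)

print(observations, with_covariances, recursive)
print("underidentified" if with_covariances > observations else "identified")
print("underidentified" if recursive > observations else "identified")
```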
If
a model is underidentified, then one must do one or more of the following: (1)
simplify the model by reducing the number of arrows, and/or (2) add exogenous
variables (which, of course, is usually possible only if this need is
considered prior to gathering data).
3.
In relation to parameters, what sample size does Kline recommend?
He
recommends 10 times as many cases as parameters (or ideally 20 times). He
states that 5 times or less is insufficient for significance testing of model
effects.
4.
What is the effect size of a disturbance term? What is its variance?
When
you are computing the betas (path coefficients) for a given endogenous
variable, in a regression in which it is the dependent and those with arrows to
it are independents, you will also get an R² value. The effect size
of the disturbance term, which reflects unmeasured variables, is (1 - R²), and its variance is (1 -
R²) times the variance of that endogenous variable.
5.
What is the estimate of the correlation and covariance between two disturbance
terms?
The
correlation between two disturbance terms is the partial correlation of the two
endogenous variables, using as controls all their common causes (all variables
with arrows to both). The covariance estimate is the partial covariance: the
partial correlation times the product of the standard deviations of the two
endogenous variables.
6.
How are indirect effect sizes calculated based on betas (path coefficients)?
Simply
multiply out along the path, from the starting variable through the mediating
variable(s) to the dependent variable.
7.
How does one compute the total effect size, reflecting both direct and indirect
effects of one variable on another?
Simply
add the direct and indirect effect sizes.
8.
What is “effects decomposition”?
Listing
direct, indirect, and total effects for each causal variable with respect to
each endogenous variable.
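An effects decomposition can be sketched for a small recursive model. The three path coefficients below are hypothetical, and the recursive helper is only one way to enumerate the routes; it relies on the model having no feedback loops.

```python
# Hypothetical standardized path coefficients: A -> B, B -> C, and A -> C.
paths = {("A", "B"): 0.5, ("B", "C"): 0.4, ("A", "C"): 0.3}

def direct_effect(paths, src, dst):
    """Coefficient on the arrow from src to dst, or 0 if there is no arrow."""
    return paths.get((src, dst), 0.0)

def indirect_effect(paths, src, dst):
    """Sum of coefficient products over every route through a mediator."""
    total = 0.0
    for (start, mediator), coef in paths.items():
        if start == src and mediator != dst:
            # Multiply out along the path: everything from the mediator onward.
            onward = direct_effect(paths, mediator, dst) + indirect_effect(paths, mediator, dst)
            total += coef * onward
    return total

def total_effect(paths, src, dst):
    """Direct plus indirect effect."""
    return direct_effect(paths, src, dst) + indirect_effect(paths, src, dst)

direct_ac = direct_effect(paths, "A", "C")
indirect_ac = indirect_effect(paths, "A", "C")   # 0.5 * 0.4 through B
total_ac = total_effect(paths, "A", "C")
print(direct_ac, indirect_ac, total_ac)
```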
9.
Explain the tracing rule and model-estimated correlation.
The
direct, indirect, and total effects are examples of model estimates. The
tracing rule is a rule for identifying all the paths, the sum of effects of
which is the estimated correlation between two variables in the model. This
model-estimated correlation can be compared to the observed correlation to
assess the fit of the model to the data.
The
tracing rule is simply that the model-implied correlation between two variables
in a model is the sum of all valid paths (tracings) between the two variables.
These include the total effect (which is the sum of direct and indirect
effects) plus any associational effects due to correlated exogenous variables.
These associational effects are calculated by multiplying the correlation
between the exogenous variable under consideration with a second exogenous
variable, by this second exogenous variable’s total effect on the target
variable under consideration. In practice, mistakes are easy and one is wise
to eschew hand computation and instead rely on a model-fitting program like
LISREL or AMOS to compute the model-estimated correlations and model-estimated
covariances.
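For the simplest case, two correlated exogenous variables with arrows to one endogenous variable, the tracing rule can be sketched by hand. The three observed correlations are hypothetical. Because this small model is just-identified, the model-estimated correlations reproduce the observed ones exactly; a model with omitted paths generally would not.

```python
def standardized_betas(r12, r1y, r2y):
    """Path coefficients for y regressed on two correlated predictors."""
    denom = 1 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    return beta1, beta2

def implied_correlation(total_effect, r_with_other, other_total_effect):
    """Tracing rule: total effect plus the associational effect."""
    return total_effect + r_with_other * other_total_effect

# Hypothetical observed correlations.
r12, r1y, r2y = 0.5, 0.6, 0.4

beta1, beta2 = standardized_betas(r12, r1y, r2y)
implied_r1y = implied_correlation(beta1, r12, beta2)
implied_r2y = implied_correlation(beta2, r12, beta1)
print(beta1, beta2, implied_r1y, implied_r2y)
```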
[Chapter
5, pp. 125-150, assigned for SEM I]
1.
How does MLE relate to SEM?
Structural
coefficients in SEM may be computed any of several ways. Ordinarily, one will
get similar estimates by any of the methods.
· MLE.
Maximum likelihood estimation (MLE) is by far the most common method. Unless
the researcher has good reason, this default should be taken even if other
methods are offered by the modeling software. MLE makes estimates based on
maximizing the probability (likelihood) that the observed covariances are drawn
from a population assumed to be the same as that reflected in the coefficient
estimates. Unlike OLS regression estimates, MLE does not assume uncorrelated
error terms and thus may be used for non-recursive as well as recursive models.
· Starting
values. Note MLE is an iterative procedure in which either the researcher
or the computer must assign initial starting values for the estimates. Poor
starting values (ex., opposite in sign to the proper estimates) may cause MLE
to fail to converge on a solution. Sometimes the researcher is wise to override
manually computer-generated starting values.
· MLE
estimates of variances, covariances, and paths to disturbance terms.
Whereas MLE differs from OLS in estimating structural (path) coefficients
relating variables, it uses the same method (i.e., the observed values) as
estimates for the variances and covariances of the exogenous variables. Each
path from a latent endogenous variable to its disturbance term is set to 1.0,
thereby allowing SEM to estimate the variance of the disturbance term.
· OLS.
Ordinary least squares (OLS). This is the common form of multiple regression,
used in early, stand-alone path analysis programs. It makes estimates based on
minimizing the sum of squared deviations of the linear estimates from the
observed scores. However, even for path modeling of one-indicator variables,
MLE is still preferred in SEM because MLE estimates are computed simultaneously
for the model as a whole, whereas OLS estimates are computed separately in
relation to each endogenous variable.
· 2SLS. Two-stage least squares (2SLS) is an
estimation method which adapts OLS to handle correlated error and thus to
handle non-recursive path models. LISREL, one of the leading SEM packages, uses
2SLS to derive the starting coefficient estimates for MLE. MLE is preferred
over 2SLS for the same reasons given for OLS.
· GLS.
Generalized least squares (GLS) is an adaptation of OLS to minimize the sum of
the differences between observed and predicted covariances rather than between
estimates and scores. GLS and ULS (see below) require much less computation
than MLE and thus were common
· ULS.
Unweighted least squares (ULS) also focuses on the difference between observed
and predicted covariances, but does not adjust for differences in the metric
(scale) used to measure different variables, whereas GLS is scale-invariant,
and is usually preferred for this reason.
· ADF.
Asymptotically distribution-free (ADF) estimation does not assume multivariate
normality (whereas MLE, GLS, and ULS do). For this reason it may be preferred
where the researcher has reason to believe that MLE's multivariate normality
assumption has been violated. Note ADF estimation starts with raw data, not
just the correlation and covariance matrices. ADF is even more
computer-intensive than MLE and is accurate only with very large samples
(200-500 even for simple models, more for complex ones).
2.
In SEM, what is chi-square?
Chi-square.
This is the most common fit test, printed by all computer programs. Chi-square
tests the hypothesis that an unconstrained model (no direct arrows; variables
related randomly) fits the covariance/correlation matrix as well as the given
model. The chi-square value should not be significant if there is a good model
fit. LISREL refers to this simply as chi-square, but synonyms include the
chi-square fit index, chi-square goodness of fit, and chi-square
badness-of-fit. Chi-square approximates for large samples what in small samples
is called G², the generalized likelihood ratio, which is a function of FML and
sample size: chi-square = FML*(N - 1), where N = sample size.
3.
How can chi-square be misleading?
Note
three ways in which the chi-square test may be misleading:
The
more complex the model, the more likely a good fit. In a just-identified model,
with as many parameters as possible while still achieving a solution, there will be
a perfect fit. Put another way, chi-square tests the difference between the
researcher's model and a just-identified version of it, so the closer the researcher's
model is to being just-identified, the more likely good fit will be found.
The
larger the sample size, the more likely the rejection of the model and the more
likely a Type I error (rejecting something true). In very large samples, even
tiny differences between the observed model and the perfect-fit model may be
found significant.
The
chi-square fit index is also very sensitive to violations of the assumption of
multivariate normality.
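The sample-size sensitivity follows directly from the formula chi-square = FML*(N - 1). In the sketch below, the FML value and the 3 degrees of freedom are hypothetical; 7.815 is the tabled .05 critical value of chi-square for 3 degrees of freedom. The same degree of misfit is tolerated at N = 101 and rejected at N = 1001.

```python
CRITICAL_05_DF3 = 7.815  # tabled chi-square critical value, alpha = .05, df = 3

def model_chi_square(f_ml, n):
    """Model chi-square from the minimized ML fit function and the sample size."""
    return f_ml * (n - 1)

f_ml = 0.05  # hypothetical minimized fit-function value, identical in both samples

chi_small = model_chi_square(f_ml, 101)
chi_large = model_chi_square(f_ml, 1001)

rejected_small = chi_small > CRITICAL_05_DF3
rejected_large = chi_large > CRITICAL_05_DF3

print(chi_small, rejected_small)
print(chi_large, rejected_large)
```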
4.
What is GFI?
GFI
is one of a couple dozen goodness-of-fit measures used to assess the merits of
a SEM model.
Goodness-of-fit
index, GFI (Jöreskog-Sörbom GFI): GFI = 1 - FML/FO, where FO is the fit function
when all model parameters are zero. GFI varies from 0 to 1, but theoretically
can yield meaningless negative values. A large sample size pushes GFI up.
Though analogies are made to R-square, GFI cannot be interpreted as percent of
error explained by the model. Rather it is the percent of observed covariances
explained by the covariances implied by the model. That is, R² in multiple
regression deals with error variance whereas GFI deals with error in
reproducing the variance-covariance matrix. By convention, GFI should be equal
to or greater than .90 to accept the model. LISREL and AMOS both compute GFI.
Adjusted
goodness-of-fit index, AGFI. AGFI is a variant of GFI which uses mean squares
instead of total sums of squares in the numerator and denominator of 1 - GFI.
It, too, varies from 0 to 1, but theoretically can yield meaningless negative
values. AGFI > 1.0 is associated with just-identified models and models with
almost perfect fit. AGFI < 0 is associated with models with extremely poor
fit, or based on small sample size. AGFI should also be at least .90. LISREL
and AMOS both compute AGFI. AGFI's use has been declining.
5.
Which goodness of fit test(s) should be used out of the many available?
Goodness
of fit tests determine if the model being tested should be accepted or
rejected. These overall fit tests do not establish that particular paths within
the model are significant. If the model is accepted, the researcher will then
go on to interpret the path coefficients in the model ("significant"
path coefficients in poor fit models are not meaningful).
LISREL
prints 15 and AMOS prints 25 different goodness-of-fit measures, the choice of
which is a matter of dispute among methodologists. Jaccard and Wan (1996: 87)
recommend use of at least three fit tests, one from each of the first three
categories in StatNotes, so as to reflect diverse criteria. Kline (1998: 130)
recommends at least four tests, such as chi-square; GFI, NFI, or CFI; NNFI; and
SRMR.
6. Why might one not have a good model in spite
of having a good fit on a fit index?
There are several reasons:
· Each index has its own problems. That is why reporting several, not one, is
the standard procedure.
· A model can fit well overall, but particular parts may be very wrong.
· Large samples may yield significance even for very small differences.
· Equivalent models may fit as well or better.
7. How is model chi-square used to modify the researcher’s model?
Chi-square difference statistic. This measures the significance of the
difference between two SEM models of the same data, in which one model is a
subset of (is nested within) the other. It is simply the chi-square fit
statistic for the simpler, more constrained model minus the corresponding value
for the more complex model. The degrees of freedom (df) for this difference is
likewise the df for the simpler model minus the df for the more complex one. If
the chi-square difference is not significant, then the two models have
comparable fit to the data.
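The chi-square difference test described above can be sketched in a few lines of Python. This is a minimal illustration, not output from LISREL or AMOS: the function names (chi2_sf, chi_square_difference) and the fit values in the example are hypothetical, and the p-value is computed with a standard closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def chi_square_difference(chisq_simple, df_simple, chisq_complex, df_complex,
                          alpha=0.05):
    """Compare a simpler (nested) model with the more complex model it is nested in."""
    diff = chisq_simple - chisq_complex
    df_diff = df_simple - df_complex
    if df_diff < 1:
        raise ValueError("the simpler model must have more df than the complex one")
    p = chi2_sf(diff, df_diff)
    return diff, df_diff, p, p < alpha


# Hypothetical fit statistics, for illustration only
diff, df_diff, p, simpler_is_worse = chi_square_difference(18.3, 6, 10.1, 4)
print(f"chi-square difference = {diff:.2f}, df = {df_diff}, p = {p:.4f}")
print("simpler model fits significantly worse:", simpler_is_worse)
```

With these made-up numbers the difference is significant, so the simpler model would be judged to fit significantly worse than the more complex one.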
Model-trimming. Most modification is by way of model-trimming, which is
deleting one path at a time until a significant chi-square difference indicates
trimming has gone too far. As paths are trimmed, chi-square tends to increase,
indicating a worse model fit and also increasing chi-square difference. That
is, a significant chi-square difference indicates dropping a path means the fit
of the simpler model is significantly worse than for the more complex model.
Naturally, dropping paths should be done only if consistent with theory and
face validity.
Model-building is the opposite strategy of starting with the null model or a
simple model and adding paths one at a time, retaining those which yield a
significant chi-square difference. As paths are added to the model, chi-square
tends to decrease, indicating a better fit and also increasing the chi-square
difference. That is, a significant chi-square difference indicates adding a
path means the fit of the more complex model is significantly better than for
the simpler one. Adding paths should be done only if consistent with theory and
face validity.
Non-hierarchical model comparisons. Model-building and model-trimming involve
comparing a model which is a subset of another. Chi-square difference cannot be
used directly for non-hierarchical models. This is because model fit by
chi-square is partly a function of model complexity, with more complex models
fitting better. For non-hierarchical model comparisons, the researcher should
use a fit index which penalizes for complexity (rewards parsimony), such as
AIC.
8. How is the modification index used to revise models?
Modification indexes (MI) are also called Lagrange Multiplier statistics. The
improvement in fit is measured by a reduction in chi-square, which makes the
chi-square fit index less likely to be found significant (recall a finding of
significance corresponds to rejecting the model as one which fits the data).
For each fixed or constrained parameter (coefficient), the modification index
is a measure of the predicted decrease in chi-square if that single fixed
parameter or equality constraint is freed (for a path fixed at 0, by adding the
path to the model) and the model is reestimated.
In the case of modification indexes for covariances, the MI has to do with the
decrease in chi-square if the two error term variables are allowed to
correlate. In the case of MI for regression weights, the MI has to do with the
decrease in chi-square if a path between the two variables, currently fixed at
0, is added, requiring estimation of that weight in the model. One arbitrary
rule of thumb is to consider freeing parameters whose modification index
exceeds 100. However, another common strategy is simply to free the parameter
with the largest MI, then see the effect as measured by the chi-square fit
index. Naturally, adding paths or allowing correlated error terms should only
be done when it makes substantive as well as statistical sense to do so. LISREL
and AMOS both compute modification indexes.
Multivariate
MI, also called the multivariate Lagrange Multiplier, is a variant in EQS
software output, providing a modification index for allowing an entire set of
structure coefficients constrained to 0 (no direct paths) in the researcher's
model to be allowed to vary instead.
9. How are correlation residuals used to revise models?
Correlation residuals are the differences between model-estimated correlations
and observed correlations. The variables most likely to be in need of being
respecified in the model are apt to be those with the larger correlation
residuals (the usual cutoff is an absolute value > .10). Having all correlation
residuals < .10 in absolute value is sometimes used, along with fit indexes, to
define "acceptable fit" for a model.
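As a rough sketch of how the .10 screening rule might be applied, the Python below subtracts a model-implied correlation matrix from an observed one and lists the variable pairs whose residual exceeds the cutoff in absolute value. The function name, the variable names A, B, C, the sign convention (observed minus implied), and both matrices are hypothetical and chosen only for illustration.

```python
def correlation_residuals(observed, implied, names, cutoff=0.10):
    """Return (var_i, var_j, residual) for pairs with |observed - implied| > cutoff."""
    flagged = []
    for i in range(len(names)):
        for j in range(i):  # lower triangle only; the diagonal is always 0
            resid = observed[i][j] - implied[i][j]
            if abs(resid) > cutoff:
                flagged.append((names[i], names[j], round(resid, 3)))
    return flagged


# Hypothetical matrices, for illustration only
names = ["A", "B", "C"]
observed = [[1.00, 0.40, 0.35],
            [0.40, 1.00, 0.50],
            [0.35, 0.50, 1.00]]
implied = [[1.00, 0.38, 0.20],
           [0.38, 1.00, 0.47],
           [0.20, 0.47, 1.00]]

flagged = correlation_residuals(observed, implied, names)
print(flagged)
```

Here only the A-C pair is flagged, which would point the researcher toward the part of the model relating those two variables.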
10.
What are “equivalent models” and what does Kline want researchers to do about
them?
Most
models have alternative specifications which would result in the same estimated
correlations and covariances among the variables. The Lee-Hershberger
replacement rules detail how the researcher can respecify to construct
mathematically equivalent models. Kline notes that very few researchers
actually bother to compare their refined model with equivalents, a practice he
condemns.
[Chapter 5, pp. 150-154, assigned for path analysis]
1.
How is the significance of a path coefficient assessed (compare Kline with
StatNotes)?
Kline gives some formulas, but a simpler approach is available: the path
coefficient is the beta weight, and the beta has the same significance as that
given by SPSS for the unstandardized b coefficient.
2.
How is the significance of a multiple-leg path assessed?
Each
and all of the path coefficients must be significant. For three-leg paths or
simpler, Kline gives alternative formulas (p. 150).
3.
How do you assess the significance of the total (direct and indirect) effect of
exogenous variable x on endogenous variable y?
Run
a regression with y as dependent and all others as independents, leaving out
any variable which mediates between x and y. The significance of the b or beta
for x in this equation is a test of the significance of the total effect.
Chapter
6, Structural Models with Observed Variables and Path Analysis, II:
Nonrecursive Models, Multiple Group Analysis
1.
What is a disturbance term?
It
is an error term for an endogenous variable. It represents the effects on that
variable of all unmeasured causes.
2.
What is recursivity?
A
model is recursive if all direct effects (straight arrows in the diagram) are
one-way, without feedback loops, and the disturbance terms for the
endogenous variables are uncorrelated with each other (if they are correlated,
feedback loops form).
3.
Isn’t recursivity an assumption of path analysis and SEM? Explain.
Recursivity
is sometimes said to be an assumption, but it is not. All recursive models are
identified and thus can be solved, whereas non-recursive models may not
be identified and thus may not yield a unique solution. The point of Chapter 6
is to understand which non-recursive models may be identified.
4.
What does it mean to say a model is “identified”? Do researchers want an
underidentified or overidentified model?
A model is identified if it has mathematical properties which allow a unique
solution. Researchers want more knowns than unknowns, which corresponds to an
overidentified model. An underidentified model is not solvable. Note that
"determined," "overdetermined," and "underdetermined" are sometimes used as
synonyms for identified, overidentified, and underidentified.
5.
What is the easiest way to determine if a model is underidentified, and why
should the researcher determine this before collecting data?
If the model is recursive, one may assume identification, since all recursive
models are identified. Otherwise, the easiest way is to run SEM on pretest or
fictional data prior to data collection, since this will usually reveal
underidentification. One good reason to do this before collecting data is that
one solution to underidentification is adding more exogenous variables, which
must be done prior to collecting data. If the model is underidentified, the
program may issue an error message (e.g., failure to converge), generate
nonsensical estimates (e.g., negative error variances), display very large
standard errors for one or more path coefficients, yield unusually high
correlation estimates (e.g., over .9) among the estimated path coefficients,
and/or even stall or crash. The Amos package notifies the researcher of
identification problems and suggests solutions, such as adding more constraints
to the model.
6. What options does the researcher have if it
is found his/her model is underidentified?
If a model is underidentified, then one must do one or more of the following
(not all model-fitting computer packages support all strategies, and not all
are discussed in Chapter 6):
· Eliminate feedback loops and reciprocal effects.
· Specify at fixed levels any coefficient estimates whose magnitude is reliably
known.
· Simplify the model by reducing the number of arrows, which is the same as
constraining a path coefficient estimate to 0.
· Simplify the model by constraining a path estimate (arrow) in other ways:
equality (it must be the same as another estimate), proportionality (it must be
proportional to another estimate), or inequality (it must be more than or less
than another estimate).
· Consider simplifying the model by eliminating variables.
· Add exogenous variables (which, of course, is usually possible only if this
need is considered prior to gathering data).
If
MLE (maximum likelihood estimation) is being used to estimate path
coefficients, two other remedies may help, if the particular computer program
allows these adjustments:
· Substitute
researcher "guesstimates" as starting values in place of
computer-generated starting values for the estimates.
· Increase
the maximum number of iterations the computer will attempt in seeking
convergence.
7.
What is “empirical underidentification”?
A
model can be theoretically identified but still not solvable due to such
empirical problems as high multicollinearity in any model, or path
estimates close to 0 in non-recursive models.
8.
What is multiple group analysis and how does it work?
Multiple
group analysis is a method of determining if a grouping variable affects a
model. It may be implemented only if the same measurement model is applicable
to both groups.
Multiple
group analysis is implemented by running two separate two-group analyses, first
with no constraints and then again with the constraint that the loadings for
the indicator variables on their respective latent variables be the same for
both groups, and/or that the path estimates be the same for the two groups,
and/or that the error term variances in the two groups be equal. There is
disagreement among methodologists on just which and how many constraints define
"same measurement model." Regardless, this approach is called multiple
group path analysis and can be extended to more than two groups.
If
the goodness of fit is similar for both the constrained and unconstrained
analyses, then the path coefficients for the model as applied to the two groups
separately may be compared. If the fit of the constrained model is worse than
that for the corresponding unconstrained model, then the researcher concludes
that model direct effects differ by group.
9.
What is two-stage least squares and how does it relate to non-recursivity?
Two-stage least squares regression (2SLS) has three related uses. First, it is
a method of extending regression to cover models which violate ordinary least
squares (OLS) regression's assumption of recursivity, specifically models where
the researcher must assume that the disturbance term of the dependent variable
is correlated with the cause(s) of the independent variable(s). Second, 2SLS is
used for the same purpose to extend path analysis, except that in path models
there may be multiple endogenous variables rather than a single dependent
variable. Third, 2SLS is an older, less-used alternative to maximum likelihood
estimation (MLE) in estimating path parameters of non-recursive models in
structural equation modeling (SEM).
Maximum likelihood estimation (MLE) is generally preferred over 2SLS for
estimating path parameters in non-recursive models because the MLE estimates
take the entire model into account, whereas 2SLS estimates are computed based
on one portion of the model at a time. That is, MLE is a "full information"
technique whereas 2SLS is a "partial information" technique. The bottom line is
that for overidentified models, MLE estimates are generally better than 2SLS
estimates.
It
is true that path estimation in structural equation modeling (SEM) typically
uses maximum likelihood estimation (MLE) for non-recursive models. However,
two-stage least squares (2SLS), not being an iterative strategy like MLE, is
faster computationally and requires less computer memory. It also does not
require the computer or researcher to posit starting points for the estimates,
mistakes in which may (rarely) lead to lack of convergence in MLE.
However,
use of 2SLS probably indicates one is reading an older article, or an article
by a researcher who has access to 2SLS software but not to SEM software.
10.
What are the observations/parameters test, order condition test, and rank
condition test used for? (It is not necessary to explain the mechanics of these
tests).
These
tests are used to determine in advance if a nonrecursive model is identified.
The first two are necessary conditions, while the third is sufficient. It may
be easier, however, simply to run a SEM package on pretest or fictional data.
What
follows is the mechanics of the tests (probably skip in class):
Non-recursive models involving all possible correlations among the disturbance
terms of the endogenous variables. The correlation of disturbance terms, of
course, means the researcher is assuming that the unmeasured variables which
are also determinants of the endogenous variables are all correlated among
themselves. This introduces non-recursivity in the form of feedback loops.
Still, such a model may be identified if it meets the rank condition test,
which implies it also meets the observations/parameters test and the order
condition test. These last two are necessary but not sufficient to assure
identification, whereas the rank condition test is a sufficient condition.
These tests are discussed below.
Non-recursive
models with variables grouped in blocks. The relation of the blocks is
recursive. Variables within any block may not be recursively related, but
within each block the researcher assumes the existence of all possible correlations
among the disturbance terms of the endogenous variables for that block.
Such a model may be identified if each block passes the tests for non-recursive
models involving all possible correlations among the disturbance terms of its
endogenous variables, as discussed above.
Non-recursive models assuming only some disturbance terms of the endogenous
variables are correlated. Such a model may be identified if it passes the
observations/parameters test, but even then this needs to be confirmed by
running a model-fitting program on test data to see if a solution is possible.
Tests
related to non-recursive models:
Observations/parameters
test:
Models cannot be identified, and hence are not solvable, unless they have as
many or more observations than parameters. This is an important necessary but
not sufficient condition.
Observations
are coefficients the researcher has for the model. The total of observations is
the total number of coefficients that can be plugged into equations used to
estimate the unknown parameters, such as the path coefficients. Let v be the
number of observed variables in the model. Then the number of observations is
the number of variances and covariances, equal to [v(v+1)/2]. For instance, 4
variables have 4 variances and 6 unique covariances = [4(4+1)/2] = 10
observations.
Parameters
are what can vary in the model. What can vary are the path coefficients for any
arrows, and the variances and covariances of the exogenous variables and the
disturbance terms. Total parameters are equal to [p + c + e + d], where p is
the number of straight arrows in the model, denoting the paths; c is the number
of curved arrows in the model, denoting the correlations of exogenous variables
or of disturbance terms; e is the number of exogenous variables, with a
variance; and d is the number of disturbance terms, each with a variance.
If
there are 4 variables in the model, p could be 6 (arrows from A to B, C, and D;
from B to C and D; and from C to D); c could be 3 if the disturbance terms for
B and C, B and D, and C and D were posited to be correlated; e would be 1 (A is
the 1 exogenous variable); and d could be 3 (if each endogenous variable has a
disturbance term). Thus the total parameters in this model could be 6 + 3 + 1 +
3 = 13. Recall the number of observations for this example is 10. When there
are more things that can vary in the model (parameters) than there are fixed
facts (observations), the model is too complex to be solved. That is, in this
case there are [13 - 10] = 3 more parameters than observations, hence the model
is underidentified.
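The counting in the four-variable example above can be checked with a short Python sketch. The function names are hypothetical; the formulas are the ones given in the text, v(v+1)/2 for observations and p + c + e + d for parameters.

```python
def n_observations(v):
    """Number of variances and unique covariances among v observed variables."""
    return v * (v + 1) // 2


def n_parameters(p, c, e, d):
    """Straight arrows + curved arrows + exogenous variances + disturbance variances."""
    return p + c + e + d


def counting_check(v, p, c, e, d):
    """Apply the observations/parameters test (necessary, not sufficient)."""
    obs = n_observations(v)
    par = n_parameters(p, c, e, d)
    if par > obs:
        verdict = "underidentified: more parameters than observations"
    else:
        verdict = "passes the counting test (identification still not guaranteed)"
    return obs, par, obs - par, verdict


# The example from the text: 4 variables, p = 6, c = 3, e = 1, d = 3
obs, par, surplus, verdict = counting_check(v=4, p=6, c=3, e=1, d=3)
print(obs, par, surplus, verdict)
```

The third value returned is observations minus parameters, so a negative number (here -3) signals the shortfall described in the text.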
Order
condition test:
Excluded
variables are endogenous or exogenous variables which have no direct effect
on (have no arrow going to) any other endogenous variable. The order condition
test is met if the number of excluded variables equals or is greater than one
less than the number of endogenous variables.
Rank
condition test:
Rank
refers to the rank of a matrix and is best dealt with in matrix algebra. In
effect, the rank condition test is met if every endogenous variable which is
located in a feedback loop can be distinguished because each has a unique
pattern of direct effects on endogenous variables not in the loop. To test
manually without matrix algebra, first construct a system matrix, in
which the column headers are all variables and the row headers are the
endogenous variables, and the cell entries are either 0's (indicating excluded
variables with no direct effect on any other endogenous variable) or 1's
(indicating variables which do have a direct effect on some endogenous variable
in the model). Then repeat the following steps for each endogenous variable,
each time starting with the original system matrix:
· Cross out the row for the given endogenous variable.
· Cross out any column which had a 1 in the row, now crossed-out, for the given
endogenous variable.
· Simplify the matrix by removing the crossed-out row and columns.
· Cross out any row which is all 0's in the simplified matrix. Simplify the
matrix further by removing the crossed-out row.
· Cross out any row which is a duplicate of another row. Simplify the matrix
further by removing the crossed-out row.
· Cross out any row which is the sum of two or more other rows. Simplify the
matrix further by removing the crossed-out row.
· Note the rank of the remaining simplified matrix. The rank is the number of
remaining rows. The rank condition for the given endogenous variable is met if
this rank is equal to or greater than one less than the number of endogenous
variables in the model.
The rank test is met for the model if the rank condition is met for all
endogenous variables.
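The manual procedure above can be approximated in Python. This sketch makes two assumptions that should be read as such. First, it takes the system matrix to have one row per endogenous variable and one column per variable, with a 1 where the column variable is the row variable itself or has a direct effect on it; this is a common textbook convention, assumed here because the description above is brief. Second, instead of crossing out zero, duplicate, and summed rows by hand, it computes the rank of the reduced matrix directly by Gaussian elimination, which serves the same purpose. The function names and the two small example models are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def rank_condition(system, endogenous):
    """For each endogenous variable, report whether the rank condition is met."""
    needed = len(endogenous) - 1
    results = {}
    for i, name in enumerate(endogenous):
        # Drop this variable's row and every column holding a 1 in that row
        keep_cols = [j for j, cell in enumerate(system[i]) if cell == 0]
        reduced = [[row[j] for j in keep_cols]
                   for k, row in enumerate(system) if k != i]
        results[name] = matrix_rank(reduced) >= needed
    return results


# Columns: X1, X2, Y1, Y2. Hypothetical model: X1 -> Y1, X2 -> Y2, Y1 <-> Y2
model_a = [[1, 0, 1, 1],
           [0, 1, 1, 1]]
# Same model without the X2 -> Y2 path
model_b = [[1, 0, 1, 1],
           [0, 0, 1, 1]]

print(rank_condition(model_a, ["Y1", "Y2"]))
print(rank_condition(model_b, ["Y1", "Y2"]))
```

In the first model each endogenous variable has its own exogenous cause, and the condition is met for both. In the second, nothing distinguishes Y1's equation, and the condition fails for Y1.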
Chapter
7: Measurement Models and Confirmatory Factor Analysis
1.
What distinguishes SEM from path analysis?
In
path analysis, each latent variable (construct) is measured by a single
indicator. SEM can be thought of as a combination of path analysis and factor
analysis, with the latter used to create the latent variables which are the
exogenous and endogenous variables in the model. The model-fitting programs
used for SEM also allow the researcher to set any number of model constraints
(ex., correlated disturbances; equal disturbance terms; etc.).
2.
In section 7.2, Kline discusses various aspects of validity. Why, in relation
to CFA or SEM?
All
techniques, including CFA and SEM, are subject to errors of validity. Poor convergent
validity among the indicators for a factor, for instance, may mean the model
needs to have more factors. Take the time to explore validity.htm on the class
website.
3.
What is attenuation and how is it related to reliability? to CFA and SEM?
Attenuation is the tendency for the estimate of r (correlation) to be
artificially low due to measurement error or restriction of the data range.
Both lower the reliability coefficient, which may be seen as the correlation of
a variable with itself. The correction for attenuation of a correlation,
$r_{xy}$, is a function of the reliabilities of the two variables, $r_{xx}$ and
$r_{yy}$:
$$\hat{r}_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}$$
where $\hat{r}_{xy}$ is the corrected (disattenuated) correlation.
Both
CFA and SEM analyze correlation (and covariance) matrices. If the entries are
attenuated, corresponding relationships may be underestimated.
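As a small worked illustration of the correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities), the Python sketch below uses a hypothetical function name and made-up values.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Observed correlation divided by the square root of the product of reliabilities."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = .30, reliabilities .80 and .70
corrected = correct_for_attenuation(0.30, 0.80, 0.70)
print(round(corrected, 3))
```

With these values the corrected correlation is about .40, noticeably larger than the observed .30, which shows how unreliable measures can mask a relationship.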
4.
How is CFA related to SEM?
Confirmatory factor analysis (CFA) seeks to determine if the number of factors
and the loadings of measured (indicator) variables on them conform to what is
expected on the basis of pre-established theory. The researcher's a priori
assumption is that each factor (the number and labels of which may be specified
a priori) is associated with a specified subset of indicator variables. A
minimum requirement of confirmatory factor analysis is that one hypothesize
beforehand the number of factors in the model, but usually also expectations
about which variables will load on which factors (Kim and Mueller, 1978b: 55).
The researcher seeks to determine, for instance, if measures created to
represent a latent variable really belong together.
Confirmatory factor analysis can also mean the analysis of alternative factor
models using a structural equation modeling package. While SEM is typically
used to model causal relationships among latent variables, it is equally
possible to use SEM to explore CFA measurement models. SEM packages allow the
researcher to specify any of a wide variety of model constraints, estimate path
coefficients, then assess the goodness-of-fit between estimated and observed
coefficients as a gauge of the merit of alternative models. If only the paths
from the latent variables (factors) to their respective indicators are
examined, with relations between latent variables either specified as 0
(orthogonal) or allowed to correlate (oblique) but not represented as one-way
causal paths, then SEM is being used to evaluate CFA models.
Using
SEM, the researcher can explore CFA models with or without the assumption of
certain correlations among the error terms of the indicator variables. Such
measurement error terms represent causes of variance due to unmeasured
variables as well as random measurement error. Depending on theory, it may well
be that the researcher should assume unmeasured causal variables will be shared
by indicators or will correlate, and thus SEM testing may well be merited. That
is, including correlated measurement error in the model tests the possibility
that indicator variables correlate not just because of being caused by a common
factor, but also due to common or correlated unmeasured variables. This
possibility would be ruled out if the fit of the model specifying uncorrelated
error terms was as good as the model with correlated error specified. In this
way, testing of the confirmatory factor model may well be a desirable validation
stage preliminary to the main use of SEM to model the causal relations among
latent variables.
Using SEM, the redundancy test is to use chi-square difference (discussed in
the section on structural equation modeling) to compare an original multifactor
model with one which is constrained by forcing all correlations among the
factors to be 1.0. If the constrained model is not significantly worse than the
unconstrained one, the researcher concludes that a one-factor model would fit
the data as well as a multi-factor one and, on the principle of parsimony, the
one-factor model is to be preferred.
Using SEM, the measurement invariance test is to use chi-square difference to
assess whether a set of indicators reflects a latent variable equally well
across groups in the sample. The constrained model is one in which factor
loadings are specified to be equal for each class of the grouping variable. If
the constrained model is not significantly worse, then the researcher concludes
the indicators are valid across groups. This procedure is also called multiple
group CFA. If the model fails this test, then it is necessary to examine each
indicator for group invariance, since some indicators may still be invariant.
This procedure, called the partial measurement invariance test, is discussed by
Kline (1998: 225 ff.).
Using
SEM, the orthogonality test is similar to the redundancy test, but factor
correlations are set to 0. If the constrained model is not significantly worse
than the unconstrained one, the factors in the model can be considered
orthogonal (uncorrelated, independent). This test requires at least three
indicators per factor.
5.
When is a confirmatory factor analysis (CFA) model identified in SEM?
CFA
models in SEM have no causal paths (straight arrows in the diagram) connecting
the latent variables. The latent variables may be allowed to correlate (oblique
factors) or be constrained to 0 covariance (orthogonal factors). CFA analysis
in SEM usually focuses on analysis of the error terms of the indicator
variables (see previous question and answer). Like other models, CFA models in
SEM must be identified for there to be a unique solution.
In a standard CFA model each indicator is specified to load only on one factor,
measurement error terms are specified to be uncorrelated with each other, and
all factors are allowed to correlate with each other. One-factor standard
models are identified if the factor has three or more indicators. Multi-factor
standard models are identified if each factor has two or more indicators.
Non-standard CFA models, where indicators load on multiple factors and/or
measurement errors are correlated, may nonetheless be identified. It is
probably easiest to test identification for such models by running SEM on
pretest or fictional data for the model, since SEM programs normally generate
error messages signaling any underidentification problems. Non-standard models
will not be identified if there are more parameters than observations.
(Observations equal v(v+1)/2, where v is the number of observed indicator
variables in the model. Parameters equal the number of unconstrained arrows
from the latent variables to the indicator variables [the constrained arrows
are the one per latent variable constrained to 1.0, used to set the metric for
that latent variable], plus the number of two-headed arrows in the model
[indicating correlation of factors and/or of measurement errors], plus the
number of variances [which equals the number of indicator variables plus the
number of latent variables].) Note that meeting the observations >= parameters
test does not guarantee identification, however.
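The parameter count described in the parenthetical above can be sketched in Python for a CFA model. The function name and the example (two correlated factors with three indicators each and uncorrelated errors) are hypothetical; the counting rules follow the text.

```python
def cfa_counts(n_indicators, n_factors, n_free_loadings, n_two_headed_arrows):
    """Count observations and free parameters for a CFA model, per the rules in the text."""
    observations = n_indicators * (n_indicators + 1) // 2
    # One variance per indicator (its error term) and one per latent variable
    variances = n_indicators + n_factors
    parameters = n_free_loadings + n_two_headed_arrows + variances
    return {
        "observations": observations,
        "parameters": parameters,
        "df": observations - parameters,
        "passes_counting_test": parameters <= observations,
    }


# Hypothetical model: 2 factors, 3 indicators each. One loading per factor is
# fixed to 1.0 to set the metric, leaving 6 - 2 = 4 free loadings. The single
# two-headed arrow is the correlation between the two factors.
counts = cfa_counts(n_indicators=6, n_factors=2,
                    n_free_loadings=4, n_two_headed_arrows=1)
print(counts)
```

For this example there are 21 observations and 13 parameters, so the model passes the counting test with 8 degrees of freedom; as the text notes, passing this test is necessary but does not by itself guarantee identification.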
6. Do severe departures from normal distributions matter in SEM?
Yes. SEM assumes a multivariate normal distribution of the indicators; that is,
each indicator should be normally distributed for each value of each other
indicator. Even small departures from multivariate normality can lead to large
differences in the chi-square test, undermining its utility. In general,
simulation studies (Kline, 1998: 209) suggest that under conditions of severe
non-normality of data, SEM parameter estimates (e.g., path estimates) are still
fairly accurate but the corresponding test statistics are too high. This means
that for significance tests of parameters, there is a bias toward Type I errors
(considering the parameter significant when it is not). Recall that for the
chi-square test of goodness of fit of the model as a whole, the chi-square
value should not be significant if there is a good model fit. Lack of
multivariate normality inflates the chi-square statistic such that the overall
chi-square fit statistic for the model as a whole is also biased toward Type I
error (rejecting a model which should not be rejected).