Overview
A non-technical analogy: A mother sees various bumps and shapes under a blanket at the bottom of a bed. When one shape moves toward the top of the bed, all the other bumps and shapes move toward the top also, so the mother concludes that what is under the blanket is a single thing, most likely her child. Similarly, factor analysis takes as input a number of measures and tests, analogous to the bumps and shapes. Those that move together are considered a single thing, which it labels a factor. That is, in factor analysis the researcher is assuming that there is a "child" out there in the form of an underlying factor, and he or she takes simultaneous movement (correlation) as evidence of its existence. If correlation is spurious for some reason, this inference will be mistaken, of course, so it is important when conducting factor analysis that possible variables which might introduce spuriousness, such as anteceding causes, be included in the analysis and taken into account.
Factor analysis is part of the general linear model (GLM) family of procedures and makes many of the same assumptions as multiple regression: linear relationships, interval or near-interval data, untruncated variables, proper specification (relevant variables included, extraneous ones excluded), lack of high multicollinearity, and multivariate normality for purposes of significance testing.
Factor analysis generates a table in which the rows are the observed raw indicator variables and the columns are the factors or latent variables which explain as much of the variance in these variables as possible. The cells in this table are factor loadings, and the meaning of the factors must be induced from seeing which variables are most heavily loaded on which factors. This inferential labeling process can be fraught with subjectivity as diverse researchers impute different labels.
There are several different types of factor analysis, with the most common being principal components analysis (PCA), which is preferred for purposes of data reduction. However, common factor analysis is preferred for purposes of causal analysis and for confirmatory factor analysis in structural equation modeling, among other settings.
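The loading table described in the overview can be illustrated with a minimal sketch of principal components extraction. It assumes the input is a correlation matrix and that NumPy is available; the matrix R, the function name pca_loadings, and the choice of two components are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_loadings(R, n_components):
    """Return (loadings, eigenvalues) for the leading principal components of R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # A loading is the correlation between an observed variable and a component:
    # eigenvector entry times the square root of the component's eigenvalue.
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return loadings, eigvals

# Hypothetical correlation matrix for four indicators (illustration only):
# variables 1-2 move together, as do variables 3-4.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

loadings, eigvals = pca_loadings(R, n_components=2)
communalities = (loadings ** 2).sum(axis=1)     # variance of each variable explained by the two components

print(np.round(loadings, 3))    # rows = variables, columns = components
print(np.round(communalities, 3))
```

The sign of each column is arbitrary (an eigenvector and its negative are equally valid), so only the pattern and magnitude of the loadings should be interpreted.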
There are two approaches to confirmatory factor analysis:
The SEM Approach. Confirmatory factor analysis can mean the analysis of alternative measurement (factor) models using a structural equation modeling package such as AMOS or LISREL. While SEM is typically used to model causal relationships among latent variables (factors), it is equally possible to use SEM to explore CFA measurement models. This is done by removing from the model all straight arrows connecting latent variables, adding curved arrows representing covariance between every pair of latent variables, and leaving in the straight arrows from each latent variable to its indicator variables as well as leaving in the straight arrows from error and disturbance terms to their respective variables. Such a measurement model is run like any other model and is evaluated like other models, using goodness of fit measures generated by the SEM package.
PCA is generally used when the research purpose is data reduction (to reduce the information in many measured variables into a smaller set of components). PFA is generally used when the research purpose is to identify latent variables which contribute to the common variance of the set of measured variables, excluding variable-specific (unique) variance.
Warning: Simulations comparing factor analysis with structural equation modeling (SEM) indicate that, at least in some circumstances, factor analysis may fail to identify the correct number of latent variables, or even to come close. While factor analysis may demonstrate that a particular model with a given predicted number of latent variables is not inconsistent with the data, researchers should understand that other models with different numbers of latent variables may also have good fit by SEM techniques.
There are different methods of extracting the factors from a set of data. The method chosen will matter more to the extent that the sample is small, the variables are few, and/or the communality estimates of the variables differ.
A Q-mode issue has to do with negative factor loadings. In conventional factor analysis of variables, loadings are loadings of variables on factors and a negative loading indicates a negative relation of the variable to the factor. In Q-mode factor analysis, loadings are loadings of cases (often individuals) on factors and a negative loading indicates that the case/individual displays responses opposite to those who load positively on the factor. In conventional factor analysis, a loading approaching zero indicates the given variable is unrelated to the factor. In Q-mode factor analysis, a loading approaching zero indicates the given case is near the mean for the factor. Cluster analysis is now more common than Q-mode factor analysis. Note, however, that correlations in factor analysis are treated in a general linear model which takes control variables into account, whereas cluster analysis uses correlations simply as similarity measures. For this reason, some researchers still prefer Q-mode factor analysis for clustering analysis.
The following modes are rare.
Before dropping a factor below one's cut-off, however, the researcher should check its correlation with the dependent variable. A very small factor can have a large correlation with the dependent variable, in which case it should not be dropped. Also, as a rule of thumb, factors should have at least three high, interpretable loadings -- fewer may suggest that the researcher has asked for too many factors.
The reproduced correlation residuals matrix may help the researcher to identify particular correlations which are ill reproduced by the factor model with the current number of factors. By experimenting with different models with different numbers of factors, the researcher may assess which model best reproduces the correlations which are most critical to his or her research purpose.
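As a rough sketch of this comparison, the reproduced correlations can be computed as the loading matrix times its transpose, and the residuals as the observed correlations minus the reproduced ones. The example assumes principal components extraction, NumPy, and a hypothetical correlation matrix R; the helper names are illustrative only.

```python
import numpy as np

def pca_loadings(R, k):
    """Loadings of the k leading principal components of correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def reproduced_residuals(R, loadings):
    """Return (reproduced correlations, off-diagonal residuals)."""
    reproduced = loadings @ loadings.T
    residuals = R - reproduced
    np.fill_diagonal(residuals, 0.0)   # the diagonal holds uniqueness, not a correlation residual
    return reproduced, residuals

# Hypothetical correlation matrix (illustration only).
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

# Compare how well models with different numbers of factors reproduce R.
max_resid = {}
for k in (1, 2):
    _, resid = reproduced_residuals(R, pca_loadings(R, k))
    max_resid[k] = np.abs(resid).max()
    print(k, "factor(s): largest absolute residual =", round(max_resid[k], 3))
```

Inspecting the full residual matrix, rather than only its maximum, shows which particular correlations a given model reproduces poorly.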
Oblique rotations, discussed below, allow the factors to be correlated, and so a factor correlation matrix is generated when oblique is requested. Normally, however, an orthogonal method such as varimax is selected and no factor correlation matrix is produced as the correlation of any factor with another is zero.
MATRIX DATA VARIABLES=varlist.
BEGIN DATA
MEAN meanslist
STDDEV stddevlist
CORR 1
CORR .22 1
CORR -.58 .88 1
CORR .33 .02 -.17 1
END DATA.
EXECUTE.
where
varlist is a list of variable names separated by commas
meanslist is a list of the means of variables, in the same order as varlist
stddevlist is a list of standard deviations of variables, in the same order
CORR statements define a correlation matrix, with variables in the same order (data above are for illustration; one may have more or fewer CORR statements as needed according to the number of variables).
Note the period at the end of the MATRIX DATA and END DATA commands.
Then, if the MATRIX DATA command is part of the same syntax file, add the FACTOR command as usual but include the subcommand "/MATRIX=IN(COR=*)" (without the quote marks). If the MATRIX DATA command is not part of the same syntax set but has been run earlier, the matrix data file name is substituted for the asterisk.
Using confirmatory factor analysis in structural equation modeling, having several or even a score of indicator variables for each factor will tend to yield a model with more reliability, greater validity, higher generalizability, and stronger tests of competing models, than will CFA with two or three indicators per factor, all other things equal. However, the researcher must take account of the statistical artifact that models with fewer variables will yield apparently better fit as measured by SEM goodness of fit coefficients, all other things equal.
However, "the more, the better" may not be true when there is a possibility of suboptimal factor solutions ("bloated factors"). Too many overly similar items will mask true underlying factors, leading to suboptimal solutions. For instance, items like "I like my office," "My office is nice," "I like working in my office," etc., may create an "office" factor when the researcher is trying to investigate the broader factor of "job satisfaction." To avoid suboptimization, the researcher should start with a small set of the most defensible (highest face validity) items which represent the range of the factor (ex., ones dealing with work environment, coworkers, and remuneration in a study of job satisfaction). Assuming these load on the same job satisfaction factor, the researcher then should add one additional variable at a time, adding only items which continue to load on the job satisfaction factor, and noting when the factor begins to break down. This stepwise strategy results in the most defensible final factors.
There is a KMO statistic for each individual variable, as well as an overall KMO statistic computed across all variables. KMO varies from 0 to 1.0 and KMO overall should be .60 or higher to proceed with factor analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values, until KMO overall rises above .60. (Some researchers use a more lenient .50 cut-off.)
To compute KMO overall, the numerator is the sum of squared correlations of all variables in the analysis (except the 1.0 self-correlations of variables with themselves, of course). The denominator is this same sum plus the sum of squared partial correlations of each variable i with each variable j, controlling for others in the analysis. The concept is that the partial correlations should not be very large if one is to expect distinct factors to emerge from factor analysis. See Hutcheson and Sofroniou, 1999: 224.
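The KMO computation just described can be sketched as follows. This is a minimal illustration, not the SPSS implementation: it assumes NumPy, obtains the partial correlations from the inverse of the correlation matrix, and uses a hypothetical matrix R. The per-variable statistic applies the same ratio to the row/column of a single variable.

```python
import numpy as np

def kmo(R):
    """Return (overall KMO, per-variable KMO) for correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    # Partial correlation of i and j controlling for all other variables.
    partial = -inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)      # mask that excludes the 1.0 self-correlations
    r2 = np.where(off, R ** 2, 0.0)            # squared correlations
    p2 = np.where(off, partial ** 2, 0.0)      # squared partial correlations
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return overall, per_variable

# Hypothetical correlation matrix (illustration only).
R = np.array([
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.35, 0.25],
    [0.40, 0.35, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
])

overall, per_variable = kmo(R)
print("KMO overall:", round(overall, 3))
print("KMO per variable:", np.round(per_variable, 3))
```

Large partial correlations inflate the denominator and pull KMO toward 0, which is why a low value warns against proceeding.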
In SPSS, KMO is found under Analyze - Statistics - Data Reduction - Factor - Variables (input variables) - Descriptives - Correlation Matrix - check KMO and Bartlett's test of sphericity and also check Anti-image - Continue - OK. The KMO output is KMO overall. The diagonal elements on the Anti-image correlation matrix are the KMO individual statistics for each variable.
The factor invariance test, discussed above, is a structural equation modeling technique (available in AMOS, for ex.) which tests for deterioration in model fit when factor loadings are constrained to be equal across sample groups.
The comparison measures method requires computation of various measures which compare factor attributes of the two samples. Factor comparison is discussed by Levine (1977: 37-54), who describes these factor comparison measures:
However, occasionally an oblique rotation will still result in a set of factors whose intercorrelations approach zero. This, indeed, is the test of whether the underlying factor structure of a set of variables is orthogonal. Orthogonal rotation mathematically assures resulting factors w
Also, oblique rotation is necessary as part of hierarchical factor analysis, which seeks to identify higher-order factors on the basis of correlated lower-level ones.
When modeling, oblique rotation may be used as a filter. Data are first analyzed by oblique rotation and the factor correlation matrix is examined. If the factor correlations are small (ex., < .32, corresponding to 10% explained), then the researcher may feel warranted in assuming orthogonality in the model. If the correlations are larger, then covariance between factors should be assumed (ex., in structural equation modeling, one adds double-headed arrows between latents).
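The filter rule above can be written as a short check on the factor correlation matrix from an oblique solution. This is only a sketch: the function name assume_orthogonal and the two example matrices are hypothetical, and the .32 cut-off is the rule of thumb quoted in the text (.32 squared is roughly .10, i.e. about 10% shared variance).

```python
import numpy as np

def assume_orthogonal(factor_corr, cutoff=0.32):
    """True if every off-diagonal factor correlation is below the cut-off in absolute value."""
    phi = np.asarray(factor_corr, dtype=float)
    off_diag = phi[~np.eye(phi.shape[0], dtype=bool)]
    return bool(np.all(np.abs(off_diag) < cutoff))

# Hypothetical factor correlation matrices (illustration only).
weakly_correlated = [[1.0, 0.20],
                     [0.20, 1.0]]
strongly_correlated = [[1.0, 0.45],
                       [0.45, 1.0]]

print(assume_orthogonal(weakly_correlated))    # small correlations: orthogonality may be assumed
print(assume_orthogonal(strongly_correlated))  # larger correlations: model covariance between factors
```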
For purposes other than modeling, such as seeing if test items sort themselves out on factors as predicted, orthogonal rotation is almost universal.
HFA is a two-stage process. First an oblique (oblimin) factor analysis is conducted on the raw dataset. As it is critical in HFA to obtain the simplest factor structure possible, it is recommended to run oblimin for several different values of delta, not just the default delta=0. A delta of 0 gives the most oblique solutions; as the researcher specifies an increasingly negative delta (in the SPSS "Factor Analysis: Rotation" dialog, invoked by clicking the Rotation button), the factors become less and less oblique. To override the default delta of 0, the researcher enters a value less than or equal to 0.8.
When the researcher feels the simplest factor structure has been obtained, one has a correlated set of lower-order factors. Factor scores or a correlation matrix of factors from the first stage can be input to a second-stage orthogonal factor analysis (ex., varimax) to generate one or more higher-order factors.
Note, however, that this orthogonalization comes at a price. Now, instead of explicit variables, one is modeling in terms of factors, the labels for which are difficult to impute. Statistically, multicollinearity is eliminated by this procedure, but in reality it is hidden in the fact that all variables have some loading on all factors, muddying the purity of meaning of the factors.
A second research use for component scores is simply to be able to use fewer variables in, say, a correlation matrix, in order to simplify presentation of the associations.
Note also that factor scores are quite different from factor loadings. Factor scores are coefficients of cases on the factors, whereas factor loadings are coefficients of variables on the factors.
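The distinction shows up directly in the shapes of the two matrices. The sketch below assumes principal components extraction and NumPy, and uses simulated data (the variable names and the random dataset are hypothetical): loadings have one row per variable, scores have one row per case.

```python
import numpy as np

# Simulated dataset: 50 cases, 4 variables (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += X[:, 0]      # make variables 1 and 2 correlate
X[:, 3] += X[:, 2]      # make variables 3 and 4 correlate

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data
R = np.corrcoef(X, rowvar=False)                   # correlation matrix of the variables

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]              # keep the two largest components
lam, V = eigvals[order], eigvecs[:, order]

loadings = V * np.sqrt(lam)     # variables x components: coefficients of variables
scores = Z @ V / np.sqrt(lam)   # cases x components: standardized coefficients of cases

print(loadings.shape)   # one row per variable
print(scores.shape)     # one row per case
```

Under these assumptions, the correlation between a variable and a column of scores equals that variable's loading on the corresponding component, which ties the two tables together.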
Common factor analysis (PFA) determines the least number of factors which can account for the common variance in a set of variables. This is appropriate for determining the dimensionality of a set of variables such as a set of items in a scale, specifically to test whether one factor can account for the bulk of the common variance in the set, though PCA can also be used to test dimensionality. Common factor analysis has the disadvantage that it can generate negative eigenvalues, which are meaningless.
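The negative-eigenvalue issue can be seen in a small sketch. It assumes, as one common choice not specified in the text above, that initial communalities are estimated by squared multiple correlations (SMCs) placed on the diagonal of the correlation matrix; the matrix R is hypothetical and NumPy is assumed.

```python
import numpy as np

# Hypothetical correlation matrix (illustration only).
R = np.array([
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.35, 0.25],
    [0.40, 0.35, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
])

# Squared multiple correlation of each variable with all the others
# (assumed here as the initial communality estimate).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

reduced = R.copy()
np.fill_diagonal(reduced, smc)   # common factor analysis analyzes only the common variance

full_eigs = np.linalg.eigvalsh(R)          # PCA: 1.0 on the diagonal
reduced_eigs = np.linalg.eigvalsh(reduced) # common factor analysis: communalities on the diagonal

print("PCA eigenvalues:     ", np.round(full_eigs[::-1], 3))
print("Reduced eigenvalues: ", np.round(reduced_eigs[::-1], 3))
```

With 1.0 on the diagonal all eigenvalues are positive, whereas the reduced matrix yields at least one negative eigenvalue, which cannot be read as variance explained.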
Copyright 1998, 2008 by G. David Garson.
Last update: 3/25/08.