Overview
Multi-level mixed models are based on a multi-level theory which specifies the expected direct effects of variables on each other within any one level, and which specifies cross-level interaction effects between variables located at different levels. That is, the researcher must postulate mediating mechanisms which cause variables at one level to influence variables at another level (e.g., school-level funding may positively affect individual-level student performance by way of recruiting superior teachers, made possible by superior financial incentives). Multi-level modeling tests multi-level theories statistically, simultaneously modeling variables at different levels without necessary recourse to aggregation or disaggregation. In practice, though, some variables may represent aggregated scores.
In SPSS, select Analyze, Mixed Models, Linear. If there are repeated measures, enter the repeated variables and the subject variable. Then click the Fixed Effects button and fill out that dialog; click the Random Effects button and do likewise; click the Statistics button to select output; and click OK. (Other options are also available.)
See also variance components analysis (VARCOMP). The linear mixed models procedure includes all VARCOMP models, but output options differ somewhat.
In summary, a subject variable is one which is used to define groups such that each is independent of the others. Each subject has a different set of random parameters for the random effect variable(s). Note that if an observation has a missing value on any of the subject variables, it will be dropped from analysis.


SPSS will generate a table labeled "Tests of Fixed Effects" with one row for the intercept and one row for Appraise, the only fixed effect in the example. The last column of each row is the significance level. If the row for Appraise is significant at the .05 level, then we can be 95% confident that Appraise has an effect on Price different from 0. If "Parameter estimates" was checked under the Statistics button, then a second table, "Estimates of Fixed Effects," will appear, in which the "Estimate" column contains the estimated values for the intercept and for the fixed effect (Appraise in this example). If there had been a fixed factor, there would be coefficient estimates for each level of that factor except the last category, which would be redundant and set to 0. Confidence intervals are also displayed.
For significant fixed effects, the value 0 will not be within the confidence interval. The intercept is interpreted as the predicted value of the dependent variable when the covariates equal 0; it corresponds to an overall mean only if the covariates are mean-centered. Since the coefficient for Appraise is positive and significant in this example, we can conclude that appraisal value is positively related to sales price. If the model had a fixed factor, Gender, coded so that gender=0 corresponded to female, and the coefficient for gender=0 were significant and positive, then females would have higher performance scores than males (the last, reference category). Since City is conceptualized as a random variable, these conclusions may be generalized to all cities, not just those in the sample.
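The link between significance and the confidence interval can be illustrated with a short sketch. It is written in Python rather than SPSS, uses a large-sample normal approximation (SPSS itself reports t-based intervals), and the estimates and standard errors are hypothetical values chosen for illustration, not output from the example.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Large-sample (normal) confidence interval for one fixed-effect coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return estimate - z * std_error, estimate + z * std_error


def excludes_zero(interval):
    """True when 0 lies outside the interval, i.e. the effect is significant."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical values, NOT taken from the SPSS output discussed in the text.
appraise_ci = wald_interval(estimate=1.09, std_error=0.02)
weak_ci = wald_interval(estimate=0.03, std_error=0.02)

print(appraise_ci, excludes_zero(appraise_ci))
print(weak_ci, excludes_zero(weak_ci))
```

The first interval lies entirely above 0, so that coefficient would be judged significant and positive; the second interval straddles 0, so that coefficient would not.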
If the City effect is significant, then City affects Price (the dependent variable). The "City Variance" estimate is the estimate of the variance in Price attributable to the random sampling of cities. Since it is not significant in this example, we cannot conclude that City affects Price once Appraise is in the model. When a random effects parameter is found to be non-significant, that effect would normally be dropped from the model and the analysis re-run. The Residual parameter is the estimate of the unexplained variance in Price after controlling for Appraise and for sampling of the random factor City.
GLM produces Type III sums of squares for fixed effects only. Even though City is entered as a random factor, the table above treats City as if it were a fixed effect for purposes of computing the sums of squares used to compute the F statistic. GLM estimates variance parameters for City (or any random effect) indirectly, as described below, using expected mean squares. Linear mixed models and variance components analysis, in contrast, estimate variance parameters directly, using maximum likelihood (ML) or restricted maximum likelihood (REML) methods. For unbalanced designs (unequal n's in the groups formed by the random effect), as in this example, the GLM method will return estimates different from those of linear mixed models or variance components analysis. Thus, whereas in the linear mixed model run the F value for the main effect of Appraise was 3.010E3, in the GLM run it is 2.994E3 in the table above.
Above it is seen that the GLM method generates coefficient estimates for the fixed effect Appraise similar to those of the linear mixed model. It also generates coefficients for the random effect City, which are not part of linear mixed model output because LMM is not based on sums-of-squares methods of estimation. However, the variance estimate for City, which was 38,180,000 in linear mixed modeling and in variance components analysis, is only 21,376,633 in GLM. The GLM variance estimate is computed as Var(City) = [MS(City) - MS(Error)]/EMS(City), where MS(City) = 4.215E9 and MS(Error) = 6.519E8 (both from the "Between Subjects Effects" table in GLM) and EMS(City) = 166.682 (from the "Expected Mean Squares" table in GLM). Even when the LMM and GLM variance estimates are the same, as they will be in balanced designs, GLM has the drawback that the standard error of estimate for the variance of random factor(s) (e.g., City), which appears in the "Estimates of Covariance Parameters" table in LMM, cannot be computed in GLM.
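The GLM computation quoted above can be checked with a few lines of arithmetic. The sketch below is in Python and is not part of the SPSS workflow; the function and variable names are illustrative assumptions, and the inputs are the rounded values reported in the text.

```python
def glm_variance_estimate(ms_effect, ms_error, ems_coefficient):
    """Expected-mean-squares estimate: (MS(effect) - MS(error)) / EMS coefficient."""
    return (ms_effect - ms_error) / ems_coefficient


ms_city = 4.215e9    # MS(City), "Between Subjects Effects" table
ms_error = 6.519e8   # MS(Error), same table
ems_city = 166.682   # EMS(City), "Expected Mean Squares" table

var_city = glm_variance_estimate(ms_city, ms_error, ems_city)
print(f"{var_city:,.0f}")
```

Because the mean squares are reported to only four significant digits, the result should be read as agreeing with the reported 21,376,633 up to rounding of the inputs.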
Example notes: The level 2 grouping variable, agency, is entered as the Subjects variable (signifying that individuals are independent observations within each agency) and as the Subject Grouping variable (signifying that it is level 2). Note that agency is not entered as a random effect factor, as that is already assumed by making it a Subject Grouping variable. Score, by virtue of being Dependent, is known to be level 1. Femalepercent is identified as a covariate in the "Linear Mixed Models" dialog and is specified as a fixed effect to be modeled.
Example notes: The level 2 grouping variable, agency, is entered as the Subjects variable (signifying that individuals are independent observations within each agency) and as the Subject Grouping variable (signifying that it is level 2). Note that agency is not entered as a random effect factor, as that is already assumed by making it a Subject Grouping variable. Score, by virtue of being Dependent, is known to be level 1. Seniority is identified as a covariate in the "Linear Mixed Models" dialog and is specified both as a fixed effect and as a random effect to be modeled. Modeling the slope and intercept of seniority as random effects is what makes this a random coefficients (RC) model. RC models require that the covariance structure be specified; if there is no information on which to base this choice, it should be set to Unstructured. Under Unstructured, random effects are not assumed to be independent, as they are when Variance Components is specified as the covariance type.
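One conventional way to write the random coefficients model described in these notes is sketched below. The notation is an assumption introduced here for illustration: i indexes individuals, j indexes agencies, the betas are fixed effects, the u terms are agency-level random effects, and the tau and sigma symbols are the covariance parameters.

```latex
\begin{aligned}
\text{Score}_{ij} &= \beta_0 + \beta_1\,\text{Seniority}_{ij}
                   + u_{0j} + u_{1j}\,\text{Seniority}_{ij} + e_{ij},\\[4pt]
\begin{pmatrix} u_{0j}\\ u_{1j} \end{pmatrix}
  &\sim N\!\left(\mathbf{0},\;
     \begin{pmatrix} \tau_{00} & \tau_{01}\\ \tau_{01} & \tau_{11} \end{pmatrix}\right),
  \qquad e_{ij} \sim N(0,\sigma^2).
\end{aligned}
```

In this notation, the Unstructured covariance type estimates the intercept-slope covariance \tau_{01} freely, whereas Variance Components fixes it at 0, which is the independence assumption mentioned in the notes.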
Note that employee id is thus both the Subjects variable and the grouping variable (Combinations). Time is modeled as a fixed effect. In this example employee is level 1 as Subjects variable and is level 2 as Combinations variable.
The random effect above models SES as a random effect within students. That is, the fixed effect of SES, reported in the Estimates of Fixed Effects table in SPSS, will later show whether SES seems to be related to Verbal scores. In contrast, the random effect of SES, shown in the Estimates of Covariance Parameters table, assesses whether a student effect, due to the sampling of students conceived to be drawn at random from a larger population of students, significantly adjusts the variation in SES. A second random effect, not illustrated, does the same thing for Class: it tests whether the variation in SES is significantly due to the sampling of classes from a larger population of classes. Because the two random effects are modeled separately, the researcher is assuming the sampling of students is uncorrelated with the sampling of classes.
The efficiency and power of multi-level tests rest on pooled data across the units comprising two or more levels, which implies large datasets. The REML and ML estimation methods used by LMM give asymptotically efficient estimates, meaning efficiency depends on large samples.
For instance, simulation studies by Kreft (1996) found adequate statistical power with 30 groups of 30 observations each, 60 groups of 25 observations each, or 150 groups of 5 observations each. The number of groups has more effect on statistical power than the number of observations per group, though both are important: power for individual-level estimates depends on the number of individuals observed, and power for second-level estimates depends on the number of groups. There is a rapid fall-off in statistical power as the number of groups or observations falls below the threshold needed. With less than adequate power there is an unacceptable risk of not detecting cross-level interactions (e.g., between schools and students).
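A quick tabulation of the three designs attributed to Kreft (1996) shows how total sample size varies across them. The sketch is in Python; the variable names are illustrative, and the only data are the group counts and group sizes quoted above.

```python
# (number of groups, observations per group) for the three designs cited above
designs = [(30, 30), (60, 25), (150, 5)]

# Total observations implied by each design
totals = {design: design[0] * design[1] for design in designs}

for (groups, per_group), total in totals.items():
    print(f"{groups:>3} groups x {per_group:>2} per group = {total:>4} observations")
```

The design with the most groups (150) requires the fewest total observations, which is consistent with the statement that the number of groups matters more for power than the number of observations within each group.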
Specifically with regard to multilevel structural equation modeling (MSEM), Hox & Maas (2001) used simulation studies to show that for small group-level sample sizes, coefficient estimates were not stable. They recommended group-level samples of at least 100. However, Cheung & Au (2005: 612) used resampling to test sample size effects and found sample size "can be as small as 50, yet the results are still comparable with other larger sample size conditions." Unbalanced individual-level samples within groups may require larger group samples. Cheung & Au's experiments also disconfirmed the assertion of some that larger individual-level samples could compensate for small group-level samples.
MIXED dependent varname [BY factor list] [WITH covariate list]
 [/CRITERIA = [CIN({95** })] [HCONVERGE({0**  } {ABSOLUTE**})
                   {value}              {value} {RELATIVE  }
              [LCONVERGE({0**  } {ABSOLUTE**})] [MXITER({100**})]
                         {value} {RELATIVE  }           {n    }
              [MXSTEP({5**})] [PCONVERGE({1E-6**},{ABSOLUTE**})] [SCORING({1**})]
                      {n  }              {value } {RELATIVE  }            {n  }
              [SINGULAR({1E-12**})] ]
                        {value  }
 [/EMMEANS = TABLES ({OVERALL          })]
                     {factor           }
                     {factor*factor ...}
             [WITH (covariate=value [covariate = value ...])
             [COMPARE [({factor})] [REFCAT({value})] [ADJ({LSD**     })] ]
                                           {FIRST}        {BONFERRONI}
                                           {LAST }        {SIDAK     }
 [/FIXED = [effect [effect ...]] [| [NOINT] [SSTYPE({1  })] ] ]
                                                    {3**}
 [/METHOD = {ML    }]
            {REML**}
 [/MISSING = {EXCLUDE**}]
             {INCLUDE  }
 [/PRINT = [CORB] [COVB] [CPS] [DESCRIPTIVES] [G] [HISTORY(1**)] [LMATRIX] [R]
                                                          (n  )
           [SOLUTION] [TESTCOV]]
 [/RANDOM = effect [effect ...]
            [| [SUBJECT(varname[*varname[*...]])] [COVTYPE({VC**      })]]]
                                                           {covstruct†}
 [/REGWGT = varname]
 [/REPEATED = varname[*varname[*...]] | SUBJECT(varname[*varname[*...]])
              [COVTYPE({DIAG**    })]]
                       {covstruct†}
 [/SAVE = [tempvar [(name)] [tempvar [(name)]] ...]
 [/TEST[(valuelist)] =
         ['label'] effect valuelist ... [| effect valuelist ...] [divisor=value]]
         [; effect valuelist ... [| effect valuelist ...] [divisor=value]]
 [/TEST[(valuelist)] = ['label'] ALL list [| list] [divisor=value]]
                       [; ALL list [| list] [divisor=value]]
** Default if the subcommand is omitted.
† covstruct can take the following values: AD1, AR1, ARH1, ARMA11, CS, CSH, CSR, DIAG, FA1, FAH1, HF, ID, TP, TPH, UN, UNR, VC.
Multi-level modeling in LMM is particularly helpful in the analysis of covariance when data are sparse. For instance, in a study of a Social Security agency office, there may be too few minority employees to enable valid statistical inferences on performance evaluations, using traditional regression models. However, if multi-level data are available on employees and multiple SSA offices, then multi-level models can use not only the individual data in the SSA office but also information in the pooled data for all offices. The resulting prediction equation applied to the given SSA office will use coefficients reflecting both their own and also pooled data. For agencies with a large number of minorities, the multi-level and ordinary regression models will be similar. For agencies with sparse data -- few minorities -- it is true their estimate will rely considerably on the pooled data, but the advantage is that the pooling involved in multi-level models affords a "borrowing of strength" that supports statistical inference in a situation where no inference would be possible using traditional methods.
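The "borrowing of strength" idea can be made concrete with the standard shrinkage formula for a random-intercept model, in which an office's estimate is a weighted average of its own mean and the pooled mean. The sketch below is in Python; it assumes the variance components are known, and all numbers and names are hypothetical illustrations rather than results from any SSA data or SPSS output.

```python
def shrinkage_weight(n_j, between_var, within_var):
    """Weight given to an office's own mean; approaches 1 as n_j grows."""
    return between_var / (between_var + within_var / n_j)


def pooled_estimate(office_mean, grand_mean, n_j, between_var, within_var):
    """Weighted average of the office's own mean and the pooled (grand) mean."""
    w = shrinkage_weight(n_j, between_var, within_var)
    return w * office_mean + (1 - w) * grand_mean


# Hypothetical inputs: the same observed office mean, with few vs. many cases.
grand_mean, office_mean = 70.0, 60.0
between_var, within_var = 25.0, 100.0

sparse = pooled_estimate(office_mean, grand_mean, 2, between_var, within_var)
rich = pooled_estimate(office_mean, grand_mean, 200, between_var, within_var)

print(round(sparse, 2), round(rich, 2))
```

With only 2 cases the estimate is pulled most of the way toward the pooled mean, while with 200 cases it stays close to the office's own mean, mirroring the contrast between sparse and data-rich agencies described above.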
Traditional regression models vs. LMM analysis. There were three traditional approaches to regression modeling of multilevel data:
Based on a review of the literature and on simulation studies, Ita G. G. Kreft (1996) concluded, "for researchers specifically interested in variance components, and posterior means, RC modeling provides them with separate estimates for separate contexts, and the iteration procedure improves the estimates of the variance components." That is, although effect size as revealed through regression is apt to be similar to effect size in multi-level modeling (see discussion below), multi-level modeling is more helpful in revealing differences in variance among units of analysis in the different groups which comprise the levels. An empirical comparison of OLS regression with multilevel modeling by Moerbeek, van Breukelen, & Berger (2003) found that "The treatment effect and especially its standard error, are generally incorrectly estimated by traditional methods, which should, therefore, not in general be used as an alternative to multilevel regression" (p. 341). Multi-level modeling may also be a preferred method when data are sparse, including studies (e.g., twin studies) where groups are sparse.














