Overview
Counting scale errors in Guttman scaling. Taking the original dichotomous data shown in the upper half of the figure below, items are sorted to form as nearly perfect a Guttman scale as possible. See the FAQ section for SPSS syntax to get data into Guttman scale format. However, even when re-sorted as shown in the bottom portion of the figure below, various counting issues arise.
By arbitrary convention, the coefficient of scalability (Cs) should be .60 or higher to consider a set of items to be adequately scalable as an ordered Guttman-type scale. However, Cs should be understood in the context of minimum marginal reproducibility (MMR). MMR should not be excessively high. Also, Cs should be appreciably higher than MMR (not the case for this example). See McIver & Carmines (1980: 50).
Proximity scaling uses many of the same statistics (ex., coefficient of reproducibility) as Guttman scaling, but the definition of scale errors is different. In Guttman scaling, a perfect set of scores forms a triangle of responses, in which interior blanks are errors. In proximity scaling, a perfect set of scores forms a parallelogram of responses, in which interior blanks are errors. Note that proximity scales are not ordinal because a score on one item does not predict scores on all less extreme items.
Herbert Weisberg (1972) compared a Guttman scale with a proximity scale of House voting on the Compromise of 1850, which included five votes: fugitive slaves, Utah territory, Texas/New Mexico, California statehood, and D.C. slavery. The best Guttman scale displayed 15% of votes not conforming to Guttman criteria. The best proximity scale displayed only 2% of votes not conforming to proximity scale criteria, with error patterns as in the table above.
All items in a Mokken scale have different difficulties, as reflected in different proportions of positive responses. The graphic representation (called a trace line) of the probability of a positive response to an item should increase monotonically as the latent trait increases along the x axis (the y axis being the probability). Double monotonicity should hold (that is, trace lines of items in a scale should not intersect). Also, trace lines must be steep enough to produce only a limited number of Guttman errors (exceptions to the rule that a positive answer to an item implies a positive answer to all easier items). Loevinger's H measures the conformity of a set of items to Mokken's criteria and validates their use together as a scale of a unidimensional latent variable.
| Item i | Item j |
|---|---|
| 0 | 0 |
| 0 | 1 |
| 1 | 1 |
| Error pattern below | |
| 1 | 0 |
Let item j be easier than item i, which in formulaic expression means that P(Xj = 1) > P(Xi = 1) -- the probability j is 1 is greater than the probability i is 1. Then Hij = 1 - E/Eo, where E = P(Xi = 1, Xj = 0) is the observed probability of a Guttman error and Eo = P(Xi = 1)*P(Xj = 0) is the probability of that error expected under independence for a random subject. When there are no Guttman errors, Hij = 1. When the response is random (the null model), Hij = 0. (Of course, when computing these values one must recode where necessary so that the 1's and 0's have a consistent meaning across items.)
Then one can sum E and Eo across all j to get Hi = 1 - E/Eo, which gives a measure of fit of item i to the Mokken scale. That is, Hi is in effect the mean Hij for all pairs involving item i, with each pair weighted by its Eo.
Then one can sum E and Eo across all item pairs to get H = 1 - E/Eo, a measure of the quality of the total Mokken scale. H (Loevinger's H) is in effect the mean Hij for all pairs, with each pair weighted by its Eo. The arbitrary but customary criterion for validating a set of items as a Mokken scale is that H and all Hi must be at least .30. A rule of thumb is to speak of a "strong scale" for values exceeding .50, a "moderate scale" for values from .40 to .50, and a "weak scale" for values from .30 to .40. See further discussion below.
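The pairwise, item, and scale coefficients described above can be sketched in Python. This is a minimal illustration, not a replacement for a Mokken scaling package: the function name `loevinger_h` and the two small data sets are hypothetical, and the sketch assumes complete 0/1 data in which every item has some variation (otherwise Eo is zero and the ratio is undefined).

```python
from itertools import combinations

def loevinger_h(data):
    """Return (H, list of Hi, dict of Hij) for a list of 0/1 response rows."""
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]  # P(Xi = 1)
    err, exp = {}, {}
    for a, b in combinations(range(k), 2):
        # The harder item is the one with the smaller proportion of 1's.
        hard, easy = (a, b) if p[a] <= p[b] else (b, a)
        # Guttman error: positive on the harder item, negative on the easier one.
        err[(a, b)] = sum(1 for r in data if r[hard] == 1 and r[easy] == 0) / n
        # Error probability expected if the two items were independent.
        exp[(a, b)] = p[hard] * (1 - p[easy])
    hij = {pair: 1 - err[pair] / exp[pair] for pair in err}
    hi = []
    for i in range(k):
        pairs = [pair for pair in err if i in pair]
        hi.append(1 - sum(err[q] for q in pairs) / sum(exp[q] for q in pairs))
    h = 1 - sum(err.values()) / sum(exp.values())
    return h, hi, hij

# Hypothetical data: a perfect Guttman pattern, then the same data plus one error row.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
flawed = perfect + [[0, 1, 0]]

h_perfect, hi_perfect, _ = loevinger_h(perfect)
h_flawed, hi_flawed, _ = loevinger_h(flawed)
print(h_perfect, h_flawed)
```

With no Guttman errors every coefficient equals 1; adding a single inconsistent respondent pulls H below 1.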
In addition to estimating internal consistency (a.k.a. "reliability") from the average correlation, the formula for alpha also takes into account the number of items on the theory that the more items, the more reliable a scale will be. That is, when the number of items in a scale is higher, alpha will be higher even when the estimated average correlations are equal. As the number of items rises, alpha rises.
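The effect of scale length can be illustrated with the standardized form of alpha, alpha = k*r/(1 + (k - 1)*r), where k is the number of items and r the average inter-item correlation. Using this form is an assumption made for illustration (it treats all items as having equal variances); the function name and the value r = .30 are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Same average correlation, increasing number of items.
for k in (3, 5, 10, 20):
    print(k, round(standardized_alpha(k, 0.30), 3))
```

Holding the average correlation fixed, the printed values rise steadily as k grows.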
Also, the more consistent within-subject responses are, and the greater the variability between subjects in the sample, the higher Cronbach's alpha will be. Finally, alpha will be higher when there is homogeneity of variances among items than when there is not.
The widely-accepted social science cut-off is that alpha should be .70 or higher for a set of items to be considered a scale, but some use .75 or .80 while others are as lenient as .60. That .70 is as low as one may wish to go is reflected in the fact that when alpha is .70, the standard error of measurement will be over half (0.55) a standard deviation.
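The 0.55 figure can be checked with a short sketch. It assumes the classical test theory relation SEM = SD * sqrt(1 - reliability), which the text does not spell out, and expresses the result in standard deviation units (SD = 1); the function name is hypothetical.

```python
from math import sqrt

def sem_in_sd_units(reliability):
    """Standard error of measurement as a fraction of one standard deviation."""
    return sqrt(1 - reliability)

for alpha in (0.60, 0.70, 0.80, 0.90):
    print(alpha, round(sem_in_sd_units(alpha), 2))
```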
Under some circumstances, alpha may be negative. This reflects a serious coding error in the data: data should be recoded if necessary to assure that all items are coded in the same conceptual direction. If there are n negatively coded items and m positively coded items, there will be n*m negative correlations. The number of positive correlations is the number of pairs within each set, namely n!/((n-2)!*2!) + m!/((m-2)!*2!), or equivalently n(n-1)/2 + m(m-1)/2. Two variables form 1 pair, three form 3, four form 6, five form 10, six form 15, etc. For some circumstances there will be more negative correlations than positive. For instance, if the negatively and positively coded sets have 5 and 4 items respectively, there will be 5*4 = 20 negative correlations and only 10 + 6 = 16 positive correlations, and alpha will be negative. The researcher should check the covariance matrix of scale items to make sure there are no negative covariances reflecting coding error.
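The counting argument can be reproduced in a few lines. This is only a sketch of the arithmetic; the function name is hypothetical, and it assumes every cross-set correlation is negative and every within-set correlation is positive, as the paragraph does.

```python
from math import comb

def correlation_sign_counts(n_negative_items, m_positive_items):
    """Return (number of negative correlations, number of positive correlations)."""
    negative = n_negative_items * m_positive_items        # one per cross-set pair
    positive = comb(n_negative_items, 2) + comb(m_positive_items, 2)  # within-set pairs
    return negative, positive

print(correlation_sign_counts(5, 4))
```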
In SPSS, Cronbach's alpha is found under Analyze, Scale, Reliability Analysis. Then, under the Statistics button, select Scale to get alpha. You can also check Scale if item deleted, in which case alpha will be computed both for all variables entered and for the remaining variables when any one is dropped (the alpha if deleted is listed in a table, one row for each variable). See discussion in reliability section.
Alpha makes no assumptions about what one would obtain at a different time (the latter is "reliability," discussed below). Alpha = (# of items/(# of items - 1)) * (1 - (sum of the variances of the items/variance of the total score)). See Miller, M.B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling. Structural Equation Modeling, Vol. 2, No. 3: 255-273.
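The formula above translates directly into code. The sketch below uses only the Python standard library; the function name and the two tiny data sets are hypothetical. Population variances are used throughout, which gives the same alpha as sample variances because the divisor cancels in the ratio.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_variance = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: three perfectly consistent items.
consistent = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# Same idea, but the third item is coded in the opposite direction.
reverse_coded = [[1, 1, 3], [2, 2, 2], [3, 3, 1]]

alpha_consistent = cronbach_alpha(consistent)
alpha_reversed = cronbach_alpha(reverse_coded)
print(alpha_consistent, alpha_reversed)
```

The second data set shows how an item left in the wrong coding direction can drive alpha below zero.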
In structural equation modeling (SEM), tau-equivalence is tested by comparing an unconstrained model with one in which the factor loadings of the indicator variables on the factor are all set to 1.0, then seeing if the chi-square difference is nonsignificant. If the indicator model for the factor is found to be tau-equivalent, then the original model is compared to one in which measurement error variances are all constrained to be equal, and another chi-square difference test is conducted.
Inter-item correlation is used to spot reverse-coded items. Scale items should correlate positively with one another. A negative correlation may well indicate, for instance, that an item was coded such that "1" meant the opposite direction (ex., low) from the meaning of "1" (ex., high) for the other items in the scale.
Kappa = (observed concordance - concordance by chance)/(1 - concordance by chance), where "by chance" is calculated as in chi-square (multiply the row marginal by the column marginal and divide by n).
Example: Two raters rate a target as "A," "B," or "C," as in the table below:
| Rating | A | B | C | Total |
|---|---|---|---|---|
| A | 11 | 2 | 4 | 17 |
| B | 1 | 14 | 2 | 17 |
| C | 4 | 2 | 10 | 16 |
| Total | 16 | 18 | 16 | 50 |
Observed concordance (agreement) = diagonal sum divided by n = (11+14+10)/50=0.70
Expected concordance by chance = diagonal row*column totals divided by n, summed, divided by n = (16*17/50 + 18*17/50 + 16*16/50)/50 = (5.44 + 6.12 + 5.12)/50 = .33
Kappa = (Observed Concordance - Expected Concordance)/(1 - Expected Concordance) = (.70 - .33)/(1 - .33) = .55.
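The same calculation can be sketched in Python for any square agreement table. The function name `kappa` is hypothetical, and the code assumes the table is square with the two raters' categories listed in the same order along rows and columns.

```python
def kappa(table):
    """Kappa for a square table of counts (rows: rater 1, columns: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(k)) for c in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: row marginal times column marginal over n, summed, over n.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# The A/B/C ratings from the table above.
ratings = [[11, 2, 4],
           [1, 14, 2],
           [4, 2, 10]]
result = kappa(ratings)
print(round(result, 2))
```

For these ratings the sketch gives a kappa of about .55.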
As an independent: Methodologists use a rule-of-thumb that there must be a certain minimum number of classes in the ordinal independent (Achen, 1991, argues for at least 5; Berry (1993: 47) states five or fewer is "clearly inappropriate"; others have insisted on 7 or more). Use of 7-point scales or higher would seem best, but it must be noted that use of 5-point Likert scales with interval procedures is extremely common in the literature.
As a dependent: One method is to test to see if there are significant differences in the regression equation when computed separately for each value class of the ordinal dependent. If the independents seem to operate equally across each of the ordinal levels of the dependent, then use of an ordinal dependent is considered acceptable.
In SPSS, select Analyze, Scale, Reliability Analysis; select your variables; click Statistics; select Tukey's test of additivity. Continue. OK. Look for the "Additivity" row in the resulting ANOVA table.
*GUTTMAN SCALING SPSS SYNTAX
* by david_garson@ncsu.edu, 2008
* Creates a new Guttman-sorted worksheet, with cases and variables sorted
* The original worksheet is left intact.
* This assumes cell entries are 0, 1
* Also assumes variables are named as in the first SORT CASES command below; change as needed.
* Also assumes 9 cases; change count in COUNT statement below by changing var009; also change N OF CASES to actual n.
SORT CASES BY item1 item2 item3 item4 item5 item6 item7 item8.
EXECUTE.
FLIP VARIABLES=all.
COUNT isum = var001 to var009 (1).
SORT CASES BY isum.
FLIP VARIABLES=all.
DELETE VARIABLES case_lbl.
N OF CASES 9.
EXECUTE.
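For readers without SPSS, a comparable re-sorting can be sketched in Python. This is not a translation of the syntax above: it uses a common variant that orders items by how often they are endorsed and cases by their total score, which is an assumption made for illustration. The function name and the sample data are hypothetical, and entries are assumed to be 0 or 1.

```python
def guttman_sort(rows):
    """Return (column order, re-sorted rows) for a list of 0/1 response rows."""
    k = len(rows[0])
    # Items (columns) from most to least frequently endorsed.
    col_order = sorted(range(k), key=lambda c: sum(r[c] for r in rows), reverse=True)
    reordered = [[r[c] for c in col_order] for r in rows]
    # Cases (rows) from highest to lowest total score.
    reordered.sort(key=sum, reverse=True)
    return col_order, reordered

# Hypothetical data: four respondents, three items.
responses = [[0, 1, 0],
             [1, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
order, sorted_rows = guttman_sort(responses)
for row in sorted_rows:
    print(row)
```

The original list is left unchanged; the function returns a new, sorted copy along with the column order it applied.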
The cluster bloc technique results in an ordering of Senators, displayed in a matrix, such that blocs of agreement are easily identified as clusters on the matrix. The blocs that are identified for one set of votes can be compared to blocs for another set of votes. Note that agreement is not equivalent to similar thinking: agreement may also be born of party discipline, constituent pressures, or mutual bargaining ("log-rolling").
The Q-dispersion statistic was created by Thurstone as a measure of item ranking agreement. Q values were an ordinal analog to standard deviations. For each item a table was constructed with three columns: (1) the pile numbers into which judges sorted the item (ex., 1 to 11); (2) the number of judges who sorted the item into the corresponding pile number; and (3) the cumulative percentage of the second column. To compute the Q value, one interpolated the pile number of the 25% and 75% cumulative percentage points (called "ogive values") and subtracted. Thus if 25% of judges ranked an item 3.7 or less and 75% ranked it 5.5 or less, then Q would be 1.8. Items with the lowest Q values in each pile were selected for the scale, making sure to select items from all piles.
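The interpolation step can be sketched as follows. The exact interpolation convention is an assumption here: the sketch treats each pile's cumulative percentage as being reached at that pile number, starts from 0% at pile 0, and interpolates linearly in between. The function name and the judge counts are hypothetical.

```python
def q_dispersion(pile_counts):
    """Q value for one item; pile_counts[i] is the number of judges using pile i + 1."""
    total = sum(pile_counts)
    cumulative = []
    running = 0
    for count in pile_counts:
        running += count
        cumulative.append(100.0 * running / total)

    def pile_at(percent):
        # Find the first pile whose cumulative percentage reaches the target,
        # then interpolate linearly between the previous pile and this one.
        previous = 0.0
        for index, cum in enumerate(cumulative):
            if cum >= percent:
                pile = index + 1
                return (pile - 1) + (percent - previous) / (cum - previous)
            previous = cum

    return pile_at(75) - pile_at(25)

# Hypothetical item: 20 judges sorting into 11 piles.
counts = [0, 0, 4, 8, 6, 2, 0, 0, 0, 0, 0]
q_value = q_dispersion(counts)
print(q_value)
```

A smaller Q means the judges clustered more tightly around the same piles, which is why low-Q items were preferred.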
Studies found Thurstone scales were not significantly better than Likert scales in establishing intervalness, and it was also found many statistical procedures were robust in the face of moderate departures from the assumption of intervalness. With these findings, the complicated and costly Thurstone procedure fell into disuse.
Copyright 1998, 2008 by G. David Garson.
Last updated 3/30/2008.