Overview
Probit models are similar to logistic models but link the predictors to the dependent variable through the probit transformation (the inverse of the standard normal cumulative distribution function) rather than the logit. Where logit and logistic regression are appropriate when the categories of the dependent are equal or well dispersed, probit may be recommended when the middle categories have greater frequencies than the high and low tail categories, or with binomial dependents when an underlying normal distribution is assumed. As a practical matter, probit and logistic models yield the same substantive conclusions for the same data the great majority of the time.

The probit module in SPSS (in the menus, Analyze, Regression, Probit) is for a particular kind of probit model, the "probit response model." In addition to the probit module itself, other types of probit models may be implemented in other SPSS modules, which are discussed separately. Also, the SPSS manual notes, "the Binary Logistic Regression procedure uses a logistic link model for predicting the event probability for a categorical response variable with two outcomes, and is generally more useful than Probit Analysis when your study is not a dose-response experiment." See the logistic regression section of Statnotes.

The SPSS Probit module discussed in this section is designed for a binary dependent using grouped data in a response model. The classic application is dose-response studies in medicine, where experimental groups are given varying doses of a medication under test and the dependent is response/no response to the medication. Probit relates strength of dose to proportion responding, optionally controlling for other covariates which may also cause a response, such as age. To generalize, this procedure relates levels of an independent covariate to the proportion in one category of a binary dependent variable, optionally controlling for one or more continuous covariates and optionally grouping by levels of a categorical factor.
It is particularly appropriate for experimental designs where the purpose is to gauge what level of the independent variable (ex., dose) is needed to obtain a specified proportion of responses. An example would be experiments with varying levels of payments to determine the median effective payment rate needed to obtain a 50% volunteer rate for army reenlistment, as applied to randomly selected groups of soldiers nearing the end of their service obligation. Note the SPSS Probit module can also implement logit models for grouped data, assuming a binary dependent. Logit response models have nearly identical inputs and outputs but give slightly different estimates.
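To make the difference between the two transformations concrete, the following minimal Python sketch (an illustration, not SPSS output) evaluates the probit, taken here as the inverse of the standard normal cumulative distribution, and the logit at a few proportions. The function names `probit` and `logit` are chosen for this illustration only.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(p):
    """Return the z-score whose cumulative normal probability is p."""
    return STD_NORMAL.inv_cdf(p)


def logit(p):
    """Return the log-odds of the proportion p."""
    return log(p / (1.0 - p))


# Both transformations map 0.50 to zero and are symmetric around it;
# the logit is more spread out, which is why coefficients differ in scale.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"p={p:.2f}  probit={probit(p):+.3f}  logit={logit(p):+.3f}")
```

Because the two curves have nearly the same shape after rescaling, this is consistent with the observation that probit and logit models usually lead to the same substantive conclusions.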
If the default "None" is selected, the probit procedure will not incorporate a natural response rate in the model. If "Calculate from data" or "Value" (the researcher enters a specific value) is selected, a natural response rate will be incorporated in the model, and probit estimates and model chi-square tests will differ. If "Calculate from data" is selected, the input data should contain a control group for which the value of the covariate(s) is zero. If there is no control group, the probit procedure estimates the natural response rate from the entire dataset and prints a notification that no control group was available. If "Value" is selected, the value entered must be non-negative and less than 1.0.
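As a rough illustration of what incorporating a natural response rate means, the sketch below assumes the common formulation in which the natural rate c sets a floor and the stimulus acts only on the remaining 1 - c of subjects, P = c + (1 - c) * Phi(a + b*x). This form is an assumption made for illustration rather than a quotation from the SPSS documentation, and the intercept, slope, and function name are hypothetical.

```python
from statistics import NormalDist


def response_probability(x, intercept, slope, natural_rate=0.0):
    """Predicted response proportion under an assumed natural-response model.

    natural_rate plays the role of the "Value" option: it must be
    non-negative and less than 1.0. A value of 0.0 corresponds to "None".
    """
    if not 0.0 <= natural_rate < 1.0:
        raise ValueError("natural_rate must satisfy 0 <= rate < 1")
    stimulus_effect = NormalDist().cdf(intercept + slope * x)
    return natural_rate + (1.0 - natural_rate) * stimulus_effect


# Hypothetical coefficients, used only to show the effect of the floor.
for rate in (0.0, 0.093):
    p = response_probability(x=10.5, intercept=-11.0, slope=1.0, natural_rate=rate)
    print(f"natural rate {rate:.3f} -> predicted response {p:.3f}")
```

With a nonzero natural rate the predicted proportion never falls below that rate, however weak the stimulus.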

In the fictional example above, the size of reenlistment bonus is tested for Army and Navy groups in a conditional response model of reenlistment response rates. The bonus covariate was transformed by the natural log. Since the goodness-of-fit chi-square is non-significant, model fit is accepted as adequate.
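For readers who want to see what a goodness-of-fit chi-square measures with grouped data, the sketch below computes a Pearson chi-square from observed response counts and fitted probabilities. The group data and the function name are hypothetical; the significance step (comparing the statistic with a chi-square distribution) is left out to keep the example within the Python standard library.

```python
def pearson_chi_square(groups):
    """Sum of squared standardized residuals over groups.

    groups: iterable of (n_subjects, n_responses, fitted_probability).
    """
    total = 0.0
    for n_subjects, n_responses, fitted_p in groups:
        expected = n_subjects * fitted_p
        variance = n_subjects * fitted_p * (1.0 - fitted_p)
        total += (n_responses - expected) ** 2 / variance
    return total


# Hypothetical groups: (group size, observed responders, fitted probability).
example_groups = [
    (100, 12, 0.10),
    (100, 31, 0.30),
    (100, 52, 0.50),
    (100, 68, 0.70),
]
print(round(pearson_chi_square(example_groups), 3))
```

A small statistic relative to its degrees of freedom indicates that observed and fitted counts are close, which is what a non-significant result reflects.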


Although there are separate intercepts for Army and Navy, there is a single probit slope estimate for the covariate, which is size of reenlistment bonus. All are significant.

A natural response rate estimate of .093 means that 9.3% of the population would respond even when the magnitude of the stimulus (the covariate, which is reenlistment bonus) was zero. Army and Navy control groups were provided in the data.

Note the stimulus covariate (bonus size for reenlistment) is not actual dollars but is the requested natural log of the bonus. The log was requested to achieve linearity in the probit, discussed in the assumptions section. The model works least well for the Army (Branch = 1) for the larger bonuses.

If the default ("None") is accepted for the natural response rate, note that estimates will then be unconditional potencies, gauging the effect of the stimulus on increasing the response, assuming the natural response rate is zero. In contrast, if a natural response rate is incorporated in a conditional model, potency estimates and relative median potencies below are interpreted in relative terms. That is, they focus on the relative response of different groups to the stimulus, not on absolute response differences, since absolute differences are due both to the natural response rate and to differences in response to the stimulus among groups formed by the factor variable.
If a logit rather than probit model is requested, logit rather than probit estimates are reported in the Estimates column of the Confidence Limits table. Both logit and probit estimates are estimates of the effective levels of the stimulus needed to achieve the row response rate.
The abridged table above shows some of the probability rows for the armed forces reenlistment bonus example, which is a conditional model (conditional on a computed natural response rate). For any row, the estimate is the magnitude of the stimulus covariate (ex., reenlistment bonus) needed to achieve that percentage of response (ex., proportion reenlisting). For instance, for the Army, a reenlistment bonus of $75,400 was estimated to be needed to obtain a 50% reenlistment rate. Note that a lower estimate corresponds to greater potency, since less stimulus is needed to achieve the response rate.
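The logic of reading a stimulus level off a target response rate can be sketched as follows. The sketch assumes a simple probit line, probit(p) = intercept + slope * ln(bonus), ignores any natural response rate adjustment, and uses hypothetical coefficients that are not the estimates from the example; `effective_level` is a name made up for this illustration.

```python
from math import exp
from statistics import NormalDist


def effective_level(target_rate, intercept, slope, natural_log=True):
    """Stimulus level at which the probit line reaches target_rate.

    Solves probit(target_rate) = intercept + slope * x for x, then undoes
    the natural-log transform of the covariate when natural_log is True.
    """
    z = NormalDist().inv_cdf(target_rate)
    x = (z - intercept) / slope
    return exp(x) if natural_log else x


# Hypothetical coefficients for illustration only.
INTERCEPT, SLOPE = -11.0, 1.0
for rate in (0.10, 0.50, 0.90):
    level = effective_level(rate, INTERCEPT, SLOPE)
    print(f"{rate:.0%} response needs a stimulus of about {level:,.0f}")
```

Higher target rates require larger stimulus levels, and a group whose line reaches 50% at a lower level is the more potent one.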
The 95% confidence interval for each relative median potency is also presented. If the value 1.0 is not within the confidence interval, then the researcher concludes that there is a significant difference in potencies. The factor level with the strongest potency is the factor level which requires the smallest magnitude of the stimulus (the covariate) to obtain median potency (to obtain a 50% response rate).
If a natural response rate is incorporated in the model, note that estimates or relative potencies will then be conditional and the strongest factor level by relative potency is the one obtaining the greatest increase in response based on a given magnitude of the stimulus (covariate) variable. It will not necessarily be the factor level obtaining the largest absolute response since some other factor level may have a very high natural response rate to begin with.

In the example a natural response rate was incorporated in the model by asking that the rate be computed from the data (under the Options button). Recall the groups were Army=1 and Navy=2. To achieve linearity in the probit, the bonus covariate was transformed by taking a natural log, but estimates and confidence limits are given with and without log transform. Since 1.0 is not within the confidence limits, we can conclude that Army and Navy groups differ in potency (responsiveness to any given level of the stimulus, which is reenlistment bonus size). We would normally interpret RMP in untransformed terms. Thus in the Confidence Limits table, the Army potency at the median .50 response level was 75,400; for the Navy it was 31,810. That is, the reenlistment bonus stimulus had greater potency for the Navy than the Army since less was required to achieve a 50% reenlistment rate. The ratio of these two potencies is the RMP = 75400/31810 = 2.37 for Army compared to Navy. Similarly, for Navy compared to Army, the RMP computes to 0.42. The former (2.37) is greater than 1.0, meaning that compared to the Navy, achieving a 50% reenlistment rate in the Army requires a greater bonus. The latter (.42) is less than 1.0, meaning that compared to the Army, a 50% reenlistment rate in the Navy can be achieved with a smaller bonus. Note interpretation is relative in a conditional model, not absolute: the RMP of .42 does not mean that Navy bonuses can be 42% of those for the Army since the RMP estimates are estimates controlling for differences between the Army and Navy in natural response (reenlistment) rates in this example.
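The RMP arithmetic above can be checked with a few lines of Python. The two median estimates are the ones quoted in the text; the helper `interval_excludes_one` and the interval passed to it are hypothetical, since the confidence limits themselves are not reproduced here.

```python
ARMY_MEDIAN_BONUS = 75_400  # estimate at the .50 response level, from the text
NAVY_MEDIAN_BONUS = 31_810  # estimate at the .50 response level, from the text

rmp_army_vs_navy = ARMY_MEDIAN_BONUS / NAVY_MEDIAN_BONUS
rmp_navy_vs_army = NAVY_MEDIAN_BONUS / ARMY_MEDIAN_BONUS

print(round(rmp_army_vs_navy, 2))  # 2.37
print(round(rmp_navy_vs_army, 2))  # 0.42


def interval_excludes_one(lower, upper):
    """True when 1.0 lies outside the interval, i.e. potencies differ."""
    return not (lower <= 1.0 <= upper)


# Hypothetical limits, for illustration of the decision rule only.
print(interval_excludes_one(1.5, 3.6))
```

The two ratios are reciprocals of each other, so either one carries the same information about which group is more responsive.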
Rows are the groups. Each group receives a given level of the independent(s). The independents are continuous covariates (thus, in a medical study, dose could be a covariate). There can be one grouping factor, which is a categorical variable (ex., brand of medication) with categories coded as integers. Within each group, all individuals must have the same value on the factor and/or covariates (ex., if age is a covariate, then all individuals in a group must be of the same age). The dependent (response) variable is the count of observations in the desired binary category (ex., typically response to dose) for that group (row). There also must be a group size variable (ex., "Total") with the count of the total number of observations in that group (row).
The SPSS manual gives this example of how to use the Aggregate command to obtain grouped data from individual data:
Aggregating case-by-case data
DATA LIST FREE/PREPARTN DOSE RESPONSE.
BEGIN DATA
1.00 1.50 .00
...
4.00 20.00 1.00
END DATA.
AGGREGATE OUTFILE=*
/BREAK=PREPARTN DOSE
/SUBJECTS=N(RESPONSE)
/NRESP=SUM(RESPONSE).
PROBIT NRESP OF SUBJECTS BY PREPARTN(1,4) WITH DOSE.
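For readers preparing data outside SPSS, the same grouping step can be sketched in Python: count the cases and sum the 0/1 responses within each combination of the break variables. The three sample cases below are hypothetical stand-ins, since the manual's data listing is abridged, and `aggregate_cases` is a name made up for this illustration.

```python
from collections import defaultdict


def aggregate_cases(cases):
    """Collapse (prepartn, dose, response) cases into grouped rows.

    Returns sorted rows of (prepartn, dose, subjects, nresp), where
    subjects is the group size and nresp the number of responders.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [subjects, nresp]
    for prepartn, dose, response in cases:
        cell = totals[(prepartn, dose)]
        cell[0] += 1
        cell[1] += int(response)
    return [
        (prepartn, dose, subjects, nresp)
        for (prepartn, dose), (subjects, nresp) in sorted(totals.items())
    ]


# Hypothetical case-by-case records: preparation, dose, response (0 or 1).
sample_cases = [
    (1.0, 1.5, 0),
    (1.0, 1.5, 1),
    (4.0, 20.0, 1),
]
for row in aggregate_cases(sample_cases):
    print(row)
```

Each output row corresponds to one row of the grouped file that the probit procedure expects: the covariate and factor values, the group size, and the response count.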
PROBIT response-count varname OF observation-count varname
       WITH varlist [BY varname(min,max)]

 [/MODEL={PROBIT**}]
         {LOGIT   }
         {BOTH    }

 [/LOG=[{10** }]
        {2.718}
        {value}
        {NONE }

 [/CRITERIA=[{OPTOL   }({epsilon**0.8})][P({0.15**})][STEPLIMIT({0.1**})]
             {CONVERGE} {n            }    {p     }            {n     }
            [ITERATE({max(50,3(p+1)**})]]
                     {n              }

 [/NATRES[=value]]

 [/PRINT={[CI**] [FREQ**] [RMP**]} [PARALL] [NONE] [ALL]]
         {DEFAULT**              }

 [/MISSING=[{EXCLUDE**}] ]
            {INCLUDE  }

**Default if the subcommand or keyword is omitted.
Example
PROBIT R OF N BY ROOT(1,2) WITH X
/MODEL = BOTH.