David A. Kenny
December 18, 2002
Instrumental Variable Estimation
One way of identifying models
that cannot be estimated by using multiple regression
is through the use of instrumental variables. For multiple regression
to be used in path analysis, the disturbance must be uncorrelated with
every causal variable. There are three reasons why such a correlation might exist:
- Spuriousness (Third Variable Causation): A variable that is not included
in the model causes both the endogenous variable and one of its causal variables.
- Reverse Causation (Feedback Model): The endogenous variable causes,
either directly or indirectly, one of its causes.
- Measurement Error: There is measurement error in a causal variable.
When any of these holds, one or more causal variables are correlated
with the disturbance of the endogenous variable, and multiple
regression cannot be used to estimate the causal coefficients. Denote
Y as the endogenous variable, U as its disturbance, X as a causal
variable that needs an instrument (because it is correlated with U),
I as an instrumental variable, and Z as the set of variables that
cause Y but do not need an instrumental variable.
The defining feature of an instrumental variable is that I is assumed
not to directly cause Y: The path from I to Y is zero.
The zero path is given by theory, not by statistical analysis.
That is, one should not regress Y on X, I, and Z, and select I by seeing
which variables have coefficients that are not significantly different
from zero. Conditions for instrumental variable estimation:
1) The variable I must not directly cause Y or be correlated with U.
2) For a given structural equation, there must be at least as many
I variables as there are variables needing an instrument.
3) The variable I must cause the variable that needs an instrument.
(For the details of identification of models with instrumental
variables, see the separate discussion of identification.)
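The following sketch illustrates the first source of correlation (spuriousness) and the basic instrumental variable idea in the simplest case: one X, one instrument I, and no Z variables. It is a hypothetical NumPy simulation; the omitted variable W, the coefficient values, and the function names are assumptions chosen for illustration. With a single instrument, the instrumental variable estimate of the path from X to Y reduces to cov(I, Y)/cov(I, X).

```python
import numpy as np


def simulate_confounded(n=100_000, beta=0.5, seed=1):
    """Simulate Y = beta*X + U where an omitted variable W causes both X and U.

    I causes X but has no path to Y and is unrelated to U.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)              # omitted third variable (spuriousness)
    i = rng.normal(size=n)              # instrumental variable
    x = 0.8 * i + 0.6 * w + rng.normal(size=n)
    u = 0.7 * w + rng.normal(size=n)    # disturbance shares W with X
    y = beta * x + u
    return i, x, y


def ols_slope(x, y):
    """Regression slope of Y on X: cov(X, Y) / var(X)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def iv_slope(i, x, y):
    """Instrumental variable estimate with one instrument: cov(I, Y) / cov(I, X)."""
    return np.cov(i, y)[0, 1] / np.cov(i, x)[0, 1]


i, x, y = simulate_confounded()
print(f"regression slope: {ols_slope(x, y):.3f}")
print(f"IV estimate:      {iv_slope(i, x, y):.3f}")
```

With these assumed values, var(X) = 2.0 and cov(X, U) = 0.42, so in large samples the regression slope tends toward 0.5 + 0.42/2.0 = 0.71 rather than the true 0.5, while the instrumental variable estimate tends toward 0.5. The instrument works because it meets the three conditions: no path to Y, no correlation with U, and a nonzero effect on X.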
Mechanics of Two-Stage Least Squares (2SLS)
Although this method is not currently used very often for the estimation
of models with instrumental variables, it is instructive to understand
how it works. 2SLS estimation is available as an option within
SPSS.
- Stage 1: Regress each variable needing an instrument on the set of
I and Z variables. Using the coefficients from this stage, compute
predicted X variables.
- Stage 2: Regress Y on the stage 1 predicted variables and the set
of Z variables.
In practice, 2SLS computer programs carry out the two stages in a
single step.
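A minimal NumPy sketch of the two stages follows; the function names and the simulated data are hypothetical, and only point estimates are returned. The first function performs stage 1 and stage 2 literally. The second computes the same estimates in one step as b = (D'PD)^-1 D'Py, where D holds the regressors of the structural equation and P projects onto the I and Z variables. Standard errors from a literal stage 2 regression are not correct, because its residuals are computed from the predicted X values rather than from X; dedicated 2SLS routines compute them properly.

```python
import numpy as np


def add_intercept(*cols):
    """Stack 1-D arrays as columns of a design matrix with a leading column of ones."""
    n = len(cols[0])
    return np.column_stack([np.ones(n), *cols])


def ols(y, design):
    """Least-squares coefficients from regressing y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def two_stage_least_squares(y, x_endog, z_exog, instruments):
    """2SLS point estimates computed literally in two stages.

    x_endog: list of X variables needing an instrument
    z_exog: list of Z variables that cause Y and need no instrument
    instruments: list of I variables
    Returns [intercept, X coefficients..., Z coefficients...].
    """
    if len(instruments) < len(x_endog):
        raise ValueError("need at least as many instruments as variables needing one")
    stage1_design = add_intercept(*instruments, *z_exog)
    # Stage 1: regress each X on all I and Z variables and keep the predicted scores
    x_hat = [stage1_design @ ols(x, stage1_design) for x in x_endog]
    # Stage 2: regress Y on the predicted X scores and the Z variables
    return ols(y, add_intercept(*x_hat, *z_exog))


def two_sls_one_step(y, x_endog, z_exog, instruments):
    """Same estimates in one step: b = (D'PD)^-1 D'Py, P = projection onto I and Z."""
    d = add_intercept(*x_endog, *z_exog)              # structural-equation regressors
    w = add_intercept(*instruments, *z_exog)          # all exogenous variables
    d_proj = w @ np.linalg.lstsq(w, d, rcond=None)[0]  # P @ D without forming P
    return np.linalg.solve(d_proj.T @ d, d_proj.T @ y)


# Hypothetical simulated data: Y = 1 + 0.5 X + 0.3 Z + U, with X correlated with U
rng = np.random.default_rng(7)
n = 50_000
omitted = rng.normal(size=n)    # unmeasured cause of both X and U
i = rng.normal(size=n)          # instrument
z = rng.normal(size=n)          # cause of Y that needs no instrument
x = 0.8 * i + 0.4 * z + 0.6 * omitted + rng.normal(size=n)
u = 0.7 * omitted + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * z + u

two_stage = two_stage_least_squares(y, [x], [z], [i])
one_step = two_sls_one_step(y, [x], [z], [i])
print(two_stage.round(3), one_step.round(3))
```

The two functions agree because regressing on the stage 1 predicted scores is the same as regressing on P applied to the regressors, and P leaves the intercept and the Z variables unchanged.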
2SLS Example
Structural Equations:
Z = aX + bY + U
Y = cQ + dZ + V
Note that the notation has changed. For this example, variable
Q serves as an instrumental variable for Y in the Z equation, and
X serves as an instrumental variable for Z in the Y equation.
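Solving the two structural equations for Z and Y (assuming bd is not equal to 1, and that X and Q are uncorrelated with U and V) gives the reduced form, which shows why ordinary regression fails here and why both X and Q are used in each stage 1 regression:

```latex
\begin{aligned}
Z &= \frac{aX + bcQ + U + bV}{1 - bd} \\
Y &= \frac{adX + cQ + dU + V}{1 - bd}
\end{aligned}
```

Because Y contains the term dU, it is in general correlated with U (reverse causation), so regressing Z on X and Y does not estimate b. Q enters the reduced form for Y with coefficient c/(1 - bd), so it can serve as the instrument for Y only if c is nonzero; likewise X can serve as the instrument for Z only if a is nonzero.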
For the Z equation:
Stage 1: Regress Y on X and Q.
Stage 2: Regress Z on X and the stage 1 predicted score for Y.
For the Y equation:
Stage 1: Regress Z on X and Q.
Stage 2: Regress Y on Q and the stage 1 predicted score for Z.
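The example can be checked by simulation. The sketch below is a hypothetical NumPy illustration: the path values a = 0.6, b = 0.3, c = 0.5, d = 0.4, the correlation between the disturbances, and the helper name tsls are all assumptions. It generates Z and Y by solving the two structural equations and then applies the stages listed above to each equation.

```python
import numpy as np


def tsls(outcome, regressors, exogenous):
    """2SLS point estimates (no intercept; all variables have mean zero here).

    Stage 1 predicts every regressor from all exogenous variables; stage 2
    regresses the outcome on those predicted scores. A regressor that is itself
    exogenous (X in the Z equation) is reproduced exactly by stage 1.
    """
    r = np.column_stack(regressors)
    e = np.column_stack(exogenous)
    r_hat = e @ np.linalg.lstsq(e, r, rcond=None)[0]       # stage 1
    return np.linalg.lstsq(r_hat, outcome, rcond=None)[0]  # stage 2


# Hypothetical path values for Z = aX + bY + U and Y = cQ + dZ + V
a, b, c, d = 0.6, 0.3, 0.5, 0.4
rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
q = rng.normal(size=n)
u = rng.normal(size=n)
v = 0.5 * u + rng.normal(size=n)  # the two disturbances may be correlated

# Generate Z and Y by solving the two structural equations simultaneously
det = 1 - b * d
z = (a * x + b * c * q + u + b * v) / det
y = (a * d * x + c * q + d * u + v) / det

a_hat, b_hat = tsls(z, [x, y], [x, q])  # Z equation: Q is the instrument for Y
c_hat, d_hat = tsls(y, [q, z], [x, q])  # Y equation: X is the instrument for Z
print(round(a_hat, 3), round(b_hat, 3), round(c_hat, 3), round(d_hat, 3))
```

The estimates should land close to the assumed path values even though Y is correlated with U and Z is correlated with V, which is exactly the situation in which multiple regression would be biased.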