David A. Kenny
January 22, 2010
Measuring Model Fit

Fit refers to the ability of a model to reproduce the data (i.e., usually the variance-covariance matrix). It should be noted that a good-fitting model is not necessarily a valid model; for instance, a model all of whose parameters are zero can still be a "good-fitting" model. There are now literally hundreds of measures of fit. This page includes some of the major ones, but does not pretend to include all the measures. Though a bit dated, the book edited by Bollen and Long (Testing Structural Equation Models. Newbury Park, CA: Sage, 1993) explains these indexes and others.

Chi Square: χ²

For models with about 75 to 200 cases, this is a reasonable measure of fit. But for models with more cases, the chi square is almost always statistically significant. Chi square is also affected by the size of the correlations in the model: the larger the correlations, the poorer the fit. For these reasons alternative measures of fit have been developed.

Chi Square to df Ratio: χ²/df

There are no consistent standards for what is considered an acceptable model.

Transforming Chi Square to Z

Sometimes chi square is more interpretable if it is transformed into a Z value. The following approximation can be used:

$$Z = \sqrt{2\chi^2} - \sqrt{2\,df - 1}$$
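
A minimal Python sketch of this transformation. The one-tailed p value read from the normal curve is an addition here, not part of the formula above, and is itself only approximate. The function names and the model values in the usage line are hypothetical.

```python
import math


def chi2_to_z(chi2, df):
    """Approximate Z for a chi square: sqrt(2 * chi2) - sqrt(2 * df - 1)."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)


def upper_tail_p(z):
    """One-tailed (upper) p value for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical model: chi square of 50 on 30 df.
z = chi2_to_z(50.0, 30)
print(f"Z = {z:.3f}, approximate p = {upper_tail_p(z):.4f}")
```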

Bentler-Bonett Index or Normed Fit Index (NFI)

Define the null model as a model in which all of the correlations or covariances are zero. The null model is referred to as the "Independence Model" in AMOS. The formula for the NFI is:

$$NFI = \frac{\chi^2_{\text{Null Model}} - \chi^2_{\text{Proposed Model}}}{\chi^2_{\text{Null Model}}}$$

A value between .90 and .95 is acceptable, and above .95 is good. A disadvantage of this measure is that it cannot decrease when more parameters are added to the model: adding parameters never increases the chi square of the proposed model, so the more parameters added, the larger (or at least no smaller) the index. For this reason this measure is not recommended; one of the next two indexes is used instead.
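
The NFI formula as a one-line Python function. This is a sketch; the function name is hypothetical, and the chi square values in the usage line (800 for the null model, 60 for the proposed model) are made up for illustration.

```python
def nfi(chi2_null, chi2_model):
    """Bentler-Bonett Normed Fit Index from the null and proposed model chi squares."""
    return (chi2_null - chi2_model) / chi2_null


# Hypothetical values: independence (null) model chi square 800, proposed model 60.
print(round(nfi(800.0, 60.0), 3))
```

Note that the function never looks at degrees of freedom, which is exactly why it carries no penalty for extra parameters.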

Tucker Lewis Index or Non-normed Fit Index (NNFI)

A problem with the Bentler-Bonett index is that there is no penalty for adding parameters. The Tucker-Lewis index does have such a penalty. Let χ²/df be the ratio of chi square to its degrees of freedom; the index is then:

$$TLI = \frac{(\chi^2/df)_{\text{Null Model}} - (\chi^2/df)_{\text{Proposed Model}}}{(\chi^2/df)_{\text{Null Model}} - 1}$$

If the index is greater than one, it is set at one. It is interpreted in the same way as the Bentler-Bonett index. Note that for a given model, a lower chi square to df ratio (as long as it is not less than one) implies a better fitting model.

Note that the TLI (and the CFI, which follows) depends on the average size of the correlations in the data. If the average correlation between variables is not high, then the TLI will not be very high. Consider a simple example. You have a 5-item scale that you think measures one latent variable. You also have 3 dichotomous experimental variables that you manipulate and that you think cause that latent variable. These three experimental variables create 7 variables when you allow for all possible interactions (3 main effects, 3 two-way interactions, and 1 three-way interaction). With equal N in every condition, all of their correlations are zero. If you run the CFA on the 5 indicators, you might have a nice TLI of .95. However, if you add in the 7 experimental variables, your TLI might sink below .90: the null model is now not so "bad," because you have added to the model 7 variables that have zero correlations with each other.
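
A Python sketch of the TLI, including the rule that values above one are set to one. The function name is hypothetical, and so are the example values: a null model chi square of 800 on 45 df and a proposed model chi square of 60 on 40 df.

```python
def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis Index (NNFI); values above one are set to one."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    value = (ratio_null - ratio_model) / (ratio_null - 1.0)
    return min(value, 1.0)


# Hypothetical values: null model 800 on 45 df, proposed model 60 on 40 df.
print(round(tli(800.0, 45, 60.0, 40), 3))
```

Because each chi square is divided by its df, freeing a parameter helps the TLI only if it lowers chi square by more than it lowers df.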

Comparative Fit Index (CFI)

This measure is directly based on the non-centrality parameter. Let d = χ² − df, where df are the degrees of freedom of the model. The Comparative Fit Index equals:

$$CFI = \frac{d_{\text{Null Model}} - d_{\text{Proposed Model}}}{d_{\text{Null Model}}}$$

If the index is greater than one, it is set at one, and if it is less than zero, it is set to zero. It is interpreted in the same way as the previous indexes. If the CFI is less than one, then the CFI is always greater than the TLI. The CFI pays a penalty of one for every parameter estimated, because each added parameter lowers df by one and so raises d by one. Like the TLI, the CFI depends on the average size of the correlations in the data: if the average correlation between variables is not high, then the CFI will not be very high.
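
A Python sketch of the CFI with both truncation rules. It assumes d for the null model is positive, which is the usual case. The function name and the example values (null model 800 on 45 df, proposed model 60 on 40 df) are hypothetical.

```python
def cfi(chi2_null, df_null, chi2_model, df_model):
    """Comparative Fit Index from d = chi2 - df, clipped to the range [0, 1]."""
    d_null = chi2_null - df_null      # assumed to be positive
    d_model = chi2_model - df_model
    value = (d_null - d_model) / d_null
    return max(0.0, min(1.0, value))


# Hypothetical values: d = 755 for the null model and 20 for the proposed model.
print(round(cfi(800.0, 45, 60.0, 40), 3))
```

With these made-up numbers the CFI is about .974, while the TLI formula gives about .970. This fits the statement above that a CFI below one exceeds the TLI.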

Root Mean Square Error of Approximation (RMSEA)

This measure is based on the non-centrality parameter.  Its formula can be shown to equal:

$$RMSEA = \sqrt{\frac{\chi^2/df - 1}{N - 1}}$$

where N is the sample size and df is the degrees of freedom of the model. (If χ² is less than df, then RMSEA is set to zero.) Good models have an RMSEA of .05 or less. Models whose RMSEA is .10 or more have poor fit.
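
The RMSEA point estimate as a Python function. This is a sketch with a hypothetical function name. It follows the formula above, which uses N − 1; some programs use N instead.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate sqrt((chi2/df - 1) / (N - 1)); zero when chi2 does not exceed df."""
    if chi2 <= df:
        return 0.0
    return math.sqrt((chi2 / df - 1.0) / (n - 1.0))


# Chi square of 2.098 with 1 df and N = 70.
print(round(rmsea(2.098, 1, 70), 3))
```

For a chi square of 2.098 with 1 df and N = 70, this gives about .126.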

A confidence interval can be computed for this index. First, the value of the non-centrality parameter is estimated by χ² − df. The confidence interval for the non-centrality parameter can then be determined from χ², df, and the desired confidence level. (One can use the function "CNONCT" within SAS to compute these values.) Then these limits are substituted for χ² − df into the formula for the RMSEA. Ideally the lower limit of the 90% confidence interval is zero or very near zero, and the upper limit is not very large, i.e., less than .08. Note that the RMSEA can be misleading when the df are small and the sample size is not large. For instance, a chi square of 2.098 (a value not statistically significant), with a df of 1 and N of 70, yields an RMSEA of .126.
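
The procedure above can be sketched in standard-library Python as a stand-in for SAS's CNONCT. This is not what SAS or AMOS do internally. The noncentral chi-square CDF is computed as a Poisson mixture of central chi-square CDFs, using a textbook series/continued-fraction routine for the incomplete gamma function. Each non-centrality limit is then found by bisection. The sketch assumes moderate chi square values, and all function names are hypothetical.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x): series or continued fraction."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        term = total = 1.0 / a           # power series, fast for x < a + 1
        n = 0
        while term > total * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return min(1.0, total * math.exp(log_prefactor))
    tiny = 1e-300                        # continued fraction for Q = 1 - P (Lentz)
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return max(0.0, 1.0 - math.exp(log_prefactor) * h)


def ncx2_cdf(x, df, nc):
    """P(X <= x) for a noncentral chi square, as a Poisson(nc/2) mixture."""
    if nc <= 0.0:
        return gamma_p(df / 2.0, x / 2.0)
    mu = nc / 2.0
    top = int(mu + 12.0 * math.sqrt(mu) + 30)   # Poisson tail beyond this is negligible
    total = 0.0
    for j in range(top + 1):
        weight = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))
        total += weight * gamma_p(df / 2.0 + j, x / 2.0)
    return min(1.0, total)


def solve_noncentrality(x, df, target):
    """Non-centrality lam with ncx2_cdf(x, df, lam) == target, or 0 if none exists."""
    if ncx2_cdf(x, df, 0.0) <= target:
        return 0.0
    lo, hi = 0.0, max(1.0, x)
    while ncx2_cdf(x, df, hi) > target:  # the CDF falls as lam grows
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(x, df, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rmsea_ci(chi2, df, n, level=0.90):
    """Substitute the non-centrality limits for chi2 - df in the RMSEA formula."""
    tail = (1.0 - level) / 2.0
    lam_lower = solve_noncentrality(chi2, df, 1.0 - tail)
    lam_upper = solve_noncentrality(chi2, df, tail)
    scale = df * (n - 1.0)
    return math.sqrt(lam_lower / scale), math.sqrt(lam_upper / scale)


print(rmsea_ci(2.098, 1, 70))
```

For the example in the text (chi square 2.098, df = 1, N = 70), the lower limit is zero and the upper limit lies well above the point estimate of .126. That wide interval is one way to see why the RMSEA is unreliable with small df and N.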

p of Close Fit (PCLOSE)

The null hypothesis is that the RMSEA is .05 (or less), a close-fitting model. The p value examines the alternative hypothesis that the RMSEA is greater than .05. So if p is greater than .05, it is concluded that the fit of the model is "close."
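
PCLOSE can be sketched as the upper-tail probability of the observed χ² under a noncentral chi square. Its non-centrality corresponds to an RMSEA of exactly .05, that is, λ0 = .05² × df × (N − 1), which is consistent with the RMSEA formula given earlier (programs that use N instead of N − 1 will differ slightly). The block includes its own incomplete-gamma and Poisson-mixture helpers so that it runs on its own. All function names are hypothetical, and this is not necessarily how AMOS computes the value.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x): series or continued fraction."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        term = total = 1.0 / a
        n = 0
        while term > total * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return min(1.0, total * math.exp(log_prefactor))
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return max(0.0, 1.0 - math.exp(log_prefactor) * h)


def ncx2_cdf(x, df, nc):
    """P(X <= x) for a noncentral chi square, as a Poisson(nc/2) mixture."""
    if nc <= 0.0:
        return gamma_p(df / 2.0, x / 2.0)
    mu = nc / 2.0
    top = int(mu + 12.0 * math.sqrt(mu) + 30)
    total = 0.0
    for j in range(top + 1):
        weight = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))
        total += weight * gamma_p(df / 2.0 + j, x / 2.0)
    return min(1.0, total)


def p_close(chi2, df, n, rmsea0=0.05):
    """Upper-tail p of the observed chi square when the population RMSEA equals rmsea0."""
    lam0 = rmsea0 ** 2 * df * (n - 1.0)
    return 1.0 - ncx2_cdf(chi2, df, lam0)


print(round(p_close(2.098, 1, 70), 3))
```

Setting `rmsea0=0.0` reduces this to the ordinary chi square p value. Any positive `rmsea0` gives a larger p, because the test of close fit is more lenient than the test of exact fit.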

Standardized Root Mean Square Residual (SRMR)

This measure is the square root of the average squared standardized difference between the observed and predicted covariances. A value of zero indicates perfect fit. This measure tends to be smaller as sample size increases and as the number of parameters in the model increases. A value less than .08 is considered a good fit.
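
One common way to compute the SRMR divides each residual (observed minus predicted covariance) by the square root of the product of the two observed variances. It then averages the squared results over the k(k + 1)/2 unique elements and takes the square root. Programs differ in small details, such as how the diagonal is handled, so treat this as a sketch. The function name and both 2 × 2 matrices are made up for illustration.

```python
import math


def srmr(observed, implied):
    """Standardized root mean square residual over the k(k + 1)/2 unique elements."""
    k = len(observed)
    total = 0.0
    for i in range(k):
        for j in range(i + 1):
            # Standardize each residual by the observed standard deviations.
            scale = math.sqrt(observed[i][i] * observed[j][j])
            total += ((observed[i][j] - implied[i][j]) / scale) ** 2
    return math.sqrt(total / (k * (k + 1) / 2))


S = [[1.0, 0.5], [0.5, 1.0]]       # made-up observed matrix
Sigma = [[1.0, 0.3], [0.3, 1.0]]   # made-up model-implied matrix
print(round(srmr(S, Sigma), 3))
```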

Akaike Information Criterion (AIC)

The AIC measure indicates a better fit when it is smaller.  The measure is not standardized and is not interpreted for a given model.  For two models estimated from the same data set, the model with the smaller AIC is to be preferred.

$$AIC = \chi^2 + k(k + 1) - 2\,df$$

where k is the number of variables in the model and df is the degrees of freedom of the model. Note that k(k + 1)/2 − df equals the number of free parameters in the model (the k(k + 1)/2 variances and covariances minus the degrees of freedom), so k(k + 1) − 2df is twice that number. The AIC makes the researcher pay a penalty of two for every parameter that is estimated. The absolute value of AIC has relatively little meaning; rather the focus is on the relative size, the model with the smaller AIC being preferred.
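
A short Python sketch of the chi-square form of the AIC. It counts the free parameters as k(k + 1)/2 − df, the number of variances and covariances among k variables minus the model df. It assumes a covariance-structure model with no mean structure. The function name and both models are hypothetical.

```python
def aic(chi2, df, k):
    """Chi-square form of AIC: chi2 plus 2 times the number of free parameters."""
    free_params = k * (k + 1) // 2 - df
    return chi2 + 2 * free_params


# Two hypothetical models fitted to the same 6 variables (21 variances and covariances).
model_a = aic(30.0, 8, 6)   # 13 free parameters
model_b = aic(28.0, 6, 6)   # 15 free parameters
print("prefer A" if model_a < model_b else "prefer B")
```

Model B has the smaller chi square, but its two extra parameters cost 4 points against a gain of only 2, so the AIC favors Model A.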

Bayesian Information Criterion (BIC) and Adjusted BIC

The AIC pays a penalty of 2 for each parameter estimated. The BIC and adjusted BIC increase the penalty as sample size increases:

$$BIC = \chi^2 + \left[\frac{k(k + 1)}{2} - df\right]\ln(N)$$

where ln(N) is the natural logarithm of the number of cases in the sample. The adjusted BIC replaces ln(N) with ln[(N + 2)/24]. The BIC places a high value on parsimony (perhaps too high). The adjusted BIC, while placing a penalty for adding parameters based on sample size, does not place as high a penalty as the BIC. Like the AIC, these measures are not absolute measures and are used to compare the fit of two or more models estimated from the same data set. The adjusted BIC is not given in Amos, but is given in Mplus.
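
Both versions as one Python function. It counts free parameters as k(k + 1)/2 − df, which assumes no mean structure. The function name, the `adjusted` flag, and the example values are hypothetical.

```python
import math


def bic(chi2, df, k, n, adjusted=False):
    """Chi-square form of BIC; adjusted=True replaces ln(N) with ln((N + 2) / 24)."""
    free_params = k * (k + 1) / 2 - df
    sample_term = (n + 2) / 24 if adjusted else n
    return chi2 + free_params * math.log(sample_term)


# Hypothetical model: chi square 30 on 8 df, 6 variables, N = 200.
print(round(bic(30.0, 8, 6, 200), 2), round(bic(30.0, 8, 6, 200, adjusted=True), 2))
```

With N = 200, each parameter costs ln(200), about 5.3, under the BIC but only ln(202/24), about 2.1, under the adjusted BIC. This shows the smaller penalty of the adjusted version.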

GFI and AGFI (LISREL measures)

These measures are affected by sample size and can be large for models that are poorly specified. The current consensus is not to use these measures.

Hoelter Index

The index states the largest sample size at which chi square would not be significant (alpha = .05), that is, how small one's sample size would have to be for the result to be no longer significant. The index should only be computed if the chi square is statistically significant. Its formula is:

$$\frac{(N - 1)\,\chi^2_{\text{crit}}}{\chi^2} + 1$$

where N is the sample size, χ² is the chi square for the model, and χ²(crit) is the critical value for the chi square. If the critical value is unknown, the following approximation can be used:

$$\frac{\left[1.645 + \sqrt{2\,df - 1}\right]^2}{2\chi^2/(N - 1)} + 1$$

where df are the degrees of freedom of the model.  For both of these formulas, one rounds down to the nearest integer value.   Hoelter recommends values of at least 200.  Values of less than 75 indicate very poor model fit.
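
Both Hoelter formulas as Python functions, each rounding down as described. The function names and the model values are hypothetical. The critical value 18.307 is the .05 critical value of chi square with 10 df. The approximation hard-codes 1.645, as the formula above does.

```python
import math


def hoelter(chi2, n, chi2_crit):
    """Hoelter index from a known critical chi square, rounded down."""
    return math.floor((n - 1) * chi2_crit / chi2 + 1)


def hoelter_approx(chi2, df, n):
    """Hoelter index using the normal approximation to the critical chi square."""
    return math.floor((1.645 + math.sqrt(2 * df - 1)) ** 2 / (2 * chi2 / (n - 1)) + 1)


# Hypothetical model: chi square 40 on 10 df with N = 500.
print(hoelter(40.0, 500, 18.307), hoelter_approx(40.0, 10, 500))
```

For this made-up model, the exact version gives 229 and the approximation gives 225. Both exceed Hoelter's recommended 200.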

