Ron Dumont and
John Willis on WISC-IV Subtest Substitutions 11/8/03
Just say no to WISC-IV substitutions
This is one of the "problems" (probably not the only one) with the WISC-IV.
Despite the explicit cautions and rules in the WISC-IV manual, which strongly
recommend that substitution be done rarely and only with true clinical
reasoning, examiners will be very tempted to play IQ roulette with the test.
In our feeble attempt to understand the WISC-IV, we have tried to see what the
possible combinations might be. We are not mathematicians, but here is our
reasoning for over 100 possible FSIQ combinations.
If one can substitute 1 subtest per factor, you have the following:
7 Verbal Comprehension Index (VCI) combinations
4 Perceptual Reasoning Index (PRI) combinations
3 Working Memory Index (WMI) combinations
3 Processing Speed Index (PSI) combinations
If we ignore the FSIQ limit, we
get 7 x 4 x 3 x 3 = 252 combinations with one subtest substitution per index.
But if we honor the full substitution rule (1 substitution per index
and no more than 2 per FSIQ), we get the following.
With two-subtest substitutions we see:
VCI x PRI: 7 x 4 = 28
VCI x WMI: 7 x 3 = 21
VCI x PSI: 7 x 3 = 21
PRI x WMI: 4 x 3 = 12
PRI x PSI: 4 x 3 = 12
WMI x PSI: 3 x 3 = 9
Subtotal: 103

With one-subtest substitutions we see:
VCI: 7
PRI: 4
WMI: 3
PSI: 3
Subtotal: 17

That gives 103 + 17 = 120 possible total combinations.
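For anyone who wants to check the arithmetic, the tally can be sketched in a few lines of Python. This is only a sketch: it takes the per-index counts (7, 4, 3, 3) at face value exactly as listed above, and the names `counts`, `two_index`, `two_total`, and `one_total` are made up for illustration.

```python
from itertools import combinations
from math import prod

# Per-index counts exactly as listed in the text (taken at face value).
counts = {"VCI": 7, "PRI": 4, "WMI": 3, "PSI": 3}

# A substitution choice in every index at once: 7 x 4 x 3 x 3.
all_four = prod(counts.values())

# Two indices involved: one product per pair of indices, then summed.
two_index = {pair: counts[pair[0]] * counts[pair[1]]
             for pair in combinations(counts, 2)}
two_total = sum(two_index.values())

# One index involved: just the per-index counts, summed.
one_total = sum(counts.values())

print(all_four)                # 252
print(two_total, one_total)    # 103 17
print(two_total + one_total)   # 120
```

Changing any entry in `counts` shows how quickly the number of possible FSIQs grows or shrinks under a different reading of the substitution rule.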
We may be missing some important
combinations, but this is our thinking (late at night after a long day of
Now, let’s just consider what happens
when one makes some of the “acceptable” possible substitutions. Examination of
the correlation matrix for the 15 subtests gives some interesting results.
Verbal Comprehension (VCI) has 2
subtests, Information and Word Reasoning (WR), used for substitution.
Information correlates best with Vocabulary (.75), then Similarities (.70) and
Comprehension (.62). WR, on the other hand, has lower correlations with the 3
core subtests (.58 to .66). As measures of VCI, Information and WR correlate .77
and .70 respectively, while the 3 core subtests correlate .86 to .91. The
substitutions do not correlate with the factor as well as the core subtests do, so
substitute with extreme caution.
Perceptual Reasoning (PRI), whatever
that means, has Picture Completion (PCm) as a substitution subtest. It correlates
best with Block Design (BD) (.54), then Matrix Reasoning (MR) (.46) and
Picture Concepts (PCn) (.39). While the 3 core subtests correlate with the PRI at .77
to .84, Picture Completion correlates at only .57 – and it correlates
almost equally with the VCI (.55)! Picture Completion does not seem to measure
the same process as Picture Concepts or Matrix Reasoning. MR is the one "pure"
measure of Gf (fluid reasoning), so substituting it out for some reason seems to
drastically change the results. If there had been another Gf subtest in the
scale, there might have been separate Gv (visual/spatial thinking) and Gf
indices, as on the Differential Ability Scales. Now that could make for clearer,
less contaminated factor interpretation. Alas, we are left to fend (and
interpret) for ourselves.
The Working Memory Index (WMI) has some
potential additional problems. Arithmetic is the substitution subtest. It
correlates fairly well with the two core subtests (.47 and .51).
But, despite the moderate correlations, Arithmetic measures something different
from WM – it appears to be a mixed measure of Gq (mathematical ability) and Gsm
(short-term working memory). Interestingly, Arithmetic correlates with the WMI
at .57 while correlating with the VCI and PRI at .63 and .62. (PRI has its own
problems, so re-read the comments above.) The other "problem" with the WMI
is the fact that Digit Span (the Certs™ Test of Intelligence - "It's two, two,
two tests in one") is both Memory Span (Gsm MS - Digits Forward) and Working
Memory (Gsm MW - Digits Backward). If only this test had been allowed to divide
itself into the two tests it really has always been and should be.
Instead, with mindless subtest interpretation, one often gets a totally worthless
Digit Span scaled score that might be created by combining a child’s wonderful
memory span with the same child’s pitiful working memory (“I can remember tons
of information for a very short time – but pleeeease – don’t ask me to do
anything with it”).
Next, in Working Memory, we add in
Letter-Number Sequencing (LNS) – probably more appropriately called
Number-Letter Sequencing, since that is the actual task that the child is
supposed to perform – on which a child need do nothing more than simply repeat
back the items the examiner reads (just like Digit Span Forward) to obtain 9
raw score points. Not a bad raw-score performance, but on a working memory
test, this can result in average, or above average scaled scores for never doing
any working memory task at all. If we try to substitute Arithmetic, we get
really lost in what is being measured. This particular index reminds us of a
Anyone who wishes to interpret the WMI blindly, based solely on the individual
subtest scaled scores, without truly understanding the complex nature of the
separate tasks involved, may resemble those early barnstorming pilots in their
poorly constructed aircraft – they sometimes
found themselves spiraling downward in a spectacular smoking display of
technology gone awry. Exciting to watch from the ground - not so exciting if
you're the pilot.
Processing Speed Index (PSI): Finally,
the Processing Speed Index (PSI). Cancellation is the substitution subtest. It
correlates with Coding (Cd) and Symbol Search and destroy (SS) at .40 and .42, but
correlates with the PSI at only .41 – in contrast to the two regular PSI subtests, which
correlate .88 and .87. Note, too, that Cancellation is the poorest overall
measure of G (“Gee, I am smart”, or “Gee, what’s up with that kid”), sharing
only about a paltry 7% of the variance. It may be the overall best measure of
something – we just don’t know what at this point.
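One way to make these subtest-to-index correlations easier to compare is to square them, since r squared is the usual estimate of the proportion of variance two measures share. The sketch below does that for the substitution subtests, using only the correlations quoted in this piece. The dictionary and function names are invented for illustration, and the 7% figure for Cancellation and G is the one quoted above, not something recomputed here.

```python
# Correlations with the parent index, as quoted in the text.
substitution_r = {
    "Information -> VCI": 0.77,
    "Word Reasoning -> VCI": 0.70,
    "Picture Completion -> PRI": 0.57,
    "Arithmetic -> WMI": 0.57,
    "Cancellation -> PSI": 0.41,
}


def shared_variance_pct(r):
    """Percent of variance two measures share: 100 * r squared, rounded."""
    return round(100 * r * r)


shared = {name: shared_variance_pct(r) for name, r in substitution_r.items()}

for name, pct in shared.items():
    print(f"{name}: about {pct}% shared variance")
```

By the same arithmetic, the regular PSI subtests at .88 and .87 share roughly 77% and 76% of their variance with the index, against about 17% for Cancellation.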
Just say no to WISC-IV substitutions: After
this long diatribe, maybe the rule is "Just say no to substitutions." We think
that all the subtests do offer potentially valuable information, but not as some
ingredient in a perceived index. We need to look closely at what abilities
these measures attempt to tap and keep that in mind rather than falling into the
seductive trap of trying to recalculate Index and FS scores.
Bottom line: Subtest substitutions
should be done rarely and only with very good, clinically relevant,
test-specific, “I’ll go to court and defend” reasons. The FSIQ should be based
only on the 10 core subtests unless one is spoiled or compelling clinical
reasons play into a substitution. Substitutions should be made a priori
(before the fact), based upon an examiner's understanding of the child and the
child's needs (e.g., replace Block Design with Picture Completion for a child
with cerebral palsy because of the high motor demands of the Block Design
subtest and the low motor demands of Picture Completion). This
substitution decision must be made before one knows the actual subtest scores;
otherwise an a priori decision simply transforms itself into an a
posteriori judgment. If this is the case, disregard all rules and do as you
wish. Like speeding, just hope you don’t get caught!
Substitutions are also acceptable if,
for some plausible and clinically acceptable reason, a subtest was spoiled
(e.g., substituting Arithmetic for Digit Span because a fire drill takes place
during, and thus spoils, the administration of the Digit Span subtest). This
becomes an acceptable, a posteriori substitution.
Substitutions should never be
made simply to raise or lower a composite score. Certainly one should never play
IQ Roulette simply to raise or lower a score to meet some bureaucratic guideline
for Special Education.
Doesn’t the following sound a bit

"Gee, if I play IQ roulette (Parker Brothers™, 2003) and substitute the higher
Information subtest for the lower Comprehension subtest, I raise the VCI enough
to get the FSIQ just high enough so that the desired severe discrepancy
between hope and experience, that didn't exist, now exists, and the MDT
will now believe that little Laurie is deserving of the SPED and the concomitant
IEP that she needs."

"Gee, if I play IQ roulette (Parker Brothers™, 2003) and substitute the lower
Word Reasoning subtest for the higher Similarities subtest, I lower the VCI
enough to get the FSIQ just low enough so that the desired severe discrepancy
between hope and experience that did exist no longer exists, and the MDT will now
believe that little Laurie is not deserving of the SPED and concomitant IEP that
we never believed she needed in the first place."

This is not the way to do it!
It occurs to us that these comparisons
are tricky. If we use the correlations of subtests with index scores at the
bottom of Table 5.1 in the WISC-IV Technical and Interpretive Manual, the
correlations of the core subtests are boosted because each core subtest is
included in the index score it is being correlated with. That seems to give
them an unfair edge over the substitution
subtests. However, if we instead were to use the corrected correlations in
the blue, upper right corner, the core subtests would be at a disadvantage
because they are being correlated with only the remainder of the index,
without themselves, while the substitution subtests are being compared with
the whole factor.
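The size of this part-whole effect can be sketched with a little algebra. The Python below is a simplified illustration, not a reanalysis of Table 5.1: it assumes every subtest is standardized (unit variance), that every pair of subtests correlates at the same value, and that the index is a plain unweighted sum. The numbers (3 subtests, every correlation .50) are hypothetical and chosen only to show the pattern, and the function names are ours.

```python
from math import sqrt


def r_with_whole(k, r):
    """Correlation of one of k subtests with the sum of all k (itself included)."""
    return (1 + (k - 1) * r) / sqrt(k + k * (k - 1) * r)


def r_with_rest(k, r):
    """Correlation of one of k subtests with the sum of the other k - 1."""
    m = k - 1
    return m * r / sqrt(m + m * (m - 1) * r)


def r_outside_with_whole(k, r_core, r_out):
    """Correlation of a non-member subtest with the sum of all k core subtests."""
    return k * r_out / sqrt(k + k * (k - 1) * r_core)


# Hypothetical values, NOT taken from the WISC-IV manual.
k, r = 3, 0.5

print(round(r_with_whole(k, r), 2))             # 0.82
print(round(r_with_rest(k, r), 2))              # 0.58
print(round(r_outside_with_whole(k, r, r), 2))  # 0.61
```

Under these made-up conditions, the very same subtest looks like a .82 when it is counted in the index, a .58 when it is compared with the rest of the index, and a .61 when it sits outside as a substitute. Both distortions described above show up: the uncorrected figure flatters the core subtest, and the corrected figure slightly shortchanges it relative to the substitute.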