
Ron Dumont and John Willis on WISC-IV Subtest Substitutions 11/8/03


Just say no to WISC-IV substitutions!  This is one of the "problems" (probably not the only one) with the WISC-IV. Despite the explicit cautions and rules in the WISC-IV manual, which strongly recommend that substitution be done rarely and only with true clinical reasoning, examiners will be very tempted to play IQ roulette with the test. In our feeble attempt to understand the WISC-IV, we have tried to see what the possible combinations might be. We are not mathematicians, but here is our reasoning for over 100 possible FSIQ combinations.

If one can substitute 1 subtest per factor you have the following:

7 Verbal Comprehension Index subtest combinations

4 Perceptual Reasoning Index combinations

3 Working Memory combinations

3 Processing Speed Index combinations

If we don't care about an FSIQ, we get 7 x 4 x 3 x 3 = 252 combinations with one subtest substitution per index. But if we are concerned with the substitution rule (1 substitution per index and no more than 2 per FSIQ), we get the following.


For two-subtest substitutions we see:
VCI and PRI: 7 x 4 = 28
VCI and WMI: 7 x 3 = 21
VCI and PSI: 7 x 3 = 21
PRI and WMI: 4 x 3 = 12
PRI and PSI: 4 x 3 = 12
WMI and PSI: 3 x 3 = 9
total = 103

For one-subtest substitutions we see:
VCI: 7
PRI: 4
WMI: 3
PSI: 3
total = 17

103 + 17 = 120 possible total combinations.

We may be missing some important combinations, but this is our thinking (late at night after a long day of travel). 
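
For readers who want to check the arithmetic, here is a minimal sketch that reproduces the tally above. It takes the per-index counts (7, 4, 3, 3) exactly as listed and does not verify them against the manual or ask whether they already include the unsubstituted core set. The variable names are ours, chosen only for illustration.

```python
from itertools import combinations
from math import prod

# Per-index counts exactly as listed in the text (not independently verified).
counts = {"VCI": 7, "PRI": 4, "WMI": 3, "PSI": 3}

# One choice per index, all four indexes at once: 7 x 4 x 3 x 3.
all_four = prod(counts.values())

# Substitution in a single index: the sum of the four counts.
one_index = sum(counts.values())

# Substitutions in two indexes: sum of the products over the six index pairs.
two_index = sum(counts[a] * counts[b] for a, b in combinations(counts, 2))

total = one_index + two_index
print(all_four, one_index, two_index, total)
```

With these counts the script prints 252, 17, 103, and 120, matching the figures in the text.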

Now, let’s just consider what happens when one makes some of the “acceptable” possible substitutions.  Examination of the correlation matrix for the 15 subtests gives some interesting results.[1]


Verbal Comprehension (VCI) has 2 subtests, Information and Word Reasoning (WR), used for substitution. Information correlates best with Vocabulary (.75) and then Similarities (.70) and Comprehension (.62).  WR, on the other hand, has lower correlations with the 3 core subtests (.58 to .66).  As measures of VCI, Information and WR correlate .77 and .70 respectively, while the 3 core subtests correlate .86 to .91. The substitutions do not correlate with the factor as well as the core subtests do, so substitute with extreme caution.


Perceptual Reasoning (PRI), whatever that means, has Picture Completion as a substitution subtest. It correlates best with Block Design (BD) (.54) and then Matrix Reasoning (MR) (.46) and Picture Concepts (PCn) (.39). While the 3 core subtests correlate with the PRI at .77 to .84, Picture Completion (PCm) correlates at only .57, and it correlates almost equally with the VCI (.55)!  Picture Completion does not seem to measure the same process as Picture Concepts or Matrix Reasoning.  MR is the one "pure" measure of Gf (fluid reasoning), so substituting it out for some reason seems to drastically change the results.   If there had been another Gf subtest in the scale, there might have been separate Gv (visual/spatial thinking) and Gf indices, as on the Differential Ability Scales. Now that could make for clearer, less contaminated factor interpretation.  Alas, we are left to fend (and interpret) for ourselves.


Working Memory Index (WMI) has some potential additional problems.  Arithmetic is the substitution subtest.  It correlates fairly well with the two other subtests (.47 and .51 respectively). But, despite the moderate correlations, Arithmetic measures something different from WM: it appears to be a mixed measure of Gq (mathematical ability) and Gsm (short-term working memory). Interestingly, Arithmetic correlates with the WM index at .57 while correlating with the VCI and PRI at .63 and .62.  (PRI has its own problems, so re-read the comments above.) The other "problem" with the WM index is the fact that Digit Span (the Certs™ Test of Intelligence - "It's two, two, two tests in one") is both Memory Span (Gsm MS - Digits Forward) and Working Memory (Gsm MW - Digits Backward).  If only this test had been allowed to divide itself into the two tests it really has always been and should be.[2]  Instead, with mindless subtest interpretation, one often gets a totally worthless Digit Span scaled score that might be created by combining a child’s wonderful memory span with the same child’s pitiful working memory (“I can remember tons of information for a very short time – but pleeeease – don’t ask me to do anything with it”).

Next, in Working Memory, we add in Letter-Number Sequencing (LNS), probably more appropriately called Number-Letter Sequencing, since that is the actual task that the child is supposed to perform. On LNS a child need do nothing more than simply repeat back the items the examiner reads (just like Digit Span Forward) to obtain 9 raw score points.  Not a bad raw-score performance, but on a working memory test, this can result in average or above-average scaled scores for never doing any working memory task at all.  If we try to substitute Arithmetic, we get really lost in what is being measured.  This particular index reminds us of a barnstorming analogy[3]. Anyone who wishes to interpret the WMI blindly, based solely on the individual subtest scaled scores, without truly understanding the complex nature of the separate tasks involved, may resemble those early barnstorming pilots in their poorly constructed aircraft: they sometimes found themselves spiraling downward in a spectacular smoking display of technology gone awry.   Exciting to watch from the ground - not so exciting if you're the pilot.


Processing Speed Index (PSI): Finally, the Processing Speed Index.  Cancellation is the substitution test.  It correlates with Coding (Cd) and Symbol Search and destroy (SS) at .40 and .42, but with the PSI at only .41, in contrast to the two regular PSI tests, which correlate .88 and .87.  Note, too, that Cancellation is the poorest overall measure of G (“Gee, I am smart”, or “Gee, what’s up with that kid”), sharing only a paltry 7% or so of the variance.  It may be the overall best measure of something; we just don’t know what at this point.
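
To put those correlations on the same footing as the "7% of the variance" figure, recall that the proportion of variance two measures share is the square of their correlation. The sketch below applies that to the PSI correlations quoted above and works backward from 7% to the correlation it implies. The function name and labels are ours; the text does not say which core subtest has the .88 and which the .87, so they are labeled generically.

```python
from math import sqrt

def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2

# Correlations with the PSI as quoted in the text.
psi_correlations = [
    ("Cancellation", 0.41),
    ("core PSI subtest (higher)", 0.88),
    ("core PSI subtest (lower)", 0.87),
]

for name, r in psi_correlations:
    print(f"{name}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")

# Working backward: a 7% share of variance implies a correlation of about .26.
implied_r = sqrt(0.07)
print(f"7% shared variance implies r of about {implied_r:.2f}")
```

On these figures Cancellation shares roughly 17% of its variance with the PSI, against roughly 76% to 77% for each core subtest (figures that, as footnote 1 notes, are boosted because the core subtests are part of the index).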


Just say no to WISC-IV substitutions!   After this long diatribe, maybe the rule is "Just say no to substitutions."  We think that all the subtests do offer potentially valuable information, but not as some ingredient in a perceived index.  We need to look closely at what abilities these measures attempt to tap and keep that in mind rather than falling into the seductive trap of trying to recalculate Index and FS scores.


Bottom line:   Subtest substitutions should be done rarely and only with very good, clinically relevant, test-specific, “I’ll go to court and defend” reasons. The FSIQ should be based only on the 10 core subtests unless one is spoiled or compelling clinical reasons call for a substitution.  Substitutions should be made a priori (before the fact), based upon an examiner's understanding of the child and the child's needs (e.g., replace Block Design with Picture Completion for a child with cerebral palsy because of the high motor demands of the Block Design subtest and the low motor demands of Picture Completion).  This substitution decision must be made before one knows the actual subtest scores; otherwise an a priori decision simply transforms itself into an a posteriori judgment.  If this is the case, disregard all rules and do as you wish.  Like speeding, just hope you don’t get caught!

Substitutions are also acceptable if, for some plausible and clinically acceptable reason, a subtest was spoiled (e.g., substituting Arithmetic for Digit Span because a fire drill takes place during, and thus spoils, the administration of the Digit Span subtest). This becomes an acceptable a posteriori substitution.

Substitutions should never be made simply to raise or lower a composite score. Certainly one should never play IQ Roulette simply to raise or lower a score to meet some bureaucratic guideline for Special Education. 

Doesn’t the following sound a bit lame?

"Gee, if I play IQ roulette (Parker Brothers ™, 2003) and substitute the higher Information subtest for the lower Comprehension subtest, I raise the VCI enough to get the FSIQ just high enough so that the desired severe discrepancy[4] between hope and experience, that didn't exist, now exists, and the MDT[5] will now believe that little Laurie is deserving of the SPED and the concomitant IEP[6] that she needs."




"Gee, if I play IQ roulette (Parker Brothers ™, 2003) and substitute the lower Word Reasoning subtest for the higher Similarities subtest, I lower the VCI enough to get the FSIQ just low enough so that the desired severe discrepancy between hope and experience that did exist now doesn’t exist, and the MDT will now believe that little Laurie is not deserving of the SPED and concomitant IEP that we never believed she needed in the first place."


This is not the way to do it!

[1] It occurs to us that these comparisons are tricky.  If we use the correlations of subtests with index scores at the bottom of Table 5.1 in the WISC-IV Technical and Interpretive Manual, the correlations of the core subtests are boosted by being included in the index scores.  That seems to give them an unfair edge over the substitution subtests.  However, if we instead were to use the corrected correlations in the blue, upper right corner, the core subtests would be at a disadvantage because they are being correlated with only the remainder of the index without themselves, while the substitution subtests were being compared to the whole factor.
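
The size of that boost can be illustrated with the standard part-whole correction, which removes a subtest from the total before correlating the two. This is a minimal sketch under stated assumptions: the function name is ours, the example uses three hypothetical unit-variance subtests that intercorrelate .50 (not WISC-IV data), and we are assuming, not asserting, that the manual's corrected values come from a formula of this kind.

```python
from math import sqrt

def corrected_part_whole_r(r_it, sd_item, sd_total):
    """Correlation of a subtest with the total after removing it from the total.

    r_it     : uncorrected correlation of the subtest with the total (subtest included)
    sd_item  : standard deviation of the subtest
    sd_total : standard deviation of the total (subtest included)
    """
    numerator = r_it * sd_total - sd_item
    denominator = sqrt(sd_total**2 + sd_item**2 - 2 * r_it * sd_total * sd_item)
    return numerator / denominator

# Hypothetical example: three subtests, each with SD 1, intercorrelating .50.
# Variance of their sum = 3 + 2 * 3 * 0.5 = 6, so the total's SD is sqrt(6).
sd_total = sqrt(6)
# Covariance of one subtest with the sum = 1 + 0.5 + 0.5 = 2.
r_uncorrected = 2 / sd_total
r_corrected = corrected_part_whole_r(r_uncorrected, 1.0, sd_total)

print(f"uncorrected r = {r_uncorrected:.2f}, corrected r = {r_corrected:.2f}")
```

In this made-up case the uncorrected correlation is about .82 and the corrected one about .58, which shows how much of a core subtest's apparent edge can come simply from being part of the index.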

[2] Actually, we must admit that the WISC-IV does allow for examination of these scores as scaled scores, but does so by calling them “Process Scores.”  These scores (DSF, DSB, LDSF, LDSB), and the base rate frequencies provided for them, are probably better indicators of a child’s ability and performance in the area of Memory than is the global aggregate Digit Span scaled score.

[3] Thanks to Guy McBride for this wonderful and apt description.

[4] Severe Discrepancy   One of the regulatory requirements for identification of a specific learning disability:  “a severe discrepancy between [academic] achievement and intellectual ability.”  Frequently misinterpreted as a severe discrepancy between other pairs of variables never contemplated in the regulations, such as Verbal IQ and Performance IQ, height and weight, or hope and experience.  Despite Congress’s wise and explicit rejection of proposed mathematical formulae to quantify the severe discrepancy, some school districts impose various and wondrous arithmetic criteria that blindly identify students with no particular learning problem and deny services to students with severe learning disabilities.  These school districts tend over time to come to mistake their policies for state law.

[5] Multi-Disciplinary Team (MDT): Hopefully, a group of individuals with both left- and right-hemispheres who are willing, and able, to participate in a process in which calling each other “Smart-ass” and “Dumb-bunny” is not only acceptable but expected. (The table at which this confrontation occurs does not contain a single rubber stamp.)

[6] Individualized Education Program (IEP)   A mass-produced, computer-generated document describing in considerable detail the pre-existing program into which the student will be placed.  The individualization occurs when the student’s name, age, grade, and birth date are entered in the appropriate computer program fields.