Tables to Aid in the Interpretation of the Woodcock Johnson - Revised Cognitive Battery

Ron Dumont, Ed.D., NCSP; John O. Willis, Ed.D.; Joe Janetti, NCSP

Base rate data were computed from a sub-sample of the standardization sample of the Woodcock Johnson - Revised Cognitive Battery. This sub-sample consisted of all 3,130 children (1,568 female, 1,562 male), ages 6 to 18 (M = 11.6, SD = 3.5), in grades 1 through 12 (M = 6.11, SD = 3.5), who were administered the subtests of the WJ-R COG. (John and Ron thank Drs. Richard Woodcock and Kevin McGrew for very generously granting access to portions of the WJ-R standardization data and for kind permission to post these results.)

Examiners are often interested in knowing whether the difference between two test scores is significant. This is usually done by applying one of several discrepancy-analysis procedures. Examiners can analyze subtest or cluster differences obtained on particular tests (e.g., they can compare the WJ-R COG Memory for Words subtest score to the Memory for Sentences subtest score, or compare the Fluid Ability Cluster score to the Crystallized Ability Cluster score). Some procedures for comparing scores within tests are described here.

Confidence Bands

Test scores are never perfectly accurate. Lucky or unlucky guesses, lapses of attention, and other factors mean that the same person would almost never get exactly the same score on a test twice in a row. A confidence band around a score tells how scores on that test are likely to vary by pure chance.

If the confidence bands on two scores overlap, there probably is not a significant difference between the two scores. On another day the higher and lower scores might have been reversed.

If the confidence bands on two scores do not overlap, and if both scores are probably valid, there probably is a significant difference between the two scores. On another day, the higher and lower scores would probably have still been the higher and lower scores, respectively.
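The overlap rule above can be sketched in a few lines of code. This is a minimal illustration, not part of the WJ-R materials: the SEm values are hypothetical, and each band is simply the score plus or minus 1.96 SEm at the 95% level.

```python
Z_95 = 1.96  # two-tailed z value for a 95% confidence band

def confidence_band(score, sem, z=Z_95):
    """Return the (lower, upper) confidence band around a standard score."""
    return (score - z * sem, score + z * sem)

def bands_overlap(score_a, sem_a, score_b, sem_b, z=Z_95):
    """True if the two scores' confidence bands overlap (difference
    may be chance); False if they do not (difference is likely real)."""
    lo_a, hi_a = confidence_band(score_a, sem_a, z)
    lo_b, hi_b = confidence_band(score_b, sem_b, z)
    return lo_a <= hi_b and lo_b <= hi_a

# Scores of 95 and 110, each with a hypothetical SEm of 4 points:
print(confidence_band(95, 4))       # band of roughly 87.2 to 102.8
print(bands_overlap(95, 4, 110, 4)) # bands overlap, so the difference
                                    # could be chance
```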

Imagine three hypothetical scores named Hope, Experience, and Dumb Luck: Hope's band could fall entirely above Experience's, a triumph of Hope over Experience, while both bands overlap Dumb Luck's, so that neither is significantly different from Dumb Luck.


Base Rates

Base-rate refers to the prevalence or frequency of a particular occurrence or event within a population. Awareness of relevant base-rate data allows an evaluator to determine the diagnostic utility of a particular sign. Although a particular relevant comparison may reach some level of statistical significance, it is always necessary to determine whether the statistical difference is a usual or an unusual one. Base-rate information provides just such data.

Testing the Difference of Scores within the Same Test

One can test the differences between any two of the 21 WJ-R COG subtests and/or between any of the 7 WJ-R COG composites associated with the McGrew, Flanagan, and Ortiz integration of the Carroll and Cattell-Horn Gf-Gc models. The first step in conducting a discrepancy analysis between two WJ-R COG subtest scores is to calculate the actual difference between the scores in question. This is computed by subtracting the lower test standard score from the higher test standard score. The next step is to determine whether the difference is large enough to be of any interest. We describe two methods that can be used to examine within-test difference scores for importance. The first examines whether the difference between two test scores is statistically significant; the second examines whether the difference is large enough to be considered clinically useful.

Statistical Significance

The first step in examining the difference between scores is to see whether the difference is beyond that which would be expected by chance alone. Anastasi and Urbina (1997) provide a formula to help determine how large a Difference Score must be in order to be statistically significant. This formula has been adapted to read:

Significant Difference Score = SD × z × √[2 − (r1 + r2)]

where SD = the standard deviation of the two scores, z = the z value for the chosen significance level, r1 = the reliability of the first score, and r2 = the reliability of the second score.

All subtests and composites of the WJ-R have a standard deviation of 15. For our purposes, the .05 significance level was employed, which corresponds to a z value of 1.96. Table 7.1 (p. 117) of the examiner's manual (Woodcock, R. W., & Mather, N. (1989). WJ-R Tests of Cognitive Ability -- Standard and Supplemental Batteries: Examiner's Manual. In R. W. Woodcock & M. B. Johnson, Woodcock-Johnson Psycho-Educational Battery -- Revised. Chicago: Riverside Publishing Co.) provides the median internal consistency reliability coefficients for the WJ-R COG subtests and composites across the standardization sample ages. Thus, we can use the formula to determine the minimum Difference Score required for significance for every subtest and composite combination.
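As a sketch, the formula can be applied in a few lines of code. The reliability values below are hypothetical placeholders; in practice they would be read from Table 7.1 of the examiner's manual.

```python
from math import sqrt

SD = 15      # standard deviation of WJ-R standard scores
Z_95 = 1.96  # two-tailed z value for the .05 significance level

def significant_difference(r1, r2, sd=SD, z=Z_95):
    """Minimum Difference Score needed for statistical significance,
    per the adapted Anastasi and Urbina (1997) formula."""
    return sd * z * sqrt(2 - (r1 + r2))

# Hypothetical reliabilities of .90 and .85 (illustrative only):
print(round(significant_difference(0.90, 0.85), 1))  # -> 14.7
```

With these inputs the two scores would need to differ by about 15 standard-score points before the difference exceeds chance at the .05 level.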

When considering score differences, one should consider what it means when the difference between two test scores is not significant at the desired level. If the difference could be due to chance, then for all practical purposes it should be treated as zero. There is no real meaning to saying something is "almost significant." Therefore, when making comparisons among WJ-R COG subtests and composites, differences that are not significant at the .05 level should be interpreted to mean that the examinee demonstrated comparable levels of the abilities measured by the subtests or composites.

If two test scores are significantly different from one another, one still cannot assume that the difference is unusual enough to be clinically useful (i.e., that the difference is rare enough to be of value). To help determine how severe a discrepancy must be to be considered clinically useful, frequency tables were created from the standardization sample.


To create the tables for significant differences, the following formulas were utilized:

Significance level for multiple comparisons: The Davis (1959) formula was used to compute the deviations from the average that are significant at the desired level of significance. That formula is:

Significant deviation = SQRT[ (∑SEmT²)/n² + ((n − 2)/n) × SEmI² ] × Bonferroni correction

where:

∑SEmT² = the sum of the squared standard errors of measurement for all subtests included in the comparison,
n = the total number of subtests in the comparison,
SEmI² = the squared standard error of measurement for the individual subtest in question, and
Bonferroni correction = the adjustment for alpha inflation due to multiple comparisons, set at 95% confidence.
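The Davis computation can be sketched as follows. The SEm values in the example are hypothetical, and the Bonferroni multiplier is left as a caller-supplied factor rather than a value taken from the tables.

```python
from math import sqrt

def davis_significant_deviation(sems, sem_i, bonferroni=1.0):
    """Davis (1959): significant deviation of one subtest from the
    mean of n subtests.
    sems       -- SEm of every subtest included in the comparison
    sem_i      -- SEm of the individual subtest in question
    bonferroni -- multiplier adjusting for multiple comparisons
    """
    n = len(sems)
    sum_sem_sq = sum(s ** 2 for s in sems)           # ∑SEmT²
    variance = sum_sem_sq / n ** 2 + ((n - 2) / n) * sem_i ** 2
    return sqrt(variance) * bonferroni

# Example: seven subtests, each with a hypothetical SEm of 4 points.
# The base deviation is roughly 3.7 points before the Bonferroni
# multiplier is applied.
print(davis_significant_deviation([4] * 7, 4))
```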

Links to the tables and descriptions of how to use them are below:

Between tests: To identify an individual test strength or weakness relative to all other tests, determine the mean of the 7, 14, or 21 tests administered. Subtract the mean from the obtained score of the test in question. If the absolute value of the result is greater than the "Significance level" value in the appropriate column, the test may be considered a strength or a weakness.

7 Subtest comparison
14 Subtest comparison
21 Subtest comparison
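The between-test procedure can be illustrated in code. The subtest names below are from the WJ-R COG standard battery, but the scores and the critical value are hypothetical; real critical values come from the linked tables.

```python
# Hypothetical standard scores for the seven standard-battery subtests.
scores = {"Memory for Names": 120, "Memory for Sentences": 96,
          "Visual Matching": 104, "Incomplete Words": 99,
          "Visual Closure": 101, "Picture Vocabulary": 95,
          "Analysis-Synthesis": 110}
critical = 12.0  # hypothetical "Significance level" table value

# Compare each subtest with the mean of all subtests administered.
mean_score = sum(scores.values()) / len(scores)
for test, score in scores.items():
    deviation = score - mean_score
    if abs(deviation) > critical:
        label = "strength" if deviation > 0 else "weakness"
        print(f"{test}: {deviation:+.1f} ({label})")
# prints: Memory for Names: +16.4 (strength)
```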

Clusters (1): To determine between-cluster strengths and weaknesses, compare each cluster score to the mean of all the clusters combined.

Between Cluster comparisons

Clusters (2): To determine BCA-cluster strengths and weaknesses, compare each cluster score to the Broad Cognitive Ability (BCA) score.

BCA Cluster comparisons

Within-clusters: Cluster scores are derived from the individual scores of either 2 (14-test administration) or 3 to 4 (21-test administration) tests. To compare within-cluster test differences, compute the difference for each pair of subtests within the cluster and compare it to the appropriate "Difference score" from the table.

Within-Cluster comparisons
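The within-cluster procedure can be sketched the same way, here with three short-term memory subtests (the cluster membership is assumed for illustration); the scores and the critical difference are hypothetical.

```python
from itertools import combinations

# Hypothetical standard scores for subtests assumed to share a cluster.
cluster = {"Memory for Sentences": 105, "Memory for Words": 90,
           "Numbers Reversed": 98}
critical_diff = 13.0  # hypothetical "Difference score" table value

# Examine every pairwise difference within the cluster.
for (a, sa), (b, sb) in combinations(cluster.items(), 2):
    diff = abs(sa - sb)
    flag = "significant" if diff > critical_diff else "not significant"
    print(f"{a} vs {b}: {diff} ({flag})")
```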



