Tables to Aid in the Interpretation of the Woodcock Johnson - Revised Cognitive Battery
(John and Ron thank Drs. Richard Woodcock and Kevin McGrew for the very generous granting of access to portions of the WJ-R standardization data and for the kind permission to post these results.)

Examiners often want to know whether the difference between two test scores is significant. This is usually determined by applying one of several discrepancy analysis procedures. Examiners can analyze differences between scores or clusters obtained on particular tests (e.g., they can compare the WJ-R COG Memory for Words subtest score to the Memory for Sentences subtest score, or compare the Fluid Ability Cluster score to the Crystallized Ability Cluster score). Some procedures for comparing scores within tests are described here.

Confidence Bands

Test scores are never perfectly accurate. Lucky or unlucky guesses, lapses of attention, and other factors mean that the same person would almost never get exactly the same score on a test twice in a row. A confidence band around a score tells how much scores on that test are likely to vary by pure chance. If the confidence bands on two scores overlap, there probably is not a significant difference between the two scores. On another day the higher and lower scores might have been reversed. If the confidence bands on two scores do not overlap, and if both scores are probably valid, there probably is a significant difference between the two scores. On another day, the higher and lower scores would probably still have been the higher and lower scores, respectively.
In the example above, there is a triumph of Hope over Experience, but neither is significantly different from Dumb Luck.

Base-rate

Base-rate refers to the prevalence or frequency of a particular occurrence or event within a population. Awareness of relevant base-rate data allows an evaluator to determine the diagnostic utility of a particular sign. Although a particular comparison may reach some level of statistical significance, it is always necessary to determine whether the statistical difference is a usual or an unusual one. Base-rate information provides just such data.

Testing the Difference of Scores within the Same Test

One can test the differences between any 2 of the 21 WJ-R COG subtests and/or between any 2 of the 7 WJ-R COG composites associated with the McGrew, Flanagan, and Ortiz integrated Carroll/Cattell-Horn Gf-Gc model. The first step in conducting a discrepancy analysis between two WJ-R COG subtest scores is to calculate the actual difference between the scores in question. This is computed by subtracting the lower test standard score from the higher test standard score. The next step is to determine whether the point difference is large enough to be of any interest. We describe two methods that can be used to examine within-test difference scores for importance. The first examines whether the difference between two test scores is statistically significant; the second examines whether the difference is large enough to be considered clinically useful.

Statistical Significance

The first step in examining the difference between scores is to see whether the difference is beyond that which would be expected by chance alone. Anastasi and Urbina (1997) provide a formula to help determine how large a Difference Score must be in order to be statistically significant. This formula has been adapted to read:

Difference Score = Z * SD * SQRT(2 - r1 - r2)
where SD = standard deviation of the two scores, Z = the z value for the chosen statistical significance level, r1 = reliability of the first score, and r2 = reliability of the second score.

All subtests and composites of the WJ-R have a standard deviation of 15. For our purposes, the .05 significance level was employed, which corresponds to a value of 1.96 on the z-distribution table. Table 7.1 (p. 117) of the Woodcock manual (Woodcock, R. W., & Mather, N. (1989). WJ-R Tests of Cognitive Ability -- Standard and Supplemental Batteries: Examiner's Manual. In R. W. Woodcock & M. B. Johnson, Woodcock-Johnson Psycho-Educational Battery--Revised. Chicago: Riverside Publishing Co.) provides the median internal consistency reliability coefficients for the WJ-R COG subtests and composites across the standardization sample ages. Thus, we can use the formula to determine the minimum Difference Score required for significance for all subtest and composite combinations.

When considering score differences, one should consider what a difference between two test scores really means when it is not significant at the desired level. If the difference is due to chance, then for all practical purposes it should be thought of as zero. There is no real meaning to saying something is "almost significant." Therefore, when making comparisons among WJ-R COG subtests and composites, differences that are not significant at the .05 level should be interpreted to mean that the examinee demonstrated equal levels of the abilities measured by the subtests or composites.

If two test scores are significantly different from one another, one still cannot assume that the difference is unusual enough to be clinically useful (i.e., that it is rare enough to be of value). To help determine how large the discrepancy must be to be considered clinically useful, frequency tables were created from the standardization sample.
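The statistical-significance check can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own procedure: it assumes the adapted formula takes the standard form Z × SD × √(2 − r1 − r2), and the function names and the reliabilities in the usage line are hypothetical rather than values from Table 7.1.

```python
import math


def critical_difference(r1, r2, sd=15.0, z=1.96):
    """Minimum difference between two standard scores needed for significance.

    Assumes the form Z * SD * sqrt(2 - r1 - r2); sd=15 and z=1.96 (.05 level)
    follow the values stated in the text.
    """
    if not (0.0 <= r1 <= 1.0 and 0.0 <= r2 <= 1.0):
        raise ValueError("reliabilities must lie between 0 and 1")
    return z * sd * math.sqrt(2.0 - r1 - r2)


def is_significant(score_a, score_b, r1, r2, sd=15.0, z=1.96):
    """True when the absolute score difference reaches the critical difference."""
    return abs(score_a - score_b) >= critical_difference(r1, r2, sd, z)


# Hypothetical reliabilities of .90 for both scores (illustration only).
print(round(critical_difference(0.90, 0.90), 2))
```

With two reliabilities of .90, the required difference comes to about 13 points, so a 15-point gap would count as significant and a 10-point gap would not.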
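Method

To create the tables for significant differences, the following formulas were utilized:

Significance level for multiple comparisons: The Davis (1959) formula was used to compute the deviations from the average that are significant at the desired level of significance. That formula is:

SQRT(SUM(SEmT^2)/n^2 + ((n-2)/n) * SEmI^2) * Bonferroni Correction

Here SEmT is the standard error of measurement of each test included in the average, SEmI is the standard error of measurement of the individual test being compared to that average, and n is the number of tests in the average.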
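The following Python sketch shows one way the Davis formula might be evaluated. It rests on two assumptions not stated in the text: that each standard error of measurement is derived from a reliability coefficient as SD × √(1 − r), and that the "Bonferroni Correction" multiplier is a two-tailed critical z value with alpha divided by the number of comparisons. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def sem(reliability, sd=15.0):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def bonferroni_z(n_comparisons, alpha=0.05):
    """Two-tailed critical z with alpha split across comparisons (assumed reading)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_comparisons))


def davis_critical_deviation(sems, index, z):
    """Critical deviation of test `index` from the mean of all tests in `sems`.

    Implements sqrt(sum(SEm_T^2) / n^2 + ((n - 2) / n) * SEm_I^2) * z.
    """
    n = len(sems)
    if n < 2:
        raise ValueError("need at least two tests to form an average")
    pooled = sum(s ** 2 for s in sems) / n ** 2
    own = ((n - 2) / n) * sems[index] ** 2
    return z * math.sqrt(pooled + own)


# Hypothetical reliabilities for a 7-test administration (illustration only).
reliabilities = [0.91, 0.88, 0.78, 0.87, 0.75, 0.93, 0.91]
sems = [sem(r) for r in reliabilities]
z = bonferroni_z(len(sems))
print([round(davis_critical_deviation(sems, i, z), 1) for i in range(len(sems))])
```

Less reliable tests have larger standard errors, so they need a larger deviation from the mean before they count as a strength or weakness.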
Links to the tables and descriptions of how to use them are below:

Between tests: To identify an individual test as a strength or weakness relative to all other tests, determine the mean of the 7, 14, or 21 tests administered. Subtract the obtained score of the desired test from that mean. If the absolute value of the resulting number is greater than the "Significance level" value in the appropriate column, the test may be considered a strength or a weakness.
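The between-tests procedure can be sketched as follows. This is only an illustration: the test labels, scores, and "Significance level" values are hypothetical placeholders, not values from the tables, and the function name is made up for this example.

```python
from statistics import mean


def classify_tests(scores, significance_levels):
    """Label each test a strength, a weakness, or neither.

    scores: test name -> obtained standard score.
    significance_levels: test name -> "Significance level" value looked up
        in the appropriate table column (hypothetical values in this sketch).
    """
    overall_mean = mean(scores.values())
    labels = {}
    for name, score in scores.items():
        # Score minus mean, so a positive deviation marks a strength; the
        # absolute value is the same as for mean minus score.
        deviation = score - overall_mean
        if abs(deviation) > significance_levels[name]:
            labels[name] = "strength" if deviation > 0 else "weakness"
        else:
            labels[name] = "neither"
    return labels


# Hypothetical scores and table values (illustration only).
scores = {"Test A": 115, "Test B": 100, "Test C": 85}
levels = {"Test A": 10.0, "Test B": 10.0, "Test C": 10.0}
print(classify_tests(scores, levels))
```

The sign of the deviation decides the label only after the absolute value clears the table value, so a test that sits near the mean is never flagged.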
Clusters (1): To determine between-cluster strengths and weaknesses, compare each cluster score to the mean of all the clusters combined.

Clusters (2): To determine BCA-cluster strengths and weaknesses, compare each cluster score to the BCA (Broad Cognitive Ability) score.

Within-clusters: Cluster scores are determined from the individual scores of either 2 (14 test administration) or 3-4 (21 test administration) tests. To compare within-cluster test differences, determine the difference for each pair of subtests within the cluster and compare it to the appropriate "Difference score" from the table.
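The within-cluster comparison amounts to checking every pair of tests in a cluster. The sketch below assumes, for simplicity, a single "Difference score" threshold for the whole cluster, whereas the tables may give a separate value for each pair. It also treats a difference equal to the threshold as flagged, which is a further assumption. The test labels, scores, and threshold are hypothetical.

```python
from itertools import combinations


def within_cluster_differences(test_scores, difference_score):
    """Return (test_a, test_b, difference) for each pair at or above the threshold.

    test_scores: test name -> standard score for the tests in one cluster.
    difference_score: table "Difference score" (one value for all pairs is a
        simplifying assumption of this sketch).
    """
    flagged = []
    for (name_a, score_a), (name_b, score_b) in combinations(test_scores.items(), 2):
        difference = abs(score_a - score_b)
        if difference >= difference_score:
            flagged.append((name_a, name_b, difference))
    return flagged


# Hypothetical three-test cluster from a 21-test administration.
cluster = {"Test A": 100, "Test B": 112, "Test C": 90}
print(within_cluster_differences(cluster, 15))
```

A two-test cluster yields one comparison, a three-test cluster three, and a four-test cluster six.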
Content on these pages is copyrighted by Dumont/Willis © (2001) unless otherwise noted.