 
Not All Test Scores are Alike
The Gift of Time
• Grade Norms vs. Age Norms (see Woodcock-Johnson chart)
Grade-Equivalent Scores
• Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.
• Do not reflect the student's actual functioning level (see OATCEREAL)
• May not be real scores at all (interpolated and extrapolated)
• May not even be grade levels included in the test.
Percentile Ranks
• The percent of students whose scores were tied or beaten by this student. The 37th percentile rank means you scored as high as or higher than 37 percent of the students in the test's norming sample or in your local group. The 99th percentile rank means you were in the highest one percent of the group.
• Nothing to do with percent correct. (Never use a % sign when abbreviating percentile ranks!)
• Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.
Standard Scores and Scaled Scores
• Measure how far the student scored from the average in terms of the average spread of scores for the whole group. A standard score of 115 or scaled score of 13 means the student scored one standard deviation above the average (which would be the 84th percentile rank). A standard score of 85 or scaled score of 7 means the student scored one standard deviation below the average (which would be a percentile rank of 16).
• Equal units: can be added, subtracted, multiplied, divided, or averaged if you're in the mood.
• Too narrow: Encourage obsessive comparisons between essentially identical scores.
• Often misunderstood.
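The conversions above follow directly from the normal curve. As a quick sketch in Python (assuming the conventional mean of 100 and SD of 15 for standard scores, and mean of 10 and SD of 3 for scaled scores, as in the examples above):

```python
from statistics import NormalDist

# Conventional scales assumed here: standard scores with mean 100, SD 15;
# scaled scores with mean 10, SD 3.
standard = NormalDist(mu=100, sigma=15)
scaled = NormalDist(mu=10, sigma=3)

def percentile_rank(dist, score):
    """Percent of the norming group scoring at or below this score."""
    return round(dist.cdf(score) * 100)

print(percentile_rank(standard, 115))  # one SD above the mean -> 84
print(percentile_rank(scaled, 13))     # also one SD above -> 84
print(percentile_rank(standard, 85))   # one SD below the mean -> 16
```

The conversion runs the other way as well: `standard.inv_cdf(0.84)` returns a score just under 115.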
Stanines
• Almost equal units: can be added, subtracted, multiplied, divided, or averaged if you wish.
• Too broad: May obscure genuine differences between scores that land in the same stanine.
• Fairly easy to explain and understand (on a good day).
Percentile Ranks and Standard Scores
• These two statistics will not always tell the same story. A student may not be many points away from the average and still have an extreme percentile rank, or may be many points away from the average and nonetheless have a fairly average percentile rank.
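One reason the stories diverge is that percentile ranks bunch up near the average: the same standard-score distance covers many percentile-rank points in the middle of the distribution and very few in the tails. A small illustration (Python, assuming a normal distribution with mean 100 and SD 15):

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed standard-score scale

def pr(score):
    """Percentile rank (0-100) under the assumed normal distribution."""
    return dist.cdf(score) * 100

# The same 15-point difference in standard scores...
near_mean = pr(115) - pr(100)  # ...spans about 34 percentile-rank points
in_tail = pr(145) - pr(130)    # ...but only about 2 points out in the tail
print(round(near_mean), round(in_tail))
```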
Confidence Bands
• Test scores are never perfectly accurate. Lucky or unlucky guesses, lapses of attention, and other factors mean that the same person would almost never get exactly the same score on a test twice in a row. A confidence band around a score tells how scores on that test are likely to vary by pure chance.
• If the confidence bands on two scores overlap, there probably is not a significant difference between the two scores. On another day the higher and lower scores might have been reversed.
• If the confidence bands on two scores do not overlap, and if both scores are probably valid, there probably is a significant difference between the two scores. On another day, the higher and lower scores would probably still have been the higher and lower scores, respectively.
In the example above, there is a triumph of Hope over Experience, but
neither is significantly different from Dumb Luck.
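A common way to construct such a band is from the standard error of measurement, SEM = SD × √(1 − reliability), with a 95% band of roughly ±1.96 SEM around the obtained score. A hypothetical sketch in Python (the SD of 15, the reliability of .90, and the two scores are illustrative assumptions, not values from any particular test):

```python
import math

def confidence_band(score, sd=15.0, reliability=0.90, z=1.96):
    """95% confidence band around an obtained score, via the SEM."""
    sem = sd * math.sqrt(1 - reliability)  # here about 4.7 points
    return (score - z * sem, score + z * sem)

def bands_overlap(a, b):
    """Do two (low, high) bands share any ground?"""
    return a[0] <= b[1] and b[0] <= a[1]

# Two hypothetical subtest scores six points apart: their bands overlap,
# so the difference could easily be chance.
print(bands_overlap(confidence_band(108), confidence_band(102)))
```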
Stanines
• As a rough approximation, you can usually assume that two valid scores that are in adjacent stanines may not be significantly different, but that two valid scores that differ by more than one stanine probably are significantly different.
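Stanines are usually defined by fixed percentile bands (4-7-12-17-20-17-12-7-4 percent of the group), so a percentile rank can be mapped to a stanine with a simple lookup. A sketch in Python (the cut points follow that common convention; some publishers round slightly differently):

```python
from bisect import bisect_left

# Inclusive upper percentile-rank bounds for stanines 1 through 8;
# anything above 96 falls in stanine 9.
_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (1-99) to its stanine (1-9)."""
    return bisect_left(_CUTOFFS, percentile_rank) + 1

print(stanine(50))  # middle of the distribution -> 5
print(stanine(98))  # top few percent -> 9
```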
Significant Difference
• A "significant difference" is one that is too large to be likely to have occurred by chance when there was no real difference between the abilities being tested. This likelihood is expressed as a probability: for example, p < .05 means that there were fewer than 5 chances in 100 of a difference that large or larger happening by accident.
Base Rate
• Base rate refers to the prevalence or frequency of a particular occurrence or event within a population. Awareness of relevant base-rate data allows an evaluator to determine the diagnostic utility of a particular sign. Although a particular comparison may reach some level of statistical significance, it is always necessary to determine whether the statistical difference is a usual or an unusual one. Base-rate information provides just such data.
• Although an 11.2-point difference between scores on the WISC-III Verbal and Performance scales represents a statistically significant difference, base rate tells us that such an occurrence is likely to happen in about 40.2% of the population.
Item Analysis
• A score can tell you only so much, and some of what it tells you may be wrong. To really understand a student's test performance, you need to look at the individual item responses. For example, the Gates-MacGinitie manuals have extremely valuable sections on sources of error, such as responding to a single word in the paragraph instead of the whole text, making false assumptions on the basis of prior knowledge, or overemphasizing one part of the story.
Scoring
• All scoring should be done three times. Count the number right. Then count the number wrong and subtract that from the total; you should get the number right. Finally, start with the number right and add one point for each wrong item; you should come out with the total number of items at the end. Read numbers and titles of tables, columns, and rows aloud as you look up scores, and listen to yourself.
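The arithmetic behind the second and third passes reduces to one identity: number right plus number wrong must equal the total number of items. A trivial Python check (assuming every item is scored either right or wrong):

```python
def counts_reconcile(num_right, num_wrong, total_items):
    # Pass 2: total minus wrong should recover the number right.
    # Pass 3: right plus one point per wrong item should recover the total.
    # Both reduce to this single identity.
    return num_right + num_wrong == total_items

print(counts_reconcile(37, 13, 50))  # True: counts agree
print(counts_reconcile(37, 12, 50))  # False: recount before looking up norms
```

The real safeguard is the independent recounting; the identity merely flags that one of the counts went wrong.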
• Examine your resulting standard scores. Do any look like they don't belong? A child with a score of 143 on Memory for Sentences and scores hovering around 100 on all other tests should set off a mad rush to answer the question "Why?" Often you will find that you made a mistake when you scored the test, as opposed to having just discovered some weird ability of the child. "Tester: Blame thyself before passing the blame on to others."
TEST SCORES ARE NOT NECESSARILY TRUSTWORTHY
PERVASIVE INVALIDITY
• Princess Summerfallwinterspring and seasonal norms.
• The student may have just blown off the test.
• The student may have had a bad day.
• The student may have followed the instruction to skip too-difficult items but forgotten to skip the corresponding items on the answer sheet.
• The student may not have followed the instruction to skip too-difficult items and spent most of the time struggling bravely but fruitlessly on one impossible item.
• The answer sheet may have baffled the student (hint: if the name is spelled wrong on the printout, the scores may well be invalid).
• The student may fail to switch tasks (e.g., initial sounds to final sounds, synonyms to antonyms).
• The student may be carrying out an entirely different task from the one intended.
• The ordinarily very generous time limits may be too short for a few students who work very slowly.
• The ordinarily reasonable time allotments for subtests may exceed some students' attention spans.
• The student's score may be low, but it results from doing what has been instructed in the classroom: the student "worked slowly" but accurately, thus completing very few Coding (Symbol Search, Cross Out, etc.) items but getting every one correct.
• The student's score may be low, but it results from doing what has been instructed on the test: the student "worked quickly" but inaccurately, thus completing very many Coding (Symbol Search, Cross Out, etc.) items but getting many incorrect.
CONFUSION BETWEEN INCAPACITY AND SPECIFIC PROBLEMS
• Free-response and multiple-choice tests are not comparable for some students.
• The student may have misread operation signs.
• The student may know the process (e.g., long division) well but make computational errors (e.g., subtracting wrong in an otherwise correct long division problem).
• The student may fail otherwise easy math applications problems because of reading difficulty.
• The student may understand fairly high-level skills but make simple errors on much simpler skills (see the OATCEREAL).
• The score may slightly overestimate the student's working level if the student is unusually accurate on the problems the student can solve.
• Study and use the interpretive suggestions in the test manuals.
