Not All Test Scores are Alike

The Gift of Time

  • Grade Norms vs. Age Norms (see Woodcock-Johnson chart)

Grade-Equivalent Scores

  • Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.
  • Do not reflect the student's actual functioning level (see OAT-CEREAL)
  • May not be real scores at all (interpolated and extrapolated)
  • May not even correspond to grade levels actually included in the test.

Percentile Ranks

  • The percent of students whose scores were tied or beaten by this student. The 37th percentile rank means you scored as high as or higher than 37 percent of the students in the test's norming sample or in your local group. The 99th percentile means you were in the highest one percent of the group.
  • Nothing to do with percent correct. (Never use the % sign when abbreviating percentile rank!)
  • Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.

Standard Scores and Scaled Scores

  • Measure how far the student scored from the average, in terms of the standard deviation (the typical spread of scores) for the whole group. A standard score of 115 or a scaled score of 13 means the student scored one standard deviation above the average (the 84th percentile rank); a standard score of 85 or a scaled score of 7 means one standard deviation below the average (the 16th percentile rank). (A conversion sketch follows this list.)
  • Equal units: can be added, subtracted, multiplied, divided, or averaged if you're in the mood.
  • Too narrow: Encourage obsessive comparisons between essentially identical scores.
  • Often misunderstood.
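
If you assume a normal distribution, converting a standard or scaled score to a percentile rank is just the normal curve at work. A minimal sketch in Python (the conventional mean 100/SD 15 and mean 10/SD 3 are assumptions about the particular test's scale):

    from statistics import NormalDist

    def percentile_rank(score: float, mean: float, sd: float) -> float:
        """Percent of the norm group scoring at or below this score,
        assuming scores are normally distributed."""
        return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)

    print(round(percentile_rank(115, 100, 15)))  # 84 -- one SD above the average
    print(round(percentile_rank(13, 10, 3)))     # 84 -- same position, scaled-score metric
    print(round(percentile_rank(85, 100, 15)))   # 16 -- one SD below the average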

Stanines

  • Almost equal units: can be added, subtracted, multiplied, divided, or averaged if you wish.
  • Too broad: can hide real differences between scores lumped into the same stanine, while nearly identical scores that straddle a boundary can look more different than they are.
  • Fairly easy to explain and understand (on a good day); see the conversion sketch below.
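
The usual conversion treats stanines as half-standard-deviation-wide bands, with stanine 5 centered on the mean and the extremes clipped to 1 and 9. A sketch of that textbook formula (not taken from any particular test manual):

    def stanine(score: float, mean: float = 100.0, sd: float = 15.0) -> int:
        """Textbook stanine conversion: half-SD bands, stanine 5
        centered on the mean, clipped to the 1-9 range."""
        z = (score - mean) / sd
        return max(1, min(9, round(2 * z + 5)))

    print(stanine(100))              # 5 -- dead average
    print(stanine(115))              # 7 -- one SD above the mean
    print(stanine(84), stanine(89))  # 3 4 -- five points apart, different stanines

Note how the last pair lands in adjacent stanines even though five points may be within measurement error: the boundary problem mentioned above.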

Percentile Ranks and Standard Scores

  • These two statistics will not always tell the same story, because the two scales are related nonlinearly. Near the average, a difference of only a few points can shift the percentile rank dramatically; far out in the tails, a difference of many points may barely move it (see the sketch below).
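
Using the percentile_rank sketch from earlier, the nonlinearity is easy to demonstrate (still assuming a normal distribution with mean 100 and SD 15):

    # Near the middle, ten points is a big percentile jump...
    print(round(percentile_rank(100, 100, 15)))  # 50
    print(round(percentile_rank(110, 100, 15)))  # 75
    # ...but in the tail, the same ten points barely registers.
    print(round(percentile_rank(135, 100, 15)))  # 99
    print(round(percentile_rank(145, 100, 15)))  # 100 (99.87 before rounding)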

Confidence Bands

  • Test scores are never perfectly accurate. Lucky or unlucky guesses, lapses of attention, and other factors mean that the same person would almost never get exactly the same score on a test twice in a row. A confidence band around a score tells how scores on that test are likely to vary by pure chance.
  • If the confidence bands on two scores overlap, there probably is not a significant difference between the two scores. On another day the higher and lower scores might have been reversed. (A rough overlap check is sketched below.)
  • If the confidence bands on two scores do not overlap, and if both scores are probably valid, there probably is a significant difference between the two scores. On another day, the higher and lower scores would probably have still been the higher and lower scores, respectively.
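
A sketch of the usual mechanics, with invented numbers: the standard error of measurement (SEM) is SD × sqrt(1 - reliability), and a roughly 95% band is the score ± 1.96 SEM. The three hypothetical scores below, Hope, Experience, and Dumb Luck, stand in for the chart that originally illustrated the next sentence:

    from math import sqrt

    def confidence_band(score: float, sd: float = 15.0,
                        reliability: float = 0.90, z: float = 1.96):
        """Roughly 95% confidence band: score +/- z * SEM,
        where SEM = SD * sqrt(1 - reliability)."""
        sem = sd * sqrt(1 - reliability)
        return (score - z * sem, score + z * sem)

    def bands_overlap(a, b) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    hope       = confidence_band(115)  # all three scores are hypothetical
    experience = confidence_band(92)
    dumb_luck  = confidence_band(104)

    print(bands_overlap(hope, experience))       # False -- probably a real difference
    print(bands_overlap(hope, dumb_luck))        # True  -- could be chance
    print(bands_overlap(experience, dumb_luck))  # True  -- could be chance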

In the example above, there is a triumph of Hope over Experience, but neither is significantly different from Dumb Luck.

Stanines

  • As a rough approximation, you can usually assume that two valid scores that are in adjacent stanines may not be significantly different, but that two valid scores that differ by more than one stanine probably are significantly different.

Significant Difference

  • A "significant difference" is one that is too large to have been likely to have occurred by chance when there was no real difference between the abilities being tested. This likelihood is expressed as a probability. e.g., p<.05 means that there were fewer than 5 chances in 100 of a difference that large or larger happening by accident.

Base-rate

  • Base-rate refers to the prevalence or frequency of a particular occurrence or event within a population. Awareness of relevant base-rate data allows an evaluator to determine the diagnostic utility of a particular sign. Although a particular comparison may reach some level of statistical significance, it is always necessary to determine whether the difference is a usual or an unusual one. Base-rate information provides just such data.

  • Although an 11.2-point difference between scores on the WISC-III Verbal and Performance scales is statistically significant, base-rate data tell us that a difference that large occurs in about 40.2% of the population (see the sketch below).
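
That is easy to reproduce under textbook assumptions. If the Verbal and Performance scales each have SD 15 and correlate about r = .60 (an assumed figure chosen to illustrate the arithmetic, not a quote from the WISC-III manual), the difference between them has SD 15 × sqrt(2 × (1 - r)) ≈ 13.4, so differences of 11.2 points or more in either direction turn up in roughly 40% of people:

    from math import sqrt
    from statistics import NormalDist

    sd, r = 15.0, 0.60                 # assumed scale SD and V-P correlation
    sd_diff = sd * sqrt(2 * (1 - r))   # SD of the V-P difference in the population

    # Proportion of the population with |V - P| of at least 11.2 points:
    base_rate = 2 * (1 - NormalDist(sigma=sd_diff).cdf(11.2))
    print(f"{base_rate:.1%}")          # ~40% -- common, however "significant"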

Item Analysis

  • A score can tell you only so much, and some of what it tells you may be wrong. To really understand a student's test performance, you need to look at the individual item responses. For example, the Gates-MacGinitie manuals have extremely valuable sections on sources of error, such as responding to a single word in the paragraph instead of the whole text, making false assumptions on the basis of prior knowledge, or overemphasizing one part of the story.

Scoring

  • All scoring should be done three times. Count the number right. Then count the number wrong and subtract it from the total number of items. Finally, start with the number right and add one point for each wrong item; you should end up with the total number of items. Read the numbers and titles of tables, columns, and rows aloud as you look up scores, and listen to yourself.
  • Examine your resulting standard scores. Do any look like they don't belong? A child with a score of 143 on Memory for Sentences and scores hovering around 100 on all the other tests should set off a mad rush to answer the question "Why?" Often you will find that you made a mistake in scoring rather than having discovered some weird ability of the child. "Tester: blame thyself before passing the blame on to others." (The sketch below automates both checks.)
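
A trivial sketch of both checks, the counting identity and the outlier hunt (the 25-point flag threshold is an arbitrary illustration, not a rule from any manual):

    def counts_agree(num_right: int, num_wrong: int, total_items: int) -> bool:
        """The three counts must reconcile: right + wrong == total."""
        return num_right + num_wrong == total_items

    def flag_outliers(scores: dict, spread: float = 25.0) -> list:
        """Flag subtest scores far from the child's own average --
        rescore these before theorizing about weird abilities."""
        mean = sum(scores.values()) / len(scores)
        return [name for name, s in scores.items() if abs(s - mean) > spread]

    scores = {"Memory for Sentences": 143, "Vocabulary": 101, "Calculation": 98}
    print(counts_agree(37, 13, 50))  # True -- the counts reconcile
    print(flag_outliers(scores))     # ['Memory for Sentences'] -- now ask "Why?"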

 

TEST SCORES ARE NOT NECESSARILY TRUSTWORTHY

PERVASIVE INVALIDITY

  • Princess Summerfallwinterspring and seasonal norms.
  • The student may have just blown off the test.
  • The student may have had a bad day.
  • The student may have followed the instruction to skip too-difficult items but forgotten to skip the corresponding items on the answer sheet.
  • The student may not have followed the instruction to skip too-difficult items and spent most of the time struggling bravely but fruitlessly on one impossible item.
  • The answer sheet may have baffled the student (hint: if the name is spelled wrong on the print-out, the scores may well be invalid).
  • The student may fail to switch tasks (e.g., initial sounds to final sounds, synonyms to antonyms).
  • The student may be carrying out an entirely different task from the one intended.
  • The ordinarily very generous time limits may be too short for a few students who work very slowly.
  • The ordinarily reasonable time allotments for subtests may exceed some students' attention spans.
  • The student's score may be low because the student did exactly what has been instructed in the classroom: the student "worked slowly" but accurately, thus completing very few Coding (Symbol Search, Cross Out, etc.) items but getting every one correct.
  • The student's score may be low because the student did exactly what the test instructions said: the student "worked quickly" but inaccurately, thus completing very many Coding (Symbol Search, Cross Out, etc.) items but getting many of them wrong.

CONFUSION BETWEEN INCAPACITY AND SPECIFIC PROBLEMS

  • Free-response and multiple-choice tests are not comparable for some students.
  • The student may have misread operation signs.
  • The student may know the process (e.g., long division) well but make computational errors (e.g., subtracting wrong in an otherwise correct long division problem).
  • The student may fail otherwise easy math applications problems because of reading difficulty.
  • The student may understand fairly high-level skills but make simple errors on much simpler skills (see the OAT-CEREAL).
  • The score may slightly overestimate the student's working level if the student is unusually accurate on the problems the student can solve.
  • Study and use the interpretive suggestions in the test manuals.