By Hubert Lovett
Subsequent to a discussion among several members of this
list (Matthew Warren, John Willis, Ron Dumont, etc.), I decided there may be
some confusion about regression methods used to identify severe discrepancies.
I decided to write a more or less complete statement of those methods.
I do hope it will be of value to someone.
At the outset, I would like to thank John Willis, Ron Dumont, and Matthew Warren for their input. These are classy folks, as helpful as they are
I will here try to describe the rationale and method of
using regression to determine severe discrepancies in diagnosing learning
disabilities. A severe discrepancy
in achievement occurs when a child's achievement deviates severely from what
one would expect. It is essential,
therefore, to establish expectation for a particular child.
Few test scores ever coincide exactly with what is expected.
In making a decision to label one discrepancy as severe and one as
normal, some criterion must be established to which to compare an actual
discrepancy. This application of
decision theory will also be discussed here.
I will then compare the method presented here with a method described by
Cecil Reynolds in Chapter 24 of Handbook of Psychological and Educational
Assessment of Children: Personality, Behavior, and Context, edited by
Cecil R. Reynolds and Randy W. Kamphaus (1990).
- Y = Achievement score,
- X = IQ score,
- Y' = Predicted achievement score,
- MY = Mean achievement score,
- MX = Mean IQ score,
- SDY = Standard Deviation for achievement scores,
- SDX = Standard Deviation for IQ scores,
- rYY = Reliability for achievement scores,
- rXX = Reliability for IQ scores,
- rXY = Correlation between achievement scores and IQ scores,
- TY = True score for achievement scores for a particular child,
- EY = Expected value of Y,
- e = Y - EY,
- SE = Standard error of estimate when Y' is determined using X, and
- zpn = normal deviate for probability = p and type of test = n
(n = 1 for a one-tailed test, n = 2 for a two-tailed test).
There is no particular need to translate X and Y to the
same metric. However, if this is
done, it should be accomplished before calculations begin.
1. Y is normally distributed,
2. The regression of Y on X is best described by a straight line,
3. Variance of Y on X is independent of X, and
4. The best method of determining EY is the method that minimizes SUM(e^2).
Because of assumption 2 above, the formula for predicting
achievement given IQ is a special case of the general linear formula and is
Y' = SDY(rXY((X - MX)/SDX)) + MY.
It can be shown that, given assumption 2, using Y' as EY
will minimize SUM(e^2). Therefore,
using Y' as expectation for Y will satisfy assumption 4 above, whereas using MY
or X as the expected value of Y will not satisfy this basic assumption.
One object in measurement is to minimize error.
Since e is error, we would like to minimize it.
However, since e is an unknown for a particular person on a particular
administration of a test, we can only hope to minimize it within a group.
Summing e across a group is fruitless.
The mean is zero, and, therefore, so is the sum.
If we square e before summing, then the result must be a nonnegative
number contingent upon e. That is
why we stipulate assumption 4 above. There
are those who would like to use MY as an estimate of EY.
Others would use X. Neither
of these will minimize error. The
attractiveness of either is based mostly on concern for a child not learning as
well as his age mates and on the convenience of calculation.
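The claim that Y' beats both MY and X as an expectation can be checked with a small simulation. What follows is only a sketch under assumptions of my own choosing: both tests on a mean-100, SD-15 scale, a correlation of .60, and 10,000 simulated children. None of these numbers come from real data, and the function names are made up for the illustration.

```python
import math
import random

# Assumed illustration values, not taken from any real data set.
MX = MY = 100.0
SDX = SDY = 15.0
RXY = 0.60


def predicted(x):
    """Y' = SDY(rXY((X - MX)/SDX)) + MY"""
    return SDY * (RXY * ((x - MX) / SDX)) + MY


def simulate(n, seed=1):
    """Draw n (IQ, achievement) pairs whose population correlation is RXY."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        # Mix zx with independent noise so that corr(zx, zy) = RXY.
        zy = RXY * zx + math.sqrt(1.0 - RXY ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((MX + SDX * zx, MY + SDY * zy))
    return pairs


def sum_sq_error(pairs, expectation):
    """SUM(e^2), where e = Y - EY and EY = expectation(X)."""
    return sum((y - expectation(x)) ** 2 for x, y in pairs)


pairs = simulate(10_000)
sse_regression = sum_sq_error(pairs, predicted)    # EY = Y'
sse_mean = sum_sq_error(pairs, lambda x: MY)       # EY = MY
sse_iq = sum_sq_error(pairs, lambda x: x)          # EY = X

print(f"SUM(e^2) using Y': {sse_regression:,.0f}")
print(f"SUM(e^2) using MY: {sse_mean:,.0f}")
print(f"SUM(e^2) using X:  {sse_iq:,.0f}")
```

Under these assumptions the regression prediction should give the smallest sum of squared errors of the three.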
While using Y' as an estimate of EY minimizes SUM(e^2) for
a group, it may not minimize SUM(e^2)
for a particular person. The task
in the next section is to determine whether it is reasonable to believe that
Y' minimizes SUM(e^2) for a particular person.
This is tantamount to asking whether Y' = TY.
To establish a criterion against which to compare actual
performance, it is necessary to select a unit of measurement for deviations from
expected. Given assumption 1 above,
the natural unit of measurement is some type of standard deviation.
In this case, SE is the appropriate unit.
Given assumptions 1 and 3 above, SE is given by the following formula:
SE = SDY(Sqrt(1 - rXY^2)).
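To get a feel for this unit of measurement, here is a short sketch of the SE formula. The function name is hypothetical and the SD of 15 is assumed only because most IQ and achievement batteries use it. The loop shows how SE shrinks as the correlation rises.

```python
import math


def standard_error_of_estimate(sd_y, r_xy):
    """SE = SDY(Sqrt(1 - rXY^2))"""
    return sd_y * math.sqrt(1.0 - r_xy ** 2)


# Assumed SD of 15; the correlations are arbitrary illustration values.
for r in (0.0, 0.30, 0.60, 0.90):
    se = standard_error_of_estimate(15.0, r)
    print(f"rXY = {r:.2f}  SE = {se:.2f}")
```

With no correlation at all, SE is simply SDY; knowing X buys nothing.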
We next pose a question. The exact nature of the question
reflects our philosophy of severe discrepancies. The two most common methods of stating this
question are:
1. Is it reasonable to believe that, for child C, TY = Y'?
2. Is it reasonable to believe that, for child C, TY >= Y'?
As Matthew pointed out, if we think we are concerned with
question 1, then we would select a probability and normal deviate such that a
deviation from expectation in either direction must be explained.
Most school psychologists have tested children whose achievement scores
significantly exceed expectation. This
is sometimes more difficult to explain than the child who underachieves.
If we take the approach that severe discrepancies only fall
below expectation, then question two is the appropriate question.
Deciding which question is appropriate in a particular situation is of
major importance. It is one of the
two chief concerns in selecting a normal deviate for use in the next step.
Those trained in research will immediately recognize that
the above questions correspond to the null hypotheses used in research.
Question one evokes the use of a two-tailed test, while question two, a
one-tailed test. In a very real
sense, determining whether a particular child has a severe discrepancy is
testing an hypothesis about that child. The
logic is the same as in hypothesis testing.
The null hypothesis is assumed to be true.
This gives us a way of determining the probability of various events.
We can tell which events are common and which are rare.
When we test the hypothesis, we allow an event to occur and observe
whether it is a rare event. The
presence of rare events creates doubt about the truth value of the null hypothesis.
For example, suppose we are playing the old game, Twenty
Questions, and are trying to identify an object. We have developed the null hypothesis that the object is a
dog. Before venturing a
"guess" as to what the object is, we propose to test the hypothesis
with a question, the possible answers to which have, relatively speaking, known
probabilities. We ask the question,
"How many legs does this object have?"
In my experience, the most likely answer, given that the hypothesis is
true, is 4. However, in my life I
have seen several three-legged dogs, one two-legged dog, and one five-legged dog
(as an aside, I must admit that I paid 50¢ to see the five-legged dog).
I have seen pictures of a six-legged dog and an eight-legged dog.
Suppose we get this answer to our question, "It has six legs."
This is not an impossible answer, but it is rare.
It is so rare that most of us would decide to reject the hypothesis as untenable.
The question is, "How rare must an event be before we
decide to reject the hypothesis as untenable in the face of the data?"
As Reynolds (1990) argues, the traditional values are a likelihood of
less than, or equal to, five in a hundred (.05 level), or less than, or equal
to, 1 in a hundred (.01 level). Ultimately, the probability level must be set by the person
making the decision.
Suppose we decide that any deviation from expectation,
above or below, is of interest and that rare events have probabilities less
than, or equal to, 0.05.
On the normal curve, the normal deviate that corresponds to this decision
is z = 1.96.
We would, therefore,
calculate two critical values, one 1.96 SE above Y' and one 1.96 SE below Y'.
Actual values between these two critical values would be common events.
Actual values outside these two critical values would be rare events.
The formula for the critical values would be:
Critical values = Y' +/- 1.96SE.
If, on the other hand, we decide that only values below
expectation are of interest, then on the normal curve, the normal deviate that
corresponds to this decision is z = 1.65. We
would, therefore, calculate only one critical value, 1.65 SE below Y'.
Actual values equal to, or below, this critical value would be rare
events. The formula for the
critical value would be:
Critical value = Y' - 1.65SE
The two formulae may be generalized as follows:
Critical values = Y' +/- zp2SE, and
Critical value = Y' - zp1SE.
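The two generalized formulae can be sketched in a few lines. This is an illustration rather than anyone's official program: the function names are mine, and I take the normal deviate from Python's statistics.NormalDist instead of a printed table. Note that the exact one-tailed deviate at .05 is about 1.645, which the text above rounds to 1.65.

```python
from statistics import NormalDist


def normal_deviate(p, tails):
    """zpn: normal deviate for probability p and a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits p between the two ends of the curve.
    return NormalDist().inv_cdf(1.0 - p / tails)


def critical_values(y_pred, se, p=0.05, tails=2):
    """Return (lower,) for a one-tailed test, (lower, upper) for two-tailed."""
    z = normal_deviate(p, tails)
    lower = y_pred - z * se
    if tails == 1:
        return (lower,)
    return (lower, y_pred + z * se)


# Hypothetical inputs: predicted score 88 with a standard error of 12.
print(critical_values(88.0, 12.0, p=0.05, tails=2))
print(critical_values(88.0, 12.0, p=0.05, tails=1))
```

Passing p = .01 instead gives the stricter of the two traditional levels.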
Matthew Warren posted some data to the list for which he
had accomplished the calculations necessary to decide whether a particular child
has a severe discrepancy.
I have appropriated his data and will use it to illustrate the above
formulae. The data are given below:
Matthew Warren wrote:
- FSIQ(wisc3) = 80
- WJ(Writing Fluency) = 62
- Correlation (FSIQ, Writing Fluency) = .60
- Reliability(FSIQ) = .95
- Reliability(Writing Fluency) = .95
- Calculated values:
- Predicted WJ (Writing Fluency) = 88
- Standard Error of Estimate = 12
Critical values (95% confidence), given that any deviation
from expectation must be explained, would be an achievement score less than or
equal to 64 or a score greater or equal to 112.
Critical value (95% confidence), given that only negative
deviations from expectation are of interest, would be an achievement score of 68 or below.
Y' = SDY(rXY((X - MX)/SDX)) + MY
= 15(.60((80 - 100)/15)) + 100 = 88.
SE = SDY(Sqrt(1 - rXY^2))
= 15(Sqrt(1 - .60^2)) = 12.
Critical values = Y' +/- z(.05)2SE
= 88 + 1.96(12) = 111.52, and
= 88 - 1.96(12) = 64.48.
Note that in the application of Formula 5 (the two-tailed critical values), the first value,
if not an integer, always rounds up to the next possible score, and the second
value, down. In this case, rounding
goes to the nearest integer, but that is not always the case.
Therefore, as Matt said, test scores of 112 and above and 64 and below
indicate a severe discrepancy at the .05 level of significance.
If you have a machine that yields anything else, the machine is wrong.
Critical value = Y' - z(.05)1SE
= 88 - 1.65(12) = 68.2
Note that in the application of Formula 6 (the one-tailed critical value), the value, if
not an integer, always rounds down to the next
possible score. In this
case, rounding goes to the nearest integer, but that is not always the case.
Therefore, as Matt said, test scores of 68 and below indicate a severe
discrepancy at the .05 level of significance.
If you have a program that yields anything else, it is wrong.
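For anyone who wants to check a program against these figures, the whole calculation for Matthew's data can be sketched as below. The function name and dictionary keys are hypothetical. I assume means of 100 and SDs of 15 for both tests, that scores are whole numbers (so "next possible score" means the next integer), and the table values 1.96 and 1.65 used above.

```python
import math


def severe_discrepancy_cutoffs(x, r_xy, mean_x=100.0, sd_x=15.0,
                               mean_y=100.0, sd_y=15.0,
                               z_two=1.96, z_one=1.65):
    """Predicted score, SE, and rounded cutoffs for one child."""
    y_pred = sd_y * (r_xy * ((x - mean_x) / sd_x)) + mean_y
    se = sd_y * math.sqrt(1.0 - r_xy ** 2)
    return {
        "predicted": y_pred,
        "se": se,
        # Upper cutoff rounds up, lower cutoffs round down (integer scores assumed).
        "two_tailed_high": math.ceil(y_pred + z_two * se),
        "two_tailed_low": math.floor(y_pred - z_two * se),
        "one_tailed_low": math.floor(y_pred - z_one * se),
    }


result = severe_discrepancy_cutoffs(x=80, r_xy=0.60)
observed = 62  # WJ Writing Fluency score from Matthew's data

print(result)
print("Severe (two-tailed):",
      observed <= result["two_tailed_low"] or observed >= result["two_tailed_high"])
print("Severe (one-tailed):", observed <= result["one_tailed_low"])
```

With a predicted score of 88 and SE of 12, the cutoffs come out at 64 and 112 for the two-tailed question and 68 for the one-tailed question, so the observed score of 62 is a severe discrepancy either way.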
Up to this point, the formulae developed by Cecil Reynolds
parallel the ones presented here. Cecil,
however, at this point shifts his focus. When
we test to see if a particular child has a severe discrepancy, there are four
possible outcomes: 1) We correctly identify a child who really has a severe
discrepancy [True positive]; 2) we correctly identify a child as not having a
severe discrepancy [True negative]; 3) we erroneously identify a child as having
a severe discrepancy when in fact he does not
[False positive]; and 4) we erroneously fail to identify a child as
having a severe discrepancy, when in fact he does [False negative].
The significance level, often called alpha, that we use, .05 above, is
the probability of a false positive. The
probability of a false negative is often called beta. The relationship between alpha and beta is inverse and
nonlinear. If we decrease the
likelihood of one type of error, then we increase the likelihood of the other.
After developing all the formulas given above, Cecil decided that he
should add something to reduce the likelihood of false negatives, the value of
beta. He decided to do this without
any notion of what the value of beta was. He
reduced the difference between Y' and the critical value of a two-tailed test by
1.65SEresid, where SEresid was defined in Critical Measurement Issues in
Learning Disabilities in the Journal of Special Education, 18,
For the current example, the critical value becomes 70.90.
Clearly this does reduce the probability of a false negative, but it also
increases the probability of a false positive.
We are no longer working at the .05 level, but at the .1556 level. Cecil (1990) cites an example where the relevant z-score
moved from 2.00 to 1.393. This
changed the probability from about .05 to .1646.
He seemed to have been somewhat confused about the question he was asking
at the time. He changed the
probability to .082, as it would have been in a one-tailed test. He clearly started his discussion using a two-tailed test,
then stated that severe discrepancies went in only one direction.
When he reduced the distance to the critical value, he subtracted a value
based on a one-tailed test. The
situation then is this: he selects only one of the two critical values from a
two-tailed test and uses a value from a one-tailed test to move that toward the
predicted score, Y'.
Worry not; it gets worse. Cecil
then drew a picture (his Figure 24.2) to clarify matters.
In this figure, he shows only one of the critical values of the original
one-tailed test (happens not to be the one discussed in the text).
Then he both subtracts 1.65 and adds 1.65 to this critical value to get
two more critical values. Thus, he takes what should be a one-tailed test at the .05
level, stacks it on top of a two-tailed test at the .05 level, but runs it in
both directions so that the probability would be .10. At this point, I think I will give up trying to explain what
he proposed. The logic is clearly
muddled and the issue of significance level becomes totally garbled.
In the two examples that I ran, Cecil's and Matthew Warren's, the
significance level multiplied by about 3. I
think there is no way to predetermine what will happen to the significance
level, but clearly it alters drastically with the addition of Cecil's invention.
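Anyone can check what significance level a shifted critical value actually implies by turning it back into a normal deviate and reading off the tail area. The sketch below does that; the function name is mine, and the inputs are the figures already quoted above (Y' = 88, SE = 12, adjusted critical value 70.90, and the z of 1.393 from Cecil's own example). The levels I quoted appear to correspond to z rounded to two decimals, as in a printed table, so the unrounded results differ slightly in the third decimal.

```python
from statistics import NormalDist


def implied_levels(z):
    """Return (one-tailed, two-tailed) probabilities for a normal deviate z."""
    one_tailed = 1.0 - NormalDist().cdf(abs(z))
    return one_tailed, 2.0 * one_tailed


# Matthew Warren's data: how far is the adjusted cutoff from Y', in SE units?
z_warren = (88.0 - 70.90) / 12.0
one_w, two_w = implied_levels(z_warren)
print(f"z = {z_warren:.3f}, one-tailed = {one_w:.4f}, two-tailed = {two_w:.4f}")

# Cecil's own example, where the deviate moved from 2.00 to 1.393.
one_r, two_r = implied_levels(1.393)
print(f"z = 1.393, one-tailed = {one_r:.4f}, two-tailed = {two_r:.4f}")
```

Either way the two-tailed level lands near .15 or .16, roughly three times the nominal .05.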
Interestingly, Cecil added the "correction" to
control beta. He describes no way
of determining what beta is, either before or after his fix.
There are methods of controlling beta, but not with Cecil's formula.
Here's the damage done by this method. Cecil
gives a lengthy discussion of why the .05 level of significance is appropriate.
However, when he acknowledges that the significance level changed, he
changes terminology. Instead of
significance, it becomes the "percent of the total population."
Would an astute reader miss this shift?
Ron Dumont and John Willis, the best in the business, missed it.
On their template for calculating severe discrepancies using Cecil's
method, they specified that the results are at the .05 level.
Has anyone else missed it? In
the WIAT manual, page 188, the significance level is clearly identified as
either .05 or .01, when it clearly is not.
Big Cecil himself acknowledged that the chances changed (1990, p. 552).
Developing procedures to control beta correctly
is beyond the scope of this post. I
may do that another time.
Also, the WIAT manual (p. 189) implies that Cecil's
procedures were used to establish the significance bands in its tables.
I have neither the time nor inclination to check that assertion.
Certainly I would view those tables with suspicion.
- May the road rise to meet you.
- May the wind be always at your back.
- May the sun shine warm upon your face.
- And rains fall soft upon your fields.
- And until we meet again,
- May God hold you in the hollow of His hand.
For a download of a
severe discrepancy analyzer that follows Hubert's description press here.