Does Subtest Order Make a Difference?

The following are exchanges from the Institute for Applied Psychometrics website and discussion group focusing on the Cattell-Horn-Carroll Theory of Cognitive Abilities located at

Participants in this particular discussion include: Ron Dumont; Laurie Ford; Richard Woodcock; Mark Daniel; Samuel O. Ortiz; Kevin McGrew; John Willis; Catherine A. Fiorello.

Kevin and Dick

I am hoping you can respond to a question that comes up every time we do a workshop on the Cross battery approach. Here it is paraphrased:

"Given that IDEA requires that any standardized tests used 'are administered in accordance with any instructions provided by the producer of such tests,' can the WJ-R subtests be administered out of sequence?"

Have I missed anything in the WJ-R's manual that tells the examiner that it is okay to pick and choose subtests as they are needed? If the WJ-R is used to supplement other cognitive measures (a la Cross battery), does that go against the IDEA conception of a safeguard?

Are there any studies available to tell us that giving subtests out of order, or out of the framework of the standardization procedure will, or in this case, will not have an effect on the results? We have always heard about the need to follow standardized procedures so as not to artificially introduce confounding variables to the results. I think in particular about some of the memory tasks. Visual Auditory Learning is subtest #8 in 'regular sequence'. Does administering it as subtest #2 have any potential impact on the results?

I hope this question makes sense. If not, blame it on John!



Ok, I am not Kevin or Dick but having worked with the two of them for years and having done more than my share of WJ trainings over the years I think I can help you on this one....

The key here lies in part in the issue of calling them "tests" vs. "subtests".....the WJ system of "Tests" (not subtests) is developed so that each individual test (e.g. Memory for Names, Sound Blending, Calculation) is a stand-alone test by itself.....Dick often uses the analogy of a tool box, which I like. You reach into the box and pull out the tool you need to help with your referral question....with this in mind you really would not necessarily have to use all the tests in the battery, so they have to stand by themselves (different from the subtest format of the Wechsler scales). This is where the selective testing table in the manual is helpful. With that in mind (you could use many different combinations of tests), ordering of the tests from a psychometric point of view is really not an issue; convenience, the preference of the child, the age of the child, etc....that may be where you address ordering.......the test was developed so each test stands alone as an individual "test" in a battery of "tests"....some of this is discussed in the first few pages of the technical manual (pp 4-7). Of course ya gotta give Memory for Names and VAL before MforN and VAL Delayed Recall (ha!....WJ humor)....

There is some logic to the ordering of the tests in the easel booklets, but as you can see they are in a binder so you can reorder them and repackage them to meet your needs. In my trainings I have found people who work with younger children find Memory for Names a great place to start, while those working with secondary age students and adults do not like to start with Memory for Names.....so you can change it up.....this is an especially nice feature when you are using a cross-battery approach like Kevin and Dawn have proposed......

Long answer, good question, common question.....hope this information helps with your work, trainings, etc....


Ron, you will find an example of such a statement in the middle of p. 40 of the COG Examiner's Manual. In some places where we make this statement we also point out that the tests were not presented in any particular order during standardization.

I am not aware of any studies that say there is, or isn't, such an effect. I doubt that there could be anything more than a trivial effect. Test developers expect (or, at least, hope) that tests are independent and robust enough that the order of administration is not important in a situation where the rapport is good and the subject is not tired. Of course, if there is something learned in one test that will help the subject in another test, then order would become quite important.

Richard Woodcock

I don't know of published research on this question, but have some anecdotal information. I once worked for an organization that provides batteries of individually-administered tests to teenagers and adults for vocational and educational counseling. The battery sequence was changed from time to time. There was a very noticeable depression of performance on the first one or two subtests in the battery. As I recall, the magnitude of the effect was about a 10 percentile point shift in the median (i.e., when the subtest was given first the average performance was at about the 40th percentile on the norms based on later administrations).

It's good to see that the WJ-R standardization utilized varied administration sequences, so that this effect would have minimal influence on the norms. A similar approach was taken with the diagnostic subtests on the DAS.

Mark Daniel
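
To put the start-up effect described above on a more familiar scale, the 10 percentile point shift at the median can be converted to standard-deviation units. This is only a rough sketch: it assumes normally distributed scores, and the mean-100, SD-15 standard-score scale is an illustrative assumption, not something stated in the anecdote.

```python
from statistics import NormalDist

# Assumption: scores are normally distributed on the norms.
norms = NormalDist()  # standard normal: mean 0, SD 1

z_later = norms.inv_cdf(0.50)   # median when the subtest is given later
z_first = norms.inv_cdf(0.40)   # median when the subtest is given first

# Size of the drop in standard-deviation units.
shift_in_sd = z_first - z_later

# Assumption: a standard-score scale with mean 100 and SD 15.
standard_score_first = 100 + 15 * z_first

print(f"Shift: {shift_in_sd:.2f} SD")
print(f"Equivalent standard score: {standard_score_first:.1f}")
```

Under these assumptions the drop is roughly a quarter of a standard deviation, or about 4 points on a mean-100, SD-15 scale.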

Although others have already offered their perspective on the issue raised by Ron with respect to the WJR in particular, I would like to add some comments regarding the legalities Ron raises as they pertain to any cognitive test used under the auspices of IDEA.

The concern with the legal issues raised by the questions posed to Ron is in reference to Sec. 300.532 Evaluation procedures of IDEA, which states in part:

(c)(1) Any standardized tests that are given to a child
(i) Have been validated for the specific purpose for which they are used; and
(ii) Are administered by trained and knowledgeable personnel in accordance with any instructions provided by the producer of the tests.

First, I think it is important to note that the law will always be wholly unable to keep up with many of the advancements being made in the science of assessment; cross-battery assessment being only one such advancement. The notion of ability-achievement discrepancy, no matter how discredited it continues to be in the literature, nevertheless remains a central criterion for determining the presence of a specific learning disability and represents yet another problem inherent in the "practice-legal gap." Apologies to Kevin for jumping on his "theory-practice gap" idea, but the notion is the same. As such, we must often decide whether to operate strictly within the legal mandates as written or apply superior methodology when and where available. For my money, in a court of law, I would rather sit on strong science and reason in keeping with the spirit of the law than attempt to justify my actions under the letter of the law. Legal mandates are not, in fact, straightforward, rigid, or inflexible. The actual meaning of any statute tends to be fluid and susceptible to myriad interpretations as evidenced by the many, often conflicting, decisions rendered in courtrooms across this nation.

Now if we extend the question posed by Ron into actual practice and we adhere to it as written, are we not then bound to administer every single subtest (in the correct order, of course) from a particular battery? In other words, in the absence of explicit instructions from the publisher that allow for the administration of only specific subtests, are we not legally required to administer the entire test, even if we may be only interested in a particular set of tests or composite score? It seems to me that in a wide variety of cases, individuals have been given only a portion of certain batteries because other portions were deemed invalid or inappropriate for that individual. In the case of children who are culturally or linguistically diverse, giving only the Performance subtests of the Wechslers has been a common practice for decades and remains so even today. Similar selective administrations are often done with children who are blind, deaf, motor impaired, etc. Is it illegal to do this then, because the verbal tests were not intermixed with the nonverbal tests and vice versa? Similarly, what if I only want the non-verbal composite from the K-ABC because I deem that composite to be the only appropriate measure of ability for the individual in question? Must I give all of the K-ABC subtests and then only score the ones loading on the non-verbal scale? The correct "legal" answer to all of these questions might well be "yes," but how many psychologists adhere that strictly to the letter of the law? Likewise, I have yet to read in any publisher's test manual that the only valid use of the test occurs when every single subtest is given. Rather, test publishers are savvy enough to recognize the potentially lucrative alternative uses of the test and rarely do anything to discourage exploration or adoption of such use. In fact, when it is backed by empirical research, it appears that test developers actually encourage such use, albeit it is not always specifically endorsed in the test manual. Thus, test developers seem to prescribe mainly the manner in which the entire test is to be given in order to derive certain composite scores. They do not often, however, offer prescriptions prohibiting the use of the test in alternative manners. Doing so would likely amount to economic suicide.

The discussion above addresses the issue of giving subtests "out of the framework of the standardization procedure." I'm sure I've made it clear that not only do test publishers routinely omit prescriptions against using selected subtests or portions of the test, they also do not state that the use of the battery and interpretation of resulting scores is valid only if all subtests are administered. It seems to me that such alternative uses (whether with particular populations or through different theoretical foundations) is left up to the professional judgment of the examiner and that the examiner, in such cases, would be responsible and liable for providing a suitable rationale for whatever decisions were made and action taken.

As for giving subtests "out of order," I am certain that it does indeed violate standardization for some tests (but not the WJR or DAS according to Dick and Mark). I am not certain, however, that violation of a particular administration sequence produces noticeable effects one way or the other (Mark seems to say yes, Dick seems to say no) so I'll leave that up to them to debate. I do know that subtests are generally given in a particular order to attend primarily to issues of examinee interest, rapport building, minimization of fatigue, counterbalancing, etc., as well as maintaining consistency in the experimental (test) setting. If sequence is deemed important, for whatever reason and for whatever test, use of cross-battery assessment does not require that tests be given "out of order" or out of sequence per se. Based on the preceding discussion the order of subtests seems to matter most when the examiner is interested in using the full battery and deriving a full scale score. If the examiner is not interested in a full scale score, then I see no reason why the examiner can't give only the selected subtests of interest, and no reason why the examiner can't give those subtests in exactly the order they occur within the battery as a whole and exactly in accordance with the instructions provided by the publisher for administration. The WJR is a good example that allows such an approach, albeit many batteries also offer a variety of composite scores that can be calculated on the administration of certain specific subtests. For example, if I am interested in assessing an individual's Reading Aptitude or Oral Language using the WJR, the manual specifies exactly which subtests must be administered. Although presentation order is not specified, there is no reason to give subtests 2, 3, 11, and 13 (for Reading Aptitude) or subtests 2, 6, 13, 20, and 21 (for Oral Language), "out of order." Subtest 2 can always be given before subtest 3, and subtest 3 can always be given before subtest 11, etc. I see no reason why derivation of Gf, Gc, Gs, etc., clusters is any different (especially on the WJR which is constructed specifically on these concepts) and thus, interest in a particular cluster or composite does not have to violate sequence. Imagine having to give ALL 21 subtests of the WJR in order to derive a valid score for the Oral Language cluster! Surely, no one would bother if that were the case and thankfully, Richard was careful to make it clear that this was not necessary. Thus, cross-battery assessment procedures do not necessarily have to violate order or sequence of subtests and neither do they violate administration procedures.

Although cross-battery techniques are based on an alternative theoretical framework, administration and scoring issues remain well in line with the "intended" use of any battery which is to measure or sample a particular aspect of ability or behavior. Because the administration and scoring procedures specified in cross-battery assessment are entirely consistent with the procedures specified in the test manuals, cross-battery assessment does not explicitly violate any instructions set forth by the publisher. In cross-battery assessment, it is only the combinations of subtests that are used which may not be described by the tests' manuals, but which are supported by the empirical research undergirding the entire cross-battery approach. In effect, as Kevin is fond of saying, taken together the research behind the cross-battery approach and the methods specified by it, comprise a full-fledged technical manual which guides how tests are to be combined and interpreted as much as is found in any other test manual. Therefore, as long as cross-battery assessment is carried out as specified, an examiner could further argue that he or she is indeed following the instructions set forth by the developers of the "test." And finally, because cross-battery assessment is much more than simply "picking and choosing" subtests (precisely because it is based on empirical research and reasoning, not haphazard guessing), I see nothing that constitutes any inherent violation of the IDEA regulations cited at the outset. In fact, given that cross-battery assessment can be used to generate a test battery that is unique to the individual's needs and specific to the referral concerns, I see a much more solid foundation for assessment and an approach that is highly defensible in a court of law.

Sorry once again for the mini "dissertation" but it seemed relevant. And I don't want to hear anything about my long winded tendencies from either of you, Kevin and Dawn :)


Sam the "orator" Ortiz has struck again.

I'll be brief, non-technical and non-legal.

When I take my seriously ill child to the doctor to help with a tricky diagnosis I'm more concerned about the doctor blending his/her clinical skills with his/her instrumentation, even if it means that he/she may modify a procedure to meet my child's unique needs. There is both an "art" and a "science" in psychological assessment and interpretation. This is one of the reasons I left the school psych. trenches after 10 years. I felt I was no longer a school "psychologist" but had become a school "proceduralist" because rules and regulations were suppressing my art variance. I concur with Sam that it is better to do what is in the professional best interests of a child during an assessment and wage war with administrators, regulations, etc. instead of letting these forces wage war against good practice.

There --- I've been waiting 9 years to get that off my chest.

Kevin McGrew

Thanks to everyone who has responded. Let me add another kink. I do believe that administration order can affect the end results, similar to what Mark talked about. Let me explain:

I was asked to help norm a task (the Mesulam) of visual scanning being used by some evaluators. The Mesulam Continuous Performance Test consists of two pages with the letters of the alphabet printed in uppercase. On one page (Ordered), the letters are placed in neat, orderly rows and columns, while on the second page (Random), the letters are placed in a haphazard fashion, with no order imposed. On both pages, 60 As are placed among the other letters. Regardless of the page, the As are in the same location, with approximately 15 in each of the 4 quadrants. Because there were no administration directions and because some psychologists reported giving it differently (one used the ordered page first, the second gave the random page first) we decided to give the task both ways (approximately half got the ordered page first, half random page first). Because the total score (score on Ordered plus Random) was the score of importance, few thought that the administration order mattered. The Mesulam was administered in a balanced manner, with 849 children being administered the Ordered page first (Ordered Administration) while 691 were administered the Random page first (Random Administration).

Here is a snip of our results:

"Means were computed for each of the nine age groups (6 to 14), and for both administration procedures (Ordered and Random). Analysis was done to determine the effects of sex, age, and order of administration. No significant differences were found between the sexes. In contrast, both order of administration and age were highly significant. Effect of administration order and age was assessed by a two factor analysis of variance. OE (errors on the Ordered page) was found to be significantly affected by both the administration order (F=6.709, P=.0097) and the age level (F=14.654, P=.0001). RE (errors on the Random page) and TE (total errors) were found to be non-significantly affected by the administration order (RE: F=.098, P=.7652, TE: F=2.934, P=.0869), but significant for age level (RE: F=10.379, P=.0001, TE: F=18.045, P=.0001).

For ages 6, 7, and 8 there was a pronounced difference in the total number of errors (TE) made depending on the order in which the pages were administered. For example, six-year-olds given Ordered Administration had 5.4 total errors while those six-year-olds given Random Administration had an average of only 2.7 total errors. Interestingly, most errors occur on the first page presented (Ordered or Random). Whichever page was administered to the child first yielded comparatively more errors than the succeeding page. Six-year-olds given Ordered Administration averaged 3.6 errors on the Ordered page, compared to the 1.8 errors on the Random page. Similarly, six-year-olds given Random Administration averaged 1.9 errors on the Random page, while averaging .74 errors on the Ordered page."

Maybe it had something to do with this specific task, but it sure raises the question of unanticipated administration influence.
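
The six-year-old means quoted above can be laid side by side to make the first-page effect easier to see. This is just a small arithmetic check on the reported figures; the dictionary layout and the variable names are illustrative choices, and no data beyond the four page means and two totals quoted above are used.

```python
# Mean errors for six-year-olds, as quoted in the results above.
# "first_page" is whichever page the child saw first under that procedure.
six_year_olds = {
    "Ordered Administration": {
        "first_page": 3.6,    # Ordered page, given first
        "second_page": 1.8,   # Random page, given second
        "reported_total": 5.4,
    },
    "Random Administration": {
        "first_page": 1.9,    # Random page, given first
        "second_page": 0.74,  # Ordered page, given second
        "reported_total": 2.7,
    },
}

totals = {}
ratios = {}
for procedure, means in six_year_olds.items():
    # Sum of the two page means, to compare with the reported total.
    totals[procedure] = means["first_page"] + means["second_page"]
    # How many times more errors on the first page than on the second.
    ratios[procedure] = means["first_page"] / means["second_page"]
    print(
        f"{procedure}: pages sum to {totals[procedure]:.2f} "
        f"(reported total {means['reported_total']}), "
        f"first/second ratio {ratios[procedure]:.1f}"
    )
```

The page means add up to the reported totals (the Random Administration sum comes to 2.64 against a reported 2.7, presumably a rounding difference), and under both procedures the first page drew at least twice as many errors as the second.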


Just to add to the confusion (it's what I do; it's a gift), I had a colleague who became tired of low scores for Coding on the WISC-R, so -- anticipating the WISC-III -- my colleague began giving Coding early, rather than late in the sequence of subtests. Lo and behold, the Coding scores of the post-change evaluations were generally higher than the Coding scores on the pre-change evaluations.

As Sam and Kevin recommend, I do use subtests out of context. In addition to the non-random McGrew, Flanagan, and Ortiz Integrated Carroll/Horn-Cattell Gf-Gc Cross-Battery Approach, I sometimes have a specific skill I want to study, for example using the KAIT listening comprehension subtest for a student who is having difficulty with high school lectures. I will even intersperse other tests of approximately the right length and then administer the listening comprehension delayed recall! However, I worry -- not about the numerical sequence with or without gaps -- but about the effect on test "n" not being preceded by tests "n-1" and "n-2," as with Ron's Mesulam or my colleague's WISC-R Coding. It is not preserving the order 4, 11 (vs. 11, 4) to assess Ga on the WJ-R that concerns me. It is the possible effect of 4 not following 3, or 11 not following 10, or 11 not coming fairly late in the assessment session. Obviously, Sam is correct that I would be insane to give the entire WISC-III and then give the entire WJ-R just to get at tests 7 and 14 for Gf. In practice I do what Sam and Kevin recommend, but I do worry.

Knowing that the WJ-R tests and DAS Diagnostic Subtests were actually normed in varied sequences assuages some of my guilt when using those instruments. Ron's Mesulam effect and Mark's experience with his test would, however, still be a concern. I don't know for sure, for instance, whether recognizing Jawf and Meegoy might be equally difficult at all times during a 90-minute test session but remembering that all kinds of life live in the sea might become much more difficult toward the end of a long test session than at the beginning. Or vice versa -- who knows? If the differences are small, who cares? But what if there is some test or subtest for which the difference is significant?

I seem to recall that when the WISC and WISC-R were being compared, it was found that, as expected, students had higher WISC than WISC-R scores. However, if the WISC were given first, the difference was much smaller than if the WISC-R were given first. Apparently the more liberal querying rules on the WISC-R gave students a set for providing more elaborate, higher-scoring answers on the WISC if they took the WISC second.

I absolutely agree that I must practice my art to the best of my knowledge and ability and not be a proceduralist [McGrew, K. S. (1994). School psychologists vs. school proceduralists: A reply to Willis and Dumont. Communique, 22 (8)] and I share Sam's belief that I can explain and defend my science on the stand. My concern is that my science might be flawed in the case of certain tests or subtests given out of their norming-sample context and I don't know which ones (if any) they might be.

John Willis

It's not so much that I or Kevin recommend that this be done, rather it is more that using subtests outside the context of the whole or entire battery does not appear to constitute any legal transgression as defined under IDEA. Moreover, cross-battery assessment is not the first or only method that takes subtests out of context of their corresponding battery. As I mentioned in my previous post, test context is violated in any number of ways when working with individuals from various populations (culturally/linguistically diverse, deaf, blind, motor impaired, etc.) or whenever the entire set of subtests are not given. Standard methodology in neuropsychological evaluation has long used subtests out of context without any seeming legal or scientific ramifications. If such methods have passed both legal and scientific scrutiny currently and in the past, why would cross-battery methods be judged any differently? Now whether there truly is or isn't any scientific reason against the use of subtests out of context is something that appears to remain a point of debate. In my opinion, and I'm sure Kevin and Dawn as well as others will share this view, the context of a test battery that is either unfounded or poorly founded upon theory is less defensible from an empirical standpoint next to a context that is steeped in theory and backed by research. Thus, one of the core arguments underlying cross-battery assessment is that it is deliberately designed to provide a better, more defensible context than that provided by the subtests' battery itself.

I believe that observations about performance on any given test, at any point in the assessment process, is a major responsibility of the assessor whether they are using cross-battery techniques or not. If it is noted that an individual is tiring heavily under the burden of testing, the examiner must be aware of it and attend to it. Is the validity of an individual's score on WJR Concept Formation any higher simply because it was given after the preceding 13 subtests even though the individual was noticeably fatigued vs. giving it along with subtest 7 (Analysis-Synthesis) without the preceding/intervening tests on a day when the individual was fully rested? Clearly, some issues that may be related to sequence or order are within the control of examiner and some are not. Given the lengthy history of psychometrics, it would seem to me that if sequence and order were of such critical importance, test publishers would long ago have constrained administration to very rigid standards and procedures. It would make sense that prohibitions against altering the sequence and context (administering every and all subtests each time a battery is used) would be explicitly stated and repeated in the manual of every major test battery by now. Again, from a practical point of view, forcing professionals to use batteries in such a manner would also surely doom a test's acceptance and usage. I would like to think, based on Dick's and Mark's comments, that developers have looked at the issue and that either it makes little difference or they've made it a non-issue by simply standardizing the test with various sequences.

I don't know if it will help your angst much, John, but I would venture to guess that the tradeoff may well be an acceptable one. That is, although we may not know which subtests might be significantly affected by being given out of the context of the test as a whole or even out of sequence, the advanced and defensible theoretical context created by cross-battery assessment methods balances that and any other concerns that accompany the application of the method and appears to be of equal if not greater value than the alternative. The growing recognition in psychometrics of the necessity for "theory first" will likely hold those in good stead whose practices are consonant with the credo for some time to come.


It appears that in some specific cases order of administration matters. For example, Ron showed that order of administration mattered on the Mesulam Continuous Performance Test. This is not surprising given that the demands of the task were essentially identical for both pages. It was also not surprising that fewer errors were found on the second page (whether it was the Ordered or Random page) because practice effects were likely operating. Although many practitioners would recognize that order of administration would likely have an effect (i.e., better performance on page 2) on tests that are similar to the Mesulam test, other instances in which practice effects or other variables may be operating with regard to administration order may not be obvious at all. Notwithstanding, it is important to keep in mind that the cross-battery approach espoused by Flanagan, McGrew, & Ortiz does not recommend the interpretation of individual subtests. Rather, their cross-battery approach recommends the interpretation of broad ability clusters (i.e., the aggregate of qualitatively different indicators of the broad ability). In instances in which significant differences among subtests are found within broad ability areas, the cross-battery approach recommends that additional measures of the narrow abilities underlying these subtests be administered to determine if there are differences between narrow ability clusters within broad ability domains. Thus, interpretations within the context of the cross-battery approach are always made based on clusters consisting of individual subtest scores that vary consistently.

This type of approach to interpretation is sensitive to tests that deviate from tests with which they should be consistent theoretically (and empirically). Therefore, if order of administration had an impact on a particular subtest (either positively or negatively), it would most likely be inconsistent with other tests administered that measure aspects of the same broad ability. The cross-battery approach does not recommend interpreting this finding in isolation. Other measures would be administered to determine if this finding is spurious or if it represents, for example, a meaningful and substantiated strength or weakness for the individual. In short, the guiding principles and procedures of the cross-battery approach are designed to ensure that anomalous findings are identified as such and interpreted appropriately. Furthermore, all interpretations within the context of the cross-battery approach (or any other approach for that matter) should be supported by additional sources of data. In my opinion, within the context of the cross-battery approach, order of subtests is a trivial matter. If subtest order made a difference either positively or negatively for a given individual, then this would most likely be detected and interpreted as such in an appropriately designed cross-battery assessment.

Dawn Flanagan

My best guess as a practitioner coincides with your better-informed judgment (for which I am grateful). I began switching from teaching to testing in a rehabilitation center where I seldom encountered a student or adult client who could take a standard test battery in toto and before that had the privilege of assisting the late Martin Berko in his assessments of children with cerebral palsy. However, it is the angst that drives my assessments, waking me up in the middle of the night wondering, "Did I do that right?" "Would that have made a difference?"

My favorite book (aside from Marty and Frances Berko's) on the importance of observations, as you observe, is O'Neill, A. M. (1995). Clinical inference: How to draw meaningful conclusions from tests. New York: Wiley. Paying attention to the student does cause problems. I have had assessments that stretched into a fortnight of quarter-hour sessions.

John Willis

I agree with John's point that if there are large and systematic effects of administration order, we at least ought to know about them. To take Kevin's medical analogy, we'd expect our physician to be aware of any tendencies for environmental factors to skew the results of a diagnostic test. So, for example, if it were true that examinees perform substantially worse on the first one or two tests they take, this would be important information and would affect interpretation.

Judging from the discussion thus far, it appears that administration sequence is an under-researched topic. Publishers have long presumed that sequence may have an effect, which is why the manuals recommend that subtests in a battery be administered in a standard sequence. What publishers don't know is the actual effect of departing from the standard sequence, or what kinds of departures are most significant.

John also distinguishes two types of possible effect: (1) general location (early, where there may be "start-up" effects, or late, where there may be fatigue effects), and (2) sequence (the effect on Test A of having given Test B immediately before it). Regarding #2, in my experience test designers have generally tried to avoid giving tests back-to-back that are highly similar either in the ability they measure or in their task features. Giving highly similar tests back-to-back would seem to invite practice and/or fatigue, and might affect motivation (i.e., when a child feels they have done poorly on one test, how do they approach the next test if it appears the same?) These questions are relevant to the practice of cross-battery assessment. Should practitioners be urged to avoid giving several tests of the same ability consecutively?

Mark Daniel

To continue the discussion, Dawn noted the following about the results we got on the Mesulam task, namely that the errors on each page differed depending on the administration order: "This is not surprising given that the demands of the task were essentially identical for both pages. It was also not surprising that fewer errors were found on the second page (whether it was the Ordered or Random page) because practice effects were likely operating."

This was my point, albeit poorly said. Mark's follow up is important as well.

If we choose to test Processing Speed, as an example, by giving just the 2 WJR subtests (Visual Matching and Cross Out), might they be affected both by Dawn's point (practice effect for very similar tasks) and by Mark's point (sequence effect)? It is interesting, if true, that there is little empirical evidence to support or deny the effects. Sounds like a dissertation topic.


Thanks for your comments Mark. In response to your question, "Should practitioners be urged to avoid giving several tests of the same ability consecutively?" I believe the answer is yes. Upon designing a cross-battery assessment, practitioners should be cognizant of the demands of the tasks and the nature of the stimuli with regard to administration order for the reasons that many have offered in this dialogue. However, the cross-battery approach suggests that practitioners select a minimum of two subtests that measure qualitatively different aspects of each of the broad abilities they wish to assess. This guiding principle of the cross-battery approach may serve to minimize the effects of practice.

It would be interesting to hear from those who use the cross-battery approach or to examine the tasks that someone has selected recently for cross-battery assessment. I am in the process of having a student review a random sample of cross-battery evaluations from a referred sample of 200.

Dawn Flanagan

I try to use clinical judgment when arranging the order of subtests in a cross-battery assessment. As Mark suggests, I try to vary the abilities measured and the modalities. I keep time spans in mind for memory and such. I do often use Naglieri's Planning subtests, and make sure I do those first so that the Coding/or whatever response set of left to right doesn't contaminate the Planned Codes freedom. But I agree that this is an under-researched question.... 

Catherine A. Fiorello, Ph.D.
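
The ordering rule of thumb discussed in the last few posts (vary the abilities measured and avoid giving tests of the same broad ability back-to-back) can be sketched as a simple greedy procedure. This is a minimal illustration, not a validated administration rule: the function name `interleave_by_ability`, the sample battery, and the "Ga test 4" / "Ga test 11" labels are hypothetical, and the sketch ignores modality, fatigue, and delayed-recall timing, all of which still call for clinical judgment.

```python
import heapq
from collections import defaultdict

def interleave_by_ability(tests):
    """Order (test_name, broad_ability) pairs so that no two consecutive
    tests measure the same broad ability. Raises ValueError if impossible."""
    groups = defaultdict(list)
    for name, ability in tests:
        groups[ability].append(name)

    # Max-heap keyed on how many tests remain for each ability
    # (counts are stored as negatives because heapq is a min-heap).
    heap = [(-len(names), ability) for ability, names in groups.items()]
    heapq.heapify(heap)

    order = []
    held = None  # ability used on the previous turn, kept out for one turn
    while heap:
        count, ability = heapq.heappop(heap)
        order.append((groups[ability].pop(0), ability))
        if held is not None:
            heapq.heappush(heap, held)
        count += 1  # one fewer test remaining for this ability
        held = (count, ability) if count < 0 else None

    if held is not None:
        # Tests of one ability are left over with nothing to place between them.
        raise ValueError("Cannot avoid consecutive tests of the same ability")
    return order

# Hypothetical cross-battery selection: two tests per broad ability.
battery = [
    ("Visual Matching", "Gs"),
    ("Cross Out", "Gs"),
    ("Analysis-Synthesis", "Gf"),
    ("Concept Formation", "Gf"),
    ("Ga test 4", "Ga"),
    ("Ga test 11", "Ga"),
]

schedule = interleave_by_ability(battery)
for position, (name, ability) in enumerate(schedule, start=1):
    print(position, name, ability)
```

The procedure always draws next from the ability with the most tests still to give, while holding the ability just used out for one turn. An examiner would still adjust the result by hand, for example to put planning tasks first or to space a delayed-recall test appropriately.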