All That I Have to Say Has Already Crossed Your Mind*




Roger Koppl

Department of Economics and Finance

Fairleigh Dickinson University

Madison, NJ 07940 USA




J. Barkley Rosser, Jr.

Department of Economics

James Madison University

Harrisonburg, VA USA


February 2002












We present three arguments regarding the limits to rationality, prediction, and control in economics, based on Morgenstern's analysis of the Holmes-Moriarty problem.  The first uses a standard metamathematical theorem on computability to indicate logical limits to forecasting the future.  The second demonstrates the possibility of nonconvergence for Bayesian forecasting in an infinite-dimensional space.  The third shows the impossibility of a computer perfectly forecasting an economy whose agents know its forecasting program.  Thus, economic order is partly the product of something other than calculative rationality.  The joint presentation of these existing results should introduce the reader to their implications for certain shared concerns of Keynes and Hayek.

“All that I have to say has already crossed your mind,” said he.

“Then possibly my answer has crossed yours,” I replied. 

“You stand fast?”


The Final Problem


I. Introduction

            Self-reference arises naturally in economics.  Augustin Cournot (1838) presented a model of duopoly in which each party must form conjectures about the actions of the other party.  Implicitly, this means that each player is forming conjectures about the conjectures of the other player.  I think that you think that I think.  Herbert Simon called the problem Cournot raised “the permanent and ineradicable scandal of economic theory” (1976, p. 140).  The “reflexivity” of such situations defies logical closure.  Self-reference leads to paradox, not always or inexorably, but easily and often.  Cournot resolved the problem by having each player simply assume that the other's behavior would not change.  Such devices are common in economic theory; as Simon (1976) noted in discussing the Cournot model, they amount to arbitrary bounds on the rationality of the agents.

Most economists are aware of this problem in a general sort of a way.  But we believe there is scope to improve our understanding of self-reference and the paradoxes it gives rise to.  In particular, we believe there is scope to improve our appreciation of the constraints to knowledge and ultimately of rationality that reflexivity creates.  When you think that I think that you think, our knowledge may be subject to logically necessary limits.

 We present below three arguments that illustrate the limits to our knowledge created by self-reference.  Our arguments give more formal expression to ones made by Morgenstern, Hayek, and Keynes.  We believe that such ideas have not influenced economic thought as much as they should have and hope that their formal expression can clarify them somewhat and expand their influence.  The common thread in these arguments is self-reference.  This theme has emerged as important in the literature on “complex adaptive systems.”  (See Arthur 1994 and Arthur et al. 1997.)  Our goal is to bring an existing body of mathematical theory into contact with arguments well established in the history of economic thought.  Consistent with that goal, our arguments will be “formal” but not highly “rigorous.”  In this respect we can be seen as trying to introduce readers to this vein of argumentation and its implications for certain shared concerns of Keynes and Hayek.

Argument I formalizes parts of Morgenstern's (1935) analysis of the Holmes-Moriarty problem.  In this (as in any) two-person zero-sum game, a mixed-strategy Nash equilibrium exists.  This existence result, however, does not negate Morgenstern's argument that the assumption of perfect foresight on both sides is paradoxical.  Argument I gives formal expression to Morgenstern's ideas by showing that Holmes's best-reply function is not computable.  This argument draws on diagonal arguments from logic and is an example of the halting problem from the perspective of recursion theory in computer science.

Argument II states that in the Holmes-Moriarty game Bayesian updating may not lead to convergence, the problem arising from the existence of an infinite dimensional parameter space in such a case.  Problems arise from the possibility of multi-modal priors that can lead to oscillatory behavior as the Bayesian process simply jumps back and forth from one modal zone to another.

Argument III states that no computer can be built that could always correctly predict the future of the economy, even in those cases where the future is completely determined by the past.  These three arguments imply that prediction and control in economics are subject to limits similar to those preventing a complete axiomatization of mathematics.  These results are consistent with Hayek's theory of complex phenomena, which suggests logical limits to rational calculation, and with Keynes' argument that at least some important economic decisions are based on non-rational factors such as “animal spirits.”

Before turning to our three arguments, a few remarks of clarification are in order.  Argument I relies on elementary recursion theory, whereas arguments II and III do not.  Argument II relies on a counterexample from Bayesian theory.  It has a similarity to results from recursion theory because it deals with self-reference.  But it is not itself a result from recursion theory.  Argument III also looks very similar to arguments from recursion theory.  In particular, Argument III is supported with a diagonal logic similar to that of Argument I.  But, it too is not a result from recursion theory, per se.  In elementary recursion theory an important determinant of effective decidability is whether a “Turing machine” (defined below) will “halt.”  Will it stop running and generate a definite answer?  Argument III imagines computers that are given problems they can solve.  They all stop at some point and produce correct answers.  But they do not always spit out the correct answer quickly enough to predict the results ahead of time.  The “prediction” comes after the fact.  The question is not computability, but relative speed.  Indeed, the computers considered in Argument III need not be “Turing machines.”  Thus, this result is outside of standard recursion theory.


II. Morgenstern and the Holmes-Moriarty Problem

In a classic article, which led to his collaboration with John von Neumann, Oskar Morgenstern (1935) argued that perfect foresight is inconsistent with economic equilibrium.  Perfect foresight requires that I know perfectly well what you will do and vice versa.  I have a complete and correct model of you and all of your thoughts.  You have an equally perfect model of me and my thoughts.  My model of you contains your model of me, which, in turn, contains again my model of you, and so on.  All that I might say has already crossed your mind.  All your responses have already crossed mine.

I think that you think that I think that you think that I think...  Morgenstern argued that this infinite chain of perfect reflections creates inescapable paradoxes.  In making his argument, Morgenstern repeated an analysis he had made in an earlier paper (1928).  His analysis describes the Holmes-Moriarty problem.

Sherlock Holmes, pursued by his opponent, Moriarty, leaves London for Dover.  The train stops at a station on the way, and he alights there rather than travelling on to Dover.  He has seen Moriarty at the railway station, recognizes that he is very clever and expects that Moriarty will take a faster special train in order to catch him in Dover.  Holmes' anticipation turns out to be correct.  But what if Moriarty had been still more clever, had estimated Holmes' mental abilities better and had foreseen his actions accordingly?  Then, obviously, he would have travelled to the intermediate station [Canterbury].  Holmes again would have had to calculate that, and he himself would have decided to go on to Dover.  Whereupon, Moriarty would again have “reacted” differently.  Because of so much thinking they might not have been able to act at all or the intellectually weaker of the two would have surrendered to the other in the Victoria Station, since the whole flight would have become unnecessary.  Examples of this kind can be drawn from everywhere.  However, chess, strategy, etc. presuppose expert knowledge, which encumbers the example unnecessarily (1935, p. 174).

In this passage, Morgenstern describes the two-person zero-sum game represented in Figure I.

Many economists take the paradoxes Morgenstern identified to have been resolved by the game-theoretic idea of a Nash equilibrium in mixed strategies.  If each player knows the other will choose by flipping a coin, he can do no better himself than to flip a coin.  This solution is consistent with perfect foresight and perfect rationality only if both contestants play along with it.  But there are no rational grounds for supposing they will.  If one player chooses by flipping a coin, any response by the opposing player will do just as well as any other.  Why should we suppose that he will play along by himself flipping a coin? Indeed, if it is costly to flip a coin, we can be sure he won't.

If we represent the supposed perfect rationality of Holmes and Moriarty by modeling them as Turing machines (recognizing that there are other views of what constitutes “perfect rationality”), we can adapt a standard result on computability to show that Holmes and Moriarty must remain more or less inscrutable to one another.  No degree of “rationality” will remove the veil of mystery shrouding the other player.  This analysis is inspired by Binmore's (1987) discussion of undecidability in the context of the universal Turing machine.  He argues that undecidability reflects the problem of infinite regress in game theory.  It is thus unsurprising to find computability appearing as a crucial sufficiency condition in studies of learning (Spear, 1989) and game theory (Anderlini and Sabourian, 1995).  In the former, agents with imperfect information may not be able to learn rational expectations because of noncomputability problems.  Albin with Foley (1998) contains important work showing how undecidability arises in the contexts of macroeconomic forecasting, policy prescription, and rational choice.  They distinguish two sources of bounds to rationality.  One arises from the computing costs associated with trying to “optimize optimizing” by one thinking about whether one should be thinking about thinking about…doing something.  The other more fundamental one arises from the logic of the halting problem already mentioned with regard to Turing machines. 

     Our exposition draws heavily on Kleene (1967, pp. 232-247).  Argument I below is a modified version of Kleene's Theorem II (p. 245).

Turing machines are mathematical objects expressing the intuitive idea of an algorithm.  If Holmes and Moriarty are “perfectly rational,” then their calculations should be well represented by a calculating algorithm, a Turing machine.  Indeed, Velupillai (2000, p. 29) identifies "the rational economic agent as a Turing machine."  A Turing machine calculates a value based on a set of instructions, a program.  A Turing machine never makes mistakes.  It has a “potentially infinite” memory.  While it will store in memory only finitely many data at one time, there is no upper limit to this finite number.  This potentially infinite memory is the “tape” we imagine to be fed into the machine from, say, the left.  The tape is a string of squares, each containing either a blank or a symbol.  The tape goes in one side and contains a string of symbols that are the data on which the machine operates.  The machine passes the tape back and forth across a reader, changing, perhaps, the symbols marked on the squares of the tape.  When the machine is done responding to the tape, it will spool the tape out to the right and we may read off the calculated results.  Such a “potentially infinite” tape can be given an economic interpretation: a budget for tape that is renewed each period, with new tape supplied in every period, however long the process runs.

What action a Turing machine takes at a given moment depends on the symbol written on the part of the tape being read and the internal state of the machine.  The number of such internal states is “potentially infinite” in the same sense as the memory.  Any machine has only a finite number, k, of internal states, but k may take on any positive integer value.  There are, therefore, only a countably infinite number of Turing machines.  We may make a list of them and assign a positive integer to each member of the list.  We may thus speak of the “number” of a given Turing machine.
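To make the notion concrete, the following minimal sketch (in Python; the representation and all names are ours, purely illustrative, and not the formal definition) simulates a Turing machine as a finite transition table acting on an unbounded tape.  The example machine scans right over a block of 1s, appends one more, and halts, computing the successor of a number written in unary notation.

    # A minimal Turing machine simulator (an illustrative sketch, not the formalism itself).
    # A program maps (state, symbol) -> (symbol to write, head move, next state).

    def run_turing_machine(program, tape, state="start", halt_state="halt", max_steps=10_000):
        tape = dict(enumerate(tape))   # sparse tape: position -> symbol; blanks are implicit
        head = 0
        for _ in range(max_steps):
            if state == halt_state:
                # Read off the result: the contents of the tape, left to right.
                return [tape[i] for i in sorted(tape)]
            symbol = tape.get(head, "_")             # "_" stands for a blank square
            write, move, state = program[(state, symbol)]
            tape[head] = write
            head += {"R": 1, "L": -1}[move]
        raise RuntimeError("no halt within max_steps")   # the halting problem lurks here

    # Unary successor: scan right past the 1s, write one more 1, halt.
    successor = {
        ("start", "1"): ("1", "R", "start"),
        ("start", "_"): ("1", "R", "halt"),
    }

    print(run_turing_machine(successor, ["1", "1", "1"]))   # ['1', '1', '1', '1'], i.e., 3 + 1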

The data on the tape may be denoted with the letter “a.”  Thus, a is the value on which the machine operates.  Any tape on which the machine operates will have only a finite number of filled squares.  Moreover, the alphabet used to write on the tape is also finite.  There are, therefore, only a countable number of distinct tapes upon which a Turing machine might operate.  Since there are only a countable number of values for the tape to hold, we may imagine them to have been numbered with the natural numbers.  In that case a ∈ {0, 1, 2, . . .}.

When fed a given tape, a, not all Turing machines will come to a halt.  If the machine does finally come to a stop, one can read the tape the machine has produced.  We may say in that case that the machine “computes a value c for a as argument.”  Like the number of tapes containing original data, the number of tapes a machine might produce is countable. 

Thus, c ∈ {0, 1, 2, . . .}. 

            Furthermore, the number of such Turing machines itself is countable.  Kleene (1967, p. 246) makes clear that this is the crux of the theorem, as the countable set of Turing machines can only compute a countable set of functions.  But the full set of number-theoretic functions is uncountable. 

Imagine the Holmes-Moriarty game to be played by two Turing machines.  Each machine chooses a strategy knowing only the number of the other machine.  Let φi(j) be the value machine i generates when given input j.  We may interpret φi(j) as machine i's strategy choice when pitted against machine j.  We will show that there is no Turing machine, i, whose strategy choice, φi(j), is always a best reply to the strategy choice of its Turing machine opponent, j.  In effect, each machine, while trying to simulate the action of the other, must also simulate itself, thereby generating an inevitable self-referencing problem that implies noncomputability.  In this sense, we will show:

Result I. In the Holmes-Moriarty Game, best-reply strategies are not computable.

            Each player must select a probability of going to Dover and, by implication, a probability of going to Canterbury.  Thus, it would seem, each player must pick a number from the set of reals on the closed interval [0,1].  But as we have seen, a Turing machine can generate only countably many output tapes.  There are an uncountable number of reals on any interval.  In other words, it is impossible to make a list of the real numbers on [0,1] starting with the first, then the second, and so on.  Any such list will leave out some reals.  This impossibility was established in Georg Cantor’s famous diagonal proof.  There is no one-to-one correspondence between the reals on [0,1] and the whole numbers 0, 1, 2, . . .  Thus, we must give our Turing machines a “smaller” strategy set to choose from, namely, the rational numbers in the closed interval [0,1].  We could instead have chosen a set that includes irrational numbers as well, such as the set of algebraic numbers, so long as it is a countable subset of the reals, there always being non-computable reals (Velupillai, 2000, pp. 96-97).
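Cantor's diagonal construction can be made concrete with a short sketch (Python; the function name and the toy enumeration are ours, purely illustrative).  Given any claimed enumeration of the reals on [0,1], represented as digit strings, it builds a number differing from the a-th listed number in its a-th digit, which therefore appears nowhere on the list.

    def diagonal(enumeration, n_digits=10):
        """Return the first n_digits of a real in [0,1] missing from the enumeration."""
        digits = []
        for a in range(n_digits):
            d = enumeration(a)[a]                     # the a-th digit of the a-th listed real
            digits.append("5" if d != "5" else "4")   # pick any digit that differs
        return "0." + "".join(digits)

    # Example: the (hopeless) list 0.000..., 0.111..., 0.222..., and so on.
    print(diagonal(lambda a: str(a % 10) * 50))       # differs from every listed number

Avoiding the digits 0 and 9 in the construction sidesteps the dual-representation issue (0.4999. . . = 0.5000. . .).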

            The rational numbers in [0,1] are all the values p/q where p and q are whole numbers (0, 1, 2, . . .), q is not 0, and p ≤ q.  We can make a list of them.  Table 1 is an example.  Equation (1) describes the pattern displayed in Table 1.




Table 1


      a :    0     1     2     3     4     5     6     7     8     9    10    11    12    13   . . .

    p/q :  0/1   1/1   0/2   1/2   2/2   0/3   1/3   2/3   3/3   0/4   1/4   2/4   3/4   4/4   . . .


The strategy with index a is p/q, where

              q = min{n ≥ 1 : a < n(n + 3)/2},   p = a − (q − 1)(q + 2)/2.                              (1)
The list described in equation (1) and Table 1 is redundant.  Many values of a correspond to 0, namely, 0, 2, 5, 9, and so on.  In fact, every ratio, p/q, is represented by infinitely many values of a.  (For any p and q, we have p/q, 2p/2q, 3p/3q, and so on.)  Mathematicians have found lists that avoid this double counting, but they are more complicated to describe.  Our demonstration that best-reply strategies are not computable does not seem to depend on which type of enumeration scheme we use.  With the aid of equation (1) or any similar enumeration, we can describe each player’s strategy set as the set of whole numbers.  For Holmes and Moriarty alike, we will interpret each whole number as corresponding to the probability of going to Dover.
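The enumeration of equation (1) is easy to implement.  The sketch below (Python; the function name is ours, purely illustrative) reproduces the examples used in the text, such as index 12 corresponding to 3/4 and index 9 to 0/4.

    def strategy(a):
        """Map index a to the pair (p, q) of Table 1; the probability of Dover is p/q."""
        q = 1
        while a > q:            # group q holds the q + 1 indices with p = 0, 1, ..., q
            a -= q + 1
            q += 1
        return a, q             # deliberately unreduced, so the double counting stays visible

    for a in (0, 2, 5, 9, 3, 10, 12):
        p, q = strategy(a)
        print(a, "->", f"{p}/{q}")   # 0->0/1, 2->0/2, 5->0/3, 9->0/4, 3->1/2, 10->1/4, 12->3/4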

            If the part of Holmes is played by Turing machine i and the part of Moriarty is played by Turing machine j, then Holmes’ strategy choice is φi(j), assuming it can be calculated.  Moriarty’s strategy choice is φj(i), assuming it can be calculated.  If Holmes picks a best reply to Moriarty, then there is some best-reply function, f(x), such that f(φj(i)) = φi(j).

Let f(x) be one of Holmes’ best reply functions.  The argument, x, is some whole number. That number corresponds to the probability of Moriarty going to Dover.  If that probability exceeds one half, then Holmes wants to go to Canterbury.  For instance, in terms of Table 1, x might be 12.  If x=12, then Moriarty has a 75% chance of going to Dover.  Holmes can do no better than to pick, say, strategy 9, which corresponds to a probability of 0 for going to Dover.  In this case, f(12)=9.

If Moriarty’s choice corresponds to a probability of going to Dover that is less than one half, then Holmes goes to Dover for sure.  For instance, Holmes might choose strategy 4 in response to Moriarty’s choice of strategy 10.  In this case, f(10)=4.  Holmes would never choose to have the same chance of going to Dover as Moriarty if that chance were different from one half. 
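One such best-reply function can be sketched as follows (Python, reusing the strategy function from the sketch above; the particular index choices are ours).  Where the text picks strategies 9 and 4, the code returns the equivalent indices 0 (probability 0/1) and 1 (probability 1/1); any index carrying the same probability would do as well.

    def best_reply(x):
        """One best-reply function f for Holmes, given Moriarty's strategy index x."""
        p, q = strategy(x)      # Moriarty goes to Dover with probability p/q
        if 2 * p > q:           # p/q > 1/2: Moriarty is probably at Dover, so avoid it
            return 0            # index 0 is 0/1: go to Canterbury for sure
        if 2 * p < q:           # p/q < 1/2: go to Dover for sure
            return 1            # index 1 is 1/1
        return 0                # at exactly 1/2 every reply is equally good

    print(best_reply(12), best_reply(10), best_reply(3))   # 0 1 0

Note that this particular f has no fixed point: it only ever returns 0 or 1, while best_reply(0) = 1 and best_reply(1) = 0, anticipating the case treated next.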

We can see that there are best-reply functions, f(x), such that f(x) ≠ x for all x.  That is, there are best-reply functions without a fixed point.  (A fixed point is defined by the condition that f(x)=x.)  As long as Moriarty’s strategy choice, x, does not correspond to an equal chance of getting off at Dover or Canterbury, f(x) ≠ x.  If x=3 or any other value corresponding to an even chance of selecting Dover, then Holmes’ chance of meeting Moriarty is 0.5 regardless of Holmes’ choice.  Any element in the strategy set is a best reply.  In this case, Holmes could easily find a best reply such that f(x) ≠ x.  Thus, there are best-reply functions with no fixed points.  Of course, when Moriarty has an even chance of going to Dover, Holmes may also choose to have an even chance of going to Dover.  There are many best-reply functions; some will have fixed points and some will not.

            Let f(x) be a best-reply function and assume for the moment that f(x) ≠ x for all x in {0,1,2, . . . }.  Let i=j=a.  Then each player chooses φa(a).  If Holmes’ choice is a best reply satisfying f(x), then we have a contradiction.  We have f(φa(a)) ≠ φa(a) because we have assumed f(x) ≠ x.  We have f(φa(a)) = φa(a), because we have assumed that φi(j) satisfies f(x), which implies f(φj(i)) = φi(j).  Thus, we have f(φa(a)) ≠ φa(a) and f(φa(a)) = φa(a), a contradiction.

The difficulty lies in the function φi(j).  Either φi(j) does not always select the best-reply required by f(x) or the values φi(j) are not always well-defined.  Now the fact that φi(j) does not pick out the reply required by f(x) does not mean φi(j) is suboptimal.  Perhaps φi(j) is a best reply, just not the best reply that f(x) selects.  But if, for some i, φi(j) is always well defined and always a best reply, then there is some best-reply function that φi(j) satisfies.  Thus, if we can show that for any best-reply function, f(x), and for any i, either φi(j) does not always select the best-reply required by f(x) or the values φi(j) are not always well-defined, then we can say the best-reply function is not computable.  So far we have our result for all best-reply functions without a fixed point.  We now consider best-reply functions with one or more fixed points.

            Let g(x) be a best-reply function with at least one fixed point.  For some x, g(x)=x.  Clearly, for any x such that g(x)=x, the corresponding probability of going to Dover is 0.5.  Thus, at every fixed point, Holmes chooses to go to Dover with probability 0.5.


Define w(x) as follows:


If g(x) ≠ x, then w(x) = g(x).

If g(x)=x, then w(x) = 0.


Assume some Turing machine, h, can always calculate a strategy that satisfies g(x).  We can easily program a Turing machine, k, to do the following:

1)      Compute φh(j).

2)      If φh(j) corresponds to a probability of going to Dover different from 0.5, select the strategy φh(j).  In this case, φk(j)= φh(j).

3)      If φh(j) implies a fifty-fifty chance of going to Dover, select the strategy 0, which sends one to Canterbury for sure.  In this case, φk(j)=0.


The machine k selects φh(j) in response to machine j only if φh(j) does not create a fifty-fifty chance of going to Dover.  The machine k selects strategy 0 in response to machine j if φh(j) does create a fifty-fifty chance of going to Dover. 

From our assumption that the machine h can always calculate a strategy that satisfies g(x), it follows that the machine, k, can always calculate a strategy that satisfies w(x).  But w(x) is a best-reply function with no fixed point.  We saw earlier that no Turing machine can satisfy such a function.  Thus, the assumption that some Turing machine can always find a strategy satisfying g(x) has led us into a contradiction.
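The construction of machine k from machine h is mechanical, as the following sketch shows (Python, reusing strategy from the earlier sketch; the names are ours).  Of course, no h with the assumed property actually exists; the sketch only exhibits the wrapper whose existence drives the contradiction.

    def make_k(h):
        """Build machine k from a purported best-reply machine h, following steps 1) - 3)."""
        def k(j):
            s = h(j)                 # 1) compute phi_h(j)
            p, q = strategy(s)
            if 2 * p != q:           # 2) keep h's reply unless it means a fifty-fifty chance
                return s
            return 0                 # 3) otherwise choose index 0 (0/1): Canterbury for sure
        return k

    # If h always satisfies g(x), then make_k(h) always satisfies w(x), which has no
    # fixed point.  Feeding k its own machine number then forces the contradiction
    # derived in the text, so no such h can exist.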

            For any best-reply function f(x), with or without fixed points, there is no Turing machine, i, that can always calculate a value φi(j) that satisfies f(x).  For any i, there is at least one j such that either φi(j) is not a best reply, or the machine i never stops and spits out a value.  This establishes our Result I.  In the Holmes-Moriarty game, best-reply strategies are not computable.  Fundamentally, this result arises from the inevitable self-referencing involved in the problem.

Experts in metamathematics will recognize our result as an example of the halting problem:  We cannot always know whether a Turing machine will come to a halt.  (See Blum et al., 1998 for a recent treatment of the halting problem and computability, especially in regard to the computability of the Mandelbrot set.)  The value of our result, however, lies not in any mathematical innovations.  It lies in showing that the problem of mutually interlocking expectations does not admit of a purely logical solution, at least in some cases. 

Canning (1992) analyzes a situation somewhat similar to ours.  He seems to contradict us when he says “computability does not undermine rationality.”  But to reach this result he must assume common knowledge and, significantly, a “natural” restriction on the “domain of applicability of game theory.”  But, as Canning shows, “many different domain restrictions are possible” (p. 887).  It is widely recognized that rationality gives us no particular reason to prefer one Nash equilibrium to another.  Canning has shown that something similar is true at the higher level that we might call the “meta-strategy level.”  Far from contradicting us, we think his result complements ours.

Result I says that all best-response functions are noncomputable.  Every machine can be stymied by at least one other machine, including itself.  One cannot program a Turing machine to give Holmes's best response to all possible Turing machines playing the role of Moriarty.  Given a Turing machine made to play Holmes's role, one can always find another (possibly identical) Turing machine to stymie Holmes when made to play Moriarty.  The reverse, of course, also holds.  It is always possible to find a machine to play Holmes that will stymie the machine playing Moriarty.  At least one of the two will be more or less inscrutable to the other.

In the Holmes-Moriarty game and in similar cases, as Morgenstern argued, “there is exhibited an endless chain of reciprocally conjectural reactions and counter-reactions.”  Morgenstern notes, in an anticipation of Keynes' “animal spirits,” that “[t]his chain can never be broken by an act of knowledge but always only through an arbitrary act -- a resolution.”  Result I supports his conclusion, that “[t]he paradox still remains no matter how one attempts to twist or turn things around” (1935, p. 174).4

Binmore (1987) infers from the problem of undecidability that we should give more attention to procedural rationality, less to substantive rationality.   He has in mind a kind of sophisticated Bayesian updating.  Binmore's criticism of “naive Bayesianism” (1987, pp. 209-212) is reinforced when one considers the conditions under which one can expect convergence to a Nash equilibrium within a game theoretic context.   Without some very restrictive assumptions, Bayesian updating does not generally lead to convergence under the kinds of substantive rationality conditions found in the Holmes-Moriarty game.


III. Rationality and Convergence

In Chapter 12 of the General Theory, Keynes argued that “prospective profit” cannot be calculated rationally, implying no rational basis on which to found investment decisions.  And yet investment occurs, Keynes argued, because investors are driven by “animal spirits, a spontaneous impulse to action rather than inaction” (1936 p.161). 

In making his argument, Keynes relied on his theory of induction (Butos and Koppl, 1997).  In his Treatise on Probability, Keynes identified five conditions that must be satisfied if a reliable induction is to be performed.  The probabilities involved must be measurable; the phenomena studied must be decomposable into a finite set of “atoms” and relations; small causes must have small effects, big causes, big effects; the quantity of evidence used to perform the induction must be sufficient to render it reliable; and, finally, it must be possible to assign equal probabilities to different events when no other specific assignment of weights is known to apply.  These conditions, Keynes thought, are all violated in the field of human action, rendering calculation impossible.  Thus, animal spirits are the springs of action (Koppl, 1991; Rosser, 2001). 

Paradoxes of self-reference were part of Keynes' argument, as in his famous beauty contest in which contestants are asked to select the six most beautiful faces from a group of one hundred photos with the winner doing the best job of identifying the six most popular “beauties.”  A contestant in this game will try to pick “not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view” (1936, p. 156).  Some contestants may try to guess the average opinion.  Others will try to guess what such contestants will guess.  “We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be.  And there are some who practice the fourth, fifth, and higher degree” (1936, p. 156).  In the context of such an infinite regress, rational action based on “Benthamite calculations” is impossible (Butos and Koppl, 1997).5   A simple example of the problem arises in the financial theory of security valuation, which assumes a firm has a capital-budgeting system but in which the capital-budgeting system assumes a security-valuation system for determining internal rate of return (Albin, 1974).  Much as in the Holmes-Moriarty case, both examples implicitly pose the problem of (at least) two intelligent agents thinking about the other's thinking about their thinking about the other's thinking, in an infinite regress, although in the financial-theory example the agents are replaced by the presumed capital-budgeting and security-valuation systems.

Argument II is similar in structure and spirit to Keynes' negative result.  Bayesian updating in the Holmes-Moriarty game may not lead to convergence.  Inductive techniques do not eliminate the paradoxes created when you think that I think that you think ad infinitum.

Let each player observe the other's thinking about itself.  Our perfectly rational, Bayesian players are computing both strategies and expectations about each other's strategies following a Bayesian pattern.  Holmes is trying to predict Moriarty's prediction of Holmes's prediction of Moriarty's prediction, and so on.  In this context, the two may be unable to hit on a mutually consistent set of prior beliefs that would allow convergence.

The essence of the result we derive below has been expressed by Clayton (1986, p. 38).  Imagine

A and B are witness to some coin tossing.  Bayesian A is firmly committed to the belief that all coins are fair, and so uses δ1/2 for θ, the probability of heads.  B is firmly committed to the belief that coins are never fair, and uses a uniform prior on [0,1/3]∪[2/3,1].  Both A and B will use Bayes theorem to coherently update their priors as they see data, but they will never agree, nor should they.6
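Clayton's example is easy to replay numerically.  The sketch below (Python with NumPy; the grid, sample size, and seed are our illustrative choices) updates B's prior over a grid of values of θ using fair-coin data.  Because B's prior puts no mass near θ = 1/2, neither does B's posterior, no matter how much data arrives.

    import numpy as np

    rng = np.random.default_rng(0)
    tosses = rng.integers(0, 2, size=1000)        # fair coin: 1 = heads
    heads, n = tosses.sum(), len(tosses)

    # B's prior: uniform on [0, 1/3] U [2/3, 1]; A's prior is a point mass at 1/2.
    grid = np.linspace(0.001, 0.999, 9999)
    prior_B = ((grid <= 1/3) | (grid >= 2/3)).astype(float)

    # Posterior over the grid via Bayes' theorem (log scale avoids underflow).
    loglik = heads * np.log(grid) + (n - heads) * np.log(1 - grid)
    post_B = prior_B * np.exp(loglik - loglik.max())
    post_B /= post_B.sum()

    near_fair = np.abs(grid - 0.5) < 0.1
    print("B's posterior mass near 1/2:", post_B[near_fair].sum())   # exactly 0
    # A's posterior mass near 1/2 is 1 by construction: coherent updating, permanent disagreement.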

Let Holmes and Moriarty calculate as computers T1 and T2 that need not be Turing machines.  Each one will compute according to T1([g],[y]) and T2([g],[x]), where [g] encodes the game rules, [y] gives a description of T2, and [x] gives a description of T1.  They each compute ([pi],[θi]), where θi is a set of mixed strategies and pi is a set of predictions about θj.  Each player models the other's expected choice as θ + εi, with εi distributed with density λi.  Bayesian updating occurs according to


              θ̂n = ∫ θ ∏i λi(xi − θ) μ(dθ) / ∫ ∏i λi(xi − θ) μ(dθ),                                            (2)


where θ̂n is the posterior mean of the distribution of θ, μ is a prior on θ, and λi is the density of εi.  T1 and T2 need not be Turing machines.  They need only update according to Equation (2).  The prior μ is what the agents assume initially about the mean; it then gets updated with subsequent observations.
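Equation (2) can be evaluated directly on a grid.  The sketch below (Python with NumPy; the Cauchy error density, flat prior, and sample are our illustrative choices) computes the posterior mean θ̂n for given data.

    import numpy as np

    def posterior_mean(x, theta_grid, log_density, log_prior):
        """Equation (2) on a grid: int theta prod lambda(x_i - theta) mu(dtheta), normalized."""
        loglik = sum(log_density(xi - theta_grid) for xi in x)
        logpost = loglik + log_prior
        w = np.exp(logpost - logpost.max())       # unnormalized posterior weights
        return (theta_grid * w).sum() / w.sum()

    theta_grid = np.linspace(-5, 5, 2001)
    log_cauchy = lambda e: -np.log(np.pi) - np.log1p(e ** 2)   # Cauchy error density
    log_flat = np.zeros_like(theta_grid)                       # uniform prior over the grid

    x = np.random.default_rng(1).standard_cauchy(50)           # data with true theta = 0
    print(posterior_mean(x, theta_grid, log_cauchy, log_flat))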

Diaconis and Freedman (1986) show that this Bayesian updating can be inconsistent.  They present an example where the prior is Dirichlet on a Cauchy base α.  Dirichlet priors are generalized beta distributions that have been widely used as priors in Bayesian analysis due to their robust characteristics (Hartigan, 1983).  Let the setup be symmetric with the Dirichlet prior, such that λi is the distribution of θ − pi and is itself symmetric.  This implies that λ is symmetric about 0, where it has a strict maximum; λ is infinitely differentiable and vanishes off a compact interval, although the distribution is multimodal.  Although there is no standard economic interpretation of this, one possible case that slightly resembles it might be an extreme form of the leptokurtotic (“fat-tailed”) distributions observed in financial markets, although with considerable differences.

In particular, the Dirichlet prior is trimodal with the central mode being the greatest, which holds, assuming X is the actual data series, if

              ∏i λ(Xi − θ) μ = exp{−Σi log[1 + (Xi − θ)²]}.                                                   (3)


See Figure 2.
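The multimodality that drives the example already shows up in the right-hand side of equation (3): with just two well-separated observations, the Cauchy log-likelihood is bimodal.  A small grid search (Python with NumPy; the data points are our illustrative choice) locates the modes.

    import numpy as np

    theta = np.linspace(-8, 8, 16001)
    data = np.array([-3.0, 3.0])

    # Equation (3)'s exponent: -sum_i log[1 + (X_i - theta)^2], the Cauchy log-likelihood.
    loglik = -np.log1p((data[:, None] - theta) ** 2).sum(axis=0)

    # Interior local maxima of the log-likelihood:
    peak = (loglik[1:-1] > loglik[:-2]) & (loglik[1:-1] > loglik[2:])
    print(theta[1:-1][peak])    # two modes near -2.8 and +2.8, with a dip at theta = 0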



Argument II.  Let n be the number of periods of play.  For some λi the Bayes estimate is inconsistent in that for large n there is a probability near 1/2 that θ̂n is close to γ and a probability near 1/2 that θ̂n is close to −γ, where γ ≠ 0 and depends on λi.  As n rises, θ̂n will tend to oscillate between γ and −γ.  See Diaconis and Freedman (1986, pp. 6-9) for the formal asymptotic argument.  However, they argue that this tendency to oscillate between two solutions can easily appear in the finite case as well.  More intuitively, as n rises, increasingly only γ and −γ come to matter as posterior mean solutions, neither of which will equal the true mean that lies between them.

           This result is echoed by Nyarko (1991), who shows that if a player has a misspecified model, then cyclical behavior can result.  This is rather like the Clayton (1986) case referred to above.  But in our case the misspecification, as it were, arises from the inability to compute properly as the dimensions created by mutual conjecture go to infinity.  The infinite regress of total mutual awareness undermines convergence as Holmes and Moriarty endlessly cycle after each other.  This is not a problem of incompatible priors, since every event has positive measure in the prior.

Eichberger (1995) has presented sufficient conditions for convergence of naive Bayesian learning to a Nash equilibrium.  Converging pure strategy choices will lead to a Nash equilibrium if either prior beliefs have full support or if, for every neighborhood of a Nash equilibrium strategy, the absolutely continuous part of the prior distribution has positive measure.  The first of these does not hold for the case depicted in Figure 2 and discussed by Diaconis and Freedman because the strategies may not be converging at all.  With regard to the second, Diaconis and Freedman (1986, p. 9) note that any particular probability will be assigned zero mass by the Dirichlet measure in the asymptotic case.  In the finite but high-dimensional case, data will not necessarily swamp the priors, and the problem of oscillation at infinity can still show up.  This is relevant to the Holmes-Moriarty example: in the formulation proposed above, an effectively multimodal situation inheres in the problem.

     Aumann and Brandenburger (1995) show that for a two-person, non-cooperative game, if payoff functions, the rationality of the players, and the conjectures of the players are mutually known, then this mutual knowledge of conjectures constitutes a Nash equilibrium of the conjectures, even in the case of a system of infinite beliefs.  However, it does not necessarily mean an equilibrium of actions, and the problem of convergence in the infinite dimensional case remains if actions do not correspond with conjectures.

Argument II shows that even in the absence of fundamental undecidability, there may be no convergence to a Nash equilibrium.  Oscillating θ's imply oscillating pi's.  When they are Bayesians, Holmes and Moriarty may chase each other in their predictions in never-ending oscillation.  This result is very close in spirit to the argument of Keynes's Chapter 12.


IV. Complexity and Computability

In the latter part of his career, beginning with The Sensory Order and The Counter-Revolution of Science, F.A. Hayek developed a set of arguments on the limits of reason (Caldwell 2000).  Hayek's theory of complex phenomena (1967) implies that economists must often content themselves with “explanations of the principle” behind observed phenomena.  It is often impossible to predict particular events.  Economists have often taken prediction and control to be the hallmarks of science.  Hayek's arguments on the limits of reason imply that prediction and control are subject to profound logical limits.

Hayek based his argument on his theory of mind (1952), according to which the mind is essentially a rule-governed classificatory apparatus.  The mind's rules of operation are the product of the organism's personal and phylogenic history and, for humans at least, the cultural history of the organism's society.  Because the mind is rule-governed, Hayek argued, there is an analogy between the human mind and computers.  Despite apparently supporting a “mechanistic” view of human agency and even some form of behaviorist psychology, Hayek explicitly defended “verstehende psychology” against behaviorism, and Weberian social science against all forms of positivism.  And he did so, in part at least, on the basis of this same “mechanistic” theory of mind.

Because it is rule governed, the mind may indeed be described as “mechanistic” in some large sense.  But we cannot understand our own or others' minds in the same complete and explicit way we could a simple machine.  To do so, Hayek argued, the mind would have to state all the rules of its own operation.  Hayek appealed to a “diagonal” logic of the sort pioneered by Georg Cantor (1874) to argue that such a complete self-description is not possible.

In The Sensory Order, the argument is given with the aid of some algebraic symbols (1952, paragraphs 8.66-8.68).  Later Hayek was to say, “It now seems to me as if this would follow from what I understand to be Georg Cantor’s theorem . . . according to which in any system of classification there are always more classes than things to be classified, which presumably implies that no system of classes can contain itself” (Hayek, 1967, p. 61, note 49).  In the same essay, Hayek gives an informal version of the argument.  He concludes by noting the similarity to Gödel’s theorem.

To those familiar with the celebrated theorem due to Kurt Gödel it will probably be obvious that these conclusions are closely related to those Gödel has shown to prevail in formalized arithmetical systems.  It would thus appear that Gödel's theorem is but a special case of a more general principle, namely the principle that among their determinants there must always be some rules which cannot be stated or even be conscious.  At least all we can talk about and probably all we can consciously think about presupposes the existence of a framework which determines its meaning, i.e. a system of rules which operate us but which we can neither state nor form an image of and which we can merely evoke in others in so far as they already possess them.  (1967 p.62)

In human affairs, Hayek infers, complete prediction and control is impossible.

Argument III, adapted from Wolpert (1996), is proved using a kind of “diagonal” logic similar to the one Hayek used.  The theorem leads to the same negative result Hayek reached.  As a matter of logic, prediction is not always possible even when the future is fully determined by the past.  Paraphrasing Wolpert (1996, p. 1), we may say econometricians cannot always process information faster than the economy.

Imagine a computer, C, programmed to predict something about the economy, A.  Let JT be the yet-to-be-realized information we desire from C.  JT describes something about what state A will be in at time t=T.  The computer takes as input at time t=0 a description, J0, of the economy.  J0 encodes information about the dynamics of A and its state at t=0.  We require of J0 only that it be “sufficient” (in conjunction with the laws of economics) to determine JT.  Given enough time and the true laws of economics, one could compute JT knowing only J0.  The coding that asks C to generate JT may be denoted [JT].  At time t=0, our computer takes as inputs T, [JT], and J0.  Result III says that no computer will always be able to produce a correct prediction ahead of time.

Our result looks quite a bit like the “halting” result we studied earlier.  But here, we are imagining, the problem to be solved can be solved.  Our computer – or at least some computer – can solve the problem in a finite amount of time.  This assumption would seem to obviate the sort of paradoxes of self-reference we have been examining.  The brilliance of Wolpert’s analysis was to pluck an impossibility result out of such an unpromising context.  As we shall see, if the phenomena to be predicted resemble the device used to predict, paradoxes of self-reference may emerge even in a fully computable context.  The paradox, however, takes an unexpected form: The prediction can be made, but not always ahead of time.


Result III. For a given T>0, there is no computer, C, such that for all A, all [JT], and all J0 sufficient for JT, C can take as input J0, T, and [JT] at t=0 and produce JT before t=T.

     Take T=1.  Let A be an economy in which, at each moment t, a central bank sets a discount rate, dt.  We wish to know if the discount rate at t=1, d1, will at least equal a certain value, y.  We are asking if d1 ≥ y.  J1 is 0 if d1 < y or 1 if d1 ≥ y.  Now, J0 must be sufficient for J1.  This is possible only if the central bank is following a rule (however simple or complex) about where to fix the discount rate.  Let the rule be that if C produces the value J1=0, set d1=y; if, instead, the computer predicts J1=1, set d1=0.9y.  The rule is to negate the prediction of the computer.  Since J0 must be sufficient for J1, J0 includes a description of both C and the central bank's rule.  No matter how powerful it may be, the computer cannot correctly predict whether or not the discount rate will be at least y in this case; it cannot always process information faster than the economy.  This is true even though J0 is sufficient to determine J1.
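A toy sketch of this example (Python; the threshold, the rule, and all names are our illustrative choices) shows why any deterministic predictor loses: the bank's rule, itself part of J0, simply negates whatever C outputs.

    Y = 0.05   # the threshold discount rate y

    def predictor(j0):
        """Stand-in for computer C: any deterministic function of the description j0."""
        return 1                        # predict J1 = 1, i.e., d1 >= y

    def central_bank_rule(prediction):
        """The bank's rule, part of J0: negate whatever C predicts."""
        return Y if prediction == 0 else 0.9 * Y

    j1_predicted = predictor("description of A, including this very rule and C itself")
    d1 = central_bank_rule(j1_predicted)
    j1_actual = 1 if d1 >= Y else 0
    print(j1_predicted, j1_actual)      # 1 0: wrong, and swapping the prediction stays wrong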

The theorem holds even if we require that C be separated from A in the sense that C's output is not read by anyone in A until after T.  To see this, imagine an economy A containing an exact copy, B, of C.  Since B is an exact copy, it may be fed the same input as C so as to produce the same output (and at the same time).  If the central bank follows the same rule earlier described, B's prediction, and thus C's, will be wrong. 

We are asking our computer to predict that which can be known based only on information available at moment t=0.  Thus, there is no contradiction involved in imagining that there is some computer that can successfully make the calculations required to tell us the desired information, JT.  The contradiction comes from our assumption that some computer can always do so ahead of time.

The problem is not merely that policy makers may purposefully falsify their own predictions.  Even pure observers cannot always predict reliably.  This is because the economies they observe may contain computers as powerful as the ones they themselves are using to predict.  This is, in essence at least, the same point Hayek makes in his theory of complex phenomena.  And, as Hayek argues, it is for this reason that in social science we must generally content ourselves with “explanation of the principle.”

Our result holds even though we have not restricted C to be only as powerful as a Turing machine.  If C were somehow more powerful than a Turing machine (because it could have a continuum of internal states for example), it still could not always process information faster than the economy.

The relation of all this to Result I may not be completely transparent.  The arguments are similar.  Our informal proof of Result III is a kind of diagonal argument in that it makes use of the idea of self-reference.  But it is not about computability in the strict mathematical sense.  Result III cannot be a computability result (in the narrow sense) simply because the argument works even if the computers in question are not Turing machines, even if they are more powerful than any Turing machine.  The number of internal states may be imagined as high as one likes.  The cardinality of internal states may equal that of the reals, of the power set of the reals, or higher still.  Of course, the computers may be simply different from Turing machines, rather than more powerful.  The human mind may be an example.  In any case, the argument works without assuming Turing machines. 

There is another reason Result III is not a computability result.  The computers’ problems were explicitly assumed to be solvable.  In the universe of discourse created by Result III there are no non-computable functions.  Such a world might be called a “sub-Turing world.”  For every problem considered, there is a computer and a program that can spit out the answer in a finite amount of time.  But they do not always do so fast enough to predict.  Result III might be more fundamental than Result I.  It says that limits to rationality exist no matter what model of rationality we adopt.  They exist even if we assume the relevant problems are all solvable in a finite amount of time.  Limits to rationality exist even if we allow computers more powerful than Turing machines in a sub-Turing world.

It is an immediate implication of Result III that econometricians, who may be compared to high-powered computers, cannot always process information faster than the economy.  Econometricians trying to predict the future of the economy are trying to predict the results of a process driven by agents who are at least as sophisticated as the observing econometricians.  They cannot consistently outperform agents of the observed economy by predicting ahead of time the results of economic processes.

Result III is adapted from Wolpert (1996).  Wolpert says of his result that it “can be viewed as a physical analogue of Gödel's incompleteness theorem.”  If so, then Argument III may be viewed as a social-science analogue of Gödel's theorem.


V. Conclusions.

We have established three results that together cast doubt on the Laplacian dream in economics.  Argument I points to the desirability of some version of the bounded rationality assumption in economics, given the inability of infinitely knowledgeable agents to rationally compute best-strategy behaviors against each other.  Argument II presents limits to Bayesian updating as an alternative to rational maximizing in such situations.  Argument III suggests, further, that economists must consider their own cognitive limits as well as those of economic actors in forecasting economic events.

             We note that the infinite regress problem does not rule out the possible existence of Nash equilibria.  Furthermore, agents can “model” such problems by using hierarchies of beliefs approaches.  But, for these approaches to generate convergent behavior, much less uniqueness, conditions must be put on them that amount to substantially limiting the rationality of the agents involved or the domain within which rationality operates.

Despite their disagreements over theory and policy, Keynes and Hayek agreed on the importance of an idea found in the work of Oskar Morgenstern, a co-founder of game theory.  The attempt to imagine a market economy guided by perfect foresight and rationality creates insuperable paradoxes of infinite regress.



References


Albin, P.S. (1974): “Information Exchange in Security Markets and Assumption of Homogeneous Beliefs,” Journal of Finance, 29, p. 1217.


Albin, P.S. (1998): Barriers and Bounds to Rationality: Essays on Economic Complexity and Dynamics in Interactive Systems, edited by Duncan K. Foley, Princeton University Press, Princeton.


Anderlini, L. and Sabourian, H. (1995): “Cooperation and Effective Computability”, Econometrica, 63, pp. 1337-1369.


Arthur, W.B. (1994): “Inductive Reasoning and Bounded Rationality”, American Economic Review Papers and Proceedings, 84, pp. 406-411.


Arthur, W.B., Holland, J.H., LeBaron, B., Palmer, R., and Tayler, P. (1997): “Asset Pricing Under Endogenous Expectations in an Artificial Market”, in W.B. Arthur, S.N. Durlauf, and D.A. Lane (eds.): The Economy as an Evolving Complex System II, Addison-Wesley, Reading, MA.


Aumann, R. and Brandenburger, A. (1995): “Epistemic Conditions for Nash Equilibrium”, Econometrica, 63, pp. 1161-1180.


Binmore, K. (1987): “Modeling Rational Players I”, Economics and Philosophy, 3, pp. 9-55.


Blum, L., Cucker, F., Shub, M., and Smale, S. (1998): Complexity and Real Computation, Springer-Verlag, New York.


Butos, W. and Koppl, R. (1997): “The Varieties of Subjectivism: Keynes and Hayek on Expectations”, History of Political Economy, 29, pp. 327-359.


Caldwell, B. (2000): “The Emergence of Hayek’s Ideas on Cultural Evolution,” Review of Austrian Economics, 13, pp. 5-22.


Canning, D. (1992): “Rationality, Computability and Nash Equilibrium,” Econometrica, 60, pp. 877-888.


Cantor, G. (1874): “Über eine Eigenschaft des Inbegriffes aller reellen algebraischen Zahlen”, Journal für die reine und angewandte Mathematik, 77, pp. 258-262.  [Reprinted in Georg Cantor Gesammelte Abhandlungen, 1932; English translation by P.E. Jourdain in Cantor, G. (1955): Contributions to the Founding of the Theory of Transfinite Numbers, Dover, New York.]


Clayton, M.K. (1986): “Discussion of Diaconis and Freedman”, (1986): The Annals of Statistics, 14, pp. 37-40.


Cournot, A.A. (1838): Recherches sur les Principes Mathématiques de la Théorie des Richesses, L. Hachette, Paris; reprinted with introduction and notes by Georges Lutfalla and texts by L. Walras, J. Bertrand, and V. Pareto, M. Rivière, Paris, 1938; translated by N.T. Bacon, with introduction and notes by Irving Fisher, Macmillan, New York, 1897, 1927; reprinted Augustus M. Kelley, New York, 1960.


Diaconis, P. and Freedman, D. (1986): “On the Consistency of Bayes Estimates”, The Annals of Statistics, 14, pp. 1-26.


Eichberger,  J. (1995): “Bayesian Learning in Repeated Normal Form Games”, Games and Economic Behavior, 11, pp. 254-278.


Epstein, L.G. and Wang, T. (1996): “'Beliefs About Beliefs' Without Probabilities”, Econometrica, 64, pp. 1343-1373.


Harsanyi, J.C. (1973): “Games with Randomly Distributed Payoffs: A New Rationale for Mixed-Strategy Equilibrium Points”, International Journal of Game Theory, 2, pp. 1-23.


Hartigan, J.A. (1983): Bayes Theory, Springer-Verlag, New York.


Hayek, F.A. (1952): The Sensory Order, The University of Chicago Press, Chicago.


Hayek, F.A. (1967): “The Theory of Complex Phenomena”, in his Studies in Philosophy, Politics, and Economics, The University of Chicago Press, Chicago.


Kalai, E. and Lehrer, E. (1993): “Rational Learning Leads to Nash Equilibrium”, Econometrica, 61, pp. 1019-1045.


Keynes, J.M. (1921): A Treatise on Probability, Macmillan, London.  Reprinted (1973) as Volume VIII of Moggridge, D.E. ed. The Collected Writings of John Maynard Keynes, St. Martin’s Press, New York.


Keynes, J.M. (1936): General Theory of Employment, Interest and Money, Macmillan, London. Reprinted (1973) as Volume VII of Moggridge, D.E. ed. The Collected Writings of John Maynard Keynes, St. Martin's Press, New York.


Kleene, S.C. (1967): Mathematical Logic, John Wiley & Sons, New York.


Koppl, R. (1991): “Animal Spirits”, Journal of Economic Perspectives, 5(3), pp. 203-210.


Lipman, B.L. (1991): “How to Decide How to Decide How to...: Modeling Limited Rationality”, Econometrica, 59, pp. 1105-1125.


Mertens, J.-F. and Zamir, S. (1985): “Formulation of Bayesian Analysis for Games with Incomplete Information”, International Journal of Game Theory, 14, pp. 1-29.


Morgenstern, O. [1935] (1976): “Perfect Foresight and Economic Equilibrium”, in A. Schotter (ed.): Selected Economic Writings of Oskar Morgenstern, New York University Press, New York.


Nyarko, Y. (1991): “Learning in Mis-specified Models and the Possibility of Cycles”, Journal of Economic Theory, 55, pp. 416-427.


Nyarko, Y. (1997):  “Convergence in Economic Models with Bayesian Hierarchies of Beliefs”, Journal of Economic Theory, 74, pp. 266-296.


Rosser, J.B., Jr. (2001): “Alternative Keynesian and Post Keynesian Perspectives on Uncertainty and Expectations,” Journal of Post Keynesian Economics, 23, pp. 545-566.


Rubinstein, A. (1991): “Comments on the Interpretation of Game Theory”, Econometrica, 59, pp. 909-924.


Simon, H.A. (1976): “From Substantive to Procedural Rationality,” in S. Latsis, ed., Method and Appraisal in Economics, Cambridge University Press, Cambridge.


Spear, S.E. (1989): “Learning Rational Expectations Under Computability Constraints”, Econometrica, 57, pp. 889-910.


Velupillai, K. (2000): Computable Economics, Oxford University Press, Oxford.


Wolpert, D.H. (1996): “An Incompleteness Theorem for Calculating the Future”, Santa Fe Institute Working Paper 96-03-008.





                                        Moriarty

                                 Dover          Canterbury

                 Dover            0,1              1,0
   Holmes
                 Canterbury       1,0              0,1


                                        Figure 1



Each player must choose whether to get off at Dover or at the intermediate station, Canterbury.  If both choose the same location, Moriarty wins.  If they choose different stations, Holmes wins.  The first number in each box is the payoff to Holmes, who chooses the row.  The second number is the payoff to Moriarty, who chooses the column.  A Nash equilibrium in mixed strategies exists in which each player assigns a probability weight of 0.5 to each of his pure strategies.
          Figure 2.  The trimodal density of equation (3), with the central mode the greatest.

     *We thank Duncan K. Foley, Young Back Choi, Richard Langlois, Larry Samuelson, Kumaraswamy Velupillai, and an anonymous referee for useful comments, and Yaw Nyarko for providing research materials.  Any limits or errors of analysis are our own.

     4If one allows for something less than the complete common knowledge of Holmes and Moriarty, one can model an infinite regress of “beliefs about beliefs” as a hierarchy of beliefs (Mertens and Zamir, 1985; Epstein and Wang, 1996) under certain conditions regarding the state space.  Lipman (1991) shows how to construct nondenumerable hierarchies of beliefs.  But, deciding to believe is itself an arbitrary act.     

     5Nyarko (1997) shows convergence in systems with Bayesian hierarchies of beliefs of the infinite Mertens-Zamir (1985) type under certain conditions.  One of these is a “contraction property” that he notes does not hold for the Keynesian beauty contest case.  If this last property fails, then “anything can happen.”

     6Kalai and Lehrer (1993) present a similar example of nonconvergence to Nash equilibrium with Bayesian updating.  This is a violation of the absolute continuity assumption, which Kalai and Lehrer call the “grain of truth” condition.