"The AIXI model could behave optimally in any computable but unknown environment with reinforcement feedback." "AIXI is Pareto optimal in the sense that there is no other policy yielding higher or equal value in all environments and a strictly higher value in at least one." AIXI is suitable as an agent for at least four problem classes: sequence prediction (SP), strategic games (SG), function minimization (FM) and supervised learning from examples (EX).
Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence.

Standard sequential decision theory requires the environment's "true ... prior probability distribution" to be known. This is, of course, completely unrealistic for AI-style problems. It's a bit like asking a blind robot to walk across an obstacle course. The robot would have to have an internal map not only of this obstacle course, but of every obstacle course that exists or ever will exist - a "true ... prior probability distribution" describing its environment.
Instead, AIXI leverages Solomonoff's theory of induction (see Part 9) to construct a prior on the spot for any computable environment. This is like allowing a robot to scan an obstacle course before walking across it, but it's more profound than just this. Even if the obstacle course includes movements that are extremely irregular (but still computable), the robot would be able to formulate a plan of action. The key to Solomonoff's theory is that it is "parameter-free": we do not need to teach it how to understand its environment; it will automatically find the best model for any computable environment.
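To make the flavor of this concrete, here is a deliberately tiny sketch in Python. Everything in it - the four hand-picked hypotheses, their "code lengths", the bit-sequence interface - is invented for illustration; genuine Solomonoff induction mixes over every program of a universal Turing machine and is uncomputable. The only idea being illustrated is the 2^-length weighting of models consistent with the data:

```python
from fractions import Fraction

# Toy "environment programs": (name, code length in bits, deterministic rule
# giving the next bit from the history seen so far). The 2^-length factor is
# the Occam prior; shorter hypotheses get exponentially more weight.
HYPOTHESES = [
    ("all zeros",   2, lambda h: 0),
    ("all ones",    2, lambda h: 1),
    ("alternating", 4, lambda h: len(h) % 2),
    ("period 3",    6, lambda h: 1 if len(h) % 3 == 2 else 0),
]

def prob_next_bit_is_one(history):
    """P(next bit = 1) under the mixture of all hypotheses that are consistent
    with the observed history, each weighted by 2^-(code length)."""
    consistent = [
        (length, rule) for _, length, rule in HYPOTHESES
        if all(rule(history[:i]) == bit for i, bit in enumerate(history))
    ]
    total = sum(Fraction(1, 2 ** length) for length, _ in consistent)
    ones = sum(Fraction(1, 2 ** length)
               for length, rule in consistent if rule(history) == 1)
    return ones / total if total else Fraction(1, 2)

# After 0,1,0,1 only the "alternating" hypothesis survives, so the mixture
# assigns probability 0 to the next bit being 1 (i.e. it predicts a 0).
print(prob_next_bit_is_one([0, 1, 0, 1]))
```

Note that nothing problem-specific was tuned here: the same weighting scheme applies whatever the (computable) data source turns out to be, which is what "parameter-free" means in practice.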
We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible... The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time t and length l bounded agent. The computation time of AIXItl is of the order t·2^l.
AIXI [is] a parameter-free optimal reinforcement learning agent embedded in an arbitrary unknown environment.

AIXI itself is uncomputable, but there exists a computable approximation to AIXI (AIXItl). It is optimal (its optimality has been demonstrated in several different problem classes relevant to decision-making agents), though its optimality is not as strong as that of Solomonoff's theory of induction.[4] It utilizes reinforcement learning: AIXI receives a reward signal from its environment and chooses its actions to maximize that reward. AIXI is an agent, which means it makes decisions. AIXI operates in arbitrary unknown environments because it does not require a "true ... prior probability distribution" of the environment. The environment must be computable, however.
The science of Artificial Intelligence (AI) may be defined as the construction of intelligent systems and their analysis. A natural definition of a system is anything that has an input and an output stream. Intelligence is more complicated. It can have many faces like creativity, solving problems, pattern recognition, classification, learning, induction, deduction, building analogies, optimization, surviving in an environment, language processing, knowledge and many more. A formal definition incorporating every aspect of intelligence, however, seems difficult. Most, if not all known facets of intelligence can be formulated as goal-driven or, more precisely, as maximizing some utility function. It is, therefore, sufficient to study goal-driven AI; e.g. the (biological) goal of animals and humans is to survive and spread. The goal of AI systems should be to be useful to humans. The problem is that, except for special cases, we know neither the utility function nor the environment in which the agent will operate in advance. The mathematical theory, coined AIXI, is supposed to solve these problems.

AIXI chooses a paradigm of intelligence that is well suited to an agent - goal-based decision-making. While this does not eliminate all conceptual disputes surrounding AIXI's universality, it does unify the vast majority of problems that are typically considered to be part of the field of artificial intelligence. This simplified framework allows the power of Solomonoff's theory of sequence prediction to be applied to maximum advantage to a wide array of problem classes.
Assume the availability of unlimited computational resources. The first important observation is that this does not make the AI problem trivial. Playing chess optimally or solving NP-complete problems become trivial, but driving a car or surviving in nature don’t. This is because it is a challenge itself to well-define the latter problems, not to mention presenting an algorithm. In other words: The AI problem has not yet been well defined. One may view AIXI as a suggestion for such a mathematical definition of AI.
AIXI is a universal theory of sequential decision making akin to Solomonoff’s celebrated universal theory of induction. Solomonoff derived an optimal way of predicting future data, given previous perceptions, provided the data is sampled from a computable probability distribution. AIXI extends this approach to an optimal decision making agent embedded in an unknown environment. The main idea is to replace the unknown environmental distribution µ in the Bellman equations by a suitably generalized universal Solomonoff distribution ξ. The state space is the space of complete histories.

That the state space consists of complete histories is an important point - AIXI does not merely optimize the next decision in sequence; it optimizes over all decisions in the decision tree.
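In the notation of the passage quoted from [2] just below (this is a sketch; the careful treatment involves semimeasures and chronological Turing machines), the generalized distribution ξ assigns to an interaction history the total Occam-weighted mass of the environment programs that could have produced it, given the agent's actions:

$$\xi(o_1 r_1 \ldots o_n r_n \mid a_1 \ldots a_n) \;=\; \sum_{q \,:\, U(q,\, a_1 \ldots a_n) \,=\, o_1 r_1 \ldots o_n r_n} 2^{-\ell(q)}$$

Planning then proceeds exactly as in standard sequential decision theory, except that ξ stands in for the unknown true distribution µ.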
AIXI is a universal theory without adjustable parameters, making no assumptions about the environment except that it is sampled from a computable distribution. From an algorithmic complexity perspective, the AIXI model generalizes optimal passive universal induction to the case of active agents. From a decision-theoretic perspective, AIXI is a suggestion of a new (implicit) “learning” algorithm, which may overcome all (except computational) problems of previous reinforcement learning algorithms.

The AIXI algorithm can be written in one line. The following explanation is quoted from [2]: Let U(q, a_1 a_2 ... a_n) denote the output of a universal Turing machine U supplied with program q [a model of the environment] and input a_1 a_2 ... a_n, m ∈ N a finite lookahead horizon, and ℓ(q) the length in bits of program q. The action picked by AIXI at time t, having executed actions a_1 a_2 ... a_{t−1} and having received the sequence of observation-reward pairs o_1 r_1 o_2 r_2 ... o_{t−1} r_{t−1} from the environment, is given by:
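$$a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_{t+m}} \sum_{o_{t+m} r_{t+m}} \big[\, r_t + \cdots + r_{t+m} \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_{t+m}) \,=\, o_1 r_1 \ldots o_{t+m} r_{t+m}} 2^{-\ell(q)}$$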
Intuitively, the agent considers the sum of the total reward over all possible futures up to m steps ahead, weighs each of them by the complexity of programs consistent with the agent’s past that can generate that future, and then picks the action that maximizes expected future rewards. [This equation] embodies in one line the major ideas of Bayes, Ockham, Epicurus, Turing, von Neumann, Bellman, Kolmogorov, and Solomonoff. The AIXI agent is rigorously shown ... to be optimal in many different senses of the word. In particular, the AIXI agent will rapidly learn an accurate model of the environment and proceed to act optimally to achieve its goal. [end quote]
To recapitulate, the AIXI algorithm operates in an environment, which is modeled by the program q. Its real or hypothetical actions are the a's, its observations of the environment's state are the o's, and the rewards are the r's. The algorithm chooses the action that maximizes expected reward, summed over a future decision history m steps deep. The term 2^{-ℓ(q)} automatically favors the most parsimonious models of the observed environment: shorter programs q receive exponentially larger weight in the sum. The o's and r's are output in sequence by q. AIXI is, in effect, a decision-tree search over the possible futures generated by candidate environments q.
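As a concrete (and heavily simplified) illustration of this search, here is a toy expectimax in Python. The two candidate environment programs, their "code lengths", and the two-action interface are all invented; real AIXI mixes over every program of a universal Turing machine and is uncomputable. What the sketch does preserve is the alternating max-over-actions / sum-over-percepts structure of the equation above, with the 2^-length weights of history-consistent programs supplying the probabilities:

```python
from fractions import Fraction

ACTIONS = ("left", "right")

# Toy "environment programs" q: each deterministically maps the actions taken so
# far to the (observation, reward) pair it emits next. Lengths are invented.
def env_always_right(actions):
    return ("obs", 1 if actions[-1] == "right" else 0)   # pays for moving right

def env_alternate(actions):
    ok = len(actions) < 2 or actions[-1] != actions[-2]   # pays for alternating
    return ("obs", 1 if ok else 0)

MODELS = [(env_always_right, 3), (env_alternate, 5)]      # (program, bits)

def consistent_weight(actions, percepts):
    """Total 2^-length weight of programs reproducing the percepts seen so far:
    the toy analogue of the sum over q with U(q, a_1..a_t) = o_1 r_1 .. o_t r_t."""
    return sum(Fraction(1, 2 ** length)
               for model, length in MODELS
               if all(model(actions[:i + 1]) == p for i, p in enumerate(percepts)))

def q_value(a, actions, percepts, horizon):
    """Mixture-expected (immediate reward + future value) of taking action a now."""
    w_now = consistent_weight(actions, percepts)
    total = Fraction(0)
    for percept in {model(actions + [a]) for model, _ in MODELS}:
        w = consistent_weight(actions + [a], percepts + [percept])
        if w == 0:
            continue                      # no surviving program emits this percept
        _, r = percept
        total += (w / w_now) * (r + value(actions + [a],
                                          percepts + [percept], horizon - 1))
    return total

def value(actions, percepts, horizon):
    """Max over the next action of q_value: the expectimax recursion."""
    if horizon == 0:
        return Fraction(0)
    return max(q_value(a, actions, percepts, horizon) for a in ACTIONS)

def best_action(actions, percepts, horizon=3):
    return max(ACTIONS, key=lambda a: q_value(a, actions, percepts, horizon))

# With an empty history both programs are still live, but the shorter
# "always right" program dominates the mixture, so the first move is "right".
print(best_action([], []))
```

As the agent accumulates percepts, programs inconsistent with the history drop out of the mixture, which is the sense in which AIXI "rapidly learns an accurate model of the environment" before acting on it.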
From the preface of Machine Super-Intelligence[3] (MSI):
Hutter was able to prove that the behaviour of universal agents converges to optimal in any setting where this is at all possible for a general agent, and that these agents are Pareto optimal in the sense that no agent can perform as well in all environments and strictly better in at least one. These are the strongest known results for a completely general purpose agent. Given that AIXI has such generality and extreme performance characteristics, it can be considered to be a theoretical model of a super intelligent agent.
From MSI, p. 126ff:
Are super intelligent machines possible?
Many people outside of the field are deeply sceptical about the idea that machines, mere physical objects of our construction, could ever have anything resembling real intelligence: machines can only ever be strictly logical; they cannot do anything they were not programed to do; and they certainly could not be superior to their own creator — that would be a paradox! However, as anybody working in the field knows, these common beliefs are baseless myths. Artificial intelligence algorithms regularly find solutions to problems using heuristics and forms of reasoning that are not strictly logical. They discover powerful new designs for problems that the system’s programmers had never thought of (Koza et al., 2003). They also learn to play games such as chess (Hsu et al., 1995) and backgammon (Tesauro, 1995) at levels superior to that of any human, let alone the researchers who designed and created the system. Indeed, in the case of checkers, computers are now literally unbeatable as they can play a provably perfect game (Schaeffer et al., 2007).
The persistence of these beliefs seems to be due to a number of things. One is that algorithms from artificial intelligence are not consumer products: they are hidden in the magic of sophisticated technology. For example, when hand writing the address on a card most people do not know that it will likely be read by a computer rather than a human at the sorting office. People do not think about the learning algorithms that are monitoring their credit card transactions looking for fraud, filtering spam to their email address, automatically trading their retirement savings on international markets, monitoring their behaviour on the internet in order to decide which ads should appear on web pages they view, or even just the vision processing algorithms that graded the apples at the supermarket. The steady progress that artificial intelligence algorithms are making is out of sight, and thus generally out of mind.

Another common objection is that we humans have something mysterious and special that makes us tick, something that machines, by definition, do not have. Perhaps some type of non-physical consciousness or feelings, qualia, or ‘quantum field’ etc. Of course it is impossible to rule out mysterious possibilities until an intelligent machine has been constructed without needing anything particularly mysterious. Nonetheless, we should view such objections for what they are: a form of vitalism. Throughout history, whenever science could not explain some unusual phenomenon, many people readily assumed that God or magic was at work. Even distinguished scientists have fallen into this, only to be embarrassed once more sceptical and curious scientists worked out what was actually going on. Things ranging from the motion of whole galaxies to the behaviour of sub-atomic particles are now known to follow extremely precise physical laws. To conjecture that our brains are somehow special and different in some strange way is to speculate based on nothing but our own feelings of specialness.
If the human brain is merely a ‘meat machine’, as some have put it, it is certainly not the most powerful intelligence possible. To start with, there is the issue of scale: a typical adult human brain weighs about 1.4 kg and consumes just 25 watts of power (Kandel et al., 2000). This is ideal for a mobile intelligence, however an artificial intelligence need not be mobile and thus could be orders of magnitude larger and more energy intensive. At present a large supercomputer can fill a room twice the size of a basketball court and consume 10 megawatts of power. With a few billion dollars much larger machines could be built. Google, for example, is currently constructing a data centre next to a power station in Oregon that will cover two football fields and have cooling towers four stories high (Markoff and Hansell, 2005). [Note: Today, this data-center exists.] Biology never had the option of building brains on such an enormous scale.
Another point is that brains use fairly large and slow components. Consider one of the simpler of these, axons: essentially the wiring of the nervous system. These are typically around 1 micrometre wide, carry spike signals at up to 75 metres per second at a frequency of at most a few hundred hertz (Kandel et al., 2000). Compare these characteristics with those of a wire that carries signals on a microchip. Currently these are 45 nanometres wide, propagate signals at 300 million metres per second and can easily operate at 4 billion hertz. Some might debate whether an electrochemical spike travelling down an axon is so directly comparable to an electrical pulse travelling down a wire, however it is well established that at least the primary role of an axon is simply to carry this information. Given that present day technology produces wires which are 20 times thinner, propagate signals 4 million times faster and operate at 20 million times the frequency, it is hard to believe that the performance of axons could not be improved by at least a few orders of magnitude.
Of course, the above assumes that the brain’s design is what we should replicate. Perhaps the brain’s algorithm is close to optimal for some things, but it certainly is not optimal for all problems. Even the most outstanding savants cannot store information anywhere near as quickly, accurately and in the quantities that are possible for a computer. Also savants’ impressive ability to perform fast mental calculations is insignificant next to even a basic calculator. Brains are poorly designed for such feats. A machine, however, would have no such limitations: it could employ a range of specialised algorithms for different types of problems. Concepts like education become obsolete when knowledge and understanding can simply be copied from one intelligent machine to another. It is easy to think up many more advantages.
Most likely improvements over brains are possible in algorithms, hardware and scale. This is not to take away from the amazing system that the brain is, something that we are still unable to match in many ways. All we wish to point out is that if the brain is essentially just a machine, which appears to be the case, then it certainly is not the most intelligent machine that could exist. This idea is reasonable once you think about it: machines can easily carry more, fly higher, move faster and see further than even the most able animals in each of these categories. Why would human intelligence be any different? Of course, just because systems with greater than human intelligence are possible in principle, this does not mean that we will be able to build one. Designing and constructing such an advanced machine could be beyond our capabilities.
One serious criticism of AIXI is its reliance on a reward function supplied as part of the framework, and on the assumption that total rewards are bounded. This objection is addressed in MSI, p. 88:
[Objection:] Assuming that environments return bounded sum rewards is unrealistic.
If an environment µ is an artificial game, like chess, then it seems fairly natural for µ to meet any requirements in its definition, such as having a bounded reward sum. However, if we think of the environment µ as being the universe in which the agent lives, then it seems unreasonable to expect that it should be required to respect such a bound.
Strictly speaking, reward is an interpretation of the state of the environment. In this case the environment is the universe, and clearly the universe does not have any notion of reward for particular agents. In humans this interpretation is internal, for example, the pain that is experienced when you touch something hot. In this case, should it be a part of the agent rather than the environment? If we gave the agent complete control over rewards then our framework would become meaningless: the perfect agent could simply give itself constant maximum reward. Perhaps the analogous situation for humans would be taking the “perfect” drug.
A more accurate framework would consist of an agent, an environment and a separate goal system that interpreted the state of the environment and rewarded the agent appropriately. In such a set up the bounded rewards restriction would be a part of the goal system and thus the above problem would not occur. However, for our current purposes, it is sufficient just to fold this goal mechanism into the environment and add an easily implemented constraint to how the environment may generate rewards. One simple way to bound an environment’s total rewards would be to use geometric discounting...
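To see why geometric discounting does the job: if every instantaneous reward is bounded, say 0 ≤ r_k ≤ r_max, and the reward received k steps in the future is scaled by γ^k for some discount rate 0 < γ < 1, then the total discounted reward of any environment is bounded by a convergent geometric series:

$$\sum_{k=0}^{\infty} \gamma^{k} r_k \;\le\; r_{\max} \sum_{k=0}^{\infty} \gamma^{k} \;=\; \frac{r_{\max}}{1-\gamma}$$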
Conclusion
By giving a rigorous, fully formalized framework for general-purpose intelligent agents, AIXI has opened a new frontier in artificial intelligence. It is not an exaggeration to say that the theory of artificial general intelligence (AGI) is complete; not in the sense that it cannot be extended, amplified and elucidated, but in the sense that there exists a sufficiently worked-out theoretical foundation for work on practical implementations to begin. In short, the conventional wisdom that narrow AI (special-purpose AI) is making rapid strides, while AGI is still an unsolved or open problem, is simply incorrect.

Universal search (which we covered in Part 11) is like a universal optimizing compiler - it will take any program (even if it is not optimized) and run that program to within a factor of 5 of its optimal speed. Solomonoff sequence prediction (which is based on algorithmic inductive inference, see Part 9) is like a Delphic oracle: it does nothing in the world until you query it - it is pure knowing. AIXI can be thought of as Solomonoff sequence prediction, plus choice. AIXI incorporates a capacity for acting, that is, making decisions. All three of these mechanisms are, in a sense, theoretical summa. It is not possible to build an algorithm that is better than optimal for all well-defined problems. It is not possible to build an inductive inference system that is better than optimal for all inferential environments. It is not possible to build a decision-making agent that is better than optimal for all computable environments across a comprehensive swath of problem classes. These are truly powerful tools; the toolbox for thinking that we are assembling is starting to look more like an arsenal.
Next: Part 13, The Halting Probability
---
1. Link to report [PDF]
2. A Monte-Carlo AIXI Approximation [PDF], Veness, Ng, Hutter, Uther & Silver
3. Machine Super-Intelligence [PDF], Shane Legg
4. The optimality of AIXI is not as strong as that of Solomonoff sequence prediction, but this appears to be primarily due to the inherent impossibility of optimality in certain categories of decision theory.