Thursday, August 3, 2017

Notes on a Cosmology - Part 11, Universal Search

This post centers on a paper by Marcus Hutter titled The Fastest and Shortest Algorithm for All Well-Defined Problems[1]. When I first read the title of this paper circa 2005, I thought to myself, "That's impossible, and it doesn't even make sense. I bet this is one of those papers focused on some arcane problem no one cares about, but with an attention-getting title so people will read it." I was mistaken - the paper sets forth exactly what the title claims: an algorithm that is the fastest and shortest for all well-defined problems. It is one of a class of algorithms called universal search or universal sequential search.

Levin search

Arguably, the most important open problem in computer science is the question P=NP? This question has deep technical implications, but it can be stated simply. The P=NP question asks: if you have an algorithm that verifies solutions to problems in a problem class C in polynomial time[2], does there also exist an algorithm that finds solutions to problems in class C in polynomial time? Say we have a polynomial-time function f(x)=y, so that a proposed solution x to an instance y can be checked just by computing f(x) and comparing it to y, but we do not have a polynomial-time version of the inverse function g(y)=x. Does the existence of a polynomial-time f() prove the existence of a polynomial-time g()? Stated another way: if we have a function that can check a solution to a problem in class C quickly, does this imply the existence of a function that can find a solution to a problem in class C quickly?
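
To make the asymmetry concrete, here is a minimal Python sketch (my own illustration, not from Hutter's paper) using subset-sum: checking a proposed subset takes time polynomial in the input size, while the obvious way of finding one enumerates exponentially many candidates.

    from itertools import combinations

    def verify(numbers, subset, target):
        # The "checking" direction: polynomial time in the input size.
        return set(subset) <= set(numbers) and sum(subset) == target

    def find(numbers, target):
        # The "finding" direction: brute force over all 2^n subsets.
        for r in range(len(numbers) + 1):
            for subset in combinations(numbers, r):
                if sum(subset) == target:
                    return subset
        return None

    nums = [3, 34, 4, 12, 5, 2]
    solution = find(nums, 9)                    # slow: exponential search
    print(solution, verify(nums, solution, 9))  # fast: (4, 5) True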

The canonical example is prime factorization. It is fast to check a factorization - simply give me the factors and the product they are supposed to multiply out to; I multiply the factors (a polynomial-time operation) and then compare the result. But suppose I give you a huge number and ask you for its factors (or its prime factorization). This is much harder, even though factorization is just the inverse of multiplication. If P=NP, there must exist a polynomial-time factorization algorithm, because multiplication is polynomial-time. Prime factorization is not the only example of this kind of problem - many of the most interesting problems are easy to solve in one direction but difficult to solve in the other direction.
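
In code, the two directions of the factoring example look like this (a toy sketch; trial division stands in for the hard direction - real factoring algorithms are far cleverer, but no known one runs in polynomial time):

    def check_factorization(n, factors):
        # Verifying is cheap: multiply the factors and compare.
        product = 1
        for f in factors:
            product *= f
        return product == n

    def factorize(n):
        # Finding is expensive: trial division takes roughly sqrt(n)
        # steps, which is exponential in the bit-length of n.
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factorize(2017 * 2027))                           # [2017, 2027]
    print(check_factorization(2017 * 2027, [2017, 2027]))   # True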

Leonid Levin sought to tackle the problem by building a general-purpose function-inversion algorithm. If such an algorithm can be found, and if it can run in polynomial time, it can be used to prove P=NP. Conversely, if it can be proved that no such algorithm can exist, this might be usable in a proof that P≠NP. Levin search is an algorithm that searches for function inversions, but it has not settled the P=NP? question. Given a polynomial-time function f that maps x→y, the inverse function of f, f⁻¹, is defined as the function that maps y→x, that is, f⁻¹(f(x)) = x. Levin search finds f⁻¹ given f, for any polynomial-time function f. The generalization of Levin search to the inversion of functions that run in any complexity class is called universal search.
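
A toy rendering of the idea in Python may help (my own sketch, not Levin's formal construction, which enumerates all programs of a universal machine): candidate programs are dovetailed in phases so that a program of length l receives a share of the total time proportional to 2^-l, which is what yields optimality up to a multiplicative constant.

    def levin_search(programs, f, y):
        # programs: list of (length, program_factory) pairs, where each
        # factory returns a fresh generator yielding candidate inverses x.
        # In phase k, each program of length l gets up to 2^(k - l) steps,
        # so its share of the total time is about 2^-l.
        for phase in range(1, 64):
            for length, make_program in programs:
                if length > phase:
                    continue
                program = make_program()          # restart each phase
                for _ in range(2 ** (phase - length)):
                    try:
                        x = next(program)
                    except StopIteration:
                        break
                    if f(x) == y:                 # verify with the fast direction
                        return x
        return None

    # Hypothetical usage: invert f(x) = x*x with a single trivial
    # "program" that just counts upward through the naturals.
    def counter():
        i = 0
        while True:
            yield i
            i += 1

    print(levin_search([(3, counter)], lambda x: x * x, 169))   # 13

Restarting each program at every phase wastes only a constant factor of work, which is why the sketch can afford to do it.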

Universal search

Hutter's universal search is an optimal algorithm for solving all well-defined problems[3] - not just polynomial-time problems and not just function-inversion problems. His algorithm - which he calls Mp* - "accelerates the computation of a program p. Mp* combines (A) sequential search through proof space, (B) Levin search through time-bound space, and (C) sequential program execution, using a somewhat tricky scheduling. Under certain provability constraints, Mp* is the asymptotically fastest algorithm for computing p apart from a factor 5 in computation time." Hutter's universal search does not rely on program complexity, although it can be reformulated this way by adding the time-complexity of a program to that program's Kolmogorov complexity.
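
Without attempting Hutter's actual construction, the flavor of the scheduling can be suggested with a toy time-sharer (the shares below are placeholders of my own choosing, not the constants from the paper): step three processes in fixed proportions, and let the execution slot return its answer when one is found.

    def interleave(proof_search, bound_search, execute, shares=(0.01, 0.01, 0.98)):
        # Toy scheduler only: three generators stepped in fixed
        # proportions (hypothetical shares, not Hutter's constants).
        # 'execute' yields None while working and its answer when done.
        processes = [proof_search, bound_search, execute]
        counts = [0, 0, 0]
        total = 0
        while True:
            total += 1
            # Step whichever process is furthest behind its target share.
            i = min(range(3), key=lambda j: counts[j] - shares[j] * total)
            counts[i] += 1
            result = next(processes[i])
            if i == 2 and result is not None:
                return result

    # Hypothetical usage with stub processes:
    def busy():                   # stands in for proof/bound search
        while True:
            yield None

    def answer_after(n):          # stands in for program execution
        for _ in range(n):
            yield None
        while True:
            yield 42

    print(interleave(busy(), busy(), answer_after(100)))   # 42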

It can be argued that many of the recent advances in Deep Learning fall precisely into this category - computable approximation of highly non-linear functions[4]. Neural nets have lately been used to great effect for function inversion and optimization problems. However, they are just one in a large class of architectures that can solve these problems. Until they are incorporated into a systematic search approach, their applicability is purely ad hoc. This is not to downplay the importance of these systems, only to say that they are not universal and, thus, strictly less powerful than universal systems.

There are two drawbacks to Hutter's algorithm:
  • Large constants
  • Reliance on provable equivalence of functions, which is an uncomputable problem
The problem of large constants is tractable and has been worked around in other areas. The problem of proving equivalence of functions might be manageable through the use of computable approximations to this uncomputable problem, as the sketch below illustrates.
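
For instance (my own sketch, in the spirit of footnote 4 below), randomized testing gives a computable stand-in for the equivalence test: sample both functions at many points and treat sustained agreement as probabilistic evidence of equivalence.

    import random

    def probably_equal(f, g, trials=10_000, lo=-10**6, hi=10**6):
        # Computable stand-in for the uncomputable equivalence test:
        # agreement on random samples yields confidence, never proof.
        for _ in range(trials):
            x = random.randint(lo, hi)
            if f(x) != g(x):
                return False      # a counterexample is a genuine disproof
        return True               # merely evidence of equivalence

    print(probably_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1))  # True
    print(probably_equal(lambda x: 2 * x, lambda x: x + x if x >= 0 else 0))    # False (almost surely)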

There are several variations of universal search algorithms; see this Scholarpedia page for more information. The key importance of universal search is that it shows that our intuitions about the long-range behavior of computational systems are easily misguided. It also shows how structured approaches to solving problems can enable us to offload the dirty work of optimization onto the machine. The importance of the clever programmer for optimizing algorithms is not as great as intuition would suggest. Hutter's search shows that, in principle, mechanical methods are guaranteed to match the performance of anything the clever programmer produces, asymptotically and up to a factor of five.

Next: Part 12, AIXI

---

1. Link to arXiv entry

2. A "polynomial-time" function is a function whose running time is bounded above by some polynomial function of the size of its input. For example, if some function f(x) has quadratic time-complexity, we say that it is O(|x|2) - that is, its running-time is bounded above by |x|2; since x2 is a polynomial function, f(x) is in the polynomial time-complexity class, P. Note that |x| in this context means "the encoded length of the value x", not absolute-value.

3. Any problem that has a numerical solution, for example, is a well-defined problem. The term originates in the philosophy of mathematics and is much broader than just this, but there are still some disputes about its precise meaning.

4. Many applications of Deep Learning implement non-linear regression of unknown empirical functions. But there is nothing stopping us from sampling an analytic function as though it were an empirical function in order to generate a confidence rating in its equivalence to another analytic function. Such a confidence rating is not tantamount to a proof, but computable approximations to uncomputable problems always involve tolerating uncertainty.
