Wednesday, October 18, 2017

AlphaGo Zero

DeepMind has just released a paper describing AlphaGo Zero, a version of AlphaGo that has taught itself to play Go better than any human player and any prior version of AlphaGo, based solely on self-play. It did not use historical human games, or any games other than its own training games. The training rate is startlingly rapid. The primary change was a more efficient neural-network design, which lets the system learn far faster through self-training.
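For readers curious what "based solely on self-play" means mechanically, below is a minimal, tabular Python sketch of the training loop the paper describes: search-guided self-play produces (state, search policy, outcome) targets, and the network is then trained toward those targets. Everything here is illustrative rather than DeepMind's code; tic-tac-toe stands in for Go, a lookup table stands in for the deep network, and all names (`mcts_policy`, `self_play`, `train`) are hypothetical.

```python
import math
import random
from collections import defaultdict

# Tic-tac-toe board: tuple of 9 cells, 0 = empty, +1 / -1 = the two players.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(s):
    """+1 or -1 if someone has won, 0 for a draw, None if still in play."""
    for a, b, c in LINES:
        if s[a] != 0 and s[a] == s[b] == s[c]:
            return s[a]
    return 0 if all(s) else None

def legal(s):
    return [i for i in range(9) if s[i] == 0]

def play(s, move, player):
    t = list(s); t[move] = player
    return tuple(t)

# Tabular stand-in for the policy/value network: state -> (priors, value),
# with the value taken from the perspective of the player to move.
net = defaultdict(lambda: ([1 / 9] * 9, 0.0))

def mcts_policy(root, to_move, sims=50, c_puct=1.0):
    """PUCT search from `root`; returns the visit-count move distribution."""
    N = defaultdict(lambda: [0] * 9)    # per-edge visit counts
    W = defaultdict(lambda: [0.0] * 9)  # per-edge total value
    tree = set()                        # states already expanded
    for _ in range(sims):
        s, p, path = root, to_move, []
        # Selection: descend with the PUCT rule while inside the tree.
        while winner(s) is None and s in tree:
            priors, _ = net[s]
            total = sum(N[s])
            a = max(legal(s), key=lambda m:
                    (W[s][m] / N[s][m] if N[s][m] else 0.0)
                    + c_puct * priors[m] * math.sqrt(total) / (1 + N[s][m]))
            path.append((s, a))
            s, p = play(s, a, p), -p
        # Evaluation: true result at terminals, the table's guess at leaves.
        w = winner(s)
        if w is None:
            tree.add(s)                 # expand the leaf
            v = net[s][1]
        else:
            v = 0.0 if w == 0 else (1.0 if w == p else -1.0)
        # Backup: players alternate, so flip the sign at every step up.
        for sp, a in reversed(path):
            v = -v
            N[sp][a] += 1
            W[sp][a] += v
    total = sum(N[root]) or 1
    return [n / total for n in N[root]]

def self_play():
    """Play one game against itself; return (state, policy, outcome) targets."""
    s, p, history = (0,) * 9, 1, []
    while winner(s) is None:
        pi = mcts_policy(s, p)
        history.append((s, pi, p))
        a = random.choices(range(9), weights=pi)[0]
        s, p = play(s, a, p), -p
    z = winner(s)
    return [(st, pi, 0.0 if z == 0 else (1.0 if z == pl else -1.0))
            for st, pi, pl in history]

def train(examples, lr=0.1):
    """Tabular stand-in for a gradient step toward the self-play targets."""
    for s, pi, z in examples:
        priors, value = net[s]
        net[s] = ([q + lr * (t - q) for q, t in zip(priors, pi)],
                  value + lr * (z - value))

for iteration in range(50):             # the outer self-improvement loop
    examples = []
    for _ in range(4):
        examples += self_play()
    train(examples)

print("learned value of the empty board:", round(net[(0,) * 9][1], 3))
```

The real system replaces the lookup table with a deep residual network trained by gradient descent, and scales the search depth and game count up by many orders of magnitude, but the self-improvement loop has this shape.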

While these results are astounding, it is still hard to see how we "get from here to there" with the DeepMind approach. Human intelligence may well consist of a patchwork of special-purpose modules interacting in a nebulous ether of neuronal connections, but the essence of general-purpose goal-achievement (or problem-solving) is not contained within an AlphaGo-style search tree.

In technical terms, AlphaGo Zero's (AGZ) self-training is reinforcement learning from self-play; because it requires no human-labeled data, it is often loosely described as unsupervised learning (UL), the kind of learning humans undergo in childhood while learning to walk, speak, stack blocks, throw a ball, and so on. Yet AGZ's self-training looks nothing like natural UL, while natural UL looks very similar across many otherwise very different species. In short, AGZ has taught itself to play Go, but it does not seem that the same framework can automatically translate to anything other than Go without a significant investment of hand-coded, domain-specific "glue code." AGZ exhibits no playfulness, curiosity, or "anti-fragility" (resilience), qualities that are arguably present in all natural organisms capable of UL.

Hats off to the DeepMind team for an amazing achievement. I'm still waiting for someone to pick up the challenge of building an AGI based on a computable approximation of AIXI.
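For context, here is the (incomputable) ideal being referenced, as defined in Hutter's Universal Artificial Intelligence. At each step t, AIXI picks the action that maximizes expected reward out to horizon m, where the expectation is taken over a Solomonoff-style mixture of all programs q (run on a universal Turing machine U) that are consistent with the interaction history of actions a, observations o, and rewards r:

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \bigl[\, r_t + \cdots + r_m \,\bigr]
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here \ell(q) is the length of program q, so shorter (simpler) world-models dominate the mixture. The expression is incomputable because the inner sum ranges over all programs, which is exactly why the open challenge is a computable approximation; published attempts along these lines include Hutter's AIXItl and the Monte-Carlo approximation MC-AIXI-CTW of Veness et al.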
