DeepMind's IMPALA tells us that transfer learning is starting to work:
…A single reinforcement learning agent with the same parameters solves a multitude of tasks, with the aid of a bunch of computers…
DeepMind has published details on IMPALA, a single reinforcement learning agent that can master a suite of 30 3D-world tasks in 'DeepMind Lab' as well as all 57 Atari games. The agent displays some competency at transfer learning, which means it's able to use knowledge gleaned from solving one task to solve another, increasing the sample efficiency of the algorithm.
The technique: The Importance Weighted Actor-Learner Architecture (IMPALA) scales to multitudes of sub-agents (actors) deployed on thousands of machines, which beam their experiences (sequences of states, actions, and rewards) back to a centralized learner; the learner uses GPUs to compute parameter updates that are fed back to the actors. In the background it does some clever things to correct for the lag between the policy the actors used to generate experience and the learner's current policy, via a new off-policy actor-critic correction algorithm called V-trace. The outcome is an algorithm that can be far more sample efficient and performant than traditional RL algorithms like A2C.
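The core of V-trace is simple enough to sketch: it computes corrected value targets from off-policy trajectories using clipped importance weights. Below is a minimal NumPy version; the function name and argument layout are my own, but the targets follow the paper's definition (v_s = V(x_s) plus discounted, trace-cut temporal-difference corrections, with importance ratios clipped at rho_bar and c_bar):

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory of length T.

    rewards, values, rhos: arrays of length T. rhos are the importance
    weights pi(a|x) / mu(a|x) between the learner's policy pi and the
    actor's (possibly stale) behaviour policy mu.
    bootstrap_value: V(x_T), the value estimate after the last step.
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)   # rho_t in the paper
    cs = np.minimum(c_bar, rhos)               # c_t, the "trace" cutoffs
    values_tp1 = np.append(values[1:], bootstrap_value)
    # Importance-weighted TD errors.
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)

    # Accumulate the corrections backwards through the trajectory.
    acc = 0.0
    vs_minus_v = np.zeros(T)
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v  # the v_s targets
```

When the actors and learner agree (all importance ratios equal 1), the targets reduce to ordinary n-step returns, which is the sanity check the clipping is designed to preserve.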
Datacenter-scale AI training: If you didn't think compute was the strategic determiner of AI research, then read this paper and consider your assumptions: IMPALA can achieve throughput rates of 250,000 frames per second via its large-scale, distributed implementation, which involves 500 CPUs and 1 GPU assigned to each IMPALA agent. Such systems can achieve a throughput of 21 billion frames a day, DeepMind notes.
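The daily figure follows directly from the per-second throughput, as a quick back-of-the-envelope check shows:

```python
frames_per_second = 250_000
seconds_per_day = 24 * 60 * 60          # 86,400 seconds in a day
frames_per_day = frames_per_second * seconds_per_day
print(frames_per_day)                   # 21,600,000,000 -> ~21.6 billion frames/day
```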
Transfer learning: IMPALA agents can be trained on multiple tasks in parallel, attaining a median score across the full Atari-57 suite as high as 59.7% of human performance, roughly comparable to the performance of simple A3C agents trained on a single game each. There's obviously a way to go before IMPALA transfer learning approaches can rival fine-tuned single-environment implementations (which regularly far exceed human performance), but the indications are encouraging. Similarly competitive transfer-learning traits show up when they test it on a suite of 30 environments implemented in DeepMind Lab, the company's Quake-based 3D testing platform.
Why it matters: Big computers are analogous to large telescopes with very fast slew rates, letting researchers probe the outer limits of certain testing regimes while being able to pivot across the entire scientific field of enquiry very rapidly. IMPALA is the sort of algorithm that organizations can design when they're able to tap into large fields of computation during research. "The ability to train agents at this scale directly translates to very quick turnaround for investigating new ideas and opens up unexplored opportunities," DeepMind writes.
MIT Technology Review