Theme

It is conventional to divide reinforcement learning methods into those that are model free and those that are model based. Model-free methods compute their value function and policy directly from experience, whereas model-based methods form a model of the system to be controlled and use it as an intermediate step toward computing their value functions and policies. Model-free methods are conceptually simpler and generally require less computation and memory, whereas model-based methods can be more efficient in terms of the amount of experience needed to achieve a given performance level. By converting their experience into a model, model-based methods are able to retain it and make use of it later, whereas model-free methods use each experience once, when it occurs, and then discard it. Thus, model-free methods tend to be more computationally efficient, whereas model-based methods tend to be more data efficient. There are also numerous architectural issues regarding how model-free and model-based learning should interact.
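
To make this contrast concrete, here is a minimal tabular sketch; the class names (QLearningAgent, DynaQAgent) and all hyperparameters are illustrative assumptions, not taken from the workshop text. The model-free agent uses each real transition once for a Q-learning update; the Dyna-style model-based agent additionally stores the transition in a learned model and replays simulated transitions from that model as planning updates.

```python
# Illustrative tabular sketch of the model-free vs. model-based contrast.
import random
from collections import defaultdict

class QLearningAgent:
    """Model-free: each experienced transition is used once, then discarded."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.Q = defaultdict(float)          # Q-values keyed by (state, action)
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma

    def update(self, s, a, r, s_next):
        best_next = max(self.Q[(s_next, b)] for b in range(self.n_actions))
        td_error = r + self.gamma * best_next - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * td_error

class DynaQAgent(QLearningAgent):
    """Model-based: transitions are also stored in a model and reused for planning."""
    def __init__(self, n_actions, planning_steps=10, **kw):
        super().__init__(n_actions, **kw)
        # (s, a) -> (r, s_next): last observed outcome (assumes near-deterministic dynamics)
        self.model = {}
        self.planning_steps = planning_steps

    def update(self, s, a, r, s_next):
        super().update(s, a, r, s_next)       # direct learning from the real transition
        self.model[(s, a)] = (r, s_next)      # retain the experience in the model
        for _ in range(self.planning_steps):  # planning: replay simulated experience
            (ps, pa), (pr, ps_next) = random.choice(list(self.model.items()))
            super().update(ps, pa, pr, ps_next)
```

The planning_steps parameter controls how much computation is spent reusing stored experience, which is exactly the computation-for-data trade-off described above.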

The most common notion of a model in reinforcement learning is an estimate of the system's transition probabilities and expected rewards---the conventional structures defining a specific Markov decision process. This knowledge can be used by simple planning processes, such as the policy iteration method of dynamic programming, to compute the value function and policy. However, in general, learning and using a model could involve much more than this. In general, the states are not given, but rather their discovery and updating is part of the model-learning process. In general, the dynamics are not limited to single time steps, but may extend over multiple time steps, as in methods for temporal abstraction. In general, the model represents the agent's knowledge of the world, and all the challenges of knowledge representation arise.
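
A sketch of this conventional notion of a model, under the assumption of a small tabular MDP with known state and action sets, is given below. The function names are illustrative, and value iteration stands in for the policy-iteration planner mentioned above purely for brevity.

```python
# Illustrative sketch: maximum-likelihood model estimation plus dynamic-programming planning.
import numpy as np

def estimate_model(transitions, n_states, n_actions):
    """Estimate P(s'|s,a) and R(s,a) from a list of (s, a, r, s_next) tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sums = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sums[s, a] += r
    visits = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs default to a uniform next-state distribution and zero reward.
    P = np.divide(counts, visits,
                  out=np.full_like(counts, 1.0 / n_states), where=visits > 0)
    R = np.divide(reward_sums, visits[:, :, 0],
                  out=np.zeros_like(reward_sums), where=visits[:, :, 0] > 0)
    return P, R

def plan(P, R, gamma=0.95, iters=500):
    """Value iteration on the estimated MDP; returns the value function and greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * P @ V     # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

Unvisited state-action pairs are given a uniform next-state distribution here; how to plan sensibly with such uncertain, partially learned models is itself one of the workshop topics.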

Model-based methods are where reinforcement learning starts to touch the larger ambitions of artificial intelligence. The model corresponds roughly to knowledge about the world, and computing the value function corresponds to heuristic search, planning, and reasoning. Reinforcement-learning planning methods are appealing because they are more general and systematic than classical planning methods: they handle stochastic systems and are domain independent. The reinforcement-learning approach is also lower-level, closer to the data, which suggests that it may be more amenable to using modern machine learning methods to learn the model. In classical artificial intelligence one often takes a higher-level approach, closer to human-level reasoning. One goal of the workshop would be to begin to bridge the gap between lower-level reinforcement-learning and higher-level artificial-intelligence approaches so as to gain the advantages of both. Another would be to see what we can learn from natural learning systems that bears on these questions.

Some of the topics we expect to discuss at this year’s workshop are:

- Planning with Learned and Uncertain Models
- Representing World State and Dynamics
- Model Learning
- Representation Learning
- Discovery of the Structure of Models
- Function Approximation in Model-based RL
- Architectural Issues in Model-based RL
- Roles of Models in Exploration
- Inherent Limitations of Model-based Methods
- Computational Efficiency of Planning
- Hierarchical Models
- Integrating Model-based and Model-free Methods
- Novel Techniques for Model Construction
- Off-Policy Learning
- Model-Based Learning in Natural Systems
- Sub-Goals in RL Systems
- Goal-Directed vs. Habitual Learning