Publications

Research papers, conference proceedings, and academic publications in machine learning, reinforcement learning, and artificial intelligence.

Publication Summary

7 peer-reviewed publications in top conferences and journals.

Research Areas

Reinforcement Learning, Machine Learning, Temporal-Difference Learning, Predictive Representations, Artificial Intelligence, Computational Models, Knowledge Representation, Temporal Abstraction

2008

Learning to Generalize through Predictive Representations: A Computational Model of Mediated Conditioning

Authors: Ludvig, E. A., & Koop, A.
Venue: From Animals to Animats 10: Proceedings of Simulation of Adaptive Behavior 2008
Pages: 342-351
Abstract:

Learning when and how to generalize knowledge from past experience to novel circumstances is a challenging problem many agents face. In animals, this generalization can be caused by mediated conditioning—when two stimuli gain a relationship through the mediation of a third stimulus. For example, in sensory preconditioning, if a light is always followed by a tone, and that tone is later paired with a shock, the light will come to elicit a fear reaction, even though the light was never directly paired with shock.

In this paper, we present a computational model of mediated conditioning based on reinforcement learning with predictive representations. In the model, animals learn to predict future observations through the temporal-difference algorithm. These predictions are generated using both current observations and other predictions. The model was successfully applied to a range of animal learning phenomena, including sensory preconditioning, acquired equivalence, and mediated aversion.
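The core mechanism can be illustrated with a minimal sketch, not the paper's exact model: linear TD learning of stimulus predictions in which each prediction's input vector contains both the current observation and the previous predictions, so predictions can chain through an intermediate stimulus. The stimulus indices and trial structure below are illustrative assumptions.

```python
import numpy as np

def run_td(observations, alpha=0.1, gamma=0.9):
    """observations: list of binary stimulus vectors, one per time step."""
    n = len(observations[0])
    W = np.zeros((n, 2 * n))                  # row i predicts stimulus i
    x_prev = np.concatenate([observations[0], np.zeros(n)])
    for t in range(1, len(observations)):
        p_prev = W @ x_prev                   # predictions made at t-1
        x = np.concatenate([observations[t], p_prev])
        delta = observations[t] + gamma * (W @ x) - p_prev   # TD error
        W += alpha * np.outer(delta, x_prev)
        x_prev = x
    return W

# Phase 1 of sensory preconditioning: light (index 0) always precedes
# tone (index 1); shock (index 2) never occurs in this phase.
LIGHT, TONE, SHOCK = 0, 1, 2
trial = [np.eye(3)[LIGHT], np.eye(3)[TONE], np.zeros(3)]
W = run_td(trial * 50)
```

After training, the light-to-tone weight has grown while the shock row is untouched; pairing the tone with shock in a second phase would then let the shock prediction reach back to the light through the prediction features in the input.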

We suggest that animals and humans are fruitfully understood as representing their world as a set of chained predictions and propose that generalization in artificial agents may benefit from a similar approach.

2007

Investigating Experience: Temporal Coherence and Empirical Knowledge Representation

Author: Koop, A.
Type: Master's Thesis
Institution: University of Alberta
Abstract:

This thesis investigates the idea of artificial intelligence as an agent making sense of its experience, illustrating some of the benefits of representing knowledge as predictions of future experience. Experience is here defined as the temporal sequence of sensations and actions that are the inputs and outputs of the agent. One characteristic of this sequence is that it can have temporal coherence: what is experienced in a short period of time is likely to be consistent.

The first part of this thesis examines how an agent with dynamic memory can take advantage of the temporal coherence of its experience. Results in a simple prediction task and the more complex problem of Computer Go show how such an agent can dramatically improve on the performance of the best stationary solutions. The prediction task is then used to illustrate how temporal coherence can provide a natural testbed for meta-learning.

In the second part of the thesis, the frameworks of predictive representations and options are adapted for use in knowledge representation. The traditional approach to knowledge representation for artificial intelligence uses the framework of formal logic, in which knowledge is dissociated from experience. The knowledge representation presented here is defined in terms of experience, predictions and time. This kind of representation is defined in this thesis as an empirical knowledge representation. Using objects as a case study, the final chapter shows how an empirical knowledge representation makes it possible to represent even abstract concepts in terms of experience.

On the Role of Tracking in Stationary Environments

Authors: Sutton, R. S., Koop, A., & Silver, D.
Venue: Proceedings of the 2007 International Conference on Machine Learning (ICML)
Abstract:

It is often thought that learning algorithms that track the best solution, as opposed to converging to it, are important only on nonstationary problems. We present three results suggesting that this is not so.

First, we illustrate with a simple concrete example, the Black and White problem, that tracking can perform better than any converging algorithm on a stationary problem. Second, we show the same point on a larger, more realistic problem: an application of temporal-difference learning to computer Go.

Our third result suggests that tracking in stationary problems could be important for meta-learning research (e.g., learning to learn, feature selection, transfer). We apply a meta-learning algorithm for step-size adaptation, IDBD, to the Black and White problem, showing that meta-learning has a dramatic long-term effect on performance, whereas on an analogous converging problem it has only a small second-order effect. This result suggests a way of eventually overcoming a major obstacle to meta-learning research: the lack of an independent methodology for task selection.
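The IDBD algorithm (Sutton, 1992) referred to above can be sketched for supervised linear (LMS) learning: each weight gets its own log step size, adapted by a meta-level gradient step using a memory trace. The problem setup below (random linear regression targets) is an illustrative assumption, not the paper's Black and White task.

```python
import numpy as np

def idbd(X, y, theta=0.01, init_alpha=0.05):
    """IDBD: per-weight step-size adaptation for LMS learning."""
    T, n = X.shape
    w = np.zeros(n)
    beta = np.full(n, np.log(init_alpha))      # log step sizes
    h = np.zeros(n)                            # memory traces
    for x, target in zip(X, y):
        delta = target - w @ x                 # prediction error
        beta += theta * delta * x * h          # meta-level update
        alpha = np.exp(beta)                   # per-weight step sizes
        w += alpha * delta * x                 # base-level LMS update
        h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
true_w = np.array([1.0, -2.0, 0.0])
w = idbd(X, X @ true_w)
```

The exponential parameterization keeps each step size positive, and the trace h accumulates recent correlated errors so that step sizes grow only on inputs where larger steps would have helped.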

Grounding Abstractions in Predictive State Representations

Authors: Tanner, B., Bulitko, V., Koop, A., & Paduraru, C.
Venue: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)
Pages: 1077-1082
Abstract:

This paper proposes a systematic approach to representing abstract features in terms of low-level, subjective state representations. We demonstrate that a mapping between the agent's predictive state representation and abstract features can be derived automatically from high-level training data supplied by the designer.

Our empirical evaluation demonstrates that an experience-oriented state representation built around a single-bit sensor can represent useful abstract features such as "back against a wall", "in a corner", or "in a room". As a result, the agent gains virtual sensors that could be used by its control policy.

2006

Off-policy Learning with Recognizers

Authors: Precup, D., Sutton, R. S., Paduraru, C., Koop, A., & Singh, S.
Venue: Advances in Neural Information Processing Systems 18 (NIPS)
Abstract:

We introduce a new algorithm for off-policy temporal-difference learning with function approximation that has much lower variance and requires less knowledge of the behavior policy than prior methods.

We develop the notion of a recognizer, a filter on actions that distorts the behavior policy to produce a related target policy with low-variance importance-sampling corrections. We also consider target policies that are deviations from the state distribution of the behavior policy, such as potential temporally abstract options, which further reduces variance.

This paper introduces recognizers and their potential advantages, then develops a full algorithm for MDPs and proves that its updates are in the same direction as on-policy TD updates, which implies asymptotic convergence. Our algorithm achieves this without knowledge of the behavior policy or even requiring that there exists a behavior policy.
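The variance-reduction idea behind recognizers can be shown in a simplified one-state sketch (the numbers and three-action setup are illustrative assumptions): a recognizer c(a) in [0,1] filters actions from an unknown behavior policy b, inducing a target policy pi(a) proportional to c(a)b(a). The importance-sampling ratio is then c(a)/mu, where mu = sum over a of c(a)b(a) can be estimated from data without ever knowing b.

```python
import numpy as np

rng = np.random.default_rng(0)
b = np.array([0.7, 0.2, 0.1])        # behavior policy (unknown to learner)
c = np.array([1.0, 1.0, 0.0])        # recognizer: accept actions 0 and 1
rewards = np.array([1.0, 2.0, 5.0])  # reward of each action

actions = rng.choice(3, size=100_000, p=b)
mu_hat = c[actions].mean()           # estimate of sum_a c(a) b(a)
rho = c[actions] / mu_hat            # per-sample IS corrections
est = (rho * rewards[actions]).mean()

# True expectation under the induced target policy pi(a) ∝ c(a) b(a):
pi = c * b / (c * b).sum()
true_value = (pi * rewards).sum()
```

Because the correction is c(a)/mu rather than pi(a)/b(a), the ratios stay close to one for recognized actions, which is the source of the lower variance.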

Temporal Abstraction in Temporal-Difference Networks

Authors: Sutton, R. S., Rafols, E. J., Koop, A.
Venue: Advances in Neural Information Processing Systems 18 (NIPS)
Abstract:

Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment (Sutton & Tanner, 2005). These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken.

In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction (Sutton, Precup & Singh, 1999).

The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally-abstract TD networks.
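The compositional targets that TD networks are built on can be illustrated with a toy sketch of a conventional one-step TD network, the setting this paper generalizes. The six-state ring world, single action, and tabular updates here are illustrative assumptions, not the paper's algorithm or experiments: node 0 predicts the next observation, and node 1's target is node 0's value at the next state, i.e., the observation two steps ahead.

```python
import numpy as np

n_states, alpha = 6, 0.2
y = np.zeros((n_states, 2))   # tabular predictions: 2 nodes per state

state = 0
for _ in range(5000):
    nxt = (state + 1) % n_states
    obs_next = 1.0 if nxt == 0 else 0.0        # observation bit at state 0
    # Compositional TD-network targets:
    #   node 0's target is the next observation,
    #   node 1's target is node 0's prediction at the next state.
    z = np.array([obs_next, y[nxt, 0]])
    y[state] += alpha * (z - y[state])
    state = nxt
```

After training, node 0 is high only one step before the observation and node 1 only two steps before it, showing how predictions defined in terms of other predictions ground multi-step knowledge in one-step data.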

Research Collaborators & Advisors

My research has been conducted in collaboration with leading researchers in reinforcement learning and machine learning, including:

Richard S. Sutton, Doina Precup, Satinder Singh, Elliott A. Ludvig, David Silver, Vadim Bulitko, and Cristian Paduraru.