Workshop: Majd Hawasly and Aris Valtazanos
| When | Mar 29, 2012 from 11:00 am to 12:00 pm |
|---|---|
| Where | IF 4.31/4.33 |
Majd Hawasly
Reinforcement learning for changing and unmodelled tasks
A robot deployed in the real world has to respond promptly, and online, to the unknown variety of tasks it may be given. On the other hand, the robot has extended offline time in which it can experience many samples of every task and build better representations. I'll describe two aspects of how offline experience can be used to make cheaper, adaptive online decisions. First, I'll review how learning the variation in a task can be posed as a Bayesian reinforcement learning problem, and show an approximate method that adapts quickly to a novel task, drawn from the same unknown class, by reusing a previously learnt policy. Next, given a set of interacting tasks with varied objectives, I'll demonstrate a control hierarchy, in the spirit of hierarchical reinforcement learning, that better enables an agent to respond to a changing world when pursuing a collection of objectives, and contrast it with known hierarchical reinforcement learning methods such as MAXQ.
Aris Valtazanos
Learning to shape partially controllable strategic interactions with non-cooperative agents
Autonomous agents are increasingly being deployed in environments where they must coexist and interact with other strategic agents, some of which may be adversarial. A class of related decision problems is concerned with an agent's ability to shape an interaction by indirectly influencing another non-cooperative agent to behave in accordance with its own strategic goal. In this talk, I will present Hierarchical Interaction Control Processes (HICPs), a framework for controlling interactions in adversarial environments. HICPs extend Interactive POMDPs through composable local controllers (policies) called tactics, which can be adapted based on acquired experience to adversaries whose behavioural model is not known exactly. This allows us to achieve interesting strategic behaviours while mitigating the intractability of recursive reasoning in the general IPOMDP setting. These ideas are evaluated experimentally in the robotic soccer domain, where we show that, even when building on simple tactics, HICPs can shape interactions more consistently than alternative utility-based procedures. Furthermore, we show that our algorithm is able to learn increasingly sophisticated policies in response to correspondingly more capable adversaries.


