Department of
Knowledge Engineering
SIKS-DKE Colloquia
2009
December 4, 14:00
Shimon Whiteson,
UvA
Protecting Against Overfitting in Empirical Reinforcement Learning
Abstract:
In this talk, I ask the question: what is the best way to conduct empirical
evaluations in reinforcement learning? In general, designing empirical
methodologies is difficult because learners can overfit test evaluations and
thereby obtain misleadingly high scores. In data overfitting, the learned
function is overfit to the data on which it was trained, while in task
overfitting the learning algorithm itself is overfit to the task. I propose a
model of empirical methodologies in which an adversarial designer submits
learners to an evaluator to be scored. Given a methodology, I consider how well
the evaluator can protect against data and task overfitting in a one-shot
setting, in which all learners are evaluated at the same time, or a timeless
setting, in which learners evaluated across time are compared. I analyze
several single-task methodologies and conclude that, while often adequate in
supervised learning, they fail to provide any reliable protection against task
overfitting in reinforcement learning, wherein it is infeasible to rely on fixed
data sets. To address these limitations, I advocate departing from single-task
methodologies and propose a new generalized task methodology in which
evaluations are based on multiple tasks sampled from a distribution. Generalized
tasks make it possible to specify, by selecting the distribution, what types of
task generality are desired from the learner. The resulting methodology can
then timelessly protect against overfitting to tasks within that distribution.
Furthermore, since it does not rely on fixed data sets, this protection is
reliable even in reinforcement learning. Finally, I present illustrative
results in a generalized helicopter hovering task based on a recent
Reinforcement Learning Competition.
November 11,
16h00
Konstantin Klemm,
Department of Bioinformatics, University of Leipzig, Germany.
Agents under pressure: Cooperation arises before extinction
Abstract:
Cooperation and altruistic behavior are often enhanced in spatially structured
populations as compared with mean-field scenarios [Nowak and May, 1992]. In most
studies of evolutionary game theory with spatial structure, however, the size of
the population is assumed to be constant.
Here we demonstrate that the variation of population size due to environmental
pressure induces non-trivial effects: As increasing pressure drives the
population towards extinction, cooperation prevails even if altruistic acts have
arbitrarily benefit-cost ratio.
The overall population size behaves non-monotonically with environmental
pressure: Increased attacks from predators may lead to a growth of the prey
population. These effects are captured analytically by pair approximation and
lead to a complex phase diagram for cooperation
October 21,
16h00
Wolfgang Wagner, RWTH
Aging and Long-Term Culture Have Related Effects on Human Mesenchymal Stem Cells
October 9,
13h00
Sander Bohte, CWI Amsterdam
Rational Altruistic Punishment with Foresighted Policy Gradient Reinforcement
Learning
Abstract:
It is well known that in social dilemmas, like Hardin's Tragedy of the Commons
or the classic iterated Prisoner's Dilemma, it can be rational for
self-interested agents to promote and sustain cooperation by altruistically
dispensing costly punishment to other agents, thus maximizing their own
long-term reward. However, agents using most current multi-agent reinforcement
learning algorithms will not sustain cooperation in social dilemmas: the
algorithms do not sufficiently capture the consequences on the agent's reward of
the interactions that it has with other agents. Recent more foresighted
algorithms specifically account for such expected consequences, and have been
shown to work well for the small scale Prisoner's Dilemma. However, they quickly
become intractable for larger social dilemmas. Here, we advance on this work and
develop a stateless foresighted policy gradient reinforcement learning algorithm
that scales well to larger settings like the Tragedy of the Commons. We show for
a variety of settings that large groups of self-interested agents using this
algorithm will robustly find and sustain cooperation in social dilemmas where
agents can punish the behavior of other agents.
Website of the speaker:
http://www.bohte.com/sander/
September 15,
16h00
Martine Olivi, INRIA, Sophia-Antipolis, France
On system identification, rational approximation and some applications
Abstract:
I'll introduce a rational approximation problem first motivated by system
identification applications.
I'll briefly explain the approach developed in our team at INRIA which strongly
relies on joint works with B. Hanzon and R. Peeters. I'll finally present two
applications in which rational approximation gets involved, namely, microwave
filters identification and source localization from EEG measurements.
September 15,
17h00
Bernard Hanzon,
University College Cork, Ireland
On non-negativity of impulse response functions of linear systems
Abstract:
Consider
the class of exponential-polynomial-trigonometric (EPT) functions. (These are
all functions that are solutions of homogeneous linear differential equations
with constant coefficients). These functions appear in many, if not virtually
every area of mathematics. In financial mathematics for instance as forward rate
curves, such as Nelson-Siegel curves; in probability theory as probability
density functions ( see Hanzon-Ober(2002) for the discrete counterpart of
this--in the present talk we will emphasise the continuous probability density
case); in systems theory as impulse response functions. In many applications,
especially in the case of forward rate curves in finance, and probability
density functions in probability and statistics, one requires these functions to
be non-negative. The topic of this lecture is how to analyze, and as a result
guarantee, non-negativity of such a function, especially on a finite interval of
the non-negative real half-line. These results are very recent and are presented
here for the first time. They are based on construction of a so-called
generalized Budan-Fourier sequence for any given EPT function.
Reference: B. Hanzon, R.J. Ober, "State-space calculations for discrete
probability densities, Linear Algebra and Its Applications, 350, no. 1-3, pp.
67-87 (2002).
June 3,
16h00
Frits Spieksma, ORSTAT, Faculty of Business and Economics, K.U.
Leuven
An overview of multi-index assignment problems
Abstract:
In this presentation we give an overview of applications of, and algorithms for,
multi-index assignment problems (MIAPs). MIAPs, and relatives of it, have a long
history both in applications as well as in theoretical results, starting at
least in the 1950's. In particular, we focus here on the complexity and
approximability of special cases of MIAPs. A prominent example of a MIAP is the
so-called axial three index assignment problem (3AP) which has many applications
in a variety of domains including clustering. A description of 3AP is as
follows. Given are three n-sets R, G, and B. For each triple in R \times G
\times B a cost c_{ijk} is given. The problem is to find n triples such that
each element in R \cup G \cup B is in exactly one triple, while minimizing total
cost.
We show positive and negative results for finding an optimal solution to this
problem that depend upon different ways of how the costs c_{ijk} are specified.
May 13,
16h00
Stef Zeemering, IDEE, Maastricht University
Role identification in complex biological processes applied to the cell cycle
model of fission yeast
Abstract:
The talk will focus on automated role identification in complex biological
processes based on linearized interaction matrices using notions and techniques
from systems and control theory. A first step in modeling biological processes
is to increase the understanding of the role or function of a process entity
within the entire process. A method has been developed to distinguish between
activating and inactivating roles of regulatory process entities. These roles
may vary during different process phases along with changes in the internal
feedback control mechanisms. The developed method is applied to the well-known
Tyson-Novak model of the cell division cycle of fission yeast. The roles and
role transitions of the model entities have been computed by analyzing the
structure of a linearized version of the model at several points on the limit
cycle. These roles and role transitions coincide to a large extent with the
known cell cycle phases (G1, S, G2 and M) and checkpoints (G1/S and (G2/M). This
indicates that the developed procedure for automated role identification is able
to detect cell cycle phases and checkpoints in the Tyson-Novak model of the cell
division cycle of fission yeast. Two approaches have been tested, based on the
values or the signs of the linearized model interactions. Both approaches yield
promising results, most notably the detection of the strong central role of
Cdc13T . An important characteristic of the procedure is that it does not simply
register peaks or switches, but that it looks at the role of an entity in the
overall process instead.
The procedure can serve as a useful tool in the identification of phases and
checkpoints in complex biological processes in which the underlying
dynamics are largely unknown.