Department of Knowledge Engineering


SIKS-DKE Colloquia


December 4, 14:00
Shimon Whiteson, UvA
Protecting Against Overfitting in Empirical Reinforcement Learning

In this talk, I ask the question: what is the best way to conduct empirical evaluations in reinforcement learning?  In general, designing empirical methodologies is difficult because learners can overfit test evaluations and thereby obtain misleadingly high scores.  In data overfitting, the learned function is overfit to the data on which it was trained, while in task overfitting the learning algorithm itself is overfit to the task.  I propose a model of empirical methodologies in which an adversarial designer submits learners to an evaluator to be scored.  Given a methodology, I consider how well the evaluator can protect against data and task overfitting in a one-shot setting, in which all learners are evaluated at the same time, or a timeless setting, in which learners evaluated across time are compared.  I analyze several single-task methodologies and conclude that, while often adequate in supervised learning, they fail to provide any reliable protection against task overfitting in reinforcement learning, wherein it is infeasible to rely on fixed data sets. To address these limitations, I advocate departing from single-task methodologies and propose a new generalized task methodology in which evaluations are based on multiple tasks sampled from a distribution. Generalized tasks make it possible to specify, by selecting the distribution, what types of task generality are desired from the learner.  The resulting methodology can then timelessly protect against overfitting to tasks within that distribution.  Furthermore, since it does not rely on fixed data sets, this protection is reliable even in reinforcement learning.  Finally, I present illustrative results in a generalized helicopter hovering task based on a recent Reinforcement Learning Competition.

November 11, 16h00
Konstantin Klemm,
Department of Bioinformatics, University of Leipzig, Germany.
Agents under pressure: Cooperation arises before extinction

Cooperation and altruistic behavior are often enhanced in spatially structured populations as compared with mean-field scenarios [Nowak and May, 1992]. In most studies of evolutionary game theory with spatial structure, however, the size of the population is assumed to be constant.
Here we demonstrate that the variation of population size due to environmental pressure induces non-trivial effects: As increasing pressure drives the population towards extinction, cooperation prevails even if altruistic acts have arbitrarily benefit-cost ratio.
The overall population size behaves non-monotonically with environmental pressure: Increased attacks from predators may lead to a growth of the prey population. These effects are captured analytically by pair approximation and lead to a complex phase diagram for cooperation

October 21, 16h00
Wolfgang Wagner,
Aging and Long-Term Culture Have Related Effects on Human Mesenchymal Stem Cells

October 9, 13h00
Sander Bohte,
CWI Amsterdam
Rational Altruistic Punishment with Foresighted Policy Gradient Reinforcement Learning

Abstract:  It is well known that in social dilemmas, like Hardin's Tragedy of the Commons or the classic iterated Prisoner's Dilemma, it can be rational for self-interested agents to promote and sustain cooperation by altruistically dispensing costly punishment to other agents, thus maximizing their own long-term reward. However, agents using most current multi-agent reinforcement learning algorithms will not sustain cooperation in social dilemmas: the algorithms do not sufficiently capture the consequences on the agent's reward of the interactions that it has with other agents. Recent more foresighted algorithms specifically account for such expected consequences, and have been shown to work well for the small scale Prisoner's Dilemma. However, they quickly become intractable for larger social dilemmas. Here, we advance on this work and develop a stateless foresighted policy gradient reinforcement learning algorithm that scales well to larger settings like the Tragedy of the Commons. We show for a variety of settings that large groups of self-interested agents using this algorithm will robustly find and sustain cooperation in social dilemmas where agents can punish the behavior of other agents.
Website of the speaker:


September 15, 16h00
Martine Olivi,
INRIA, Sophia-Antipolis, France
On system identification, rational approximation and some applications

I'll introduce a rational approximation problem first motivated by system identification applications.
I'll briefly explain the approach developed in our team at INRIA which strongly relies on joint works with B. Hanzon and R. Peeters. I'll finally present two applications in which rational approximation gets involved, namely, microwave filters identification and source localization from EEG measurements.


September 15, 17h00
Bernard Hanzon, University College Cork, Ireland
On non-negativity of impulse response functions of linear systems

 Consider the class of exponential-polynomial-trigonometric (EPT) functions. (These are all functions that are solutions of homogeneous linear differential equations with constant coefficients). These functions appear in many, if not virtually every area of mathematics. In financial mathematics for instance as forward rate curves, such as Nelson-Siegel curves; in probability theory as probability density functions ( see Hanzon-Ober(2002) for the discrete counterpart of this--in the present talk we will emphasise the continuous probability density case); in systems theory as impulse response functions. In many applications, especially in the case of forward rate curves in finance, and probability density functions in probability and statistics, one requires these functions to be non-negative. The topic of this lecture is how to analyze, and as a result guarantee, non-negativity of such a function, especially on a finite interval of the non-negative real half-line. These results are very recent and are presented here for the first time. They are based on construction of a so-called generalized Budan-Fourier sequence for any given EPT function.
Reference: B. Hanzon, R.J. Ober, "State-space calculations for discrete probability densities, Linear Algebra and Its Applications, 350, no. 1-3, pp. 67-87 (2002).

June 3, 16h00
Frits Spieksma,
ORSTAT, Faculty of Business and Economics, K.U. Leuven
An overview of multi-index assignment problems

In this presentation we give an overview of applications of, and algorithms for, multi-index assignment problems (MIAPs). MIAPs, and relatives of it, have a long history both in applications as well as in theoretical results, starting at least in the 1950's. In particular, we focus here on the complexity and approximability of special cases of MIAPs. A prominent example of a MIAP is the so-called axial three index assignment problem (3AP) which has many applications in a variety of domains including clustering. A description of 3AP is as follows. Given are three n-sets R, G, and B. For each triple in R \times G \times B a cost c_{ijk} is given. The problem is to find n triples such that each element in R \cup G \cup B is in exactly one triple, while minimizing total cost.
We show positive and negative results for finding an optimal solution to this problem that depend upon different ways of how the costs c_{ijk} are specified.

May 13, 16h00
Stef Zeemering,
IDEE, Maastricht University
Role identification in complex biological processes applied to the cell cycle model of fission yeast

Abstract:  The talk will focus on automated role identification in complex biological processes based on linearized interaction matrices using notions and techniques from systems and control theory. A first step in modeling biological processes is to increase the understanding of the role or function of a process entity within the entire process. A method has been developed to distinguish between activating and inactivating roles of regulatory process entities. These roles may vary during different process phases along with changes in the internal feedback control mechanisms. The developed method is applied to the well-known Tyson-Novak model of the cell division cycle of fission yeast. The roles and role transitions of the model entities have been computed by analyzing the structure of a linearized version of the model at several points on the limit cycle. These roles and role transitions coincide to a large extent with the known cell cycle phases (G1, S, G2 and M) and checkpoints (G1/S and (G2/M). This indicates that the developed procedure for automated role identification is able to detect cell cycle phases and checkpoints in the Tyson-Novak model of the cell division cycle of fission yeast. Two approaches have been tested, based on the values or the signs of the linearized model interactions. Both approaches yield promising results, most notably the detection of the strong central role of Cdc13T . An important characteristic of the procedure is that it does not simply register peaks or switches, but that it looks at the role of an entity in the overall process instead.
The procedure can serve as a useful tool in the identification of phases and checkpoints in complex biological processes in which the underlying
dynamics are largely unknown.