Anne Manegueu (Magdeburg)
Recent decades have seen rapid advances in multi-armed bandit (MAB) algorithms, owing to their ability to optimize in the presence of uncertainty. Classical MAB problems are studied under the assumption that the data-generating mechanism does not change over time. This assumption is often violated in real-world applications, since the reward distributions themselves may change over time and, more generally, the environment may be non-stationary. A large part of the bandit literature has been devoted to finding models that capture the non-stationarity of the environment, giving rise to two classes of multi-armed bandits: rested bandits and restless bandits. In our work, we propose a generalization of switching bandits, a subclass of restless bandits in which the reward distributions are assumed to be piecewise stationary. Specifically, we consider a multi-armed bandit setting that unifies the following settings: (a) the switching bandit problem, (b) the MAB problem with locally polynomial mean rewards, (c) the MAB problem with locally smooth mean rewards, and (d) the setting where the gaps between the arms have a bounded number of inflection points and the mean of the best arm cannot vary too much over a short range. We propose two algorithms in this general setting, termed "selectivebandits" and "prudentbandits", that solve the four problems (a)-(d) in an efficient and unified way.
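To illustrate setting (a), the sketch below simulates a piecewise-stationary Bernoulli bandit whose arm means change at unknown breakpoints, and runs a sliding-window UCB baseline (in the spirit of Garivier and Moulines, 2011) against it. This is a minimal, assumed example: the environment parameters and helper names are hypothetical, and the learner shown is a standard baseline, not the selectivebandits or prudentbandits algorithms from the talk.

import math
import random

def arm_means_at(t, breakpoints, means):
    """Bernoulli means of all arms at round t (piecewise stationary)."""
    phase = sum(1 for b in breakpoints if t >= b)
    return means[phase]

def sliding_window_ucb(horizon, breakpoints, means, window=200):
    """Sliding-window UCB: indices use only the last `window` rounds,
    so estimates can track means that change at unknown breakpoints."""
    n_arms = len(means[0])
    history = []  # (arm, reward) pairs, most recent last
    total = 0.0
    for t in range(horizon):
        recent = history[-window:]
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in recent:
            counts[arm] += 1
            sums[arm] += r
        if any(c == 0 for c in counts):
            # play any arm unseen within the window
            arm = counts.index(0)
        else:
            # otherwise maximize the windowed UCB index
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(min(t + 1, window)) / counts[a]),
            )
        p = arm_means_at(t, breakpoints, means)[arm]
        reward = 1.0 if random.random() < p else 0.0
        history.append((arm, reward))
        total += reward
    return total

# Example: two arms whose means swap at round 1000.
random.seed(0)
print(sliding_window_ucb(2000, breakpoints=[1000], means=[[0.8, 0.2], [0.2, 0.8]]))

The window length trades off reactivity to switches against estimation accuracy within each stationary phase; the generalized setting of the talk covers, beyond such abrupt switches, the smoothly drifting rewards of settings (b)-(d).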
The Zoom access details are available on the program of the Research Seminar.