Oleksandr Zadorozhnyi (Potsdam)
In the talk I present the stochastic multi-armed bandit problem and its analysis in the case when the arm samples are dependent over time and generated from a so-called weak \(\mathcal{C}\)-mixing process. I then present the \(\mathcal{C}\)-Mix Improved UCB algorithm and provide both problem-dependent and problem-independent regret analyses in two different scenarios. In the first, so-called fast-mixing scenario, the pseudo-regret enjoys the same upper bound (up to a factor) as for independent observations; whereas in the second, slow-mixing scenario, we discover a surprising effect: the regret upper bound is similar to that of the independent case, with an extra additive term which does not depend on the number of arms. The analysis of the slow-mixing scenario is supported by a minimax lower bound, which matches the obtained upper bound up to a \(\log(T)\) factor.
The talk is based on joint work with Gilles Blanchard and Alexandra Carpentier.
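For readers unfamiliar with the baseline being extended, the sketch below implements the classical UCB1 strategy on independent Bernoulli arms. It is purely illustrative: the \(\mathcal{C}\)-Mix Improved UCB of the talk modifies the confidence widths to account for dependence under \(\mathcal{C}\)-mixing, and that modification is not reproduced here. All names and parameters are our own choices, not the authors'.

```python
import math
import random

def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run classical UCB1 on independent Bernoulli arms and return the
    pseudo-regret, i.e. the cumulative gap between the best arm's mean
    and the mean of the arm actually played."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k       # number of pulls of each arm
    sums = [0.0] * k       # cumulative reward of each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization: play each arm once
        else:
            # UCB index = empirical mean + confidence width
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret
```

For i.i.d. arms the pseudo-regret of UCB1 grows only logarithmically in the horizon; the talk's results describe how this picture changes when the reward sequences are \(\mathcal{C}\)-mixing rather than independent.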