Victor Thuot
Abstract:
We investigate the problem of active clustering with bandit feedback, a pure exploration bandit problem.
Unlike the well-known and extensively studied problem of clustering, active clustering allows the learner to actively construct the sampling strategy. The learner chooses the order of observations, aiming to learn the clustering structure of the arms with as few samples as possible. In this work, the learner interacts with a stochastic multi-armed bandit whose arms are divided into groups. Arms within the same group share the same feedback distribution, in particular the same mean. The learner actively selects which arm to pull and obtains a sample from it. At the end of the process, the learner must propose an estimate of the hidden clustering. The feedback consists of multi-dimensional Gaussian (or sub-Gaussian) variables, and we consider the fixed-confidence framework: the goal is to achieve an exact clustering with a predetermined probability of error.
To achieve this objective, we propose a simple algorithm and analyze the number of samples (the budget) required to achieve the desired clustering accuracy. The analysis assumes knowledge of the minimal gap between cluster means and of the size of the smallest cluster.
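To give a concrete feel for the fixed-confidence setting, the sketch below simulates a naive uniform-sampling baseline (not the algorithm of the talk): every arm is pulled a Hoeffding-style number of times determined by the known minimal gap and the target error probability, and arms are then grouped greedily by their empirical means. The function name, the one-dimensional Gaussian feedback, and the specific sample-size constant are illustrative assumptions, not details from the work itself.

```python
import math
import numpy as np

def naive_fixed_confidence_clustering(means, delta_gap, confidence, rng, sigma=1.0):
    """Illustrative baseline (hypothetical, not the talk's algorithm):
    pull each arm n times, then cluster empirical means greedily with
    threshold delta_gap / 2."""
    K = len(means)
    # Hoeffding-style budget per arm: with probability >= 1 - confidence,
    # every empirical mean lies within delta_gap / 4 of its true mean.
    n = math.ceil((32 * sigma**2 / delta_gap**2) * math.log(2 * K / confidence))
    emp = np.array([rng.normal(mu, sigma, n).mean() for mu in means])
    # Greedy single-linkage-style grouping: an arm joins the first cluster
    # whose representative empirical mean is within delta_gap / 2.
    labels = [-1] * K
    reps = []
    for i in np.argsort(emp):
        for c, rep in enumerate(reps):
            if abs(emp[i] - rep) <= delta_gap / 2:
                labels[i] = c
                break
        else:
            labels[i] = len(reps)
            reps.append(emp[i])
    return labels, K * n  # proposed clustering and total budget used

rng = np.random.default_rng(0)
true_means = [0.0, 0.0, 1.0, 1.0, 2.0]  # three hidden clusters, gap = 1
labels, budget = naive_fixed_confidence_clustering(true_means, 1.0, 0.05, rng)
```

This baseline makes the role of the two known quantities visible: the minimal gap fixes the per-arm budget, and a smarter algorithm would instead stop sampling arms whose cluster membership is already certain.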
This work is part of my Master's internship.
Zoom link on request