Gaussian multi-armed bandit problems with multiple objectives

P. Reverdy
Proc. of the American Control Conference, 2016

Motivated by the goal of formally integrating human designers into computational systems for engineering design optimization, I study decision making under uncertainty with multiple objectives in the context of the multi-armed bandit problem. A key aspect of multi-objective optimization is the need for scalarization, i.e., a way to combine the various objectives into a single well-defined scalar objective function. I study the case where the multi-objective rewards are Gaussian-distributed and the scalarization is linear, and I develop an algorithm that achieves optimal performance, i.e., one that converges to selecting the best arm at the highest possible rate.
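To make the role of linear scalarization concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes independent Gaussian objectives with known standard deviations, so that the weighted sum of a reward vector is itself Gaussian with mean equal to the weighted sum of the objective means and variance equal to the sum of the squared weighted standard deviations. The scalarized problem is then handled with a generic UCB-style index. All names (`scalarize`, `scalarized_variance`, `run_ucb`), the weights, and the arm parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def scalarize(weights, reward):
    """Linear scalarization: weighted sum of the objective values."""
    return sum(w * r for w, r in zip(weights, reward))


def scalarized_variance(weights, stds):
    """Variance of the scalarized reward, assuming independent objectives."""
    return sum((w * s) ** 2 for w, s in zip(weights, stds))


def run_ucb(arm_means, stds, weights, horizon, seed=0):
    """Generic UCB-style rule applied to the scalarized rewards.

    arm_means: one mean vector per arm (one entry per objective).
    stds: per-objective standard deviations, assumed known and shared by all arms.
    Returns the number of times each arm was selected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    sigma_s = math.sqrt(scalarized_variance(weights, stds))
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once to initialize the estimates.
            arm = t - 1
        else:
            # Empirical scalarized mean plus an exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + sigma_s * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Draw a multi-objective Gaussian reward, then scalarize it.
        reward = [rng.gauss(m, s) for m, s in zip(arm_means[arm], stds)]
        counts[arm] += 1
        sums[arm] += scalarize(weights, reward)
    return counts


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
    pulls = run_ucb(arms, stds=[0.5, 0.5], weights=[0.7, 0.3], horizon=2000)
    print(pulls)
```

With weights (0.7, 0.3), the scalarized means of the three example arms are 0.7, 0.3, and 0.2, so the first arm is the best one under this scalarization. Changing the weights changes which arm is best, which is why the choice of scalarization matters.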