Is High Variance Unavoidable in RL? A Case Study in Continuous Control, by Johan Bjorck and 2 other authors

Abstract: Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a particularly popular high-variance setup: continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor 'outlier' runs, but that weight initialization and initial exploration are not to blame. We show that one cause of early variance is numerical instability, which leads to saturating nonlinearities.
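To make the saturation mechanism concrete, here is a minimal PyTorch sketch (an illustration of the general effect, not the paper's code or experimental setup): when pre-activations grow numerically large, a tanh squashing layer saturates and its gradient collapses toward zero, so the affected units stop learning.

```python
import torch

# Hypothetical pre-activations of increasing magnitude (illustrative values only).
x = torch.tensor([0.5, 5.0, 50.0], requires_grad=True)

# tanh saturates: outputs approach +/-1 as inputs grow large.
y = torch.tanh(x)
y.sum().backward()

print(y)       # ~[0.46, 1.00, 1.00] -- the larger inputs are fully saturated
print(x.grad)  # 1 - tanh(x)^2: ~0.79, ~1.8e-4, ~0 -- saturated units receive ~zero gradient
```

Under this (assumed) reading, a run whose activations blow up early can saturate its policy nonlinearity and become a poor 'outlier' run, which is consistent with the variance arising early in training.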