Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
09 Jan 2020
Introduction
- Training models with large minibatches (using distributed synchronous SGD) can lead...
The paper proposes a technique (called Parameter Superposition or PSP) for...
The paper studies five different techniques for state abstraction in MDPs...
The paper proposes parameter-reduction techniques to lower the memory consumption (and...
Procedural text comprehension tasks focus on modeling the effect of actions...
The paper presents the MuZero algorithm that performs planning with a...
The paper introduces Contrastively-trained Structured World Models (C-SWMs).
These...
The paper proposes MAML++ - a modification of the MAML algorithm that...
The paper proposes the PHYRE (PHYsical REasoning) benchmark - consisting of...