Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
14 Aug 2020

Introduction
- Conditional computation is a technique to increase a model's capacity (without a proportional increase in computation) by activating only parts of the network on a per-example basis.
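The paper realizes this idea with a Sparsely-Gated Mixture-of-Experts layer: a trainable gating network scores many expert sub-networks and routes each input to only the top-k of them, so capacity grows with the number of experts while per-example computation stays roughly constant. Below is a minimal illustrative sketch of such a layer; the class name, hyperparameters, and the per-expert routing loop are assumptions for readability, not the paper's implementation, and the paper's noisy gating and load-balancing losses are omitted.

```python
# Minimal sketch of a sparsely-gated mixture-of-experts layer (illustrative only;
# names and hyperparameters are assumptions, not taken from the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model, d_hidden, num_experts=8, k=2):
        super().__init__()
        self.k = k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gating network scores every expert for every input.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):                          # x: (batch, d_model)
        scores = self.gate(x)                      # (batch, num_experts)
        # Keep only the top-k experts per example; the rest get zero weight,
        # so their computation can be skipped entirely.
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)   # (batch, k)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                # expert chosen for this slot
            w = weights[:, slot].unsqueeze(-1)
            # Route each example to its chosen expert (simple loop for clarity;
            # a real implementation batches examples per expert).
            for e in idx.unique():
                mask = idx == e
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out

# Example: 16 inputs of width 32, only 2 of 8 experts active per input.
layer = SparseMoE(d_model=32, d_hidden=64, num_experts=8, k=2)
print(layer(torch.randn(16, 32)).shape)            # torch.Size([16, 32])
```

With k much smaller than the number of experts, the layer's parameter count scales with the experts while the work done per example depends only on k, which is the core of the conditional-computation argument.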