Papers I Read: Notes and Summaries

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

Introduction

  • The paper proposed a framework for joint modeling of labels and...


Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges

Introduction

  • The paper proposes to build a universal neural machine translation system...


Observational Overfitting in Reinforcement Learning

Introduction

  • The paper studies observational overfitting: the phenomenon where an agent overfits...


Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

Introduction

  • The paper investigated two possible reasons behind the usefulness of MAML...


Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour

Introduction

  • Training models with large minibatches (using distributed synchronous SGD) can lead...


Superposition of many models into one

Introduction

  • The paper proposes a technique (called Parameter Superposition or PSP) for...


Towards a Unified Theory of State Abstraction for MDPs

Introduction

  • The paper studies five different techniques for state abstraction in MDPs...


ALBERT - A Lite BERT for Self-supervised Learning of Language Representations

Introduction

  • The paper proposes parameter-reduction techniques to lower the memory consumption (and...


Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text

Introduction

  • Procedural text comprehension tasks focus on modeling the effect of actions...


Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Introduction

  • The paper presents the MuZero algorithm that performs planning with a...