Papers I Read - Notes and Summaries

Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour

Introduction

  • Training models with large minibatches (using distributed synchronous SGD) can lead...
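This paper is best known for two ingredients: a linear scaling rule (when the minibatch grows k-fold, multiply the learning rate by k) and a gradual warmup that ramps the learning rate up over the first few epochs. Below is a minimal sketch of that schedule. The defaults (0.1 per 256 images, 5 warmup epochs) follow the paper's ImageNet/ResNet-50 reference setup. The function names are hypothetical, and the later step decays are left out.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling rule: the learning rate grows in proportion to the minibatch size."""
    return base_lr * batch_size / base_batch


def warmup_lr(iteration, iters_per_epoch, batch_size,
              base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Gradual warmup: ramp linearly from base_lr to the scaled target, then hold it."""
    target = scaled_lr(batch_size, base_lr, base_batch)
    warmup_iters = warmup_epochs * iters_per_epoch
    if iteration >= warmup_iters:
        return target
    frac = iteration / warmup_iters
    return base_lr + frac * (target - base_lr)


# Example: a minibatch of 8192 images with a (hypothetical) 100 iterations per epoch.
for it in (0, 250, 500, 1000):
    print(it, round(warmup_lr(it, iters_per_epoch=100, batch_size=8192), 4))
```

Warmup addresses the instability of applying the full scaled learning rate early in training, when the weights are changing rapidly.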


Superposition of many models into one

Introduction

  • The paper proposes a technique (called Parameter Superposition or PSP) for...
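PSP stores several models in one set of weights by binding each model's parameters to a task-specific context before summing them. The sketch below is a toy version that uses random binary (+1/-1) contexts on flattened weight vectors, which is an assumption of this example. A binary context is its own inverse, so unbinding with c_k returns w_k plus interference terms from the other models. In high dimensions those terms are nearly orthogonal to w_k. All names are hypothetical.

```python
import random


def random_context(dim, rng):
    # Binary context: every entry is +1 or -1, so c_k * c_k = 1 elementwise.
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def superpose(params, contexts):
    # Store all models in one vector: w = sum_k (w_k * c_k), elementwise.
    w = [0.0] * len(params[0])
    for wk, ck in zip(params, contexts):
        for i, (a, c) in enumerate(zip(wk, ck)):
            w[i] += a * c
    return w


def retrieve(w, ck):
    # Unbind with the same context: w * c_k = w_k + sum_{j != k} w_j * c_j * c_k.
    return [a * c for a, c in zip(w, ck)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
dim, n_models = 10_000, 5
params = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_models)]
contexts = [random_context(dim, rng) for _ in range(n_models)]
w = superpose(params, contexts)

w0_hat = retrieve(w, contexts[0])
# Projection of the retrieved vector onto the true w_0. It lands close to 1 because
# the interference terms are nearly orthogonal to w_0 when dim is large.
print(dot(w0_hat, params[0]) / dot(params[0], params[0]))
```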


Towards a Unified Theory of State Abstraction for MDPs

Introduction

  • The paper studies five different techniques for state abstraction in MDPs...
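The five abstraction types in this paper are model-irrelevance, Q^π-irrelevance, Q*-irrelevance, a*-irrelevance and π*-irrelevance. The sketch below builds three of them from a known Q* table. It merges states whose Q* rows match (Q*), states that share an optimal action and its value (a*), and states that share only an optimal action (π*). Ties between actions are broken by taking the first argmax, and floats are rounded before comparison. Both are simplifying assumptions, and the Q* values are made up.

```python
def group_states(keys):
    """Map each state to an abstract-state id; states sharing a key are merged."""
    ids, phi = {}, {}
    for s, key in keys.items():
        phi[s] = ids.setdefault(key, len(ids))
    return phi


def abstractions(q_star, decimals=6):
    """Build Q*-, a*- and pi*-irrelevance abstractions from a table q_star[s][a]."""
    q_keys, a_keys, pi_keys = {}, {}, {}
    for s, qs in q_star.items():
        rounded = tuple(round(q, decimals) for q in qs)
        best = max(range(len(qs)), key=lambda a: qs[a])  # first optimal action
        q_keys[s] = rounded                  # Q*-irrelevance: identical Q* for every action
        a_keys[s] = (best, rounded[best])    # a*-irrelevance: same a* and same Q*(s, a*)
        pi_keys[s] = best                    # pi*-irrelevance: same optimal action only
    return group_states(q_keys), group_states(a_keys), group_states(pi_keys)


q_star = {  # hypothetical Q* values for 5 states and 2 actions
    "s0": [1.0, 0.5],
    "s1": [1.0, 0.5],
    "s2": [1.0, 0.2],
    "s3": [0.3, 0.9],
    "s4": [0.8, 0.1],
}
phi_q, phi_a, phi_pi = abstractions(q_star)
print(len(set(phi_q.values())), len(set(phi_a.values())), len(set(phi_pi.values())))
```

The abstract-state counts shrink from Q* to a* to π*. This matches the idea that each of these abstractions is at least as coarse as the one before it.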


ALBERT - A Lite BERT for Self-supervised Learning of Language Representations

Introduction

  • The paper proposes parameter-reduction techniques to lower the memory consumption (and...
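ALBERT's two parameter-reduction techniques are factorized embedding parameterization and cross-layer parameter sharing. Factorization splits the V x H embedding matrix into V x E and E x H with E much smaller than H. Cross-layer sharing reuses one set of Transformer-layer weights across every layer. The sketch below only does the arithmetic. The sizes (V = 30,000, H = 768, E = 128, FFN width 3072, 12 layers) are illustrative base-model values, and the helper names are hypothetical. Position and segment embeddings are ignored.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Token-embedding parameters: V*H (BERT-style) or V*E + E*H (factorized)."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def transformer_layer_params(hidden, ffn):
    """Parameters in one standard Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)          # Q, K, V, output projections + biases
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * (2 * hidden)                      # two LayerNorms (gain + bias)
    return attention + feed_forward + layer_norms


def encoder_params(per_layer, num_layers, share_across_layers):
    """Cross-layer sharing stores one copy of the layer weights instead of num_layers."""
    return per_layer if share_across_layers else per_layer * num_layers


layer = transformer_layer_params(768, 3072)
print("embeddings, untied:", embedding_params(30_000, 768))
print("embeddings, factorized:", embedding_params(30_000, 768, 128))
print("encoder, 12 separate layers:", encoder_params(layer, 12, share_across_layers=False))
print("encoder, shared across layers:", encoder_params(layer, 12, share_across_layers=True))
```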


Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text

Introduction

  • Procedural text comprehension tasks focus on modeling the effect of actions...
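One simple way to make "the purpose of an action" concrete is to link each step to a later step that acts on an entity whose state it changed. The toy sketch below does this for a short process. Each step is annotated with entity state changes (before, after), and None means the entity does not exist. This representation, the example process and the linking heuristic are illustrative assumptions, not the paper's model.

```python
def dependency_links(steps):
    """Link step i to a later step j when j changes an entity whose state i last set.

    steps: list of (sentence, {entity: (before, after)}) pairs.
    Returns (i, j, entity) triples read as "step i happens so that step j can happen".
    """
    last_changed_by = {}
    links = []
    for j, (_, changes) in enumerate(steps):
        for entity in changes:
            if entity in last_changed_by:
                links.append((last_changed_by[entity], j, entity))
        for entity in changes:
            last_changed_by[entity] = j
    return links


steps = [  # hypothetical annotated process
    ("Roots absorb water from the soil.", {"water": ("soil", "root")}),
    ("The water travels up to the leaf.", {"water": ("root", "leaf")}),
    ("Sunlight and water are used to make sugar in the leaf.",
     {"water": ("leaf", None), "sugar": (None, "leaf")}),
]

for i, j, entity in dependency_links(steps):
    print(f"Step {i} ({steps[i][0]!r}) enables step {j} via {entity!r}")
```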


Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Introduction

  • The paper presents the MuZero algorithm that performs planning with a...
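MuZero's model consists of three learned functions. A representation function h maps an observation to a hidden state. A dynamics function g maps a hidden state and an action to a reward and the next hidden state. A prediction function f maps a hidden state to a policy and a value. Planning runs MCTS entirely in that hidden space. The sketch below keeps this structure but replaces the learned networks with hand-written toy functions on a 1-D line. It also swaps MCTS for an exhaustive depth-limited search. The goal position, rewards and values are hypothetical.

```python
from itertools import product

ACTIONS = (-1, 1)   # toy action set: step left or right on a line
GAMMA = 0.9         # illustrative discount factor
GOAL = 3            # hypothetical goal position


def representation(observation):
    # h: observation -> initial hidden state (toy: the observation itself)
    return observation


def dynamics(state, action):
    # g: (hidden state, action) -> (predicted reward, next hidden state)
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def prediction(state):
    # f: hidden state -> (policy prior, value estimate)
    policy = {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # would guide MCTS; unused below
    value = 0.5 if state == GOAL - 1 else 0.0
    return policy, value


def unroll_return(observation, actions):
    # Roll the model forward in hidden-state space without touching the environment.
    state = representation(observation)
    total, discount = 0.0, 1.0
    for a in actions:
        reward, state = dynamics(state, a)
        total += discount * reward
        discount *= GAMMA
    _, value = prediction(state)
    return total + discount * value  # bootstrap with the predicted value at the leaf


def plan(observation, depth=2):
    # Exhaustive depth-limited search as a stand-in for MuZero's MCTS.
    best = max(product(ACTIONS, repeat=depth),
               key=lambda seq: unroll_return(observation, seq))
    return best[0], unroll_return(observation, best)


print(plan(0))
print(plan(1))
```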


Contrastive Learning of Structured World Models

Introduction

  • The paper introduces Contrastively-trained Structured World Models (C-SWMs).

  • These...
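C-SWMs represent a state as K object embeddings, use a graph neural network to predict how an action shifts each embedding, and train with a contrastive, TransE-style hinge loss. The loss pulls z_t + T(z_t, a_t) toward z_{t+1} and pushes a negative state z̃ (sampled from elsewhere in the data) at least a margin γ away from z_{t+1}. Distances are squared Euclidean, summed over objects. The sketch below computes only the loss. The transition output T(z_t, a_t) is passed in precomputed, so the GNN is omitted, and the numbers are made up.

```python
def sq_dist(zs, ws):
    # Squared Euclidean distance summed over all K object slots.
    return sum((a - b) ** 2 for z, w in zip(zs, ws) for a, b in zip(z, w))


def add(zs, deltas):
    # Apply the per-object transition: z_t + T(z_t, a_t).
    return [[a + b for a, b in zip(z, d)] for z, d in zip(zs, deltas)]


def cswm_hinge_loss(z_t, delta_t, z_next, z_neg, margin=1.0):
    positive = sq_dist(add(z_t, delta_t), z_next)       # predicted vs. true next state
    negative = max(0.0, margin - sq_dist(z_neg, z_next))  # negative kept >= margin away
    return positive + negative


# Two objects with 2-D embeddings (hypothetical values).
z_t = [[0.0, 0.0], [1.0, 1.0]]
delta_t = [[1.0, 0.0], [0.0, 0.0]]   # stands in for T(z_t, a_t): only object 0 moves
z_next = [[1.0, 0.0], [1.0, 1.0]]
print(cswm_hinge_loss(z_t, delta_t, z_next, z_neg=[[1.0, 0.0], [1.0, 1.5]]))
print(cswm_hinge_loss(z_t, delta_t, z_next, z_neg=[[5.0, 5.0], [5.0, 5.0]]))
```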


Gossip-based Actor-Learner Architectures for Deep RL


How to train your MAML

Introduction

  • The paper proposes MAML++ - a modification of the MAML algorithm that...
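Two of the MAML++ changes fit in a small example. The first is per-step learning rates for the inner loop; in the paper they are learned per layer and per step. The second is a multi-step loss that scores the target set after every inner step, not only after the last one. The sketch below shows both on a hypothetical 1-D linear-regression task. The learning rates and step weights are fixed, made-up numbers, and the outer (meta) update that would learn them is omitted.

```python
def mse_and_grad(w, xs, ys):
    """Mean squared error of y_hat = w * x and its derivative with respect to w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad


def inner_loop_msl(w0, support, target, step_lrs, step_weights):
    """Adapt w on the support set with one learning rate per inner step and return
    (adapted w, multi-step loss = weighted sum of target losses after every step)."""
    w, msl = w0, 0.0
    for lr, v in zip(step_lrs, step_weights):
        _, g = mse_and_grad(w, *support)
        w = w - lr * g
        target_loss, _ = mse_and_grad(w, *target)
        msl += v * target_loss
    return w, msl


support = ([1.0, 2.0], [2.0, 4.0])   # hypothetical task: y = 2x
target = ([3.0], [6.0])
w, msl = inner_loop_msl(0.0, support, target, step_lrs=[0.1, 0.05], step_weights=[0.3, 0.7])
print(w, msl)
```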


PHYRE - A New Benchmark for Physical Reasoning

Introduction

  • The paper proposes the PHYRE (PHYsical REasoning) benchmark - consisting of...
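PHYRE's main metric, AUCCESS, rewards agents that solve tasks in few attempts. Let s_k be the fraction of tasks solved within k attempts. AUCCESS is the weighted average of s_k for k = 1..100, with weights w_k = log(k + 1) - log(k), so early attempts count the most. The sketch below computes it from the number of attempts each task needed. It reports a fraction in [0, 1], not a percentage, and the function name is hypothetical.

```python
import math


def auccess(attempts_needed, max_attempts=100):
    """attempts_needed[i]: attempts used to solve task i, or None if it was never solved."""
    n = len(attempts_needed)
    ks = range(1, max_attempts + 1)
    weights = [math.log(k + 1) - math.log(k) for k in ks]
    total = 0.0
    for k, w in zip(ks, weights):
        s_k = sum(1 for a in attempts_needed if a is not None and a <= k) / n
        total += w * s_k
    return total / sum(weights)


print(auccess([1, 1, 1]))        # every task solved on the first attempt
print(auccess([1, 10, None]))    # mixed results
print(auccess([100]))            # solved only on the last allowed attempt
```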