I am trying a new initiative: *A Paper A Week*. This blog holds all the notes and summaries.

© 2020. All rights reserved.

- 2002 2
- 2006 1
- 2010 1
- 2014 4
- 2015 7
- 2016 9
- 2017 29
- 2018 30
- 2019 29
- 2020 5
- AAAI 1
- AAAI 2018 1
- AAMAS 1
- AAMAS 2019 1
- ACL 4
- ACL 2015 1
- ACL 2016 1
- ACL 2017 2
- AI 110
- Abductive Reasoning 1
- Abstract Summarization 1
- Accelerated Training 1
- Activation 1
- Activation Function 1
- Adversarial 1
- Adversarial Robustness 2
- Attention 6
- BN 1
- Batch Normalization 1
- BatchNorm 1
- Benchmark 1
- CL 5
- CV 14
- CVPR 2
- CVPR 2016 1
- CVPR 2017 1
- Calibration 1
- Catastrophic Forgetting 5
- Causal Learning 2
- Causality 2
- Chemistry 1
- Classifier 1
- Clustering 2
- Compositionality 2
- Conditional Computation 1
- Continual Learning 6
- Contrastive 2
- Contrastive Learning 2
- Conversational Agent 1
- Count Based VQA 1
- Credit Assignment 1
- Curriculum Learning 1
- DRL 10
- Data Augmentation 2
- Dataset 8
- Decentralized Reinforcement Learning 1
- Deep Reinforcement Learning 11
- Dependency Parsing 1
- Distributed Computing 2
- Distributed Reinforcement Learning 1
- Distributed SGD 1
- Dynamical System 1
- EBM 1
- ECCV 1
- ECCV 2010 1
- EMNLP 6
- EMNLP 2014 1
- EMNLP 2016 2
- EMNLP 2017 2
- EMNLP 2019 1
- ERM 1
- Economics 1
- Embedding 12
- Emergent Language 1
- Empirical 2
- Empirical Advice 4
- Energy-Based Models 1
- Entropy 1
- Environment 2
- Evaluating Generalization 3
- Evaluation 3
- Exploration 1
- Factorization 1
- Finetuning 1
- GNN 6
- Gating 1
- Generalization 5
- Generative Models 2
- Gradient Manipulation 2
- Gradient Normalization 1
- Graph 17
- Graph Neural Network 5
- Graph Representation 11
- Grounded Language Learning 1
- HRL 2
- Hierarchical RNN 1
- Hierarchical RL 1
- Hierarchical Reinforcement Learning 2
- Hybrid Models 1
- Hyperbolic Embedding 2
- Hyperboloid Model 1
- Hypothesis 1
- ICCV 1
- ICCV 2015 1
- ICLR 21
- ICLR 2014 1
- ICLR 2015 1
- ICLR 2016 1
- ICLR 2017 1
- ICLR 2018 5
- ICLR 2019 8
- ICLR 2020 5
- ICML 12
- ICML 2016 1
- ICML 2017 1
- ICML 2018 6
- ICML 2019 3
- ICML 2020 2
- IRL 1
- ImageNet 3
- Incremental Learning 2
- Information Retrieval 2
- Information Theory 1
- Initialization 1
- Interactive Teaching 1
- Inverse Reinforcement Learning 1
- KD 1
- KDD 2
- KDD 2015 1
- KDD 2017 1
- KRU 1
- Kernel 1
- Key Value 1
- Knowledge Distillation 1
- Knowledge Transfer 2
- Kronecker 1
- LL 2
- LR 1
- Latent Variable 1
- Learning Optimizer 1
- Learning Rate 1
- Lifelong Learning 6
- Linear Algebra 1
- Linear Model 1
- Long-tailed Dataset 1
- Loss 2
- Loss Function 2
- MAML 3
- MANN 1
- MDP 2
- MPNN 1
- Machine Comprehension 4
- Markov Decision Process 2
- Matrix 1
- Matrix Factorization 1
- Memory 3
- Memory Augmented Neural Network 1
- Message Passing 1
- Meta Learning 8
- Meta Reinforcement Learning 1
- Mixture of Experts 1
- Model-Based 5
- Model-Free 3
- Modular ML 1
- Modular Meta Learning 1
- Modular Network 1
- Module 1
- Motif 2
- Mujoco 1
- Multi Domain 1
- Multi Modal 2
- Multi Model 1
- Multi Task 5
- Multi-Agent 1
- NIPS 7
- NIPS 2014 2
- NIPS 2015 2
- NIPS 2017 3
- NIPS Workshop 1
- NLG 1
- NLI 1
- NLP 40
- NMT 2
- Natural Language Inference 2
- Natural Language Processing 11
- Network 3
- Network Embedding 1
- NeurIPS 5
- NeurIPS 2018 3
- NeurIPS 2019 3
- NeurIPS Workshop 2018 1
- Neural Computation 1
- Neural Computation 2002 1
- Neural Machine Translation 1
- Neural Message Passing 1
- Neural Module Network 1
- Normalization 1
- Object-Oriented Learning 1
- Off policy RL 2
- One shot learning 1
- Online Learning 1
- Optimizer 1
- Out of Distribution 2
- Out of Distribution Detection 1
- Out of Vocabulary Words 1
- Outlier Detection 1
- POS 1
- Physical Reasoning 1
- Physics 2
- Planning 2
- Poincaré Ball Model 2
- Pointer Network 1
- Pooling 1
- Pretraining 2
- Procedural Text 1
- Pruning Network 1
- QA 7
- RL 26
- RNN 4
- RRL 1
- Reasoning 3
- Recurrent Neural Network 2
- Reinforcement Learning 20
- Relation Learning 3
- Relational Inference 1
- Relational Learning 5
- Relational Network 1
- Replay Buffer 1
- Representation Learning 4
- Robustness 2
- SAT 1
- SGD 1
- SOTA 9
- SWA 1
- Sample Efficient 2
- Scale 1
- Science 2
- Science 2002 1
- Science 2016 1
- Self Gated 1
- Self Supervised 1
- Semantic Loss 1
- Sentiment Analysis 1
- Seq2Seq 1
- Sequential models 1
- Set 1
- Softmax 2
- Speech 1
- State Abstraction 1
- Stochastic Gradient Descent 1
- Structured Exploration 1
- Summarization 1
- Symbolic Knowledge 1
- Synchronous SGD 1
- Theory 1
- Transfer Learning 5
- Transformer 2
- Tree 1
- Tucker Decomposition 1
- UAI 1
- UAI 2018 1
- Unsupervised 4
- VAE 1
- VQA 6
- Virtual Embodiment 1
- WACV 1
- WACV 2017 1
- Weight Adaptation 1
- Word Vectors 3
- Workshop 2

- » Multiple Model-Based Reinforcement Learning
- » Network Motifs - Simple Building Blocks of Complex Networks

- » Towards a Unified Theory of State Abstraction for MDPs

- » What Does Classifying More Than 10,000 Image Categories Tell Us?

- » An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
- » How transferable are features in deep neural networks?
- » Distilling the Knowledge in a Neural Network
- » A Fast and Accurate Dependency Parser using Neural Networks

- » Exploring Models and Data for Image Question Answering
- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » Word Representations via Gaussian Embedding
- » Pointer Networks
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- » Simple Baseline for Visual Question Answering
- » VQA - Visual Question Answering

- » One-shot Learning with Memory-Augmented Neural Networks
- » Net2Net - Accelerating Learning via Knowledge Transfer
- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- » Revisiting Semi-Supervised Learning with Graph Embeddings
- » Higher-order organization of complex networks
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
- » A Decomposable Attention Model for Natural Language Inference
- » Neural Module Networks

- » Outrageously Large Neural Networks - The Sparsely-Gated Mixture-of-Experts Layer
- » GradNorm - Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- » mixup - Beyond Empirical Risk Minimization
- » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » Hindsight Experience Replay
- » Learned Optimizers that Scale and Generalize
- » Poincaré Embeddings for Learning Hierarchical Representations
- » HoME - a Household Multimodal Environment
- » Imagination-Augmented Agents for Deep Reinforcement Learning
- » Neural Message Passing for Quantum Chemistry
- » Unsupervised Learning by Predicting Noise
- » Cyclical Learning Rates for Training Neural Networks
- » Get To The Point - Summarization with Pointer-Generator Networks
- » StarSpace - Embed All The Things!
- » Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » HARP - Hierarchical Representation Learning for Networks
- » Swish - a Self-Gated Activation Function
- » Reading Wikipedia to Answer Open-Domain Questions
- » Task-Oriented Query Reformulation with Reinforcement Learning
- » Refining Source Representations with Relation Networks for Neural Machine Translation
- » Learning to Compute Word Embeddings On the Fly
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension
- » Principled Detection of Out-of-Distribution Examples in Neural Networks
- » One Model To Learn Them All
- » Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering
- » Conditional Similarity Networks

- » Deep Reinforcement Learning and the Deadly Triad
- » Averaging Weights Leads to Wider Optima and Better Generalization
- » Competitive Training of Mixtures of Independent Deep Generative Models
- » How to train your MAML
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
- » Measuring abstract reasoning in neural networks
- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Relational Reinforcement Learning
- » Towards a natural benchmark for continual learning
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » Modular meta-learning
- » Pre-training Graph Neural Networks with Kernels
- » Smooth Loss Functions for Deep Top-k Classification
- » Representation Tradeoffs for Hyperbolic Embeddings
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » When Recurrent Models Don’t Need To Be Recurrent
- » Emergence of Grounded Compositional Language in Multi-Agent Populations
- » A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Kronecker Recurrent Units
- » Learning Independent Causal Mechanisms
- » Memory-based Parameter Adaptation
- » Born Again Neural Networks
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » The Lottery Ticket Hypothesis - Training Pruned Neural Networks
- » Learning a SAT Solver from Single-Bit Supervision
- » Neural Relational Inference for Interacting Systems

- » Gradient Surgery for Multi-Task Learning
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » On the Difficulty of Warm-Starting Neural Network Training
- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Gradient based sample selection for online continual learning
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » Observational Overfitting in Reinforcement Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- » Superposition of many models into one
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Contrastive Learning of Structured World Models
- » Gossip-based Actor-Learner Architectures for Deep RL
- » PHYRE - A New Benchmark for Physical Reasoning
- » Large Memory Layers with Product Keys
- » Abductive Commonsense Reasoning
- » Hamiltonian Neural Networks
- » Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- » Good-Enough Compositional Data Augmentation
- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » TuckER - Tensor Factorization for Knowledge Graph Completion
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- » Efficient Lifelong Learning with A-GEM

- » Alpha Net - Adaptation with Composition in Classifier Space
- » TaskNorm - Rethinking Batch Normalization for Meta-Learning
- » Decentralized Reinforcement Learning - Global Decision-Making via Local Economic Transactions
- » Supervised Contrastive Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning

- » Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- » Get To The Point - Summarization with Pointer-Generator Networks
- » Reading Wikipedia to Answer Open-Domain Questions
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems

- » Get To The Point - Summarization with Pointer-Generator Networks
- » Reading Wikipedia to Answer Open-Domain Questions

- » Deep Reinforcement Learning and the Deadly Triad
- » Alpha Net--Adaptation with Composition in Classifier Space
- » Outrageously Large Neural Networks--The Sparsely-Gated Mixture-of-Experts Layer
- » Gradient Surgery for Multi-Task Learning
- » GradNorm--Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- » TaskNorm--Rethinking Batch Normalization for Meta-Learning
- » Averaging Weights leads to Wider Optima and Better Generalization
- » Decentralized Reinforcement Learning -- Global Decision-Making via Local Economic Transactions
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » On the Difficulty of Warm-Starting Neural Network Training
- » Supervised Contrastive Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Competitive Training of Mixtures of Independent Deep Generative Models
- » What Does Classifying More Than 10,000 Image Categories Tell Us?
- » mixup - Beyond Empirical Risk Minimization
- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Gradient based sample selection for online continual learning
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » Observational Overfitting in Reinforcement Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
- » Superposition of many models into one
- » Towards a Unified Theory of State Abstraction for MDPs
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Contrastive Learning of Structured World Models
- » Gossip based Actor-Learner Architectures for Deep RL
- » How to train your MAML
- » PHYRE - A New Benchmark for Physical Reasoning
- » Large Memory Layers with Product Keys
- » Abductive Commonsense Reasoning
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
- » Measuring abstract reasoning in neural networks
- » Hamiltonian Neural Networks
- » Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Relational Reinforcement Learning
- » Good-Enough Compositional Data Augmentation
- » Multiple Model-Based Reinforcement Learning
- » Towards a natural benchmark for continual learning
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » TuckER - Tensor Factorization for Knowledge Graph Completion
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- » Efficient Lifelong Learning with A-GEM
- » Pre-training Graph Neural Networks with Kernels
- » Smooth Loss Functions for Deep Top-k Classification
- » Hindsight Experience Replay
- » Representation Tradeoffs for Hyperbolic Embeddings
- » Learned Optimizers that Scale and Generalize
- » One-shot Learning with Memory-Augmented Neural Networks
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Poincaré Embeddings for Learning Hierarchical Representations
- » When Recurrent Models Don’t Need To Be Recurrent
- » HoME - a Household Multimodal Environment
- » Emergence of Grounded Compositional Language in Multi-Agent Populations
- » A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Imagination-Augmented Agents for Deep Reinforcement Learning
- » Kronecker Recurrent Units
- » Learning Independent Causal Mechanisms
- » Memory-based Parameter Adaptation
- » Born Again Neural Networks
- » Net2Net-Accelerating Learning via Knowledge Transfer
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » Neural Message Passing for Quantum Chemistry
- » Unsupervised Learning by Predicting Noise
- » The Lottery Ticket Hypothesis - Training Pruned Neural Networks
- » Cyclical Learning Rates for Training Neural Networks
- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
- » Learning an SAT Solver from Single-Bit Supervision
- » Neural Relational Inference for Interacting Systems
- » Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- » Get To The Point - Summarization with Pointer-Generator Networks
- » Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
- » Exploring Models and Data for Image Question Answering
- » How transferable are features in deep neural networks
- » Distilling the Knowledge in a Neural Network
- » Revisiting Semi-Supervised Learning with Graph Embeddings
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Word Representations via Gaussian Embedding
- » HARP - Hierarchical Representation Learning for Networks
- » Swish - a Self-Gated Activation Function
- » Reading Wikipedia to Answer Open-Domain Questions
- » Task-Oriented Query Reformulation with Reinforcement Learning
- » Refining Source Representations with Relation Networks for Neural Machine Translation
- » Pointer Networks
- » Learning to Compute Word Embeddings On the Fly
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension
- » Principled Detection of Out-of-Distribution Examples in Neural Networks
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
- » One Model To Learn Them All
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- » A Decomposable Attention Model for Natural Language Inference
- » Neural Module Networks
- » Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering
- » Conditional Similarity Networks
- » Simple Baseline for Visual Question Answering
- » VQA-Visual Question Answering

- » mixup - Beyond Empirical Risk Minimization
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Large Memory Layers with Product Keys
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
- » One Model To Learn Them All
- » A Decomposable Attention Model for Natural Language Inference

- » Gradient based sample selection for online continual learning
- » Superposition of many models into one
- » Towards a natural benchmark for continual learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Efficient Lifelong Learning with A-GEM

- » What Does Classifying More Than 10,000 Image Categories Tell Us?
- » Efficient Lifelong Learning with A-GEM
- » Net2Net-Accelerating Learning via Knowledge Transfer
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » Unsupervised Learning by Predicting Noise
- » Exploring Models and Data for Image Question Answering
- » How transferable are features in deep neural networks
- » Principled Detection of Out-of-Distribution Examples in Neural Networks
- » One Model To Learn Them All
- » Neural Module Networks
- » Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering
- » Conditional Similarity Networks
- » Simple Baseline for Visual Question Answering
- » VQA-Visual Question Answering

- » Gradient based sample selection for online continual learning
- » Towards a natural benchmark for continual learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Efficient Lifelong Learning with A-GEM
- » An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

- » Competitive Training of Mixtures of Independent Deep Generative Models
- » Learning Independent Causal Mechanisms

- » Competitive Training of Mixtures of Independent Deep Generative Models
- » Learning Independent Causal Mechanisms

- » Competitive Training of Mixtures of Independent Deep Generative Models
- » Higher-order organization of complex networks

- » Alpha Net--Adaptation with Composition in Classifier Space
- » Good-Enough Compositional Data Augmentation

- » Gradient based sample selection for online continual learning
- » Superposition of many models into one
- » Towards a natural benchmark for continual learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Efficient Lifelong Learning with A-GEM
- » Memory-based Parameter Adaptation

- » Supervised Contrastive Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning

- » Supervised Contrastive Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning

- » Deep Reinforcement Learning and the Deadly Triad
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Observational Overfitting in Reinforcement Learning
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Gossip based Actor-Learner Architectures for Deep RL
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » PHYRE - A New Benchmark for Physical Reasoning
- » Abductive Commonsense Reasoning
- » Exploring Models and Data for Image Question Answering
- » Reading Wikipedia to Answer Open-Domain Questions
- » Neural Module Networks
- » VQA-Visual Question Answering

- » Deep Reinforcement Learning and the Deadly Triad
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Observational Overfitting in Reinforcement Learning
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Gossip based Actor-Learner Architectures for Deep RL
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning
- » Relational Reinforcement Learning

- » Outrageously Large Neural Networks--The Sparsely-Gated Mixture-of-Experts Layer
- » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Task-Oriented Query Reformulation with Reinforcement Learning
- » A Decomposable Attention Model for Natural Language Inference
- » A Fast and Accurate Dependency Parser using Neural Networks

- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » A Decomposable Attention Model for Natural Language Inference

- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Task-Oriented Query Reformulation with Reinforcement Learning

- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » TuckER - Tensor Factorization for Knowledge Graph Completion
- » Representation Tradeoffs for Hyperbolic Embeddings
- » Poincaré Embeddings for Learning Hierarchical Representations
- » Unsupervised Learning by Predicting Noise
- » StarSpace - Embed All The Things!
- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » Revisiting Semi-Supervised Learning with Graph Embeddings
- » HARP - Hierarchical Representation Learning for Networks
- » Learning to Compute Word Embeddings On the Fly
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- » Conditional Similarity Networks

- » Deep Reinforcement Learning and the Deadly Triad
- » On the Difficulty of Warm-Starting Neural Network Training

- » Deep Reinforcement Learning and the Deadly Triad
- » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
- » How to train your MAML
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

- » Quantifying Generalization in Reinforcement Learning
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop

- » Observational Overfitting in Reinforcement Learning
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning

- » Observational Overfitting in Reinforcement Learning
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning

- » Contrastive Learning of Structured World Models
- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » Pre-training Graph Neural Networks with Kernels
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Learning an SAT Solver from Single-Bit Supervision
- » Neural Relational Inference for Interacting Systems

- » Averaging Weights leads to Wider Optima and Better Generalization
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » Observational Overfitting in Reinforcement Learning
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning

- » Competitive Training of Mixtures of Independent Deep Generative Models
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

- » Gradient Surgery for Multi-Task Learning
- » GradNorm--Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Contrastive Learning of Structured World Models
- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » TuckER - Tensor Factorization for Knowledge Graph Completion
- » Pre-training Graph Neural Networks with Kernels
- » Representation Tradeoffs for Hyperbolic Embeddings
- » Poincaré Embeddings for Learning Hierarchical Representations
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Neural Message Passing for Quantum Chemistry
- » Learning an SAT Solver from Single-Bit Supervision
- » Neural Relational Inference for Interacting Systems
- » StarSpace - Embed All The Things!
- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » Revisiting Semi-Supervised Learning with Graph Embeddings
- » Higher-order organization of complex networks
- » Network Motifs - Simple Building Blocks of Complex Networks
- » HARP - Hierarchical Representation Learning for Networks

- » Contrastive Learning of Structured World Models
- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » Pre-training Graph Neural Networks with Kernels
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Learning an SAT Solver from Single-Bit Supervision

- » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- » TuckER - Tensor Factorization for Knowledge Graph Completion
- » Pre-training Graph Neural Networks with Kernels
- » Representation Tradeoffs for Hyperbolic Embeddings
- » Poincaré Embeddings for Learning Hierarchical Representations
- » Hierarchical Graph Representation Learning with Differentiable Pooling
- » Neural Message Passing for Quantum Chemistry
- » Neural Relational Inference for Interacting Systems
- » StarSpace - Embed All The Things!
- » Revisiting Semi-Supervised Learning with Graph Embeddings
- » HARP - Hierarchical Representation Learning for Networks

- » Decentralized Reinforcement Learning -- Global Decision-Making via Local Economic Transactions
- » Model Primitive Hierarchical Lifelong Reinforcement Learning

- » Decentralized Reinforcement Learning -- Global Decision-Making via Local Economic Transactions
- » Model Primitive Hierarchical Lifelong Reinforcement Learning

- » Representation Tradeoffs for Hyperbolic Embeddings
- » Poincaré Embeddings for Learning Hierarchical Representations

- » Outrageously Large Neural Networks--The Sparsely-Gated Mixture-of-Experts Layer
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » mixup - Beyond Empirical Risk Minimization
- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- » Observational Overfitting in Reinforcement Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » How to train your MAML
- » Measuring abstract reasoning in neural networks
- » Relational Reinforcement Learning
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- » Efficient Lifelong Learning with A-GEM
- » Smooth Loss Functions for Deep Top-k Classification
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Memory-based Parameter Adaptation
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
- » Word Representations via Gaussian Embedding

- » Smooth Loss Functions for Deep Top-k Classification
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Memory-based Parameter Adaptation
- » Learning to Count Objects in Natural Images for Visual Question Answering

- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » How to train your MAML
- » Measuring abstract reasoning in neural networks
- » Relational Reinforcement Learning
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- » Efficient Lifelong Learning with A-GEM

- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- » Observational Overfitting in Reinforcement Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

- » GradNorm - Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- » TaskNorm - Rethinking Batch Normalization for Meta-Learning
- » Decentralized Reinforcement Learning - Global Decision-Making via Local Economic Transactions
- » Quantifying Generalization in Reinforcement Learning
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
- » Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- » Learned Optimizers that Scale and Generalize
- » A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- » Kronecker Recurrent Units
- » Learning Independent Causal Mechanisms
- » Born Again Neural Networks
- » Revisiting Semi-Supervised Learning with Graph Embeddings

- » GradNorm - Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- » Representation Tradeoffs for Hyperbolic Embeddings
- » A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- » Kronecker Recurrent Units
- » Learning Independent Causal Mechanisms
- » Born Again Neural Networks

- » Quantifying Generalization in Reinforcement Learning
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks

- » Supervised Contrastive Learning
- » What Does Classifying More Than 10,000 Image Categories Tell Us?
- » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour

- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » Task-Oriented Query Reformulation with Reinforcement Learning

- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension

- » Gradient based sample selection for online continual learning
- » Superposition of many models into one

- » Gradient based sample selection for online continual learning
- » Superposition of many models into one
- » Towards a natural benchmark for continual learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Efficient Lifelong Learning with A-GEM
- » Net2Net - Accelerating Learning via Knowledge Transfer

- » Smooth Loss Functions for Deep Top-k Classification
- » A Semantic Loss Function for Deep Learning with Symbolic Knowledge

- » TaskNorm - Rethinking Batch Normalization for Meta-Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- » How to train your MAML

- » Observational Overfitting in Reinforcement Learning
- » Towards a Unified Theory of State Abstraction for MDPs

- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Reading Wikipedia to Answer Open-Domain Questions
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension

- » Large Memory Layers with Product Keys
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » One-shot Learning with Memory-Augmented Neural Networks

- » TaskNorm - Rethinking Batch Normalization for Meta-Learning
- » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- » How to train your MAML
- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » Modular meta-learning
- » Learned Optimizers that Scale and Generalize
- » One-shot Learning with Memory-Augmented Neural Networks

- » When to use parametric models in reinforcement learning?
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Multiple Model-Based Reinforcement Learning
- » Imagination-Augmented Agents for Deep Reinforcement Learning

- » When to use parametric models in reinforcement learning?
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Imagination-Augmented Agents for Deep Reinforcement Learning

- » Higher-order organization of complex networks
- » Network Motifs - Simple Building Blocks of Complex Networks

- » Gradient Surgery for Multi-Task Learning
- » GradNorm - Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » StarSpace - Embed All The Things!

- » Hindsight Experience Replay
- » HoME - a Household Multimodal Environment
- » Imagination-Augmented Agents for Deep Reinforcement Learning
- » Exploring Models and Data for Image Question Answering
- » How transferable are features in deep neural networks?
- » Distilling the Knowledge in a Neural Network
- » Pointer Networks

- » How transferable are features in deep neural networks?
- » Distilling the Knowledge in a Neural Network

- » Hindsight Experience Replay
- » HoME - a Household Multimodal Environment
- » Imagination-Augmented Agents for Deep Reinforcement Learning

- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Large Memory Layers with Product Keys
- » Abductive Commonsense Reasoning
- » Good-Enough Compositional Data Augmentation
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Poincaré Embeddings for Learning Hierarchical Representations
- » When Recurrent Models Don’t Need To Be Recurrent
- » Emergence of Grounded Compositional Language in Multi-Agent Populations
- » Kronecker Recurrent Units
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- » Get To The Point - Summarization with Pointer-Generator Networks
- » StarSpace - Embed All The Things!
- » Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
- » Exploring Models and Data for Image Question Answering
- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Word Representations via Gaussian Embedding
- » Reading Wikipedia to Answer Open-Domain Questions
- » Task-Oriented Query Reformulation with Reinforcement Learning
- » Refining Source Representations with Relation Networks for Neural Machine Translation
- » Pointer Networks
- » Learning to Compute Word Embeddings On the Fly
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
- » One Model To Learn Them All
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- » A Decomposable Attention Model for Natural Language Inference
- » A Fast and Accurate Dependency Parser using Neural Networks
- » Neural Module Networks
- » Simple Baseline for Visual Question Answering
- » VQA - Visual Question Answering

- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » Refining Source Representations with Relation Networks for Neural Machine Translation

- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Large Memory Layers with Product Keys
- » Abductive Commonsense Reasoning
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Poincaré Embeddings for Learning Hierarchical Representations
- » When Recurrent Models Don’t Need To Be Recurrent
- » Emergence of Grounded Compositional Language in Multi-Agent Populations

- » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
- » Higher-order organization of complex networks
- » Network Motifs - Simple Building Blocks of Complex Networks

- » Gradient based sample selection for online continual learning
- » Meta-Reinforcement Learning of Structured Exploration Strategies

- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Towards a natural benchmark for continual learning

- » When to use parametric models in reinforcement learning?
- » Gossip-based Actor-Learner Architectures for Deep RL
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

- » When to use parametric models in reinforcement learning?
- » Gossip-based Actor-Learner Architectures for Deep RL

- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- » Principled Detection of Out-of-Distribution Examples in Neural Networks

- » When to use parametric models in reinforcement learning?
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

- » Representation Tradeoffs for Hyperbolic Embeddings
- » Poincaré Embeddings for Learning Hierarchical Representations

- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » Pre-training Graph Neural Networks with Kernels

- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- » Reading Wikipedia to Answer Open-Domain Questions
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing

- » Deep Reinforcement Learning and the Deadly Triad
- » Decentralized Reinforcement Learning - Global Decision-Making via Local Economic Transactions
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Observational Overfitting in Reinforcement Learning
- » Towards a Unified Theory of State Abstraction for MDPs
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Gossip-based Actor-Learner Architectures for Deep RL
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning
- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Relational Reinforcement Learning
- » Multiple Model-Based Reinforcement Learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- » Hindsight Experience Replay
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- » Imagination-Augmented Agents for Deep Reinforcement Learning
- » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- » Task-Oriented Query Reformulation with Reinforcement Learning
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension

- » Linguistic Knowledge as Memory for Recurrent Neural Networks
- » Learned Optimizers that Scale and Generalize
- » When Recurrent Models Don’t Need To Be Recurrent
- » Kronecker Recurrent Units

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » PHYRE - A New Benchmark for Physical Reasoning
- » Abductive Commonsense Reasoning

- » Deep Reinforcement Learning and the Deadly Triad
- » Decentralized Reinforcement Learning - Global Decision-Making via Local Economic Transactions
- » When to use parametric models in reinforcement learning?
- » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Observational Overfitting in Reinforcement Learning
- » Towards a Unified Theory of State Abstraction for MDPs
- » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- » Gossip-based Actor-Learner Architectures for Deep RL
- » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- » Assessing Generalization in Deep Reinforcement Learning
- » Quantifying Generalization in Reinforcement Learning
- » Meta-Reinforcement Learning of Structured Exploration Strategies
- » Relational Reinforcement Learning
- » Multiple Model-Based Reinforcement Learning
- » Model Primitive Hierarchical Lifelong Reinforcement Learning
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Hindsight Experience Replay
- » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
- » Measuring abstract reasoning in neural networks

- » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
- » Contrastive Learning of Structured World Models
- » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
- » Measuring abstract reasoning in neural networks
- » Relational Reinforcement Learning

- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » StarSpace - Embed All The Things!
- » Word Representations via Gaussian Embedding
- » Refining Source Representations with Relation Networks for Neural Machine Translation

- » mixup - Beyond Empirical Risk Minimization
- » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- » Learning to Count Objects in Natural Images for Visual Question Answering
- » Get To The Point - Summarization with Pointer-Generator Networks
- » HARP - Hierarchical Representation Learning for Networks
- » Swish - a Self-Gated Activation Function
- » R-NET - Machine Reading Comprehension with Self-matching Networks
- » ReasoNet - Learning to Stop Reading in Machine Comprehension
- » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
- » A Decomposable Attention Model for Natural Language Inference

- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Hindsight Experience Replay

- » Higher-order organization of complex networks
- » Network Motifs - Simple Building Blocks of Complex Networks

- » Alpha Net - Adaptation with Composition in Classifier Space
- » On the Difficulty of Warm-Starting Neural Network Training
- » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- » How transferable are features in deep neural networks
- » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension

- » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations

- » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- » Meta-Learning Update Rules for Unsupervised Representation Learning
- » Diversity is All You Need - Learning Skills without a Reward Function
- » Unsupervised Learning by Predicting Noise

- » Learning to Count Objects in Natural Images for Visual Question Answering
- » Exploring Models and Data for Image Question Answering
- » Neural Module Networks
- » Simple Baseline for Visual Question Answering
- » VQA - Visual Question Answering

- » StarSpace - Embed All The Things!
- » Word Representations via Gaussian Embedding
- » Two/Too Simple Adaptations of Word2Vec for Syntax Problems