Zero-shot Learning by Generating Task-specific Adapters
01 Feb 2021

Introduction
- The paper introduces HYPTER - a framework for zero-shot learning (ZSL) in text-to-text transformer models that trains a HyperNetwork to generate task-specific adapters from task descriptions.
- The focus is on in-task zero-shot learning (e.g., learning to predict an unseen class or relation) and not on cross-task learning (e.g., training on sentiment analysis and evaluating on a question-answering task).
Terminology
- Task - an NLP task, like classification or question answering.
- Sub-task
  - A class/relation/question within a task.
  - Denoted by a tuple $(d, D)$ where $d$ is the language description and $D$ is the sub-task's dataset (see the sketch below).
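For concreteness, a sub-task could be represented with a minimal data structure like the one below; the field names are illustrative and not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SubTask:
    description: str                # d: natural-language description of the sub-task
    dataset: List[Tuple[str, str]]  # D: (input text, target text) pairs for the sub-task
```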
Setup
- Develop a ZSL approach that transfers to new sub-tasks within a task, using the task description available for each sub-task.
Approach
- HYPTER has two main parts:
  - Main network
    - A pretrained text-to-text network.
    - Instantiated as a BART-Base/Large model.
  - HyperNetwork
    - Generates the weights for adapter networks that are plugged into the main network.
    - The HyperNetwork itself has two parts (a sketch follows this list):
      - Encoder
        - Encodes the task description.
        - Instantiated as a RoBERTa-Base model.
      - Decoder
        - Decodes the encoding into weights for multiple adapters (in parallel).
        - Instantiated as a feedforward network.
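A minimal sketch of how the HyperNetwork could work, assuming a RoBERTa-Base encoder and one feedforward decoder head per adapter; the class name, dimensions, and the bottleneck-adapter parameterization are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class AdapterGenerator(nn.Module):
    """Hypothetical HyperNetwork: encodes a task description, decodes adapter weights."""

    def __init__(self, d_model: int = 768, bottleneck: int = 64, n_adapters: int = 12):
        super().__init__()
        self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
        self.encoder = RobertaModel.from_pretrained("roberta-base")  # encodes the description
        # One feedforward decoder head per adapter; each emits the flattened
        # down- and up-projection matrices of a bottleneck adapter.
        out_dim = 2 * d_model * bottleneck
        self.decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, out_dim))
            for _ in range(n_adapters)
        ])
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, description: str):
        tokens = self.tokenizer(description, return_tensors="pt")
        summary = self.encoder(**tokens).last_hidden_state[:, 0]  # first-token summary vector
        adapters = []
        for decoder in self.decoders:  # each head decodes one adapter's weights
            flat = decoder(summary).squeeze(0)
            split = self.d_model * self.bottleneck
            down = flat[:split].view(self.bottleneck, self.d_model)
            up = flat[split:].view(self.d_model, self.bottleneck)
            adapters.append((down, up))
        return adapters

def adapter_forward(hidden: torch.Tensor, down: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    """Bottleneck adapter on a hidden state: down-project, nonlinearity, up-project, residual."""
    return hidden + torch.relu(hidden @ down.T) @ up.T
```

The generated weight matrices would then be inserted as bottleneck adapters (as in `adapter_forward`) inside the layers of the frozen main text-to-text network.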
- The model trains in two phases (see the sketch below):
  - The main network is trained on all the data by concatenating the task description with the input.
  - The adapters are trained by sampling a sub-task from the train set and generating its adapter weights with the HyperNetwork, while keeping the main network frozen.
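A rough sketch of the two training phases, assuming a BART-Base main network and the hypothetical `SubTask` and `AdapterGenerator` classes sketched above; `run_with_adapters` is a placeholder for inserting the generated weights into the frozen main network, not a real API.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
main_network = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
main_optimizer = torch.optim.AdamW(main_network.parameters(), lr=3e-5)

# Phase 1: fine-tune the main network on all sub-tasks, with the task
# description concatenated to each input.
def phase1_step(subtask, x: str, y: str):
    batch = tokenizer(subtask.description + " " + x, return_tensors="pt", truncation=True)
    labels = tokenizer(y, return_tensors="pt", truncation=True).input_ids
    loss = main_network(**batch, labels=labels).loss
    loss.backward()
    main_optimizer.step()
    main_optimizer.zero_grad()

# Phase 2: freeze the main network and train only the HyperNetwork by
# sampling sub-tasks and generating their adapters from the description.
for p in main_network.parameters():
    p.requires_grad = False

hypter = AdapterGenerator()
hyper_optimizer = torch.optim.AdamW(hypter.parameters(), lr=1e-4)

def phase2_step(subtask, x: str, y: str):
    adapters = hypter(subtask.description)  # adapter weights generated from the description
    # run_with_adapters (hypothetical) would plug the generated bottleneck adapters
    # into the frozen main network and return the seq2seq loss on (x, y).
    loss = run_with_adapters(main_network, adapters, tokenizer,
                             subtask.description + " " + x, y)
    loss.backward()  # gradients update only the HyperNetwork
    hyper_optimizer.step()
    hyper_optimizer.zero_grad()
```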
Experiments
- While the idea is very promising and interesting, the evaluation felt quite limited. It uses just two datasets, Zero-shot Learning from Task Descriptions and Zero-shot Relation Extraction, and shows some improvements over the baseline of directly finetuning with task descriptions as the prompt.