Papers I Read: Notes and Summaries

Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory

  • The paper proposes ECM (Emotional Chatting Machine), a model that can generate responses that are both semantically and emotionally appropriate in a dialogue setting.

  • More specifically, given an input utterance (or dialogue) and the desired emotion category of the response, ECM generates a response that is appropriate in content and conforms to the given emotion category.

  • Link to the paper

  • Much of the recent deep-learning-based work on conversational agents has focused on the encoder-decoder framework, where the input utterance (a given sequence of words) is mapped to a response utterance (a target sequence of words). This is the so-called seq2seq family of models.

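For reference, here is a minimal sketch of the encoder-decoder setup that ECM builds on (not ECM itself): a GRU encoder compresses the input utterance into a state that conditions a GRU decoder. All module names and sizes below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Bare-bones encoder-decoder over word ids (teacher forcing)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the input utterance into a fixed-size state.
        _, state = self.encoder(self.embed(src_ids))
        # Decode the response conditioned on that state.
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        # Logits over the vocabulary at every decoding step.
        return self.out(dec_out)
```
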
  • The ECM model sits within this framework and introduces 3 new components (a rough sketch of how the memory mechanisms fit together follows this list):

    • Emotion Category Embedding
      • Embed the emotion categories into a real-valued, low-dimensional vector space.
      • These embeddings are used as input to the decoder and are learnt along with the rest of the model.
    • Internal Memory
      • Psychological findings suggest that emotional responses are relatively short-lived and involve dynamic changes.
      • ECM accounts for this by adding an internal memory that captures the dynamics of emotion during decoding.
      • It starts with a “full” emotion value at the beginning of decoding and keeps decaying that value over time.
      • How much of the emotion value is decayed at each step is determined by a sigmoid gate.
      • By the time the sentence is fully decoded, the value should have decayed to zero, signifying that the emotion has been completely expressed.
    • External Memory
      • Emotional responses are expected to carry emotionally strong words along with generic, neutral words.
      • An external memory is used to include these emotionally strong words explicitly, using 2 non-overlapping vocabularies: a generic vocabulary and an emotion vocabulary (read from the external memory).
      • The two vocabularies are assigned separate generation probabilities, and an output gate controls the relative weight of generic vs emotion words.
      • This way, emotion words can be injected into an otherwise neutral response.
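A rough sketch of how these pieces could fit together in a single decoding step, assuming a GRU decoder cell. The internal emotion memory would be initialized from the chosen emotion category's embedding; the gate parameterizations below are simplified relative to the paper's exact equations, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ECMDecoderStep(nn.Module):
    def __init__(self, emb_dim, hidden_dim, emotion_dim,
                 generic_vocab, emotion_vocab, num_emotions):
        super().__init__()
        # Emotion category embedding, learnt along with the rest of the model.
        self.emotion_embed = nn.Embedding(num_emotions, emotion_dim)
        self.cell = nn.GRUCell(emb_dim + emotion_dim, hidden_dim)
        # Internal memory: a sigmoid gate decays the emotion value at every step.
        self.write_gate = nn.Linear(hidden_dim, emotion_dim)
        # External memory: separate output layers for the two non-overlapping
        # vocabularies, mixed by a scalar output gate.
        self.generic_out = nn.Linear(hidden_dim, generic_vocab)
        self.emotion_out = nn.Linear(hidden_dim, emotion_vocab)
        self.output_gate = nn.Linear(hidden_dim, 1)

    def init_internal_memory(self, emotion_id):
        # Start decoding with a "full" emotion value.
        return self.emotion_embed(emotion_id)

    def forward(self, prev_word_emb, state, internal_memory):
        # Condition the decoder on the (decaying) internal emotion memory.
        state = self.cell(torch.cat([prev_word_emb, internal_memory], dim=-1), state)
        # Decay the internal memory; it should be near zero by the end of decoding.
        internal_memory = torch.sigmoid(self.write_gate(state)) * internal_memory
        # The output gate decides how much weight emotion words get at this step.
        alpha = torch.sigmoid(self.output_gate(state))
        generic_probs = (1 - alpha) * torch.softmax(self.generic_out(state), dim=-1)
        emotion_probs = alpha * torch.softmax(self.emotion_out(state), dim=-1)
        # Final distribution over the concatenated vocabularies.
        probs = torch.cat([generic_probs, emotion_probs], dim=-1)
        return probs, state, internal_memory, alpha
```
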
  • Loss function

    • The first component is the cross-entropy loss between the predicted and target token distributions.
    • A regularization term on the internal memory ensures that the emotional state decays to zero by the end of the decoding process.
    • Another regularization term on the external memory supervises the probability of selecting a generic vs an emotion word (the combined objective is sketched after this list).
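Under the assumptions of the decoder sketch above, the overall objective could look like the following: token-level cross-entropy, an L2 penalty pushing the final internal memory towards zero, and a supervision term on the output gate alpha (with target 1 when the gold token is an emotion word). The function name and the equal weighting of the terms are illustrative.

```python
import torch
import torch.nn.functional as F

def ecm_loss(token_probs, target_ids, final_internal_memory, alphas, is_emotion_word):
    # token_probs: (batch, steps, vocab) probabilities from the decoder
    # target_ids: (batch, steps) gold token ids
    # alphas: (batch, steps) output-gate values in (0, 1)
    # is_emotion_word: (batch, steps) 1 if the gold token is an emotion word, else 0
    ce = F.nll_loss(torch.log(token_probs + 1e-10).transpose(1, 2), target_ids)
    # Internal-memory regularizer: the emotion state should decay to zero.
    internal_reg = final_internal_memory.norm(dim=-1).mean()
    # External-memory regularizer: supervise the generic-vs-emotion word choice.
    gate_reg = F.binary_cross_entropy(alphas, is_emotion_word.float())
    return ce + internal_reg + gate_reg
```
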
  • Dataset

    • STC Dataset (~220K posts and ~4.3M responses), annotated automatically by an emotion classifier. Any error on the part of the classifier degrades the quality of the training dataset.
    • NLPCC Dataset - an emotion classification dataset with 23,105 sentences, used to train the emotion classifier.
  • Metric

    • Perplexity to evaluate the model at the content level.
    • Emotion accuracy to evaluate the model at the emotional level.
  • ECM achieves a perplexity of 65.9 and emotional accuracy of 0.773.

  • Based on human evaluations, ECM outperforms the seq2seq baselines, with statistical significance, on both naturalness (how likely the response is to have been written by a human) and emotion accuracy.

  • Notes

    • It is an interesting idea to let a sigmoid gate decide how the emotion “value” is spent during decoding. It resembles deciding how much to “attend” to the emotion value, the key difference being that the total attention budget is limited. It would be interesting to see the distribution of how much of the emotion value is spent at each decoding time step. If that distribution is heavily skewed, say towards spending most of the emotion value near the end of decoding, another regularization term might be needed to encourage a more balanced spending of the emotion across the sentence.