
Towards a Unified Theory of State Abstraction for MDPs

Introduction

  • The paper studies five different techniques for state abstraction in MDPs (Markov Decision Processes) and evaluates their usefulness for planning and learning.

  • The general idea behind abstraction is to map the actual (or observed) state to an abstract state that should be more amenable for learning.

  • It can be thought of as a mapping from one representation to another representation while preserving some useful properties.

  • Link to the paper

General Definition

  • Consider an MDP $M = \langle S, A, P, R, \gamma \rangle$ where $S$ is the finite set of states, $A$ is the finite set of actions, $P$ is the transition function, $R$ is the bounded reward function and $\gamma$ is the discount factor.

  • The abstract version of the MDP is $\bar{M} = \langle \bar{S}, A, \bar{P}, \bar{R}, \gamma \rangle$ where $\bar{S}$ is the finite set of abstract states, $\bar{P}$ is the transition function in the abstract state space and $\bar{R}$ is the bounded reward function in the abstract state space.

  • The abstraction function $\phi$ is a function that maps a given state $s$ to its abstract counterpart $\phi(s)$.

  • The inverse image $\phi^{-1}(\bar{s})$ is the set of ground states that map to the abstract state $\bar{s}$ under the abstraction function $\phi$.

  • A weighting function $w(s)$ is used to measure how much a state $s$ contributes to its abstract state $\phi(s)$; the weights of the ground states within each abstract state sum to one.
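
To make the notation concrete, here is a minimal Python sketch of these objects over a toy state space (all names and numbers are illustrative, not from the paper):

```python
states = ["s0", "s1", "s2", "s3"]

# The abstraction function phi maps each ground state to an abstract state.
phi = {"s0": "x0", "s1": "x0", "s2": "x1", "s3": "x1"}

def inverse_image(abstract_state):
    """phi^{-1}(s_bar): the set of ground states mapping to abstract_state."""
    return [s for s in states if phi[s] == abstract_state]

# The weighting function w(s) >= 0 measures how much each ground state
# contributes to its abstract state; within each abstract state the
# weights must sum to one.
w = {"s0": 0.5, "s1": 0.5, "s2": 0.3, "s3": 0.7}

for s_bar in set(phi.values()):
    assert abs(sum(w[s] for s in inverse_image(s_bar)) - 1.0) < 1e-9

print(inverse_image("x0"))  # ['s0', 's1']
```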

Topology of Abstraction Space

  • Given two abstraction functions $\phi_{1}$ and $\phi_{2}$, $\phi_{1}$ is said to be finer than $\phi_{2}$ iff for any states $s_{1}$ and $s_{2}$, if $\phi_{1}(s_{1}) = \phi_{1}(s_{2})$ then $\phi_{2}(s_{1}) = \phi_{2}(s_{2})$.

  • The finer relation is reflexive, antisymmetric and transitive, so it defines a partial ordering over the space of abstractions.
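
A small sketch of the finer-than predicate, assuming tabular abstractions represented as dicts (the names here are illustrative):

```python
from itertools import combinations

def is_finer(phi1, phi2, states):
    """phi1 is finer than phi2 iff phi1(s1) == phi1(s2) implies
    phi2(s1) == phi2(s2) for every pair of states."""
    return all(
        phi2[s1] == phi2[s2]
        for s1, s2 in combinations(states, 2)
        if phi1[s1] == phi1[s2]
    )

states = ["s0", "s1", "s2"]
identity = {s: s for s in states}              # phi_0, the identity mapping
coarse = {"s0": "x0", "s1": "x0", "s2": "x1"}

assert is_finer(identity, coarse, states)      # identity is finer than anything
assert not is_finer(coarse, identity, states)
```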

Five Types of Abstraction

  • While many abstractions are possible, not all abstractions are equally important.

  • Model-irrelevance abstraction $\phi_{model}$:

    • If two states $s_{1}$ and $s_{2}$ map to the same abstract state, then their one-step model (reward and transition behaviour) is preserved (a check of this condition is sketched after this list).

    • Consider any action $a$ and any abstract state $\bar{s}'$: if $\phi_{model}(s_{1}) = \phi_{model}(s_{2})$ then $R(s_{1}, a) = R(s_{2}, a)$ and $\sum_{s' \in \phi_{model}^{-1}(\bar{s}')} P(s_{1}, a, s') = \sum_{s' \in \phi_{model}^{-1}(\bar{s}')} P(s_{2}, a, s')$.

  • $Q^{\pi}$-irrelevance abstraction $\phi_{Q^{\pi}}$:

    • It preserves the state-action value function for all policies.

    • $\phi_{Q^{\pi}}(s_{1}) = \phi_{Q^{\pi}}(s_{2})$ implies $Q^{\pi}(s_{1}, a) = Q^{\pi}(s_{2}, a)$ for every policy $\pi$ and action $a$.

  • $Q^{*}$-irrelevance abstraction $\phi_{Q^{*}}$:

    • It preserves the optimal state-action value function.

  • $a^{*}$-irrelevance abstraction $\phi_{a^{*}}$:

    • It preserves the optimal action and its value.

  • $\pi^{*}$-irrelevance abstraction $\phi_{\pi^{*}}$:

    • It preserves the optimal action.

  • In terms of fineness, $\phi_{0} \succeq \phi_{model} \succeq \phi_{Q^{\pi}} \succeq \phi_{Q^{*}} \succeq \phi_{a^{*}} \succeq \phi_{\pi^{*}}$. Here $\phi_{0}$ is the identity mapping, i.e. $\phi_{0}(s) = s$.

  • If a property holds for some abstraction, it also holds for every finer abstraction.
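
As promised above, here is a small sketch that checks the model-irrelevance condition on a tabular MDP. The containers `P[(s, a, s_next)]` and `R[(s, a)]` are hypothetical representations chosen for this sketch, not notation from the paper:

```python
def is_model_irrelevant(phi, states, actions, P, R, tol=1e-9):
    """Check the phi_model condition: states sharing an abstract state must
    have equal rewards and equal aggregate transition probabilities into
    every abstract state."""
    for s1 in states:
        for s2 in states:
            if phi[s1] != phi[s2]:
                continue
            for a in actions:
                # Same abstract state => same one-step reward...
                if abs(R[(s1, a)] - R[(s2, a)]) > tol:
                    return False
                # ...and same aggregate probability of landing in each
                # abstract state.
                for s_bar in set(phi.values()):
                    p1 = sum(P.get((s1, a, s), 0.0) for s in states if phi[s] == s_bar)
                    p2 = sum(P.get((s2, a, s), 0.0) for s in states if phi[s] == s_bar)
                    if abs(p1 - p2) > tol:
                        return False
    return True

# Two ground states with identical rewards whose transitions differ on the
# ground level but agree once aggregated into abstract states:
states, actions = ["s0", "s1"], ["a"]
phi = {"s0": "x", "s1": "x"}
R = {("s0", "a"): 1.0, ("s1", "a"): 1.0}
P = {("s0", "a", "s0"): 1.0, ("s1", "a", "s1"): 1.0}
print(is_model_irrelevant(phi, states, actions, P, R))  # True
```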

Key Theorems

  • As we go from finer to coarser abstractions, the information loss increases (i.e. fewer components of the original MDP can be recovered) while the state space shrinks (i.e. the abstract problem becomes more efficient to solve). This leads to a tradeoff when selecting abstractions.

  • For example, with the abstractions $\phi_{model}$, $\phi_{Q^{\pi}}$, $\phi_{Q^{*}}$ and $\phi_{a^{*}}$, the optimal abstract policy is optimal in the ground MDP.

  • Similarly, if each state-action pair is visited infinitely often and the step size decays appropriately, Q-learning with $\phi_{model}$, $\phi_{Q^{\pi}}$ or $\phi_{Q^{*}}$ converges to the optimal state-action value function in the MDP (a sketch of Q-learning over abstract states follows this list). More conditions are needed for convergence in the case of the remaining two abstractions, $\phi_{a^{*}}$ and $\phi_{\pi^{*}}$.

  • For $\phi_{model}$, the model built from experience converges to the true abstract model given infinite experience, provided the weighting function is fixed.
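
A minimal sketch of what Q-learning over an abstraction looks like operationally: the agent acts in the ground MDP but indexes its Q-table by $\phi(s)$. The toy environment, its `reset()`/`step()` interface and all hyperparameters are assumptions made for this illustration; the $1/n$ step size is one schedule satisfying the usual decay conditions:

```python
import random
from collections import defaultdict

class ChainEnv:
    """Tiny illustrative MDP: two ground states that behave identically,
    so mapping both to one abstract state is model-irrelevant."""
    def reset(self):
        return random.choice(["s0", "s1"])
    def step(self, s, a):
        # 'go' ends the episode with reward 1, 'stay' loops with reward 0.
        if a == "go":
            return "s0", 1.0, True
        return random.choice(["s0", "s1"]), 0.0, False

def abstract_q_learning(env, phi, actions, episodes=2000, gamma=0.9, eps=0.1):
    """Tabular Q-learning where every update is applied to Q[phi(s)]
    instead of Q[s]."""
    Q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            x = phi[s]  # learn over abstract states
            if random.random() < eps:
                a = random.choice(actions)           # explore
            else:
                a = max(actions, key=lambda a_: Q[(x, a_)])  # exploit
            s_next, r, done = env.step(s, a)
            counts[(x, a)] += 1
            alpha = 1.0 / counts[(x, a)]  # decaying step size
            target = r if done else r + gamma * max(Q[(phi[s_next], a_)] for a_ in actions)
            Q[(x, a)] += alpha * (target - Q[(x, a)])
            s = s_next
    return Q

phi = {"s0": "x", "s1": "x"}  # both ground states share one abstract state
Q = abstract_q_learning(ChainEnv(), phi, ["go", "stay"])
print(Q[("x", "go")], Q[("x", "stay")])  # Q('x', 'go') should approach 1.0
```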