Towards a Unified Theory of State Abstraction for MDPs
26 Dec 2019Introduction

The paper studies five different techniques for stat abstraction in MDPs (Markov Decision Processes) and evaluates their usefulness for planning and learning.

The general idea behind abstraction is to map the actual (or observed) state to an abstract state that should be more amenable for learning.

It can be thought of as a mapping from one representation to another representation while preserving some useful properties.
General Definition

Consider a MDP where is the finite set of states, is finite set of actions, is the transition function, is the bounded reward function and is the discount factor.

The abstract version of the MDP is where is the finite set if abstract states, is the transition function in the abstract state space and is the bounded reward function in the abstract reward space.

Abstraction function is a function that maps a given state to its abstract counterpart .

The inverse image is the set of ground states that map to the under the abstraction function .

A wieghing functioon is used to measure how much does a state contribute to the abstract state .
Topology of Abstraction Space

Given two abstraction functions and , is said to be finer than iff for any states if then .

This finer relation is reflex, antisymmetric, transitive and partially ordered.
Five Types of Abstraction

While many abstractions are possible, not all abstractions are equally important.

Modelirrelevance abstraction :

If two states $s_{1}$ and $s_{2}$ have the same abstracted state, then their onestep model is preserved.

Consider any action and any abstract state , if then and .


irrelevance abstraction:

It preserves the stateaction value finction for all the states.

implies .


irrelevance abstraction:
 It preserves the optimal stateaction value function.

irrelevance abstraction:
 It preserves the optimal action and its value function.

irrelevance abstraction:
 It preserves the optimal action.

In terms of fineness, . Here is the identity mapping ie

If a property applies to any abstraction, it also applies to all the finer abstractions.
Key Theorems

As we go from finer to coarser abstractions, the information loss increases (ie fewer components can be recovered) while the statespace reduces (ie the efficiency of solving the problem increases). This leads to a tradeoff when selecting abstractions.

For example, with abstractions , the optimal abstract policy is optimal in the ground MDP.

Similarly, if each stateaction pair is visited infinitely often and the stepsize decays properly, Qlearning with converges to the optimal stateaction value functions in the MDP. More conditions are needed for convergence in the case of the remaining two abstractions.

For , the model built with the experience converges to the true abstract model with infinite experience if the weighing function is fixed.