Towards a Unified Theory of State Abstraction for MDPs
26 Dec 2019Introduction

The paper studies five different techniques for stat abstraction in MDPs (Markov Decision Processes) and evaluates their usefulness for planning and learning.

The general idea behind abstraction is to map the actual (or observed) state to an abstract state that should be more amenable for learning.

It can be thought of as a mapping from one representation to another representation while preserving some useful properties.
General Definition

Consider a MDP \(M = <S, A, P, R, \gamma>\) where \(S\) is the finite set of states, \(A\) is finite set of actions, \(P\) is the transition function, \(R\) is the bounded reward function and \(\gamma\) is the discount factor.

The abstract version of the MDP is \(\widetilde{M} = <\widetilde{S}, A, \widetilde{P}, \widetilde{R}, \gamma>\) where \(\widetilde{S}\) is the finite set if abstract states, \(\widetilde{P}\) is the transition function in the abstract state space and \(\widetilde{R}\) is the bounded reward function in the abstract reward space.

Abstraction function \(\phi\) is a function that maps a given state \(s\) to its abstract counterpart \(\widetilde{s}\).

The inverse image \(\phi^{1}(\widetilde{s})\) is the set of ground states that map to the \(\widetilde{s}\) under the abstraction function \(\phi\).

A wieghing functioon \(w(s)\) is used to measure how much does a state \(s\) contribute to the abstract state \(\phi(s)\).
Topology of Abstraction Space

Given two abstraction functions \(\phi_{1}\) and \(\phi_{2}\), \(\phi_{1}\) is said to be finer than \(\phi_{2}\) iff for any states \(s_{1}, s_{2}\) if \(\phi_{1}(s_{1}) = \phi_{1}(s_{2})\) then \(\phi_{2}(s_{1}) = \phi_{2}(s_{2})\).

This finer relation is reflex, antisymmetric, transitive and partially ordered.
Five Types of Abstraction

While many abstractions are possible, not all abstractions are equally important.

Modelirrelevance abstraction \(\phi_{model}\):

If two states $s_{1}$ and $s_{2}$ have the same abstracted state, then their onestep model is preserved.

Consider any action \(a\) and any abstract state \(\widetilde{s}\), if \(\phi_{model}(s_{1} = \phi_{model}(s_{2})\) then \(R(s_1, a) = R(s_2, a)\) and \(\sum_{s' \in \phi_{model}^{1}\widetilde(s)}P_{s_1, s'}^{a} = \sum_{s' \in \phi_{model}^{1}\widetilde(s)}P_{s_2, s'}^{a}\).


\(Q^{\pi}\)irrelevance abstraction:

It preserves the stateaction value finction for all the states.

\(\phi_{Q^{\pi}}(s_1) = \phi_{Q^{\pi}}(s_2)\) implies \(Q^{\pi}(s_1, a) = Q^{\pi}(s_1, a)\).


\(Q^{*}\)irrelevance abstraction:
 It preserves the optimal stateaction value function.

\(a^{*}\)irrelevance abstraction:
 It preserves the optimal action and its value function.

\(\phi_{\pi^{*}}\)irrelevance abstraction:
 It preserves the optimal action.

In terms of fineness, \(\phi_0 \geq \phi_{model} \geq \phi_{Q^{\pi}} \geq \phi_{Q^*} \geq \phi_{a^*} \geq \phi_{\pi^*}\). Here \(\phi_0\) is the identity mapping ie \(\phi_0(s) = s\)

If a property applies to any abstraction, it also applies to all the finer abstractions.
Key Theorems

As we go from finer to coarser abstractions, the information loss increases (ie fewer components can be recovered) while the statespace reduces (ie the efficiency of solving the problem increases). This leads to a tradeoff when selecting abstractions.

For example, with abstractions \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}, \phi_{a^*}\), the optimal abstract policy \(\widetilde(\pi)^*\) is optimal in the ground MDP.

Similarly, if each stateaction pair is visited infinitely often and the stepsize decays properly, Qlearning with \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}\) converges to the optimal stateaction value functions in the MDP. More conditions are needed for convergence in the case of the remaining two abstractions.

For \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}, \phi_{a^*}\), the model built with the experience converges to the true abstract model with infinite experience if the weighing function \(w(s)\) is fixed.