Project information
Regular Decision Processes (RDPs) have recently been introduced as a non-Markovian extension of MDPs that
does not require knowing or hypothesizing a hidden state. An RDP is a fully observable, non-Markovian
model in which the next state and reward are a stochastic function of the entire history of the system.
However, this dependence on the past is restricted to regular functions: that is, the next-state
distribution and the reward depend on which regular expressions the history satisfies. An RDP can be
transformed into an MDP by extending the state of the RDP with variables that track the satisfaction of
the regular expression governing the RDP dynamics. Thus, essentially, to learn an RDP, we need to learn
these regular expressions.
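
The transformation described above can be sketched in a few lines. This is a minimal illustration, not an implementation of any particular RDP-learning algorithm: all names are hypothetical, and the regular property tracked here ("the observation 'a' has occurred an even number of times") stands in for the regular expressions governing a real RDP's dynamics. The key idea is that running a DFA for the regular property alongside the observations yields an extended state on which the dynamics become Markovian.

```python
# Hypothetical sketch: a DFA tracking one regular property of the history.
# Property: "observation 'a' has appeared an even number of times so far".
DFA_TRANSITIONS = {
    ("even", "a"): "odd",
    ("odd", "a"): "even",
    ("even", "b"): "even",
    ("odd", "b"): "odd",
}

def track(history, start="even"):
    """Run the DFA over the observation history; return its final state."""
    q = start
    for obs in history:
        q = DFA_TRANSITIONS[(q, obs)]
    return q

def extended_state(history):
    """Markovian state for the product MDP: the last observation paired
    with the DFA state summarizing the relevant part of the past."""
    last = history[-1] if history else None
    return (last, track(history))
```

In the product MDP, the next-state and reward distributions condition on `extended_state(history)` rather than on the full history, which is exactly why learning the regular expressions (equivalently, their DFAs) suffices to learn the RDP.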