Project information

Regular Decision Processes (RDPs) were recently introduced as a non-Markovian extension of MDPs that does not require knowing or hypothesizing a hidden state. An RDP is a fully observable, non-Markovian model in which the next state and reward are a stochastic function of the entire history of the system. However, this dependence on the past is restricted to regular functions: the next-state distribution and reward depend only on which regular expressions the history satisfies. An RDP can be transformed into an MDP by extending its state with variables that track the satisfaction of the regular expressions governing the RDP's dynamics. Thus, essentially, to learn an RDP, we need to learn these regular expressions.
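As a minimal sketch of the RDP-to-MDP transformation described above, consider a toy, hypothetical example (not from the project itself): a reward that depends on the whole observation history only through the regular expression (0|1)*1, i.e. "the last observation was 1". Tracking this property with a small DFA and extending the state with the DFA state makes the reward Markovian.

```python
# Toy DFA tracking whether the history matches the regular expression
# (0|1)*1, i.e. "the last observation was 1". This is a hypothetical
# illustration; real RDPs allow arbitrary regular properties of the history.
DFA_TRANSITIONS = {
    ("q0", 0): "q0", ("q0", 1): "q1",
    ("q1", 0): "q0", ("q1", 1): "q1",
}
ACCEPTING = {"q1"}

def step_dfa(q, obs):
    """Advance the DFA by one observation."""
    return DFA_TRANSITIONS[(q, obs)]

def reward(history):
    """Non-Markovian reward: a function of the entire history,
    but only via the regular property tracked by the DFA."""
    q = "q0"
    for obs in history:
        q = step_dfa(q, obs)
    return 1.0 if q in ACCEPTING else 0.0

def markovian_reward(extended_state):
    """After extending the state with the DFA state, the same reward
    is a function of the current (extended) state alone."""
    _, q = extended_state
    return 1.0 if q in ACCEPTING else 0.0

# The two formulations agree on every history: run the DFA along a
# history and compare against the extended-state reward.
history = [0, 1, 1, 0, 1]
q = "q0"
for obs in history:
    q = step_dfa(q, obs)
extended = (history[-1], q)
assert reward(history) == markovian_reward(extended)
```

The DFA state is exactly the extra variable the transformation adds: once it is part of the state, the history is no longer needed, which is why learning the RDP reduces to learning the regular expressions (equivalently, their automata).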