ADACT-L

This repository contains the implementation of the paper "Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics" (ICLR 2025).

Setup:

# install requirements
pip install -r requirements.txt

Usage:

Tmaze:

You can plot the running time (in seconds) as the T-maze corridor length increases using the following script:

# the script is implemented in tmaze.py
python tmaze.py [l_min] [l_max] -l

where l_min and l_max are the minimum and maximum corridor lengths of the T-maze to be plotted. The last parameter can be set to -n or -l to display the plot on a normal or logarithmic scale, respectively.
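For example, the following invocation (with illustrative values; the exact argument parsing is defined in tmaze.py) would plot timings for corridor lengths 2 through 10 on a logarithmic scale:

# illustrative example; adjust the bounds to your needs
python tmaze.py 2 10 -l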

Other environments:

# the script is implemented in run.py
python run.py [env name] H K threshold

where H, K, and threshold are the horizon length, the number of episodes, and the threshold for the TestDistinct() function to return False, respectively. The code prints the path of the resulting RDP .json file.
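For example (the environment name below is purely hypothetical; use one of the environments actually defined in the repository):

# hypothetical example: environment "corridor", horizon 10, 50000 episodes, threshold 0.1
python run.py corridor 10 50000 0.1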

Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics

Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi

Abstract

This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards from the past interactions can be captured by some hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques. In this paper, we consider episodic RDPs and show that it is possible to overcome the limitations of existing offline RL algorithms for RDPs via the introduction of two original techniques: a novel metric grounded in formal language theory and an approach based on Count-Min-Sketch (CMS). Owing to the novel language metric, our algorithm is proven to be more sample efficient than existing results, and in some problem instances admitting low complexity languages, the gain is showcased to be exponential in the episode length. The CMS-based approach removes the need for naïve counting and alleviates the memory requirements for long planning horizons. We derive Probably Approximately Correct (PAC) sample complexity bounds associated to each of these techniques, and validate the approach experimentally.
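As a rough illustration of the Count-Min-Sketch idea mentioned in the abstract (a generic sketch of the data structure, not the code in this repository), approximate counting can be done with a small fixed-size table:

import hashlib

class CountMinSketch:
    """Approximate counter using a fixed-size table of depth x width cells."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Row-specific hash obtained by salting the item with the row id.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item):
        # Never under-estimates; taking the minimum over rows limits the over-count.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

# Example: count a history (sequence of observations/actions) without storing all histories.
cms = CountMinSketch()
cms.add(("obs_a", "action_0", "obs_b"))
print(cms.query(("obs_a", "action_0", "obs_b")))  # 1 (possibly over-estimated)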

Cite as:

@inproceedings{deb2025offline,
  title={Offline {RL} in Regular Decision Processes: Sample Efficiency via Language Metrics},
  author={Ahana Deb and Roberto Cipollone and Anders Jonsson and Alessandro Ronca and Mohammad Sadegh Talebi},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=EW6bNEqalF}
}
