This repository contains the implementation of the paper "Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics" (ICLR 2025).
Setup:
#install requirements
pip install -r requirements.txt
Usage:
T-maze:
You can plot the time (in seconds) for increasing the T-maze corridor length using the following script:
# implementation of the script found in tmaze.py
python tmaze.py [l_min] [l_max] [-n|-l]
where l_min and l_max are the minimum and maximum corridor lengths of the T-maze to be plotted. The last parameter can be set to -n or -l to display a normal or logarithmic graph, respectively.
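For example, the following plots the runtime for corridor lengths 2 through 10 on a logarithmic scale (the length values here are illustrative, not prescribed by the paper):
python tmaze.py 2 10 -l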
Other environments:
# implementation of the script found in run.py
python run.py [env name] H K threshold
where H, K, and threshold are the horizon length, the number of episodes, and the threshold for the TestDistinct() function to return False, respectively.
The code prints the path of the final RDP .json file.
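For example, an illustrative invocation (the horizon, episode count, and threshold values below are placeholders, and [env name] should be replaced by an environment supported by run.py):
python run.py [env name] 15 5000 0.1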
Authors: Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, and Mohammad Sadegh Talebi
Abstract
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards from the past interactions can be captured by some hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques. In this paper, we consider episodic RDPs and show that it is possible to overcome the limitations of existing offline RL algorithms for RDPs via the introduction of two original techniques: a novel metric grounded in formal language theory and an approach based on Count-Min-Sketch (CMS). Owing to the novel language metric, our algorithm is proven to be more sample efficient than existing results, and in some problem instances admitting low complexity languages, the gain is showcased to be exponential in the episode length. The CMS-based approach removes the need for naïve counting and alleviates the memory requirements for long planning horizons. We derive Probably Approximately Correct (PAC) sample complexity bounds associated to each of these techniques, and validate the approach experimentally.
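As a rough illustration of the counting idea only, the following is a minimal, generic Count-Min-Sketch sketch in Python. It is not the implementation used in this repository, and all names and parameter values are illustrative assumptions.

import random

class CountMinSketch:
    def __init__(self, width=2048, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row; each row acts as an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def update(self, item, count=1):
        # Increment one counter per row instead of storing the item itself.
        for row, salt in enumerate(self.salts):
            self.table[row][hash((salt, item)) % self.width] += count

    def query(self, item):
        # Each row overestimates the true count due to hash collisions;
        # taking the minimum over rows gives the tightest estimate.
        return min(self.table[row][hash((salt, item)) % self.width]
                   for row, salt in enumerate(self.salts))

# Example: approximate counts of episode traces without storing every distinct trace.
cms = CountMinSketch()
cms.update(("obs1", "act0", "obs2"))
print(cms.query(("obs1", "act0", "obs2")))  # prints 1 (an overestimate in general)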
Cite as:
@inproceedings{
deb2025offline,
title={Offline {RL} in Regular Decision Processes: Sample Efficiency via Language Metrics},
author={Ahana Deb and Roberto Cipollone and Anders Jonsson and Alessandro Ronca and Mohammad Sadegh Talebi},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=EW6bNEqalF}
}