Shengxiang Xu (徐圣翔)\*, Jiayi Zhang (张佳钇)\*, Shimin Di (邸世民)✉, Yuyu Luo (骆昱宇)✉, Liang Yao (姚亮), Hanmo Liu (刘翰墨), Jia Zhu (朱佳), Fan Liu (刘凡), Min-Ling Zhang (张敏灵)

\* Equal Contribution ✉ Corresponding Author
If you encounter any difficulties using or reproducing the code, please contact me directly (Email: xushx@seu.edu.cn, WeChat: 13270628738).
Welcome to the official repository of our paper "RobustFlow: Towards Robust Agentic Workflow Generation"!
The automated generation of agentic workflows is a promising frontier for enabling large language models (LLMs) to solve complex tasks. However, our investigation reveals that the robustness of agentic workflows remains a critical, unaddressed challenge. Current methods often generate wildly inconsistent workflows when provided with instructions that are semantically identical but differently phrased. This brittleness severely undermines their reliability and trustworthiness for real-world applications.
To quantitatively diagnose this instability, we propose metrics based on nodal and topological similarity to evaluate workflow consistency against common semantic variations such as paraphrasing and noise injection.
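As a rough illustration of the idea (not the paper's exact metrics), two workflows can be compared by the overlap of their node sets (nodal similarity) and their edge sets (topological similarity). The Jaccard-based scoring and the `alpha` blend below are assumptions for illustration only:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def workflow_similarity(nodes1, edges1, nodes2, edges2, alpha=0.5):
    """Blend nodal and topological similarity.

    nodes*: iterables of operator names; edges*: iterables of (src, dst)
    pairs. alpha weights nodal vs. topological similarity (illustrative).
    """
    nodal = jaccard(nodes1, nodes2)
    topological = jaccard(edges1, edges2)
    return alpha * nodal + (1 - alpha) * topological

# Two workflows generated from paraphrased versions of the same instruction
w1 = (["plan", "code", "review"], [("plan", "code"), ("code", "review")])
w2 = (["plan", "code", "test"],   [("plan", "code"), ("code", "test")])
score = workflow_similarity(w1[0], w1[1], w2[0], w2[1])
```

A robust generator should score near 1.0 across paraphrased and noise-injected variants of the same task.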
We then propose a novel training framework, RobustFlow, which leverages preference optimization to teach models invariance to instruction variations.
By training on sets of synonymous task descriptions, RobustFlow boosts workflow robustness scores to 70%-90%, a substantial improvement over existing approaches.
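For readers unfamiliar with preference optimization: a common instantiation is a DPO-style loss over (chosen, rejected) pairs. The sketch below is a generic illustration of that family, not RobustFlow's exact objective; `beta` and the summed log-probability inputs are assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss for one (chosen, rejected) pair.

    logp_*: summed log-probability of a workflow under the policy model;
    ref_logp_*: the same quantity under a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): minimized by favoring the chosen workflow
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy prefers the chosen (consistent) workflow more strongly than the reference does, the margin is positive and the loss drops below log 2; at zero margin the loss is exactly log 2.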
Set up the Python environment:

```shell
conda create -n robustflow python=3.9
conda activate robustflow
pip install -r requirements.txt
```
## Data Preparation

You can download our prepared datasets or reproduce them locally.
- Place the official original file in the dataset folder (example: `noise_dataset/DROP/drop_original.jsonl`).
- Run the rewrite script in that folder:

  ```shell
  cd noise_dataset/DROP/
  python rewrite_drop.py
  ```

  This generates:
  - `drop_paraphrasing.jsonl`
  - `drop_requirements.jsonl`
  - `drop_light_noise.jsonl`
  - `drop_moderate_noise.jsonl`
  - `drop_heavy_noise.jsonl`
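The repository's rewrite scripts produce the noise variants; as a purely illustrative stand-in (not the actual `rewrite_drop.py` implementation), light character-level noise could be injected over a JSONL file like this, where the `question` field name and the duplication scheme are assumptions:

```python
import json
import random

def inject_light_noise(text, rate=0.02, seed=0):
    """Duplicate letters at the given rate to simulate typos (illustrative)."""
    rng = random.Random(seed)  # fixed seed for reproducible variants
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha() and rng.random() < rate:
            out.append(ch)  # duplicate a letter to simulate a typo
    return "".join(out)

def rewrite_file(src_path, dst_path):
    """Read a JSONL dataset and write a copy with noisy 'question' fields."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            record["question"] = inject_light_noise(record["question"])
            dst.write(json.dumps(record) + "\n")
```

Moderate and heavy noise levels would raise the perturbation rate or add further edit types (deletions, swaps), while paraphrasing variants are typically produced with an LLM rather than character edits.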
If you want to analyze the dataset, you can refer to the examples under `noise_dataset/Distribution/` and follow the steps below:

```shell
cd noise_dataset/Distribution
bash extract.sh
```

This will generate dataset embeddings in the `embedding/` directory. To analyze and visualize, you can either write your own script or use the provided ones:

```shell
python analyze.py
python draw.py
```

These scripts compute statistics and clustering results from the embeddings, and generate distribution visualizations in the `visual/` directory.
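As a minimal sketch of one statistic such an analysis might compute (the actual `analyze.py` may differ), the mean pairwise cosine similarity of a variant set's embeddings indicates how tightly the paraphrases cluster in embedding space:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of embeddings."""
    pairs = [(i, j) for i in range(len(embeddings))
             for j in range(i + 1, len(embeddings))]
    return sum(cosine(embeddings[i], embeddings[j])
               for i, j in pairs) / len(pairs)
```

A set of semantically equivalent instructions should yield a high mean similarity; a sharp drop after noise injection signals that the perturbations moved the inputs in embedding space.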
## Baseline Evaluation

Clone the official repositories of AFlow, ScoreFlow and Flow into `AFlow/`, `Scoreflow/` and `Flow/`, and run each project strictly following its README to reproduce the baseline results as-is.
- AFlow Evaluation

  ```shell
  cd evaluate/
  bash infer_aflow.sh
  python aflow_scripts/find.py
  python eval_aflow.py
  cat aflow_score.txt
  ```
- ScoreFlow Evaluation
- Flow Evaluation
Additional case studies are available in `samples/` for qualitative analysis.
If you use RobustFlow in your research, please cite our paper:
```bibtex
@article{xu2025robustflow,
  title={RobustFlow: Towards Robust Agentic Workflow Generation},
  author={Xu, Shengxiang and Zhang, Jiayi and Di, Shimin and Luo, Yuyu and Yao, Liang and Liu, Hanmo and Zhu, Jia and Liu, Fan and Zhang, Min-Ling},
  journal={arXiv preprint arXiv:2509.21834},
  year={2025}
}
```


