Flow re-runs getting progressively worse with number of re-runs #5946
Hello all,

A strange thing is happening that I can't pinpoint at the moment (and since it doesn't affect me much right now, I probably won't have time to look into it further). I wanted to ask whether anyone has experienced the same behaviour and whether it is considered normal.

I have a flow that consists of several smaller flows (which in turn consist of smaller flows and python ops) and attempts to connect to a server via a jump station using custom python code. If I execute this flow from the action menu, it takes ~30 seconds to fail (intentionally). However, if during development and testing I use the rerun button to re-run the flow with the same variables, the execution times get progressively worse until they become a pain: a single noop that just evaluates which account to use based on a simple regex in YAQL takes upwards of 5 seconds to execute. Re-running a previous run turns an execution that previously took ~30 seconds to fail into one that takes 238 seconds to fail. It gets significantly worse if I re-run the same flow multiple times at once (500+ seconds to fail).

The flow inputs are exactly the same each time, and this is what the utilization looks like most of the time the flow is running (st2rulesengine and st2workflowengine near 100% while the st2actionrunner processes are chilling):
183772 st2 20 0 245612 104660 14860 R 100.0 0.3 1:11.82 st2rulesengine

This is running on a RHEL8 standalone installation on a VMware VM with 16 vCPUs and 32G of RAM. I've enabled debug mode and tried to look into things in more detail, but I see absolutely no reason in the logs for these delays. I've monitored the rabbitmq queues and didn't see significant values there; they hover near 0. I've monitored db.live_action_d_b.find({"status":"running"}).count(), and while the flow was executing the values were around 10 to 12. I haven't seen significant I/O ops in iostat.

I would expect the rerun feature to just create new executions of previously run executions with the same inputs / parameters, but I think something else is going on that makes the reruns worse over time. I've tried increasing the number of workers and the green threads for both workflows and executions, to no observable effect.

Has anyone experienced anything like this? When I create a new execution with the same inputs as the re-runs, the flow fails in 30 seconds.
Replies: 1 comment
I suspect the entire original action_execution_db is being reused, not just the inputs. Your action likely writes a lot of information to the context variables and outputs. A short, simplified example would help here. Switching to zstandard and removing the embedded liveaction should also help.
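One way to check the suspicion above (that re-runs carry over more than just the inputs) is to compare the serialized size of the original execution document with that of each re-run, field by field. The snippet below is only a rough sketch: the helper names and the sample field names (`carried_over` in particular) are made up for illustration and are not StackStorm's actual schema. Feed it real documents fetched from the database with whatever client you normally use.

```python
import json


def serialized_size(doc):
    """Return the size in bytes of a document serialized as compact JSON."""
    # default=str keeps non-JSON types (dates, ObjectIds) from raising.
    return len(json.dumps(doc, separators=(",", ":"), default=str).encode("utf-8"))


def size_by_field(doc):
    """Map each top-level field to its serialized size, largest first."""
    sizes = {key: serialized_size(value) for key, value in doc.items()}
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


def growth_report(docs):
    """For documents ordered oldest to newest, return (id, total bytes, largest field)."""
    report = []
    for doc in docs:
        largest_field = next(iter(size_by_field(doc)), None)
        report.append((doc.get("id"), serialized_size(doc), largest_field))
    return report


# Hypothetical sample data: the re-run embeds a copy of the original result.
original = {
    "id": "run-1",
    "parameters": {"host": "example"},
    "result": {"stdout": "x" * 100},
    "context": {},
}
rerun = {
    "id": "run-2",
    "parameters": {"host": "example"},
    "result": {"stdout": "x" * 100},
    "context": {"carried_over": original["result"]},
}

for doc_id, total, largest in growth_report([original, rerun]):
    print(doc_id, total, largest)
```

If the total keeps growing from one re-run to the next while the inputs stay the same, the field reported as largest is a good place to start trimming for the simplified example.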