Flow re-runs getting progressively worse with number of re-runs #5946

fdrab · 2023-03-28T08:52:47Z

Hello all,

Something strange is happening that I can't pinpoint at the moment (and since it doesn't affect me much right now, I probably won't have time to look into it further), so I wanted to ask whether anyone has experienced the same behaviour and whether this is considered normal.

I have a flow composed of several smaller flows (which in turn consist of smaller flows and Python ops) that attempts to connect to a server via a jump station using custom Python code. If I execute this flow from the action menu, it takes ~30 seconds to fail (intentionally). However, if during development and testing I use the rerun button to re-run the flow with the same variables, the execution times get progressively worse until they become a pain: a single noop that just evaluates which account to use, based on a simple regex in YAQL, takes upwards of 5 seconds to execute. Re-running a previous run turns an execution that took ~30 seconds to fail into one that takes 238 seconds to fail. It gets significantly worse if I re-run the same flow multiple times at once (500+ seconds to fail).

The flow inputs are exactly the same each time. This is what the utilization looks like most of the time the flow is running (st2rulesengine and st2workflowengine near 100% while the st2actionrunner processes are chilling):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 183772 st2       20   0  245612 104660  14860 R 100.0   0.3   1:11.82 st2rulesengine
 183784 st2       20   0  272640 121260  15472 R 100.0   0.4   2:20.14 st2workflowengi
   1504 root      20   0  735196  90732  88060 S  50.0   0.3   0:45.66 rsyslogd
 194819 root      20   0   65876   5100   4092 R  16.7   0.0   0:00.06 top
 183790 st2       20   0  233848  82384  15288 S   5.6   0.3   1:47.97 st2scheduler

This is running on a standalone RHEL 8 installation on a VMware VM with 16 vCPUs and 32 GB of RAM. I've enabled debug mode and tried to look into things in more detail, but I see absolutely no reason in the logs for these delays. I've monitored the RabbitMQ queues and didn't see significant values there; they hover near 0. I've monitored db.live_action_d_b.find({"status":"running"}).count(), and the values while the flow was executing were around 10 to 12. I haven't seen significant I/O activity in iostat.

I would expect the rerun feature to just create new executions of previously run executions with the same inputs / parameters, but I think something else is going on that makes the reruns worse over time.

I've tried increasing the number of workers and the number of green threads for both workflows and executions, with no observable effect.

Has anyone experienced anything like this?

When I create a new execution with the same inputs as the re-runs, the flow fails in 30 seconds.

Answered by guzzijones

guzzijones (Maintainer) · 2023-07-06T02:26:16Z

I suspect the entire original action_execution_db is being reused and not just the inputs. Likely your action writes a lot of information to the context variables and outputs.

What would help here is a short simplified example.

Also, switching to zstandard and removing the embedded liveaction should help.
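
One way to check the suspicion above is to compare how large the execution documents are for the original run and for each re-run. The following is only a rough sketch, not StackStorm code: it assumes you have already exported the execution records as plain Python dicts (for example from JSON output of the CLI or from a database dump), and the helper names `doc_size_bytes`, `size_report` and `growth_factors` are made up for this illustration.

```python
import json


def doc_size_bytes(doc):
    """Approximate size of a value as compact, UTF-8 encoded JSON."""
    # default=str keeps the helper from failing on datetimes or other
    # non-JSON types that may appear in an exported record.
    text = json.dumps(doc, default=str, separators=(",", ":"))
    return len(text.encode("utf-8"))


def size_report(labelled_docs):
    """Return (label, total_bytes, bytes_per_top_level_key) for each document.

    labelled_docs is a list of (label, dict) pairs in run order,
    e.g. [("original", doc0), ("rerun 1", doc1), ...].
    """
    report = []
    for label, doc in labelled_docs:
        per_key = {key: doc_size_bytes(value) for key, value in doc.items()}
        report.append((label, doc_size_bytes(doc), per_key))
    return report


def growth_factors(report):
    """Size of each document relative to the first one in the report."""
    base = report[0][1]
    return [(label, total / base) for label, total, _ in report]


if __name__ == "__main__":
    # Synthetic stand-ins for exported records; real ones will look different.
    original = {"parameters": {"host": "example"}, "context": {}, "result": {}}
    rerun = {
        "parameters": {"host": "example"},
        "context": {"previous": original},  # hypothetical carried-over data
        "result": {"log": "x" * 2000},
    }
    report = size_report([("original", original), ("rerun 1", rerun)])
    for label, total, per_key in report:
        biggest = max(per_key, key=per_key.get)
        print(f"{label}: {total} bytes, largest key: {biggest}")
    print(growth_factors(report))
```

If the totals climb with every re-run and one key (context or result, say) accounts for most of the growth, that would support the idea that more than just the inputs is being carried over. If the sizes stay flat, the slowdown is probably coming from somewhere else.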
