feat: add extension spec for task processing #14

qdelamea-aneo · 2026-01-20T12:38:00Z

This PR introduces TaskProcessingExt, a refined TLA+ specification that extends the existing TaskProcessing spec. The new specification provides a more detailed task lifecycle, explicitly modeling:

Task processing outcomes: success, retriable failure, and fatal failure
User actions: task cancellation, pausing, and resuming

qdelamea-aneo · 2026-01-20T12:51:30Z

specs/TaskProcessingExt.tla

+            TASK_CANCELED,
+            TASK_PAUSED
+        }]
+    /\ nextAttemptOf \in [TaskId -> TaskId \union {NULL}]


Does TLA+ support partial functions? If so, wouldn't it be better to make nextAttemptOf a partial function? This would avoid having to define a NULL constant.

Definitions

Partial Function: A function ( f: X \to Y ) is partial if for some elements ( x \in X ), ( f(x) ) is undefined. In other words, ( f ) does not necessarily have an output for every input in its domain ( X ).

Total Function: A function ( f: X \to Y ) is total if for every ( x \in X ), ( f(x) ) is defined. That is, ( f ) assigns exactly one output in ( Y ) to every input in ( X ).

qdelamea-aneo · 2026-01-20T13:11:20Z

specs/TaskProcessingExt.tla

+(**
+ * RETRYABLE TASK REGISTRATION
+ * A set of tasks 'T' that have not yet been retried are cloned by a set of
+ * tasks 'U' (each task in 'T' being associated with a single task in 'U' by
+ * the bijection 'f') which are registered to allow the re-execution of the same
+ * computations as those attempted by the tasks of 'T'.
+ *)
+\* Remove atomic double update
+RetryTasks(T, U) ==
+    LET
+        f == CHOOSE x \in Bijection(T, U) : TRUE
+    IN
+        /\ T /= {} /\ T \subseteq UnretriedTask /\ U \subseteq UnknownTask
+        /\ Cardinality(T) = Cardinality(U)
+        /\ taskState' =
+            [u \in TaskId |-> IF u \in U THEN TASK_REGISTERED ELSE taskState[u]]
+        /\ nextAttemptOf' =
+            [t \in TaskId |-> IF t \in T THEN f[t] ELSE nextAttemptOf[t]]
+        /\ UNCHANGED agentTaskAlloc


This action describes an atomic transition of the system in which:

a new task is registered in the system

this newly created task is recorded as a new attempt of a task that previously failed.

In the current implementation, these two operations are implemented by two separate requests on two separate documents (in MongoDB). Performing these two operations in a single atomic operation would require a transaction that is not necessarily supported by database systems. It therefore seems more reasonable to split this action into two actions: one for each step.

Proposed corrections:
The action must be “split” into two. There are two possible options depending on the order in which the two operations are performed. Since the RegisterTasks action already allows new tasks to be registered, this operator simply needs to be modified to implement step (2). A fairness condition must also be added to ensure that both actions always take place (if the first occurs, the second must also occur). The order determines which operator the fairness condition will apply to (RegisterTasks or RetryTasks).

A priori, there does not seem to be one order that is more relevant than another.

Whichever option is chosen, it will lead to a modification of the action constraint (ActionConstraint) used during model checking to avoid improper violations of liveness properties.

In practice, I have the feeling that RegisterTasks should be done before RetryTasks as you would need to know about the task to mark nextAttempt ?
I have the feeling that this would be required to ensure that the data can be produced.

Shouldn't the status of t be changed after this transition ?

Regarding the order you're right. It seems more natural to first register the task before updating nextAttemptOf.

For the status update it is done by another action (a FinalizeTasks action) and the fairness condition on that action guarantees that it will eventually happen.

qdelamea-aneo · 2026-01-20T13:34:20Z

specs/TaskProcessingExt.tla

+ * valid.
+ *)
+\* registered task can also be canceled
+CancelTasks(T) ==


This action presents two problems:

It prohibits the cancellation of a task in the REGISTERED state. However, this may well be requested by the user. It is therefore necessary to allow such a transition in the system.

In the current implementation, canceling a task consists of a database query to record the cancellation request; then, if the task is being processed, the agent responsible for the task is notified to interrupt its processing. The proposed modeling confuses these two steps by describing a different process depending on whether the task is assigned to an agent or not. This must therefore be changed.

Proposed correction:

Modify the enabling condition of the action.

Modify this action so that it describes the first step (record the cancellation request), then modify the ProcessTasks action so that it describes the second step (interrupting task processing when the agent receives the notification).

Note: For correction (2), two options are considered:

In the first step, all tasks are marked as CANCELING, then they continue the normal processing chain and change to CANCELED when processed by an agent.

In the first step, tasks that are not currently being processed are marked as CANCELED, while those currently being processed are marked as CANCELING and then become CANCELED by the ProcessTasks action later.

This choice is important because it will determine how cascading task cancellation in graphs will be described in future refinements.

After further consideration, it seems inappropriate to use a ‘CANCELING’ status to mark tasks for which cancellation has been requested, as this status would overwrite information about where the task is in the processing workflow. A more appropriate modeling choice would therefore be to introduce a new state variable that would indicate for each task whether cancellation has been requested.

Fixed by commit: 9ea740d9730ab64ba2ac5dfbc1c81a59843e55f9

According to the above, the cancelRequested state variable indicates for each task whether cancellation has been requested. The CancelTasks action updates this variable and changes the status of registered, paused, and some staged tasks to CANCELED. It is reasonable to assume that some staged tasks cannot be canceled immediately, as per the current implementation: a staged task is associated with a message in the queue, and cancellation is only realized when the task is processed by an agent. Hence this design choice.

Next, a new action, AcknowledgeTasksCancellation, models the effective cancellation of one or more tasks by an agent. A subtle case then arises: an assigned task whose cancellation is requested may be released, in which case it will be reacquired later and must be canceled at that time. Indeed, upon reacquisition, the agent will see the cancellation request and act accordingly. To model this behavior, the processing of a task by an agent was detailed by introducing the STARTED state, designating a task whose execution has begun. For the previously described case, the cancellation operation is therefore performed when the task is assigned but not yet started.

To be complete, the AcknowledgeTasksCancellation action can occur either:

Before a task begins, if it was assigned after the cancellation request, or
During task execution, if the cancellation was requested after the task started execution.

qdelamea-aneo · 2026-01-20T13:42:33Z

specs/TaskProcessingExt.tla

+(**
+ * TASK PAUSING
+ * A set 'T' of tasks is paused, meaning that the execution of the tasks is
+ * postponed (until they are resumed). This action only applies to staged tasks.
+ *)
+ \* assigned tasks can be paused
+PauseTasks(T) ==
+    /\ T /= {} /\ T \subseteq StagedTask
+    /\ taskState' =
+        [t \in TaskId |-> IF t \in T THEN TASK_PAUSED ELSE taskState[t]]
+    /\ UNCHANGED << agentTaskAlloc, nextAttemptOf >>


So far the ASSIGNED status remains somewhat vague. Assigning a task to an agent does not necessarily mean that it is currently being executed. In fact, the task may be in pre/post-processing, for example. If it is not possible to pause a task once it has started running, it is possible to pause a task in pre-processing by an agent. An ASSIGNED task can therefore be paused.

Proposed correction: Given that the ASSIGNED status is vague (abstract) in this specification, it is sufficient to modify the enabling condition fo the action to allow pausing. Future refinements will need to detail the conditions for pausing an assigned task.

It ultimately appears that simply modifying the enabling condition is not sufficient. Indeed, as in the case of cancellation, pausing is done in two steps: request and then acknowledgment. The modeling must therefore be based on what has been done for task cancellations. In addition, introducing the STARTED status removes the ambiguity mentioned above. A task can only be paused if its execution has not started, which is indicated by the STARTED status.

Fixed by commit: 98d5a32de696cf53c7a79516cd1866fbf4f7364c

As for cancellation, the pausingRequested state variable indicates for each task whether pausing has been requested. The PauseTasks action updates this variable and changes the status of some staged tasks to PAUSED (for the same Reason s for cancellation, not all staged tasks are paused immediatly). The reamining tasks are then paused by an agent through the AcknowledgeTasksPausing action. This action can also pause tasks for which the pausing request arrived when the agent wa already assigned to the agent but not yet started.

The combination of pause and resume actions can lead to undesirable behavior. To overcome this problem, two constraints have been added:

A canceled task cannot be paused (this seems fairly natural).

A user cannot pause/resume a task an infinite number of times (this constraint/modeling assumption is described by the Liveness operator).

qdelamea-aneo · 2026-01-20T13:45:57Z

specs/TaskProcessingExt.tla

+        \union AbortedTask
+        \union CanceledTask
+    \* Is this condition really needed
+    /\ UnretriedTask = {}


Is this conjunct really needed as the fairness conditions should prohibit behaviors where the system stutters on a state where certain tasks have not been retried.

wkirschenmann · 2026-01-20T13:57:22Z

specs/TaskProcessingExt.tla

+ *)
+TypeInv ==
+    /\ agentTaskAlloc \in [AgentId -> SUBSET TaskId]
+    /\ taskState \in [TaskId -> {


Add comment explaining the meaning of all these statuses.

These states are defined and documented in TaskStates.tla.

wkirschenmann · 2026-01-20T15:09:54Z

specs/TaskProcessingExt.tla

+    /\ nextAttemptOf = [t \in TaskId |-> NULL]
+
+(**
+ * TASK STAGING


Suggested change

* TASK STAGING

* TASK REGISTRATION

wkirschenmann · 2026-01-20T15:17:22Z

specs/TaskProcessingExt.tla

+(**
+ * RETRYABLE TASK REGISTRATION
+ * A set of tasks 'T' that have not yet been retried are cloned by a set of
+ * tasks 'U' (each task in 'T' being associated with a single task in 'U' by
+ * the bijection 'f') which are registered to allow the re-execution of the same
+ * computations as those attempted by the tasks of 'T'.
+ *)
+\* Remove atomic double update
+RetryTasks(T, U) ==
+    LET
+        f == CHOOSE x \in Bijection(T, U) : TRUE
+    IN
+        /\ T /= {} /\ T \subseteq UnretriedTask /\ U \subseteq UnknownTask
+        /\ Cardinality(T) = Cardinality(U)
+        /\ taskState' =
+            [u \in TaskId |-> IF u \in U THEN TASK_REGISTERED ELSE taskState[u]]
+        /\ nextAttemptOf' =
+            [t \in TaskId |-> IF t \in T THEN f[t] ELSE nextAttemptOf[t]]
+        /\ UNCHANGED agentTaskAlloc


In practice, I have the feeling that RegisterTasks should be done before RetryTasks as you would need to know about the task to mark nextAttempt ?
I have the feeling that this would be required to ensure that the data can be produced.

Shouldn't the status of t be changed after this transition ?

wkirschenmann · 2026-01-20T15:19:51Z

specs/TaskProcessingExt.tla

+        /\ T /= {} /\ T \subseteq UnretriedTask /\ U \subseteq UnknownTask
+        /\ Cardinality(T) = Cardinality(U)
+        /\ taskState' =
+            [u \in TaskId |-> IF u \in U THEN TASK_REGISTERED ELSE taskState[u]]


Is'nt the following missing ?
There is no v in U such as nextAttemptOf[v]=t

The link between T and U is defined by bijection f define above.

qdelamea-aneo · 2026-01-21T09:21:22Z

specs/TaskProcessingExt.tla

+(**
+ * LIVENESS
+ * A failed task is eventually retried.
+ *)
+FailedTaskEventualRetry ==
+    \A t \in TaskId:
+        t \in UnretriedTask ~>
+            nextAttemptOf[t] \in StagedTask


A task represents an execution instance of a computation. The information contained in a task makes it possible to identify the computation as well as the state of its execution. In the TaskProcessing specification, a task is abstracted by a simple ID.

The notion of retry for a task means that a computation that has failed must be able to be relaunched if necessary (for example, if the data produced by the computation has not been produced elsewhere). The variable nextAttemptOf materializes a link between two tasks. However, it is not this link itself that is important, but rather the fact that the second task is a new instance of the same computation, which can be relaunched following the failure of the first task.

In practice, this link does exist in the current implementation (a field is dedicated to it in task documents in the database), but it is used for observability purposes (monitoring and debugging).

In summary, the underlying concept illustrated by nextAttemptOf is the idea that a second instance of a failed computation must be created and become ready to be executed (that is, be in the STAGED state).

Proposed Correction:
To clarify the above, we can replace the variable nextAttemptOf with taskComputation, which indicates, for each task, the computation of which it is an instance (modeled by an abstract ID). The previous property would then become:

FailedTaskEventualRetry == \A t \in TaskId: t \in FailedTask ~> \E u \in TaskId : u \in StagedTask /\ taskComputation[u] = taskComputation[t]

This new version of the property better expresses what is expected from the system, without making assumptions about how it is implemented.

qdelamea-aneo · 2026-01-21T09:26:52Z

specs/TaskProcessingExt.tla

+ * valid.
+ *)
+\* registered task can also be canceled
+CancelTasks(T) ==


After further consideration, it seems inappropriate to use a ‘CANCELING’ status to mark tasks for which cancellation has been requested, as this status would overwrite information about where the task is in the processing workflow. A more appropriate modeling choice would therefore be to introduce a new state variable that would indicate for each task whether cancellation has been requested.

lemaitre-aneo · 2026-01-22T15:24:34Z

specs/TaskProcessingExt.tla

+            \/ AssignTasks(a, T)
+            \/ StartTasks(a, T)


Those transitions (Assign and Start) should be disabled if cancellation or pause is requested.
Instead, cancellation acknowledgment should be a transition from Assigned or Staged when the cancellation is requested.

lemaitre-aneo · 2026-01-22T15:25:28Z

specs/TaskProcessingExt.tla

+            \/ AcknowledgeTasksCancellation(a, T)
+            \/ AcknowledgeTasksPausing(a, T)


Instead of CancelTask -> AcknowledgeCancellation, it seems to be more consistent with your state name to have RequestTaskCancellation -> CancelTask

lemaitre-aneo · 2026-01-22T15:37:43Z

specs/TaskProcessingExt.tla

+    /\ \E ExemptStagedTasks \in SUBSET (T \intersect StagedTask):
+        taskState' =
+            [t \in TaskId |-> IF t \in T /\ t \notin (AssignedTask \union ExemptStagedTasks)
+                                THEN TASK_CANCELED
+                                ELSE taskState[t]]


Not needed. We do not need to force atomicity between the request, and the state change.

lemaitre-aneo · 2026-01-22T15:38:30Z

specs/TaskProcessingExt.tla

+    /\ cancelRequested \in [TaskId -> BOOLEAN]
+    /\ pausingRequested \in [TaskId -> BOOLEAN]


Those variables could be sets instead of functions.

lemaitre-aneo · 2026-01-22T15:41:26Z

specs/TaskProcessingExt.tla

+ *)
+CancelTasks(T) ==
+    /\ T /= {}
+    /\ T \subseteq UNION {RegisteredTask, StagedTask, PausedTask, AssignedTask}


Remove this constraint and deal with other status when the cancellation is actually performed.

qdelamea-aneo commented Jan 20, 2026

View reviewed changes

wkirschenmann reviewed Jan 20, 2026

View reviewed changes

qdelamea-aneo commented Jan 21, 2026

View reviewed changes

qdelamea-aneo force-pushed the qd/task-ext branch from 485bbac to 9c684d6 Compare January 21, 2026 16:44

fix: update the modeling of the task cancellation process

9ea740d

qdelamea-aneo force-pushed the qd/task-ext branch from 9c684d6 to 9ea740d Compare January 21, 2026 22:11

fix: update the modeling of the task pausing process

98d5a32

lemaitre-aneo reviewed Jan 22, 2026

View reviewed changes

qdelamea-aneo force-pushed the qd/task-ext branch from 64f428d to 0409700 Compare January 24, 2026 17:38

fix: simplifies the modeling of task cancellation and pause processes.

ea9b0fd

qdelamea-aneo force-pushed the qd/task-ext branch from 0409700 to ea9b0fd Compare January 24, 2026 17:40

[wip] fix: retry modeling

22fd8ed

		\/ AcknowledgeTasksCancellation(a, T)
		\/ AcknowledgeTasksPausing(a, T)

		/\ cancelRequested \in [TaskId -> BOOLEAN]
		/\ pausingRequested \in [TaskId -> BOOLEAN]

feat: add extension spec for task processing #14

Are you sure you want to change the base?

feat: add extension spec for task processing #14

Uh oh!

Conversation

qdelamea-aneo commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

qdelamea-aneo Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

qdelamea-aneo commented Jan 20, 2026 •

edited

Loading

qdelamea-aneo Jan 21, 2026 •

edited

Loading