lina-temporal
Contributor

What changed?

  • Adds CHASM task executors for the outbound queue.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

@lina-temporal lina-temporal requested a review from yycptt June 24, 2025 01:32
@lina-temporal lina-temporal requested a review from a team as a code owner June 24, 2025 01:32
	ctx context.Context,
	task *tasks.ChasmTask,
) error {
	ctx, cancel := context.WithTimeout(ctx, taskTimeout)
Member

We probably need a higher timeout here, since Nexus will eventually use CHASM too. Maybe let's do 10s? The outbound queue has a different worker pool implementation, so a higher timeout should be fine.

We can change that later as well. No strong opinion.

Contributor Author

Sure, updated to 10s.

@lina-temporal lina-temporal enabled auto-merge (squash) July 1, 2025 20:30
@lina-temporal lina-temporal merged commit 793dd6e into main Jul 1, 2025
53 checks passed
@lina-temporal lina-temporal deleted the chasm_outbound branch July 1, 2025 20:50
yycptt added a commit that referenced this pull request Aug 21, 2025
## What changed?
- Revert history task timeout from 10s to 3s for non-outbound tasks.

## Why?
- The original change was made due to a misunderstanding of my comment
[here](#7951 (comment)).
I meant to suggest using 10s only as the timeout for outbound chasm tasks.
Other tasks, like transfer/timer, should still use 3s as the
timeout.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade Go to 1.25.0.

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (ie connection churn).

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that instead of a deadline exceeded I saw a result.
    I'll assume this will work since the existing code already tried doing
    just exactly that, but it didn't do it well.

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, and the fix was not properly
    applied in the previous PR.

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    na

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, specifically:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.
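
Two of the replacements mentioned above, and one of the subtle behavior differences, can be illustrated with a short sketch. The helper names `parseKV` and `objectName` are hypothetical and not taken from the PR; `strings.SplitSeq` (Go 1.24+) is left out so the sketch also builds on older toolchains.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parseKV splits "key=value" on the first "=" only, so the value may itself
// contain "=". With strings.Split the value would be cut into extra pieces.
func parseKV(s string) (key, value string, ok bool) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// objectName returns the last element of a slash-separated key using the
// path package instead of indexing into strings.Split.
func objectName(key string) string {
	return path.Base(key)
}

func main() {
	fmt.Println(len(strings.Split("a=b=c", "="))) // three pieces
	k, v, _ := parseKV("a=b=c")
	fmt.Println(k, v)

	// Subtle difference: with a trailing slash, the last Split element is
	// empty, while path.Base drops the trailing slash first.
	parts := strings.Split("bucket/dir/", "/")
	fmt.Printf("%q vs %q\n", parts[len(parts)-1], objectName("bucket/dir/"))
}
```

The trailing-slash case is the kind of behavior change that deserves a test when swapping one approach for the other.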

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    was no check to prevent a nil pointer dereference.

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement to the system workflow.

    ## Why?
    The system workflow needs to have sufficient time to execute the defer
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    The warnings were failing integration tests.

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from cli.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s only as the timeout for outbound chasm tasks.
    Other tasks, like transfer/timer, should still use 3s as the
    timeout.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (ie. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)
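
Why a default with shared structure is risky can be shown with a small sketch. The `settings` type and the two `apply` helpers are hypothetical and only mimic the idea of parsing a value on top of a copy of the default; they are not the dynamic config code.

```go
package main

import "fmt"

// settings is a hypothetical dynamic config value with a map field.
type settings struct {
	Name   string
	Limits map[string]int
}

// applyShallow copies the default by value and writes overrides on top.
// The copy still points at the default's map, so the default changes too.
func applyShallow(def settings, overrides map[string]int) settings {
	out := def
	for k, v := range overrides {
		out.Limits[k] = v
	}
	return out
}

// applyDeep clones the map before merging, leaving the default untouched.
func applyDeep(def settings, overrides map[string]int) settings {
	out := def
	out.Limits = make(map[string]int, len(def.Limits)+len(overrides))
	for k, v := range def.Limits {
		out.Limits[k] = v
	}
	for k, v := range overrides {
		out.Limits[k] = v
	}
	return out
}

func main() {
	def := settings{Name: "default", Limits: map[string]int{"rps": 10}}

	_ = applyDeep(def, map[string]int{"rps": 99})
	fmt.Println("after deep apply, default rps:", def.Limits["rps"])

	_ = applyShallow(def, map[string]int{"rps": 99})
	fmt.Println("after shallow apply, default rps:", def.Limits["rps"])
}
```

Copying a struct by value copies map and slice headers, not their contents, which is why a plain copy of the default is not enough.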

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.

    ## How did you test it?
    new functional xdc tests

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

    ## What changed?
    _Describe what has changed in this PR._

    ## Why?
    _Tell your future self why have you made these changes._

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    _Any change is risky. Identify all risks you are aware of. If none,
    remove this section._

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR emits the `workflow_duration` metric when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, workflow type, which this PR enables

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    optimize the batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for entire
    pages to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - Added a common `processWorkflowsWithProactiveFetching` function to
    reduce code duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    Flag was added to ensure backwards compatibility. Now that 1.28 is
    released, can change the default. Flag will be removed after 1.29 is
    released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in test; let it fall through to the decoder. The
    decoder will return an error. An error is a better choice than a `nil`
    response, since a `nil` response would signal to the user that the
    decoded data is usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop when iteration through mutable states fails.

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)
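
Stopping a paginated scan at the first iteration error, instead of continuing, can be sketched like this. The `iterator` interface, `sliceIter`, and `scan` are hypothetical stand-ins; the scavenger's real types are not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// iterator is a hypothetical paginated iterator over mutable-state records.
type iterator interface {
	HasNext() bool
	Next() (string, error)
}

// sliceIter is a test double that fails at a chosen position (-1 for never).
type sliceIter struct {
	items  []string
	failAt int
	pos    int
}

func (s *sliceIter) HasNext() bool { return s.pos < len(s.items) }

func (s *sliceIter) Next() (string, error) {
	if s.pos == s.failAt {
		return "", errors.New("persistence error")
	}
	item := s.items[s.pos]
	s.pos++
	return item, nil
}

// scan visits records until the iterator is exhausted or returns an error.
// It stops at the first error rather than continuing with a record that was
// never populated.
func scan(it iterator, visit func(string)) error {
	for it.HasNext() {
		rec, err := it.Next()
		if err != nil {
			return fmt.Errorf("iteration failed: %w", err)
		}
		visit(rec)
	}
	return nil
}

func main() {
	it := &sliceIter{items: []string{"a", "b", "c"}, failAt: 1}
	var visited []string
	err := scan(it, func(r string) { visited = append(visited, r) })
	fmt.Println("visited:", visited, "err:", err)
}
```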
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade go to the 1.25.0

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (ie connection churn).

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that instead of a deadline exceeded I saw a result.
    I'll assume this will work since the existing code already tried doing
    just exactly that, but it didn't do it well.

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, fix was not properly applied on
    previous PR

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    na

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, speficially:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options in many cases, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.
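
One example of the kind of subtlety mentioned above, as a minimal sketch: `strings.Split` and `strings.SplitN` return different numbers of pieces when the separator also appears inside a value. The helper `parseKeyValue` and the input string are hypothetical and only serve to show the difference.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValue splits "key=value" on the first "=" only, so a value that
// itself contains "=" is kept intact.
func parseKeyValue(s string) (key, value string, ok bool) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	in := "filter=a=b"
	// Split cuts at every "=", producing three pieces here.
	fmt.Println(len(strings.Split(in, "=")))
	// SplitN with n=2 stops after the first "=".
	k, v, ok := parseKeyValue(in)
	fmt.Println(k, v, ok)
}
```

Code that indexes into the result (for example `parts[1]`) sees a different value after switching between the two functions, which is why such call sites deserve a closer look in review.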

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    was no check to prevent a nil pointer dereference.
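
A minimal sketch of the kind of guard described here, assuming the nil values are individual entries in a list of executions. The `execution` type and `collectWorkflowIDs` function are hypothetical stand-ins, not the code changed in this PR.

```go
package main

import "fmt"

// execution is a hypothetical stand-in for a workflow execution record.
type execution struct {
	WorkflowID string
	RunID      string
}

// collectWorkflowIDs skips nil entries instead of dereferencing them.
func collectWorkflowIDs(executions []*execution) []string {
	ids := make([]string, 0, len(executions))
	for _, e := range executions {
		if e == nil {
			continue // without this guard, e.WorkflowID would panic
		}
		ids = append(ids, e.WorkflowID)
	}
	return ids
}

func main() {
	in := []*execution{{WorkflowID: "wf-1"}, nil, {WorkflowID: "wf-2"}}
	fmt.Println(collectWorkflowIDs(in))
}
```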

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement for system workflows.

    ## Why?
    The system workflow needs sufficient time to execute its deferred
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    The warnings were failing integration tests.

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from the CLI.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s as the timeout only for outbound chasm tasks.
    Other tasks, like transfer/timer, should still use 3s as the
    timeout.
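
The intended split can be sketched as follows. The 3s and 10s values come from the commit message; the category strings, constant names, and helper functions are hypothetical and do not reflect how the history service actually selects the timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Values taken from the commit message; the names are hypothetical.
const (
	defaultTaskTimeout  = 3 * time.Second
	outboundTaskTimeout = 10 * time.Second
)

// taskTimeoutFor returns the processing timeout for a task category.
func taskTimeoutFor(category string) time.Duration {
	if category == "outbound" {
		return outboundTaskTimeout
	}
	return defaultTaskTimeout
}

// executeWithTimeout runs fn under a context bounded by the category's timeout.
func executeWithTimeout(ctx context.Context, category string, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, taskTimeoutFor(category))
	defer cancel()
	return fn(ctx)
}

func main() {
	for _, c := range []string{"transfer", "timer", "outbound"} {
		fmt.Println(c, taskTimeoutFor(c))
	}
	_ = executeWithTimeout(context.Background(), "outbound", func(ctx context.Context) error {
		_, hasDeadline := ctx.Deadline()
		fmt.Println("outbound task has deadline:", hasDeadline)
		return nil
	})
}
```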

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (i.e. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.
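
A minimal sketch of "retry until the context times out", assuming a fixed backoff between attempts. `retryUntilDone` and `pollWithFlakyBackend` are hypothetical names; the real client's retry policy and error classification are not shown in this commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntilDone calls op until it succeeds or ctx is done, sleeping for
// backoff between attempts. It returns the number of attempts made.
func retryUntilDone(ctx context.Context, backoff time.Duration, op func(context.Context) error) (int, error) {
	attempts := 0
	for {
		attempts++
		err := op(ctx)
		if err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			// Report both the context error and the last operation error.
			return attempts, errors.Join(ctx.Err(), err)
		case <-time.After(backoff):
		}
	}
}

// pollWithFlakyBackend simulates a backend that rejects the first two polls.
func pollWithFlakyBackend() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	calls := 0
	return retryUntilDone(ctx, 5*time.Millisecond, func(context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("resource exhausted")
		}
		return nil
	})
}

func main() {
	attempts, err := pollWithFlakyBackend()
	fmt.Println(attempts, err)
}
```

Bounding retries by the caller's deadline rather than by a fixed count lets a poll survive a short period in which the target node cannot accept requests.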

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).
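
To illustrate why a default value with shared structure is risky, here is a small sketch using `encoding/json` rather than the dynamic config machinery itself. The `limits` type, `defaultLimits` variable, and `parseOverride` function are hypothetical; the point is only that a shallow copy of a struct still shares its map with the original.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// limits is a hypothetical setting type whose default contains a map.
type limits struct {
	PerNamespace map[string]int
}

var defaultLimits = limits{PerNamespace: map[string]int{"default": 10}}

// parseOverride decodes raw on top of a shallow copy of the default.
// The copy shares the PerNamespace map with defaultLimits.
func parseOverride(raw string) (limits, error) {
	v := defaultLimits // shallow copy: the map itself is not duplicated
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

func main() {
	_, _ = parseOverride(`{"PerNamespace": {"tenant-a": 99}}`)
	// json.Unmarshal reuses a non-nil map, so the override leaked into
	// the default as well.
	fmt.Println(len(defaultLimits.PerNamespace))
}
```

Deep-copying the default before decoding on top of it avoids this kind of leak.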

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.
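
The constraint can be illustrated with plain `net/http`, without the Nexus SDK: once a request body has been read to the end, a second read yields nothing, so a forwarded request needs a freshly built body. The helpers `consume`, `rebuild`, and `forwardDemo`, as well as the URLs and payload, are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// consume mimics a handler that reads the request body to the end and closes it.
func consume(r *http.Request) (string, error) {
	defer r.Body.Close()
	b, err := io.ReadAll(r.Body)
	return string(b), err
}

// rebuild creates a new request whose body is re-encoded from content that
// was already decoded from the original request.
func rebuild(method, url, decoded string) (*http.Request, error) {
	return http.NewRequest(method, url, strings.NewReader(decoded))
}

// forwardDemo returns what the first read, a second read, and the forwarded
// request's body contain.
func forwardDemo() (first, second, forwarded string, err error) {
	orig, err := http.NewRequest(http.MethodPost, "http://origin.example/callback",
		strings.NewReader(`{"message":"boom"}`))
	if err != nil {
		return "", "", "", err
	}
	if first, err = consume(orig); err != nil {
		return "", "", "", err
	}
	if second, err = consume(orig); err != nil { // body is already drained
		return "", "", "", err
	}
	fwd, err := rebuild(orig.Method, "http://remote.example/callback", first)
	if err != nil {
		return "", "", "", err
	}
	forwarded, err = consume(fwd)
	return first, second, forwarded, err
}

func main() {
	first, second, forwarded, err := forwardDemo()
	fmt.Printf("%q %q %q %v\n", first, second, forwarded, err)
}
```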

    ## How did you test it?
    new functional xdc tests

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR produces the `workflow_duration` metric when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, and workflow type, which this PR enables.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    Optimize batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for an entire
    page to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - Added a common `processWorkflowsWithProactiveFetching` function to
    reduce code duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.
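
The general idea of fetching ahead can be sketched with a producer and a pool of workers connected by a channel. This is not the `processWorkflowsWithProactiveFetching` implementation; `fetchPage` and `processAll` are hypothetical, and pages here are just ranges of integers.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchPage is a hypothetical paginated source: it returns up to pageSize
// items starting at token, plus the next token (or -1 when exhausted).
func fetchPage(token, pageSize, total int) (items []int, next int) {
	for i := token; i < token+pageSize && i < total; i++ {
		items = append(items, i)
	}
	next = token + pageSize
	if next >= total {
		next = -1
	}
	return items, next
}

// processAll hands items to workers through an unbuffered channel. The
// producer fetches the next page as soon as workers have taken the current
// page's items, without waiting for those items to finish processing.
func processAll(total, pageSize, workers int, handle func(int)) int {
	tasks := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range tasks {
				handle(item)
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	for token := 0; token != -1; {
		items, next := fetchPage(token, pageSize, total)
		for _, item := range items {
			tasks <- item // blocks only until some worker is free
		}
		token = next
	}
	close(tasks)
	wg.Wait()
	return processed
}

func main() {
	fmt.Println(processAll(10, 3, 4, func(int) {}))
}
```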

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    The flag was added to ensure backwards compatibility. Now that 1.28 is
    released, the default can be changed. The flag will be removed after
    1.29 is released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in test; let it fall through to decoder. The
    decoder will return an error. An error is a better choice than a `nil`
    response, since a `nil` response signals to the user that the decoded
    data is usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop when iteration through mutable states fails.

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037
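
A minimal sketch of the pattern, independent of the scavenger code: stop paginating at the first error instead of continuing with an invalid page. The `page` type and the `scanAll` and `scanWithFailure` functions are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// page is a hypothetical result of one paginated read.
type page struct {
	items     []string
	nextToken string
}

// scanAll walks every page and stops at the first error, returning what
// was collected so far together with that error.
func scanAll(next func(token string) (page, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := next(token)
		if err != nil {
			return all, err // break out; do not keep iterating
		}
		all = append(all, p.items...)
		if p.nextToken == "" {
			return all, nil
		}
		token = p.nextToken
	}
}

// scanWithFailure simulates a source whose second page fails.
func scanWithFailure() ([]string, error) {
	return scanAll(func(token string) (page, error) {
		if token == "" {
			return page{items: []string{"a", "b"}, nextToken: "p2"}, nil
		}
		return page{}, errors.New("persistence unavailable")
	})
}

func main() {
	items, err := scanWithFailure()
	fmt.Println(items, err)
}
```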

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade Go to 1.25.0

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (i.e. connection churn).
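
One way to picture a soft timeout, as a sketch under assumptions: derive a child context that expires a little before the caller's deadline and, when it fires, return an empty result instead of an error. `softTimeoutContext`, `longPoll`, `pollWithoutEvents`, and the buffer values are hypothetical and are not the server's implementation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// softTimeoutContext returns a child context that expires buffer before the
// parent's deadline, leaving time to send a normal response.
func softTimeoutContext(parent context.Context, buffer time.Duration) (context.Context, context.CancelFunc) {
	deadline, ok := parent.Deadline()
	if !ok {
		return context.WithCancel(parent)
	}
	return context.WithDeadline(parent, deadline.Add(-buffer))
}

// longPoll waits for an event. On soft timeout it returns an empty result
// and a nil error so the caller can simply poll again.
func longPoll(ctx context.Context, buffer time.Duration, events <-chan string) (string, error) {
	softCtx, cancel := softTimeoutContext(ctx, buffer)
	defer cancel()
	select {
	case ev := <-events:
		return ev, nil
	case <-softCtx.Done():
		if ctx.Err() != nil {
			return "", ctx.Err() // the caller's own deadline or cancellation
		}
		return "", nil // soft timeout: reply with an empty result
	}
}

// pollWithoutEvents polls a channel that never delivers anything.
func pollWithoutEvents() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	return longPoll(ctx, 150*time.Millisecond, make(chan string))
}

func main() {
	ev, err := pollWithoutEvents()
	fmt.Printf("%q %v\n", ev, err)
}
```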

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that instead of a deadline exceeded I saw a result.
    I'll assume this will work since the existing code already tried doing
    just exactly that, but it didn't do it well.

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, and the fix was not properly
    applied in the previous PR.

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    na

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, specifically:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    was no check to prevent a nil pointer dereference.

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement for system workflows.

    ## Why?
    The system workflow needs sufficient time to execute its deferred
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    The warnings were failing integration tests.

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from the CLI.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s as the timeout only for outbound chasm tasks.
    Other tasks, like transfer/timer, should still use 3s as the
    timeout.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (ie. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)
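
Retrying until the caller's context expires, rather than a fixed number of times, might look like the sketch below. It is only an illustration of the idea: `retryUntilDone`, `errBusy`, and the fixed backoff are assumptions made here, and the sketch does not distinguish retryable from non-retryable errors as a real client would.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errBusy is a hypothetical retryable error returned by an overloaded node.
var errBusy = errors.New("matching node busy")

// retryUntilDone calls op until it succeeds or ctx is done, waiting backoff
// between attempts. It returns the number of attempts made.
func retryUntilDone(ctx context.Context, backoff time.Duration, op func(context.Context) error) (int, error) {
	attempts := 0
	for {
		attempts++
		err := op(ctx)
		if err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			// The context timeout bounds the retries, not a fixed count.
			return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, err)
		case <-time.After(backoff):
		}
	}
}

// pollWithFailures simulates a poll that fails the given number of times
// before succeeding, under a one-second context timeout.
func pollWithFailures(failures int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	calls := 0
	return retryUntilDone(ctx, time.Millisecond, func(context.Context) error {
		calls++
		if calls <= failures {
			return errBusy
		}
		return nil
	})
}

func main() {
	attempts, err := pollWithFailures(3)
	fmt.Println(attempts, err)
}
```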

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.

    ## How did you test it?
    new functional xdc tests
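
The reason the body cannot be reused can be shown with the standard library alone. In this sketch the `failure` type, `consume`, `rebuild`, and the example URLs are hypothetical; the Nexus SDK types are not used. Once a handler has drained the request body, nothing is left to forward, so the forwarded request is rebuilt from the decoded value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// failure is a hypothetical decoded failure payload.
type failure struct {
	Message string `json:"message"`
}

// consume mimics a handler that reads and closes the request body.
func consume(r *http.Request) (failure, error) {
	defer r.Body.Close()
	var f failure
	raw, err := io.ReadAll(r.Body)
	if err != nil {
		return f, err
	}
	return f, json.Unmarshal(raw, &f)
}

// rebuild constructs a fresh request from the decoded failure instead of
// reusing the already drained body.
func rebuild(f failure, url string) (*http.Request, error) {
	payload, err := json.Marshal(f)
	if err != nil {
		return nil, err
	}
	return http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
}

// roundTrip returns the forwarded body and the number of bytes still
// readable from the original body after it was consumed.
func roundTrip(msg string) (string, int, error) {
	payload, err := json.Marshal(failure{Message: msg})
	if err != nil {
		return "", 0, err
	}
	orig, err := http.NewRequest(http.MethodPost, "http://origin.example/complete", bytes.NewReader(payload))
	if err != nil {
		return "", 0, err
	}
	f, err := consume(orig)
	if err != nil {
		return "", 0, err
	}
	leftover, _ := io.ReadAll(orig.Body) // nothing remains after consume
	fwd, err := rebuild(f, "http://remote.example/complete")
	if err != nil {
		return "", 0, err
	}
	body, err := io.ReadAll(fwd.Body)
	return string(body), len(leftover), err
}

func main() {
	body, leftover, err := roundTrip("boom")
	fmt.Println(body, leftover, err)
}
```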

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR produces the metric workflow_duration, when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, workflow type, which this PR enables

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    Optimized batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for entire
    pages to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - Added a common `processWorkflowsWithProactiveFetching` function to reduce code
    duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>
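
One way to picture proactive page fetching is a worker pool fed through an unbuffered channel. This is a simplified sketch under stated assumptions, not the `processWorkflowsWithProactiveFetching` function from the PR: `processPages`, `countProcessed`, and the synthetic pages are hypothetical, at least one worker is assumed, and rate limiting, heartbeats, and error handling are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// processPages hands tasks to a fixed pool of workers. The tasks channel is
// unbuffered, so the producer fetches the next page as soon as the last task
// of the current page has been picked up by a worker, not after the whole
// page has finished processing. It assumes workers >= 1.
func processPages(fetch func(page int) []string, pages, workers int, handle func(string)) {
	tasks := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				handle(t)
			}
		}()
	}
	for p := 0; p < pages; p++ {
		for _, t := range fetch(p) {
			tasks <- t // blocks only until some worker is free
		}
	}
	close(tasks)
	wg.Wait()
}

// countProcessed runs processPages over synthetic pages and returns how many
// tasks were handled.
func countProcessed(pages, perPage, workers int) int {
	var mu sync.Mutex
	handled := 0
	fetch := func(page int) []string {
		out := make([]string, perPage)
		for i := range out {
			out[i] = fmt.Sprintf("page-%d/workflow-%d", page, i)
		}
		return out
	}
	processPages(fetch, pages, workers, func(string) {
		mu.Lock()
		handled++
		mu.Unlock()
	})
	return handled
}

func main() {
	fmt.Println(countProcessed(3, 4, 2))
}
```

Because the next page may be fetched while earlier tasks are still running, peak memory and timing differ from the page-at-a-time model, which matches the risks listed above.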

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    The flag was added to ensure backwards compatibility. Now that 1.28 is
    released, we can change the default. The flag will be removed after 1.29 is
    released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in the test; let it fall through to the decoder. The
    decoder will return an error. An error is a better choice than a `nil`
    response, since a `nil` response signals to the user that the decoded data is
    usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests
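
The footgun with non-scalar defaults comes from Go's reference semantics for maps and slices. The sketch below is an illustration only: the `settings` type, `deepCopy`, and `merge` are hypothetical and do not use mapstructure or the dynamic config package. Merging an override on top of a shared default mutates the default for every later reader, while merging on top of a deep copy does not.

```go
package main

import "fmt"

// settings is a hypothetical config value with a non-scalar field.
type settings struct {
	Limits map[string]int
}

// deepCopy returns a copy of s that shares no map with the original.
func deepCopy(s settings) settings {
	out := settings{Limits: make(map[string]int, len(s.Limits))}
	for k, v := range s.Limits {
		out.Limits[k] = v
	}
	return out
}

// merge writes override values on top of base, mutating base's map in place,
// much like decoding into an existing value does.
func merge(base settings, override map[string]int) settings {
	for k, v := range override {
		base.Limits[k] = v
	}
	return base
}

func main() {
	def := settings{Limits: map[string]int{"rps": 100}}

	// Safe: the override lands on a private copy of the default.
	safe := merge(deepCopy(def), map[string]int{"rps": 5})
	fmt.Println(def.Limits["rps"], safe.Limits["rps"])

	// Unsafe: the override is written into the shared default itself.
	merge(def, map[string]int{"rps": 7})
	fmt.Println(def.Limits["rps"])
}
```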

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop when iteration through mutable states fails

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)
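
The pattern of stopping a paginated scan at the first error can be sketched as below. The `scan` function, the integer page tokens, and the synthetic fetcher are hypothetical; the scavenger's real iterator is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
)

// scan walks pages through fetch and stops at the first error instead of
// continuing with an iterator that is in an unknown state. A next token of 0
// marks the last page.
func scan(fetch func(token int) ([]string, int, error)) ([]string, error) {
	var seen []string
	token := 0
	for {
		items, next, err := fetch(token)
		if err != nil {
			return seen, err // break out; do not request further pages
		}
		seen = append(seen, items...)
		if next == 0 {
			return seen, nil
		}
		token = next
	}
}

// scanWithFailureAt scans three synthetic pages of two items each and makes
// the page with the given token fail. Pass -1 for no failure.
func scanWithFailureAt(failAt int) (int, error) {
	const lastToken = 2
	fetch := func(token int) ([]string, int, error) {
		if token == failAt {
			return nil, 0, errors.New("page fetch failed")
		}
		items := []string{fmt.Sprintf("p%d-a", token), fmt.Sprintf("p%d-b", token)}
		if token == lastToken {
			return items, 0, nil
		}
		return items, token + 1, nil
	}
	seen, err := scan(fetch)
	return len(seen), err
}

func main() {
	fmt.Println(scanWithFailureAt(-1))
	fmt.Println(scanWithFailureAt(1))
}
```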
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade Go to 1.25.0

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (ie connection churn).

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that instead of a deadline exceeded I saw a result.
    I'll assume this will work since the existing code already tried doing
    just exactly that, but it didn't do it well.
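
A soft timeout for a long poll can be sketched with a derived context whose deadline sits slightly before the caller's. This is an assumption-laden illustration, not the server's handler: `longPoll`, the `buffer` parameter, and the string events are hypothetical. When the soft deadline fires first, the poll returns an empty, successful response so the caller can poll again on the same connection.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// longPoll waits for an event until shortly before the caller's deadline.
// On the soft timeout it returns an empty result and no error.
func longPoll(ctx context.Context, buffer time.Duration, events <-chan string) (string, error) {
	softCtx := ctx
	if deadline, ok := ctx.Deadline(); ok {
		var cancel context.CancelFunc
		softCtx, cancel = context.WithDeadline(ctx, deadline.Add(-buffer))
		defer cancel()
	}
	select {
	case ev := <-events:
		return ev, nil
	case <-softCtx.Done():
		if err := ctx.Err(); err != nil {
			return "", err // the caller's own deadline or cancellation
		}
		return "", nil // soft timeout: empty response, caller may poll again
	}
}

// pollOnce runs one long poll with a 200ms deadline and a 100ms buffer,
// optionally with an event already queued.
func pollOnce(withEvent bool) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	events := make(chan string, 1)
	if withEvent {
		events <- "history-event"
	}
	return longPoll(ctx, 100*time.Millisecond, events)
}

func main() {
	fmt.Println(pollOnce(true))
	fmt.Println(pollOnce(false))
}
```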

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, and the fix was not properly applied
    in the previous PR

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    na

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, specifically:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options in many cases, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.
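
Two of the replacements mentioned above, `strings.SplitN` and the `path` package, can be illustrated with a short sketch. The helper names and inputs are hypothetical and are not taken from the PR; `strings.SplitSeq` is left out so the example also builds on Go versions before 1.24.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitKeyValue splits "key=value" on the first "=" only, so any further
// "=" characters stay in the value. strings.Split would cut at every one.
func splitKeyValue(s string) (key, value string, ok bool) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// lastSegment returns the final element of a slash-separated key using the
// path package instead of indexing into strings.Split output.
func lastSegment(key string) string {
	return path.Base(key)
}

func main() {
	k, v, ok := splitKeyValue("filter=a=b")
	fmt.Println(k, v, ok)
	fmt.Println(lastSegment("bucket/dir/file.history"))
}
```

The behaviors are not identical in every edge case. For example, `path.Base("a/b/")` returns `b`, while the last element of `strings.Split("a/b/", "/")` is the empty string, which is the kind of subtlety the risks section refers to.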

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    was no check to prevent a nil pointer dereference.

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement for system workflows

    ## Why?
    The system workflow needs to have sufficient time to execute the deferred
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    Was failing integration tests

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from cli.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s as the timeout only for outbound chasm tasks.
    Other tasks, like transfer/timer, should still use 3s as the
    timeout.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (ie. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.

    ## How did you test it?
    new functional xdc tests

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

    ## What changed?
    _Describe what has changed in this PR._

    ## Why?
    _Tell your future self why have you made these changes._

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    _Any change is risky. Identify all risks you are aware of. If none,
    remove this section._

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR produces the metric workflow_duration, when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, workflow type, which this PR enables

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    optimize the batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for entire
    pages to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - common `processWorkflowsWithProactiveFetching` function to reduce code
    duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    Flag was added to ensure backwards compatibility. Now that 1.28 is
    released, can change the default. Flag will be removed after 1.29 is
    released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in test; let it fall through to decoder. The
    decoder will return an error. An error is a better choice than a `nil`
    response, since a `nil` response signals to the user that the decoded
    data is usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests
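
The "footgun" with non-scalar defaults can be illustrated with a small sketch. It does not use mapstructure or the real dynamic config types; `settings`, `applyShallow`, and `applyDeep` are hypothetical names chosen for this example. Copying a struct copies only the map header, so merging overrides into the copy also changes the shared default, while cloning the map first keeps the default intact.

```go
package main

import "fmt"

// settings is a hypothetical setting value with a non-scalar field.
type settings struct {
	Limits map[string]int
}

// applyShallow copies the struct but shares the Limits map with the default,
// so merging overrides mutates the default as well.
func applyShallow(def settings, overrides map[string]int) settings {
	out := def // struct copy: the map data is still shared
	for k, v := range overrides {
		out.Limits[k] = v
	}
	return out
}

// applyDeep clones the map before merging, leaving the default untouched.
func applyDeep(def settings, overrides map[string]int) settings {
	out := settings{Limits: make(map[string]int, len(def.Limits))}
	for k, v := range def.Limits {
		out.Limits[k] = v
	}
	for k, v := range overrides {
		out.Limits[k] = v
	}
	return out
}

func main() {
	def := settings{Limits: map[string]int{"rps": 100}}

	applyDeep(def, map[string]int{"rps": 5})
	fmt.Println(def.Limits["rps"]) // prints 100: default preserved

	applyShallow(def, map[string]int{"rps": 5})
	fmt.Println(def.Limits["rps"]) // prints 5: default was overwritten
}
```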

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop, when iteration through mutable states fails

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade Go to 1.25.0.

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (ie connection churn).

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that instead of a deadline exceeded I saw a result.
    I'll assume this will work since the existing code already tried doing
    just exactly that, but it didn't do it well.

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, and the fix was not properly
    applied in the previous PR.

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    na

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, specifically:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options in many cases, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldsSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    is no check to prevent nil pointer exception.

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement for system workflows.

    ## Why?
    The system workflow needs to have sufficient time to execute the defer
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    Was failing integration tests

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from cli.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without
    verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s as the timeout only for outbound chasm tasks.
    But for other tasks, like transfer/timer, it should still use 3s as the
    timeout.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (ie. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.

    ## How did you test it?
    new functional xdc tests

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

    ## What changed?
    _Describe what has changed in this PR._

    ## Why?
    _Tell your future self why have you made these changes._

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    _Any change is risky. Identify all risks you are aware of. If none,
    remove this section._

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR produces the metric workflow_duration, when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, workflow type, which this PR enables

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    optimize the batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for entire
    pages to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - common `processWorkflowsWithProactiveFetching` function to reduce code
    duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    Flag was added to ensure backwards compatibility. Now that 1.28 is
    released, can change the default. Flag will be removed after 1.29 is
    released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in test; let it fall through to the decoder,
    which will return an error. An error is a better choice than a `nil`
    response, since a `nil` response would signal to the user that the
    decoded data is usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.
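
    A minimal sketch of the fall-through behavior, assuming a hypothetical
    `blob` type and using `encoding/json` as a stand-in decoder (the
    server's real decoder and types are not shown here):

    ```go
    package main

    import (
        "encoding/json"
        "fmt"
    )

    // blob is a hypothetical stand-in for an encoded payload; it is not the
    // server's own type.
    type blob struct {
        Data []byte
    }

    // decode does not special-case nil data. The decoder itself rejects it,
    // so callers get an error instead of a nil result that looks valid.
    func decode(b blob, out interface{}) error {
        return json.Unmarshal(b.Data, out)
    }

    func main() {
        var v map[string]int
        fmt.Println(decode(blob{Data: nil}, &v) != nil) // nil data yields an error
        fmt.Println(decode(blob{Data: []byte(`{"a":1}`)}, &v), v["a"])
    }
    ```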

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests
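
    The shared-default footgun mentioned above can be sketched as follows.
    The `limits` type, `deepCopy`, and `applyOverride` are hypothetical
    names, and the JSON round trip is only one simple way to obtain an
    independent copy; it is not the server's implementation.

    ```go
    package main

    import (
        "encoding/json"
        "fmt"
    )

    // limits is a hypothetical non-scalar setting value.
    type limits struct {
        PerKey map[string]int
    }

    // deepCopy clones v through a JSON round trip so that the copy shares no
    // maps with the original.
    func deepCopy(v limits) (limits, error) {
        raw, err := json.Marshal(v)
        if err != nil {
            return limits{}, err
        }
        var out limits
        err = json.Unmarshal(raw, &out)
        return out, err
    }

    // applyOverride merges override on top of a copy of def, leaving def intact.
    func applyOverride(def limits, override map[string]int) (limits, error) {
        merged, err := deepCopy(def)
        if err != nil {
            return limits{}, err
        }
        if merged.PerKey == nil {
            merged.PerKey = map[string]int{}
        }
        for k, v := range override {
            merged.PerKey[k] = v
        }
        return merged, nil
    }

    func main() {
        def := limits{PerKey: map[string]int{"a": 1}}
        got, err := applyOverride(def, map[string]int{"a": 5, "b": 2})
        fmt.Println(got.PerKey, def.PerKey, err)
    }
    ```

    Without the copy, writing into `merged.PerKey` would also modify the
    map held by the default, and later lookups would see the override.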

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop when iteration through mutable states fails.

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)
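
    A minimal sketch of a pagination loop that stops on the first error.
    The `page` type, `scan`, and `demoFetch` are hypothetical and only
    illustrate the control flow, not the scavenger's actual code.

    ```go
    package main

    import (
        "errors"
        "fmt"
    )

    // page is a hypothetical page of results with a continuation token.
    type page struct {
        items []string
        next  string // empty when there are no more pages
    }

    // scan walks all pages via fetch and stops at the first error instead of
    // continuing with an invalid (nil) page.
    func scan(fetch func(token string) (*page, error)) ([]string, error) {
        var all []string
        token := ""
        for {
            p, err := fetch(token)
            if err != nil {
                return all, err // break out: p must not be used after a failure
            }
            all = append(all, p.items...)
            if p.next == "" {
                return all, nil
            }
            token = p.next
        }
    }

    // demoFetch returns one good page and then fails on the second request.
    func demoFetch(token string) (*page, error) {
        if token == "" {
            return &page{items: []string{"ms-1"}, next: "t1"}, nil
        }
        return nil, errors.New("persistence error")
    }

    func main() {
        items, err := scan(demoFetch)
        fmt.Println(items, err)
    }
    ```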
lina-temporal added a commit that referenced this pull request Aug 28, 2025
commit 9f30a1d
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 27 16:08:12 2025 -0700

    [Scheduled Actions] Use the Execution returned from PollMutableState when calling GetWorkflowExecutionHistory (#8207)

    ## What changed?
    - In WatchWorkflow, we'll now use the Execution returned as a result
    from `PollMutableState`, instead of the Execution we used as part of the
    `PollMutableState` request.

    ## Why?
    - We have a likely race condition if a workflow starts and completes
    during the `PollMutableState` call, where our originally-requested
    Execution is no longer the latest.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)
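
    A minimal sketch of the idea, using hypothetical `execution` and
    `pollResult` types rather than the real request/response messages. The
    fallback to the requested execution when the response carries no run ID
    is an assumption of this sketch.

    ```go
    package main

    import "fmt"

    // execution is a hypothetical stand-in for a workflow execution reference.
    type execution struct {
        WorkflowID string
        RunID      string
    }

    // pollResult is a hypothetical poll response that reports which run was
    // actually observed.
    type pollResult struct {
        Execution execution
    }

    // historyTarget picks the execution for the follow-up history call: the
    // one returned by the poll, so that a run which started and completed
    // during the poll is not missed.
    func historyTarget(requested execution, res pollResult) execution {
        if res.Execution.RunID != "" {
            return res.Execution
        }
        return requested
    }

    func main() {
        requested := execution{WorkflowID: "wf", RunID: "run-1"}
        observed := pollResult{Execution: execution{WorkflowID: "wf", RunID: "run-2"}}
        fmt.Println(historyTarget(requested, observed).RunID)
    }
    ```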

commit 8b717ec
Author: Roman Dmytrenko <[email protected]>
Date:   Wed Aug 27 19:34:05 2025 +0000

    chore(deps): upgrade go from 1.24.5 to 1.25.0 (#8209)

    ## What changed?

    Upgrade Go to 1.25.0.

    ## How did you test it?

    - [x] built
    - [x] run locally and tested manually

    ~Blocked by #8174~

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>
    Co-authored-by: Stephan Behnke <[email protected]>

commit 3f18d90
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 26 18:28:26 2025 -0700

    GetWorkflowExecutionHistory long poll soft timeout (#8238)

    ## What changed?

    Added a "soft timeout" (language used by Workflow Update) to
    `GetWorkflowExecutionHistory` long polls.

    ## Why?

    We don't want to terminate the long poll connection but instead keep it
    alive by sending a response back just before the timeout. The idea is
    that this will prevent connections from opening/terminating repeatedly
    (i.e., connection churn).

    ## How did you test it?
    - [ ] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    I was only able to verify this manually by forcing a timeout in the
    server and verifying that I saw a result instead of a deadline-exceeded
    error. I'll assume this will work since the existing code already tried
    to do exactly that, but it didn't do it well.
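
    A minimal sketch of a soft timeout for a long poll. `longPoll`, its
    `buffer` parameter, and `demoSoftTimeout` are hypothetical; the margin
    the server actually uses is not stated in this commit message.

    ```go
    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // longPoll waits for an event, but returns an empty result `buffer` before
    // the caller's deadline instead of letting the context expire.
    func longPoll(ctx context.Context, events <-chan string, buffer time.Duration) (string, error) {
        wait := ctx
        if deadline, ok := ctx.Deadline(); ok {
            var cancel context.CancelFunc
            wait, cancel = context.WithDeadline(ctx, deadline.Add(-buffer))
            defer cancel()
        }
        select {
        case ev := <-events:
            return ev, nil
        case <-wait.Done():
            if err := ctx.Err(); err != nil {
                return "", err // the caller's own deadline or cancellation
            }
            return "", nil // soft timeout: empty response, no error
        }
    }

    // demoSoftTimeout polls a channel that never delivers an event.
    func demoSoftTimeout() (string, error) {
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()
        return longPoll(ctx, make(chan string), 1900*time.Millisecond)
    }

    func main() {
        res, err := demoSoftTimeout()
        fmt.Printf("result=%q err=%v\n", res, err)
    }
    ```

    The caller receives an empty response and can issue the next poll on
    the same connection rather than handling a deadline-exceeded error.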

commit 64884b1
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 14:26:02 2025 -0600

    fix: handle nil ptr in legacy batch processing (#8244)

    ## What changed?
    `BatchWorkflow` is not yet deprecated, and the fix was not properly
    applied in the previous PR.

    ## Why?
    nilptr exception

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    N/A

commit 96c3aae
Author: Kent Gruber <[email protected]>
Date:   Tue Aug 26 13:23:40 2025 -0400

    Use better string splitting techniques where possible (#8226)

    ## What changed?

    This PR aims to avoid usage of
    [`strings.Split`](https://pkg.go.dev/strings#Split) where possible in
    favor of better string splitting techniques, specifically:
    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) where
    appropriate.

    There was also a [`strings.Fields`](https://pkg.go.dev/strings#Fields)
    change I made to use
    [`strings.FieldsSeq`](https://pkg.go.dev/strings#FieldsSeq) instead, and
    another for S3 to use the [`path`](https://pkg.go.dev/path) package
    instead of [`strings.Split`](https://pkg.go.dev/strings#Split).

    ## Why?

    [`strings.SplitN`](https://pkg.go.dev/strings#SplitN) and
    [`strings.SplitSeq`](https://pkg.go.dev/strings#SplitSeq) are often
    better options in many cases, and can be _partially_ detected using
    [`modernize`](https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize):
    > `stringsseq`: replace Split in "for range strings.Split(...)" by
    go1.24's more efficient `SplitSeq`, or `Fields` with `FieldSeq`.

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    There are lots of potentially subtle behaviors from the `strings.Split`
    (and `strings.Fields`) usage that should be accounted for. If our
    existing tests don't cover those subtleties, there's risk for
    introducing an unintended bug. More intricate handling/parsing
    previously using the `strings` package should get extra attention from
    reviewers. I've attempted to break up my changes into logical commit
    chunks to aid in review / help spot potentially concerning changes.

commit ebcc3fd
Author: Sean Kane <[email protected]>
Date:   Tue Aug 26 09:15:23 2025 -0600

    fix: handle nil ptr in batch processing (#8240)

    ## What changed?
    Batch workflows were panicking because executions can be nil, but there
    is no check to prevent nil pointer exception.

    ## Why?
    Prevent nil-ptrs

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [X] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    NA

commit dfa8b3e
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 23:40:08 2025 -0700

    Adding minimum timeout require in system workflow (#8231)

    ## What changed?
    Add a minimum timeout requirement for system workflows.

    ## Why?
    The system workflow needs to have sufficient time to execute the defer
    logic.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8f1ad7c
Author: Yu Xia <[email protected]>
Date:   Mon Aug 25 15:52:12 2025 -0700

    Wire up api health monitor component (#8217)

    ## What changed?
    Wire up api health monitor component

    ## Why?
    The health monitor component was not wired correctly in fx.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 45da266
Author: pdoerner <[email protected]>
Date:   Mon Aug 25 11:40:10 2025 -0700

    Remove dynamic config warnings for shared structures (#8236)

    ## What changed?
    Removed warning logs for shared dynamic config structures

    ## Why?
    Was failing integration tests

commit 2d74130
Author: Hai Zhao <[email protected]>
Date:   Mon Aug 25 09:12:03 2025 -0700

    Add replication state to response of DescribeNamespace/ListNamespaces/UpdateNamespace (#8234)

    ## What changed?
    Add replication state to response of
    DescribeNamespace/ListNamespaces/UpdateNamespace.

    ## Why?
    We want to check replication state quickly from cli.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit 54893bc
Author: David Reiss <[email protected]>
Date:   Fri Aug 22 17:04:29 2025 -0700

    Only force-load child partitions after successful initialization (#8230)

    ## What changed?
    The force-load child partitions mechanism should only happen after
    successful initialization of the root.

    ## Why?
    If the root fails to load, things can get stuck in a loop where the root
    loads the children and the children cause the root to be loaded again
    (from userdata polling).

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 251e20a
Author: sivagirish81 <[email protected]>
Date:   Thu Aug 21 19:08:02 2025 -0700

    TaskQueue Fairness Rate Limit (#8135)

    ## What changed?
    - Move the rate limit logic for fairness from priMatcher to
    taskQueuePartitionManagerLevel
    - Attach the fairness queue rate limit and the per-key rate limit to the
    simple rate limiter implementation.

    ## Why?
    - Implementation of UpdateTaskQueueConfig api for fairness tasks.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [x] added new functional test(s)

    ## Potential risks
    N/A

commit 16f7688
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 07:16:52 2025 +0800

    Add log for slow replication tasks (#8225)

    ## What changed?
    Log replication task details when processing takes too long

    ## Why?
    For operator investigation

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit 934c58d
Author: Will Duan <[email protected]>
Date:   Fri Aug 22 06:51:57 2025 +0800

    Fix VerifyVersionedTransition Task (#8227)

    ## What changed?
    Fix VerifyVersionedTransition Task

    ## Why?
    Without the fix, there is a risk of the task succeeding without verification.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    no risk.

commit f454f2f
Author: Yichao Yang <[email protected]>
Date:   Thu Aug 21 14:32:35 2025 -0700

    Revert history task processing timeout change (#8228)

    ## What changed?
    - Revert history task timeout from 10s to 3s for non-outbound tasks.

    ## Why?
    - The original change was made due to a misunderstanding of my comment
    [here](#7951 (comment)).
    I meant to suggest using 10s as the timeout only for outbound chasm tasks.
    Other tasks, like transfer/timer, should still use 3s as the
    timeout.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 14d52ce
Author: Roman Dmytrenko <[email protected]>
Date:   Thu Aug 21 15:16:29 2025 +0000

    ci: bump golangci-lint from v1.64.8 to v2.4.0 (#8174)

    ## What changed?

    Upgrade golangci-lint to v2

    ## How did you test it?

    - [x] run locally and tested manually

    ---------

    Signed-off-by: Roman Dmytrenko <[email protected]>

commit fb863dc
Author: Stephan Behnke <[email protected]>
Date:   Wed Aug 20 18:19:02 2025 -0700

    Explain test build tags and env variables (#7991)

    ## What changed?

    Added documentation for test-related build tags and env variables.

    ## Why?

    To help developers with their test setup.

commit 17c4c07
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 16:05:45 2025 -0700

    Add dynamic config for forwarded Nexus request dispatch type (#8224)

    ## What changed?
    Added a new dynamic config to control whether forwarded Nexus HTTP
    requests should use the same dispatch type as the original request or
    always use dispatch by namespace + task queue.

    ## Why?
    Endpoints do not support replication, so forwarding by endpoint will not
    work out of the box because the two clusters will have a different ID
    for the endpoint.

commit 1d340f5
Author: pdoerner <[email protected]>
Date:   Wed Aug 20 15:33:30 2025 -0700

    Pass through original HTTP headers for forwarded Nexus requests (#8204)

    ## What changed?
    When forwarding Nexus Start/Cancel requests, the original HTTP headers
    will be passed through without sanitization.

    ## Why?
    Some headers that are still needed for the forwarded request may be
    sanitized during original request processing (e.g. authorization
    information headers).

    ## How did you test it?
    existing tests

commit 3b1b8d0
Author: Roey Berman <[email protected]>
Date:   Wed Aug 20 11:07:49 2025 -0600

    Upgrade Go SDK to 1.35.0 (#8216)

    Had to change WorkerDeploymentOptions and VersioningOverride to work
    with new SDK.

    Changed the worker setup for the versioning internal workflow replay
    tests, so I generated a new set of workflow histories to test with.
    Normally we generate new workflow histories only when we've made a
    change to the workflow definitions that we want to test (ie. a new
    patch), but since the operations to generate the workflow histories had
    to change slightly, I think it makes sense to generate a fresh set of
    histories to replay-test with next time there is a change.

    ---------

    Co-authored-by: Carly de Frondeville <[email protected]>

commit 4d1212a
Author: Stephan Behnke <[email protected]>
Date:   Tue Aug 19 11:52:55 2025 -0700

    Fix typo in unprocessedUpdateFailure (#8212)

    WISOTT

commit f38c88a
Author: David Reiss <[email protected]>
Date:   Tue Aug 19 06:51:19 2025 -0700

    Allow more retries for matching client polls (#8155)

    ## What changed?
    Allow frontend->matching poll requests to retry up to their context
    timeout instead of just once.

    ## Why?
    On matching service deployments, a busy new matching node may hit its
    persistence rps limit trying to acquire new task queues and be unable to
    accept polls.

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit fdb7f31
Author: Yu Xia <[email protected]>
Date:   Mon Aug 18 16:24:42 2025 -0700

    Change sys background low to use the correct level (#8208)

    ## What changed?
    Change sys background low to use the correct level

    ## Why?
    Fix this based on the variable name

    ## How did you test it?
    - [x] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit e4b378f
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:34:34 2025 -0700

    Support subscriptions to settings with constrained defaults (#8180)

    ## What changed?
    Fill in support for subscriptions to dynamic config values with
    constrained defaults.

    ## Why?
    We'd like to use this combination of functionality.

    ## How did you test it?
    - [x] added new unit test(s)

commit 8ed0361
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 15:29:17 2025 -0700

    Allow empty data in DataBlob (#8181)

    ## What changed?
    Remove check for zero-length data in NewDataBlob.

    ## Why?
    Zero-length data is a valid encoding for some encodings, e.g. proto3.
    NewDataBlob should not have an opinion on the length of data.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    Some code may be making assumptions about this behavior.

commit 40ac028
Author: David Reiss <[email protected]>
Date:   Mon Aug 18 12:41:41 2025 -0700

    Warn on dynamic config default values with shared structure (#8176)

    ## What changed?
    Log softassert warnings if dynamic config settings are registered with
    default values with shared structure.

    ## Why?
    This is very likely unintended and may lead to unexpected behavior of
    settings (values will be parsed on top of a copy of the default).
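To illustrate why a default with shared structure is risky, the sketch below assumes the copy of the default is shallow: the struct is copied, but a map inside it is still shared. The `settings` type and both helper functions are hypothetical and only demonstrate the aliasing problem, not the dynamic config implementation.

```go
package main

import "fmt"

// settings is a hypothetical dynamic config value with a map field.
type settings struct {
	Limits map[string]int
}

var defaultSettings = settings{Limits: map[string]int{"rps": 100}}

// shallowOverride copies the struct, but Limits still points at the
// default's map, so the write leaks into the default.
func shallowOverride(def settings, key string, val int) settings {
	out := def
	out.Limits[key] = val
	return out
}

// deepOverride clones the map before applying the override.
func deepOverride(def settings, key string, val int) settings {
	out := settings{Limits: make(map[string]int, len(def.Limits)+1)}
	for k, v := range def.Limits {
		out.Limits[k] = v
	}
	out.Limits[key] = val
	return out
}

func main() {
	_ = deepOverride(defaultSettings, "rps", 5)
	fmt.Println(defaultSettings.Limits["rps"]) // 100: the default is untouched

	_ = shallowOverride(defaultSettings, "rps", 5)
	fmt.Println(defaultSettings.Limits["rps"]) // 5: the shared map was overwritten
}
```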

    ## How did you test it?
    - [x] run locally and tested manually
    - [x] added new unit test(s)

commit 9c75cd6
Author: pdoerner <[email protected]>
Date:   Fri Aug 15 12:43:47 2025 -0700

    Forward Nexus requests using same dispatch type as original request (#8199)

    ## What changed?
    When forwarding Nexus requests that were originally sent to the
    `DispatchByEndpoint` URL, the forwarding URL will also be constructed to
    send the request to the `DispatchByEndpoint` URL on the remote cluster.
    Previously, we were always sending forwarding requests using
    `DispatchByNamespaceAndTaskQueue`

    ## Why?
    bug fix

    ## How did you test it?
    existing tests

commit 21f556c
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 13:13:28 2025 -0600

    Commit generated scheduler protos (#8200)

    ## What

    - Commit generated scheduler protos.
    - Improve `make ensure-no-changes` to detect untracked files.

    ## Why?

    The protos were not generated since the tool was committed in a separate
    PR from where the protos were added.

commit 4c59cd1
Author: Roey Berman <[email protected]>
Date:   Fri Aug 15 12:09:58 2025 -0600

    Add support for protos in chasm libs (#8182)

    ## What changed?

    Added support for defining protos in chasm libs.

    ## Why?

    Keep everything local to the library.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually

commit 08e2dfd
Author: pdoerner <[email protected]>
Date:   Thu Aug 14 16:49:44 2025 -0700

    Reconstruct failure for forwarded Nexus completion requests (#8198)

    ## What changed?
    When forwarding a `CompleteNexusOperation` HTTP request that contains a
    failure, the completion will be reconstructed instead of reusing the
    original request body.

    ## Why?
    The Nexus SDK reads and closes the HTTP request body when the operation
    state is `failed` or `canceled` so we cannot reuse it for the forwarded
    request. For `successful` operations, the SDK just passes on the result
    content in the form of a `nexus.LazyValue` which we can forward directly
    since it is not read or closed.

    ## How did you test it?
    new functional xdc tests

commit 9d82cae
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 14 16:06:01 2025 -0700

    Fix BufferedStart reference in chasm scheduler proto (#8197)

    ## What changed?
    _Describe what has changed in this PR._

    ## Why?
    _Tell your future self why have you made these changes._

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks
    _Any change is risky. Identify all risks you are aware of. If none,
    remove this section._

commit 41cce70
Author: Vladyslav Simonenko <[email protected]>
Date:   Wed Aug 13 15:34:38 2025 -0700

    Produce workflow_duration metric on completion (#8185)

    ## What changed?
    This PR produces the metric workflow_duration, when the workflow
    execution completes.

    ## Why?
    Currently there is no metric that captures the duration of the workflow
    execution. It's also valuable to have the duration broken down by task
    queue, namespace, workflow type, which this PR enables

    ## How did you test it?
    - [X] built
    - [X] run locally and tested manually
    - [X] covered by existing tests
    - [X] added new unit test(s)
    - [ ] added new functional test(s)

commit a1df862
Author: Lina Jodoin <[email protected]>
Date:   Wed Aug 13 15:30:48 2025 -0700

    [CHASM Scheduler] Move scheduler protobufs to scheduler/proto package (#8189)

    ## What changed?
    - CHASM scheduler protos are moved to live alongside the scheduler
    implementation code, within the `chasm` package.

    ## Why?
    - See #8182. Sending this PR in advance, as that PR asserts protobufs
    were generated.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

commit 8fe5cee
Author: Sean Kane <[email protected]>
Date:   Thu Aug 14 00:10:06 2025 +0200

    improvement: remove waits before fetching activities (#8144)

    ## What changed?
    optimize the batch operation processing in `BatchActivity` and
    `BatchActivityWithProtobuf` by removing the need to wait for entire
    pages to complete before fetching the next page.

    - Implemented proactive page fetching once a worker becomes available
    - common `processWorkflowsWithProactiveFetching` function to reduce code
    duplication

    ## Why?
    The previous implementation had workers wait for entire pages to
    complete. This optimization improves resource utilization. The
    refactoring also eliminates duplicated functions in the `BatchParams`
    struct and `BatchOperation` protobuf.

    Addresses issue #8098.

    ## How did you test it?
    - [x] built
    - [x] run locally and tested manually
    - [x] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    The changes maintain backward compatibility.

    ## Potential risks
    While this change improves performance, it does modify the concurrency
    model of batch processing:

    1. **Timing changes**: The optimization changes when pages are fetched
    relative to task completion, which could expose edge cases in error
    handling or heartbeat timing
    2. **Memory usage**: Pages may be fetched earlier, potentially
    increasing peak memory usage if the next page is large
    3. **Rate limiting interaction**: The more aggressive task scheduling
    could interact differently with rate limiting, though the same
    per-worker limits are maintained
    4. **Heartbeat behavior**: heartbeats track the progress of an entire
    page and are applied after an entire page finishes

    The changes preserve all existing error handling, retry logic, and rate
    limiting behavior, but the different execution timing could surface
    previously hidden race conditions.

    ---------

    Co-authored-by: Roey Berman <[email protected]>

commit 469526e
Author: pdoerner <[email protected]>
Date:   Wed Aug 13 09:58:03 2025 -0700

    Change default for `component.nexusoperations.recordCancelRequestCompletionEvents` (#8191)

    ## What changed?
    Changed default for
    `component.nexusoperations.recordCancelRequestCompletionEvents` to
    `true`

    ## Why?
    Flag was added to ensure backwards compatibility. Now that 1.28 is
    released, can change the default. Flag will be removed after 1.29 is
    released.

commit 69e6b6c
Author: Rodrigo Zhou <[email protected]>
Date:   Tue Aug 12 11:31:27 2025 -0700

    Bump Temporal API to v1.52.0 (#8187)

    ## What changed?
    Bump Temporal API to v1.52.0

    ## Why?
    Bump Temporal API to v1.52.0

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [ ] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

commit da90f62
Author: Stephan Behnke <[email protected]>
Date:   Fri Aug 8 14:43:15 2025 -0700

    Decode of nil data (#8179)

    ## What changed?

    Don't catch `Data: nil` in test; let it fall through to decoder. The
    decoder will return an error. An error is a better choice than a `nil`
    response, since a `nil` response signals to the user that the decoded
    data is usable/valid.

    ## Why?

    Follow-up to #8111; an
    internal test expects an error instead of `nil`.

    ## How did you test it?
    - [ ] built
    - [ ] run locally and tested manually
    - [ ] covered by existing tests
    - [x] added new unit test(s)
    - [ ] added new functional test(s)

    ## Potential risks

    Hard to believe that returning nil and using un-decoded data is a
    good/valid alternative.

commit b8497fa
Author: David Reiss <[email protected]>
Date:   Fri Aug 8 08:32:58 2025 -0700

    Dynamic config conversion improvements (#7052)

    ## What changed?
    - Split implementation of "constrained default" settings from "plain
    default" settings. This is more code and the diff looks complex, but the
    individual paths are both simpler than the mixed version.
    - Add conversion cache using a weak map.
    - Remove GlobalCachedTypedValue.
    - Use "raw" values for subscription dispatch deduping to avoid
    unnecessary conversions.
    - Deep copy default values when using mapstructure, to avoid problems
    with merging over shared default values.

    ## Why?
    - Fixes #6756
    - Performance improvement for "plain default" settings (almost all of
    them)
    - Performance improvement for settings with complex converters
    - Remove footgun in defaults that aren't scalar values

    ## How did you test it?
    existing+new unit tests

commit f9bd083
Author: Lina Jodoin <[email protected]>
Date:   Thu Aug 7 17:11:53 2025 -0700

    [Scheduled Actions] Update Scheduler protos for CHASM (#8163)

    ## What changed?
    - Added protos for the new Scheduler task types.
    - Added TODOs for cleanup when the HSM component is removed.

    ## Why?
    - A few fields and messages were made obsolete with the CHASM port.

commit f8b97e5
Author: Vladyslav Simonenko <[email protected]>
Date:   Thu Aug 7 16:24:54 2025 -0700

    Break out of pagination in scavenger on errors (#8133)

    ## What changed?
    Break out of the loop, when iteration through mutable states fails

    ## Why?
    Previously, we continued to iterate, leading to the panic:
    #8037

    ## How did you test it?
    - [X] run locally and tested manually
    - [X] added new unit test(s)