fix(credentials): retry transient Windows FS errors when replacing auth-profiles.json (TAURI-RUST-92J)

## Source

Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/10246/
Short ID: `TAURI-RUST-92J` (project `tauri-rust`)
Events: 10,158 · Users affected: 1 · First seen: 2026-06-03 04:50 UTC · Last seen: 2026-06-04 10:31 UTC
Reproducing release: `openhuman@0.56.0+e8968077aeb5`
Platform: Windows 10.0.26200 (Windows 11 24H2) · x86_64

## Symptom

```
Failed to replace auth profile store at C:\Users\<user>\.openhuman\users\<uid>\auth-profiles.json
```

Captured by `report_error_or_expected` from the JSON-RPC error path during `openhuman.app_state_snapshot` (domain `rpc`, operation `invoke_method`, `elapsed_ms ≈ 3`). The Sentry event is message-only (no stack trace).

## Where it fails

`src/openhuman/credentials/profiles.rs:931-943` (function `write_persisted_locked`):

```rust
fs::write(&tmp_path, &json).with_context(|| {
    format!("Failed to write temporary auth profile file at {}", tmp_path.display())
})?;

fs::rename(&tmp_path, &self.path).with_context(|| {
    format!("Failed to replace auth profile store at {}", self.path.display())
})?;
```

Neither `fs::write` nor `fs::rename` is wrapped in `crate::openhuman::util::retry_with_backoff`, the helper built for exactly this family of transient Windows FS errors. Its classifier, `is_transient_fs_error` (`src/openhuman/util.rs:615`), already recognises `ERROR_ACCESS_DENIED (5)`, `ERROR_SHARING_VIOLATION (32)`, `ERROR_LOCK_VIOLATION (33)`, `ERROR_DELETE_PENDING (303)` and `ERROR_USER_MAPPED_FILE (1224)`.
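A minimal sketch of what such a classifier does, assuming it matches on the raw OS error code: on Windows, `std::io::Error::raw_os_error()` returns the Win32 error code for a failed `std::fs` call, so the five codes above can be matched directly. `is_transient_win32` is a hypothetical stand-in, not the real `is_transient_fs_error`, whose implementation may differ. The same numbers mean different things on Unix (5 is `EIO` there), so real code would gate this on `cfg(windows)`.

```rust
use std::io;

// Win32 error codes this issue lists as transient (values from winerror.h).
const ERROR_ACCESS_DENIED: i32 = 5;
const ERROR_SHARING_VIOLATION: i32 = 32;
const ERROR_LOCK_VIOLATION: i32 = 33;
const ERROR_DELETE_PENDING: i32 = 303;
const ERROR_USER_MAPPED_FILE: i32 = 1224;

/// Hypothetical stand-in for `is_transient_fs_error`: returns true when the
/// error carries one of the transient Win32 codes. Errors without an OS code
/// (e.g. constructed with `io::Error::new`) are treated as non-transient.
fn is_transient_win32(err: &io::Error) -> bool {
    matches!(
        err.raw_os_error(),
        Some(
            ERROR_ACCESS_DENIED
                | ERROR_SHARING_VIOLATION
                | ERROR_LOCK_VIOLATION
                | ERROR_DELETE_PENDING
                | ERROR_USER_MAPPED_FILE
        )
    )
}
```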

The same helper IS used for the sibling `.lock` create at `profiles.rs:987` (Sentry OPENHUMAN-TAURI-H1 / H8 fix, PRs #2085 / #1641). The `.json` write-and-rename path was left out, so that fix was only partial.

## Why the event count is 10k+ in 24h

`load_locked` runs on every `app_state_snapshot` poll. When a profile is dropped (decrypt failure, unrecognized `kind`, or, before #3125, an OAuth profile missing `access_token`), `load_locked` calls `write_persisted_locked` at `profiles.rs:744` to persist the purge. If the rename fails, the on-disk state is unchanged, so the **next** `app_state_snapshot` poll drops the same profile again, re-attempts the same write, and fails again. This repeats in a tight loop until the file handle is released, and on Windows, AV, the Search Indexer, or Defender can hold a file handle for many seconds.

The frontend health-check polls `app_state_snapshot` rapidly, so a single sustained AV hold amplifies into thousands of Sentry events.
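To make the amplification concrete: without a guard, every poll that re-drops the profile produces one failed write and one reported error, so the event count tracks the poll count for as long as the handle is held. The sketch below models that loop next to the "purge already attempted this session" option from fix item 2. `PurgeGuard`, `reported_errors`, and the string key are hypothetical; a real store would key on whatever identifies the dropped profiles.

```rust
use std::collections::HashSet;

/// Hypothetical per-session guard: remembers which purge writes already
/// failed so later polls skip re-persisting (and re-reporting) them.
#[derive(Default)]
struct PurgeGuard {
    failed: HashSet<String>,
}

impl PurgeGuard {
    /// Runs `persist` unless a purge with this key already failed this session.
    /// Returns true only when an error should be reported (the first failure).
    fn attempt<F>(&mut self, key: &str, persist: F) -> bool
    where
        F: FnOnce() -> Result<(), String>,
    {
        if self.failed.contains(key) {
            return false; // already failed once: keep in-memory state, stay quiet
        }
        match persist() {
            Ok(()) => false,
            Err(_) => {
                self.failed.insert(key.to_string());
                true
            }
        }
    }
}

/// Simulates `polls` snapshot polls while the file handle stays held (every
/// persist fails) and returns how many errors would be reported.
fn reported_errors(polls: usize, guarded: bool) -> usize {
    let mut guard = PurgeGuard::default();
    let mut reported = 0;
    for _ in 0..polls {
        let failing_write = || Err::<(), String>("sharing violation".into());
        let report = if guarded {
            guard.attempt("drop:profile-a", failing_write)
        } else {
            failing_write().is_err()
        };
        if report {
            reported += 1;
        }
    }
    reported
}
```

With 1,000 polls during one sustained hold, the unguarded loop reports 1,000 errors and the guarded one reports 1.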

## Reproduces on

- Branch: `upstream/main` @ `87a91ae02` (v0.57.14)
- Code path unchanged since release 0.56.0; the only `profiles.rs` commits since 0.56.0 (#3125, #3075) do not touch the rename retry path. Verified via `git log e8968077aeb5..upstream/main -- src/openhuman/credentials/profiles.rs`.
- Cannot repro live on non-Windows hosts (Windows-only error code family).
- Manual repro on Windows: hold an open file handle on `auth-profiles.json` (e.g. via `Get-Content -Wait` in PowerShell) while triggering any `app_state_snapshot` that exercises a drop / migration branch.

## Bug shape

Windows transient FS-race on `fs::rename`. Same family as the lock-create races already retried in PR #1641 / #2085 / #2180. Generic classifier (`is_transient_fs_error`) already in place; the call site here just isn't routed through it.

## Fix scope

1. Route `fs::write(&tmp_path, &json)` and `fs::rename(&tmp_path, &self.path)` through `retry_with_backoff("...", 6, 100, …)`, matching the parameters used by the `.lock` create at `profiles.rs:987`.
2. Persisted-write amplification guard: when `write_persisted_locked` exhausts retries during a `load_locked` purge, log + tag the error path so subsequent rapid `app_state_snapshot` polls don't replay the same write-and-fail loop until the AV handle is released. Either short-cache a "purge already attempted this session" flag, or surface the rename failure once and return the in-memory purged state without persisting. Either route defuses the 10k-event-per-day amplification.
3. Add a Rust regression test using the `__TEST_TRANSIENT__` sentinel `is_transient_fs_error` already understands (`src/openhuman/util.rs:618`) to verify the rename path retries.
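Fix item 1 could look roughly like the sketch below, assuming `retry_with_backoff` takes a label, an attempt count, a base delay in milliseconds, and a fallible closure, and that it backs off exponentially. `retry_with_backoff_sketch` and `write_atomically` are hypothetical names. The classifier is injected so the sketch is self-contained and testable off Windows; the real helper presumably consults `is_transient_fs_error` itself.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for `retry_with_backoff(label, attempts, base_ms, op)`.
/// Retries `op` while `is_transient` classifies the error as transient,
/// doubling the delay after each failed attempt; other errors return at once.
fn retry_with_backoff_sketch<T>(
    label: &str,
    attempts: u32,
    base_delay_ms: u64,
    is_transient: impl Fn(&io::Error) -> bool,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = Duration::from_millis(base_delay_ms);
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt < attempts && is_transient(&e) => {
                eprintln!("{label}: transient error on attempt {attempt}/{attempts}: {e}");
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}

/// Write-to-temp then rename, with both steps routed through the retry helper
/// using the same parameters (6 attempts, 100 ms base) as the `.lock` create.
fn write_atomically(
    path: &Path,
    tmp_path: &Path,
    json: &str,
    is_transient: impl Fn(&io::Error) -> bool + Copy,
) -> io::Result<()> {
    retry_with_backoff_sketch("write auth profile tmp", 6, 100, is_transient, || {
        fs::write(tmp_path, json)
    })?;
    retry_with_backoff_sketch("replace auth profile store", 6, 100, is_transient, || {
        fs::rename(tmp_path, path)
    })
}
```

Under this sketch's doubling schedule, six attempts at a 100 ms base sleep at most 100 + 200 + 400 + 800 + 1600 = 3100 ms before giving up. An AV or indexer hold that outlasts that window still exhausts the retries, which is why item 2 is needed alongside item 1.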

Sentry-Issue: TAURI-RUST-92J

Tracked as issue #3355.