Draft: Waybar perf hot-path optimization plan

> [!NOTE]
> Draft umbrella issue for a small Waybar performance PR stack. The evidence below is from the direct `baseline.perf.data` vs `optimized.perf.data` comparison, not from the earlier stripped-binary profile.

## Why This Stack

`baseline.perf.data` is dominated by repeated user-space work in Waybar's hot paths: regex-heavy JSON preparation, Sway full-tree IPC, JsonCpp tree parsing, and GTK label/layout churn. The PRs below target those costs without replacing JsonCpp, rewriting GTK/Pango rendering, or introducing a large Sway backend rewrite.

| Baseline hot area | Cost | Why it matters | Targeted by |
|---|---:|---|---|
| Sway IPC `sendCmd` child path | **2.66B cycles** | event handling repeatedly waits on and processes Sway replies | workspace/window Sway fast paths |
| `getTree` child path | **1.18B cycles** | full Sway tree fetches are expensive when only smaller payloads are needed | workspace/window Sway fast paths |
| `std::regex` search/replace | **~1.10B child cycles** | JSON parsing pays a large common-case regex tax | JSON parser cleanup |
| `std::regex::_M_dfs` self | **743M cycles** | largest Waybar-side self hotspot in the baseline | JSON parser cleanup |
| Json parse path | **834M cycles** | parsing full Sway replies dominates residual user-space work | parser and Sway payload reductions |
| Json array parsing | **799M cycles** | tree/list parsing is amplified by full-tree IPC replies | Sway payload reductions |
| GTK layout/draw-ish | **213M cycles** | no-op label writes can still trigger layout/redraw paths | label markup guard |

> [!IMPORTANT]
> JsonCpp is still the largest percentage bucket in the optimized profile, but its absolute cost is much lower: roughly 700M cycles to 186M cycles, or **-73.4%**. The percentage is high because the total profile is much smaller.

## PR Stack

The PRs are ordered from the smallest, most local change to the most behavior-sensitive one. Each section lists the relevant baseline signal and, where the aggregate comparison can attribute it, the corresponding before/after effect.

### 1. Parse JSON directly from string buffers

PR: [perf(json): parse directly from string buffers](https://github.com/Alexays/Waybar/pull/5108)

**Why this was needed:** `baseline.perf.data` showed the JSON utility paying a large regex tax before every parse, even in the common case. `std::regex` search/replace accounted for about **1.10B child cycles**, and `std::regex::_M_dfs` alone accounted for **743M self cycles**. JSON parsing itself was also hot, at **834M child cycles**.

**Change:** Parse from the existing string buffer and only run the `\x` compatibility repair when that escape is present.

**Effect in comparison:** The regex path disappears from visible hot symbols. Json parse work drops from **834M** to **188M** cycles, and the overall JsonCpp bucket drops from roughly **700M** to **186M** cycles.
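A minimal C++ sketch of the guarded repair, assuming the compatibility repair rewrites `\xHH` escapes (which JSON does not allow) into `\u00HH`. The function name `prepareJsonInput` is hypothetical and the actual patch may be structured differently. The point it illustrates is that `std::regex` only runs when the input actually contains `\x`, and the common case hands the caller's buffer to the parser without a copy.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not Waybar's actual function.
// Assumption: the compatibility repair turns "\xHH" (invalid JSON) into "\u00HH".
// Returns the text to hand to the JSON reader: the caller's own buffer when no
// repair is needed (no copy), otherwise `scratch` holding the repaired text.
const std::string& prepareJsonInput(const std::string& input, std::string& scratch) {
  // Common case: no "\x" sequence anywhere, so skip std::regex entirely.
  if (input.find("\\x") == std::string::npos) {
    return input;
  }
  // Rare case: only now pay for the regex replacement.
  static const std::regex hexEscape(R"(\\x([0-9A-Fa-f]{2}))");
  // Format string is `\u00$1`: a literal "\u00" followed by capture group 1.
  scratch = std::regex_replace(input, hexEscape, "\\u00$1");
  return scratch;
}
```

Because the regex cannot match unless `\x` is present, the `find()` guard skips work without changing the result compared with always running the replacement.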

### 2. Avoid duplicate MPD playing-state updates

PR: [perf(mpd): avoid duplicate playing-state updates](https://github.com/Alexays/Waybar/pull/5100)

**Why this was needed:** MPD was not a dominant bucket in `baseline.perf.data`; the baseline was dominated by Sway IPC, regex, JSON, and label work. This PR is included because the code had an obvious duplicate periodic fetch/update while the stack was being trimmed down.

**Change:** `Playing::on_timer()` already fetches state before checking whether playback is active. Reuse that result instead of calling `queryMPD()` immediately afterward and emitting twice.

**Effect in comparison:** This is a small local cleanup rather than a primary aggregate-profile driver. It removes redundant periodic MPD work without changing the larger Sway/JSON result.
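The shape of the fix can be sketched with hypothetical stand-ins (`PlayingTimer`, `fetchState`, `emit`) for Waybar's MPD state machine; this is not the real class layout. Each timer tick performs one status query and at most one update.

```cpp
#include <functional>

// Hypothetical stand-ins; Waybar's real MPD code uses its own context/state classes.
enum class PlayState { Stopped, Paused, Playing };

struct PlayingTimer {
  std::function<PlayState()> fetchState;  // one MPD status round trip
  std::function<void(PlayState)> emit;    // module update signal

  // Called on every timer tick; returns true while playback is active so the
  // timer keeps running.
  bool onTimer() {
    const PlayState state = fetchState();  // single query per tick
    if (state != PlayState::Playing) {
      // Assumption: the paused/stopped state takes over and handles its own update.
      return false;
    }
    // Reuse the state fetched above instead of querying MPD a second time,
    // and emit exactly once.
    emit(state);
    return true;
  }
};
```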

### 3. Skip unchanged label and tooltip markup

PR: [perf(label): skip redundant markup updates](https://github.com/Alexays/Waybar/pull/5111)

**Why this was needed:** `baseline.perf.data` showed visible GTK/UI update work. The GTK layout/draw-ish path accounted for about **213M cycles**, much of which is avoidable when a module repeatedly writes identical markup.

**Change:** Cache the last label and tooltip markup and skip `Gtk::Label::set_markup()` / `set_tooltip_markup()` when the markup is unchanged.

**Effect in comparison:** GTK layout/draw-ish work drops from **213M** to **17M** cycles, indicating fewer no-op relayout/redraw paths.
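A minimal sketch of the guard, with a plain callback standing in for `Gtk::Label::set_markup()` or `set_tooltip_markup()` so it compiles without GTK. `MarkupGuard` is a hypothetical name; one instance would be kept per label and one per tooltip.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical guard; `apply` stands in for the real GTK setter.
class MarkupGuard {
 public:
  explicit MarkupGuard(std::function<void(const std::string&)> apply)
      : apply_(std::move(apply)) {}

  // Forwards `markup` only when it differs from the last value written.
  // Returns true if the setter was actually called.
  bool set(const std::string& markup) {
    if (last_ && *last_ == markup) {
      return false;  // identical markup: skip the setter and any relayout it triggers
    }
    apply_(markup);
    last_ = markup;
    return true;
  }

 private:
  std::function<void(const std::string&)> apply_;
  std::optional<std::string> last_;  // empty until the first write
};
```

Using `std::optional` means the first write is always forwarded, even when it is an empty string. The cache is only valid if every markup write goes through the guard; code that sets the label directly would leave it stale.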

### 4. Use smaller Sway workspace replies for simple configs

PR: [perf(sway/workspaces): avoid full tree for simple configs](https://github.com/SaveTheRbtz/Waybar/pull/3)

**Why this was needed:** `baseline.perf.data` showed repeated Sway tree/list work: Sway IPC `sendCmd` child path at **2.66B cycles**, `getTree` child path at **1.18B cycles**, Json parse path at **834M cycles**, and Json array parsing at **799M cycles**.

**Change:** Use `IPC_GET_WORKSPACES` when `sway/workspaces` does not need per-window tree data. Preserve `IPC_GET_TREE` for `window-rewrite`, where child window nodes are required.

**Effect in comparison:** This contributes to the reduction in repeated tree/list parsing: Json array parsing drops from **799M** to **145M** cycles. Sway IPC/tree work is also much lower across the optimized stack; see PR 5 for the `sendCmd` and `getTree` figures.
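The request selection reduces to a single config check. The sketch below uses the Sway IPC message type numbers for `GET_WORKSPACES` (1) and `GET_TREE` (4); the `WorkspacesConfig` struct and `workspaceRequest` function are hypothetical, and `window-rewrite` is the only tree-dependent option named in this PR.

```cpp
#include <cstdint>

// Sway IPC message types (values from the i3/Sway IPC protocol).
enum class IpcType : std::uint32_t {
  GetWorkspaces = 1,
  GetTree = 4,
};

// Hypothetical subset of the sway/workspaces config relevant to this choice.
// Any other option that needs per-window data would also have to force GetTree.
struct WorkspacesConfig {
  bool windowRewrite = false;  // mirrors the `window-rewrite` option
};

// Picks the smaller reply whenever per-window tree data is not needed.
IpcType workspaceRequest(const WorkspacesConfig& cfg) {
  // window-rewrite needs child window nodes, so keep the full tree request.
  return cfg.windowRewrite ? IpcType::GetTree : IpcType::GetWorkspaces;
}
```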

### 5. Avoid Sway window tree fetches for simple events

PR: [perf(sway/window): avoid tree fetches for simple events](https://github.com/SaveTheRbtz/Waybar/pull/7)

**Why this was needed:** The largest baseline child path was Sway IPC/event processing: `sendCmd` at **2.66B cycles**. Repeated full-tree work was also prominent: `getTree` at **1.18B cycles**, plus Json parse/tree work in the hundreds of millions of cycles.

**Change:** Use the Sway window event payload for focused-window title/mark updates. Keep `IPC_GET_TREE` for structural events such as focus, move, close, floating, and workspace changes.

**Effect in comparison:** This directly attacks the event -> `IPC_GET_TREE` -> full JsonCpp parse -> tree walk path. `sendCmd` drops from **2.66B** to **214M** cycles, and `getTree` drops from **1.18B** to **42M** cycles.
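The event triage can be sketched as a predicate over the window event's `change` field. `WindowEvent` and `needsTreeFetch` are hypothetical, and the sketch assumes that only title and mark changes on the focused container are served from the payload; every other change, including unrecognized ones, falls back to `IPC_GET_TREE`.

```cpp
#include <string>

// Hypothetical, trimmed view of a Sway "window" event. Real events carry a
// full container object; only these two fields matter for this decision.
struct WindowEvent {
  std::string change;          // e.g. "title", "mark", "focus", "move", "close"
  bool containerFocused = false;
};

// Returns true when the module should fall back to IPC_GET_TREE.
// Workspace events are a separate Sway event type and are assumed to keep
// their tree fetch; they are not modeled here.
bool needsTreeFetch(const WindowEvent& ev) {
  const bool payloadOnlyChange = ev.change == "title" || ev.change == "mark";
  // Conservative default: anything that is not a focused title/mark update
  // (including unfocused title changes) still takes the full-tree path.
  return !(payloadOnlyChange && ev.containerFocused);
}
```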

## Outcome

The optimized build reduces sampled CPU work by about **86%** across two captures of comparable length (about 30 s each).

| Metric | Baseline | Optimized | Change | Read |
|---|---:|---:|---:|---|
| Total cycles | 3.42B | 0.46B | **-86.5%** | about 7.4x less sampled work |
| Cycles/sec | 114.5M/s | 15.9M/s | **-86.1%** | about 7.2x less work per second, so the gain holds after normalizing for capture length |
| Samples | 332 | 58 | **-82.5%** | far fewer hot samples |
| Waybar binary bucket | 1.03B | 22.5M | **-97.8%** | most app-side hot work removed |
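The Change and Read columns can be re-derived from the Baseline and Optimized values; the helper names below are illustrative. Dividing total cycles by cycles/sec also recovers the capture lengths, roughly 29.9 s for the baseline and 28.9 s for the optimized run, which is what makes the two profiles comparable. The table inputs are themselves rounded, so the last digit can differ slightly from a full-precision calculation.

```cpp
#include <cmath>

// Percent change from `before` to `after` (negative means a reduction).
double pctChange(double before, double after) {
  return (after - before) / before * 100.0;
}

// How many times less work the optimized run did ("7.4x less").
double reductionFactor(double before, double after) { return before / after; }

// Capture length implied by total cycles and cycles per second.
double captureSeconds(double totalCycles, double cyclesPerSec) {
  return totalCycles / cyclesPerSec;
}

// Rounds to one decimal place, matching the table's presentation.
// e.g. roundTo1(pctChange(3.42e9, 0.46e9)) gives -86.5
double roundTo1(double x) { return std::round(x * 10.0) / 10.0; }
```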

<details>
<summary>Sampling caveat</summary>

The optimized profile had only 58 samples, so small symbol-level deltas are noisy. The large reductions above are still strong enough to treat as real because they are visible in absolute cycle counts and align with the removed code paths.

</details>


