Draft: Waybar perf hot-path optimization plan #8

@SaveTheRbtz

Note

Draft umbrella issue for a small Waybar performance PR stack. The evidence below is from the direct baseline.perf.data vs optimized.perf.data comparison, not from the earlier stripped-binary profile.

Why This Stack

baseline.perf.data is dominated by repeated user-space work in Waybar's hot paths: regex-heavy JSON preparation, Sway full-tree IPC, JsonCpp tree parsing, and GTK label/layout churn. The PRs below target those costs without replacing JsonCpp, rewriting GTK/Pango rendering, or introducing a large Sway backend rewrite.

| Baseline hot area | Cost | Why it matters | Targeted by |
| --- | --- | --- | --- |
| Sway IPC sendCmd child path | 2.66B cycles | event handling repeatedly waits on and processes Sway replies | workspace/window Sway fast paths |
| getTree child path | 1.18B cycles | full Sway tree fetches are expensive when only smaller payloads are needed | workspace/window Sway fast paths |
| std::regex search/replace | ~1.10B child cycles | JSON parsing pays a large common-case regex tax | JSON parser cleanup |
| std::regex::_M_dfs self | 743M cycles | largest Waybar-side self hotspot in the baseline | JSON parser cleanup |
| Json parse path | 834M cycles | parsing full Sway replies dominates residual user-space work | parser and Sway payload reductions |
| Json array parsing | 799M cycles | tree/list parsing is amplified by full-tree IPC replies | Sway payload reductions |
| GTK layout/draw-ish | 213M cycles | no-op label writes can still trigger layout/redraw paths | label markup guard |

Important

JsonCpp is still the largest percentage bucket in the optimized profile, but its absolute cost is much lower: it falls from roughly 700M to 186M cycles, a 73.4% reduction. Its percentage share is high only because the total profile is much smaller.

PR Stack

The PRs are ordered from the smallest and most local change to the most behavior-sensitive one. Each section includes the relevant baseline signal and the corresponding before/after effect where the aggregate comparison can attribute it.

1. Parse JSON directly from string buffers

PR: perf(json): parse directly from string buffers

Why this was needed: baseline.perf.data showed the JSON utility paying a large common-case regex tax before parsing. std::regex search/replace accounted for about 1.10B child cycles, and std::regex::_M_dfs alone accounted for 743M self cycles. Json parsing itself was also hot at 834M child cycles.

Change: Parse from the existing string buffer and only run the \x compatibility repair when that escape is present.

Effect in comparison: The regex path disappears from visible hot symbols. Json parse work drops from 834M to 188M cycles, and the overall JsonCpp bucket drops from roughly 700M to 186M cycles.
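The guard described in the Change above can be sketched roughly as follows. This is an illustration, not the PR's code: repairHexEscapes is a hypothetical name, and the assumption that the compatibility repair rewrites \xNN to \u00NN is mine. The point is only that a cheap substring search decides whether any rewrite work runs at all.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (name and exact repair rule are assumptions):
// rewrites the non-JSON "\xNN" escape to "\u00NN" so a strict parser accepts it.
std::string repairHexEscapes(std::string_view in) {
  // Fast path: most payloads contain no "\x" at all, so skip the rewrite.
  if (in.find("\\x") == std::string_view::npos) {
    return std::string(in);
  }

  std::string out;
  out.reserve(in.size() + 8);
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '\\' && i + 1 < in.size()) {
      if (in[i + 1] == 'x') {
        out += "\\u00";  // "\x41" becomes "\u0041"
      } else {
        // Copy any other escape pair verbatim, so an escaped backslash
        // followed by a literal 'x' is left untouched.
        out += in[i];
        out += in[i + 1];
      }
      ++i;  // the escape pair has been consumed
    } else {
      out += in[i];
    }
  }
  return out;
}
```

The repaired buffer can then be handed to the parser as a begin/end pointer pair, which avoids building a stream around the string first.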

2. Avoid duplicate MPD playing-state updates

PR: perf(mpd): avoid duplicate playing-state updates

Why this was needed: MPD was not a dominant bucket in baseline.perf.data; the baseline was dominated by Sway IPC, regex, JSON, and label work. This PR is included because the code had an obvious duplicate periodic fetch/update while the stack was being trimmed down.

Change: Playing::on_timer() already fetches state before checking whether playback is active. Reuse that result instead of calling queryMPD() immediately afterward and emitting twice.

Effect in comparison: This is a small local cleanup rather than a primary aggregate-profile driver. It removes redundant periodic MPD work without changing the larger Sway/JSON result.
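A toy model of the duplicate work, under stated assumptions: FakeMpdContext, onTimerBefore, and onTimerAfter are hypothetical stand-ins, and the "before" shape (a second query that also emits, followed by another emit) is my reading of the description above, not a copy of the Waybar source.

```cpp
// Hypothetical stand-in for the MPD context; it only counts round trips.
struct FakeMpdContext {
  bool playing = true;
  int fetches = 0;
  int emits = 0;

  bool fetchState() { ++fetches; return playing; }
  void emit() { ++emits; }
  // Assumed shape of the combined call: fetch again, then emit.
  void queryMPD() { fetchState(); emit(); }
};

// Before: the state is fetched for the playback check, then fetched and
// emitted again right away.
bool onTimerBefore(FakeMpdContext& ctx) {
  if (!ctx.fetchState()) return false;
  ctx.queryMPD();
  ctx.emit();
  return true;
}

// After: reuse the state fetched for the check and emit once.
bool onTimerAfter(FakeMpdContext& ctx) {
  if (!ctx.fetchState()) return false;
  ctx.emit();
  return true;
}
```

In this model each timer tick goes from two fetches and two emits to one of each, which matches the "small local cleanup" framing.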

3. Skip unchanged label and tooltip markup

PR: perf(label): skip redundant markup updates

Why this was needed: baseline.perf.data had visible GTK/UI update work. The GTK layout/draw-ish path accounted for about 213M cycles, which is avoidable when a module writes identical markup repeatedly.

Change: Cache the last label and tooltip markup and skip Gtk::Label::set_markup() / set_tooltip_markup() when the markup is unchanged.

Effect in comparison: GTK layout/draw-ish work drops from 213M to 17M cycles, indicating fewer no-op relayout/redraw paths.
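A minimal sketch of the caching guard, kept free of GTK so it compiles on its own. MarkupGuard is a hypothetical name; the actual PR may store the cached strings directly on the label class instead.

```cpp
#include <optional>
#include <string>

// Hypothetical guard: remembers the last markup that was forwarded to a widget.
class MarkupGuard {
 public:
  // Returns true when `markup` differs from the cached value, meaning the
  // caller should forward it to the widget. Returns false for a no-op write.
  bool update(const std::string& markup) {
    if (last_ && *last_ == markup) {
      return false;
    }
    last_ = markup;
    return true;
  }

  // Drop the cache, for example when the widget is recreated.
  void reset() { last_.reset(); }

 private:
  std::optional<std::string> last_;  // empty until the first write
};
```

A caller would keep one guard per label and one per tooltip, and wrap the write as `if (guard.update(markup)) label.set_markup(markup);`. The string comparison is cheap next to a relayout.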

4. Use smaller Sway workspace replies for simple configs

PR: perf(sway/workspaces): avoid full tree for simple configs

Why this was needed: baseline.perf.data showed repeated Sway tree/list work: Sway IPC sendCmd child path at 2.66B cycles, getTree child path at 1.18B cycles, Json parse path at 834M cycles, and Json array parsing at 799M cycles.

Change: Use IPC_GET_WORKSPACES when sway/workspaces does not need per-window tree data. Preserve IPC_GET_TREE for window-rewrite, where child window nodes are required.

Effect in comparison: This contributes to the reduction in repeated tree/list parsing. Json array parsing drops from 799M to 145M cycles, while full Sway IPC/tree work is much lower across the optimized stack.
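The request selection can be sketched as a single decision on the module config. WorkspacesConfig and pickWorkspaceRequest are hypothetical names for illustration; the numeric values are the i3/Sway IPC message types for IPC_GET_WORKSPACES and IPC_GET_TREE.

```cpp
#include <cstdint>

// Message type numbers from the i3/Sway IPC protocol.
enum class IpcType : std::uint32_t {
  GetWorkspaces = 1,  // IPC_GET_WORKSPACES: flat list of workspaces
  GetTree = 4,        // IPC_GET_TREE: full layout tree including windows
};

// Hypothetical, simplified view of the sway/workspaces module config.
struct WorkspacesConfig {
  bool hasWindowRewrite = false;  // true when window-rewrite is configured
};

// Only ask for the full tree when per-window data is actually needed.
IpcType pickWorkspaceRequest(const WorkspacesConfig& cfg) {
  return cfg.hasWindowRewrite ? IpcType::GetTree : IpcType::GetWorkspaces;
}
```

Because the two replies have different shapes (a flat array versus a nested tree), the real module also needs a reply handler per request type; that part is omitted here.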

5. Avoid Sway window tree fetches for simple events

PR: perf(sway/window): avoid tree fetches for simple events

Why this was needed: The largest baseline child path was Sway IPC/event processing: sendCmd at 2.66B cycles. Repeated full-tree work was also prominent: getTree at 1.18B cycles, plus Json parse/tree work in the hundreds of millions of cycles.

Change: Use the Sway window event payload for focused-window title/mark updates. Keep IPC_GET_TREE for structural events such as focus, move, close, floating, and workspace changes.

Effect in comparison: This directly attacks the event -> IPC_GET_TREE -> full JsonCpp parse -> tree walk path. sendCmd drops from 2.66B to 214M cycles, and getTree drops from 1.18B to 42M cycles.
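The event split above can be sketched as a classifier over the "change" field of a Sway window event. WindowUpdate and classifyWindowEvent are hypothetical names, and defaulting unknown change values to a tree fetch is my assumption, chosen so that new event kinds fall back to the existing behavior.

```cpp
#include <string_view>

enum class WindowUpdate {
  FromEventPayload,  // the event's container data is enough
  FetchTree,         // structure may have changed: issue IPC_GET_TREE
};

// `change` is the "change" string carried by a Sway window event.
WindowUpdate classifyWindowEvent(std::string_view change) {
  // Title and mark changes do not alter the tree layout, so the payload
  // already holds everything the module displays.
  if (change == "title" || change == "mark") {
    return WindowUpdate::FromEventPayload;
  }
  // focus, move, close, floating, and anything unrecognized keep the
  // conservative full-tree path.
  return WindowUpdate::FetchTree;
}
```

Title updates tend to be the most frequent window events (terminals and browsers retitle constantly), which is consistent with the size of the sendCmd and getTree reductions reported above.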

Outcome

The optimized build reduces sampled CPU work by about 86% over a comparable 30s capture.

| Metric | Baseline | Optimized | Change | Read |
| --- | --- | --- | --- | --- |
| Total cycles | 3.42B | 0.46B | -86.5% | about 7.4x less sampled work |
| Cycles/sec | 114.5M/s | 15.9M/s | -86.1% | normalized result is also about 7.2x better |
| Samples | 332 | 58 | -82.5% | far fewer hot samples |
| Waybar binary bucket | 1.03B | 22.5M | -97.8% | most app-side hot work removed |

Sampling caveat

The optimized profile had only 58 samples, so small symbol-level deltas are noisy. The large reductions above are still strong enough to treat as real because they are visible in absolute cycle counts and align with the removed code paths.
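For anyone re-deriving the Change and Read columns from the raw figures, the arithmetic is the usual relative change and ratio. The helper names below are hypothetical and the inputs are the rounded values from the Outcome table, so results agree only to the precision shown there.

```cpp
#include <cmath>

// Relative change in percent; negative means the optimized run used less.
double percentChange(double baseline, double optimized) {
  return (optimized - baseline) / baseline * 100.0;
}

// How many times smaller the optimized figure is than the baseline.
double reductionFactor(double baseline, double optimized) {
  return baseline / optimized;
}

// Round to one decimal place, matching the precision used in the table.
double round1(double v) {
  return std::round(v * 10.0) / 10.0;
}
```

For example, round1(percentChange(3.42, 0.46)) gives -86.5 and round1(reductionFactor(3.42, 0.46)) gives 7.4, matching the Total cycles row.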
