Performance: ~92% of CPU is QPainter software rasterization; sluggish pan drag is a full-window sibling repaint #3283

@jensenpat

Summary

CPU profiling of a live AetherSDR session shows that ~92% of process CPU is Qt software rasterization (QPainter/QRasterPaintEngine), not DSP and not the GPU spectrum path. Two scenarios were captured:

  1. Idle / steady-state RX — cost concentrated in QPainter-rendered widgets (audio scope WaveformWidget, plus S-meter and VFO).
  2. Panadapter drag (the user-visible "sluggish drag" symptom): total CPU rises about 1.5× (~0.97 → ~1.49 cores), main-thread load nearly doubles (362 → 643 ms/s), and the trace exposes the root cause: each drag frame triggers a full-window backingstore repaint that re-rasterizes sibling widgets (scope, meters, and QPushButtons) and re-shapes all text from scratch via HarfBuzz every frame.

The GPU spectrum/waterfall (QRhiMetal) and the audio DSP pipeline are both cheap and healthy. This issue documents the full profiler report and proposes optimizations across widget, GPU, panadapter, and audio subsystems.

Profiling/analysis only — no code changed. Refresh-rate defaults and software-vs-GPU rendering are UX/architecture decisions, so this is filed for maintainer review per the autonomy boundaries in CLAUDE.md.


Methodology

  • Tool: xcrun xctrace record --template "Time Profiler" --attach <pid> (Instruments' Time Profiler engine, CLI), attached to a live session.
  • Captures: 25 s idle/steady-state, and 120 s while actively dragging the panadapter.
  • Symbolication: against the local RelWithDebInfo build.
  • Aggregation: exported the time-profile table to XML; aggregated weight per thread, per thread-state (running vs. waiting), per leaf symbol, and inclusive per paint/event entry frame.
  • Environment: Apple Silicon (SoC t8142), macOS 26.5 (arm64), Qt 6.11.0 (Homebrew).
  • Note: the millisecond figures are sampled on-CPU weight, so read them as relative cost rather than as a worst-case bound. The idle window averaged ~0.97 cores; the drag window averaged ~1.49 cores.
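
The per-leaf and inclusive aggregation described above can be sketched as follows. This is a minimal model, not the script used for the report: the `Sample` type and both function names are hypothetical, and the real xctrace XML export has a different schema. It also shows why inclusive figures for a parent and its children overlap and cannot simply be added.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One profiler sample: call stack (root first) and its weight in ms.
// Hypothetical type for illustration; not the xctrace export schema.
struct Sample {
    std::vector<std::string> stack;
    double weightMs;
};

// Self weight: each sample is charged only to its leaf frame.
std::map<std::string, double> selfWeights(const std::vector<Sample>& samples) {
    std::map<std::string, double> out;
    for (const Sample& s : samples) {
        assert(s.weightMs >= 0.0);
        if (!s.stack.empty()) out[s.stack.back()] += s.weightMs;
    }
    return out;
}

// Inclusive weight: each sample is charged once to every distinct symbol
// on its stack, so a parent's total already contains its children's.
std::map<std::string, double> inclusiveWeights(const std::vector<Sample>& samples) {
    std::map<std::string, double> out;
    for (const Sample& s : samples) {
        assert(s.weightMs >= 0.0);
        const std::set<std::string> seen(s.stack.begin(), s.stack.end());
        for (const std::string& symbol : seen) out[symbol] += s.weightMs;
    }
    return out;
}
```

Using a set per sample keeps recursive frames from being counted twice within one stack.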

Per-thread CPU breakdown

All active threads sampled at 100% "Running" — genuinely on-CPU, no lock contention / priority inversion observed. The bottleneck is main-thread throughput, not blocking.

Idle (25 s window)

| Thread | On-CPU | Share | Role |
| --- | --- | --- | --- |
| Main thread | 9,047 ms | 37.5% | QPainter software rasterization |
| Thread (pooled) | 13,346 ms | 55.3% (combined) | Qt raster engine parallel span fills |
| AudioEngine | 364 ms | 1.5% | Opus + r8b resampler + EQ / NR DSP |
| PanadapterStream | 287 ms | 1.2% | VITA-49 parse + signal emit |
| AetherSDR (Metal/GCD ×4) | ~1,060 ms | ~4% | QRhiMetal GPU submit + libdispatch |

Panadapter drag (120 s window): total CPU rises ~1.5× (0.97 → 1.49 cores)

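| Thread | On-CPU | Share | Δ vs idle |
| --- | --- | --- | --- |
| Thread (pooled) | 85,879 ms | 48.1% | pinned at ~0.72 cores (idle: ~0.53) |
| Main thread | 77,160 ms | 43.3% | 362 → 643 ms/s |
| AudioEngine | 3,231 ms | 1.8% | share roughly flat (1.5% → 1.8%); 14.6 → 26.9 ms/s |
| PanadapterStream | 3,075 ms | 1.7% | share roughly flat (1.2% → 1.7%); 11.5 → 25.6 ms/s |
| com.apple.NSEventThread | 855 ms | 0.5% | input delivery is not the bottleneck |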
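
The per-second and per-core figures follow directly from on-CPU time divided by the capture window. A small sketch of that arithmetic (function names are illustrative, not from the codebase):

```cpp
#include <cassert>
#include <cmath>

// Average number of cores kept busy: on-CPU time over wall-clock time.
double averageCores(double onCpuMs, double windowSec) {
    assert(windowSec > 0.0);
    return onCpuMs / (windowSec * 1000.0);
}

// On-CPU milliseconds accumulated per second of wall-clock time.
double msPerSecond(double onCpuMs, double windowSec) {
    assert(windowSec > 0.0);
    return onCpuMs / windowSec;
}
```

For the main thread this gives 9,047 ms / 25 s ≈ 362 ms/s at idle and 77,160 ms / 120 s = 643 ms/s during the drag, about a 1.8× increase. The pooled threads come to 85,879 ms / 120 s ≈ 0.72 cores.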

The 5 pooled threads are not DSP workers

They are Qt's raster paint engine auto-threading large fills (QThreadPool): blend_untransformed_argb, comp_func_solid_SourceOver_neon, blend_color_argb. Critically, the main-thread raster engine dispatches spans to the pool and then blocks waiting for them, so main + pool is effectively one synchronous workload (~93% of CPU at idle, ~91% during the drag), and every paint stalls the event loop for the full rasterize-and-fill duration.


Root cause of the sluggish panadapter drag

The drag trace shows the full main-thread repaint chain (inclusive):

QApplication::notify                         13,900 ms
└─ QMainWindow::event                        11,426 ms
   └─ QWidgetRepaintManager::sync/flush      11,357 ms   ← full backingstore repaint cycle
      └─ QWidgetPrivate::drawWidget           9,434 ms
         └─ paintSiblingsRecursive            6,464 ms   ← THE smoking gun
            ├─ WaveformWidget::paintEvent     2,858 ms   ← audio scope (software QPainter)
            ├─ SpectrumWidget renderGpuFrame  2,376 ms   ← expected (pan moves)
            ├─ QPushButton::paintEvent        1,927 ms   ← buttons repainting during a pan drag (!)
            └─ SMeterWidget::paintEvent         750 ms

paintSiblingsRecursive at 6.5 s is the key finding. The function itself is Qt's normal walk over child widgets that intersect the dirty region, so what matters is which widgets it reaches. Each drag frame does not merely redraw the panadapter; Qt also re-rasterizes the widgets adjacent to it in the invalidated region: the audio scope, the meters, and a panel of QPushButtons. This is the classic signature of either:

  • non-opaque widgets (missing Qt::WA_OpaquePaintEvent / autoFillBackground) forcing Qt to repaint siblings behind/around the dirty region, or
  • a frequency-change signal cascade calling update() over a region wide enough to overlap those siblings (doActivate signal emission = 1,721 ms during the drag).

So the scope/meters are real contributors but collateral — pulled in by an over-broad repaint, alongside buttons that should never redraw during a pan drag.
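
To make the "over-broad repaint" idea concrete, here is a simplified model of how a dirty rectangle decides which widgets get repainted. It is not Qt's implementation, and the `Rect`, `Widget`, and `widgetsHitBy` names and any layout passed in are hypothetical. It only illustrates that a dirty region confined to the spectrum touches one widget, while a window-sized region pulls in every sibling.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Rect { int x, y, w, h; };

// True when the two rectangles share at least one pixel.
bool intersects(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// Hypothetical stand-in for a child widget and its geometry in the window.
struct Widget {
    std::string name;
    Rect geometry;
};

// Names of all widgets whose geometry overlaps the dirty rectangle.
std::vector<std::string> widgetsHitBy(const Rect& dirty,
                                      const std::vector<Widget>& widgets) {
    assert(dirty.w >= 0 && dirty.h >= 0);
    std::vector<std::string> hit;
    for (const Widget& w : widgets) {
        if (intersects(dirty, w.geometry)) hit.push_back(w.name);
    }
    return hit;
}
```

In Qt terms, the narrow case corresponds to calling update() with a rectangle on the spectrum widget only, rather than on a parent or on the whole window.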

Second drag-specific cost: text re-shaped every frame

Largely invisible at idle, significant during drag:

QTextEngine::shapeTextWithHarfbuzzNG          779 ms (main)
QPainterPrivate::drawTextItem                 758 ms
QRasterPaintEngine::drawCachedGlyphs          673 ms
QCoreTextFontEngine glyphIndex/loadAdvances/stringToCMap  ~520 ms

Text is HarfBuzz-shaped from scratch on every repaint — not just the VFO frequency (which legitimately changes during a drag) but apparently static button/meter labels too. Shaping is one of the most expensive per-frame operations and is almost entirely cacheable.


Where the raster cost originates (idle inclusive attribution)

| Paint entry (app code unless marked Qt) | Inclusive |
| --- | --- |
| QWidgetPrivate::drawWidget (Qt, compositing root) | 2,222 ms |
| QRasterPaintEngine::fill (Qt) | 1,012 ms |
| WaveformWidget::paintEvent | 991 ms |
| WaveformWidget::drawGraph | 836 ms |
| QRasterPaintEnginePrivate::rasterize (Qt) | 825 ms |
| SMeterWidget::paintEvent | 249 ms |
| QRasterPaintEngine::drawTextItem (Qt) | 142 ms |
| VfoWidget::paintEvent | 120 ms |
| SpectrumWidget::renderGpuFrame (GPU path) | 89 ms |

Hottest leaf symbols are all antialiased polygon rasterization (gray_set_cell 411 ms, gray_render_scanline/line, qt_alphamapblit, qt_memfill32).


Subsystem analysis & proposed optimizations

1. Repaint region / compositing (NEW — highest leverage for drag latency)

The single biggest lever for the sluggish drag is to stop the drag from invalidating sibling widgets. Fixing this should shrink paintSiblingsRecursive (6.5 s inclusive) together with the button/scope/meter repaints (~5.5 s) that ride on it. These are inclusive figures that overlap, so they should not be summed.

  • Audit what the panadapter drag invalidates — ensure mouse-move handling updates only the spectrum/overlay region, not a region (or a parent) overlapping the scope, meters, and button panel.
  • Ensure neighboring widgets set Qt::WA_OpaquePaintEvent (and/or opaque autoFillBackground) so Qt does not repaint them as siblings of the dirty region.
  • Check the frequency-change signal path (doActivate 1.7 s during drag) — coalesce/limit the widgets that update() per mouse-move so a single drag step doesn't fan out into a full-window repaint.
  • Confirm QPushButtons are opaque and not in the invalidated region — they should never repaint during a pan drag.
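
The coalescing idea in the third bullet can be sketched as a dirty flag drained once per frame tick. This is a model under stated assumptions: `UpdateCoalescer` is a hypothetical class, not part of AetherSDR. Qt's own update() already merges repeated requests on one widget until control returns to the event loop; the sketch models a coarser throttle in which many frequency-change notifications between two frame ticks produce a single repaint.

```cpp
#include <cassert>

// Hypothetical helper: collapses many repaint requests into one per tick.
class UpdateCoalescer {
public:
    // Called from every handler that wants a repaint. Cheap: no painting here.
    void requestUpdate() {
        pending_ = true;
        ++requests_;
    }

    // Called once per frame tick (for example from a timer).
    // Returns true when the caller should actually repaint.
    bool flush() {
        if (!pending_) return false;
        pending_ = false;
        ++repaints_;
        assert(repaints_ <= requests_);
        return true;
    }

    int requests() const { return requests_; }
    int repaints() const { return repaints_; }

private:
    bool pending_ = false;
    int requests_ = 0;
    int repaints_ = 0;
};
```

With this shape, five mouse-move notifications arriving within one tick cost one repaint, and an idle tick costs none.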

2. Widgets (highest leverage for overall CPU)

WaveformWidget::drawGraph (src/gui/WaveformWidget.cpp:447) per frame builds four full-width QPainterPaths (peak/RMS top/bottom, one node per pixel column), issues a per-column drawLine, draws them antialiased, and (in drawEnvelope) does a full-plot alpha fillPath — at up to 24 Hz (WaveformWidget.h:113).

  • Batch per-column min/max bars into one drawLines(QVector<QLineF>).
  • Replace the four QPainterPaths with drawPolyline() over prebuilt QPointF arrays (materially cheaper to rasterize).
  • Disable AA on the dense vertical min/max bars; keep AA only on the thin peak/RMS traces.
  • In drawEnvelope, replace the full-plot alpha fillPath with a precomputed gradient brush, or make it optional.
  • Lower / make adaptive the 24 Hz default (e.g. 15 Hz, or back off on near-silent audio).
  • Structural win: port WaveformWidget to the existing QRhi/GPU path (infra already exists for SpectrumWidget); a scope is a trivial GPU workload and would likely erase most of the combined ~22 s raster cost (main + pooled threads over the 25 s idle window).
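
The first bullet (batching per-column min/max bars) amounts to reducing the sample buffer to one bar per pixel column up front, so the painter can be handed a single array. A minimal sketch of that reduction, with the Qt draw call left out to keep it self-contained; `ColumnBar` and `buildColumnBars` are hypothetical names, not the existing drawGraph code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One vertical bar per pixel column: x position plus min and max sample.
struct ColumnBar {
    int x;
    float lo;
    float hi;
};

// Reduce `samples` to exactly `widthPx` bars, one per pixel column.
std::vector<ColumnBar> buildColumnBars(const std::vector<float>& samples,
                                       int widthPx) {
    assert(widthPx > 0);
    std::vector<ColumnBar> bars;
    if (samples.empty()) return bars;

    const std::size_t n = samples.size();
    const std::size_t width = static_cast<std::size_t>(widthPx);
    bars.reserve(width);
    for (std::size_t x = 0; x < width; ++x) {
        // Half-open sample range [begin, end) covered by this column;
        // always at least one sample wide.
        const std::size_t begin = n * x / width;
        const std::size_t end = std::max(begin + 1, n * (x + 1) / width);
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(end);
        const auto mm = std::minmax_element(first, last);
        bars.push_back(ColumnBar{static_cast<int>(x), *mm.first, *mm.second});
    }
    return bars;
}
```

Each bar would then map to one line segment, and the whole array to one batched call such as QPainter::drawLines instead of a drawLine per column.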

SMeterWidget / VfoWidget:

  • Cache shaped text (QStaticText / cached glyph runs); re-shape only when the value string changes. ~2 s recoverable during drag.
  • Verify repaint cadence vs. MeterSmoother output rate.
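
The text-caching bullet boils down to keying the expensive layout on the displayed string. A minimal sketch, assuming a hypothetical `ShapedText` result and a caller-supplied shaping function; in Qt the corresponding tool would be QStaticText, which is not used here so that the example stays self-contained:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the result of an expensive text-shaping pass.
struct ShapedText {
    std::string source;
    int widthPx;
};

// Re-runs the shaper only when the label string actually changes.
class CachedLabel {
public:
    using Shaper = std::function<ShapedText(const std::string&)>;

    explicit CachedLabel(Shaper shaper) : shaper_(std::move(shaper)) {
        assert(static_cast<bool>(shaper_));
    }

    const ShapedText& layoutFor(const std::string& text) {
        if (!valid_ || text != key_) {
            cached_ = shaper_(text);  // expensive path, taken on change only
            key_ = text;
            valid_ = true;
        }
        return cached_;
    }

private:
    Shaper shaper_;
    std::string key_;
    ShapedText cached_{};
    bool valid_ = false;
};
```

A static button or meter label would be shaped once for the lifetime of the widget, while the VFO readout would be re-shaped only on frames where the frequency string differs from the previous one.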

3. GPU (SpectrumWidget / QRhiMetal) — healthy, minor tuning

GPU path is efficient (renderGpuFrame 89 ms idle; ~2.4 s during a 120 s drag, expected since the pan moves).

  • Skip beginPass / frame submission when there is no new FFT/waterfall data (avoid a render pass per vsync when the model is unchanged).
  • Confirm waterfall uses a ring/scroll texture rather than full-texture re-upload per row (enqueueSubresUpload ~194 ms during drag).
  • If the scope moves to GPU (§2), validate one swapchain present per vsync and no Metal queue oversubscription.
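
For the waterfall bullet, a ring/scroll texture means each new line overwrites exactly one texture row and the display scrolls by offsetting the row lookup. The index bookkeeping can be sketched as below. `WaterfallRing` is a hypothetical helper, and whether the current renderer already works this way is exactly what the bullet asks to confirm.

```cpp
#include <cassert>

// Hypothetical bookkeeping for a waterfall stored as a ring of texture rows.
class WaterfallRing {
public:
    explicit WaterfallRing(int rows) : rows_(rows) { assert(rows > 0); }

    // Returns the texture row to overwrite with the newest line.
    // Only this one row needs uploading, not the whole texture.
    int pushRow() {
        const int target = head_;
        head_ = (head_ + 1) % rows_;
        if (filled_ < rows_) ++filled_;
        return target;
    }

    // Texture row holding the line that is `age` lines old (0 = newest).
    int rowForAge(int age) const {
        assert(age >= 0 && age < filled_);
        return ((head_ - 1 - age) % rows_ + rows_) % rows_;
    }

    int filled() const { return filled_; }

private:
    int rows_;
    int head_ = 0;    // next row to be overwritten
    int filled_ = 0;  // number of rows holding valid data
};
```

On the GPU side the same mapping is typically a single row offset applied to the texture coordinate, so the per-line upload cost stays independent of the waterfall height.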

4. Panadapter / VITA-49 (PanadapterStream) — healthy, watch allocations

Low weight (287 ms idle, 3.1 s during drag), dominated by expected processDatagram, recvmsg/recvfrom, and queued-signal dispatch.

  • Reuse a pooled buffer for the per-row QList<float> instead of allocating per datagram (QList<float>::fill + per-datagram QQueuedMetaCallEvent show up) — matters more at multi-pan / fast-waterfall rates.
  • Re-profile under multiple panadapters + fast waterfall.
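
The pooled-buffer suggestion can be sketched with plain std::vector<float> standing in for QList<float>. `RowPool` is a hypothetical class, it is not thread-safe as written, and a real version would need to account for buffers crossing threads through queued signals.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical free-list of row buffers, reused across datagrams.
class RowPool {
public:
    // Hand out a buffer of `bins` floats, reusing a released one if possible.
    std::vector<float> acquire(std::size_t bins) {
        assert(bins > 0);
        std::vector<float> buf;
        if (!free_.empty()) {
            buf = std::move(free_.back());
            free_.pop_back();
        }
        buf.resize(bins);  // no reallocation when capacity already suffices
        return buf;
    }

    // Return a buffer to the pool; clear() keeps its capacity for reuse.
    void release(std::vector<float>&& buf) {
        buf.clear();
        free_.push_back(std::move(buf));
    }

    std::size_t idle() const { return free_.size(); }

private:
    std::vector<std::vector<float>> free_;
};
```

Note that resize() still zero-fills the elements after a clear(), so this removes the per-datagram allocation but not the fill; skipping the fill would require the parser to overwrite every bin.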

5. Audio (AudioEngine) — healthy

Only 364 ms idle / 3.2 s during drag (share roughly unchanged at under 2% of CPU), spread across expected DSP: op_pvq_search_c/celt_pitch_xcorr_c (Opus), r8b::ooura_fft/CDSPBlockConvolver (resampler), ClientEq::process, tanhf (tube/saturation), remove_doubling. CoreAudio IO thread never blocks.

  • Optional: replace per-sample tanhf in the tube stage with a polynomial/LUT if TX CPU ever becomes a concern.
  • Re-profile during TX (Opus encode + tube pre-amp + RN2/NR2/NR4/DFNR).
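
As an illustration of the optional tanhf replacement, here is one common rational approximation, clamped outside [-3, 3]. It is a sketch, not a drop-in for the existing tube stage: `fastTanh` and `maxAbsError` are hypothetical names, the approximation stays within about 0.03 of std::tanh, and whether that deviation is acceptable for the saturation character is a listening decision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rational approximation of tanh; exact at 0 and reaches +/-1 at |x| = 3.
inline float fastTanh(float x) {
    if (x <= -3.0f) return -1.0f;
    if (x >= 3.0f) return 1.0f;
    const float x2 = x * x;
    return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}

// Largest absolute deviation from std::tanh on a uniform grid
// covering [-limit, limit] with `steps` intervals.
float maxAbsError(float limit, int steps) {
    assert(limit > 0.0f && steps > 0);
    float worst = 0.0f;
    for (int i = 0; i <= steps; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(steps);
        const float x = -limit + 2.0f * limit * t;
        worst = std::max(worst, std::fabs(fastTanh(x) - std::tanh(x)));
    }
    return worst;
}
```

The error helper gives a quick way to check any candidate approximation against the library function before profiling it in the TX path.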

Suggested priority

  1. Narrow the drag repaint region (§1): biggest win for the user-visible sluggish drag; targets the ~6.5 s (inclusive) of sibling repaints, button redraws among them.
  2. WaveformWidget paint cost (§2) — batched drawLines + polyline + selective AA (safe, no design change), then evaluate GPU port.
  3. Cache shaped text in VFO + S-meter (~2 s during drag, easy).
  4. Refresh-rate review for scope + meters (UX decision).
  5. GPU skip-frame-when-unchanged + panadapter buffer pooling (forward-looking).

Follow-ups / additional profiling

  • TX path (Opus encode + tube pre-amp + RN2/NR2/NR4/DFNR).
  • Multiple panadapters + fast waterfall.
  • Idle/near-silent audio (does the scope still repaint at 24 Hz?).
  • Allocations (Instruments "Allocations") to quantify per-frame malloc churn in paint + VITA-49.

Both .trace captures are reproducible with the xctrace command above; happy to attach exported summaries or a focused sub-trace on request.

Labels: GUI, audio, enhancement, macOS, maintainer-review, priority: medium, refactor
