Summary
CPU profiling of a live AetherSDR session shows that ~92% of process CPU is Qt software rasterization (QPainter/QRasterPaintEngine), not DSP and not the GPU spectrum path. Two scenarios were captured:
- Idle / steady-state RX: cost is concentrated in QPainter-rendered widgets (the audio scope WaveformWidget, plus the S-meter and VFO).
- Panadapter drag (the user-visible "sluggish drag" symptom): average CPU rises from ~0.97 to ~1.49 cores and main-thread load nearly doubles (362 → 643 ms/s). The trace exposes the root cause: each drag frame triggers a full-window backingstore repaint that re-rasterizes sibling widgets (scope, meters, and QPushButtons) and re-shapes all text from scratch via HarfBuzz.
The GPU spectrum/waterfall (QRhiMetal) and the audio DSP pipeline are both cheap and healthy. This issue documents the full profiler report and proposes optimizations across widget, GPU, panadapter, and audio subsystems.
Profiling/analysis only — no code changed. Refresh-rate defaults and software-vs-GPU rendering are UX/architecture decisions, so this is filed for maintainer review per the autonomy boundaries in CLAUDE.md.
Methodology
- Tool: xcrun xctrace record --template "Time Profiler" --attach <pid> (Instruments' Time Profiler engine, CLI), attached to a live session.
- Captures: 25 s idle/steady-state, and 120 s while actively dragging the panadapter.
- Symbolication: against the local RelWithDebInfo build.
- Aggregation: exported the time-profile table to XML; aggregated weight per thread, per thread-state (running vs. waiting), per leaf symbol, and inclusive per paint/event entry frame.
- Environment: Apple Silicon (SoC t8142), macOS 26.5 (arm64), Qt 6.11.0 (Homebrew).
- Note: the ms figures are sampled on-CPU weights (relative cost), not worst-case ceilings. The idle window averaged ~0.97 cores; the drag window averaged ~1.49 cores.
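The aggregation step and the "average cores" figure can be sketched in a few lines. This is a minimal illustration, not the script used for this report: the Sample struct and all function names are hypothetical, and the real xctrace XML export has a richer schema that is not modeled here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One exported profiler row: the sampled thread and the weight (ms) attributed
// to it. Field names are illustrative, not the xctrace export schema.
struct Sample {
    std::string thread;
    double weightMs;
};

// Sum sample weights per thread.
std::map<std::string, double> aggregateByThread(const std::vector<Sample>& samples) {
    std::map<std::string, double> totals;
    for (const Sample& s : samples) totals[s.thread] += s.weightMs;
    return totals;
}

// Total on-CPU weight across all threads, in ms.
double totalWeightMs(const std::map<std::string, double>& totals) {
    double sum = 0.0;
    for (const auto& kv : totals) sum += kv.second;
    return sum;
}

// Average number of busy cores = on-CPU time / wall-clock capture window.
double averageCores(double onCpuMs, double windowSeconds) {
    assert(windowSeconds > 0.0);
    return onCpuMs / (windowSeconds * 1000.0);
}
```

Feeding in the idle per-thread totals reported in the breakdown (9,047 + 13,346 + 364 + 287 + ~1,060 ≈ 24,104 ms) over the 25 s window gives about 0.96 cores. The listed shares sum to roughly 99.5%, so the small gap to the quoted ~0.97 is consistent with minor threads that are not listed.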
Per-thread CPU breakdown
All active threads sampled at 100% "Running" — genuinely on-CPU, no lock contention / priority inversion observed. The bottleneck is main-thread throughput, not blocking.
Idle (25 s window)
| Thread | On-CPU | Share | Role |
| --- | --- | --- | --- |
| Main thread | 9,047 ms | 37.5% | QPainter software rasterization |
| 5× Thread (pooled) | 13,346 ms | 55.3% combined | Qt raster engine parallel span fills |
| AudioEngine | 364 ms | 1.5% | Opus + r8b resampler + EQ / NR DSP |
| PanadapterStream | 287 ms | 1.2% | VITA-49 parse + signal emit |
| AetherSDR (Metal/GCD ×4) | ~1,060 ms | ~4% | QRhiMetal GPU submit + libdispatch |
Panadapter drag (120 s window): average CPU rises ~1.5× (~0.97 → ~1.49 cores)
| Thread | On-CPU | Share | Δ vs idle |
| --- | --- | --- | --- |
| 5× Thread (pooled) | 85,879 ms | 48.1% | pinned ~0.72 cores |
| Main thread | 77,160 ms | 43.3% | 362 → 643 ms/s |
| AudioEngine | 3,231 ms | 1.8% | unchanged rate |
| PanadapterStream | 3,075 ms | 1.7% | unchanged rate |
| com.apple.NSEventThread | 855 ms | 0.5% | input delivery is not the bottleneck |
The 5 pooled threads are not DSP workers
They are Qt's raster paint engine auto-threading large fills (QThreadPool): blend_untransformed_argb, comp_func_solid_SourceOver_neon, blend_color_argb. Critically, the main-thread raster engine dispatches spans to the pool and then blocks waiting for them, so main + pool is effectively one synchronous workload (~93% of CPU), and every paint stalls the event loop for the full rasterize-and-fill duration.
Root cause of the sluggish panadapter drag
The drag trace shows the full main-thread repaint chain (inclusive):
    QApplication::notify                                13,900 ms
    └─ QMainWindow::event                               11,426 ms
       └─ QWidgetRepaintManager::sync/flush             11,357 ms  ← full backingstore repaint cycle
          └─ QWidgetPrivate::drawWidget                  9,434 ms
             └─ paintSiblingsRecursive                   6,464 ms  ← THE smoking gun
                ├─ WaveformWidget::paintEvent            2,858 ms  ← audio scope (software QPainter)
                ├─ SpectrumWidget renderGpuFrame         2,376 ms  ← expected (pan moves)
                ├─ QPushButton::paintEvent               1,927 ms  ← buttons repainting during a pan drag (!)
                └─ SMeterWidget::paintEvent                750 ms
paintSiblingsRecursive at 6.5 s is the key finding. Each drag frame does not merely redraw the panadapter — Qt walks and re-rasterizes the widgets adjacent to it in the invalidated region: the audio scope, the meters, and a panel of QPushButtons. This is the classic signature of either:
- non-opaque widgets (missing Qt::WA_OpaquePaintEvent / autoFillBackground) forcing Qt to repaint siblings behind/around the dirty region, or
- a frequency-change signal cascade calling update() over a region wide enough to overlap those siblings (doActivate signal emission = 1,721 ms during the drag).
So the scope/meters are real contributors but collateral — pulled in by an over-broad repaint, alongside buttons that should never redraw during a pan drag.
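The effect of an over-broad dirty region can be illustrated with a small Qt-free model. This is a deliberately simplified sketch: Rect and Widget are hypothetical stand-ins for QRect and QWidget, the layout numbers in the usage note are made up, and Qt's real repaint manager also accounts for opacity, z-order, and clipping, none of which is modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Axis-aligned rectangle in window coordinates (a stand-in for QRect).
struct Rect {
    int x, y, w, h;
    // True if the two rectangles share at least one pixel.
    bool intersects(const Rect& o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

struct Widget {
    std::string name;
    Rect geometry;
};

// Names of the widgets whose geometry overlaps the dirty rectangle, i.e. the
// ones that would have to repaint in this simplified model.
std::vector<std::string> widgetsToRepaint(const std::vector<Widget>& widgets,
                                          const Rect& dirty) {
    assert(dirty.w >= 0 && dirty.h >= 0);
    std::vector<std::string> hit;
    for (const Widget& w : widgets)
        if (w.geometry.intersects(dirty)) hit.push_back(w.name);
    return hit;
}
```

With a spectrum at (0, 0, 800, 400) and a scope, S-meter, and button panel in a row below it, a whole-window dirty rectangle hits all four widgets, while a dirty rectangle limited to the spectrum's own geometry hits only the spectrum. In Qt terms, the narrower request would correspond to calling QWidget::update(const QRect&) with just the changed area, and the opacity hint to setAttribute(Qt::WA_OpaquePaintEvent); whether either applies cleanly to these widgets still has to be confirmed against the actual code.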
Second drag-specific cost: text re-shaped every frame
Largely invisible at idle, significant during drag:
QTextEngine::shapeTextWithHarfbuzzNG 779 ms (main)
QPainterPrivate::drawTextItem 758 ms
QRasterPaintEngine::drawCachedGlyphs 673 ms
QCoreTextFontEngine glyphIndex/loadAdvances/stringToCMap ~520 ms
Text is HarfBuzz-shaped from scratch on every repaint — not just the VFO frequency (which legitimately changes during a drag) but apparently static button/meter labels too. Shaping is one of the most expensive per-frame operations and is almost entirely cacheable.
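The caching idea can be sketched without Qt as a map from string to shaped result, so that shaping runs once per distinct string rather than once per frame. Everything here is an assumption for illustration: ShapedRun and ShapedTextCache are hypothetical names, and the placeholder shape() merely stands in for the expensive HarfBuzz step.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the result of text shaping (glyph ids, advances, ...).
struct ShapedRun {
    std::string text;
    int glyphCount;
};

// Caches shaped runs by string so each distinct string is shaped only once.
class ShapedTextCache {
public:
    const ShapedRun& get(const std::string& text) {
        auto it = cache_.find(text);
        if (it == cache_.end()) {
            ++shapeCalls_;  // cache miss: pay the shaping cost once
            it = cache_.emplace(text, shape(text)).first;
            assert(it != cache_.end());
        }
        return it->second;
    }
    // Number of times the expensive shaping step actually ran.
    int shapeCalls() const { return shapeCalls_; }

private:
    // Placeholder for the expensive step (HarfBuzz shaping in the real app).
    static ShapedRun shape(const std::string& text) {
        return ShapedRun{text, static_cast<int>(text.size())};
    }
    std::unordered_map<std::string, ShapedRun> cache_;
    int shapeCalls_ = 0;
};
```

A static label requested on every frame costs one shaping call in total, while a changing VFO readout costs one call per distinct value. This sketch never evicts entries, so a real version would need a bound (for example, keeping only the most recent string per label). Within Qt, QStaticText provides comparable layout caching for text that rarely changes.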
Where the raster cost originates (idle inclusive attribution)
| Paint entry (your code unless noted) | Inclusive |
| --- | --- |
| QWidgetPrivate::drawWidget (compositing root) | 2,222 ms |
| QRasterPaintEngine::fill | 1,012 ms |
| WaveformWidget::paintEvent | 991 ms |
| WaveformWidget::drawGraph | 836 ms |
| QRasterPaintEnginePrivate::rasterize | 825 ms |
| SMeterWidget::paintEvent | 249 ms |
| QRasterPaintEngine::drawTextItem | 142 ms |
| VfoWidget::paintEvent | 120 ms |
| SpectrumWidget::renderGpuFrame (GPU path) | 89 ms ✅ |
Hottest leaf symbols are all antialiased polygon rasterization (gray_set_cell 411 ms, gray_render_scanline/line, qt_alphamapblit, qt_memfill32).
Subsystem analysis & proposed optimizations
1. Repaint region / compositing (NEW — highest leverage for drag latency)
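The single biggest lever for the sluggish drag is to stop the drag from invalidating sibling widgets. Fixing this collapses paintSiblingsRecursive (6.5 s) plus the button/scope/meter repaints (~5.5 s) that ride on it.
- […] Qt::WA_OpaquePaintEvent (and/or opaque autoFillBackground) so Qt does not repaint them as siblings of the dirty region.
- […] (doActivate 1.7 s during drag): coalesce/limit the widgets that update() per mouse-move so a single drag step doesn't fan out into a full-window repaint.
- […] QPushButtons are opaque and not in the invalidated region; they should never repaint during a pan drag.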
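One way to keep a frequency-change signal cascade from fanning out into many repaints is to let handlers only flag that a repaint is wanted, and perform it at most once per frame tick. The sketch below is a Qt-free model with hypothetical names (RepaintCoalescer, request, onFrameTick). Qt already merges repeated update() calls on a single widget into one paint event, so a helper like this would only matter for throttling across several handlers or to a chosen frame rate.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Coalesces many repaint requests into at most one repaint per frame tick.
class RepaintCoalescer {
public:
    explicit RepaintCoalescer(std::function<void()> repaint)
        : repaint_(std::move(repaint)) {
        assert(static_cast<bool>(repaint_));
    }
    // Called from every handler that wants a redraw; cheap and idempotent.
    void request() { pending_ = true; }
    // Called once per frame (e.g. from a timer); repaints only if requested.
    void onFrameTick() {
        if (!pending_) return;
        pending_ = false;
        repaint_();
    }

private:
    std::function<void()> repaint_;
    bool pending_ = false;
};
```

Five request() calls during one mouse-move followed by a single onFrameTick() produce exactly one repaint, and a tick with no pending request produces none.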
2. Widgets (highest leverage for overall CPU)
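WaveformWidget::drawGraph (src/gui/WaveformWidget.cpp:447) per frame builds four full-width QPainterPaths (peak/RMS top/bottom, one node per pixel column), issues a per-column drawLine, draws them antialiased, and (in drawEnvelope) does a full-plot alpha fillPath, at up to 24 Hz (WaveformWidget.h:113).
- […] drawLines(QVector<QLineF>).
- […] QPainterPaths with drawPolyline() over prebuilt QPointF arrays (materially cheaper to rasterize).
- […] drawEnvelope, replace the full-plot alpha fillPath with a precomputed gradient brush, or make it optional.
- […] WaveformWidget to the existing QRhi/GPU path (infra already exists for SpectrumWidget); a scope is a trivial GPU workload and would erase most of the combined ~22 s raster cost.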
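Batching the per-column lines could look roughly like the following: reduce the samples to one min/max segment per pixel column, collect the segments in a single container, and hand that container to one draw call. This is a Qt-free sketch under stated assumptions: LineF is a hypothetical stand-in for QLineF, samples are assumed to lie in [-1, 1], and the actual drawGraph data layout is not known here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Vertical line for one pixel column (a stand-in for QLineF).
struct LineF {
    float x1, y1, x2, y2;
};

// Reduces the samples to one min/max vertical segment per pixel column and
// returns them all, ready for a single batched draw call.
// Samples are expected in [-1, 1]; y grows downward as in widget coordinates.
std::vector<LineF> buildColumnLines(const std::vector<float>& samples,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<LineF> lines;
    if (samples.empty()) return lines;
    const std::size_t n = samples.size();
    const std::size_t cols = static_cast<std::size_t>(width);
    lines.reserve(cols);
    const float mid = static_cast<float>(height) / 2.0f;
    for (std::size_t col = 0; col < cols; ++col) {
        // Half-open sample range [begin, end) that maps onto this column.
        std::size_t begin = n * col / cols;
        std::size_t end = n * (col + 1) / cols;
        if (end <= begin) end = begin + 1;  // fewer samples than columns
        const auto mm = std::minmax_element(samples.begin() + begin,
                                            samples.begin() + end);
        const float x = static_cast<float>(col) + 0.5f;
        // Top of the segment is the column maximum, bottom is the minimum.
        lines.push_back(LineF{x, mid - *mm.second * mid, x, mid - *mm.first * mid});
    }
    return lines;
}
```

In the widget, the resulting container would be passed to a single QPainter::drawLines call instead of issuing one drawLine per column. How much this saves relative to the path rasterization cost would need to be re-measured with the profiler.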
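SMeterWidget / VfoWidget:
- […] (QStaticText / cached glyph runs); re-shape only when the value string changes. ~2 s recoverable during drag.
- […] MeterSmoother output rate.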
3. GPU (SpectrumWidget / QRhiMetal) — healthy, minor tuning
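The GPU path is efficient (renderGpuFrame 89 ms idle; ~2.4 s during a 120 s drag, expected since the pan moves).
- […] beginPass / frame submission when there is no new FFT/waterfall data (avoid a render pass per vsync when the model is unchanged).
- […] (enqueueSubresUpload ~194 ms during drag).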
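Skipping frame submission when nothing has changed can be modeled with a generation counter: the data side bumps the counter, and the render side submits only when the counter differs from the last one it rendered. FrameGate and its method names are hypothetical, and where such a check would sit relative to the QRhi frame loop is an assumption that needs to be verified against SpectrumWidget.

```cpp
#include <cassert>
#include <cstdint>

// Tracks whether the data model changed since the last submitted frame.
class FrameGate {
public:
    // Call whenever new FFT/waterfall data (or a view change) arrives.
    void markDataChanged() { ++dataGeneration_; }

    // Returns true if a frame should be rendered and records it as rendered;
    // returns false when the model is unchanged since the previous frame.
    bool shouldRender() {
        assert(renderedGeneration_ <= dataGeneration_);
        if (renderedGeneration_ == dataGeneration_) return false;
        renderedGeneration_ = dataGeneration_;
        return true;
    }

private:
    std::uint64_t dataGeneration_ = 1;      // start dirty so the first frame renders
    std::uint64_t renderedGeneration_ = 0;
};
```

Several data updates between two vsyncs still result in a single rendered frame, and vsyncs with no new data render nothing.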
4. Panadapter / VITA-49 (PanadapterStream) — healthy, watch allocations
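Low weight (287 ms idle, 3.1 s during drag), dominated by the expected processDatagram, recvmsg/recvfrom, and queued-signal dispatch.
- […] QList<float> instead of allocating per datagram (QList<float>::fill + per-datagram QQueuedMetaCallEvent show up); matters more at multi-pan / fast-waterfall rates.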
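Buffer pooling for per-datagram payloads can be sketched as a free list of reusable vectors. This is an illustrative model only: BufferPool is a hypothetical name, std::vector<float> stands in for the real container, and the sketch is not thread-safe, which matters if buffers cross threads via queued signals.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Recycles float buffers so steady-state datagram handling does not allocate.
class BufferPool {
public:
    // Returns a buffer of exactly `size` elements, reusing a released buffer
    // if one is available. Contents may be stale; the caller must overwrite
    // every element, which avoids paying for a per-datagram fill.
    std::vector<float> acquire(std::size_t size) {
        assert(size > 0);
        std::vector<float> buf;
        if (!free_.empty()) {
            buf = std::move(free_.back());
            free_.pop_back();
        } else {
            ++created_;
        }
        buf.resize(size);  // reallocates only if the capacity is too small
        return buf;
    }

    // Hands a buffer back once the consumer is done with it.
    void release(std::vector<float>&& buf) { free_.push_back(std::move(buf)); }

    // Number of buffers created from scratch (not counting capacity growth).
    std::size_t created() const { return created_; }

private:
    std::vector<std::vector<float>> free_;
    std::size_t created_ = 0;
};
```

After the first datagram, an acquire/release cycle of the same size reuses the same storage, so the steady state involves no allocation and no fill.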
5. Audio (AudioEngine) — healthy
Only 364 ms idle / 3.2 s during drag (rate unchanged), spread across expected DSP: op_pvq_search_c/celt_pitch_xcorr_c (Opus), r8b::ooura_fft/CDSPBlockConvolver (resampler), ClientEq::process, tanhf (tube/saturation), remove_doubling. CoreAudio IO thread never blocks.
- […] tanhf in the tube stage with a polynomial/LUT if TX CPU ever becomes a concern.
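If the tanhf cost in the tube/saturation stage ever needs attention, one common substitute is a clamped rational approximation. The function below is a sketch of that general technique, not the project's code: fastTanh is a hypothetical name, and whether its error (on the order of 0.02 in absolute terms) is acceptable for the tube stage is an assumption that would need listening tests or measurement.

```cpp
#include <cassert>
#include <cmath>

// Rational approximation of tanh, clamped to [-1, 1].
// x * (27 + x^2) / (27 + 9 x^2) reaches exactly 1 at x = 3, so clamping the
// input to [-3, 3] keeps the curve continuous and bounded.
inline float fastTanh(float x) {
    if (x > 3.0f) x = 3.0f;
    if (x < -3.0f) x = -3.0f;
    const float x2 = x * x;
    return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}
```

The approximation is odd-symmetric, passes through zero, and saturates at ±1 for large inputs, which are the properties a soft clipper typically relies on. It replaces a transcendental call with a handful of multiplies and one divide; the actual speedup on Apple Silicon would have to be measured.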
Suggested priority
- Narrow the drag repaint region (§1) — biggest win for the user-visible sluggish drag; collapses ~12 s of sibling repaints + button redraws.
- WaveformWidget paint cost (§2): batched drawLines + polyline + selective AA (safe, no design change), then evaluate GPU port.
- Cache shaped text in VFO + S-meter (~2 s during drag, easy).
- Refresh-rate review for scope + meters (UX decision).
- GPU skip-frame-when-unchanged + panadapter buffer pooling (forward-looking).
Follow-ups / additional profiling
Both .trace captures are reproducible with the xctrace command above; happy to attach exported summaries or a focused sub-trace on request.