[Autoloop: tsb-perf-evolve] #255

Merged
mrjf merged 1 commit into main from autoloop/tsb-perf-evolve
Apr 30, 2026

Conversation

@github-actions
Contributor

🤖 This PR is maintained by Autoloop. Each accepted iteration adds a commit to this branch.

Program Goal

Minimize fitness = tsb_mean_ms / pandas_mean_ms for Series.sortValues at n=100k. Lower is better; < 1.0 means tsb beats pandas.

Current best metric: 21.841 (tsb≈116ms / pandas≈5.34ms, iteration 28 — merged via #249)
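
For reference, the metric above is a plain ratio of mean timings. A minimal sketch follows; the `fitness` helper is hypothetical (the PR does not show how the benchmark harness computes it), and the inputs are the rounded timings quoted above, so the result only approximates the reported 21.841.

```typescript
// Hypothetical helper: not taken from the Autoloop harness.
function fitness(tsbMeanMs: number, pandasMeanMs: number): number {
  if (!(pandasMeanMs > 0)) {
    throw new RangeError("pandasMeanMs must be positive");
  }
  // Lower is better; a value below 1.0 means tsb is faster than pandas.
  return tsbMeanMs / pandasMeanMs;
}

// Rounded means from iteration 28 (tsb ≈ 116 ms, pandas ≈ 5.34 ms).
const approxFitness = fitness(116, 5.34);
console.log(approxFitness.toFixed(2)); // "21.72"
```

The small gap between this value and 21.841 is expected, since the quoted millisecond figures are rounded.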

Iteration 29: AoS scatter layout

Switch the LSD radix sort ping-pong buffers from SoA (6 separate typed arrays: _rxA_idx, _rxA_lo, _rxA_hi, _rxB_idx, _rxB_lo, _rxB_hi) to a single AoS layout (_rxA, _rxB — each element uses 3 consecutive uint32 words: [origRowIdx, loKey, hiKey]).
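
To make the AoS layout concrete, here is a minimal sketch of an 8-pass LSD radix sort over interleaved `[origRowIdx, loKey, hiKey]` records. It is not the code from this PR: the names `WORDS` and `sortByKeyAoS` are hypothetical, it assumes 8-bit digits (8 passes × 8 bits = 64 key bits), and it takes already-encoded unsigned 64-bit keys split into `lo`/`hi` words rather than showing how float values or NaN/null are mapped to keys.

```typescript
// Sketch only: WORDS and sortByKeyAoS are illustrative names, not from the PR.
const WORDS = 3; // record layout: [origRowIdx, loKey, hiKey]

/** Returns row indices ordered by the unsigned 64-bit key (hi:lo); stable. */
function sortByKeyAoS(lo: Uint32Array, hi: Uint32Array): Uint32Array {
  const n = lo.length;
  let src = new Uint32Array(n * WORDS); // ping buffer
  let dst = new Uint32Array(n * WORDS); // pong buffer
  for (let i = 0; i < n; i++) {
    const b = i * WORDS;
    src[b] = i;
    src[b + 1] = lo[i];
    src[b + 2] = hi[i];
  }
  const count = new Uint32Array(256);
  for (let pass = 0; pass < 8; pass++) {
    const word = pass < 4 ? 1 : 2; // passes 0-3 read loKey, 4-7 read hiKey
    const shift = (pass & 3) * 8;
    // Histogram of the current 8-bit digit.
    count.fill(0);
    for (let i = 0; i < n; i++) {
      count[(src[i * WORDS + word] >>> shift) & 0xff]++;
    }
    // Exclusive prefix sum turns counts into bucket start offsets.
    let sum = 0;
    for (let d = 0; d < 256; d++) {
      const c = count[d];
      count[d] = sum;
      sum += c;
    }
    // Scatter: the three words of a record land at consecutive addresses.
    for (let i = 0; i < n; i++) {
      const s = i * WORDS;
      const digit = (src[s + word] >>> shift) & 0xff;
      const t = count[digit]++ * WORDS;
      dst[t] = src[s];
      dst[t + 1] = src[s + 1];
      dst[t + 2] = src[s + 2];
    }
    const tmp = src;
    src = dst;
    dst = tmp;
  }
  const order = new Uint32Array(n);
  for (let i = 0; i < n; i++) order[i] = src[i * WORDS];
  return order;
}

// Keys: 5, 2^32 + 1, 5, 0. Equal keys keep their original relative order.
const order = sortByKeyAoS(
  new Uint32Array([5, 1, 5, 0]),
  new Uint32Array([0, 1, 0, 0]),
);
console.log(Array.from(order)); // [3, 0, 2, 1]
```

In an SoA version the scatter step would instead issue three stores into three different arrays at index `count[digit]++`, which is the access pattern this iteration moves away from.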

With SoA, each scatter step writes 3 words to random positions in 3 separate large arrays, touching 3 separate cache lines per element. With AoS, all 3 writes target 12 consecutive bytes in a single array, which usually fall within one cache line per element, for roughly 3× fewer cache-line evictions during scatter.
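
The cache-line estimate can be checked with simple arithmetic. The sketch below assumes 64-byte cache lines and buffers that start on a line boundary; neither is stated in the PR, and `linesTouched` and `avgLinesPerAoSRecord` are hypothetical helper names. It counts how many lines one scattered element touches in each layout, ignoring any hardware effects such as write combining.

```typescript
const LINE = 64; // assumed cache-line size in bytes

// Number of distinct cache lines covered by a write of byteLength bytes.
function linesTouched(byteOffset: number, byteLength: number): number {
  const first = Math.floor(byteOffset / LINE);
  const last = Math.floor((byteOffset + byteLength - 1) / LINE);
  return last - first + 1;
}

// 12-byte records repeat their alignment every 192 bytes (16 records),
// so averaging over 16 consecutive records covers every case.
function avgLinesPerAoSRecord(records: number = 16): number {
  let total = 0;
  for (let i = 0; i < records; i++) total += linesTouched(i * 12, 12);
  return total / records;
}

// SoA: three aligned 4-byte writes, each into a different array.
const soaLines = 3 * linesTouched(0, 4);
// AoS: one 12-byte record, which occasionally straddles a line boundary.
const aosLines = avgLinesPerAoSRecord();
console.log(soaLines, aosLines, (soaLines / aosLines).toFixed(2));
```

Under these assumptions 2 of every 16 AoS records straddle a line boundary, giving an average of 1.125 lines per element versus 3 for SoA, a ratio of about 2.67, which is consistent with the "~3×" figure.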

Hypothesis: The 8×n random scatter writes are the dominant bottleneck at n=100k. Packing all three fields into one cache line per element should reduce cache pressure and improve throughput.

Invariants preserved: same algorithm (8-pass LSD radix), same public signature, same NaN/null handling, same sort correctness. Tests unchanged.

Related issue: #189

Run: https://github.com/githubnext/tsessebe/actions/runs/25183052353

Generated by Autoloop · ● 4.4M

Switch the radix sort ping-pong buffers from SoA (_rxA_idx, _rxA_lo, _rxA_hi,
_rxB_idx, _rxB_lo, _rxB_hi — 6 separate typed arrays) to a single AoS layout
(_rxA, _rxB — each element occupies 3 consecutive uint32 words: [origRowIdx, loKey,
hiKey]). With AoS, all three scatter writes per element target the same cache line
(12 consecutive bytes), reducing random-write cache-line pressure ~3× versus the
previous SoA layout where each write touched a separate cache line in a separate
1MB buffer.

Run: https://github.com/githubnext/tsessebe/actions/runs/25183052353

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mrjf marked this pull request as ready for review April 30, 2026 21:13
@mrjf merged commit 36a2857 into main Apr 30, 2026
12 checks passed
@mrjf deleted the autoloop/tsb-perf-evolve branch April 30, 2026 22:54