@michaelciraci michaelciraci commented Oct 11, 2024

Right now, the sum for Complex floats does not auto-vectorize. This PR uses an intermediate data type during the summation so that the compiler can vectorize it. There is no unsafe code, and on my computer I get roughly a 3.5x speed improvement for f32 (about 20.2 µs down to 5.8 µs):

sum_simd                time:   [5.6591 µs 5.7808 µs 5.9707 µs]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

sum_scalar              time:   [19.787 µs 20.187 µs 20.938 µs]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe

I made a repo if you want to test the results yourself: https://github.com/michaelciraci/num-complex-simd-comparison
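To illustrate the general idea of summing through an intermediate accumulator, here is a minimal sketch. It is not the code in this PR: `Complex32` is a hypothetical stand-in for the crate's complex type, and `LANES`, `sum_scalar`, and `sum_chunked` are names made up for this example. The chunked version keeps several independent partial sums, which is the kind of shape a compiler can often map onto SIMD registers, though whether it actually does depends on the target and optimization level.

```rust
// Hypothetical stand-in for a complex f32 type; not the crate's own definition.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

// Plain in-order sum: each addition depends on the previous one.
pub fn sum_scalar(xs: &[Complex32]) -> Complex32 {
    let mut acc = Complex32 { re: 0.0, im: 0.0 };
    for x in xs {
        acc.re += x.re;
        acc.im += x.im;
    }
    acc
}

// Number of independent partial sums; an arbitrary choice for this sketch.
const LANES: usize = 4;

// Sum through an intermediate array of partial sums, then reduce them.
// Note that this adds the inputs in a different order than `sum_scalar`.
pub fn sum_chunked(xs: &[Complex32]) -> Complex32 {
    let mut re = [0.0f32; LANES];
    let mut im = [0.0f32; LANES];

    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder();

    for chunk in chunks {
        for i in 0..LANES {
            re[i] += chunk[i].re;
            im[i] += chunk[i].im;
        }
    }

    // Collapse the partial sums into a single value.
    let mut acc = Complex32 {
        re: re.iter().sum::<f32>(),
        im: im.iter().sum::<f32>(),
    };

    // Add any elements that did not fill a whole chunk.
    for x in tail {
        acc.re += x.re;
        acc.im += x.im;
    }
    acc
}
```

For inputs whose sums are exactly representable (for example small integers), both functions return the same value. For general inputs the results can differ in the last bits because the additions happen in a different order.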

However, this would technically be a breaking change, because it changes the order in which the floats are summed and floating-point addition is not associative (float1 + float2 + float3 may not equal float3 + float2 + float1).
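A small, self-contained demonstration of the ordering effect, using plain f32 values rather than complex numbers. The function names `sum_forward` and `sum_reverse` and the input values are chosen only for this illustration.

```rust
// Sum left to right: ((0 + x0) + x1) + x2 ...
pub fn sum_forward(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

// Sum right to left: ((0 + x2) + x1) + x0 ...
pub fn sum_reverse(xs: &[f32]) -> f32 {
    xs.iter().rev().fold(0.0, |acc, &x| acc + x)
}
```

With the input `[1.0, 1e8, -1e8]`, `sum_forward` returns `0.0` because `1.0 + 1e8` rounds back to `1e8` in f32, while `sum_reverse` returns `1.0` because the two large values cancel first.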

On the other hand, this might be an opportunity to add a SIMD feature for floats.

I have held off on implementing a SIMD product until I know which route you want to go down (if you are interested at all).
