extrema could be faster #44606

Closed
LilithHafner opened this issue Mar 13, 2022 · 3 comments · Fixed by #58280
Labels
performance Must go faster

Comments

@LilithHafner (Member)

This implementation seems to be slightly faster than extrema:

```julia
function _extrema(v)
    mn = mx = first(v)
    @inbounds for i in firstindex(v)+1:lastindex(v)
        vi = v[i]
        vi < mn && (mn = vi)
        mx < vi && (mx = vi)
    end
    mn, mx
end
```

```julia
using BenchmarkTools  # provides @belapsed

@belapsed extrema(x) setup=(x=rand(Int, 10_000))
# 2.475222222222222e-6
@belapsed _extrema(x) setup=(x=rand(Int, 10_000))
# 2.0984e-6

@belapsed extrema(x) setup=(x=rand(Int, 10))
# 1.018937875751503e-8
@belapsed _extrema(x) setup=(x=rand(Int, 10))
# 7.78878878878879e-9
```

Originally posted by @LilithHafner in #44230 (comment)

@oscardssmith added the performance (Must go faster) label Mar 13, 2022
@Moelf (Contributor) commented Mar 14, 2022

@N5N3 (Member) commented Mar 14, 2022:

#31442 was mainly for the float case, though (LLVM already vectorizes the integer extrema well).
Some of the performance difference comes from the default `pairwise_blocksize(f, op) = 1024`, which could be `typemax(Int)` for minimum/maximum/extrema, since the result would be the same.
The remaining difference is puzzling. If I follow the `mapreduce_impl` style:

```julia
function __extrema(v)
    mn = mx = first(v)
    @inbounds if length(v) > 1
        mn = min(mn, v[2])
        mx = max(mx, v[2])
        for i in firstindex(v)+2:lastindex(v)
            vi = v[i]
            vi < mn && (mn = vi)
            mx < vi && (mx = vi)
        end
    end
    mn, mx
end
```

Then we have

```julia
julia> x = rand(Int, 4096);

julia> @btime _extrema($x);
  607.910 ns (0 allocations: 0 bytes)

julia> @btime __extrema($x);
  682.237 ns (0 allocations: 0 bytes)

julia> @btime extrema($x);
  724.627 ns (0 allocations: 0 bytes)

julia> Base.pairwise_blocksize(::Base.ExtremaMap, ::typeof(Base._extrema_rf)) = typemax(Int)

julia> @btime extrema($x);
  688.000 ns (0 allocations: 0 bytes) # quite close to `__extrema`!
```
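Since `min` and `max` are associative and commutative, the association order of the reduction cannot change the answer, which is why raising the block size is safe for these reductions. A minimal sketch of that equivalence (not Base's actual pairwise implementation):

```julia
# min/max reductions give the same result for any blocking of the
# input, so pairwise_blocksize only affects performance, not answers.
v = randn(10_000)
blocks = Iterators.partition(v, 1024)     # reduce each block...
mn = minimum(map(minimum, blocks))        # ...then combine the
mx = maximum(map(maximum, blocks))        # per-block results
@assert (mn, mx) == extrema(v) == (minimum(v), maximum(v))
```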

@ViralBShah (Member)

Related: #34790

@LilithHafner mentioned this issue Mar 16, 2022
charleskawczynski pushed a commit to charleskawczynski/julia that referenced this issue May 12, 2025
This method is no longer an optimization over what Julia can do with the
naive definition on most (if not all) architectures.

Like JuliaLang#58267, I asked for a smattering of crowdsourced multi-architecture
benchmarking of this simple example:

```julia
using BenchmarkTools
A = rand(10000);
b1 = @benchmark extrema($A)
b2 = @benchmark mapreduce(x->(x,x),((min1, max1), (min2, max2))->(min(min1, min2), max(max1, max2)), $A)
println("$(Sys.CPU_NAME): $(round(median(b1).time/median(b2).time, digits=1))x faster")
```

With results:

```txt
cortex-a72: 13.2x faster
cortex-a76: 15.8x faster
neoverse-n1: 16.4x faster
neoverse-v2: 23.4x faster
a64fx: 46.5x faster

apple-m1: 54.9x faster
apple-m4*: 43.7x faster

znver2: 8.6x faster
znver4: 12.8x faster
znver5: 16.7x faster

haswell (32-bit): 3.5x faster
skylake-avx512: 7.4x faster
rocketlake: 7.8x faster
alderlake: 5.2x faster
cascadelake: 8.8x faster
cascadelake: 7.1x faster
```

The results are even more dramatic for Float32s, here on my M1:

```julia
julia> A = rand(Float32, 10000);

julia> @benchmark extrema($A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  49.083 μs … 151.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     49.375 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.731 μs ±   2.350 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅██▅▁       ▁▂▂       ▁▂▁                                    ▂
  ██████▇▇▇▇█▇████▇▆▆▆▇▇███▇▇▆▆▆▅▆▅▃▄▃▄▅▄▄▆▆▅▃▁▄▃▅▄▅▄▄▁▄▄▅▃▄▁▄ █
  49.1 μs       Histogram: log(frequency) by time      56.8 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark mapreduce(x->(x,x),((min1, max1), (min2, max2))->(min(min1, min2), max(max1, max2)), $A)
BenchmarkTools.Trial: 10000 samples with 191 evaluations per sample.
 Range (min … max):  524.435 ns …  1.104 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     525.089 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   529.323 ns ± 20.876 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃      ▁ ▃▃                                                 ▁
  █████▇███▇███▇▇▇▇▇▇▇▇▅▆▆▆▆▆▅▅▄▆▃▄▄▃▅▅▄▃▅▄▄▄▅▅▅▃▅▄▄▁▄▄▅▆▄▄▅▄▅ █
  524 ns        Histogram: log(frequency) by time       609 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

Closes JuliaLang#34790, closes JuliaLang#31442, closes JuliaLang#44606.

---------

Co-authored-by: Mosè Giordano <[email protected]>