Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add precompile signatures to Markdown to reduce latency. #55715

Merged
merged 7 commits into from
Sep 9, 2024

Conversation

PallHaraldsson
Copy link
Contributor

@PallHaraldsson PallHaraldsson commented Sep 8, 2024

Fixes #55706 that is seemingly a 4472x regression, not just 16x (was my first guess, based on CondaPkg, also fixes or greatly mitigates JuliaPy/CondaPkg.jl#145), and large part of 3x regression for PythonCall.

@IanButterworth
Copy link
Sponsor Member

Does it need to be wrapped like here:

Yes, in case people are using Markdown without compiled modules.

@PallHaraldsson PallHaraldsson marked this pull request as draft September 8, 2024 18:10
@IanButterworth IanButterworth added the backport 1.11 Change should be backported to release-1.11 label Sep 8, 2024
@IanButterworth
Copy link
Sponsor Member

It's generally less brittle and cross-platform safe to add a precompile workload, as long as its self contained. Seems possible for this?

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 8, 2024

I'll fix that trivial thing. Last time I checked Markdown didn't work, as is on 1.12-DEV so I kind of expect CI errors... But those will be unrelated, so should I make a PR to some (1.11) branch?

@KristofferC
Copy link
Sponsor Member

Just change it to a workload and it can be backported.

@KristofferC KristofferC changed the title Fix massive Markdown first use latency regression Add precompile signatures to Markdown to reduce latency. Sep 8, 2024
@PallHaraldsson
Copy link
Contributor Author

You mean like Markdown.mdexpr(" ... ") workload? Or even with an empty string? Either way, I'm not sure where, just in same place?! Or rather the macro version: md" ... "? I was timing the former, since the timing for the latter was bogus for some reason... It's functions/methods that get precompiled, but not macros? It's unclear to me if the latter is slower, and macros should sometimes, then, be avoided...

@PallHaraldsson
Copy link
Contributor Author

Don't try to run this until I (change and) undraft it. It seems to have opened a can of worms, see here: JuliaLang/Pkg.jl#4017

@PallHaraldsson PallHaraldsson marked this pull request as ready for review September 8, 2024 19:30
@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 8, 2024

I changed and undrafted, but I'm not sure if this is the right place for the workload. Also before rc3 may have been broken, because of my older change, not sure about this one. Unchanged RC2 is at least known ok, maybe the bug I had has always been there in RC3.

The speed regression is from 1.10, but actually also from rc1, even rc2 if I recall.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 8, 2024

I think this already works, but can be improved. The workload helps to start Markdown, but is seemingly not realistic enough, since this is improved but not as much as I would like, likely because I see Markdown there:

$ julia +1.11 --trace-compile=stderr -e "@time using CondaPkg"
precompile(Tuple{typeof(Base.setindex!), Base.EnvDict, Bool, String})
precompile(Tuple{typeof(Base.Threads.lock_profiling), Bool})
precompile(Tuple{typeof(Base.getindex), Base.Threads.Atomic{Int64}})
precompile(Tuple{typeof(Base.gc_num)})
precompile(Tuple{typeof(Base.cumulative_compile_timing), Bool})
precompile(Tuple{typeof(Base.cumulative_compile_time_ns)})
precompile(Tuple{typeof(micromamba_jll.find_artifact_dir)})
precompile(Tuple{typeof(Base.invokelatest), Any})
precompile(Tuple{typeof(JLLWrappers.get_julia_libpaths)})
precompile(Tuple{typeof(Base.getproperty), Markdown.MD, Symbol})
precompile(Tuple{typeof(Markdown.hashheader), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{Type{NamedTuple{(:eat,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{typeof(Markdown.list), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{Type{NamedTuple{(:padding,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{typeof(Base.signbit), Int64})
precompile(Tuple{typeof(Markdown.fencedcode), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.blockquote), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{Type{NamedTuple{(:allow_whitespace, :allowempty, :eat), T} where T<:Tuple}, Tuple{Bool, Bool, Bool}})
precompile(Tuple{typeof(Markdown.admonition), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{Type{NamedTuple{(:rep,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{typeof(Markdown.blocktex), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.blockinterp), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.indentcode), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.footnote), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.github_table), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.horizontalrule), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.setextheader), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Base.:(==)), Char, Char})
precompile(Tuple{Type{NamedTuple{(:breaking,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{typeof(Markdown.paragraph), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.inline_code), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Markdown.en_dash), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, Markdown.MD})
precompile(Tuple{typeof(Base.copy!), Array{Any, 1}, Array{Any, 1}})
precompile(Tuple{typeof(Base.first), Array{Any, 1}})
precompile(Tuple{typeof(Base.Broadcast.broadcasted), typeof(Base.:(-)), Tuple{UInt64, UInt64}, Tuple{UInt64, UInt64}})
precompile(Tuple{typeof(Base.Broadcast.materialize), Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(Base.:(-)), Tuple{Tuple{UInt64, UInt64}, Tuple{UInt64, UInt64}}}})
precompile(Tuple{Type{Base.GC_Diff}, Base.GC_Num, Base.GC_Num})
precompile(Tuple{typeof(Base.getproperty), Base.GC_Diff, Symbol})
precompile(Tuple{Type{NamedTuple{(:value, :time, :bytes, :gctime, :gcstats, :lock_conflicts, :compile_time, :recompile_time), T} where T<:Tuple}, Tuple{Nothing, Float64, Int64, Float64, Base.GC_Diff, Int64, Float64, Float64}})
precompile(Tuple{typeof(Base.getproperty), NamedTuple{(:value, :time, :bytes, :gctime, :gcstats, :lock_conflicts, :compile_time, :recompile_time), Tuple{Nothing, Float64, Int64, Float64, Base.GC_Diff, Int64, Float64, Float64}}, Symbol})
precompile(Tuple{typeof(Base.:(*)), Float64, Float64})
precompile(Tuple{typeof(Base.gc_alloc_count), Base.GC_Diff})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:msg,), Tuple{Nothing}}, typeof(Base.time_print), Base.TTY, Float64, Int64, Int64, Int64, Int64, Float64, Float64, Bool})
precompile(Tuple{Base.var"#1109#1110"{Nothing, Float64, Int64, Int64, Int64, Float64, Float64, Bool, String}, Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}})
precompile(Tuple{typeof(Base.prettyprint_getunits), Int64, Int64, Int64})
precompile(Tuple{typeof(Base.Ryu.writefixed), Float64, Int64})
  3.523722 seconds (1.89 M allocations: 105.742 MiB, 4.43% gc time, 77.01% compilation time)

Pkg also has a 1543x regression:

@time using Pkg
  0.965739 seconds (549.50 k allocations: 34.620 MiB, 9.79% gc time, 0.65% compilation time)

and likely some of these will help (I don't know how I would get the same effect with a "workload"):

precompile(Tuple{typeof(Base.first), Array{Any, 1}})
precompile(Tuple{typeof(Base.setindex!), Base.EnvDict, Bool, String})

maybe even just the first one, or maybe some of these too:

precompile(Tuple{typeof(Base.Threads.atomic_add!), Base.Threads.Atomic{Int64}, Int64})
precompile(Tuple{typeof(Base.print), Base.TTY, String})
precompile(Tuple{typeof(Base.println), Base.TTY})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :italic, :underline, :blink, :reverse, :hidden, :color), Tuple{Bool, Bool, Bool, Bool, Bool, Bool, Symbol}}, typeof(Base.printstyled), Base.TTY, String})
precompile(Tuple{Base.var"##printstyled#1118", Bool, Bool, Bool, Bool, Bool, Bool, Symbol, typeof(Base.printstyled), String, Vararg{String}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :italic, :underline, :blink, :reverse, :hidden, :color), Tuple{Bool, Bool, Bool, Bool, Bool, Bool, Symbol}}, typeof(Base.printstyled), Base.TTY, String, Vararg{String}})
precompile(Tuple{Base.var"##printstyled#1117", Bool, Bool, Bool, Bool, Bool, Bool, Symbol, typeof(Base.printstyled), Base.TTY, String, Vararg{String}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:bold, :italic, :underline, :blink, :reverse, :hidden), NTuple{6, Bool}}, typeof(Base.with_output_color), Function, Symbol, Base.TTY, String, Vararg{String}})
precompile(Tuple{Base.var"##with_output_color#1116", Bool, Bool, Bool, Bool, Bool, Bool, typeof(Base.with_output_color), Function, Symbol, Base.TTY, String, Vararg{String}})

precompile(Tuple{typeof(Base.first), Array{Any, 1}})

precompile(Tuple{typeof(Base.Threads.atomic_sub!), Base.Threads.Atomic{Int64}, Int64})

Another 46x regression:

@time using NetworkOptions
  0.021625 seconds (6.27 k allocations: 436.031 KiB)

may need some or all of these:

precompile(Tuple{typeof(Base.setindex!), Base.EnvDict, Bool, String})
precompile(Tuple{typeof(Base.Threads.lock_profiling), Bool})
precompile(Tuple{typeof(Base.getindex), Base.Threads.Atomic{Int64}})
precompile(Tuple{typeof(Base.gc_num)})
precompile(Tuple{typeof(Base.cumulative_compile_timing), Bool})
precompile(Tuple{typeof(Base.cumulative_compile_time_ns)})
precompile(Tuple{typeof(Base.Broadcast.broadcasted), typeof(Base.:(-)), Tuple{UInt64, UInt64}, Tuple{UInt64, UInt64}})
precompile(Tuple{typeof(Base.Broadcast.materialize), Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(Base.:(-)), Tuple{Tuple{UInt64, UInt64}, Tuple{UInt64, UInt64}}}})
precompile(Tuple{Type{Base.GC_Diff}, Base.GC_Num, Base.GC_Num})
precompile(Tuple{typeof(Base.getproperty), Base.GC_Diff, Symbol})
precompile(Tuple{Type{NamedTuple{(:value, :time, :bytes, :gctime, :gcstats, :lock_conflicts, :compile_time, :recompile_time), T} where T<:Tuple}, Tuple{Nothing, Float64, Int64, Float64, Base.GC_Diff, Int64, Float64, Float64}})
precompile(Tuple{typeof(Base.getproperty), NamedTuple{(:value, :time, :bytes, :gctime, :gcstats, :lock_conflicts, :compile_time, :recompile_time), Tuple{Nothing, Float64, Int64, Float64, Base.GC_Diff, Int64, Float64, Float64}}, Symbol})
precompile(Tuple{typeof(Base.:(*)), Float64, Float64})
precompile(Tuple{typeof(Base.gc_alloc_count), Base.GC_Diff})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:msg,), Tuple{Nothing}}, typeof(Base.time_print), Base.TTY, Float64, Int64, Int64, Int64, Int64, Float64, Float64, Bool})
precompile(Tuple{Base.var"#1109#1110"{Nothing, Float64, Int64, Int64, Int64, Float64, Float64, Bool, String}, Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}})
precompile(Tuple{typeof(Base.prettyprint_getunits), Int64, Int64, Int64})
precompile(Tuple{typeof(Base.Ryu.writefixed), Float64, Int64})
precompile(Tuple{typeof(Base.print), Base.GenericIOBuffer{GenericMemory{:not_atomic, UInt8, Core.AddrSpace{Core}(0x00)}}, String, String, Vararg{Any}})
julia> @time precompile(Tuple{typeof(Base.in), String, Array{String, 1}})
  0.013533 seconds (331 allocations: 14.688 KiB, 99.87% compilation time)

@KristofferC
Copy link
Sponsor Member

Pkg also has a 1543x regression:

Pkg is not in the sysimage in 1.11. Can we please stay focused on one thing here.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Sep 9, 2024

This is ready to review/merge, feel free to change the text, add more representative Markdown markup (and drop mentioning Pkg, see Kristoffer's (not outdated) review.

This passed CI with md"" so most likely will with whatever text in there, before my most recent additional (trivial?) commit(s).

@PallHaraldsson
Copy link
Contributor Author

Pkg is not in the sysimage in 1.11. Can we please stay focused on one thing here.

Ok that part is here: #55721 (comment) though it's also seemingly useful for Markdown as an add-on for the PR, so please merge here first, if not combining merging as one.

@IanButterworth IanButterworth added the status:merge me PR is reviewed. Merge when all tests are passing label Sep 9, 2024
@IanButterworth IanButterworth merged commit 1463c99 into JuliaLang:master Sep 9, 2024
5 of 7 checks passed
@PallHaraldsson PallHaraldsson deleted the patch-20 branch September 9, 2024 21:31
@PallHaraldsson PallHaraldsson restored the patch-20 branch September 10, 2024 11:40
@PallHaraldsson
Copy link
Contributor Author

I can no longer amend by adding one commit, despite restoring the branch I previously deleted. Not sure if that was the problem or if it can't be done after merging. I can also just open another PR for the trivial fix of changing to printing also (do devnull.. ?!), since that was also a regression I overlooked (see the closed issue, that probably should be reopened).

@PallHaraldsson PallHaraldsson deleted the patch-20 branch September 10, 2024 11:44
@giordano giordano removed the status:merge me PR is reviewed. Merge when all tests are passing label Sep 11, 2024
KristofferC pushed a commit that referenced this pull request Sep 11, 2024
Fixes #55706 that is seemingly a 4472x regression, not just 16x (was my
first guess, based on CondaPkg, also fixes or greatly mitigates
JuliaPy/CondaPkg.jl#145), and large part of 3x
regression for PythonCall.

---------

Co-authored-by: Kristoffer Carlsson <[email protected]>
(cherry picked from commit 1463c99)
@KristofferC KristofferC mentioned this pull request Sep 11, 2024
33 tasks
kshyatt pushed a commit that referenced this pull request Sep 12, 2024
Fixes #55706 that is seemingly a 4472x regression, not just 16x (was my
first guess, based on CondaPkg, also fixes or greatly mitigates
JuliaPy/CondaPkg.jl#145), and large part of 3x
regression for PythonCall.

---------

Co-authored-by: Kristoffer Carlsson <[email protected]>
KristofferC added a commit that referenced this pull request Sep 17, 2024
Backported PRs:
- [x] #55480 <!-- Fix push! for OffsetVectors, add tests for push! and
append! on AbstractVector -->
- [x] #55443 <!-- Add test for upper/lower/titlecase and fix call -->
- [x] #55524 <!-- Set `.jl` sources as read-only during installation -->
- [x] #55500 <!-- make jl_thread_suspend_and_get_state safe -->
- [x] #55506 <!-- Fix indexing in _mapreducedim for OffsetArrays -->
- [x] #55564 <!-- Empty out loaded_precompiles dict instead of asserting
it's empty. -->
- [x] #55567 <!-- Initialize threadpools correctly during sysimg build
-->
- [x] #55596 <!-- Fast bounds-check for CartesianIndex ranges -->
- [x] #55605 <!-- Reroute Symmetric/Hermitian + Diagonal through
triangular -->
- [x] #55640 <!-- win: move stack_overflow_warning to the backtrace
fiber -->
- [x] #55715 <!-- Add precompile signatures to Markdown to reduce
latency. -->
- [x] #55593 <!-- Fix invalidations for FileIO -->
- [x] #55555 <!-- Revert "Don't expose guard pages to malloc_stack API
consumers" -->
- [x] #55720 <!-- Fix `pkgdir` for extensions -->
- [x] #55729 <!-- Avoid confounding compilation side effects of
`@time_imports` -->
- [x] #55718 <!-- Fix `@time_imports` extension recognition -->
- [x] #55522 <!-- Fix tr for Symmetric/Hermitian block matrices -->

Contains multiple commits, manual intervention needed:
- [ ] #55509 <!-- Fix cong implementation to be properly random and not
just cycling. -->

Non-merged PRs with backport label:
- [ ] #55641 <!-- fall back to slower stat filesize if optimized
filesize fails -->
- [ ] #55534 <!-- Set stdlib sources as read-only during installation
-->
- [ ] #55499 <!-- propagate the terminal's `displaysize` to the
`IOContext` used by the REPL -->
- [ ] #55458 <!-- Allow for generically extracting unannotated string
-->
- [ ] #55457 <!-- Make AnnotateChar equality consider annotations -->
- [ ] #55453 <!-- Privatise the annotations API, for StyledStrings -->
- [ ] #55355 <!-- relocation: account for trailing path separator in
depot paths -->
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
- [ ] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Markdown regression on 1.11, probably by 16x (3x for PythonCall)
4 participants