mhauru
Member

@mhauru mhauru commented Sep 29, 2025

Now that the "del" flag is gone (#1058), the only flag that is ever used is "trans". Hence, no need to bother with having the Dict{String, BitVector} for Metadata.flags, and can instead have a single BitVector for Metadata.trans. EDIT: Renamed to Metadata.is_transformed.
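
To make the change concrete, here is a minimal sketch of the two layouts. The struct names `FlagsDictMetadata` and `SingleFlagMetadata` and the helper `get_flag` are hypothetical stand-ins for illustration; the real `Metadata` type has more fields than shown here.

```julia
# Hypothetical, stripped-down stand-ins for Metadata (illustration only).

# Before: every flag lives in a Dict keyed by its name,
# with one BitVector entry per variable.
struct FlagsDictMetadata
    flags::Dict{String,BitVector}
end

# After: the single remaining flag is stored directly as a BitVector.
struct SingleFlagMetadata
    is_transformed::BitVector
end

# Reading the flag before: Dict lookup by string key, then index.
get_flag(md::FlagsDictMetadata, i::Int) = md.flags["trans"][i]
# Reading the flag after: field access, then index.
get_flag(md::SingleFlagMetadata, i::Int) = md.is_transformed[i]

n = 5
md_old = FlagsDictMetadata(Dict("trans" => falses(n)))
md_new = SingleFlagMetadata(falses(n))

# Mark the second variable as transformed in both layouts.
md_old.flags["trans"][2] = true
md_new.is_transformed[2] = true

# Both layouts carry the same information.
for i in 1:n
    @assert get_flag(md_old, i) == get_flag(md_new, i)
end
```

In the sketch, the only difference on the read path is the extra `Dict` lookup in the old layout.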

You may wonder, given that Metadata is presumably on its way out, why bother? Two reasons:

  • I tried running the benchmark suite locally with VectorVarInfo, and there were some horrendous performance regressions compared to using Metadata. Hence, we might not be switching over to VarNamedVector imminently.
  • The above experience made me wonder why there was such a performance difference, and whether the Metadata.flags field might actually be a significant cost compared to a BitVector.

My local benchmarking suggests that indeed, this makes a difference:

Before

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │           16.0 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          790.6 │            46.1 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          382.0 │            84.3 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1431.7 │            36.0 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │        10511.1 │            21.6 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1495.9 │            42.4 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │         1637.4 │             3.4 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         8635.9 │             3.2 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1266.1 │             8.5 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        90116.3 │             3.2 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │        10364.2 │             9.7 │
│               Dynamic │    10 │    mooncake │             typed │   true │          235.0 │             5.7 │
│              Submodel │     1 │    mooncake │             typed │   true │           24.0 │             4.2 │
│                   LDA │    12 │ reversediff │             typed │   true │         1391.7 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

After

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │           10.8 │             2.5 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          695.1 │            53.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          319.1 │           104.9 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1114.3 │            45.0 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │        10323.5 │            22.3 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1190.0 │            52.4 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │         1263.0 │             3.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         5606.7 │             4.4 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1236.0 │             8.7 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        63260.7 │             4.2 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │        11029.4 │             9.4 │
│               Dynamic │    10 │    mooncake │             typed │   true │          216.4 │             6.4 │
│              Submodel │     1 │    mooncake │             typed │   true │           19.0 │             4.6 │
│                   LDA │    12 │ reversediff │             typed │   true │         1341.4 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

Curious to see whether GHA benchmarks come out looking similar.

Contributor

github-actions bot commented Sep 29, 2025

Benchmark Report for Commit a011dd6

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            7.4 │             1.6 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          598.5 │            49.3 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          423.2 │            57.6 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1063.4 │            32.1 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6740.1 │            29.0 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          914.1 │            46.1 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          875.0 │             5.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         4455.6 │             5.6 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1020.9 │             9.3 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        51734.7 │             4.9 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8677.5 │            10.3 │
│               Dynamic │    10 │    mooncake │             typed │   true │          132.2 │            10.9 │
│              Submodel │     1 │    mooncake │             typed │   true │           10.4 │             5.6 │
│                   LDA │    12 │ reversediff │             typed │   true │          992.7 │             2.1 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘


codecov bot commented Sep 29, 2025

Codecov Report

❌ Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.51%. Comparing base (08212a2) to head (4f85f2b).

Files with missing lines          Patch %   Lines
src/simple_varinfo.jl             80.00%    3 Missing ⚠️
src/varinfo.jl                    95.23%    2 Missing ⚠️
ext/DynamicPPLEnzymeCoreExt.jl    0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking    #1060      +/-   ##
============================================
+ Coverage     82.39%   82.51%   +0.11%     
============================================
  Files            42       42              
  Lines          3818     3786      -32     
============================================
- Hits           3146     3124      -22     
+ Misses          672      662      -10     

Contributor

DynamicPPL.jl documentation for PR #1060 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1060/

@mhauru
Member Author

mhauru commented Sep 30, 2025

CI benchmarks. Target branch:

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            8.5 │             1.6 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          635.2 │            43.6 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          411.8 │            52.7 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1163.6 │            29.7 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6444.2 │            28.6 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1022.9 │            40.9 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          980.1 │             4.5 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         5750.3 │             4.3 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          964.6 │             9.1 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        64679.1 │             3.9 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8179.8 │            10.3 │
│               Dynamic │    10 │    mooncake │             typed │   true │          129.7 │            11.3 │
│              Submodel │     1 │    mooncake │             typed │   true │           12.2 │             5.1 │
│                   LDA │    12 │ reversediff │             typed │   true │         1006.2 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

This branch:

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            7.4 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          597.3 │            49.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          422.1 │            57.4 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │          969.2 │            35.2 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6575.6 │            31.0 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          883.4 │            47.6 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          854.6 │             5.1 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         4305.0 │             5.6 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          991.4 │             9.5 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        50138.4 │             5.1 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         9003.3 │            10.1 │
│               Dynamic │    10 │    mooncake │             typed │   true │          128.2 │            11.4 │
│              Submodel │     1 │    mooncake │             typed │   true │            9.9 │             5.9 │
│                   LDA │    12 │ reversediff │             typed │   true │          989.8 │             2.1 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

Roughly in line with what I saw locally. Seems worth it to me, especially if you look at the Loop univariate 1k and 10k models.
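
As a rough check of that claim, the relative change for the two loop models can be computed from the t(eval)/t(ref) columns of the two CI tables above. This is a small sketch; the helper name `reduction` is made up for illustration, and since each number comes from a single CI run the result should be read as indicative only.

```julia
# t(eval)/t(ref) copied from the CI tables above: (model, target branch, this branch).
results = [
    ("Loop univariate 1k", 5750.3, 4305.0),
    ("Loop univariate 10k", 64679.1, 50138.4),
]

# Fractional reduction in relative evaluation time going from target to this branch.
reduction(target, branch) = 1 - branch / target

for (model, target, branch) in results
    pct = round(100 * reduction(target, branch); digits=1)
    println(model, ": ", pct, "% lower t(eval)/t(ref)")
end
```

That works out to roughly a quarter less relative evaluation time for the 1k model and a little over a fifth for the 10k model.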

@yebai
Member

yebai commented Sep 30, 2025

I suggest we take this chance to rename Metadata.trans to a more readable term, e.g., Metadata.is_unconstrained / Metadata.is_transformed.

@mhauru
Member Author

mhauru commented Sep 30, 2025

Good idea, done.

@mhauru mhauru requested a review from penelopeysm September 30, 2025 16:19
Comment on lines +325 to +328
# TODO(mhauru) Eventually I would like to rename the is_transformed function to
# is_unconstrained, but that's significantly breaking.
"""
- istrans(vnv::VarNamedVector, vn::VarName)
+ is_transformed(vnv::VarNamedVector, vn::VarName)
Member

Are you still thinking of this? I personally prefer islinked over istransformed, but isunconstrained / isconstrained I don't like, because it doesn't accurately capture the full story.

For example, unlinked variables can still be unconstrained. So is_unconstrained doesn't mean it's unconstrained, it means it's 'guaranteed' to be unconstrained. Also, I suppose linking need not necessarily unconstrain it, it depends on the link function.

But I realise this comment might be a bit out of date

Member

For this PR, I wonder if it is worth standardising. We have islinked(::VarInfo) but istransformed(::VarInfo, ::VarName). Should we change one to the other?

Member Author

Good point. I kinda punted on the is_unconstrained thing in VarNamedVector because it's invisible to users, but islinked is a good point. Now would be as good a time as any to standardise.

With VarNamedVector, I went with is_unconstrained exactly because having a non-trivial transformation does not guarantee that the variable doesn't remain constrained, and because the flag exists to guarantee unconstrainedness (of user interest) not that some transformation has been applied (not of user interest). The docstring for VarNamedVector says this:

    vector of booleans indicating whether a variable has been explicitly transformed to
    unconstrained Euclidean space, i.e. whether its domain is all of `ℝ^ⁿ`. If
    `is_unconstrained[varname_to_index[vn]]` is true, it guarantees that the variable
    `vn` is not constrained. However, the converse does not hold: if `is_unconstrained`
    is false, the variable `vn` may still happen to be unconstrained, e.g. if its
    original distribution is itself unconstrained (like a normal distribution).

I was quite pleased with that when I was writing that part of VarNamedVector, but then when I tried to use the same terminology in VarInfo yesterday I wasn't happy with it anymore. Unfortunately I can't now recall why I was unhappy with it... It seems fine to me when I think about it now.

islinked (or is_linked) feels a lot like is_transformed: It says that some link transformation has been applied, not that it's achieved the goal of making this variable unconstrained. Although maybe I misunderstand how people use the term "link" here.

At least right now, I think is_unconstrained is the best description of the flag, but especially if you dislike it, I would go with is_linked, just to match the link and unlink function names, which I wouldn't want to change (and calling them unconstrain and ununconstrain or constrain doesn't work).
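
The one-way guarantee described in the docstring can be illustrated with a toy model. Everything here is hypothetical and written only for this sketch (`ToyVariable`, `domain_is_real_line`); it is not DynamicPPL code, and it assumes for simplicity that setting the flag always means the variable was mapped to all of ℝ.

```julia
# Hypothetical toy model of the flag semantics (not DynamicPPL code).
struct ToyVariable
    support::Tuple{Float64,Float64}  # support of the original distribution
    is_unconstrained::Bool           # flag: explicitly mapped to all of ℝ
end

# Does the stored value range over all of ℝ?
# Either the flag guarantees it, or the original support already was ℝ.
domain_is_real_line(v::ToyVariable) =
    v.is_unconstrained || v.support == (-Inf, Inf)

normal_unlinked   = ToyVariable((-Inf, Inf), false)  # e.g. a normal variable, never linked
positive_unlinked = ToyVariable((0.0, Inf), false)   # positive-only variable, not linked
positive_linked   = ToyVariable((0.0, Inf), true)    # same variable after e.g. a log transform

# Flag true: the variable is guaranteed to be unconstrained.
@assert domain_is_real_line(positive_linked)
# Flag false: the variable may still be unconstrained...
@assert domain_is_real_line(normal_unlinked)
# ...or it may be constrained.
@assert !domain_is_real_line(positive_unlinked)
```

The second assertion is the case that makes a name like `is_unconstrained` read oddly: the flag is false even though the variable happens to be unconstrained.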

Comment on lines -494 to +496
- islinked(vi::SimpleVarInfo) = istrans(vi)
+ islinked(vi::SimpleVarInfo) = is_transformed(vi)
Member

@penelopeysm penelopeysm Sep 30, 2025

Like, this line is just the same function but duplicated. So it feels to me like we could just pick one and roll with it!
