Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove (duplicate) samplers being defined explicitly in Turing.jl #2413

Open
1 of 3 tasks
Tracked by #2420
torfjelde opened this issue Dec 3, 2024 · 5 comments
Open
1 of 3 tasks
Tracked by #2420

Remove (duplicate) samplers being defined explicitly in Turing.jl #2413

torfjelde opened this issue Dec 3, 2024 · 5 comments
Labels
Milestone

Comments

@torfjelde
Copy link
Member

torfjelde commented Dec 3, 2024

We're duplicating a lot of code and a lot of effort by having a bunch of sampler (or rather, InferenceAlgorithm) implementations in Turing.jl itself.

There are a few reasons for this is / was the case:

  1. The old approach of doing Gibbs sampling took an approach that required hooking into the assume and observe statements for samplers and to mutate the varinfo in a particular, even if the functionality of the sampler itself (when used outside of Gibbs) didn't require it.
  2. The samplers in Turing.jl would often offer more convenient constructors while the sampler packages themselves, e.g. AdvancedHMC.jl, would offer a more flexible but also more complicated interfaces.
  3. InferenceAlgorithm allows us to overload the sample call explicitly to do some "non-standard" things, e.g. use chain_type=MCMCChains.Chains as the default, instead of chain_type=Vector as is default in AbstractMCMC.jl.

Everything but (3) is "easily" addressable (i.e. only requires dev-time, not necessarily any discussion on how to do it):

Removing the InferenceAlgorithm type (3)

Problem

Currently, all the samplers in Turing.jl have most of their code living outside of Turing.jl + inside Turing.jl we define a "duplicate" which is not an AbstractMCMC.AbstractSampler (as typically expected by AbstractMCMC.sample), but instead a subtype of Turing.Infernece.InferenceAlgorithm:

abstract type InferenceAlgorithm end
abstract type ParticleInference <: InferenceAlgorithm end
abstract type Hamiltonian <: InferenceAlgorithm end
abstract type StaticHamiltonian <: Hamiltonian end
abstract type AdaptiveHamiltonian <: Hamiltonian end

But exactly because these are not AbstractMCMC.AbstractSampler, we can overload sample calls to do more than what sample does for a given AbstractSampler.

One of the things we do is to make chain_type=Chains rather than chain_type=Vector (as is the default in AbstractMCMC.jl):

function AbstractMCMC.sample(
rng::AbstractRNG,
model::AbstractModel,
sampler::Sampler{<:InferenceAlgorithm},
ensemble::AbstractMCMC.AbstractMCMCEnsemble,
N::Integer,
n_chains::Integer;
chain_type=MCMCChains.Chains,
progress=PROGRESS[],
kwargs...,
)
return AbstractMCMC.mcmcsample(
rng,
model,
sampler,
ensemble,
N,
n_chains;
chain_type=chain_type,
progress=progress,
kwargs...,
)
end

Another is to perform some simple model checks to stop the user from doing things they shouldn't, e.g. accidentally using a model twice (this is done using DynamicPPL.check_model):

function AbstractMCMC.sample(
rng::AbstractRNG,
model::AbstractModel,
alg::InferenceAlgorithm,
N::Integer;
check_model::Bool=true,
kwargs...,
)
check_model && _check_model(model, alg)
return AbstractMCMC.sample(rng, model, Sampler(alg, model), N; kwargs...)
end

However, as mentioned before, having to repeat all these sampler constructors just to go from working with a AbstractSampler to InferenceAlgorithm so we can do these things is a) very annoying to maintain, and b) makes it all very confusing for newcomers to contribute.

Now, the problem is that cannot simple start overloading sample(model::DynamicPPL.Model, sampler::AbstractMCMC.AbstractSampler, ...) calls since sampler packages might define something like sample(model::AbstractMCMC.AbstractModel, sampler::MySampler, ...) (we have DynamicPPL.Model <: AbstractMCMC.AbstractModel btw) which would give rise to a host of method ambiguities.

Someone might say "oh, but nobody is going to impelment sample(model::AbstractMCMC.AbstractModel, sampler::MySampler, ...); they're always going to implement a sampler for a specific model type, e.g. AbstractMCMC.LogDensityModel", but this is not great for two reasons: a) "meta" samplers, i.e. samplers that use other samplers as components, might want to be agnostic to what the underlying model is as this "meta" sampler doesn't interact directly with the model itself, and b) if we do so, we're claiming that DynamicPPL.Model is, in some way, a special and more important model type than all other subtypes of AbstractModel, which is the exact opposite of what we wanted to do with AbstractMCMC.jl (we wanted it to be a "sampler package for all, not just Turing.jl").

externalsampler introduced in #2008 is a step towards this, but in the end we don't want to require externalsampler to wrap every sampler passed to Turing.jl; we really only want this to have to wrap samplers which do not support all the additional niceties that Turing.jl's current sample provides.

Solution 1: rename or duplicate sample

The only true solution I see, which is very, very annoying, is to either

  1. Not export AbstractMCMC.sample from Turing.jl, and instead define and export a separate Turing.sample which is a fancy wrapper around AbstractMCMC.sample.
  2. Define a new entry-point for sample from Turing.jl with a different name, e.g. infer or mcmc (or even use the internal mcmcsample from AbstractMCMC.jl naming but making it public).

None of these are ideal tbh.

(1) sucks because so many of the packages are using StatsBase.sample (as we are in AbstractMCMC.jl) for this very reasonable interface, and so diverging from this is confusing + we'll easily end up with naming collisions in the namespace of the user, e.g. using Turing, AbstractMCMC would immediately cause two sample methods to be imported.

(2) is also a bit annoying as this would be a highly breaking change. It's also a bit annoying because, well, sample is a much better name 🤷

IMHO, I think (2) is best here though. If we define a method called mcmc or mcmcsample (ideally we'd do something with AbstractMCMC.mcmcsample) which is exported from Turing.jl, we could do away with all of InferenceAlgorithm and its implementations in favour of a single (or a few) overloads of this method.

@torfjelde torfjelde added this to the Turing v1.0.0 milestone Dec 3, 2024
@torfjelde torfjelde changed the title Remove (most) samplers being defined explicitly in Turing.jl Remove (duplicate) samplers being defined explicitly in Turing.jl Dec 3, 2024
@penelopeysm
Copy link
Member

penelopeysm commented Dec 3, 2024

Would it be possible to:

  1. Create a new abstract type AbstractMCMC.TuringManagedSampler
  2. In the sampler packages, write struct HMC <: AbstractMCMC.TuringManagedSampler
  3. In Turing, we can then overload sample(::DynamicPPL.Model, ::AbstractMCMC.TuringManagedSampler), which calls check_model etc. followed by AbstractMCMC.mcmcsample
  4. It's then our responsibility to make sure we don't define sample(::AbstractMCMC.AbstractModel, ::HMC) anywhere as that will lead to method ambiguities

If someone then defines TheirSampler <: AbstractMCMC.AbstractSampler, they won't run into method ambiguities unless TheirSampler also subtypes AbstractMCMC.TuringManagedSampler (and we should make it abundantly clear that this shouldn't be done).

One point of awkwardness might be that if they then want to get the nice Turing bells and whistles, they have to declare sample(::DynamicPPL.Model, ::TheirSampler) themselves, effectively duplicating our definition. That's probably an acceptable cost as long as the bells and whistles aren't too much (like a call to check_model should be a reasonable thing to expect someone implementing a sampler to copy themselves).

(I should say that before posting this comment I had around 3 different ideas, each of which I started to write down before immediately realising that they were terrible. I haven't yet found a fatal flaw in this one, so I'm optimistic 😄 but it might just be the 4th terrible idea)

@torfjelde
Copy link
Member Author

That is not a bad idea for sure, but my immediate worries are: a) where does this Turing-managed sampler go, and b) how would people hook into this functionality in, say, a Turing.jl extension?

Issue (b) seems like an annoying one that is difficult to circumvent when we do subtyping (as is the issue with InferenceAlgorithm).

@penelopeysm
Copy link
Member

penelopeysm commented Dec 4, 2024

(a) I think it would have to live in AbstractMCMC. Sure, it's not really AbstractMCMC's business but I don't really think it's unfair to declare a single abstract supertype for internal use in another package we do control.

(b) Yeah I think that's the same as what I wrote. You can't automatically derive the functionality of Turing's sample so if they wrote a TheirSamplerDynamicPPLExt they would have to manually define sample(::DynamicPPL.Model, ::TheirSampler) in there.
I personally don't consider this a huge problem. We might know very clearly on our end what things we want to do with DPPL models. For example, we want to set chain_type to MCMCChains.Chains. However, we can't assume that everybody else also wants to do the same thing when sampling a DPPL model. Maybe they don't want MCMCChains as a dependency? Maybe they just want a vector of parameters? Maybe they want varinfos? If they want to do the same things as us, then the onus is on them to just copy our implementation.

@penelopeysm
Copy link
Member

penelopeysm commented Dec 4, 2024

Btw - usually I'm all in favour of renaming and disambiguating functions. I like breaking changes. :D

But I think that renaming sample should really be retained as a method of last resort because it will break literally every code example out there, and it's not even a breaking change that improves users' QOL by much - it's almost purely for developer QOL. We'll be explaining for months / years to come that "you should use mcmc() not sample()". And it's a really bad look because changing such a fundamental thing (for what seems like no reason) will give the impression to users that the package is not stable or mature.

If that's what we end up deciding on, I'm happy to go with it (renaming will indeed solve the problem once and for all) but (imo :)) we must really carefully evaluate and rule out every other possibility first.

@torfjelde
Copy link
Member Author

torfjelde commented Dec 4, 2024

(a) I think it would have to live in AbstractMCMC. Sure, it's not really AbstractMCMC's business but I don't really think it's unfair to declare a single abstract supertype for internal use in another package we do control.

This one still feels a bit weird to me unfortunately 😕

What about when someone else comes along and goes "Can we make our custom model in AbstractMCMC.jl? We think it would benefit us you see."

Maaaaybe it would make sense to put it in AbstractPPL.jl? 😬 Don't like this either, but does feel slightly more appropriate. Or even just a TuringBase.jl package or something stupid, though this also feels "bad".

(b) Yeah I think that's the same as what I wrote. You can't automatically derive the functionality of Turing's sample so if they wrote a TheirSamplerDynamicPPLExt they would have to manually define sample(::DynamicPPL.Model, ::TheirSampler) in there.

Aye, but this will be "overkill" for integrating with Turing.jl, e.g. see ExternalSampler which only requires overloading one method atm to work with Turing.jl (getparams). So if they have to overload sample itself, this will also be prone to method ambiguities (just because someone makes a mistake; if you look at the sample defs we currently have in src/mcmc/Inference.jl you'll see why 😬 ). So in that case, you'd effectively fall back to: everyone who isn't subtyping TuringManagedSampler should use externalsampler to interact with Turing.jl.

That is fine tbf, and might be what we want 🤷

But I think that renaming sample should really be retained as a method of last resort because it will break literally every code example out there, and it's not even a breaking change that improves users' QOL by much - it's almost purely for developer QOL.

Aye, this is really the worst part of it 😕 And tbf, it is really, really bad.

All in all, I think I agree with you @penelopeysm that it would be better to do the subtyping stuff, most definitively. If we could go with a slightly better solution than putting it in AbstractMCMC.jl, I think it's an easy choice (and it might still be the preferred choice even if we do put it in AbstractMCMC.jl 😬 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants