Commit e594515
Formalise AD integration status
1 parent 4278c3d
3 files changed: +120 −37 lines changed

Diff for: Manifest.toml

+1 −13
@@ -2,7 +2,7 @@
 
 julia_version = "1.11.4"
 manifest_format = "2.0"
-project_hash = "f8ede0ee89806fd59fc0feecd3210fa53189aa73"
+project_hash = "53934e1315cb4a39900896bd0b900a586f95f30d"
 
 [[deps.ADTypes]]
 git-tree-sha1 = "e2478490447631aedba0823d4d7a80b2cc8cdb32"
@@ -3667,18 +3667,6 @@ weakdeps = ["DynamicHMC", "Optim"]
     TuringDynamicHMCExt = "DynamicHMC"
     TuringOptimExt = "Optim"
 
-[[deps.TuringBenchmarking]]
-deps = ["ADTypes", "AbstractMCMC", "BenchmarkTools", "DynamicPPL", "ForwardDiff", "LinearAlgebra", "LogDensityProblems", "PrettyTables", "Requires", "ReverseDiff", "Zygote"]
-git-tree-sha1 = "a03f2d71dfc88bf370d67c8ae84dfa109b2702bb"
-uuid = "0db1332d-5c25-4deb-809f-459bc696f94f"
-version = "0.5.9"
-
-    [deps.TuringBenchmarking.extensions]
-    TuringBenchmarkingBridgeStanExt = "BridgeStan"
-
-    [deps.TuringBenchmarking.weakdeps]
-    BridgeStan = "c88b6f0a-829e-4b0b-94b7-f06ab5908f5a"
-
 [[deps.URIs]]
 git-tree-sha1 = "67db6cc7b3821e19ebe75791a9dd19c9b1188f2b"
 uuid = "5c2747f8-b7ea-4ff2-ba2e-563bfd36b1d4"

Diff for: Project.toml

+1 −1
@@ -4,6 +4,7 @@ AbstractGPs = "99985d1d-32ba-4be9-9821-2ec096f28918"
 AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001"
 AdvancedHMC = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d"
 AdvancedMH = "5b7e9947-ddc0-4b3f-9b55-0d8042f74170"
+BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
 ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
@@ -48,7 +49,6 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
 StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
 Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
-TuringBenchmarking = "0db1332d-5c25-4deb-809f-459bc696f94f"
 UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
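At the Pkg level, these two file changes amount to swapping one dependency for another. A rough sketch of the equivalent REPL commands (assuming you are working in this docs environment; not part of the commit itself):

```julia
using Pkg
Pkg.rm("TuringBenchmarking")  # drop the retired benchmarking helper
Pkg.add("BenchmarkTools")     # benchmark AD backends directly instead
```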

Diff for: usage/automatic-differentiation/index.qmd

+118 −23
@@ -12,34 +12,128 @@ using Pkg;
 Pkg.instantiate();
 ```
 
-## Switching AD Modes
+## What is Automatic Differentiation?
 
-Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl), [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Zygote](https://github.com/FluxML/Zygote.jl) for reverse-mode AD.
-`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake`, `Zygote`, or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake`, `import Zygote` or `import ReverseDiff`, alongside `using Turing`.
+Automatic differentiation (AD) is a technique used to evaluate the derivative of a function at a given set of arguments.
+In the context of Turing.jl, the function being differentiated is the log probability density of a model, and the arguments are the parameters of the model (i.e. the values of the random variables).
+The gradient of the log probability density is used by various algorithms in Turing.jl, such as HMC (including NUTS), mode estimation (which uses gradient-based optimization), and variational inference.
 
-As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
-Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
+The Julia ecosystem has a number of AD libraries.
+You can switch between them at will using the unified [ADTypes.jl](https://github.com/SciML/ADTypes.jl/) interface, which, for a given AD backend, provides types such as `AutoBackend` (see [the documentation](https://docs.sciml.ai/ADTypes/stable/) for more details).
+For example, to use the [Mooncake.jl](https://github.com/compintell/Mooncake.jl) package for AD, you can run the following:
 
-For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of `nothing` permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
+```{julia}
+using Turing
+setprogress!(false)
+# Note that if you specify a custom AD backend, you must also import it.
+using Mooncake
 
-For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.
+@model function f()
+    x ~ Normal()
+    # Rest of your model here
+end
 
+sample(f(), HMC(0.1, 5; adtype=AutoMooncake(; config=nothing)), 100)
+```
 
+By default, if you do not specify a backend, Turing will default to [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl).
+In this case, you do not need to import ForwardDiff, as it is already a dependency of Turing.
 
-Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
 
-Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
-For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data.
-However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect.
-Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.
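Picking up the default-backend note above: the default can also be requested explicitly through the `adtype` keyword. A minimal sketch, assuming the toy model `f` from the added example and that `AutoForwardDiff` is in scope via ADTypes:

```julia
# Both calls use ForwardDiff for AD: the first implicitly (Turing's default),
# the second explicitly via the adtype keyword.
sample(f(), HMC(0.1, 5), 100)
sample(f(), HMC(0.1, 5; adtype=AutoForwardDiff()), 100)
```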
+## Choosing an AD Backend
+
+There are two aspects to choosing an AD backend: firstly, what backends are available; and secondly, which backend is best for your model.
+
+### Usable AD Backends
+
+Turing.jl uses the functionality in [DifferentiationInterface.jl](https://github.com/JuliaDiff/DifferentiationInterface.jl) ('DI') to interface with AD libraries in a unified way.
+Thus, in principle, any AD library that has an integration with DI can be used with Turing; you should consult the [DI documentation](https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/) for an up-to-date list of compatible AD libraries.
+
+Note, however, that not all of the AD libraries listed there are tested on Turing models.
+Thus, it is likely that some of them will either error (because they don't know how to differentiate through Turing's code) or silently give incorrect results.
+
+Formally, our working model is that we have several _tiers_ of integration with AD libraries.
+Generally, we recommend that users choose AD libraries that are in **Tier 2 or above**.
+
+| Integration tier | Works with DI | Tested in DynamicPPL CI | Summary                         | Current examples         |
+|------------------|---------------|-------------------------|---------------------------------|--------------------------|
+| 3                | Yes           | Yes                     | 'We will (try to) make it work' | Mooncake                 |
+| 2                | Yes           | Yes                     | 'We think it should work'       | ForwardDiff, ReverseDiff |
+| 1                | Yes           | No                      | 'You're on your own'            | Enzyme, Zygote           |
+| 0                | No            | No                      | 'You can't use this'            |                          |
+
+**Tier 0** means that the AD library is not integrated with DI, and thus will not work with Turing.
+
+**Tier 1** means that the AD library is integrated with DI, and you can try to use it with Turing if you like; however, we provide no guarantee that it will work correctly.
+If you submit an issue about using Turing with a Tier 1 library, it is unlikely that we will be able to help you, unless the issue is very simple to fix.
+
+**Tier 2** indicates some level of confidence on our side that the AD library will work, because it is included in DynamicPPL's continuous integration (CI) tests.
+If you find that a Tier 2 backend does not work with Turing, you are welcome to submit an issue, and we will try to look into it.
+Note, however, that this does not imply that we take responsibility for ensuring that any given model will work with these backends.
+This may be due either to upstream bugs or limitations (which exist even for ForwardDiff), or simply to time constraints.
+However, if there are workarounds that can be implemented in Turing to make the backend work, we will try to do so.
+
+**Tier 3** is the same as Tier 2, except that we also formally take responsibility for ensuring that the backend works with Turing models.
+If you submit an issue about using Turing with a Tier 3 library, we will actively try to make it work.
+Realistically, this is only possible for AD backends that are actively maintained by somebody on the Turing team, such as Mooncake.
+
+### The Best AD Backend for Your Model
+
+Given this choice of backends, how do you choose the best one for your model?
 
-For `Zygote`, pass `adtype=AutoZygote()` to the sampler constructor.
+A simple heuristic is to look at the number of parameters in your model.
+The log density of the model, i.e. the function being differentiated, is a map from $\mathbb{R}^n$ to $\mathbb{R}$, where $n$ is the number of parameters in your model.
+For models with a small number of parameters (say up to 20), forward-mode AD (e.g. ForwardDiff) is generally faster due to its lower overhead, even though it needs on the order of $n$ forward passes to assemble the full gradient.
+On the other hand, for models with a large number of parameters, reverse-mode AD (e.g. ReverseDiff or Mooncake) is generally faster, as it computes the gradient with respect to all parameters in a single reverse pass.
 
-And the previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` are deprecated and removed.
+For a more exact approach, you can benchmark the different AD backends on your model:
+
+```{julia}
+using ADTypes
+using BenchmarkTools
+using DynamicPPL: LogDensityFunction
+using LogDensityProblems: logdensity_and_gradient
+using ForwardDiff, ReverseDiff, Mooncake
+
+@model function f(y)
+    x = Vector{Float64}(undef, length(y))
+    for i in eachindex(y)
+        x[i] ~ Normal()
+        y[i] ~ Normal(x[i])
+    end
+end
+
+ADTYPES = [AutoForwardDiff(), AutoReverseDiff(; compile=true), AutoMooncake(; config=nothing)]
+
+function benchmark_model(size)
+    x, y = randn(size), randn(size)
+    for adtype in ADTYPES
+        # Wrap the model's log density together with the chosen AD backend ...
+        ldf = LogDensityFunction(f(y); adtype=adtype)
+        # ... and time one evaluation of the log density and its gradient.
+        result = @benchmark logdensity_and_gradient($ldf, $x)
+        println("AD type: $adtype, time: $(median(result).time)")
+    end
+end
+
+benchmark_model(10)
+```
+
+(Note that the times are reported in nanoseconds.)
+We can also use this to see that reverse-mode AD works better on larger models:
+
+```{julia}
+benchmark_model(100)
+```
+
+::: {.callout-note}
+The additional keyword argument `compile=true` for `AutoReverseDiff` specifies whether to pre-record the tape only once and reuse it later.
+By default, this is set to `false`, which means no pre-recording.
+Setting `compile=true` can substantially improve performance, but risks silently incorrect results if not used with care.
+Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
+:::
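To make the callout's warning concrete, here is a hypothetical sketch (this model is our illustration, not part of the commit) of the parameter-dependent branching that makes compiled tapes unsafe:

```julia
# Hypothetical example: the branch taken depends on the parameter x, so the
# sequence of operations differs between executions. A tape recorded while
# x > 0 would keep replaying that branch even when x <= 0, silently giving
# wrong results.
@model function branchy(y)
    x ~ Normal()
    if x > 0
        y ~ Normal(x, 1)
    else
        y ~ Normal(-2x, 1)
    end
end
# For such a model, stick with AutoReverseDiff(; compile=false) (the default).
```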
 
 ## Compositional Sampling with Differing AD Modes
 
-Turing supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using `ForwardDiff` to sample the mean (`m`) parameter, and using `ReverseDiff` for the variance (`s`) parameter:
+Turing supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using `ForwardDiff` to sample the mean (`m`) parameter, and using `ReverseDiff` for the variance (`s²`) parameter:
 
 ```{julia}
 using Turing
@@ -65,14 +159,15 @@ c = sample(
 )
 ```
 
-Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance `ForwardDiff`, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models.
+## For AD Backend Developers
 
-If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
-Currently, this defaults to `ForwardDiff`.
+Suppose you have developed a new AD backend and want to integrate it with Turing.
+Revisiting the tier system described above, to go from Tier 0 to Tier 1, you should first integrate your AD library with DifferentiationInterface.
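As a rough illustration of what the Tier 0 to Tier 1 step buys you (the snippet below is a sketch under our own assumptions, not a prescribed test): once a backend is integrated with DI, a Turing log density should be differentiable through the DI interface:

```julia
using Turing
using ADTypes: AutoForwardDiff  # stand-in for your new backend's ADTypes struct
import DifferentiationInterface as DI
using DynamicPPL: LogDensityFunction
using LogDensityProblems: logdensity

@model demo() = x ~ Normal()

# Hypothetical smoke test: differentiate the model's log density via DI.
ldf = LogDensityFunction(demo())
grad = DI.gradient(Base.Fix1(logdensity, ldf), AutoForwardDiff(), [0.5])
```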
 
-The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using [`TuringBenchmarking`](https://github.com/TuringLang/TuringBenchmarking.jl):
+Going from Tier 1 to Tier 2 is the step that will likely require the most work.
+We believe that integration tests should be run on _both_ the AD library and Turing, to ensure that changes to either do not break the compatibility.
+Thus, we require that AD libraries in Tier 2 have their own CI tests that run Turing models; in return, we will also add tests for your AD backend on our side.
+Please do open an issue in the first instance if you would like to discuss this further.
 
-```{julia}
-using TuringBenchmarking
-benchmark_model(gdemo(1.5, 2), adbackends=[AutoForwardDiff(), AutoReverseDiff()])
-```
+We are currently working on formalising and exporting a set of tests, which will allow you to easily test your AD library on a range of Turing models.
+Keep an eye out for updates on this in the future!
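The Gibbs snippet referenced in the compositional-sampling section above is elided from this diff (those lines are unchanged). For readability, an illustrative sketch of what such a sampler looks like; the `gdemo` model body and exact sampler arguments here are assumptions, not lines from the commit:

```julia
using Turing
using ReverseDiff

# Illustrative model with a mean m and a variance s².
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
end

# Gibbs sampling, with ForwardDiff for m and ReverseDiff for s²
# (pair-based Gibbs syntax as in recent Turing releases).
c = sample(
    gdemo(1.5, 2),
    Gibbs(
        :m => HMC(0.1, 5; adtype=AutoForwardDiff()),
        :s² => HMC(0.1, 5; adtype=AutoReverseDiff()),
    ),
    1000,
)
```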
