Value_rec_compiler: prevent duplication of values #13938

lthls · 2025-04-07T10:36:02Z

To avoid having two different values (the dummy allocation that gets updated later, and the actual allocation that eventually gets used to do the update), we move the call to caml_update_dummy to be immediately around the allocation primitive.
The now initialised value is then used in the rest of the definition where the allocation would have been.

Copying my example from #13933:

(* Before compilation *)
let rec v =
  let r = ref 0 in
  escape r;
  r

(* After compilation, old version *)
let v = alloc_dummy 1 in
caml_update_dummy v
  (let r = ref 0 in
   escape r;
   r)

(* After compilation, new version *)
let v = alloc_dummy 1 in
let r = (caml_update_dummy v (ref 0); v) in
escape r;
ignore r

Observing the difference:

let glob = ref None
let escape r = glob := Some r

let rec v =
  let r = ref 0 in
  escape r;
  r

let () =
  match !glob with
  | None -> assert false
  | Some r -> assert (v == r)

(The fix to primitive.mli is an unrelated typo that I noticed while working on that)

lthls · 2025-04-07T12:15:41Z

Note that this PR fixes both #13930 and #13931.
On the example from #13931, we end up with some rather strange results:

# let f a = let rec x = let rec y = Some a in y in x;;
(let
  (f/281 =
     (function a/283
       (funct-body f //toplevel//(1)<ghost>:6-50
         (before f //toplevel//(1):10-50
           (let (x/284 = (caml_alloc_dummy 1))
             (seq
               (let
                 (y/285 =
                    (seq (caml_update_dummy x/284 (caml_alloc_dummy 1))
                      x/284))
                 (seq
                   (seq (caml_update_dummy y/285 (makeblock 0 a/283)) y/285)
                   (before f //toplevel//(1):44-45 y/285)))
               (before f //toplevel//(1):49-50 x/284)))))))
  (apply (field_mut 1 (global Toploop!)) "f" f/281))

Here x is first allocated with caml_alloc_dummy, then a new dummy is created for y, and this dummy is used to replace the previous value of x (that's the strange part: replacing a dummy with a dummy). Afterwards, y is updated with its expected value, and this happens to setup x too as they're aliases of each other.

To avoid having two different values (the dummy allocation that gets updated later, and the actual allocation that eventually gets used to do the update), we move the call to caml_update_dummy to be immediately around the allocation primitive. The now initialised value is then used in the rest of the definition where the allocation would have been.

gasche · 2025-04-07T14:52:03Z

Here is how I understand your proposal:

Today the value compiler works by first computing the static size of each expression (to know how to pre-allocate dummy blocks), and then deciding the cycle-breaking strategy from this. (Note: it does not reuse the notion of static size computed in Value_rec_check, which makes the system more robust by turning mismatches into compile-time fatal errors rather than run-time segfaults.)
Whenever we succeed in computing a static size of a recursive right-hand-side expression, it means that we analyzed all the block-construction sites that can flow to the final result of this expression (and found them to have the same size).
We can push the dummy-update logic to these block-construction sites, instead of doing the dummy-update at the end. One advantage of doing this is that the computation then continue with the dummy value, instead of continuing with a physically-different final value, which removes weird behaviors coming from duplication of values.

This is interesting.

I see the following downsides:

This can duplicate the dummy-update calls, in programs that contain join points. (For example, let rec x = if Random.bool () then Some y else Some z would get two caml_update_dummy calls instead of one. One way to avoid some of the code duplication would be to only perform this pushing-in optimization for mutable blocks -- for immutable blocks there is no clear benefit to the transformation.
I don't understand how this works on recursive right-hand-sides of the form let x, y = <block>, <block> in ...; if p then x else y. A priori you don't know which of x or y is going to flow in the result, so you don't know which one to put in the dummy, and they may both get mutated in the ... part.
For lazy values I think that we still need a caml_update_lazy primitive, because this does not in itself solve the problem of eliding some Pmakelazyblock indirections on during compilation. That is, this is not robust to a version of lazy v: optimize structured constants #13919 that would be done in Simplif rather than on the typedtree. caml_update_lazy adds this extra safety that all valid lazy values are supported correctly, and I think we want the robustness. So even if we decide to go for your approach we do want some parts of Introduce caml_update_dummy_lazy to avoid value-letrec bugs due to Forward shortcuts #13933, I think.

lthls · 2025-04-07T15:16:37Z

This can duplicate the dummy-update calls, in programs that contain join points. (For example, let rec x = if Random.bool () then Some y else Some z would get two caml_update_dummy calls instead of one. One way to avoid some of the code duplication would be to only perform this pushing-in optimization for mutable blocks -- for immutable blocks there is no clear benefit to the transformation.

That's not an issue: join points are forbidden (if you look at join_sizes here, you will see that the join of two block of the same size is a fatal error; in Value_rec_check, all branching constructions are treated as Dynamic)

I don't understand how this works on recursive right-hand-sides of the form let x, y = <block>, <block> in ...; if p then x else y. A priori you don't know which of x or y is going to flow in the result, so you don't know which one to put in the dummy, and they may both get mutated in the ... part.

Same reasoning: no join means there cannot be any ambiguity.

For lazy values I think that we still need a caml_update_lazy primitive, because this does not in itself solve the problem of eliding some Pmakelazyblock indirections on during compilation. That is, this is not robust to a version of lazy v: optimize structured constants #13919 that would be done in Simplif rather than on the typedtree. caml_update_lazy adds this extra safety that all valid lazy values are supported correctly, and I think we want the robustness. So even if we decide to go for your approach we do want some parts of Introduce caml_update_dummy_lazy to avoid value-letrec bugs due to Forward shortcuts #13933, I think.

I think your concern is valid, so I would be in favour of adding the caml_update_dummy_lazy part of #13933, but I believe that this PR enforces that the argument to caml_update_lazy is never published anywhere so at least we don't need the parts that deal with that (CamlinternalLazy.indirect, although the PR seems to miss the actual addition of this primitive).

gasche · 2025-04-07T15:22:59Z

Is the fact that join points are forbidden an observation about the current state of what we handle, or something that is part of your internal specification for Value_rec_compiler and will remain true tomorrow? If we follow the route proposed by this PR, we are tying our hands in this respect. I don't have any desire to start accepting more recursive definitions, but I wonder if we could end up in a state where, for example, some join points would start being generated (for currently-accepted programs) in the Typedtree->Lambda translation.

lthls · 2025-04-07T16:13:38Z

Is the fact that join points are forbidden an observation about the current state of what we handle, or something that is part of your internal specification for Value_rec_compiler and will remain true tomorrow?

I explicitly chose to forbid join points when I implemented it, although at that time I did not rely on it.

If we follow the route proposed by this PR, we are tying our hands in this respect. I don't have any desire to start accepting more recursive definitions, but I wonder if we could end up in a state where, for example, some join points would start being generated (for currently-accepted programs) in the Typedtree->Lambda translation.

It is something that I have thought a little about. To sum it up, I'm not too worried. The previous code (before Value_rec_check) didn't handle any branching construction, so we don't have to worry about existing cases (even the class translation code had to manually inline a CamlinternalOO function when the class is actually recursive to make sure that the compiler could see the allocation). For the future I think we want to avoid as much as possible silently introducing branches, but if we do we could still handle it in Value_rec_compiler with a bit of work (if you read the code of this PR, you may notice that the Pduparray case already handles something like that).

gasche · 2025-04-07T16:26:23Z

My current thinking: I like that this PR improves behavior in some weird corner cases, the behavior when it works well is rather elegant. But I also wouldn't fully trust it to ensure soundness, because it is relying on subtle non-local assumptions.

To guarantee soundness, I would rather start from #13933 (including the indirection which I'm very proud of) and then the orthogonal part of this PR that recognize the allocation primitives to fix #13931.

Following this line of thinking, the question is whether we also want the present PR in addition, to fix surprising behaviors coming from the duplication of mutable state. I would be tempted to say yes. (The code is a bit complex, but conceptually the idea is clear, and I hope that we could make it pleasant.)

lthls · 2025-04-07T16:42:52Z

I'm not very fond of your indirection, but I agree that we should first go with the simpler changes to fix the real bugs, and think about the rest of this PR as a medium-term goal, including producing more documentation and specifications for recursive value definitions.

lthls added 2 commits April 7, 2025 16:47

Hygiene

2284b42

lthls force-pushed the update-dummy-eager branch from 2a1378c to 2284b42 Compare April 7, 2025 14:47

gasche mentioned this pull request Apr 7, 2025

Introduce caml_update_dummy_lazy to avoid value-letrec bugs due to Forward shortcuts #13933

Merged

lthls mentioned this pull request Apr 8, 2025

Typeopt: better handling for deeply nested unboxed types #13944

Closed

dra27 assigned gasche Apr 16, 2025

This was referenced Apr 16, 2025

Fix compilation failures in some nested recursive value definitions #13968

Merged

Error when compiling nested non-function recursive definitions #13931

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Value_rec_compiler: prevent duplication of values #13938

Value_rec_compiler: prevent duplication of values #13938

lthls commented Apr 7, 2025 •

edited by gasche

Loading

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

Value_rec_compiler: prevent duplication of values #13938

Are you sure you want to change the base?

Value_rec_compiler: prevent duplication of values #13938

Conversation

lthls commented Apr 7, 2025 • edited by gasche Loading

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

gasche commented Apr 7, 2025

lthls commented Apr 7, 2025

lthls commented Apr 7, 2025 •

edited by gasche

Loading