Skip to content

Value_rec_compiler: prevent duplication of values #13938

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

lthls
Copy link
Contributor

@lthls lthls commented Apr 7, 2025

To avoid having two different values (the dummy allocation that gets updated later, and the actual allocation that eventually gets used to do the update), we move the call to caml_update_dummy to be immediately around the allocation primitive.
The now initialised value is then used in the rest of the definition where the allocation would have been.

Copying my example from #13933:

(* Before compilation *)
let rec v =
  let r = ref 0 in
  escape r;
  r

(* After compilation, old version *)
let v = alloc_dummy 1 in
caml_update_dummy v
  (let r = ref 0 in
   escape r;
   r)

(* After compilation, new version *)
let v = alloc_dummy 1 in
let r = (caml_update_dummy v (ref 0); v) in
escape r;
ignore r

Observing the difference:

let glob = ref None
let escape r = glob := Some r

let rec v =
  let r = ref 0 in
  escape r;
  r

let () =
  match !glob with
  | None -> assert false
  | Some r -> assert (v == r)

(The fix to primitive.mli is an unrelated typo that I noticed while working on that)

@lthls
Copy link
Contributor Author

lthls commented Apr 7, 2025

Note that this PR fixes both #13930 and #13931.
On the example from #13931, we end up with some rather strange results:

# let f a = let rec x = let rec y = Some a in y in x;;
(let
  (f/281 =
     (function a/283
       (funct-body f //toplevel//(1)<ghost>:6-50
         (before f //toplevel//(1):10-50
           (let (x/284 = (caml_alloc_dummy 1))
             (seq
               (let
                 (y/285 =
                    (seq (caml_update_dummy x/284 (caml_alloc_dummy 1))
                      x/284))
                 (seq
                   (seq (caml_update_dummy y/285 (makeblock 0 a/283)) y/285)
                   (before f //toplevel//(1):44-45 y/285)))
               (before f //toplevel//(1):49-50 x/284)))))))
  (apply (field_mut 1 (global Toploop!)) "f" f/281))

Here x is first allocated with caml_alloc_dummy, then a new dummy is created for y, and this dummy is used to replace the previous value of x (that's the strange part: replacing a dummy with a dummy). Afterwards, y is updated with its expected value, and this happens to setup x too as they're aliases of each other.

lthls added 2 commits April 7, 2025 16:47
To avoid having two different values (the dummy allocation that gets updated later,
and the actual allocation that eventually gets used to do the update),
we move the call to caml_update_dummy to be immediately around the allocation
primitive.
The now initialised value is then used in the rest of the definition where
the allocation would have been.
@lthls lthls force-pushed the update-dummy-eager branch from 2a1378c to 2284b42 Compare April 7, 2025 14:47
@gasche
Copy link
Member

gasche commented Apr 7, 2025

Here is how I understand your proposal:

  • Today the value compiler works by first computing the static size of each expression (to know how to pre-allocate dummy blocks), and then deciding the cycle-breaking strategy from this. (Note: it does not reuse the notion of static size computed in Value_rec_check, which makes the system more robust by turning mismatches into compile-time fatal errors rather than run-time segfaults.)

  • Whenever we succeed in computing a static size of a recursive right-hand-side expression, it means that we analyzed all the block-construction sites that can flow to the final result of this expression (and found them to have the same size).

  • We can push the dummy-update logic to these block-construction sites, instead of doing the dummy-update at the end. One advantage of doing this is that the computation then continue with the dummy value, instead of continuing with a physically-different final value, which removes weird behaviors coming from duplication of values.

This is interesting.

I see the following downsides:

  1. This can duplicate the dummy-update calls, in programs that contain join points. (For example, let rec x = if Random.bool () then Some y else Some z would get two caml_update_dummy calls instead of one. One way to avoid some of the code duplication would be to only perform this pushing-in optimization for mutable blocks -- for immutable blocks there is no clear benefit to the transformation.

  2. I don't understand how this works on recursive right-hand-sides of the form let x, y = <block>, <block> in ...; if p then x else y. A priori you don't know which of x or y is going to flow in the result, so you don't know which one to put in the dummy, and they may both get mutated in the ... part.

  3. For lazy values I think that we still need a caml_update_lazy primitive, because this does not in itself solve the problem of eliding some Pmakelazyblock indirections on during compilation. That is, this is not robust to a version of lazy v: optimize structured constants #13919 that would be done in Simplif rather than on the typedtree. caml_update_lazy adds this extra safety that all valid lazy values are supported correctly, and I think we want the robustness. So even if we decide to go for your approach we do want some parts of Introduce caml_update_dummy_lazy to avoid value-letrec bugs due to Forward shortcuts #13933, I think.

@lthls
Copy link
Contributor Author

lthls commented Apr 7, 2025

  • This can duplicate the dummy-update calls, in programs that contain join points. (For example, let rec x = if Random.bool () then Some y else Some z would get two caml_update_dummy calls instead of one. One way to avoid some of the code duplication would be to only perform this pushing-in optimization for mutable blocks -- for immutable blocks there is no clear benefit to the transformation.

That's not an issue: join points are forbidden (if you look at join_sizes here, you will see that the join of two block of the same size is a fatal error; in Value_rec_check, all branching constructions are treated as Dynamic)

  • I don't understand how this works on recursive right-hand-sides of the form let x, y = <block>, <block> in ...; if p then x else y. A priori you don't know which of x or y is going to flow in the result, so you don't know which one to put in the dummy, and they may both get mutated in the ... part.

Same reasoning: no join means there cannot be any ambiguity.

I think your concern is valid, so I would be in favour of adding the caml_update_dummy_lazy part of #13933, but I believe that this PR enforces that the argument to caml_update_lazy is never published anywhere so at least we don't need the parts that deal with that (CamlinternalLazy.indirect, although the PR seems to miss the actual addition of this primitive).

@gasche
Copy link
Member

gasche commented Apr 7, 2025

Is the fact that join points are forbidden an observation about the current state of what we handle, or something that is part of your internal specification for Value_rec_compiler and will remain true tomorrow? If we follow the route proposed by this PR, we are tying our hands in this respect. I don't have any desire to start accepting more recursive definitions, but I wonder if we could end up in a state where, for example, some join points would start being generated (for currently-accepted programs) in the Typedtree->Lambda translation.

@lthls
Copy link
Contributor Author

lthls commented Apr 7, 2025

Is the fact that join points are forbidden an observation about the current state of what we handle, or something that is part of your internal specification for Value_rec_compiler and will remain true tomorrow?

I explicitly chose to forbid join points when I implemented it, although at that time I did not rely on it.

If we follow the route proposed by this PR, we are tying our hands in this respect. I don't have any desire to start accepting more recursive definitions, but I wonder if we could end up in a state where, for example, some join points would start being generated (for currently-accepted programs) in the Typedtree->Lambda translation.

It is something that I have thought a little about. To sum it up, I'm not too worried. The previous code (before Value_rec_check) didn't handle any branching construction, so we don't have to worry about existing cases (even the class translation code had to manually inline a CamlinternalOO function when the class is actually recursive to make sure that the compiler could see the allocation). For the future I think we want to avoid as much as possible silently introducing branches, but if we do we could still handle it in Value_rec_compiler with a bit of work (if you read the code of this PR, you may notice that the Pduparray case already handles something like that).

@gasche
Copy link
Member

gasche commented Apr 7, 2025

My current thinking: I like that this PR improves behavior in some weird corner cases, the behavior when it works well is rather elegant. But I also wouldn't fully trust it to ensure soundness, because it is relying on subtle non-local assumptions.

To guarantee soundness, I would rather start from #13933 (including the indirection which I'm very proud of) and then the orthogonal part of this PR that recognize the allocation primitives to fix #13931.

Following this line of thinking, the question is whether we also want the present PR in addition, to fix surprising behaviors coming from the duplication of mutable state. I would be tempted to say yes. (The code is a bit complex, but conceptually the idea is clear, and I hope that we could make it pleasant.)

@lthls
Copy link
Contributor Author

lthls commented Apr 7, 2025

I'm not very fond of your indirection, but I agree that we should first go with the simpler changes to fix the real bugs, and think about the rest of this PR as a medium-term goal, including producing more documentation and specifications for recursive value definitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants