-
Notifications
You must be signed in to change notification settings - Fork 85
flambda2-types: New n-way join algorithm #3538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(Looking at the failures — we should probably detect Bottom prior to "Cannot add alias between two consts: 1, 0") |
Yup, this one seems ultimately unrelated — we have an invariant asserting that we do not add aliases between two constants that we check before the code that deals with aliases between constants. Will fix in a separate PR. The other is a real bug in the n-way join — I was misled to the semantics of the "other" field for row-like by the code for the meet, and do not deal properly with the join of a tag that is known on one side and "other" in an other side. Fix incoming. |
I forgot to mention in the PR description that there is a subtle case with env extensions where we can lose equations. Consider that we are performing the join between If we later have an env extension with I don't think we are likely to hit this case frequently in practice, so it's probably fine, but ideally we'd want to process env extensions after the non-extension types to avoid this corner case. |
3f6ca78
to
7c28ee7
Compare
Note: the new join implements a variation of @lthls's suggestion in #472 and fixes #472 even though it does not use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone through joined_env.ml
, which should be the most complex part of the PR. I'm posting my review notes on this file, and will move on to the rest of the PR.
7c28ee7
to
e16f8ca
Compare
This patch removes the old/basic meet and makes the new/advanced meet the only and default meet algorithm. The configuration options (`-flambda2-basic-meet`, `-flambda2-advanced-meet`, and the `flambda2-meet-algorithm` OCAMLPARAM) are kept to avoid breaking scripts, but are now silently ignored. Note that the `Meet_and_join_new` module is *NOT* renamed to `Meet_and_join`, and the `Meet_and_join` module simply forwards to `Meet_and_join_new`. This is intentional in order to re-use the same dispatch mechanism for the new join algorithm in ocaml-flambda#3538. This PR also takes the opportunity to clean up the interfaces: - Remove the now unused `meet_env_extension`; - Remove `join` from the `Flambda2_types` interface
This patch removes the old/basic meet and makes the new/advanced meet the only and default meet algorithm. The configuration options (`-flambda2-basic-meet`, `-flambda2-advanced-meet`, and the `flambda2-meet-algorithm` OCAMLPARAM) are kept to avoid breaking scripts, but are now silently ignored. Note that the `Meet_and_join_new` module is *NOT* renamed to `Meet_and_join`, and the `Meet_and_join` module simply forwards to `Meet_and_join_new`. This is intentional in order to re-use the same dispatch mechanism for the new join algorithm in ocaml-flambda#3538. This PR also takes the opportunity to clean up the interfaces: - Remove the now unused `meet_env_extension`; - Remove `join` from the `Flambda2_types` interface
251f3ef
to
bee315d
Compare
Rebased on top of #3689 and added proper This makes the review of |
This patch removes the old/basic meet and makes the new/advanced meet the only and default meet algorithm. The configuration options (`-flambda2-basic-meet`, `-flambda2-advanced-meet`, and the `flambda2-meet-algorithm` OCAMLPARAM) are kept to avoid breaking scripts, but are now silently ignored. Note that the `Meet_and_join_new` module is *NOT* renamed to `Meet_and_join`, and the `Meet_and_join` module simply forwards to `Meet_and_join_new`. This is intentional in order to re-use the same dispatch mechanism for the new join algorithm in ocaml-flambda#3538. This PR also takes the opportunity to clean up the interfaces: - Remove the now unused `meet_env_extension`; - Remove `join` from the `Flambda2_types` interface
b1272bd
to
bbbff65
Compare
Rebased after removing the new meet. I added a separate commit that introduces |
The existing join algorithm suffers from several drawbacks: - It can be slow due to the use of a quadratic algorithm, taking up to 60% of the total compilation time in -O3 mode in pathological cases (lambda_to_flambda_primitives.ml). See also ocaml-flambda#3300. - It is inefficient as it computes the join of all types appearing in *any* joined environment prior to filtering out the types that are not needed, instead of first computing the types whose join will be needed. - It is sensitive to the names of local variables that only exist in some of the joined environments but not in the target environment. - It relies on a global binding time of variables across all joined environments and the target environment that does not exist, as figured in ocaml-flambda#3278. Subsequently, it can lose aliasing information, and breaks typing env invariants by recording the same variable as defined multiple times (with dubious semantics). This patch implements a new join algorithm, based on a n-way join of types. The new algorithm is: - Faster, as it avoids quadratic complexity (outside of complex nesting of env extensions). Compared to the existing join algorithm (with advanced meet), on my machine, the new join algorithm is 30x faster on the pathological lambda_to_flambda_primitives.ml, taking only around 10% of the total compilation time and speeding up the compilation of the file by 3.5x. On camlinternalFormat.ml, the new join is about 2.5-3x faster, reducing the time spent in the join from 20% to less than 10% and speeding up the total compilation time by about 20%. - More efficient, as it only computes a join if it can possibly result in a more precise type, i.e. if the variable has been assigned a new type in all joined environments (otherwise the existing type in the target environment is already the most precise). - Independent of the names of local variables. - Only depends on a consistent binding time *order* of the shared variables (defined in both the target environment and all joined environments), which is respected. Since the result is independent of the binding times of local / existential variables, the typing env invariants are respected.
d6d4334
to
0d8ee03
Compare
The existing join algorithm suffers from several drawbacks:
It can be slow due to the use of a quadratic algorithm, taking up to 60% of the total compilation time in -O3 mode in pathological cases (lambda_to_flambda_primitives.ml). See also Improve join performance #3300.
It is inefficient as it computes the join of all types appearing in any joined environment prior to filtering out the types that are not needed, instead of first computing the types whose join will be needed.
It is sensitive to the names of local variables that only exist in some of the joined environments but not in the target environment.
It relies on a global binding time of variables across all joined environments and the target environment that does not exist, as figured in Revert "Ensure consistent variable binding times (#3217)" #3278. Subsequently, it can lose aliasing information, and breaks typing env invariants by recording the same variable as defined multiple times (with dubious semantics).
This patch implements a new join algorithm, based on a n-way join of types. The new algorithm is:
Faster, as it avoids quadratic complexity (outside of complex nesting of env extensions). Compared to the existing join algorithm (with advanced meet), on my machine, the new join algorithm is 30x faster on the pathological lambda_to_flambda_primitives.ml, taking only around 10% of the total compilation time and speeding up the compilation of the file by 3.5x. On camlinternalFormat.ml, the new join is about 2.5-3x faster, reducing the time spent in the join from 20% to less than 10% and speeding up the total compilation time by about 20%.
More efficient, as it only computes a join if it can possibly result in a more precise type, i.e. if the variable has been assigned a new type in all joined environments (otherwise the existing type in the target environment is already the most precise).
Independent of the names of local variables.
Only depends on a consistent binding time order of the shared variables (defined in both the target environment and all joined environments), which is respected. Since the result is independent of the binding times of local / existential variables, the typing env invariants are respected.
Reviewing guide
The actual changes in this PR are in the
Meet_and_join_new
andJoin_env
modules, as well asJoin_levels
(which is now just a small wrapper overJoin_env
). The changes in both modules are mostly independent: changes inMeet_and_join_new
make the join of types a n-way join, while theJoin_env
module implements the new join algorithm for environments, and in particular takes care of joining aliases. The main thing to know to relate these two modules is that the join of (canonical) alias types should go throughJoin_env.n_way_join_simple
, which will return an appropriate alias (creating a local variable if required) in the target env.(As an aside: I think the organization of the
Join_env
module is roughly right, but am not too satisfied with the names of things and happily take suggestions)The rest of the changes are temporary, and mostly intended for further debugging of the PR. They should not need to be reviewed, and will be removed before merging the PR. In particular, the
Meet_new_and_join_old
andJoin_levels_old
modules are (almost) identical copies of theMeet_and_join_new
andJoin_levels
modules frommain
. They are used in conjunction with the environment variableFLAMBDA2_JOIN_ALGORITHM=checked
to compare the results of the new and old join algorithm. This comparison uses the auxiliaryEqual_types_for_debug
module to print out the result of joins that are different when computed using the new and old join algorithms. When visually inspecting such differences, I recommend also settingFLAMBDA2_JOIN_DEBUG_IGNORE_NAMES=cse_param
-- the new join is better at preserving equations betweencse_param
variables and this creates noise.From my own inspection, the main differences between the old and new join is that the new join is better at preserving aliases (so sometimes the new join will have an alias where the old join will have duplicated the type), and it is also better at preserving the
alloc_mode
of blocks (the old join considers the alloc mode of Bottom blocks, while the new join ignores them -- which I think is correct).Symbol projections
I am not sure I understand how symbol projections are supposed to be joined. The old join seems to preserve symbol projections as long as they are present in one of the joined envs (and the variable is still accessible in the target env), which might make sense from the point of view of Simplify but seems dubious from the point of view of the typing env. The new join preserve symbol projections if they are present in all of the joined envs (up to demotions) because that is what seems sensible for a join, but I am not sure if this is the intended semantics. I don't really understand how symbol projections are created/used; maybe @mshinwell can clarify.
Basic meet
The PR is currently only compatible with the new (advanced) meet, which will always be used when joining environments. Before merging the PR, we should decide how we want to proceed w.r.t. the basic meet. If it goes away, there is nothing to do; if we still keep it for a bit, we can either keep the
Join_levels_old
module for the basic meet; force the join to go through the advanced meet (even if the basic meet is otherwise used); or adapt the n-way join code for the basic meet as a last resort.