
simple efficient ssa #494

Open · wants to merge 5 commits into base: 2025sp

Conversation

neel-patel-1
Contributor

No description provided.

Owner

@sampsyo sampsyo left a comment

Thanks for getting this post started, and for giving it a shot relatively early in the semester!

I am concerned that some of the claims you make about this paper are too strong and lack evidence. I think the problem is that you have mostly simplified specific, technical criticisms into broader, blanket statements that may be impossible to justify with convincing arguments. I have left specific references in the review, but my main suggestion here is to take a step back and think carefully about which arguments you think you can justify and which are too broad to be really justifiable.

+++
# Background

Single static assignment (SSA) form [[1]](#1) and the first efficient conversion algorithm [[2]](#2) emerged in the 1980's. SSA is an intermediate representation (IR) which enforces that each variable is assigned exactly once. It provides a convenient form and way of thinking about programs which makes some optimizations more efficient.
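(Aside, not part of the post under review: a minimal, assumed Python sketch of what single assignment means. Renaming a straight-line sequence gives every static assignment a fresh name, so each name is defined exactly once; the phi functions needed where control flow joins are omitted here.)

```python
# Illustrative sketch (assumption, not from the post): rename straight-line
# three-address code so that every static assignment defines a fresh name.
# Phi functions, needed where control flow joins, are omitted.

def to_ssa(instrs):
    version = {}   # variable -> number of definitions seen so far
    current = {}   # variable -> name of its most recent SSA definition

    renamed = []
    for dest, op, args in instrs:
        # Uses refer to the most recent definition of each variable.
        new_args = [current.get(a, a) for a in args]
        version[dest] = version.get(dest, 0) + 1
        new_dest = f"{dest}.{version[dest]}"
        current[dest] = new_dest
        renamed.append((new_dest, op, new_args))
    return renamed

# x = a + b; x = x * 2; y = x + 1
prog = [("x", "+", ["a", "b"]), ("x", "*", ["x", "2"]), ("y", "+", ["x", "1"])]
print(to_ssa(prog))
# [('x.1', '+', ['a', 'b']), ('x.2', '*', ['x.1', '2']), ('y.1', '+', ['x.2', '1'])]
```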
Owner

Because this is a blog post for the web and not a paper, you can just link the relevant text instead of using bracketed footnotes. Like this:

[Static single assignment (SSA) form][ssa] and the first efficient…

[ssa]: https://compilers.cs.uni-saarland.de/ssasem/talks/Kenneth.Zadeck.pdf

Then people can just click the link, instead of clicking the link to find the link to click.

Contributor Author

changed to use the named-link form throughout, but removed the references section. Feel free to let me know if keeping the references section is preferred.

Owner

"each variable is assigned exactly once": This definition kinda leaves out the "static" part of the definition, and could be misinterpreted as saying that they are only assigned once dynamically (which admittedly would not make much sense).

Contributor Author

corrected to emphasize that each program's variable assignment has a unique name


<img src="image-27.png" alt="" width="60%">

The first published algorithm for SSA construction [[6]](#6), written by Cytron et al., proceeds in two steps. The first places phi functions throughout the program, indicating ambiguities in assignments due to control flow. The second renames variables to ensure SSA’s single assignment property is satisfied. Importantly, Cytron’s algorithm relies on calculation of the dominance frontier: “the set of all CFG nodes Y such that X dominates a predecessor of Y but does not strictly dominate Y”, for each basic block X in the control flow graph (CFG) representation of the program.
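(Aside, not from the post or from Cytron et al.'s paper: a small, assumed Python sketch that computes dominance frontiers directly from the definition quoted above, using the simple iterative dominator computation.)

```python
# Sketch (assumed example): compute dominance frontiers from the quoted
# definition. The CFG maps each block name to its list of successors.

def dominators(cfg, entry):
    preds = {b: [p for p in cfg if b in cfg[p]] for b in cfg}
    dom = {b: set(cfg) for b in cfg}   # dom[b] = set of blocks that dominate b
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            new = {b}
            if preds[b]:
                new |= set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

def dominance_frontier(cfg, entry):
    dom = dominators(cfg, entry)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in cfg}
    df = {b: set() for b in cfg}
    for x in cfg:
        for y in cfg:
            dominates_a_pred = any(x in dom[p] for p in preds[y])
            strictly_dominates_y = x in dom[y] and x != y
            if dominates_a_pred and not strictly_dominates_y:
                df[x].add(y)
    return df

# Diamond-shaped CFG: the join block is in the frontier of both branches.
cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
print(dominance_frontier(cfg, "entry"))
# {'entry': set(), 'then': {'join'}, 'else': {'join'}, 'join': set()}
```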
Owner

While this is all true, it doesn’t quite set up the salient contrast with the next paragraph. Namely, one big contrast here is that Cytron starts with a CFG while the paper you are summarizing does not need a CFG at all (much less a domtree for that CFG). This description kinda buries that big difference.

Contributor Author

Made explicit that Cytron et al.'s algorithm takes the CFG as input, to better set up the next paragraph.

# Historical Context
Prof. Sampson mentioned that SSA form [[1]](#1) and the first efficient conversion algorithm [[2]](#2) emerged in the 1980's, whereas the simple and efficient algorithm discussed in class was published in 2013. One question discussed in class was why Cytron et al.'s implementation has been the de facto SSA conversion scheme, used in the LLVM IR [[3]](#3) and other languages’ compiler toolchains [[4]](#4). Its ~25 year head start is one likely reason.
Owner

While I'm flattered that you attribute this observation to me, it is self-evident from the publication dates of the papers. :)

Contributor Author

obvio

It is yet to be seen whether the efficient algorithm implemented by Braun et al. [[5]](#5) will find applications in any compiler toolchains.
Owner

This is false. You can disprove it with a quick google search for the terms "simple and efficient" ssa site:github.com, which yields several compiler projects that reference this paper. That's what I did to construct this list during the discussion period:
#488 (reply in thread)

Contributor Author

Referenced the projects you found that use Simple and Efficient SSA. Interesting that MemorySSA uses it. Also found this paper that cites Braun et al., but can't tell if the compiler (Triton) actually uses it.

There are many positive claims about this algorithm that were made both in the paper and during the discussion section. By eliminating the need to construct dominance data structures, compile times are faster for specific scenarios. The combining of two transformations into one removes any detours along the pipeline. This direct translation also reduces the memory overhead: these data structures computed during the intermediate steps no longer need to be stored in memory. This algorithm could be particularly useful for Just-In-Time compilation contexts. These are situations where the compiler is less likely to heavily rely on dominance information for the optimizations, but it will still utilize SSA form. This is an example of a specific case where the direct translation has clear benefits: faster SSA conversion, no detours, and less memory overhead.

### Shortcomings
Although these merits seem enticing, there are several shortcomings. The algorithm results in much more complexity than what the paper emphasizes. This is especially apparent in the context of compiler programmers working with this direct translation approach but there is also complexity within the actual algorithm regarding mechanisms such as sealing blocks. The main concern, however, is about the maintainability and potentially leading to many bugs within the actual optimization passes. Additionally, in many cases our future passes will rely on dominance data structures. In these cases, the intermediate steps of SSA conversion are not truly detours as the side effects bring value to a future optimization pass. It should be noted that these data structures may not be fully accurate in the case that the SSA conversion results in a global optimization that changes the CFG, but it is still presumably more efficient to adjust these dominance structures for accuracy rather than creating or recreating them later. This application seems like it will only be valuable when applied to certain use cases. It also may create a lot of implementation overhead. For example, we may need to implement lots of boilerplate for the syntax of different languages when trying to complete this direct translation which can make the overall optimization go slower. This can particularly be the case for compilers with frontends that support multiple languages. The largest source of skepticism lay within the empirical results presented in the paper. The paper made loose claims about “optimized” and “unoptimized” versions of their algorithm and Cytron et al.'s algorithm. This led to doubts about whether the performance will actually improve in a real-world compilation or if this is a project that emphasizes only theoretical benefits.
Owner

These arguments need more development in order to be convincing. Namely:

  • “much more complexity than what the paper emphasizes”: To justify this claim, you would have to give a list of additional sources of complexity and somehow argue that the magnitude is large (for that “much more”).
  • “This is especially apparent in the context of compiler programmers working with this direct translation approach”: You have not presented any arguments for why this approach would complicate the lives of programmers; you have just stated it. Can you think through how you would construct an argument to support your claim?
  • “there is also complexity within the actual algorithm regarding mechanisms such as sealing blocks”: Now I am no longer sure what you mean by “complexity.” Because you’re comparing two algorithms, maybe you mean asymptotic complexity, i.e., big-O running time? Or maybe you just mean how long and detailed the pseudocode looks? If the latter, how did you compare the two? If you’re going to say this, you need to make a convincing argument, with evidence, for the actual difference you are thinking of.
  • “Additionally, in many cases our future passes will rely on dominance data structures.” This seems like a completely different argument, about efficiency instead of simplicity. At the very least, this seems to deserve its own paragraph so you can fully develop this separate argument?
  • “only be valuable when applied to certain use cases”: At the very least, it seems like you must say what those use cases are (or what they are not). “Certain” use cases is too vague.
  • “lots of boilerplate for the syntax of different languages”: You need more sentences here to convincingly develop this argument. What you are implicitly assuming here, but not directly stating, is that we want to build a compiler framework (like LLVM) with a single IR to support many different languages. In that setting, maybe you want those frontends to be easy to write. If so, maybe it’s nicer to have a CFG IR you can generate without doing SSA first. But none of these arguments apply if you’re just building one compiler, not a framework. So I don't think it's just "particularly the case" for frameworks; it's only the case for frameworks.
  • "The paper made loose claims": That sounds pretty broadly negative, and it's hard to tell exactly which claims you're referring to (to decide whether the paper actually made those claims). When what I think you mean to summarize here is "the paper only showed empirical results about specific claim X. Specific claim Y would be nice, but the paper did not evaluate it." Then, separately, you may or may not want to add on "the paper says that claim Y is true, but we don't have the evidence." I think it is EXTREMELY important, when you are criticizing a paper for overclaiming, that you both (a) clearly demonstrate that they actually made a specific claim, and (b) clearly delineate the lack of evidence for that claim. It is a bad idea to vaguely gesture at categories of claims they may or may not have made.

Overall, I would recommend these steps:

  1. Carefully make a list of all the criticisms you want to make.
  2. Decide which ones you have actual arguments or evidence for. Discard the ones where you do not have substantiation.
  3. Rewrite each criticism as a standalone paragraph that includes both a clear statement of the criticism itself and convincing evidence to support it.

By mashing things together into one long paragraph, I worry that you can too easily hide the fact that the criticisms have insufficiently detailed support.

Contributor Author

Updated the Merits/Shortcomings to reflect and substantiate the claims we aimed to make:

List of merits:

  • Constructing alternative program representations and data structures can incur significant overhead, due to the runtime of the algorithm and the corresponding memory allocations. By eliminating the construction of the dominance-related data structures, in some scenarios, the end-to-end compilation time may be reduced.
  • By integrating on-the-fly optimizations into the SSA construction procedure, the algorithm could be useful when compiling in time-constrained scenarios (e.g., just-in-time compilers), where there may not be time to run a separate pass to perform an optimization.
  • Some compiler enthusiasts in CS6120 found the algorithm to be unique and thought the proofs of some of its properties were elegant. Specifically, we appreciated the use of recursion to calculate phi nodes lazily (sketched after the criticisms list below) and the use of strongly connected components to prove that the algorithm minimized the number of phi functions placed throughout the SSA IR.

List of criticisms:

  • Simple, Efficient SSA may make maintainability more challenging:
    • Because Braun et al.'s algorithm goes straight from the AST to SSA form, the compiler's front-end may become more complex. Specifically, any modularity afforded by first converting a language-dependent AST into a CFG is lost. While this may be fine for compilers targeting a single language, compiler frameworks, like LLVM, implement front-ends for multiple languages. It may become burdensome for front-end developers to always have to implement Braun et al.'s algorithm. Some developers may prefer to transform the AST into a simpler non-SSA IR, and a framework that separates these concerns seems simpler to maintain.
    • The proposed algorithm enables the use of on-the-fly optimizations performed at SSA construction time. While useful, these are implemented as modifications to the SSA construction implementation, which could lead to code churn or introduce bugs.
  • Loss of data structures which may be useful for other passes
    • Since Braun et al.'s algorithm does not rely on the dominance frontier or dominator tree, these data structures would not be created at SSA construction time. If later analyses or optimizations that use these data structures are performed, much of the speedup from Braun et al.'s algorithm may be lost to subsequent passes. For example, the dominates relation is required to determine whether it is safe to perform loop-invariant code motion, and the dominator tree is used for contification, which can expose function inlining opportunities.
  • Hard to tell whether the proposed algorithm reduces SSA construction time significantly
    • It certainly seems reasonable that a front-end based on Braun et al.'s algorithm could significantly speed up end-to-end compilation. However, Braun et al.'s implementation in LLVM 3.1 only executed 0.28% fewer instructions when compiling all programs in the SPEC CINT2000 suite. It is worth noting that their implementation was not as highly tuned as the baseline LLVM implementation they compared against.
    • It is also worth noting that some compilers and tools have begun adopting Braun et al.'s algorithm for SSA conversion, lending credence to the authors' claim that direct AST-to-SSA could provide non-negligible speedups. For example, a SPIRV-Tools pass converts SPIR-V functions directly into SSA form using Braun et al.'s algorithm. The MemorySSA analysis, which builds an SSA-like representation for LLVM memory operations, also uses the marker algorithm presented by Braun et al. Also, the MIR project, which focuses on building JIT compilers, currently uses Braun et al.'s algorithm for SSA construction.
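(Aside, not part of the reply above: the lazy, recursive phi calculation mentioned in the merits list can be sketched compactly. The following minimal Python rendering follows the paper's readVariable/writeVariable/sealBlock scheme; the Block and Phi classes here are assumptions made for illustration, and trivial-phi removal is simplified since rewriting users of a removed phi is omitted.)

```python
# Sketch (assumed data structures, simplified): the lazy SSA construction
# scheme of Braun et al. Variables are read and written per basic block;
# phi functions are created on demand, and blocks are "sealed" once all of
# their predecessors are known.

class Phi:
    def __init__(self, block):
        self.block = block
        self.operands = []

class Block:
    def __init__(self, name):
        self.name = name
        self.preds = []              # predecessor Blocks
        self.sealed = False          # sealed = all predecessors are known
        self.defs = {}               # variable -> current SSA value in this block
        self.incomplete_phis = {}    # variable -> operandless phi, filled on sealing

def write_variable(var, block, value):
    block.defs[var] = value

def read_variable(var, block):
    if var in block.defs:            # local case: defined in this block
        return block.defs[var]
    return read_variable_recursive(var, block)

def read_variable_recursive(var, block):
    if not block.sealed:
        # Not all predecessors are known yet (e.g. a loop header still being
        # built): place an operandless phi and complete it when sealing.
        val = Phi(block)
        block.incomplete_phis[var] = val
    elif len(block.preds) == 1:
        val = read_variable(var, block.preds[0])
    else:
        # Break potential cycles by recording the phi before recursing.
        val = Phi(block)
        write_variable(var, block, val)
        val = add_phi_operands(var, val)
    write_variable(var, block, val)
    return val

def add_phi_operands(var, phi):
    for pred in phi.block.preds:
        phi.operands.append(read_variable(var, pred))
    return try_remove_trivial_phi(phi)

def try_remove_trivial_phi(phi):
    # Simplified: a phi whose operands are all the same value (ignoring
    # self-references) is trivial and can be replaced by that value. The
    # paper additionally rewrites all users of the removed phi, omitted here.
    others = {op for op in phi.operands if op is not phi}
    if len(others) == 1:
        return others.pop()
    return phi

def seal_block(block):
    for var, phi in block.incomplete_phis.items():
        add_phi_operands(var, phi)
    block.incomplete_phis.clear()
    block.sealed = True

# Usage sketch: a diamond where `x` is defined differently in each branch.
entry, then_b, else_b, join = Block("entry"), Block("then"), Block("else"), Block("join")
then_b.preds, else_b.preds, join.preds = [entry], [entry], [then_b, else_b]
for b in (entry, then_b, else_b, join):
    seal_block(b)
write_variable("x", then_b, "x.then")
write_variable("x", else_b, "x.else")
v = read_variable("x", join)         # creates a phi merging x.then and x.else
print(type(v).__name__, v.operands)  # Phi ['x.then', 'x.else']
```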

@neel-patel-1
Contributor Author

@sampsyo Thanks for taking the time to review and leave feedback. After reviewing and addressing the comments, we will update the PR.
