
BLOG: add Efficient Path Profiling blog post #493

Open · wants to merge 11 commits into 2025sp

Conversation

@scober (Contributor) commented Mar 7, 2025

@sampsyo (Owner) left a comment

Hi, everybody—thanks for taking the plunge and being the first to write a summary blog post. Overall, you have summarized one of the key technical bits of the paper and offered a few assorted opinions, which is an adequate fulfillment of the task. I would have liked to see more of the high-level insights drawn out and highlighted, as opposed to the "inventory" approach here, but this will do fine.

I have a few detailed comments about making the arguments more airtight. Please let me know when you have a chance to address those and then we can merge!

To understand the motivation of the paper, one can first take a closer look at what **profiling**
means in the context of control-flow graph analysis. When performing a program analysis over a control-flow graph,
key insights can be ascertained by looking at the most frequently visited paths along the control-flow graph.
The process of analytically observing different routines in the graph is referred to as profiling.
@sampsyo (Owner) commented:

Not sure "routines" is the word you want here? That usually means "functions" to most people.

Also not sure what "analytically" adds to this sentence? That can mean a lot of things, but "just counting how many times something happens" doesn't seem particularly "analytical" to me.

In recent times, the strongest use case for profiling in programs has been **profile-driven compilation**
– a tool by which one can optimize programs based on what paths in a control-flow graph are the most frequently visited.
@sampsyo (Owner) commented:

FWIW, the common names for this are "profile-guided optimization" (PGO) or "feedback-directed optimization" (FDO); "profile-driven compilation" is not common.


One of the most popular profiling tools has been **edge profiling**, by which one can determine
the most frequently visited **edges** in a control-flow graph.
For a while, edge profiling served as the primary tool for analyzing the 'hottest' paths of a program.
@sampsyo (Owner) commented:

Use double quotes or italics, not single quotes, to introduce a new term.


However, there are a number of cases in which edge profiling just isn't sufficient to provide a strong enough
analysis for profile-driven compilation – indeed, it often makes wrong predictions. This issue has been known for a while,
but ignored, as alternative, more accurate forms of profiling have historically come with a higher overhead.
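
To make the failure mode concrete, here is a toy sketch (my own illustration, not an example from the paper or the post). Take a CFG shaped like two back-to-back diamonds, A → {B, C} → D → {E, F} → G, which has four paths from entry to exit. Two workloads that execute entirely different hot paths can induce identical edge profiles, so any predictor working from edge counts alone has to guess:

```python
from collections import Counter

def edge_counts(path_freqs):
    """Sum the per-edge counts implied by a {path: frequency} profile."""
    counts = Counter()
    for path, freq in path_freqs.items():
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += freq
    return counts

# Workload 1: the correlated paths A-B-D-E-G and A-C-D-F-G are hot.
workload1 = {"ABDEG": 50, "ACDFG": 50}
# Workload 2: the opposite correlation, with no hot path in common.
workload2 = {"ABDFG": 50, "ACDEG": 50}

# Both workloads induce exactly the same edge profile.
assert edge_counts(workload1) == edge_counts(workload2)
```

A path profile distinguishes these two workloads directly; an edge profile cannot, no matter how accurately it is collected.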
@sampsyo (Owner) commented:

"Ignored" seems strong here? Seems like people knew about it and were paying attention, but just assumed there was no practical way to do better. That doesn't quite seem like ignoring.


Of course, the control flow graphs of many interesting programs are not DAGs. So the paper gives a technique for extending the algorithm to general control flow graphs. The key idea is to remove backedges (in the depth-first search sense of the term) and replace them with fake edges that can be more easily instrumented.

In particular, if $v \rightarrow w$ is a backedge, when you remove that edge you add two new edges: $ENTRY \rightarrow w$ and $v \rightarrow EXIT$. This process turns general control flow graphs into DAGS (and preserves the unique ENTRY and EXIT vertices). The resulting encoding does not distinguish _all_ paths through the graph (there are infinitely many distinct paths through a cyclic graph), but it does distinguish between some important paths (namely, paths that take one pass through a loop vs. paths that take multiple passes).
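
As a rough illustration of that transformation (the ENTRY/EXIT names and the edge-rewriting rule come from the post; the rest of this sketch, including the example graph, is mine), a depth-first search that tracks which vertices are still on its stack can rewrite each backedge as it finds it:

```python
# Replace each DFS backedge v -> w with ENTRY -> w and v -> EXIT,
# turning a cyclic CFG into a DAG (parallel edges are fine: a CFG
# here is really a multigraph).
def remove_backedges(succ, entry="ENTRY", exit_node="EXIT"):
    dag = {v: list(ws) for v, ws in succ.items()}
    state = {}  # vertex -> "active" while on the DFS stack, "done" after

    def dfs(v):
        state[v] = "active"
        for w in list(dag[v]):
            if state.get(w) == "active":      # v -> w is a backedge
                dag[v].remove(w)
                dag[entry].append(w)          # fake edge ENTRY -> w
                dag[v].append(exit_node)      # fake edge v -> EXIT
            elif w not in state:              # unvisited: recurse
                dfs(w)
        state[v] = "done"

    dfs(entry)
    return dag

# A CFG with one loop: c -> b is the backedge.
cfg = {"ENTRY": ["a"], "a": ["b"], "b": ["c"],
       "c": ["b", "d"], "d": ["EXIT"], "EXIT": []}
print(remove_backedges(cfg))
# {'ENTRY': ['a', 'b'], 'a': ['b'], 'b': ['c'],
#  'c': ['d', 'EXIT'], 'd': ['EXIT'], 'EXIT': []}
```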
@sampsyo (Owner) commented:

DAGS -> DAGs


@sampsyo (Owner) commented:

To use LaTeX math in your post, please enable the relevant setting in the top matter (see the README for this repository). You might want to preview your post in Zola to see what it looks like.

@scober (Author) replied:

I have been previewing the post using pandoc, but I will take a look at it in Zola as well!


The algorithm presented in this paper is interesting on pure theoretical grounds. It is always informative to find a minimum of anything, so the minimal encoding presented is valuable as a pure mathematical object. As a practical matter, the encoding is perhaps less valuable.

It turned out (in the evaluation section) that many programs have too many paths to store path counts in an array, so the profiler presented in the paper used a hash map anyway. While having small integers to represent paths is valuable in its own right, the greater benefit was in being able to store counts in an array. This is borne out by the data in the paper, with programs that were small enough to use an array for path counts having noticeably lower overheads than those that required a hash map.
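
For the curious, the encoding itself is compact enough to sketch. This is my own rendering of (what I understand to be) the paper's numbering scheme, run on the illustrative DAG from the previous sketch: compute `num_paths(v)`, the number of paths from `v` to EXIT, bottom-up, then give each vertex's outgoing edges increments equal to the number of paths through their earlier siblings. The increments along any ENTRY-to-EXIT path then sum to a distinct integer in `[0, num_paths(ENTRY))`:

```python
def assign_increments(succ, entry="ENTRY", exit_node="EXIT"):
    """Minimal path numbering on a DAG: ({edge: increment}, #paths)."""
    num_paths = {exit_node: 1}

    def count(v):  # paths from v to EXIT, memoized
        if v not in num_paths:
            num_paths[v] = sum(count(w) for w in succ[v])
        return num_paths[v]

    count(entry)
    inc = {}
    for v in list(num_paths):      # every vertex reachable from ENTRY
        through_earlier = 0
        for w in succ.get(v, []):
            inc[(v, w)] = through_earlier
            through_earlier += num_paths[w]
    return inc, num_paths[entry]

dag = {"ENTRY": ["a", "b"], "a": ["b"], "b": ["c"],
       "c": ["d", "EXIT"], "d": ["EXIT"], "EXIT": []}
inc, n = assign_increments(dag)
print(n)                            # 4: path ids 0..3 fit in a 4-slot array
print(inc[("ENTRY", "b")])          # 2
print(inc[("c", "EXIT")])           # 1
```

At run time a single register `r` accumulates these increments along the executed path, and the profiler performs `count[r] += 1` when EXIT is reached: an array index when `num_paths(ENTRY)` is small enough, and the hash-map fallback described above otherwise.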
@sampsyo (Owner) commented:

I'm not sure you have the evidence for your claim that "the greater benefit was in being able to store counts in an array." How do you know this was a greater benefit than keeping the overhead of control-flow tracing low (i.e., just an in-register accumulation)? I don't exactly know how you would compare those two things, but the evidence you have here seems to support a different claim: "using a hash table is more expensive than not using a hash table."

Overall, are you sure that you're not underselling the value of the encoding? I dunno, I just think it's pretty cool that you can use constant space (just an integer) to keep track of paths… you don't have to, like, record every edge you see in a linked list or something. Which is maybe how I thought you would have to do it without this paper.



A similar argument can be made about reducing profiling overhead in general. While making programs run faster is almost always a good thing, this particular improvement did not cross the line into "viable-in-production" (for some definition of production). So for example, it is difficult to imagine a JIT compiler tolerating a 30% overhead for a running program at any stage of compilation. In this sense, the progress made in this paper towards low-overhead path profiling is perhaps better viewed as a step in the right direction than as a paradigm shift.
@sampsyo (Owner) commented:

I am also pretty skeptical of your claim that 30% overhead is never tolerable in any JIT component. It is common for JITs to have a "profiling mode" that may be an interpreter: so, much slower than any compiler tier. This profiling mode can do expensive things like recording the run-time types of the values stored in every variable. As long as it runs rarely enough, that may be worth it.


Besides the path profiling algorithm, the paper's main argument is in favor of path profiling itself, arguing that it produces substantially better profiles than edge profiling.

The paper successfully argues that path profiling produces better profiles. The implemented path profiler reliably provides frequencies for paths that are meaningfully longer than those produced by the edge profiler they compare against (and, of course, it should produce nearly identical information about edge frequencies). But the paper fails to take the final step and argue that the longer paths lead to better optimized programs. The authors don't do any experiments where the path and edge profiles are used to separately optimize the profiled programs. Such an experiment would have completed their argument that path profiling is worth the extra cost.
@sampsyo (Owner) commented:

Not sure what you mean by "longer" here… are you saying that edge profiling can give you perfectly accurate results, as long as the paths are short? Or just that the universal inaccuracy of edge profiling gets worse as paths get longer? Maybe it would be good to be clearer/more quantitative here.


This paper was the first to offer a performant algorithm for path profiling, so almost all of the other work on path profiling uses the paper's algorithm. The question here, then, mostly reduces to asking "what is the role of path profiling in history?"

The authors mention that path profiling could potentially be used for performance tuning, profile-directed compilation, and test coverage. For test coverage, it doesn't really see any use in practice for the reasons mentioned in [the discussion](https://github.com/sampsyo/cs6120/discussions/487) (too much performance cost for not enough benefit over edge profiling).

For performance tuning and profile-directed compilation, it seems like path profiling sees some use in non-real-time applications, but I'm not sure how often they're used in practice. Interestingly, profile-guided optimization tools like [these](https://research.facebook.com/file/900986544473313/Improved-Basic-Block-Reordering.pdf) [two](https://arxiv.org/abs/1810.00905) block layout optimizers that might be able to benefit from path profiling don't use it. Optimistically, this suggests that path profiling offers more utility than block or edge profiling, but that it can be difficult to make use of that utility. Pessimistically, it suggests that path profiling doesn't offer much more than block or edge profiling. The Codestitcher authors do mention that the information path profiling offers is "excessive" for code layout optimization, which I guess counts toward both viewpoints depending on how you want to think about it.

For similar reasons as in profile-guided optimization, path profiling is not used in practice for real-time use cases like just-in-time compilation. Most JITs rely on simpler techniques like counting function calls or branch paths to determine when to patch in code. However, some other reasons why path profiling might be difficult to use here are the 30% performance drop and the difficulty of effective interprocedural optimization.
@sampsyo (Owner) commented:

I recommend toning down your claims here along the lines of "path profiling is not used in practice for $PURPOSE". It would be more reasonable to say, "we looked around at a bunch of papers, and we couldn't find anyone saying they used path profiling." It would be good to humbly acknowledge that your searches are not somehow an exhaustive survey.

@scober (Author) commented Mar 21, 2025:

I pushed changes to try to address all of your comments, including some medium-sized re-writes. Let me know if there is anything else you'd like me to address!
