Skip to content

Stack sampling flamegraph profiler #13220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 23, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions doc/manual/rl-next/eval-profiler.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
synopsis: Add stack sampling evaluation profiler
prs: [13220]
---

Nix evaluator now supports stack sampling evaluation profiling via `--eval-profiler flamegraph` setting.
It collects collapsed call stack information to output file specified by
`--eval-profile-file` (`nix.profile` by default) in a format directly consumable
by `flamegraph.pl` and compatible tools like [speedscope](https://speedscope.app/).
Sampling frequency can be configured via `--eval-profiler-frequency` (99 Hz by default).

Unlike existing `--trace-function-calls` this profiler includes the name of the function
being called when it's available.
1 change: 1 addition & 0 deletions doc/manual/source/SUMMARY.md.in
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
- [Tuning Cores and Jobs](advanced-topics/cores-vs-jobs.md)
- [Verifying Build Reproducibility](advanced-topics/diff-hook.md)
- [Using the `post-build-hook`](advanced-topics/post-build-hook.md)
- [Evaluation profiler](advanced-topics/eval-profiler.md)
- [Command Reference](command-ref/index.md)
- [Common Options](command-ref/opt-common.md)
- [Common Environment Variables](command-ref/env-common.md)
Expand Down
33 changes: 33 additions & 0 deletions doc/manual/source/advanced-topics/eval-profiler.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Using the `eval-profiler`

Nix evaluator supports [evaluation](@docroot@/language/evaluation.md)
[profiling](<https://en.wikipedia.org/wiki/Profiling_(computer_programming)>)
compatible with `flamegraph.pl`. The profiler samples the nix
function call stack at regular intervals. It can be enabled with the
[`eval-profiler`](@docroot@/command-ref/conf-file.md#conf-eval-profiler)
setting:

```console
$ nix-instantiate "<nixpkgs>" -A hello --eval-profiler flamegraph
```

Stack sampling frequency and the output file path can be configured with
[`eval-profile-file`](@docroot@/command-ref/conf-file.md#conf-eval-profile-file)
and [`eval-profiler-frequency`](@docroot@/command-ref/conf-file.md#conf-eval-profiler-frequency).
By default the collected profile is saved to `nix.profile` file in the current working directory.

The collected profile can be directly consumed by `flamegraph.pl`:

```console
$ flamegraph.pl nix.profile > flamegraph.svg
```

The line information in the profile contains the location of the [call
site](https://en.wikipedia.org/wiki/Call_site) position and the name of the
function being called (when available). For example:

```
/nix/store/x9wnkly3k1gkq580m90jjn32q9f05q2v-source/pkgs/top-level/default.nix:167:5:primop import
```

Here `import` primop is called at `/nix/store/x9wnkly3k1gkq580m90jjn32q9f05q2v-source/pkgs/top-level/default.nix:167:5`.
49 changes: 49 additions & 0 deletions src/libexpr/eval-profiler-settings.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#include "nix/expr/eval-profiler-settings.hh"
#include "nix/util/configuration.hh"
#include "nix/util/logging.hh" /* Needs to be included before config-impl.hh */
#include "nix/util/config-impl.hh"
#include "nix/util/abstract-setting-to-json.hh"

#include <nlohmann/json.hpp>

namespace nix {

template<>
EvalProfilerMode BaseSetting<EvalProfilerMode>::parse(const std::string & str) const
{
if (str == "disabled")
return EvalProfilerMode::disabled;
else if (str == "flamegraph")
return EvalProfilerMode::flamegraph;
else
throw UsageError("option '%s' has invalid value '%s'", name, str);
}

template<>
struct BaseSetting<EvalProfilerMode>::trait
{
static constexpr bool appendable = false;
};

template<>
std::string BaseSetting<EvalProfilerMode>::to_string() const
{
if (value == EvalProfilerMode::disabled)
return "disabled";
else if (value == EvalProfilerMode::flamegraph)
return "flamegraph";
else
unreachable();
}

NLOHMANN_JSON_SERIALIZE_ENUM(
EvalProfilerMode,
{
{EvalProfilerMode::disabled, "disabled"},
{EvalProfilerMode::flamegraph, "flamegraph"},
});

/* Explicit instantiation of templates */
template class BaseSetting<EvalProfilerMode>;

}
265 changes: 265 additions & 0 deletions src/libexpr/eval-profiler.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#include "nix/expr/eval-profiler.hh"
#include "nix/expr/nixexpr.hh"
#include "nix/expr/eval.hh"
#include "nix/util/lru-cache.hh"

namespace nix {

Expand Down Expand Up @@ -45,4 +47,267 @@ void MultiEvalProfiler::addProfiler(ref<EvalProfiler> profiler)
invalidateNeededHooks();
}

namespace {

class PosCache : private LRUCache<PosIdx, Pos>
{
const EvalState & state;

public:
PosCache(const EvalState & state)
: LRUCache(524288) /* ~40MiB */
, state(state)
{
}

Pos lookup(PosIdx posIdx)
{
auto posOrNone = LRUCache::get(posIdx);
if (posOrNone)
return *posOrNone;

auto pos = state.positions[posIdx];
upsert(posIdx, pos);
return pos;
}
};

struct LambdaFrameInfo
{
ExprLambda * expr;
/** Position where the lambda has been called. */
PosIdx callPos = noPos;
std::ostream & symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const;
auto operator<=>(const LambdaFrameInfo & rhs) const = default;
};

/** Primop call. */
struct PrimOpFrameInfo
{
const PrimOp * expr;
/** Position where the primop has been called. */
PosIdx callPos = noPos;
std::ostream & symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const;
auto operator<=>(const PrimOpFrameInfo & rhs) const = default;
};

/** Used for functor calls (attrset with __functor attr). */
struct FunctorFrameInfo
{
PosIdx pos;
std::ostream & symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const;
auto operator<=>(const FunctorFrameInfo & rhs) const = default;
};

/** Fallback frame info. */
struct GenericFrameInfo
{
PosIdx pos;
std::ostream & symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const;
auto operator<=>(const GenericFrameInfo & rhs) const = default;
};

using FrameInfo = std::variant<LambdaFrameInfo, PrimOpFrameInfo, FunctorFrameInfo, GenericFrameInfo>;
using FrameStack = std::vector<FrameInfo>;

/**
* Stack sampling profiler.
*/
class SampleStack : public EvalProfiler
{
/* How often stack profiles should be flushed to file. This avoids the need
to persist stack samples across the whole evaluation at the cost
of periodically flushing data to disk. */
static constexpr std::chrono::microseconds profileDumpInterval = std::chrono::milliseconds(2000);

Hooks getNeededHooksImpl() const override
{
return Hooks().set(preFunctionCall).set(postFunctionCall);
}

public:
SampleStack(const EvalState & state, std::filesystem::path profileFile, std::chrono::nanoseconds period)
: state(state)
, sampleInterval(period)
, profileFd([&]() {
AutoCloseFD fd = toDescriptor(open(profileFile.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0660));
if (!fd)
throw SysError("opening file %s", profileFile);
return fd;
}())
, posCache(state)
{
}

[[gnu::noinline]] void
preFunctionCallHook(const EvalState & state, const Value & v, std::span<Value *> args, const PosIdx pos) override;
[[gnu::noinline]] void
postFunctionCallHook(const EvalState & state, const Value & v, std::span<Value *> args, const PosIdx pos) override;

void maybeSaveProfile(std::chrono::time_point<std::chrono::high_resolution_clock> now);
void saveProfile();
FrameInfo getFrameInfoFromValueAndPos(const Value & v, PosIdx pos);

SampleStack(SampleStack &&) = default;
SampleStack & operator=(SampleStack &&) = delete;
SampleStack(const SampleStack &) = delete;
SampleStack & operator=(const SampleStack &) = delete;
~SampleStack();

private:
/** Hold on to an instance of EvalState for symbolizing positions. */
const EvalState & state;
std::chrono::nanoseconds sampleInterval;
AutoCloseFD profileFd;
FrameStack stack;
std::map<FrameStack, uint32_t> callCount;
std::chrono::time_point<std::chrono::high_resolution_clock> lastStackSample =
std::chrono::high_resolution_clock::now();
std::chrono::time_point<std::chrono::high_resolution_clock> lastDump = std::chrono::high_resolution_clock::now();
PosCache posCache;
};

FrameInfo SampleStack::getFrameInfoFromValueAndPos(const Value & v, PosIdx pos)
{
/* NOTE: No actual references to garbage collected values are not held in
the profiler. */
if (v.isLambda())
return LambdaFrameInfo{.expr = v.payload.lambda.fun, .callPos = pos};
else if (v.isPrimOp())
return PrimOpFrameInfo{.expr = v.primOp(), .callPos = pos};
else if (v.isPrimOpApp())
/* Resolve primOp eagerly. Must not hold on to a reference to a Value. */
return PrimOpFrameInfo{.expr = v.primOpAppPrimOp(), .callPos = pos};
else if (state.isFunctor(v)) {
const auto functor = v.attrs()->get(state.sFunctor);
if (auto pos_ = posCache.lookup(pos); std::holds_alternative<std::monostate>(pos_.origin))
/* HACK: In case callsite position is unresolved. */
return FunctorFrameInfo{.pos = functor->pos};
return FunctorFrameInfo{.pos = pos};
} else
/* NOTE: Add a stack frame even for invalid cases (e.g. when calling a non-function). This is what
* trace-function-calls does. */
return GenericFrameInfo{.pos = pos};
}

[[gnu::noinline]] void SampleStack::preFunctionCallHook(
const EvalState & state, const Value & v, [[maybe_unused]] std::span<Value *> args, const PosIdx pos)
{
stack.push_back(getFrameInfoFromValueAndPos(v, pos));

auto now = std::chrono::high_resolution_clock::now();

if (now - lastStackSample > sampleInterval) {
callCount[stack] += 1;
lastStackSample = now;
}

/* Do this in preFunctionCallHook because we might throw an exception, but
callFunction uses Finally, which doesn't play well with exceptions. */
maybeSaveProfile(now);
}

[[gnu::noinline]] void
SampleStack::postFunctionCallHook(const EvalState & state, const Value & v, std::span<Value *> args, const PosIdx pos)
{

if (!stack.empty())
stack.pop_back();
}

std::ostream & LambdaFrameInfo::symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const
{
if (auto pos = posCache.lookup(callPos); std::holds_alternative<std::monostate>(pos.origin))
/* HACK: To avoid dubious «none»:0 in the generated profile if the origin can't be resolved
resort to printing the lambda location instead of the callsite position. */
os << posCache.lookup(expr->getPos());
else
os << pos;
if (expr->name)
os << ":" << state.symbols[expr->name];
return os;
}

std::ostream & GenericFrameInfo::symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const
{
os << posCache.lookup(pos);
return os;
}

std::ostream & FunctorFrameInfo::symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const
{
os << posCache.lookup(pos) << ":functor";
return os;
}

std::ostream & PrimOpFrameInfo::symbolize(const EvalState & state, std::ostream & os, PosCache & posCache) const
{
/* Sometimes callsite position can have an unresolved origin, which
leads to confusing «none»:0 locations in the profile. */
auto pos = posCache.lookup(callPos);
if (!std::holds_alternative<std::monostate>(pos.origin))
os << posCache.lookup(callPos) << ":";
os << *expr;
return os;
}

void SampleStack::maybeSaveProfile(std::chrono::time_point<std::chrono::high_resolution_clock> now)
{
if (now - lastDump >= profileDumpInterval)
saveProfile();
else
return;

/* Save the last dump timepoint. Do this after actually saving data to file
to not account for the time doing the flushing to disk. */
lastDump = std::chrono::high_resolution_clock::now();

/* Free up memory used for stack sampling. This might be very significant for
long-running evaluations, so we shouldn't hog too much memory. */
callCount.clear();
}

void SampleStack::saveProfile()
{
auto os = std::ostringstream{};
for (auto & [stack, count] : callCount) {
auto first = true;
for (auto & pos : stack) {
if (first)
first = false;
else
os << ";";

std::visit([&](auto && info) { info.symbolize(state, os, posCache); }, pos);
}
os << " " << count;
writeLine(profileFd.get(), std::move(os).str());
/* Clear ostringstream. */
os.str("");
os.clear();
}
}

SampleStack::~SampleStack()
{
/* Guard against cases when we are already unwinding the stack. */
try {
saveProfile();
} catch (...) {
ignoreExceptionInDestructor();
}
}

} // namespace

ref<EvalProfiler>
makeSampleStackProfiler(const EvalState & state, std::filesystem::path profileFile, uint64_t frequency)
{
/* 0 is a special value for sampling stack after each call. */
std::chrono::nanoseconds period = frequency == 0
? std::chrono::nanoseconds{0}
: std::chrono::nanoseconds{std::nano::den / frequency / std::nano::num};
return make_ref<SampleStack>(state, profileFile, period);
}

}
Loading
Loading