
@jackbondpreston-arm
Contributor

Introduce spdlog for logging in Gloo.

Motivation

  1. Reduces maintenance burden:
    1. Logging can be relatively complex, and users/developers may want to introduce additional complexity over time to suit their needs. spdlog is a widely used library, and so already has the majority of features that may be required.
    2. Introducing spdlog enables removing a chunk of the Gloo codebase that exists to address an already commonly solved problem.
  2. spdlog brings many extra features that Gloo logging lacks, "for free". spdlog supports many configurations and tunables (including environment variables):
    1. Log levels can easily be configured.
    2. Formatting structure/style can easily be configured (with fmtlib/std::format-style formatting).
    3. Has good support for concurrency - thread safe loggers, async loggers, customisable flushing policy, etc.
    4. Supports many different styles of logging sinks, which we can easily enable where demand exists.
  3. Allows using an already battle-tested implementation:
    1. spdlog contains its own unit tests (>150 cases). If, going forwards, the complexity of Gloo’s logging grows, tests will have to be written for complex scenarios such as concurrent logging. By using spdlog, Gloo doesn't have to worry so much about implementing logging correctly.
    2. spdlog has already been deployed in production codebases at scale, so all the features that we want to make use of in the future have also gone through extensive real-world testing.
    3. spdlog is a well-known library with a familiar interface for many programmers.

Implementation

Details of the implementation are included in the respective commit descriptions. There are 4 commits, building incrementally on top of each other.

However, a key overview is:

  1. spdlog is not a downstream dependency
    • Gloo consumers (whether users of Gloo, or developers building against Gloo) will not have to download and install spdlog.
    • If Gloo is built statically, spdlog is used as a header-only library. If Gloo is built as a shared library, spdlog is built statically and linked in.
    • The logging header (which includes spdlog headers) is internal only and is not included by any public headers.
  2. spdlog is included as a submodule
    • This allows developers to easily grab the source code without extra steps. This matches approaches from other projects, like PyTorch.
    • spdlog is initially included at commit 6fa3601 (v1.15.3).
  3. All existing log messages within Gloo (not including those in examples, benchmarks, etc.) are modified to use spdlog, and fmt-style formatting.

@meta-cla meta-cla bot added the CLA Signed label Oct 17, 2025
Member

@d4l3k d4l3k left a comment

LGTM, this looks good! Let me run some additional checks on it to make sure there are no conflicts.

    set(BUILD_SHARED_LIBS ${_save_BUILD_SHARED_LIBS})

    if(GLOO_STATIC_OR_SHARED STREQUAL "STATIC")
      list(APPEND gloo_ALL_TARGETS_DEPENDENCY_LIBS $<BUILD_INTERFACE:spdlog::spdlog_header_only>)
Member

nice

Comment on lines +25 to +32
    spdlog::set_level(spdlog::level::warn);
    /* Override log level if the environment variable is set. */
    spdlog::cfg::load_env_levels("GLOO_LOG_LEVEL");

    /* Set custom format. This is similar to PyTorch's format. Equivalent to:
     * [{short-log-level}{month-num}{day-of-month} {hour}:{min}:{sec}.{microsec}
     *  {threadid} {source_file}:{line_no}] {MESSAGE} */
    spdlog::set_pattern("[%L%m%d %H:%M:%S.%f %t %s:%#] %v");
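To make the pattern above concrete, here is a small standalone C++ sketch that lays out one record the way "[%L%m%d %H:%M:%S.%f %t %s:%#] %v" would. It does not use spdlog: SampleRecord and render are hypothetical names, the field values in the usage note are invented for illustration, and the sketch assumes spdlog's usual meanings for the flags (%L short level letter, %f zero-padded microseconds, %t thread id, %s source file basename, %# source line, %v message).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical fields of one log record. In spdlog, the level, timestamp and
// thread id come from the logger; file and line come from the SPDLOG_* call site.
struct SampleRecord {
  char short_level;          // %L, e.g. 'W' for warn
  int month, day;            // %m %d, zero-padded to 2 digits
  int hour, minute, second;  // %H:%M:%S, zero-padded to 2 digits
  long microsecond;          // %f, zero-padded to 6 digits
  unsigned long thread_id;   // %t
  const char* file;          // %s, basename of the source file
  int line;                  // %#
  const char* message;       // %v
};

// Renders a record with the same layout as "[%L%m%d %H:%M:%S.%f %t %s:%#] %v".
std::string render(const SampleRecord& r) {
  char buf[512];
  std::snprintf(buf, sizeof(buf), "[%c%02d%02d %02d:%02d:%02d.%06ld %lu %s:%d] %s",
                r.short_level, r.month, r.day, r.hour, r.minute, r.second,
                r.microsecond, r.thread_id, r.file, r.line, r.message);
  return buf;
}
```

For example, a warn-level record with month 10, day 21, time 14:03:07 plus 42 microseconds, thread id 12345, file pair.cc, line 88 and message "hello" renders as "[W1021 14:03:07.000042 12345 pair.cc:88] hello".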
Member

Is there any way to create a non-global, Gloo-specific logger? Just thinking about potential conflicts with other libraries.

Contributor Author

I'm not sure if I understand what you mean, sorry. This logger is specific to Gloo, in that no other consumer of Gloo can use it - the log header is not public, and spdlog is built into Gloo at compile time as a private dependency. If we were using a shared spdlog library, I guess we would have the opportunity to share loggers with other co-existing libraries using the same version of spdlog (which has advantages and disadvantages), but I imagine it's moot as we don't want to introduce a dependency there.

     * usual
     */
    inline float format_as(float16 v) {
      return cpu_half2float(v);
Member

Does this cause any issues with PyTorch? Just wondering about header conflicts for rules like this.

Contributor Author

I think this shouldn't have the potential to cause any conflicts - both float16 and this format_as impl are in the gloo namespace.
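This reasoning rests on argument-dependent lookup (ADL): fmt, which spdlog bundles, finds a format_as overload for a user type via ADL, so an overload declared in the gloo namespace is only considered for Gloo's own types. Below is a minimal standalone sketch of that lookup without fmt. gloo_sketch::float16, half_to_float, other_lib and to_formattable are hypothetical names; to_formattable only mimics the unqualified call a formatting library makes and is not fmt's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <string>

namespace gloo_sketch {

// Hypothetical stand-in for gloo::float16: raw IEEE 754 binary16 bits.
struct float16 {
  std::uint16_t bits;
};

// Stand-in for cpu_half2float: decodes binary16 into a float.
inline float half_to_float(float16 v) {
  const std::uint32_t sign = (v.bits >> 15) & 0x1u;
  const std::uint32_t exponent = (v.bits >> 10) & 0x1Fu;
  const std::uint32_t mantissa = v.bits & 0x3FFu;
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Infinity or NaN.
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400u),
                           static_cast<int>(exponent) - 25);
  }
  return sign ? -magnitude : magnitude;
}

// Declared next to float16, so ADL finds it only for gloo_sketch::float16.
inline float format_as(float16 v) { return half_to_float(v); }

}  // namespace gloo_sketch

namespace other_lib {

// Another library's half type with its own format_as; it does not clash
// with the overload above because lookup follows the argument's namespace.
struct float16 {
  std::uint16_t bits;
};

inline std::string format_as(float16) { return "other_lib::float16"; }

}  // namespace other_lib

// Mimics a formatting library's customization point: an unqualified call
// resolved by argument-dependent lookup when the template is instantiated.
template <typename T>
auto to_formattable(const T& value) -> decltype(format_as(value)) {
  return format_as(value);
}
```

With this, to_formattable(gloo_sketch::float16{0x3C00}) yields 1.0f, while to_formattable(other_lib::float16{0x3C00}) picks the other library's overload and yields "other_lib::float16". Each overload is only visible for its own namespace's type.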

@jackbondpreston-arm
Contributor Author

jackbondpreston-arm commented Oct 21, 2025

I've pushed a new version containing the CI update to enable fetching the submodule. This should hopefully resolve the CI build failures.

I would say the super-linter failure can be ignored. It fails only on Markdown, and only on Markdown that was not added or changed in this PR (it seems to lint and enforce rules on the whole file even if only a small section changes). It also doesn't seem to suggest changes that we would even want to make, e.g.:

  /github/workspace/README.md:1 MD041/first-line-heading/first-line-h1 First line in a file should be a top-level heading [Context: "<p align="center">"]
  /github/workspace/README.md:154 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]

Also modifies the GitHub CI config to ensure the checkout action also checks
out the submodule.
spdlog is included and built from source, using the third_party/spdlog
submodule.

When Gloo is built as a static library, spdlog is used in its
header-only library form. When Gloo is built as a shared library, spdlog
is built statically and included.

A key goal is to make spdlog a dependency internal to Gloo only. Gloo
users should not have to install spdlog themselves.

This commit does not add any usages of spdlog in Gloo, just the build
infrastructure to use it.

Change Gloo logging macros to use SPDLOG_* macros for logging. This
change is largely mechanical - changing the structure of the logging
calls to use fmtlib::fmt-style formatting. For example:
    GLOO_DEBUG("thingOne=", thingOne, " thingTwo=", thingTwo);
becomes:
    GLOO_DEBUG("thingOne={} thingTwo={}", thingOne, thingTwo);
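A toy illustration of that mechanical change, using only the standard library: concat_style mimics the old style of concatenating arguments, and brace_style replaces each {} with the next argument. Both names are hypothetical, and brace_style is a deliberately simplified stand-in for fmt: it ignores format specs such as {:x} and does no compile-time checking of the format string.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Converts any streamable value to text.
template <typename T>
std::string to_text(const T& value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Old style: every argument is streamed one after another.
template <typename... Args>
std::string concat_style(const Args&... args) {
  std::ostringstream out;
  (out << ... << args);  // C++17 fold expression
  return out.str();
}

// New style (toy): each "{}" is replaced by the next argument, in order.
// Unlike fmt, there are no format specs and no compile-time checking.
template <typename... Args>
std::string brace_style(const std::string& format, const Args&... args) {
  const std::vector<std::string> values{to_text(args)...};
  std::string result;
  std::size_t next = 0;
  for (std::size_t i = 0; i < format.size(); ++i) {
    const bool placeholder = format[i] == '{' && i + 1 < format.size() &&
                             format[i + 1] == '}' && next < values.size();
    if (placeholder) {
      result += values[next++];
      ++i;  // skip the closing brace
    } else {
      result += format[i];
    }
  }
  return result;
}

}  // namespace sketch
```

With thingOne = 1 and thingTwo = "abc", both concat_style("thingOne=", thingOne, " thingTwo=", thingTwo) and brace_style("thingOne={} thingTwo={}", thingOne, thingTwo) produce "thingOne=1 thingTwo=abc". The brace form keeps the message text readable in one piece, with the arguments listed after it.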

However, there are also some less mechanical changes:
- Since we don't want to expose spdlog as a public dependency of Gloo,
   the logging infrastructure is moved into a different header
   (gloo/common/log.h). This header is not included in GLOO_HDRS, so it
   is not installed. It should only ever be included by .cc files, or
   non-public headers.
- Due to the previous change, the GLOO_ENFORCE_* macros are moved into
   a separate header (gloo/common/enforce.h). These are used often in
   public header files, and are not inherently tied to the logging, so
   this made the most sense.
- gloo/common/log.h (effectively renamed from logging.h) no longer needs
   a .cc file, as it just passes GLOO_* macros through to the SPDLOG_*
   macros.
- Gloo has no init function that must be called first, and can be built
   as a static library (precluding something like
   __attribute__((constructor))). For this reason, all logger calls are
   prefixed with a std::call_once, which ensures the logger init/config is
   performed before the first use of the logger.
- gloo/transport/tcp/debug_logger.{h,cc} are deleted, and replaced with
   a custom fmt formatter implementation in gloo/transport/debug_data.h.
- Wherever lists of includes were touched as part of this change, they
   were reformatted/fixed (e.g. X.h -> cX, alphabetical re-ordering,
   changing gloo includes from <> to "").
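The std::call_once approach described in the list above can be sketched as follows. This is a minimal header-only illustration, assuming the initialization lives in an inline function so that log.h needs no .cc file. gloo_sketch, configure_logger, ensure_logger_initialized and SKETCH_LOG are hypothetical names, and configure_logger merely counts its calls where the real code would set the default level, apply GLOO_LOG_LEVEL and install the pattern.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <string>

namespace gloo_sketch {

// Counts how often the (hypothetical) configuration step has run.
inline int& init_count() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for the logger setup (level, env override, pattern).
inline void configure_logger() { ++init_count(); }

// Runs configure_logger() exactly once, even if the first log calls race
// on several threads; every later call returns without doing anything.
inline void ensure_logger_initialized() {
  static std::once_flag flag;
  std::call_once(flag, configure_logger);
}

}  // namespace gloo_sketch

// Hypothetical logging macro: ensure the logger is configured, then emit the
// message (plain fprintf here, where the real macros forward to SPDLOG_*).
#define SKETCH_LOG(msg)                                      \
  do {                                                       \
    ::gloo_sketch::ensure_logger_initialized();              \
    std::fprintf(stderr, "%s\n", std::string(msg).c_str());  \
  } while (0)
```

Calling SKETCH_LOG twice runs configure_logger only once. std::call_once gives the same guarantee when the first calls come from several threads at once; on older toolchains, linking with -pthread may be required for it to work.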
@jackbondpreston-arm
Contributor Author

Rebased + resolved conflict

