@aucahuasi

Modernizes node-rapids to RAPIDS 25.02, CUDA 12.8, and Ubuntu 24.04 with ARM64 (aarch64) support for GH200 Grace Hopper platforms.

Changes

  • Update to RAPIDS 25.02, CUDA 12.8, Ubuntu 24.04, Python 3.12
  • Add ARM64 (aarch64) support alongside x86_64
  • Update Arrow 9.0.0 to 19.0.0 (enable S3, Acero)
  • Update nvcomp 2.4.1 to 4.2.0.11 with ARM64 binaries
  • Update build system: cmake-js 7.3.1, node-gyp 10.2.0, CMake 3.30.5
  • Update TypeScript 4.5.5 to 5.3.3, Jest 26.5.3 to 29.7.0
  • Update @typescript-eslint 5.30.0 to 6.21.0 for TypeScript 5.3 compatibility
  • Update RMM bindings for RAPIDS 25.02 API changes (thrust::optional to std::optional, removed deprecated methods)
  • Remove BlazingSQL module (abandoned upstream)
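Supporting ARM64 alongside x86_64 usually means build scripts branch on the host architecture. The following is a minimal sketch of that idea, not code from this PR: `normalize_arch` is a hypothetical helper, and the labels and aliases it accepts are assumptions.

```shell
# Hypothetical helper: normalize an architecture name (default: `uname -m`)
# to one of two build labels. Labels and aliases are assumptions.
normalize_arch() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "x86_64" ;;
    aarch64|arm64) echo "aarch64" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

normalize_arch arm64   # prints: aarch64
```

A build script could call `normalize_arch` with no argument and use the result to select per-architecture artifacts, such as the ARM64 nvcomp binaries mentioned above.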

Testing

All Phase 1 modules (core, cuda, rmm) pass their tests on Ubuntu 24.04 (x86_64 and ARM64) with CUDA 12.8, Python 3.12, and Node.js 16.15.1.

Phase 2 (a separate PR) will address the cudf module, which has significant RAPIDS 25.02 API changes. Node.js was kept at 16.x for this phase; Phase 2 may target Node.js 20.x, depending on testing and compatibility requirements.

… - Phase 1

* Update to RAPIDS 25.02
* Update to CUDA 12.8, drop CUDA 11.x support
* Update to Ubuntu 24.04, drop Ubuntu 20.04
* Update to Python 3.12
* Add ARM64 (aarch64) support for GH200 Grace Hopper
* Update Arrow 9.0.0 to 19.0.0 (enable S3, Acero)
* Update nvcomp 2.4.1 to 4.2.0.11 (proprietary, ARM64 binaries)
* Update build system: cmake-js 7.3.1, node-gyp 10.2.0 for Python 3.12
* Update TypeScript 4.5.5 to 5.3.3
* Update @typescript-eslint packages 5.30.0 to 6.21.0 for TypeScript 5.3 compatibility
* Update Jest 26.5.3 to 29.7.0, ts-jest 26.5.3 to 29.2.5
* Update RMM bindings for RAPIDS 25.02 API changes
* Remove BlazingSQL module (abandoned upstream)
* Phase 1: core, cuda, rmm modules working with tests passing
@copy-pr-bot

copy-pr-bot bot commented Oct 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Collaborator

Does this update remove the blazingsql bindings? That makes me sad 😢

Author

I share your sadness about removing BlazingSQL. As you know, I was the first technical engineer and tech lead at BlazingSQL for years, and we collaborated on some great work together back in the early RAPIDS/cuDF days.

Since it's unmaintained upstream, I wasn't sure whether to try porting it or remove it. This PR is already complex with ARM64 + RAPIDS 25.02 + CUDA 12.8 + Ubuntu 24.04.

What's the RAPIDS team's policy: should unmaintained modules stay but be disabled, or is removal the right approach?

Collaborator

Yeah, I understand why you removed it. We'd been using a fork that I maintained and updated as necessary to match changes in libcudf. It wasn't difficult, but I imagine it would be a large amount of work to bring it up to this libcudf version.

  sudo apt install -y $APT_DEPS;
  if [ -n "$INSTALLED_CLANGD" ]; then
-   sudo update-alternatives --install /usr/bin/clangd clangd /usr/bin/clangd-17 100
+   sudo update-alternatives --install /usr/bin/clangd clangd /usr/bin/clangd-12 100
Collaborator

clangd-12 is incredibly old and likely broken for modern CUDA. clang-format-12 will also most likely regress from clang-format-17, which is not ideal. Is there a reason we're downgrading these?

Author

Good catch. This was left over from my old tests; I've updated it to Clang 18, which is compatible with Ubuntu 24.04.
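For illustration, here is a rough sketch of what the corrected registration might look like. It is not the code from the follow-up commit: `clangd_alternative_cmd` is a hypothetical helper, and the `/usr/bin/clangd-<version>` path is an assumption that mirrors the pattern in the hunk above.

```shell
# Hypothetical helper: print the update-alternatives command that registers
# a versioned clangd binary. The binary path is an assumption.
clangd_alternative_cmd() {
  version="$1"
  priority="${2:-100}"
  echo "sudo update-alternatives --install /usr/bin/clangd clangd /usr/bin/clangd-${version} ${priority}"
}

# Print the command for LLVM 18 so it can be reviewed before running it.
clangd_alternative_cmd 18
```

Printing the command instead of running it keeps the sketch free of side effects; a real setup script would run it directly once the package is installed.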

- Upgrade LLVM/Clang from version 17 to 18 (Ubuntu 24.04 default).
- Modernize Docker scripts from deprecated docker-compose to docker compose.
- Fix @types/minimatch dependency issue by pinning to 5.1.2.
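The move from `docker-compose` to `docker compose` can be sketched as a small detection step for scripts that must still run on hosts with only the legacy binary. This is an illustrative assumption about how such a script could look, not the PR's actual change; `compose_cmd` is a hypothetical helper.

```shell
# Hypothetical helper: print whichever Compose invocation is available.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"      # Compose v2, shipped as a Docker CLI plugin
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"      # legacy standalone binary
  else
    echo "no Docker Compose installation found" >&2
    return 1
  fi
}

# Usage (left commented so the sketch has no side effects):
#   $(compose_cmd) build
```

Preferring the plugin form first matches the direction of the commit, while the fallback only matters on older hosts.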
@aucahuasi marked this pull request as ready for review on October 31, 2025 05:37
@aucahuasi requested a review from trxcllnt on October 31, 2025 19:35
@aucahuasi
Author

I've successfully updated the toolchain (LLVM 18, GCC 13, sccache 0.10.0, Node.js 16.20.2) and tested the core, cuda, and rmm modules in the devel container. Everything builds and runs correctly with CUDA 12.8.

A couple of questions:

  1. Version bumping: Should external contributors handle version updates across the monorepo, or do you prefer to manage this during your release process?
  2. Packages workflow: The yarn docker:build:devel:packages command fails trying to access S3 credentials. Is this workflow meant for RAPIDS internal use only, or should external contributors be able to run it? If the latter, the package.Dockerfile would need to support local-only sccache (currently sccache 0.10.0 tries AWS autodiscovery even with empty S3 config).

For now, I've verified the modules build and work in the devel container. Let me know what else you need for review!
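As a sketch of what "local-only sccache" could mean in practice, the snippet below unsets the S3-related environment variables instead of leaving them empty and points sccache at a disk cache. The variable names are sccache's documented environment settings; `use_local_sccache`, the default directory, and the size cap are assumptions, and whether this avoids the AWS autodiscovery described above with sccache 0.10.0 has not been verified here.

```shell
# Sketch: drop S3-related sccache settings and use a local disk cache.
# Default directory and size cap are assumed values for illustration.
use_local_sccache() {
  unset SCCACHE_BUCKET SCCACHE_REGION SCCACHE_ENDPOINT SCCACHE_S3_KEY_PREFIX
  export SCCACHE_DIR="${1:-$HOME/.cache/sccache}"
  export SCCACHE_CACHE_SIZE="${2:-10G}"
}

use_local_sccache /tmp/sccache-local 5G
echo "sccache dir: ${SCCACHE_DIR}, size: ${SCCACHE_CACHE_SIZE}"
```

Unsetting rather than emptying the variables is deliberate: an empty `SCCACHE_BUCKET` may still be treated as a configured (if invalid) S3 backend, which would be consistent with the failure reported above.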

@trxcllnt
Collaborator

trxcllnt commented Oct 31, 2025

@aucahuasi thanks for this PR, I'll try to review it this weekend. Typically we'd update the RAPIDS version across all the projects, but I should be able to handle that. There are also some housekeeping tasks, like sccache, that I can tackle.

I'll push any updates to this branch if you don't mind, then approve and merge once I've double checked everything still works on my end.

@aucahuasi
Author

Thanks @trxcllnt! That's perfect! Please go ahead and push any updates to the branch. I appreciate you handling the version updates and sccache configuration.

For context on the Node.js 16.20.2 choice: it aligns with our production environment requirements for ARM64/GH200 integration work.

Let me know if you need any additional work/testing or info from my end!
