NODE_LLAMA_CPP_GPU=false ignored — GPU detection bypasses env var #426

@TioGlo

Description

Problem

QMD's ensureLlama() in llm.js calls getLlamaGpuTypes() and tries to use CUDA/Vulkan/Metal regardless of the NODE_LLAMA_CPP_GPU environment variable. On systems without a GPU (or without the CUDA toolkit installed), getLlamaGpuTypes() can still report ["cuda", "vulkan", false] because node-llama-cpp checks for prebuilt binaries rather than actual CUDA installation.

This causes a full cmake build attempt on every qmd invocation, which fails noisily:

-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Error at llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt:258 (message):
  CUDA Toolkit not found
-- Configuring incomplete, errors occurred!
ERR! OMG Process terminated: 1

[node-llama-cpp] Failed to build llama.cpp with CUDA support.
QMD Warning: cuda reported available but failed to initialize. Falling back to CPU.

This happens twice per invocation (the cmake error block appears twice in output), adding significant latency and noise before falling back to CPU anyway.

Expected Behavior

Setting NODE_LLAMA_CPP_GPU=false (which node-llama-cpp itself recognizes as a valid "off" value) should cause QMD to skip GPU detection entirely and go straight to CPU mode.

Root Cause

In dist/llm.js, the ensureLlama() method (around line 247) does its own GPU detection:

const gpuTypes = await getLlamaGpuTypes();
const preferred = ["cuda", "metal", "vulkan"].find(g => gpuTypes.includes(g));

This bypasses the NODE_LLAMA_CPP_GPU env var that node-llama-cpp's own config system respects. The comment in the code explains why: gpu:"auto" was returning false even when CUDA was available. But this workaround creates the inverse problem — it forces CUDA attempts on systems that explicitly opt out.

Suggested Fix

Check NODE_LLAMA_CPP_GPU before running GPU type detection, and skip the detection call entirely when the user has opted out:

// Respect the NODE_LLAMA_CPP_GPU env var before probing for GPUs
const gpuEnv = process.env.NODE_LLAMA_CPP_GPU;
const gpuDisabled = gpuEnv && ["false", "off", "none", "disable", "disabled"].includes(gpuEnv.toLowerCase());
const gpuTypes = gpuDisabled ? [] : await getLlamaGpuTypes();
const preferred = ["cuda", "metal", "vulkan"].find(g => gpuTypes.includes(g));

This preserves the existing workaround for systems where gpu:"auto" incorrectly returns false, while allowing users to explicitly disable GPU via the standard env var.
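If it helps during review, the "off" check above can be factored into a small standalone helper. This is a sketch: the helper name and the exact list of accepted "off" values are assumptions drawn from this issue, not verified against node-llama-cpp's own env-var parser.

```javascript
// Hypothetical helper for the env-var check described above.
// The set of "off" values is an assumption based on this issue report,
// not confirmed against node-llama-cpp's source.
const GPU_OFF_VALUES = new Set(["false", "off", "none", "disable", "disabled"]);

function isGpuExplicitlyDisabled(env = process.env) {
  const raw = env.NODE_LLAMA_CPP_GPU;
  // Only a string value can opt out; unset or non-string means "no opinion".
  return typeof raw === "string" && GPU_OFF_VALUES.has(raw.trim().toLowerCase());
}
```

With a helper like this, `ensureLlama()` would call `getLlamaGpuTypes()` only when `isGpuExplicitlyDisabled()` returns false, keeping the opt-out check in one place if other call sites need it later.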

Environment

  • QMD version: 1.0.6
  • node-llama-cpp: bundled with QMD
  • OS: Ubuntu (Linux 6.17.0-14-generic x64)
  • Node: v22.22.0
  • GPU: None (CPU-only system)
  • getLlamaGpuTypes() returns: ["cuda", "vulkan", false] despite no CUDA toolkit installed
