Improve DLL/shared library stability by replacing static initializers #16717
gianmichelebertoni started this conversation in Ideas
Hi llama.cpp team,
I'm writing to propose a couple of small changes to improve the library's stability when it is loaded as a DLL/shared library, particularly from non-C++ host environments such as the JVM (via JNA/JNI) or Python (via ctypes).
The Problem: Unreliable Static Initialization in Foreign Hosts
When a C++ DLL is loaded into a "foreign" process (like java.exe), the C++ Runtime's automatic static initialization mechanism can be unreliable. This is a known issue in complex plugin architectures.
The result is that global static objects can be left in an uninitialized state. For llama.cpp, this can lead to crashes when, for example, a static std::mutex is locked before its constructor has run. I have verified this behavior by loading the library from a JVM environment, which results in a memory access violation.
Proposed Solutions
I've identified a couple of areas where the problematic implicit static initialization can be replaced with more robust, explicit patterns that work reliably in any environment.
Case 1: Lazy, Thread-Safe Initialization with std::call_once
Location: ggml/src/ggml-cuda/ggml-cuda.cu lines 3848-3853
Current Code:
Proposed Change:
Rationale: This pattern moves initialization from the unreliable library load time to a thread-safe, on-demand call the first time the resource is needed. It completely bypasses the static initialization problem.
Case 2: Safely Initializing a Global Lock
A static std::mutex is a common point of failure. A more robust approach is to use a function-local static, which is guaranteed to be initialized in a thread-safe manner since C++11 (the "Construct on First Use Idiom").
Location: ggml/src/ggml-threading.cpp
Current Code:
Proposed Change:
Rationale: This modern C++ idiom ensures the mutex is constructed correctly, exactly once, the first time it is needed. It is clean, efficient, and completely avoids the global static initialization issue. For older compilers, std::call_once could also be used to initialize a static pointer to the mutex. The Windows-specific code is needed because of a known bug in the thread-safe statics implementation of some CRTs bundled with, for example, recent JVMs (the bugfix came with VS2022 17.10).
DeepWiki problem analysis:
You're experiencing crashes due to static initialization order fiasco (SIOF) across DLL boundaries, specifically with static std::mutex objects in ggml-threading.cpp and ggml-cuda.cu.
Problem Analysis
The crashes occur at these locations:
ggml-threading.cpp: The global ggml_critical_section_mutex (ggml-threading.cpp:4) is a static mutex that may not be initialized when accessed across DLL boundaries (ggml-threading.cpp:6-8).
ggml-cuda.cu: Multiple static mutexes are used in backend registration functions:
ggml_cuda_lock (ggml-cuda.cu:526)
Static mutex in ggml_backend_cuda_buffer_type() (ggml-cuda.cu:734-735)
Static mutex in ggml_backend_cuda_split_buffer_type() (ggml-cuda.cu:1051-1052)
Static mutex in ggml_backend_cuda_reg() (ggml-cuda.cu:3851-3852)
Similar patterns exist in other backends such as CANN (ggml-cann.cpp:2972-2973), SYCL (ggml-sycl.cpp:4575-4576), and OpenCL (ggml-opencl.cpp:4022-4025).
Thanks for considering these suggestions. I believe they would make llama.cpp even more robust and versatile.
Best regards,
GM