Skip to content

add mutex per interpreter per thread #698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Vipul-Cariappa
Copy link
Collaborator

@Vipul-Cariappa Vipul-Cariappa commented Aug 11, 2025

Big changes in this PR. Some parts of the explanation might be like a conversation.

Mutex lock per interpreter.

Question: Do we need a per-interpreter lock?
Answer: Yes. Ideally, that would improve the performance in multi-threaded cases.

Question: Does the InterOp API fully support dispatching queries (some kind of lookup) or compiling code across multiple interpreters at the same time?
Answer: Actually, (I guess) No. Our API only supports stack-like access of multiple interpreters.
Example code that will not work:

auto interpreter_1 = Cpp::CreateInterpreter();
Cpp::Compile(R"(
void f() {}
)");
auto f = Cpp::GetNamed("f");

auto interpreter_2 = Cpp::CreateInterpreter();
Cpp::Compile(R"(
void ff() {}
)");

auto ff = Cpp::GetNamed("ff");

auto f_callable = Cpp::MakeFunctionCallable(f); // This fails because:
// MakeFunctionCallable uses the interpreter instance 2 at line
// https://github.com/compiler-research/CppInterOp/blob/dba60ff3d829551b1c86128593ca817af78d67dc/lib/CppInterOp/CppInterOp.cpp#L3148-L3150

But you can make the above example work, if the user rotates the sInterpreters.
Example:

auto interpreter_1 = Cpp::CreateInterpreter();
Cpp::Compile(R"(
void f() {}
)");
auto f = Cpp::GetNamed("f");

auto interpreter_2 = Cpp::CreateInterpreter();
Cpp::Compile(R"(
void ff() {}
)");

auto ff = Cpp::GetNamed("ff");

Cpp::ActivateInterpreter(interpreter_1); // look at https://github.com/compiler-research/CppInterOp/blob/dba60ff3d829551b1c86128593ca817af78d67dc/lib/CppInterOp/CppInterOp.cpp#L3273-L3287
auto f_callable = Cpp::MakeFunctionCallable(f);

Question: Ok. So is there a way for InterOp is maintain the rotation thing?
Answer: Would not be a easy thing to achieve. From my initial view on the matter, it might be possible. But a more important question; Should we support such usecases?

Question: Ok. So given the limitation of using multiple interpreter but only as a stack-access. Do we need a mutex lock per interpreter?
Answer: No, the user can only access the top most interpreter at a time. We don't need a mutex per interpreter. (Let me know if my reasoning is wrong somewhere)

Testing this by running in parallel

gtest does not support running in parallel (I could not find anything with my google search). But there is this project, gtest-parallel, that can split the tasks into separate isolated processes. But that is not what we want. We want to split the tests to run in parallel threads that share the same interpreter instance.

Code recovery RAII

@vgvassilev, I need some pointers on incorporating the codegen error recovery RAII. I don't think we should combine the two into the same class, but let me know your opinion. And how should I go about doing it?

@Vipul-Cariappa Vipul-Cariappa marked this pull request as draft August 11, 2025 12:51
Copy link

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 85.50725% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.29%. Comparing base (e5a3480) to head (7659fcb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
lib/CppInterOp/CppInterOp.cpp 85.50% 10 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #698      +/-   ##
==========================================
+ Coverage   79.74%   80.29%   +0.55%     
==========================================
  Files           9        9              
  Lines        3955     4081     +126     
==========================================
+ Hits         3154     3277     +123     
- Misses        801      804       +3     
Files with missing lines Coverage Δ
lib/CppInterOp/CppInterOp.cpp 88.43% <85.50%> (+0.54%) ⬆️

... and 1 file with indirect coverage changes

Files with missing lines Coverage Δ
lib/CppInterOp/CppInterOp.cpp 88.43% <85.50%> (+0.54%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 10 out of 13. Check the log or trigger a new build to see more.

namespace {
struct LockRAII {
LockRAII(std::mutex& resource_lock, std::mutex& thread_id_lock,
std::optional<std::thread::id>& current_thread_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::optional" is directly included [misc-include-cleaner]

lib/CppInterOp/CppInterOp.cpp:40:

- #if CLANG_VERSION_MAJOR >= 19
+ #include <optional>
+ #if CLANG_VERSION_MAJOR >= 19

namespace {
struct LockRAII {
LockRAII(std::mutex& resource_lock, std::mutex& thread_id_lock,
std::optional<std::thread::id>& current_thread_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::thread" is directly included [misc-include-cleaner]

lib/CppInterOp/CppInterOp.cpp:40:

- #if CLANG_VERSION_MAJOR >= 19
+ #include <thread>
+ #if CLANG_VERSION_MAJOR >= 19

ThreadIdLock.lock();
if (!ResourceLock.try_lock()) {
if (CurrentThreadId) {
if ((*CurrentThreadId) != std::this_thread::get_id()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::this_thread::get_id" is directly included [misc-include-cleaner]

        if ((*CurrentThreadId) != std::this_thread::get_id()) {
                                                    ^

if ((*CurrentThreadId) != std::this_thread::get_id()) {
ThreadIdLock.unlock();
ResourceLock.lock(); // blocking
bool res = ThreadIdLock.try_lock(); // should be free
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to 'res' during its initialization is never read [clang-analyzer-deadcode.DeadStores]

          bool res = ThreadIdLock.try_lock(); // should be free
               ^
Additional context

lib/CppInterOp/CppInterOp.cpp:106: Value stored to 'res' during its initialization is never read

          bool res = ThreadIdLock.try_lock(); // should be free
               ^

if ((*CurrentThreadId) != std::this_thread::get_id()) {
ThreadIdLock.unlock();
ResourceLock.lock(); // blocking
bool res = ThreadIdLock.try_lock(); // should be free
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: unused variable 'res' [clang-diagnostic-unused-variable]

          bool res = ThreadIdLock.try_lock(); // should be free
               ^

if (!owned)
return;
ThreadIdLock.lock();
CurrentThreadId = std::nullopt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::nullopt" is directly included [misc-include-cleaner]

    CurrentThreadId = std::nullopt;
                           ^

@@ -125,8 +193,17 @@

// std::deque avoids relocations and calling the dtor of InterpreterInfo.
static llvm::ManagedStatic<std::deque<InterpreterInfo>> sInterpreters;
static std::mutex InterpreterStackLock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'InterpreterStackLock' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]

static std::mutex InterpreterStackLock;
                  ^

@@ -448,8 +540,9 @@
return nullptr;
}

std::vector<TCppScope_t> GetEnumConstants(TCppScope_t handle) {
auto* D = (clang::Decl*)handle;
std::vector<TCppScope_t> GetEnumConstants(TCppScope_t scope) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::vector" is directly included [misc-include-cleaner]

lib/CppInterOp/CppInterOp.cpp:40:

- #if CLANG_VERSION_MAJOR >= 19
+ #include <vector>
+ #if CLANG_VERSION_MAJOR >= 19

auto* D = (clang::Decl*)handle;
std::vector<TCppScope_t> GetEnumConstants(TCppScope_t scope) {
LOCK(getInterpInfo());
auto* D = (clang::Decl*)scope;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  auto* D = (clang::Decl*)scope;
            ^

return nullptr;

auto* D = (clang::Decl*)handle;
auto* D = (clang::Decl*)scope;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  auto* D = (clang::Decl*)scope;
            ^

Comment on lines 96 to 127
struct LockRAII {
LockRAII(std::mutex& resource_lock, std::mutex& thread_id_lock,
std::optional<std::thread::id>& current_thread_id)
: ResourceLock(resource_lock), ThreadIdLock(thread_id_lock),
CurrentThreadId(current_thread_id) {
ThreadIdLock.lock();
if (!ResourceLock.try_lock()) {
if (CurrentThreadId) {
if ((*CurrentThreadId) != std::this_thread::get_id()) {
ThreadIdLock.unlock();
ResourceLock.lock(); // blocking
bool res = ThreadIdLock.try_lock(); // should be free
assert(res && "Internal Error: Interpreter lock not held, but "
"ThreadId lock held.\nPlease report this as a bug.\n");
ThreadIdLock.unlock();
return;
}
owned = false;
ThreadIdLock.unlock();
return;
}
assert(false &&
"Internal Error: A thread holds lock for the current "
"interpreter, but no information available on which thread "
"holds the lock.\nPlease report this as a bug.\n");
}
assert(!CurrentThreadId &&
"Internal Error: Lock released but thread id not cleared.\nPlease "
"report this as a bug.\n");
CurrentThreadId = std::this_thread::get_id();
ThreadIdLock.unlock();
}
Copy link
Collaborator Author

@Vipul-Cariappa Vipul-Cariappa Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This complicated code allows us to lock the same mutex multiple times, provided it is the same thread.
Example:

void foo() {
  RAII lock(mutex);
  clang::WriteToAST();
}

void bar() {
  RAII lock(mutex);
  clang::WriteToAST_2();
  foo(); // foo will try to lock mutex that is already locked. blocking the thread completely.
}

int main() { bar(); }

With this change, we check which thread is trying to lock the mutex, and if this thread previously locked the mutex, then it is safe to use without the second lock.

@Vipul-Cariappa Vipul-Cariappa marked this pull request as ready for review August 11, 2025 13:29
Comment on lines 3380 to 3390
if (GpuArgs.empty())
I->declare(R"(
namespace __internal_CppInterOp {
template <typename Signature>
struct function;
template <typename Res, typename... ArgTypes>
struct function<Res(ArgTypes...)> {
typedef Res result_type;
};
} // namespace __internal_CppInterOp
)");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: Did not mean to commit this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I remember asking why in the non-GPU version... Please move the fix into another separate PR.

@@ -643,7 +643,7 @@ CPPINTEROP_API TCppType_t GetCanonicalType(TCppType_t type);

/// Used to either get the built-in type of the provided string, or
/// use the name to lookup the actual type.
CPPINTEROP_API TCppType_t GetType(const std::string& type);
CPPINTEROP_API TCppType_t GetType(const std::string& name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these seemingly unrelated changes in a separate PR?

@@ -224,11 +301,18 @@ std::string Demangle(const std::string& mangled_name) {
return demangle;
}

void EnableDebugOutput(bool value /* =true*/) { llvm::DebugFlag = value; }
void EnableDebugOutput(bool value /* =true*/) {
LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a different mental model for this. I was thinking that we can have a StateMutatingSection RAII object which in turn has a mutex-based locking (if multithreading was enabled) and a 'PushTransactionRAII` (for pch/modules). The object should take a parameter if the change is in clang AST which might trigger a derserialiazation or llvm-only. In the latter case we can skip pushing a transaction.

There will be cases where we need one of the two concepts and not both. We need to design a flexible solution addressing these needs without having to introduce both concepts at the same time when they are not needed.

@@ -125,8 +193,17 @@ struct InterpreterInfo {

// std::deque avoids relocations and calling the dtor of InterpreterInfo.
static llvm::ManagedStatic<std::deque<InterpreterInfo>> sInterpreters;
static std::mutex InterpreterStackLock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No statics. This needs to go in the InterpreterInfo class as we will lock per interpreter. That won't be sufficient if the interfaces are changing the llvm global state as done in EnableDebugOutput. Fortunately the llvm global state is not a lot and maybe we can figure out how to address that on a later stage...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the lock for sInterpreters. We will need to lock this sInterpreters stack whenever we access it. This should be a global lock. Cpp::CreateInterpreter pushes the Cpp::Interpreter into sInterpreters. And if two threads call Cpp::CreateInterpreter at the same time, then the insertion should be atomic.

Similarly, we can have another global LLVM lock to take care of the EnableDebugOutput cases.

Any comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- did not realize that...

if (auto* CXXRD = llvm::dyn_cast_or_null<CXXRecordDecl>(D))
return CXXRD->hasDefaultConstructor();

return false;
}

TCppFunction_t GetDefaultConstructor(compat::Interpreter& interp,
TCppScope_t scope) {
static TCppFunction_t GetDefaultConstructor(compat::Interpreter& interp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be a separate PR.

if (result.empty())
Cpp::Declare("#include <utility>");
{
LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock should be done in Declare. The next 2-3 lines should not be changing the compiler state (unless modules are involved)...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should lock the AST when we do any named lookups. What if some other thread is writing to the AST at the same time? These things should be atomic. Either we finish the read and perform the write, or finish the write and then perform the read. We cannot be writing and reading the symbol table at the same time.
But say we want to check the number of arguments of a function, that would not change once the function is parsed. So we can read that data without any locks.

DeclarationName DMove = &getASTContext().Idents.get("move");
auto result = getSema().getStdNamespace()->lookup(DMove);
if (result.empty())
Cpp::Declare("#include <utility>");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should go in the special runtime header we will include in the interpreter initialization time... Which in turn will end up forcing us to implement our our std::move-like construct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this in the same PR, but as a different commit. We can rebase and merge.

The thing is: We need to compute the path to this initialization file/header. And this should succeed with make install and without the install step too. So developers can use it without the install step.

Let me know if you want a different PR, or if you are fine with me doing it in this PR as a different commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty similar to the clang resouce header files, eg. stddef.h. It is okay to require such a runtime header file -- we can delay that and hardcode it but ideally still needs to happen at the initialization time. The real challenge is testing when/if this goes to llvm -- we cannot have a runtime that depends on stl.

Let's move that discussion in a separate PR.

Comment on lines 3380 to 3390
if (GpuArgs.empty())
I->declare(R"(
namespace __internal_CppInterOp {
template <typename Signature>
struct function;
template <typename Res, typename... ArgTypes>
struct function<Res(ArgTypes...)> {
typedef Res result_type;
};
} // namespace __internal_CppInterOp
)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I remember asking why in the non-GPU version... Please move the fix into another separate PR.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@@ -125,8 +133,17 @@ struct InterpreterInfo {

// std::deque avoids relocations and calling the dtor of InterpreterInfo.
static llvm::ManagedStatic<std::deque<InterpreterInfo>> sInterpreters;
static std::mutex InterpreterStackLock;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'sInterpreters' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]

static llvm::ManagedStatic<std::deque<InterpreterInfo>> sInterpreters;
                                                        ^

@@ -125,8 +133,17 @@

// std::deque avoids relocations and calling the dtor of InterpreterInfo.
static llvm::ManagedStatic<std::deque<InterpreterInfo>> sInterpreters;
static std::mutex InterpreterStackLock;

static InterpreterInfo& getInterpInfo() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'InterpreterStackLock' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]

static std::mutex InterpreterStackLock;
                  ^

@@ -350,8 +379,10 @@

bool IsAbstract(TCppType_t klass) {
auto* D = (clang::Decl*)klass;
if (auto* CXXRD = llvm::dyn_cast_or_null<clang::CXXRecordDecl>(D))
if (auto* CXXRD = llvm::dyn_cast_or_null<clang::CXXRecordDecl>(D)) {
LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  auto* D = (clang::Decl*)klass;
            ^

@@ -451,6 +483,7 @@
std::vector<TCppScope_t> GetEnumConstants(TCppScope_t handle) {
auto* D = (clang::Decl*)handle;

LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  auto* D = (clang::Decl*)handle;
            ^

@@ -822,6 +861,9 @@
return -1;
CXXRecordDecl* DCXXRD = cast<CXXRecordDecl>(DD);
CXXRecordDecl* BCXXRD = cast<CXXRecordDecl>(BD);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use auto when initializing with a template cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
auto* DCXXRD = cast<CXXRecordDecl>(DD);

@@ -822,6 +861,9 @@
return -1;
CXXRecordDecl* DCXXRD = cast<CXXRecordDecl>(DD);
CXXRecordDecl* BCXXRD = cast<CXXRecordDecl>(BD);

LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use auto when initializing with a template cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
LOCK(getInterpInfo());
auto* BCXXRD = cast<CXXRecordDecl>(BD);

@@ -893,6 +936,7 @@
bool HasDefaultConstructor(TCppScope_t scope) {
auto* D = (clang::Decl*)scope;

LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use C-style cast to convert between unrelated types [cppcoreguidelines-pro-type-cstyle-cast]

  auto* D = (clang::Decl*)scope;
            ^

@@ -1862,6 +1927,7 @@
if (!builtin.isNull())
return builtin.getAsOpaquePtr();

LOCK(getInterpInfo());
auto* D = (Decl*)GetNamed(name, /* Within= */ 0);
if (auto* TD = llvm::dyn_cast_or_null<TypeDecl>(D)) {
return QualType(TD->getTypeForDecl(), 0).getAsOpaquePtr();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: argument name 'Within' in comment does not match parameter name 'parent' [bugprone-argument-comment]

  auto* D = (Decl*)GetNamed(name, /* Within= */ 0);
                                  ^
Additional context

include/CppInterOp/CppInterOp.h:411: 'parent' declared here

                                    TCppScope_t parent = nullptr);
                                                ^

lib/CppInterOp/CppInterOp.cpp:718: actual callee ('GetNamed') is declared here

TCppScope_t GetNamed(const std::string& name,
            ^

@@ -3863,6 +3973,7 @@

void GetOperator(TCppScope_t scope, Operator op,
std::vector<TCppFunction_t>& operators, OperatorArity kind) {
LOCK(getInterpInfo());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: The value '4294967295' provided to the cast expression is not in the valid range of values for 'OperatorArity' [clang-analyzer-optin.core.EnumCastOutOfRange]

  return (OperatorArity)~0U;
         ^
Additional context

include/CppInterOp/CppInterOp.h:94: enum declared here

enum OperatorArity : unsigned char { kUnary = 1, kBinary, kBoth };
     ^

lib/CppInterOp/CppInterOp.cpp:3957: Assuming 'D' is not a 'CastReturnType'

  if (auto* FD = llvm::dyn_cast<FunctionDecl>(D)) {
                 ^

lib/CppInterOp/CppInterOp.cpp:3957: 'FD' is null

  if (auto* FD = llvm::dyn_cast<FunctionDecl>(D)) {
            ^

lib/CppInterOp/CppInterOp.cpp:3957: Taking false branch

  if (auto* FD = llvm::dyn_cast<FunctionDecl>(D)) {
  ^

lib/CppInterOp/CppInterOp.cpp:3975: The value '4294967295' provided to the cast expression is not in the valid range of values for 'OperatorArity'

  return (OperatorArity)~0U;
         ^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants