Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 4 additions & 8 deletions csrc/nv_internal/tensorrt_llm/deep_gemm/compiler.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ std::vector<std::filesystem::path> getJitIncludeDirs() {
static std::vector<std::filesystem::path> includeDirs;
if (includeDirs.empty()) {
// Command to execute
char const* cmd = "pip show tensorrt_llm 2>/dev/null";
char const* cmd = "pip show flashinfer-python 2>/dev/null";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this command?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the DeepGEMM JIT, it needs the header files in deep_gemm/, this command finds the installation path which is then used further down to add the deep_gemm/ to the -I

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to move the logic to python, pip show flashinfer-python doesn't necessarily show the correct package information (e.g. at AOT time when the package is not installed yet).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can obtain the include path from python and pass the value to C++.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is where a refactor might be necessary, unfortunately these deep_gemm kernels aren't captured as part of AOT.


// Buffer to store the output
std::array<char, 128> buffer;
Expand Down Expand Up @@ -174,15 +174,11 @@ std::vector<std::filesystem::path> getJitIncludeDirs() {
location.erase(location.find_last_not_of(" \n\r\t") + 1);

// Set the include directory based on the package location
includeDirs.push_back(std::filesystem::path(location) / "tensorrt_llm" / "include");

if (!kJitUseNvcc) {
includeDirs.push_back(std::filesystem::path(location) / "tensorrt_llm" / "include" /
"cuda" / "include");
}
includeDirs.push_back(std::filesystem::path(location) / "flashinfer" / "data" / "csrc" /
"nv_internal" / "tensorrt_llm");
}
} else {
TLLM_LOG_WARNING("Failed to find TensorRT LLM installation, DeepGEMM will be disabled.");
TLLM_LOG_WARNING("Failed to find FlashInfer installation, DeepGEMM will be disabled.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can safely assume flashinfer is installed if this function is called?

}
}
return includeDirs;
Expand Down
9 changes: 7 additions & 2 deletions csrc/nv_internal/tensorrt_llm/deep_gemm/runtime.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,13 @@ static bool kJitDebugging = []() {
}();

static bool kJitUseNvcc = []() {
char const* env_var = getenv("TRTLLM_DG_JIT_USE_NVCC");
return env_var && (std::string(env_var) == "1" || std::string(env_var) == "true");
// char const* env_var = getenv("TRTLLM_DG_JIT_USE_NVCC");
// return env_var && (std::string(env_var) == "1" || std::string(env_var) == "true");
// always use nvcc
// TODO: Enable nvrtc -- need these headers:
// [TensorRT-LLM][INFO] Compilation log:
// kernel.cu(16): catastrophic error: cannot open source file "cuda_bf16.h"
return true;
}();
Comment on lines 38 to 46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The kJitUseNvcc variable is now hardcoded to true, and the previous implementation that used an environment variable is commented out. While the TODO comment explains why NVRTC is currently disabled, leaving commented-out code can reduce readability and maintainability. It would be cleaner to remove the commented-out lines and refine the comments to keep only the essential context.

static bool kJitUseNvcc = []() {
  // NVRTC is currently disabled due to include issues. Using NVCC by default.
  // TODO: Enable NVRTC. It fails with errors like:
  // [TensorRT-LLM][INFO] Compilation log:
  // kernel.cu(16): catastrophic error: cannot open source file "cuda_bf16.h"
  return true;
}();


static bool kJitDumpCubin = []() {
Expand Down