Skip to content
This repository was archived by the owner on Jan 27, 2026. It is now read-only.

Fix: gpu detection when multiple GPUs exist#275

Merged
JannikSt merged 8 commits into
developfrom
fix/gpu-detection
Apr 18, 2025
Merged

Fix: gpu detection when multiple GPUs exist#275
JannikSt merged 8 commits into
developfrom
fix/gpu-detection

Conversation

@JannikSt
Copy link
Copy Markdown
Member

No description provided.

@JannikSt JannikSt requested a review from Copilot April 18, 2025 10:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors GPU detection and handling to support systems with multiple GPUs by replacing a boolean GPU flag with a more detailed GPU specifications structure. Key changes include:

  • Replacing the boolean has_gpu flag with an Option throughout the Docker service and CLI layers.
  • Updating the DockerManager to use the GPU specifications for device requests.
  • Modifying GPU detection logic to collect and select multiple GPU devices.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
worker/src/docker/service.rs Switched from a boolean flag to an Option for GPU usage.
worker/src/docker/docker_manager.rs Updated the start_container parameter and host configuration to use GPU specs.
worker/src/cli/command.rs Adjusted CLI command construction to reflect the new GPU model.
worker/src/checks/hardware/hardware_check.rs Changed GPU collection to select the main GPU from multiple GPUs.
worker/src/checks/hardware/gpu.rs Redesigned GPU detection to aggregate multiple GPU devices.
shared/src/models/node.rs Added an optional indices field to the GpuSpecs model.
discovery/src/api/routes/node.rs Updated test fixtures to include GPU indices.
Comments suppressed due to low confidence (2)

worker/src/docker/docker_manager.rs:193

  • [nitpick] Consider renaming the variable 'gpu' to 'gpu_specs' for clarity, as it represents detailed GPU specifications rather than a simple boolean flag.
let host_config = if gpu.is_some() {

worker/src/docker/docker_manager.rs:200

  • [nitpick] Instead of using unwrap, consider using pattern matching (e.g., if let Some(gpu_specs) = gpu) to safely access 'indices'. This avoids any potential panic and makes the control flow clearer if 'indices' is unexpectedly None.
device_ids: gpu.unwrap().indices.map(|indices| indices.iter().map(|i| i.to_string()).collect()),

Comment on lines +86 to +106
let name = device.name().unwrap_or_else(|_| "Unknown".to_string());
let memory = device.memory_info().map(|m| m.total).unwrap_or(0);
let driver_version = nvml
.sys_driver_version()
.unwrap_or_else(|_| "Unknown".to_string());

GpuDevice::Available {
name,
memory,
driver_version,
device_count,
if let Some(existing_device) = device_map.get_mut(&name) {
existing_device.count += 1;
existing_device.indices.push(i as u32);
} else {
device_map.insert(
name.clone(),
GpuDevice {
name,
memory,
driver_version,
count: 1,
indices: vec![i as u32],
},
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked up a few SMI examples and it seems like the "name" field is usually unique to a product, but also this doesn't seem to be guaranteed, e.g. with VRAM modded cards (that might get more popular in the future) it could cause errors not to check the VRAM available. It might be worthwhile appending the memory as a string to the name and using that as a key.

@JannikSt JannikSt merged commit 99725ee into develop Apr 18, 2025
1 check passed
JannikSt added a commit that referenced this pull request Apr 21, 2025
*use GPU type with highest count

* mount specific GPU idx and not all
JannikSt added a commit that referenced this pull request Apr 21, 2025
*use GPU type with highest count

* mount specific GPU idx and not all
@JannikSt JannikSt deleted the fix/gpu-detection branch June 25, 2025 13:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants