Commit eee97d7

authored
Update index.md
Minor changes
1 parent dd4c11c commit eee97d7

1 file changed: +6 -5 lines changed

docs/index.md

@@ -513,7 +513,7 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<h3><a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-serif visited:text-teal-700">CPU Free Model</a><h3>
+<h3><a href="https://github.com/ParCoreLab/CPU-Free-model" class="text-xl font-semibold font-serif visited:text-teal-700">CPU-Free Execution Model</a><h3>
 </div>
 <p class="text-lg">This project introduces a fully autonomous execution model for multi-GPU applications, eliminating CPU involvement beyond initial kernel launch. In conventional setups, the CPU orchestrates execution, causing overhead. We propose delegating this control flow entirely to devices, leveraging techniques like persistent kernels and device-initiated communication. Our CPU-free model significantly reduces communication overhead. Demonstrations on 2D/3D Jacobi stencil and Conjugate Gradient solvers show up to a 58.8% improvement in communication latency and a 1.63x speedup for CG on 8 NVIDIA A100 GPUs compared to CPU-controlled baselines.</p>
 <p>
@@ -545,7 +545,7 @@ BeyondMoore Software Ecosystem
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
 <img width="32" src="./assets/git.webp" />
-<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-serif visited:text-teal-700">Multi-GPU Callbacks</a>
+<a href="https://github.com/msasongko17/multigpu_callback" class="text-xl font-semibold font-serif visited:text-teal-700">GPU to CPU Callbacks</a>
 </div>
 <p class="text-lg">To address resource underutilization in multi-GPU systems, particularly in irregular applications, we propose a GPU-sided resource allocation method. This method dynamically adjusts the number of GPUs in use based on workload changes, utilizing GPU-to-CPU callbacks to request additional devices during kernel execution. We implemented and tested multiple callback methods, measuring their overheads on Nvidia and AMD platforms. Demonstrating the approach in an irregular application like Breadth-First Search (BFS), we achieved a 15.7% reduction in time to solution on average, with callback overheads as low as 6.50 microseconds on AMD and 4.83 microseconds on Nvidia. Additionally, the model can reduce total device usage by up to 35%, improving energy efficiency.</p>
 <p>
@@ -558,7 +558,6 @@ BeyondMoore Software Ecosystem
 </div>


-
 <div id="UNICOM" class="bg-gray-100 rounded-s p-4 border-solid border-1 border-gray-200 flex flex-row justify-center items-center gap-5 transform transition-all hover:shadow-gray-100 hover:shadow-lg shadow-none">
 <div clas="flex flex-col justify-start">
 <div class="flex flex-row gap-2 justify-start items-center flex-shrink">
@@ -568,7 +567,7 @@ BeyondMoore Software Ecosystem
 <p class="text-lg">We're undertaking the design of an API for a unified communication library to streamline device-to-device communication within the CPU-free model by aiming to optimize communication efficiency across diverse devices. More details about the project will be available soon. The related paper is under preparation.</p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
-<img width="500px" src="./assets/network-topo.png" />
+<img width="300px" src="./assets/network-topo.png" />
 </div>
 </div>

@@ -612,7 +611,9 @@ BeyondMoore Software Ecosystem
 <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">Precise Event Sampling</a>
 </div>
 <p class="text-lg">
-Precise event sampling, a profiling feature in commodity processors, accurately pinpoints instructions triggering hardware events. While widely utilized, support from vendors varies, impacting accuracy, stability, overhead, and functionality. Our study benchmarks Intel PEBS and AMD IBS, revealing PEBS's finer-grained accuracy and IBS's richer information but lower stability. PEBS incurs lower time overhead, while IBS suffers from accuracy issues. OS signal delivery adds significant time overhead. Both PEBS and IBS exhibit sampling bias. Our findings hold in a full-fledged profiling tool on modern Intel and AMD machines. This comparison offers valuable insights for hardware designers and profiling tool developers. <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">All the artifacts and benchmarks can be found.</a>
+Precise event sampling, a profiling feature in commodity processors, accurately pinpoints instructions triggering hardware events. While widely utilized, support from vendors varies, impacting accuracy, stability, overhead, and functionality. Our study benchmarks Intel PEBS and AMD IBS, revealing PEBS's finer-grained accuracy and IBS's richer information but lower stability. PEBS incurs lower time overhead, while IBS suffers from accuracy issues. OS signal delivery adds significant time overhead. Both PEBS and IBS exhibit sampling bias. Our findings hold in a full-fledged profiling tool on modern Intel and AMD machines. This comparison offers valuable insights for hardware designers and profiling tool developers.
+
+All the artifacts and benchmarks can be found <a href="https://github.com/ParCoreLab/PES-artifact" class="text-xl font-semibold font-serif visited:text-teal-700">here.</a>
 </p>
 </div>
 <div class="grid h-[100%] justify-center place-items-center">
