1 parent 1fd553f commit 19edb47
docs/06-optimisation.md
@@ -490,7 +490,7 @@ The duration is `0.179 ms` and the effective bandwidth `697 GB/s`
- Specialised libraries are highly optimised
- Especially dense linear algebra (hipBLAS/cuBLAS) and FFTs.
-- Host-Device vs Device-Compute Unit bandwidth difference is order of 2 magnitudes
+- Host-Device vs Device-Compute Unit bandwidth difference is 2 orders of magnitude
- Keep data in registers
- But there is a finite number of registers!
- Neighbouring threads access neighbouring memory locations