feat: kernel hub introduction draft #2777
Conversation
Nice, looking great! I did a quick early pass, feel free to ping again when you want!
Nice! But it's too wide I think; it will be cropped at the sides, possibly hiding part of the title. The recommended aspect ratio is 2:1.
thanks! updated to be 2:1 in the latest commits
Reminder that we also have to add an entry to `_blog.yml` when you are ready to submit.
oh thanks for the tip, added an entry in the latest commit (and will make sure to bump when the article is ready)
thumbnail: /blog/assets/hello-hf-kernels/kernel-hub-five-mins-short.png
authors:
- user: drbh
date: 2025-03-28
Date goes in `_blog.yml` using a format like "March 28, 2025".
thanks! updated in the latest commits
hello-hf-kernels.md (Outdated)

# 🏎️ Learn the Hugging Face Kernel Hub in 5 Minutes

**Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**
Suggested change:
- **Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**
+ **Boost your model performance with pre-optimized kernels, easily loaded from the Hub.**

Maybe, for simplification?
thanks! updated in the latest commits
hello-hf-kernels.md (Outdated)

**Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**

Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub aims to simplify this dramatically.
Suggested change:
- Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub aims to simplify this dramatically.
+ Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically!
oh this is better, updated in latest commit
hello-hf-kernels.md (Outdated)

expected = torch.tensor(
    [
        [0.1100, 2.1309, -0.0700, 0.6802],
        [-0.0500, 0.4800, -0.1700, -0.1700],
        [0.3701, -0.1300, -0.0800, -0.1200],
        [-0.0400, 0.1200, -0.1500, 1.7998],
    ],
    dtype=torch.float16,
    device=DEVICE,
)
Perhaps an alternative could be to retrieve the reference results from PyTorch's gelu?
yea agreed that is a better example, updated in latest commit
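For reference, the suggested check can boil down to a sketch like this (the kernel entry point name `gelu_fast` is an assumption here; the actual name and signature depend on how the repo on the Hub is structured):

~~~python
import torch
import torch.nn.functional as F
from kernels import get_kernel

# Download the optimized activation kernels from the Hub (cached after the first call)
activation = get_kernel("kernels-community/activation")

x = torch.randn(16, 1024, dtype=torch.float16, device="cuda")
out = torch.empty_like(x)

# Assumed entry point: an in-place op taking (output, input); check the repo for the real API
activation.gelu_fast(out, x)

# Use PyTorch's own GELU (tanh approximation) as the reference instead of hard-coded values
expected = F.gelu(x, approximate="tanh")
torch.testing.assert_close(out, expected, rtol=1e-2, atol=1e-2)
print("Kernel output matches PyTorch's GELU reference")
~~~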
hello-hf-kernels.md (Outdated)

## 2. How to Use the Kernel Hub (Basic Example)

Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example loading an optimized GELU activation function kernel (we'll use a different kernel for the main example later).
Suggested change:
- Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example loading an optimized GELU activation function kernel (we'll use a different kernel for the main example later).
+ Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example that loads an optimized GELU activation function kernel. (Later on, we'll see another example about how to integrate a kernel in our model).
thanks this reads better, updated in latest
**Important Notes on the `KernelModel`:**
* **Kernel Inheritance:** The `KernelRMSNorm` class inherits from `layer_norm_kernel_module.layers.LlamaRMSNorm`, which is the RMSNorm implementation in the kernel. This allows us to use the optimized kernel directly.
* **Accessing the Function:** The exact way to access the RMSNorm function (`layer_norm_kernel_module.layers.LlamaRMSNorm.forward`, `layer_norm_kernel_module.rms_norm_forward`, or something else) **depends entirely on how the kernel creator structured the repository on the Hub.** You may need to inspect the loaded `layer_norm_kernel_module` object (e.g., using `dir()`) or check the kernel's documentation on the Hub to find the correct function/method and its signature. I've used `rms_norm_forward` as a plausible placeholder and added error handling.
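A quick sketch of that inspection step, for example (the output varies per kernel repository; the `layers` attribute is how this particular kernel is laid out):

~~~python
from kernels import get_kernel

layer_norm_kernel_module = get_kernel("kernels-community/triton-layer-norm")

# Print the public attributes of the loaded module to discover its functions/classes
print([name for name in dir(layer_norm_kernel_module) if not name.startswith("_")])

# In this kernel, layer classes live under `.layers` (e.g. LlamaRMSNorm),
# but the layout is entirely up to the kernel author.
if hasattr(layer_norm_kernel_module, "layers"):
    print([name for name in dir(layer_norm_kernel_module.layers) if not name.startswith("_")])
~~~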
Would be nice if we can point to some kernel documentation (in the kernel's model card in the Hub) by the time this is published :) This could encourage others to adopt some common structure for kernel description / docs.
agreed! currently there is an effort to generate some useful docs, started here: huggingface/kernel-builder#89. However, this is still a work in progress and should be updated before publishing
TODO
- improve docs across all existing examples (probably autogen)
hello-hf-kernels.md (Outdated)

from snippet2 import BaselineModel
from snippet3 import KernelModel
We should introduce the script name before each snippet, I think.
good point, updated to have meaningful names and use them in the scripts in latest
# Download optimized activation kernels from the Hub
# This fetches the kernel code if not already cached
activation_kernels = get_kernel("kernels-community/activation")
Super cool! Would something like this (different kernel) be automatically resolved? Do we want to talk (in a later section) about what happens if there's no match?
hello-hf-kernels.md (Outdated)

### Benefits of the Kernel Hub:

* **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware (like NVIDIA GPUs) without local compilation hassles.
Suggested change:
- * **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware (like NVIDIA GPUs) without local compilation hassles.
+ * **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware starting with NVIDIA and AMD GPUs, without local compilation hassles.
thanks! updated in the latest commits
hello-hf-kernels.md (Outdated)

~~~bash
pip install kernels torch numpy
~~~

Ensure you have a compatible PyTorch version and CUDA installed if using GPU kernels.
Can we make this hardware agnostic for AMD?
good catch, I've updated the phrasing to avoid "CUDA" in the latest commit
hello-hf-kernels.md (Outdated)

## 1. What is the Kernel Hub?

The [Kernel Hub](https://huggingface.co/kernels) (👈 Check it out!) allows Python libraries and applications to **load optimized compute kernels directly from the Hugging Face Hub**. Think of it like the Model Hub, but for low-level, high-performance code snippets (kernels) that accelerate specific operations, often on GPUs. Examples include optimized attention mechanisms (like FlashAttention), activation functions, and normalization layers (like LayerNorm or RMSNorm).
I think it would be better to mention some challenging kernels here. Activation and normalization kernels are usually pretty good in frameworks. Maybe attention mechanisms, quantizers, and Mixture of Experts layers?
good point, updated to include some more impactful/useful examples. thanks!
hello-hf-kernels.md (Outdated)

# Ensure you have a CUDA-enabled device
if not torch.cuda.is_available():
    raise RuntimeError("This example requires a CUDA-enabled GPU")
Let me upload the activation kernel for ROCm as well. I think the example is stronger if we can show something that works with both CUDA and ROCm.
Built, running validation tests now...
All tests pass.
wooo amazing, thank you!
hello-hf-kernels.md (Outdated)

if not torch.cuda.is_available():
    raise RuntimeError("This example requires a CUDA-enabled GPU")
I think the Triton kernel should also work with ROCm? Worth trying.
awesome, thanks for building/testing! removed `torch.cuda..` in the latest commit
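For anyone who still wants an explicit check, a hardware-agnostic sketch could look like this (PyTorch's ROCm builds expose the same `torch.cuda` API, so one check covers both NVIDIA and AMD GPUs; the post itself simply drops the check):

~~~python
import torch

# torch.cuda.is_available() is True on both CUDA (NVIDIA) and ROCm (AMD) builds of PyTorch
if not torch.cuda.is_available():
    raise RuntimeError("This example requires a GPU (NVIDIA or AMD)")

# torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds
backend = "ROCm" if torch.version.hip else "CUDA"
print(f"Running on {backend}: {torch.cuda.get_device_name()}")
~~~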
hello-hf-kernels.md (Outdated)

layer_norm_kernel_module = get_kernel("kernels-community/triton-layer-norm")


class KernelRMSNorm(layer_norm_kernel_module.layers.LlamaRMSNorm):
    def __init__(self, hidden_size, variance_epsilon=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = variance_epsilon
We want people to use `@use_kernel_forward_from_hub` to annotate the Torch class and then register `LlamaRMSNorm` using a mapping. See: https://github.com/huggingface/kernels/blob/main/docs/layers.md

Using `@use_kernel_forward_from_hub` enables people to make layers that are (dynamically) extensible with kernels, people can replace kernels, etc.
ah yea great point! I've updated the code to prefer adding `@use_kernel_forward_from_hub("LlamaRMSNorm")` to the `RMSNorm` defined in the reference example (and added some descriptive comments).
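For context, the pattern described in docs/layers.md looks roughly like this (a sketch; the registration helpers are taken from those docs and may evolve, so treat the exact names as assumptions):

~~~python
import torch
import torch.nn as nn
from kernels import use_kernel_forward_from_hub, register_kernel_mapping, LayerRepository

# Annotate the plain PyTorch reference implementation; its forward can then be
# swapped for a Hub kernel without changing the model code itself.
@use_kernel_forward_from_hub("LlamaRMSNorm")
class RMSNorm(nn.Module):
    def __init__(self, hidden_size, variance_epsilon=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = variance_epsilon

    def forward(self, hidden_states):
        # Reference RMSNorm, used when no kernel mapping is registered
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states

# Map the annotated layer to a kernel from the Hub (this can also be done by the end user)
register_kernel_mapping(
    {
        "LlamaRMSNorm": {
            "cuda": LayerRepository(
                repo_id="kernels-community/triton-layer-norm",
                layer_name="LlamaRMSNorm",
            )
        }
    }
)
~~~

The model definition itself stays plain PyTorch; enabling, swapping, or disabling Hub kernels becomes purely a mapping concern.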
hello-hf-kernels.md (Outdated)

):
    super().__init__()
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.norm = KernelRMSNorm(hidden_size, variance_epsilon=eps)
With `@use_kernel_forward_from_hub`, you don't need this. The model doesn't need any change to use kernels; the model writer or the user can map kernels externally.
this has been updated in the latest commit along with the larger change to prefer using the `use_kernel_forward_from_hub` decorator in the example. thanks!
I was so curious about kernels, great work 🤗
* **Simplify Deployment**: Reduce the complexity of your deployment environment by fetching kernels on demand.
* **Develop and Share Your Own Kernels**: If you create optimized kernels, you can easily share them on the Hub for others to use. This encourages collaboration and knowledge sharing within the community.

> As many machine learning developers know, managing dependencies and building low-level code from source can be a time-consuming and error-prone process. The Kernel Hub aims to simplify this by providing a centralized repository of optimized compute kernels that can be easily loaded and run.
this is a quote format btw!
ahh thanks, updated in latest changes
hello-hf-kernels.md (Outdated)

Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example that loads an optimized GELU activation function kernel. (Later on, we'll see another example about how to integrate a kernel in our model).

File: `activation_validation_example.py`
I'd add a link to the file so it's easy for people to directly check
great point, I've made all of the files gists and added links to them! thanks
hello-hf-kernels.md (Outdated)

## 4. Review Performance Impact

Does using the optimized Triton RMSNorm kernel provide a speedup compared to the basic PyTorch version? Let's benchmark the forward pass again.
Suggested change:
- Does using the optimized Triton RMSNorm kernel provide a speedup compared to the basic PyTorch version? Let's benchmark the forward pass again.
+ Does optimized Triton RMSNorm kernel speeds up compared to the kernel in basic PyTorch? Let's benchmark the forward pass again.

sentence felt a bit hard to read, rephrased (if you feel like it)
good catch, thanks for the suggestion, I ended up rewriting that part to:

> 4. Benchmarking the Performance Impact
>
> How much faster is the optimized Triton RMSNorm kernel compared to the standard PyTorch version? Let's benchmark the forward pass to find out.
>
> File: `rmsnorm_benchmark.py`
>
> ...
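For context, a benchmark like this usually reduces to a timing loop of the following shape (an illustrative sketch; `baseline_model` / `kernel_model` stand in for the models from the earlier snippets):

~~~python
import time
import torch

def benchmark(module, x, warmup=10, iters=100):
    """Average forward-pass time in seconds, with GPU synchronization."""
    with torch.no_grad():
        for _ in range(warmup):  # warm up: kernel compilation, autotuning, caches
            module(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            module(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Hypothetical usage, reusing the models defined in the earlier snippets:
# x = torch.randn(4096, 2048, device="cuda", dtype=torch.float16)
# print(f"baseline: {benchmark(baseline_model, x) * 1e3:.3f} ms")
# print(f"kernel:   {benchmark(kernel_model, x) * 1e3:.3f} ms")
~~~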
* Potential overhead for small inputs.

Actual results will depend on your hardware and the specific kernel implementation. Here's an example of what you might see (on an L4 GPU):
thank you! fixed to be a table in the latest changes 🙏
292fba4 to 3d58ec1
Co-authored-by: Merve Noyan <[email protected]>
…s/core contributors and syntax edits
Nicely done! left some suggestions for readability and to hook the devs in a bit more overall!
date: 2025-03-28
---

# 🏎️ Learn the Hugging Face Kernel Hub in 5 Minutes
Smol suggestion:

Suggested change:
- # 🏎️ Learn the Hugging Face Kernel Hub in 5 Minutes
+ # 🏎️ Boost your model performance with high performance via Hugging Face Kernels hub

Just a bit more descriptive title (for an average person not familiar with kernels as much, it might not be as descriptive) - basically repurposed the line below the title.
feel free to ignore the suggestion and make something better, it's just for reference.
**Boost your model performance with pre-optimized kernels, easily loaded from the Hub.**

Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically!
it might be helpful here to put a small code snippet or a benchmark about a kernel from the hub, or some notion of how it'd look from a dev PoV - this usually acts as a good hook and helps the reader visualise what the rest of the blog will allude to.
This could also serve as a nice TL;DR for the blog post.
The [Kernel Hub](https://huggingface.co/kernels-community) (👈 Check it out!) allows Python libraries and applications to **load optimized compute kernels directly from the Hugging Face Hub**. Think of it like the Model Hub, but for low-level, high-performance code snippets (kernels) that accelerate specific operations, often on GPUs.

Examples include advanced attention mechanisms (like [FlashAttention](https://huggingface.co/kernels-community/flash-attn) for dramatic speedups and memory savings). Custom [quantization kernels](https://huggingface.co/kernels-community/quantization) (enabling efficient computation with lower-precision data types like INT8 or INT4). Specialized kernels required for complex architectures like [Mixture of Experts (MoE) layers](https://huggingface.co/kernels-community/moe), which involve intricate routing and computation patterns. As well as [activation functions](https://huggingface.co/kernels-community/activation), and [normalization layers (like LayerNorm or RMSNorm)](https://huggingface.co/kernels-community/triton-layer-norm).
Nice, not sure if this is intended, but some of the kernels have build errors on their README, for ex: https://huggingface.co/kernels-community/flash-attn
3. **Adding a Kernel to a Simple Model** - A practical integration using RMSNorm.
4. **Reviewing Performance Impact** - Benchmarking the RMSNorm difference.

We'll introduce these concepts quickly – the core idea can be grasped in about 5 minutes (though experimenting and benchmarking might take a bit longer!).
Somewhere around here it might be good to mention that we are actually using this in Transformers and TGI with maybe a code pointer.
This would convey the fact that this work is already used in production and integrated in downstream libraries. (or at least allude to it)
Instead of manually managing complex dependencies, wrestling with compilation flags, or building libraries like Triton or CUTLASS from source, you can use the `kernels` library to instantly fetch and run pre-compiled, optimized kernels.
I think you can also give an example here of FA2/FA3 building from source vs using pre-compiled, in terms of time taken / compute required etc.

This would make for a good cementing factor.
4. **Benchmark:** Measure the performance impact on your specific hardware and workload. Don't forget to check for numerical correctness (`torch.testing.assert_close`).

5. **(Advanced) Contribute:** If you develop optimized kernels, consider sharing them on the Hub!
Maybe here open an issue in the kernels repo / kernels org on the Hub so that people can request some kernels as well?
On second thought, an issue / discussion on the kernels org on the Hub would be even better.
This PR is an early draft for an introduction to the kernel hub

TODO
- `kernel-builder` to showcase kernel creation/publishing to the hub