Developer companion repo for working with NVIDIA's Nemotron models: inference, fine-tuning, agents, visual reasoning, deployment.
nemotron/
│
├── usage-cookbook/ Usage cookbooks (how to deploy, and simple model usage guides)
│
│
└── use-case-examples/ Examples of leveraging Nemotron Models in Agentic Workflows and more
NVIDIA Nemotron™ is a family of open, high-efficiency models with fully transparent training data, weights, and recipes.
Nemotron models are designed for agentic AI workflows — they excel at coding, math, scientific reasoning, tool calling, instruction following, and visual reasoning (for the VL models).
They are optimized for deployment across a spectrum of compute tiers (edge, single GPU, data center) and support frameworks like NeMo and TensorRT-LLM, vLLM, and SGLang, with NIM microservice options for scalable serving.
- Usage Cookbook - Practical deployment and simple model usage guides for Nemotron models
- Use Case Examples - Practical use-case examples and apps (more coming soon)
Have an idea for improving Nemotron models? Visit the Nemotron Ideas Portal to:
- 🗳️ Vote on existing feature requests
- 💭 Submit your own ideas and suggestions
- 📊 See what the community is requesting
Your feedback helps shape the future of Nemotron models!
Full, reproducible training pipelines will be included in the nemotron package at src/nemotron/recipes/.
- 🗂️ Data Curation - Scripts to prepare training data using NVIDIA-NeMo/Curator
- 🔁 Training - Complete training loops with hyperparameters using:
- NVIDIA-NeMo/Megatron-Bridge for Megatron models
- NVIDIA-NeMo/Automodel for HuggingFace models
- NVIDIA-NeMo/NeMo-RL when RL is needed
- 📊 Evaluation - Benchmark evaluation on standard suites using NVIDIA-NeMo/Evaluator
- 📖 Documentation - Detailed explanations of each stage
Learn how to deploy and use the models through an API.
| Model | Best For | Key Features | Trade-offs | Resources |
|---|---|---|---|---|
| Llama-3.3-Nemotron-Super-49B-v1.5 | Production deployments needing strong reasoning with efficiency | • 128K context • Single H200 GPU • RAG & tool calling • Optimized via NAS |
Balances accuracy & throughput | 📁 Cookbooks |
| NVIDIA-Nemotron-Nano-9B-v2 | Resource-constrained environments needing flexible reasoning | • 9B params • Hybrid Mamba-2 architecture • Controllable reasoning traces • Unified reasoning/non-reasoning |
Smaller model with configurable reasoning | 📁 Cookbooks |
| NVIDIA-Nemotron-Nano-12B-v2-VL | Document intelligence and video understanding | • 12B VLM • Video & multi-image reasoning • Controllable reasoning (/think mode) • Efficient Video Sampling (EVS) |
Vision-language with configurable reasoning | 📁 Cookbooks |
| Llama-3.1-Nemotron-Safety-Guard-8B-v3 | Multilingual content moderation with cultural nuance | • 9 languages • 23 safety categories • Cultural sensitivity • NeMo Guardrails integration |
Focused on safety/moderation tasks | 📁 Cookbooks |
| Nemotron-Parse (link coming soon!) | Document parsing for RAG and AI agents | • VLM for document parsing • Table extraction (LaTeX) • Semantic segmentation • Spatial grounding (bbox) |
Specialized for document structure | 📁 Cookbooks |
Below is an outline of the end-to-end use case examples provided in the use-case-examples directory. These scenarios demonstrate practical applications that go beyond basic model inference.
-
Agentic Workflows
Orchestration of multi-step AI agents, integrating planning, context management, and external tools/APIs. -
Retrieval-Augmented Generation (RAG) Systems
Building pipelines that combine retrieval components (vector databases, search APIs) with Nemotron models for grounded, accurate outputs. -
Integration with External Tools & APIs
Examples of Nemotron models powering applications with structured tool calling, function execution, or data enrichment. -
Production-Ready Application Patterns
Architectures supporting scalability, monitoring, data pipelines, and real-world deployment considerations.
See the
use-case-examples/subfolders for in-depth, runnable examples illustrating these concepts.
We welcome contributions! Whether it's examples, recipes, or other tools you'd find useful.
Please read our Contributing Guidelines before submitting pull requests.
- Contributing Guidelines - How to contribute to this project
- Changelog - Version history and changes
Apache 2.0 License - see LICENSE file for details.
NVIDIA Nemotron - Open, transparent, and reproducible.