From the Mar 31 NemoClaw Livestream — Models, runtimes, and Nemotron‑specific behavior
Answered by zNeill, Apr 2, 2026
Ollama is convenient and great for quick starts—simple install and a single command to pull and run models. For maximum performance and tighter control on Spark, vLLM or TRT‑LLM/NIM are the recommended paths; that’s where we’re focusing optimizations for Nemotron (NVFP4, multi‑token prediction, long context). Use Ollama to get going, then move to vLLM/TRT‑LLM when you care more about throughput and latency.
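To make the two paths concrete, here is a minimal CLI sketch of the quick-start-then-migrate workflow described above. The model tags and flags are assumptions for illustration — check the Ollama model library and the vLLM/Nemotron docs for the exact identifiers and supported quantization options on your hardware.

```shell
# Quick start with Ollama: one command pulls the model (if needed)
# and drops you into an interactive chat.
# (Model tag is an assumption; look up the current Nemotron tag in the Ollama library.)
ollama run nemotron

# When throughput and latency matter, serve the same model family with
# vLLM's OpenAI-compatible server instead. The Hugging Face model ID and
# the context-length flag below are illustrative assumptions.
vllm serve nvidia/Llama-3.1-Nemotron-70B-Instruct-HF \
  --max-model-len 32768
```

The vLLM server speaks the OpenAI-compatible API, so client code written against the Ollama OpenAI endpoint can usually be pointed at vLLM by changing only the base URL and model name.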