
Use the latest configs in the Nemotron on Spark repo; they set the model, KV-cache, and runtime flags for long contexts. Allocate enough GPU memory to the KV cache, and keep in mind that running multiple large agents on one box limits the context available to each agent, because they all draw on the same memory. When in doubt, start with a single Nemotron 3 Super instance, verify context behavior, then scale out.
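To get a rough feel for the trade-off between agent count and per-agent context, here is a minimal back-of-the-envelope sketch. It uses the generic attention KV-cache estimate (keys plus values, per attention layer, per token) and assumes the KV budget is split evenly across agents. The function names and every number in it are hypothetical placeholders, not Nemotron 3 Super's real architecture values or anything taken from the repo configs, so plug in the figures from the model card and your own memory budget.

```python
def kv_bytes_per_token(attn_layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache bytes needed per token.

    The factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit cache dtype (an assumption).
    """
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem


def max_context_tokens(kv_budget_gib: int, num_agents: int,
                       per_token_bytes: int) -> int:
    """Tokens of context each agent gets if the KV budget is split evenly."""
    budget_bytes = kv_budget_gib * 1024**3
    return budget_bytes // (num_agents * per_token_bytes)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT the real model's parameters.
    per_token = kv_bytes_per_token(attn_layers=8, kv_heads=8, head_dim=128)
    for agents in (1, 2, 4):
        tokens = max_context_tokens(kv_budget_gib=32, num_agents=agents,
                                    per_token_bytes=per_token)
        print(f"{agents} agent(s): ~{tokens:,} tokens of context each")
```

With these placeholder numbers, going from one agent to four cuts the per-agent context to a quarter. Real serving stacks may allocate KV memory dynamically rather than in even shares, so treat this only as a sanity check before you verify actual context behavior on a single instance.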


zNeill
Apr 2, 2026
Collaborator Author

Answer selected by zNeill
Category
Q&A