models : Added support for RND1 Diffusion Language Model #17433
base: master
Co-authored-by: Sigbjørn Skjæret <[email protected]>
CISC left a comment:
Nice, thank you!
Please update or close issue #17291 with a comment.
I don't really want you to close the issue. However, if it is a correctness bug, as you mention there, then the outputs for RND1 will also suffer from the same issue. I'm not sure why you say it has nothing to do with RND1's implementation, since that is the same implementation as Qwen2. I would rather see that resolved before merging.
RND1 works on llama.cpp, can run:
Model Card
https://huggingface.co/radicalnumerics/RND1-Base-0910
Instructions
Results
Works on GB200, non-causal, BF16, 512 context len, 256 diffusion steps:
Works on H100, non-causal, BF16, 512 context len, 256 diffusion steps:
Works on Mac Studio M3 Ultra, non-causal, BF16, 512 context len, 256 diffusion steps: