MTP (multi-token prediction) is now merged in upstream llama.cpp for Qwen models. How could MTP be used to replace a separate small draft model for speculative decoding, and could it improve on that approach's drafting performance?