Serialize Executables crashing when compiling LLaMa on async-cpu #17244
It appears the issue is in
The unrolling is needed because the LLVM backend wants 1D vectors. The issue could be in tile size selection, and vector shape optimization can potentially help with it.
Some additional guilty lines:
If we defer the unrolling of the
Okay, so this is similar to what I'm seeing in #17226 (comment). IMO, we should not fuse these two generic ops; TileAndFuse is basically broken for this case. There are no dependencies captured by the operands. I'll talk to Mahesh to see if we can disable such fusion.
@pashu123 please take a look and check whether there are other issues apart from the fusion issue.
Do we have a workaround for this or any patches we could try? I'm also seeing unusably slow behavior after running
Perhaps you can try llvm/torch-mlir#3277. It should fix the embedding lookup issue at the torch level.
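For anyone else hitting this, here is a rough sketch of how that unmerged PR could be pulled into a local torch-mlir checkout as a temporary workaround. The checkout layout and local branch name are assumptions, not a documented workflow:

```shell
# Assumption: torch-mlir is available locally (e.g. as a submodule or a sibling repo).
cd torch-mlir

# Fetch the PR branch for llvm/torch-mlir#3277 into a local branch named "pr-3277",
# then apply it on top of the current checkout.
git fetch https://github.com/llvm/torch-mlir.git pull/3277/head:pr-3277
git cherry-pick pr-3277

# Rebuild whatever consumes torch-mlir afterwards so the fix is actually picked up.
```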
That gets further, yeah :D. Might be enough to call this particular issue fixed? I do see another error pretty late in compilation: https://gist.github.com/ScottTodd/6fbe7edd118bbb53c0abc2582459158d
There is an action item at the Linalg level: #17226 (comment)
@ScottTodd can you provide the MLIR file? @pashu123 please help triage and provide possible solutions.
This is the input file I'm working with: https://sharkpublic.blob.core.windows.net/sharkpublic/scotttodd/issue_reports/open_llama_3b_v2_f16.mlir
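For context, a minimal sketch of the kind of compile invocation involved here; the exact flag set is an assumption on my part and may differ from what was actually used in this thread:

```shell
# Hypothetical reproduction sketch: compile the torch-level MLIR for the local CPU target.
# The input path and flags below are assumptions, not the exact command from this issue.
iree-compile \
  --iree-input-type=torch \
  --iree-hal-target-backends=llvm-cpu \
  open_llama_3b_v2_f16.mlir \
  -o open_llama_3b_v2_f16.vmfb
```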
@ScottTodd I think you should add
Verified that adding this generates a .vmfb.
After thinking about it for a while, I think we can close this issue. The action item I mentioned is tracked in the other issue, and we don't have action items for this one now.
This issue is blocking another model on the
Yes, the model is RAFT_vaiq_int8. I added some information to issue #17226.
I think we only need to track it in one of the issues, so we can close either this one or the other.
Yeah, that's why I opted to provide more information there. I can't close this issue because I don't have permissions.
Wherever the issue is tracked, can we follow up and get fixes or patches landed? I've needed to keep patching llvm/torch-mlir#3277 locally as a workaround for compilation crashes for several weeks now.
Actually, I think we already landed a more robust fix: 748db31 has landed, and it should create valid IR for the codegen input. @ScottTodd could you verify whether the commit fixes the issue?
Thanks! I haven't seen issues lately without the torch-mlir commit, but I'm still juggling a few other flags (notably
After circling back from #17341 (comment), I think we need that torch-mlir patch. I will address the comments there.
The following dispatches appear to cause a crash when compiling a LLaMa model. Unrolling / vectorization produces 20K+ lines of generated code, which likely causes the final LLVM compilation to fail completely.
module_prefill_bs4$async_dispatch_1.zip
module_decode_bs4$async_dispatch_2.zip
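For reference, a sketch of how per-dispatch sources like the attached ones can be dumped for triage; the flag names here are my assumption of the relevant IREE options, so double-check them against the current `iree-compile --help` output:

```shell
# Assumed triage workflow: dump each HAL executable's source MLIR to a directory
# while compiling, so individual dispatches (like the attached ones) can be inspected
# and re-compiled in isolation to narrow down the crash.
iree-compile \
  --iree-input-type=torch \
  --iree-hal-target-backends=llvm-cpu \
  --iree-hal-dump-executable-sources-to=/tmp/dispatches \
  open_llama_3b_v2_f16.mlir \
  -o /tmp/out.vmfb
```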