Need help related to T5EncoderModel #10735
-
So I was getting this warning during inference and figured it had to do with CLIP, so I should be using the T5 text encoder.
Even after switching to T5 I still get the same warning, which suggests T5 is not being used. Any guidance on what is wrong in my code?
If I set text_encoder=None in the pipe, it fails with a NoneType error and doesn't work at all.
-
Hi, that's not an error, it's a warning telling you that the prompt for the CLIP model will be truncated, since that model only supports up to 77 tokens. This applies only to the CLIP model, not to T5; the T5 will use the full prompt. You can search for a workaround, but it's a really common and well-known issue that we've discussed multiple times in different issues and discussions. Since I've answered this multiple times, the short answer is: if you want to use more tokens, you will need some kind of strategy to pass the tokens to the model. I recommend a library called sd_embed for this.
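To make the behavior above concrete, here is a toy simulation (not the real diffusers API; the tokenizer and the limits are stand-ins) of how a dual-encoder pipeline feeds the same prompt to both text encoders. CLIP truncates at its fixed 77-token context and triggers the warning, while T5 keeps the full prompt:

```python
# Toy simulation of a dual-encoder pipeline: the SAME prompt goes to both
# text encoders, each with its own token limit. Names and limits here are
# illustrative, not the real diffusers API.

CLIP_MAX_TOKENS = 77   # CLIP's fixed context length
T5_MAX_TOKENS = 512    # a typical max length used for T5 in these pipelines

def tokenize(prompt: str, max_tokens: int):
    """Whitespace 'tokenizer' standing in for the real one.
    Returns (tokens, truncated); the pipeline's warning corresponds to
    truncated being True for the CLIP tokenizer."""
    tokens = prompt.split()
    truncated = len(tokens) > max_tokens
    return tokens[:max_tokens], truncated

long_prompt = " ".join(f"word{i}" for i in range(100))  # 100 "tokens"

clip_tokens, clip_truncated = tokenize(long_prompt, CLIP_MAX_TOKENS)
t5_tokens, t5_truncated = tokenize(long_prompt, T5_MAX_TOKENS)

# CLIP drops everything past token 77 and warns; T5 sees the full prompt.
print(len(clip_tokens), clip_truncated)  # 77 True
print(len(t5_tokens), t5_truncated)      # 100 False
```

So the warning keeps appearing even when T5 is loaded correctly: it is about the CLIP branch, which runs regardless.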
-
My understanding is that if I use T5 (text_encoder_2), this warning should not appear, and that's what I added in my code, so in this case T5 should be used. Is my understanding wrong? How do I force T5 and ignore CLIP? I read your response again, so this is a known issue that needs to be fixed in diffusers.
text_encoder_2 = T5EncoderModel.from_pretrained(
I am looking at https://github.com/xhinker/sd_embed, thanks for the repo link.
-
The solution is already posted here. But the question remains: using T5 should remove the 77-token limit, but it doesn't. Why?
See the answer above.
Also, even though that's a solution, compel has very basic support for long prompts and weightings; it's really not easy to use the prompts shared by other users in other UIs (you need to convert them), and it hasn't been updated in a long time. I don't even know if it supports Flux or the newer models that still use the CLIP models.
That's a really old solution and not the best one; maybe we should update it so people use sd_embed instead. What do you think @stevhliu?