A black box appears over the mouth in the output.

<img width="999" height="995" alt="Image" src="https://github.com/user-attachments/assets/fbdd5ce5-f7e2-4165-a2ba-2de119f712d3" />


I tried splitting the audio into chunks and processing each chunk in turn with the same input image. Surprisingly, some chunks were processed correctly, while others produced the same black-box output shown above.
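For anyone trying to reproduce this, the chunking step looks roughly like the sketch below. It assumes the audio is an uncompressed WAV file and uses only Python's standard `wave` module; the function name `split_wav`, the chunk length, and the output naming are illustrative choices, not part of this project.

```python
import wave


def split_wav(path, chunk_seconds, out_prefix):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Hypothetical helper for reproducing the issue; returns the chunk paths in order.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the audio
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected when the chunk is written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each resulting chunk file is then run through the pipeline together with the same input image, which makes it easy to note which chunk indices produce the black box.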

My guess is that the mouth is detected but the mouth region is not regenerated.

Is there a known fix?