Is there an issue with the open-source code or with the model itself? During image-to-video (I2V) inference, the output video often stays static or merely zooms in or out on the input image.