Description
I’m using the ClipCap model and would like to extract text embeddings for the captions it generates. Specifically, is there a built-in way to retrieve these embeddings directly from the model after it generates a caption for a given image?
Here’s what I’ve tried so far:
1. I’ve used ClipCap to generate captions for images using the model’s generate_caption method.
2. After obtaining the generated text, I attempted to pass it through CLIP’s text encoder to retrieve the embeddings.
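For reference, here is a rough sketch of that second step. It assumes the OpenAI `clip` package and the `ViT-B/32` checkpoint, neither of which is fixed by anything above. The helper names (`l2_normalize`, `embed_caption`, `make_clip_text_encoder`) are purely illustrative and are not part of ClipCap or CLIP.

```python
import math
from typing import Callable, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


def embed_caption(caption: str, encode_fn: Callable[[str], List[float]]) -> List[float]:
    """Encode an already generated caption and return a unit-length embedding."""
    return l2_normalize(encode_fn(caption))


def make_clip_text_encoder(model_name: str = "ViT-B/32", device: str = "cpu"):
    """Build an encode_fn backed by CLIP's text encoder.

    Assumption: the OpenAI `clip` package and PyTorch are installed.
    They are imported lazily so the helpers above work without them.
    """
    import torch
    import clip

    model, _preprocess = clip.load(model_name, device=device)
    model.eval()

    def encode(caption: str) -> List[float]:
        # truncate=True avoids an error on captions longer than CLIP's 77-token context
        tokens = clip.tokenize([caption], truncate=True).to(device)
        with torch.no_grad():
            features = model.encode_text(tokens)  # one row per input caption
        return features[0].float().tolist()

    return encode
```

With the caption string returned by ClipCap in hand, the call would be `embed_caption(caption, make_clip_text_encoder())`. Passing the encoder in as a function means a different text encoder could be swapped in later without touching the rest.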
However, I’m curious if there’s a more direct or integrated way within ClipCap to obtain the text embeddings immediately after caption generation, rather than using an external model like CLIP for encoding.
Could you provide guidance on how to directly extract text embeddings from ClipCap, or is this something I would need to implement separately?
Thank you for your time and help!