Question Regarding Text Embedding Extraction from ClipCap #91

I’m using the ClipCap model and I’m interested in extracting the text embeddings generated from the captions. Specifically, I would like to know if there’s a built-in way to directly retrieve the text embeddings from the model after it generates a caption for a given image.

Here’s what I’ve tried so far:

I generated captions for images with the model’s generate_caption method.
I then passed the generated text through CLIP’s text encoder to obtain the embeddings.
This works, but it relies on an external model (CLIP) for the encoding step. I’d like to know whether ClipCap offers a more direct or integrated way to obtain the text embeddings immediately after caption generation.
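
For reference, the workaround described above looks roughly like the following. This is only a minimal sketch, assuming PyTorch and OpenAI's `clip` package are installed; `embed_caption` and `make_clip_encoder` are hypothetical helper names chosen for illustration and are not part of ClipCap. The caption string is assumed to come from whatever caption-generation call is already in use.

```python
import math
from typing import Callable, List, Sequence


def embed_caption(caption: str,
                  encode_text: Callable[[str], Sequence[float]]) -> List[float]:
    """Encode a caption with an external text encoder and L2-normalize it.

    `encode_text` is any callable mapping a string to a vector of floats
    (hypothetical interface, used here so the encoder can be swapped out).
    """
    vec = [float(x) for x in encode_text(caption)]
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("encoder returned a zero vector")
    return [x / norm for x in vec]


def make_clip_encoder(model_name: str = "ViT-B/32",
                      device: str = "cpu") -> Callable[[str], List[float]]:
    """Build an `encode_text` callable backed by CLIP's text encoder.

    Assumption: the `clip` and `torch` packages are available. They are
    imported lazily so that `embed_caption` can be used without them.
    """
    import torch
    import clip

    model, _preprocess = clip.load(model_name, device=device)
    model.eval()

    def encode(caption: str) -> List[float]:
        # truncate=True avoids an error for captions longer than CLIP's
        # 77-token context window.
        tokens = clip.tokenize([caption], truncate=True).to(device)
        with torch.no_grad():
            features = model.encode_text(tokens)
        return features[0].float().tolist()

    return encode
```

With these helpers, the second step would be something like `embed_caption(caption, make_clip_encoder())`, where `caption` is the text returned by the captioning step. The point of the question is whether this extra pass through CLIP can be avoided.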

Could you provide guidance on how to directly extract text embeddings from ClipCap, or is this something I would need to implement separately?

Thank you for your time and help!
