Question Regarding Text Embedding Extraction from ClipCap #91

@Linn0910

I’m using the ClipCap model and I’m interested in extracting the text embeddings generated from the captions. Specifically, I would like to know if there’s a built-in way to directly retrieve the text embeddings from the model after it generates a caption for a given image.

Here’s what I’ve tried so far:

I’ve used ClipCap to generate captions for images using the model’s generate_caption method.
After obtaining the generated text, I attempted to pass it through CLIP’s text encoder to retrieve the embeddings.
However, I’m curious if there’s a more direct or integrated way within ClipCap to obtain the text embeddings immediately after caption generation, rather than using an external model like CLIP for encoding.
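
For reference, the two-step workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `caption_and_embed` and `make_clip_text_encoder` are hypothetical helper names, not part of ClipCap, and the `captioner` argument stands in for whatever callable wraps the caption generation step. The CLIP part assumes the OpenAI `clip` package and PyTorch are installed.

```python
from typing import Any, Callable, List, Tuple


def caption_and_embed(
    image: Any,
    captioner: Callable[[Any], str],
    text_encoder: Callable[[str], List[float]],
) -> Tuple[str, List[float]]:
    """Generate a caption, then embed that caption with a separate text encoder.

    Both callables are supplied by the caller (hypothetical interface):
    `captioner` maps an image to a caption string, and `text_encoder`
    maps a caption string to an embedding vector.
    """
    caption = captioner(image)
    embedding = text_encoder(caption)
    return caption, embedding


def make_clip_text_encoder(
    model_name: str = "ViT-B/32", device: str = "cpu"
) -> Callable[[str], List[float]]:
    """Build a text encoder backed by CLIP (assumes `torch` and `clip` are installed)."""
    # Imported lazily so the pipeline above can be used without these packages.
    import torch
    import clip

    model, _preprocess = clip.load(model_name, device=device)
    model.eval()

    def encode(caption: str) -> List[float]:
        # CLIP's context length is 77 tokens; truncate longer captions instead of raising.
        tokens = clip.tokenize([caption], truncate=True).to(device)
        with torch.no_grad():
            features = model.encode_text(tokens)
        # L2-normalize so that dot products between embeddings are cosine similarities.
        features = features / features.norm(dim=-1, keepdim=True)
        return features[0].float().cpu().tolist()

    return encode
```

With this shape, the caption generator and the text encoder stay decoupled: `caption_and_embed(image, my_captioner, make_clip_text_encoder())` would return the caption together with its CLIP text embedding, where `my_captioner` is a placeholder for the ClipCap caption call.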

Could you provide guidance on how to directly extract text embeddings from ClipCap, or is this something I would need to implement separately?

Thank you for your time and help!
