Hi, thanks for your great work on vision model interpretability! I also really enjoyed the findings in the latest "Into the Rabbit Hull" paper!
I'm wondering whether it's doable to process embeddings from feed-forward 3D backbones (e.g. the MASt3R family, VGGT, etc.), which are also attention-based models; it would be super interesting to reveal what 3D concepts they have learned. Is there anything to pay attention to when switching to 3D vision models?
Meanwhile, I'm going to try training an RA-SAE on VGGT's embeddings (the attention-layer output), and I'd be glad to share it if anyone finds it interesting.
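
In case it helps the discussion, here is a rough sketch of how I'm thinking of collecting those embeddings, just a generic PyTorch forward-hook setup; the model-loading call and the `blocks[-1].attn` layer path are placeholders, not the actual VGGT API:

```python
import torch

def collect_embeddings(model, images, layer):
    """Run a frozen backbone and return the chosen layer's output tokens."""
    feats = []

    def hook(module, inputs, output):
        # Assumes the hooked layer returns a (batch, num_tokens, dim) tensor;
        # some attention modules return tuples, in which case index into it.
        feats.append(output.detach().cpu())

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(images)  # forward pass only for the side effect of the hook
    handle.remove()
    # Flatten to (batch * num_tokens, dim) so tokens can be fed to the SAE
    return torch.cat(feats, dim=0).flatten(0, 1)

# Usage (placeholder names, not the real VGGT loading code):
# model = load_vggt_checkpoint(...)
# tokens = collect_embeddings(model, image_batch, model.blocks[-1].attn)
# ...then fit the RA-SAE on `tokens`, the same way as with 2D ViT features.
```

Does this kind of per-token extraction match what you did for the 2D backbones, or did you handle the tokens differently before training the SAE?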