Adding DPT to SMP #1073
Comments
Hey @vedantdalimkar, sounds super cool 👍 I would appreciate the contribution! The simplest way to start is to clone the library, copy-paste /decoders/unet as /decoders/dpt, and start drafting your code in the decoder.py file. See similar recent PRs like #941 and #906
Hey @qubvel. DPT uses ViT-base and ViT-large models as encoders. However, those encoders are not supported by SMP as of now. Would adding them be outside the scope of the library? If not, let me know — I can help with extending the library so that ViT-based encoders are supported. My 2 cents: adding ViT-based encoders would be valuable for the community, as it would allow users to use SOTA pretrained weights (like DINO and DINOv2) for segmentation
Hey @vedantdalimkar, ViT encoders are supported. They are not listed because they are not compatible with models such as Unet, but you can still instantiate them and use them with newer models.
Hey @qubvel, I think ViT encoders can't be used with SMP as they don't follow the required downsampling pattern. I tried loading the following timm model - The error traceback -
So, if this is not an error on my side, how should I go about extending support for the ViT encoders? Should I make a new module
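The mismatch described here can be sketched without any SMP or timm code: a Unet-style decoder expects a pyramid of encoder features at strides 2, 4, 8, 16, 32, while a plain patch-16 ViT tokenizes the image once and keeps a single stride of 16 at every block. A minimal illustration (function names and the depth-5 setup are illustrative, not SMP's actual API):

```python
# Illustrative only: compares the stride pattern a Unet-style decoder
# expects with what a plain patch-16 ViT can provide.

def pyramid_strides(depth: int) -> list[int]:
    """Strides of the feature pyramid a Unet-style decoder consumes."""
    return [2 ** i for i in range(1, depth + 1)]

def vit_strides(depth: int, patch_size: int = 16) -> list[int]:
    """A plain ViT has no downsampling stages: every block is at one stride."""
    return [patch_size] * depth

required = pyramid_strides(5)
available = vit_strides(5)
print(required)   # [2, 4, 8, 16, 32]
print(available)  # [16, 16, 16, 16, 16]
print(required == available)  # False: no multi-scale features to skip-connect
```

This is why a ViT can be instantiated from timm yet still fail when plugged into decoders that rely on progressively downsampled skip connections.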
Ahh, you are right, my bad! Let's have another encoder then (e.g. TimmViTLikeEncoder). It would be safer and will not break other models.
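One way to picture what such a ViT-like encoder wrapper would have to do: take the (num_tokens, embed_dim) token sequence a ViT emits and reinterpret it as a single-stride (C, H, W) feature map for the decoder. A shape-bookkeeping sketch (the function name and numbers are hypothetical, not the eventual SMP interface):

```python
# Hypothetical shape bookkeeping for a ViT-like encoder wrapper.
# A ViT outputs (num_tokens, embed_dim); a conv decoder wants (C, H, W).

def tokens_to_feature_map_shape(img_size: int, patch_size: int, embed_dim: int):
    """Return the token count and the (C, H, W) shape after reshaping."""
    h = img_size // patch_size
    w = img_size // patch_size
    num_tokens = h * w  # class token excluded for simplicity
    return num_tokens, (embed_dim, h, w)

tokens, fmap = tokens_to_feature_map_shape(224, 16, 768)
print(tokens)  # 196 patch tokens for a 224x224 image with ViT-base/16
print(fmap)    # (768, 14, 14): one feature map at a constant stride of 16
```

DPT's reassemble blocks then project such single-stride maps to multiple resolutions, which is why it pairs naturally with ViT encoders where Unet cannot.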
Hey @qubvel, I have raised an initial PR for this issue. It would be great if you could please check it out and let me know if it requires any changes. |
The Dense Prediction Transformer (DPT) was introduced in the paper Vision Transformers for Dense Prediction.
It was also used in UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation, which is the current SOTA in semi-supervised semantic segmentation.
I think this would be a good addition to the library. I would like to contribute this architecture, if that's fine.