What happens when I use vgg16 as the backbone? #12

Open
okina13070358 opened this issue Nov 27, 2023 · 3 comments

Comments

@okina13070358

@trzy
Thank you for your good work.
I've looked through the code of your PyTorch version, and I have some questions.

  1. What happens when I use vgg16 (not models/vgg16_torch.py) as the backbone? In that case, how are the initial weights loaded? I did not provide vgg16_caffe.pth, but I can still train.
  2. If I want to build a model that accepts four-channel input, should I write a new image classification program to train the backbone from scratch? (I plan to use vgg16, but I can follow your advice if I have to use something else.)

I would like to use your code for my task, so please let me know.
Regards.

@okina13070358
Author

Additional question:
Can I train the backbone from scratch with https://github.com/trzy/VGG16.git ?

@trzy
Owner

trzy commented Nov 27, 2023

1:

vgg16 vs. vgg16_torch: a bit confusing but the first one is my "hand-coded" version. I define the VGG-16 network from scratch using layers. The second one uses torchvision's built-in VGG-16 model, torchvision.models.vgg16, which is exactly the same except that the layers are named differently.

If you use vgg16_torch, the weights should be downloaded automatically because torchvision provides them. If you use my vgg16 implementation, you'll need vgg16_caffe.pth, which my download script will fetch for you.

torchvision already contains many common CNNs, including Faster R-CNN. Obviously, I wanted to implement Faster R-CNN from scratch, so I did all the work myself, initially including the VGG-16 backbone, but then I added the option to use the torchvision version just to demonstrate how to do so. Later, I went ahead and added torchvision's ResNet backbones as well.

To download the vgg16_caffe.pth weights file, just look at download_models.sh.

2:

I'm still not sure why you want to pass 4 channel images in. What is inside this 4th channel?

@okina13070358
Copy link
Author

No. 1: I see.
For No. 2:
In deep learning research, models are usually built on RGB channels, but other kinds of 2D data exist as well. Examples include the Normalized Difference Vegetation Index (NDVI), which indicates how healthy vegetation is, and the Digital Elevation Model (DEM), which describes the shape of the terrain. Both can be obtained from satellites. Previous studies show that adding a 4th channel gives the model more information, so it can improve object detection accuracy.
A related work is Mask R-CNN (https://github.com/orestis-z/mask-rcnn-rgbd), but I don't need instance segmentation. I only want object detection, so I want a Faster R-CNN that accepts a 4th input channel.
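
One common alternative to training a 4-channel backbone from scratch is to widen only the first convolution and reuse the pretrained RGB filters. The sketch below illustrates that idea on a stand-alone `nn.Conv2d`. The helper name `expand_first_conv` is hypothetical, and initializing the extra channel with the mean of the RGB filters is an assumption, not something this repository does. Other parts of the pipeline, such as image loading and per-channel normalization, would also need to handle the extra channel.

```python
import torch
import torch.nn as nn


def expand_first_conv(conv: nn.Conv2d, in_channels: int = 4) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `in_channels` input channels.

    Hypothetical helper: the original channels keep their weights, and each
    extra channel starts from the mean of the original filters.
    """
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        old = conv.in_channels
        new_conv.weight[:, :old] = conv.weight
        new_conv.weight[:, old:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in for the first layer of a VGG-16 backbone (3 -> 64 channels, 3x3).
rgb_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
rgbx_conv = expand_first_conv(rgb_conv, in_channels=4)

batch = torch.randn(2, 4, 32, 32)  # e.g. RGB plus an NDVI or DEM channel
features = rgbx_conv(batch)
print(features.shape)  # torch.Size([2, 64, 32, 32])
```

With torchvision's VGG-16, the layer to swap would be `model.features[0]`. Whether this initialization actually helps on NDVI or DEM data has to be checked experimentally.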
