
Feature map channel not same as what I defined #13402

Closed · 1 task done
tobymuller233 opened this issue Nov 7, 2024 · 3 comments
Labels: detect (Object Detection issues, PR's) · question (Further information is requested)

Comments

@tobymuller233

Search before asking

Question

Recently I've been working on head detection and, to deploy the model on devices with limited compute, I used a yoloface-500k model and tried to train it in the YOLOv5 framework. The model YAML is defined as follows:

nc: 1 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [4, 6, 7, 10, 11, 15]
  - [16, 24, 33, 25, 26, 41]
  - [47, 60, 83, 97, 141, 149]

backbone:
  # [from, number, module, args]
  # args: out_channels, size, stride
  [
    [-1, 1, Conv, [8, 3, 2]],  # 0  [batch, 8, size/2, size/2]
    [-1, 1, DWConv, [8, 3, 1]], # 1 [320]
    [-1, 1, Conv, [4, 1, 1 ]], # 2  [320]
    [-1, 1, Conv, [24, 1, 1]], # 3 [-1, 1, DWConv, [24, 3, 2]] # 4
    [-1, 1, Conv, [6, 1, 1]], # 4
    [-1, 1, Bottleneck3, [6]], # 5

    [-1, 1, Conv, [36, 1, 1]], # 6  
    [-1, 1, DWConv, [36, 3, 2]], # 7  [160]
    [-1, 1, Conv, [8, 1, 1]], # 8
    [-1, 2, Bottleneck3, [8]], # 9
    
    [-1, 1, Conv, [48, 1, 1]], # 10 
    [-1, 1, DWConv, [48, 3, 2]], # 11 [80]
    [-1, 1, Conv, [16, 1, 1]], # 12
    [-1, 3, Bottleneck3, [16]], # 13

    [-1, 1, Conv, [96, 1, 1]], # 14
    [-1, 1, DWConv, [96, 3, 1]], # 15
    [-1, 1, Conv, [24, 1, 1]], # 16
    [-1, 2, Bottleneck3, [24]], # 17

    [-1, 1, Conv, [144, 1, 1]], # 18    [80]
    [-1, 1, DWConv, [144, 3, 2]], # 19  [80] -> [40]
    [-1, 1, Conv, [40, 1, 1]], # 20
    [-1, 2, Bottleneck3, [40]], # 21 [batch, 40, size/16, size/16]
  ]

head: [
    [-1, 1, Conv, [80, 1, 1]], # 22 [40]
    [[-1, -4], 1, Concat, [1]], # 23  [batch, 224, size/16, size/16]  [40]

    [-1, 1, Conv, [48, 1, 1]], # 24
    [-1, 1, DWConv, [48, 3, 1]], # 25
    [-1, 1, Conv, [36, 1, 1]], # 26
    [-1, 1, Conv, [18, 1, 1]], # 27   [batch, 18, size/8, size/8] -> [40]
    
    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 28   [80]
    [[-1, 11], 1, Concat, [1]],  # 29   [80]  ch = 272
    [-1, 1, Conv, [24, 1, 1]], # 30
    [-1, 1, DWConv, [24, 3, 1]], # 31 
    [-1, 1, Conv, [24, 1, 1]], # 32   
    [-1, 1, Conv, [18, 1, 1]], # 33 [batch, 18, 160, 160] -> [80]

    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 34 [1, 272, 320, 320] -> [160]
    [[-1, 7], 1, Concat, [1]],  # 35  
    [-1, 1, Conv, [18, 1, 1]], # 36   
    [-1, 1, DWConv, [18, 3, 1]], # 37 
    [-1, 1, Conv, [24, 1, 1]], # 38   
    [-1, 1, Conv, [18, 1, 1]], # 39   [batch, 18, 320, 320] -> [160]

    [[39, 33, 27], 1, Detect, [nc, anchors]], 


  ]

The arrows in the file just denote the change in feature-map size I made in that layer relative to a previous version; they are not important for this issue.
My problem is this: as defined, layers 27, 33 and 39 should each output an 18-channel feature map. However, when I run detect.py with the .pt weights obtained after training, the outputs of these layers all have 24 channels:

Layer 0: torch.Size([1, 8, 320, 320])
Layer 1: torch.Size([1, 8, 320, 320])
Layer 2: torch.Size([1, 8, 320, 320])
Layer 3: torch.Size([1, 24, 320, 320])
Layer 4: torch.Size([1, 8, 320, 320])
Layer 5: torch.Size([1, 8, 320, 320])
Layer 6: torch.Size([1, 40, 320, 320])
Layer 7: torch.Size([1, 40, 160, 160])
Layer 8: torch.Size([1, 8, 160, 160])
Layer 9: torch.Size([1, 8, 160, 160])
Layer 10: torch.Size([1, 48, 160, 160])
Layer 11: torch.Size([1, 48, 80, 80])
Layer 12: torch.Size([1, 16, 80, 80])
Layer 13: torch.Size([1, 16, 80, 80])
Layer 14: torch.Size([1, 96, 80, 80])
Layer 15: torch.Size([1, 96, 80, 80])
Layer 16: torch.Size([1, 24, 80, 80])
Layer 17: torch.Size([1, 24, 80, 80])
Layer 18: torch.Size([1, 144, 80, 80])
Layer 19: torch.Size([1, 144, 40, 40])
Layer 20: torch.Size([1, 40, 40, 40])
Layer 21: torch.Size([1, 40, 40, 40])
Layer 22: torch.Size([1, 80, 40, 40])
Layer 23: torch.Size([1, 224, 40, 40])
Layer 24: torch.Size([1, 48, 40, 40])
Layer 25: torch.Size([1, 48, 40, 40])
Layer 26: torch.Size([1, 40, 40, 40])
Layer 27: torch.Size([1, 24, 40, 40])
Layer 28: torch.Size([1, 224, 80, 80])
Layer 29: torch.Size([1, 272, 80, 80])
Layer 30: torch.Size([1, 24, 80, 80])
Layer 31: torch.Size([1, 24, 80, 80])
Layer 32: torch.Size([1, 24, 80, 80])
Layer 33: torch.Size([1, 24, 80, 80])
Layer 34: torch.Size([1, 272, 160, 160])
Layer 35: torch.Size([1, 312, 160, 160])
Layer 36: torch.Size([1, 24, 160, 160])
Layer 37: torch.Size([1, 24, 160, 160])
Layer 38: torch.Size([1, 24, 160, 160])
Layer 39: torch.Size([1, 24, 160, 160])
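
(For reference, a per-layer shape dump like the one above can be produced by registering forward hooks on the parsed layers in model.model; the snippet below is only a minimal sketch of that idea rather than the exact instrumentation I used, and "best.pt" is a placeholder for the trained weights.)

import torch
from models.experimental import attempt_load  # run from inside the yolov5 repo

model = attempt_load("best.pt")  # "best.pt" is a placeholder path for the trained weights
model.eval()

def make_hook(idx):
    def _hook(module, inputs, output):
        if isinstance(output, torch.Tensor):  # Detect returns a tuple, so it is skipped here
            print(f"Layer {idx}: {tuple(output.shape)}")
    return _hook

for i, layer in enumerate(model.model):  # model.model is the nn.Sequential built by parse_model
    layer.register_forward_hook(make_hook(i))

with torch.no_grad():
    model(torch.zeros(1, 3, 640, 640))  # one dummy forward pass at 640x640 prints every layer's output shape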

What makes it even weirder is the behavior of my previous version, mentioned above:

nc: 1 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [4, 6, 7, 10, 11, 15]
  - [16, 24, 33, 25, 26, 41]
  - [47, 60, 83, 97, 141, 149]

backbone:
  # [from, number, module, args]
  # args: out_channels, size, stride
  [
    [-1, 1, Conv, [8, 3, 2]],  # 0  [batch, 8, size/2, size/2]
    [-1, 1, DWConv, [8, 3, 1]], # 1 [320]
    [-1, 1, Conv, [4, 1, 1 ]], # 2  [320]
    [-1, 1, Conv, [24, 1, 1]], # 3 [-1, 1, DWConv, [24, 3, 2]] # 4
    [-1, 1, Conv, [6, 1, 1]], # 4
    [-1, 1, Bottleneck3, [6]], # 5

    [-1, 1, Conv, [36, 1, 1]], # 6  
    [-1, 1, DWConv, [36, 3, 2]], # 7  [160]
    [-1, 1, Conv, [8, 1, 1]], # 8
    [-1, 2, Bottleneck3, [8]], # 9
    
    [-1, 1, Conv, [48, 1, 1]], # 10 
    [-1, 1, DWConv, [48, 3, 2]], # 11 [80]
    [-1, 1, Conv, [16, 1, 1]], # 12
    [-1, 3, Bottleneck3, [16]], # 13

    [-1, 1, Conv, [96, 1, 1]], # 14
    [-1, 1, DWConv, [96, 3, 1]], # 15
    [-1, 1, Conv, [24, 1, 1]], # 16
    [-1, 2, Bottleneck3, [24]], # 17

    [-1, 1, Conv, [144, 1, 1]], # 18    [80]
    [-1, 1, DWConv, [144, 3, 2]], # 19  [40]
    [-1, 1, Conv, [40, 1, 1]], # 20
    [-1, 2, Bottleneck3, [40]], # 21 [batch, 40, size/16, size/16]
  ]

head: [
    [-1, 1, Conv, [80, 1, 1]], # 22 
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],  # 23 [1, 80, 80, 80]
    [[-1, -6], 1, Concat, [1]], # 24  [batch, 224, size/8, size/8]

    [-1, 1, Conv, [48, 1, 1]], # 25
    [-1, 1, DWConv, [48, 3, 1]], # 26
    [-1, 1, Conv, [36, 1, 1]], # 27
    [-1, 1, Conv, [18, 1, 1]], # 28   [batch, 18, size/8, size/8]
    
    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 29 
    [[-1, 10], 1, Concat, [1]],  # 30 
    [-1, 1, Conv, [24, 1, 1]], # 31
    [-1, 1, DWConv, [24, 3, 1]], # 32 
    [-1, 1, Conv, [24, 1, 1]], # 33   
    [-1, 1, Conv, [18, 1, 1]], # 34 [batch, 18, 160, 160]

    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 35 [1, 272, 320, 320]
    [[-1, 6], 1, Concat, [1]],  # 36  
    [-1, 1, Conv, [18, 1, 1]], # 37   
    [-1, 1, DWConv, [18, 3, 1]], # 38 
    [-1, 1, Conv, [24, 1, 1]], # 39   
    [-1, 1, Conv, [18, 1, 1]], # 40   [batch, 18, 320, 320]

    [[40, 34, 28], 1, Detect, [nc, anchors]], 


  ]

This configuration is similar, except that the strides of some of the convolution layers differ from those in the later version. I made these changes only to reduce the feature-map sizes from 80, 160, 320 to 40, 80, 160 for better performance on the edge device. This version, by contrast, outputs three feature maps with 18 channels each:

Layer 0: torch.Size([1, 8, 320, 320])
Layer 1: torch.Size([1, 8, 320, 320])
Layer 2: torch.Size([1, 8, 320, 320])
Layer 3: torch.Size([1, 24, 320, 320])
Layer 4: torch.Size([1, 8, 320, 320])
Layer 5: torch.Size([1, 8, 320, 320])
Layer 6: torch.Size([1, 40, 320, 320])
Layer 7: torch.Size([1, 40, 160, 160])
Layer 8: torch.Size([1, 8, 160, 160])
Layer 9: torch.Size([1, 8, 160, 160])
Layer 10: torch.Size([1, 48, 160, 160])
Layer 11: torch.Size([1, 48, 80, 80])
Layer 12: torch.Size([1, 16, 80, 80])
Layer 13: torch.Size([1, 16, 80, 80])
Layer 14: torch.Size([1, 96, 80, 80])
Layer 15: torch.Size([1, 96, 80, 80])
Layer 16: torch.Size([1, 24, 80, 80])
Layer 17: torch.Size([1, 24, 80, 80])
Layer 18: torch.Size([1, 144, 80, 80])
Layer 19: torch.Size([1, 144, 40, 40])
Layer 20: torch.Size([1, 40, 40, 40])
Layer 21: torch.Size([1, 40, 40, 40])
Layer 22: torch.Size([1, 80, 40, 40])
Layer 23: torch.Size([1, 80, 80, 80])
Layer 24: torch.Size([1, 224, 80, 80])
Layer 25: torch.Size([1, 48, 80, 80])
Layer 26: torch.Size([1, 48, 80, 80])
Layer 27: torch.Size([1, 40, 80, 80])
Layer 28: torch.Size([1, 18, 80, 80])
Layer 29: torch.Size([1, 224, 160, 160])
Layer 30: torch.Size([1, 272, 160, 160])
Layer 31: torch.Size([1, 24, 160, 160])
Layer 32: torch.Size([1, 24, 160, 160])
Layer 33: torch.Size([1, 24, 160, 160])
Layer 34: torch.Size([1, 18, 160, 160])
Layer 35: torch.Size([1, 272, 320, 320])
Layer 36: torch.Size([1, 312, 320, 320])
Layer 37: torch.Size([1, 18, 320, 320])
Layer 38: torch.Size([1, 18, 320, 320])
Layer 39: torch.Size([1, 24, 320, 320])
Layer 40: torch.Size([1, 18, 320, 320])

I wonder what makes the output channels different. It seems that the YOLOv5 framework modified the last several layers of the new model automatically: in my design the output channels of the last four layers are 18, 18, 24, 24, but in the first log above they are 24, 24, 24, 24. Why does this change take place?

Additional

No response

@tobymuller233 tobymuller233 added the question Further information is requested label Nov 7, 2024
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Nov 7, 2024
@UltralyticsAssistant
Member

👋 Hello @tobymuller233, thank you for reaching out to us with your concern about the feature map channels in YOLOv5! 🚀 This is an automated response to acknowledge your issue and provide some guidance, and an Ultralytics engineer will be with you shortly to assist further.

For your configuration and layer setup, it's crucial to ensure compatibility and consistency across layers. It seems there might be a discrepancy between the defined model and the output you're observing when running detect.py. To help us investigate further, could you please provide a minimum reproducible example (MRE)? This example should include any relevant scripts, configurations, and the complete command you are using.

In the meantime, you might find our Tutorials helpful, which cover various aspects of customizing and troubleshooting YOLOv5 models, including insights into Custom Data Training and similar issues.

Quick Check

Ensure you meet the following requirements and setup steps:

  1. Python>=3.8.0 with all requirements.txt installed, including PyTorch>=1.8.
  2. Clone the repository and install dependencies:
    git clone https://github.com/ultralytics/yolov5  # clone
    cd yolov5
    pip install -r requirements.txt  # install

Environment Options

Feel free to run your experiments in any of the verified environments provided, like:

  • Google Colab or Paperspace, which offer free GPU support.
  • Google Cloud, Amazon AWS, or via a Docker Image. Links for quickstart guides are available within the repository documentation.

We look forward to your detailed input to facilitate an effective resolution. Meanwhile, keep exploring and experimenting! 😊

@tobymuller233
Author

I found that this is because I accidentally set anchors in hyp.yaml to 2.0 when it should have been 3.0. 🤣
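
In case anyone else hits this, my understanding of the mechanism (based on parse_model in models/yolo.py; the exact code may differ between versions) is that the anchors value in hyp.yaml overrides the anchor list in the model YAML. That changes na, the number of anchors per detection layer, and therefore no = na * (nc + 5), the channel count parse_model treats as an "output" width. Any Conv whose requested out_channels is not equal to no gets rounded up to a multiple of 8 via make_divisible, which is what appears to have turned my 18-channel layers into 24-channel ones:

import math

def make_divisible(x, divisor=8):
    # round up to the nearest multiple of divisor, as in utils/general.py
    return math.ceil(x / divisor) * divisor

nc, gw = 1, 1.0                        # classes and width_multiple from my model YAML
for na in (3, 2):                      # 3 anchors per layer (YAML list) vs. the bad hyp.yaml override of 2
    no = na * (nc + 5)                 # channels the Detect head expects per output layer
    c2 = 18                            # out_channels requested for layers 27, 33 and 39
    if c2 != no:                       # parse_model only rounds convs that are not "output" layers
        c2 = make_divisible(c2 * gw, 8)
    print(f"na={na}: no={no}, actual out_channels={c2}")
# na=3: no=18, actual out_channels=18  -> matches the 18-channel layers in the second log
# na=2: no=12, actual out_channels=24  -> matches the 24-channel layers in the first log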

@pderrenger
Member

@tobymuller233 thank you for identifying the issue with the anchor settings in your hyp.yaml. If you have any further questions or need assistance, feel free to ask!
