
Feature map channel not same as what I defined #13402

Closed · 1 task done
tobymuller233 opened this issue Nov 7, 2024 · 3 comments
Labels: detect (Object Detection issues, PR's) · question (Further information is requested)

Comments

@tobymuller233

Search before asking

Question

Recently I've been working on head detection and, to deploy the model on devices with limited compute, I used a yoloface-500k model and tried to train it in the YOLOv5 framework. The model YAML is defined as follows:

nc: 1 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [4, 6, 7, 10, 11, 15]
  - [16, 24, 33, 25, 26, 41]
  - [47, 60, 83, 97, 141, 149]

backbone:
  # [from, number, module, args]
  # args: out_channels, size, stride
  [
    [-1, 1, Conv, [8, 3, 2]],  # 0  [batch, 8, size/2, size/2]
    [-1, 1, DWConv, [8, 3, 1]], # 1 [320]
    [-1, 1, Conv, [4, 1, 1 ]], # 2  [320]
    [-1, 1, Conv, [24, 1, 1]], # 3 [-1, 1, DWConv, [24, 3, 2]] # 4
    [-1, 1, Conv, [6, 1, 1]], # 4
    [-1, 1, Bottleneck3, [6]], # 5

    [-1, 1, Conv, [36, 1, 1]], # 6  
    [-1, 1, DWConv, [36, 3, 2]], # 7  [160]
    [-1, 1, Conv, [8, 1, 1]], # 8
    [-1, 2, Bottleneck3, [8]], # 9
    
    [-1, 1, Conv, [48, 1, 1]], # 10 
    [-1, 1, DWConv, [48, 3, 2]], # 11 [80]
    [-1, 1, Conv, [16, 1, 1]], # 12
    [-1, 3, Bottleneck3, [16]], # 13

    [-1, 1, Conv, [96, 1, 1]], # 14
    [-1, 1, DWConv, [96, 3, 1]], # 15
    [-1, 1, Conv, [24, 1, 1]], # 16
    [-1, 2, Bottleneck3, [24]], # 17

    [-1, 1, Conv, [144, 1, 1]], # 18    [80]
    [-1, 1, DWConv, [144, 3, 2]], # 19  [80] -> [40]
    [-1, 1, Conv, [40, 1, 1]], # 20
    [-1, 2, Bottleneck3, [40]], # 21 [batch, 40, size/16, size/16]
  ]

head: [
    [-1, 1, Conv, [80, 1, 1]], # 22 [40]
    [[-1, -4], 1, Concat, [1]], # 23  [batch, 224, size/16, size/16]  [40]

    [-1, 1, Conv, [48, 1, 1]], # 24
    [-1, 1, DWConv, [48, 3, 1]], # 25
    [-1, 1, Conv, [36, 1, 1]], # 26
    [-1, 1, Conv, [18, 1, 1]], # 27   [batch, 18, size/8, size/8] -> [40]
    
    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 28   [80]
    [[-1, 11], 1, Concat, [1]],  # 29   [80]  ch = 272
    [-1, 1, Conv, [24, 1, 1]], # 30
    [-1, 1, DWConv, [24, 3, 1]], # 31 
    [-1, 1, Conv, [24, 1, 1]], # 32   
    [-1, 1, Conv, [18, 1, 1]], # 33 [batch, 18, 160, 160] -> [80]

    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 34 [1, 272, 320, 320] -> [160]
    [[-1, 7], 1, Concat, [1]],  # 35  
    [-1, 1, Conv, [18, 1, 1]], # 36   
    [-1, 1, DWConv, [18, 3, 1]], # 37 
    [-1, 1, Conv, [24, 1, 1]], # 38   
    [-1, 1, Conv, [18, 1, 1]], # 39   [batch, 18, 320, 320] -> [160]

    [[39, 33, 27], 1, Detect, [nc, anchors]], 


  ]

The arrows in the file just denote the change in feature-map size I made in that layer relative to a previous version; they are not important for this issue.
My problem is this: as defined, layers 27, 33 and 39 should each output an 18-channel feature map. However, when I run detect.py with the .pt weights obtained after training, the outputs of these layers all have 24 channels:

Layer 0: torch.Size([1, 8, 320, 320])
Layer 1: torch.Size([1, 8, 320, 320])
Layer 2: torch.Size([1, 8, 320, 320])
Layer 3: torch.Size([1, 24, 320, 320])
Layer 4: torch.Size([1, 8, 320, 320])
Layer 5: torch.Size([1, 8, 320, 320])
Layer 6: torch.Size([1, 40, 320, 320])
Layer 7: torch.Size([1, 40, 160, 160])
Layer 8: torch.Size([1, 8, 160, 160])
Layer 9: torch.Size([1, 8, 160, 160])
Layer 10: torch.Size([1, 48, 160, 160])
Layer 11: torch.Size([1, 48, 80, 80])
Layer 12: torch.Size([1, 16, 80, 80])
Layer 13: torch.Size([1, 16, 80, 80])
Layer 14: torch.Size([1, 96, 80, 80])
Layer 15: torch.Size([1, 96, 80, 80])
Layer 16: torch.Size([1, 24, 80, 80])
Layer 17: torch.Size([1, 24, 80, 80])
Layer 18: torch.Size([1, 144, 80, 80])
Layer 19: torch.Size([1, 144, 40, 40])
Layer 20: torch.Size([1, 40, 40, 40])
Layer 21: torch.Size([1, 40, 40, 40])
Layer 22: torch.Size([1, 80, 40, 40])
Layer 23: torch.Size([1, 224, 40, 40])
Layer 24: torch.Size([1, 48, 40, 40])
Layer 25: torch.Size([1, 48, 40, 40])
Layer 26: torch.Size([1, 40, 40, 40])
Layer 27: torch.Size([1, 24, 40, 40])
Layer 28: torch.Size([1, 224, 80, 80])
Layer 29: torch.Size([1, 272, 80, 80])
Layer 30: torch.Size([1, 24, 80, 80])
Layer 31: torch.Size([1, 24, 80, 80])
Layer 32: torch.Size([1, 24, 80, 80])
Layer 33: torch.Size([1, 24, 80, 80])
Layer 34: torch.Size([1, 272, 160, 160])
Layer 35: torch.Size([1, 312, 160, 160])
Layer 36: torch.Size([1, 24, 160, 160])
Layer 37: torch.Size([1, 24, 160, 160])
Layer 38: torch.Size([1, 24, 160, 160])
Layer 39: torch.Size([1, 24, 160, 160])
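
(For reference, a per-layer shape dump like the one above can be produced by registering forward hooks on the parsed layers in model.model; the snippet below is only a minimal sketch of that idea rather than the exact instrumentation I used, and "best.pt" is a placeholder for the trained weights.)

import torch
from models.experimental import attempt_load  # run from inside the yolov5 repo

model = attempt_load("best.pt")  # "best.pt" is a placeholder path for the trained weights
model.eval()

def make_hook(idx):
    def _hook(module, inputs, output):
        if isinstance(output, torch.Tensor):  # Detect returns a tuple, so it is skipped here
            print(f"Layer {idx}: {tuple(output.shape)}")
    return _hook

for i, layer in enumerate(model.model):  # model.model is the nn.Sequential built by parse_model
    layer.register_forward_hook(make_hook(i))

with torch.no_grad():
    model(torch.zeros(1, 3, 640, 640))  # one dummy forward pass at 640x640 prints every layer's output shape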

What makes it even weirder is the behavior of my previous version, mentioned above:

nc: 1 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
anchors:
  - [4, 6, 7, 10, 11, 15]
  - [16, 24, 33, 25, 26, 41]
  - [47, 60, 83, 97, 141, 149]

backbone:
  # [from, number, module, args]
  # args: out_channels, size, stride
  [
    [-1, 1, Conv, [8, 3, 2]],  # 0  [batch, 8, size/2, size/2]
    [-1, 1, DWConv, [8, 3, 1]], # 1 [320]
    [-1, 1, Conv, [4, 1, 1 ]], # 2  [320]
    [-1, 1, Conv, [24, 1, 1]], # 3 [-1, 1, DWConv, [24, 3, 2]] # 4
    [-1, 1, Conv, [6, 1, 1]], # 4
    [-1, 1, Bottleneck3, [6]], # 5

    [-1, 1, Conv, [36, 1, 1]], # 6  
    [-1, 1, DWConv, [36, 3, 2]], # 7  [160]
    [-1, 1, Conv, [8, 1, 1]], # 8
    [-1, 2, Bottleneck3, [8]], # 9
    
    [-1, 1, Conv, [48, 1, 1]], # 10 
    [-1, 1, DWConv, [48, 3, 2]], # 11 [80]
    [-1, 1, Conv, [16, 1, 1]], # 12
    [-1, 3, Bottleneck3, [16]], # 13

    [-1, 1, Conv, [96, 1, 1]], # 14
    [-1, 1, DWConv, [96, 3, 1]], # 15
    [-1, 1, Conv, [24, 1, 1]], # 16
    [-1, 2, Bottleneck3, [24]], # 17

    [-1, 1, Conv, [144, 1, 1]], # 18    [80]
    [-1, 1, DWConv, [144, 3, 2]], # 19  [40]
    [-1, 1, Conv, [40, 1, 1]], # 20
    [-1, 2, Bottleneck3, [40]], # 21 [batch, 40, size/16, size/16]
  ]

head: [
    [-1, 1, Conv, [80, 1, 1]], # 22 
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],  # 23 [1, 80, 80, 80]
    [[-1, -6], 1, Concat, [1]], # 24  [batch, 224, size/8, size/8]

    [-1, 1, Conv, [48, 1, 1]], # 25
    [-1, 1, DWConv, [48, 3, 1]], # 26
    [-1, 1, Conv, [36, 1, 1]], # 27
    [-1, 1, Conv, [18, 1, 1]], # 28   [batch, 18, size/8, size/8]
    
    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 29 
    [[-1, 10], 1, Concat, [1]],  # 30 
    [-1, 1, Conv, [24, 1, 1]], # 31
    [-1, 1, DWConv, [24, 3, 1]], # 32 
    [-1, 1, Conv, [24, 1, 1]], # 33   
    [-1, 1, Conv, [18, 1, 1]], # 34 [batch, 18, 160, 160]

    [-5, 1, nn.Upsample, [None, 2, "nearest"]],  # 35 [1, 272, 320, 320]
    [[-1, 6], 1, Concat, [1]],  # 36  
    [-1, 1, Conv, [18, 1, 1]], # 37   
    [-1, 1, DWConv, [18, 3, 1]], # 38 
    [-1, 1, Conv, [24, 1, 1]], # 39   
    [-1, 1, Conv, [18, 1, 1]], # 40   [batch, 18, 320, 320]

    [[40, 34, 28], 1, Detect, [nc, anchors]], 


  ]

This configuration is similar, except that the strides of some of the convolution layers differ from those in the later version. I made these changes only to reduce the feature-map sizes from 80, 160, 320 to 40, 80, 160 for better performance on the edge device. This version, by contrast, outputs three feature maps with 18 channels each:

Layer 0: torch.Size([1, 8, 320, 320])
Layer 1: torch.Size([1, 8, 320, 320])
Layer 2: torch.Size([1, 8, 320, 320])
Layer 3: torch.Size([1, 24, 320, 320])
Layer 4: torch.Size([1, 8, 320, 320])
Layer 5: torch.Size([1, 8, 320, 320])
Layer 6: torch.Size([1, 40, 320, 320])
Layer 7: torch.Size([1, 40, 160, 160])
Layer 8: torch.Size([1, 8, 160, 160])
Layer 9: torch.Size([1, 8, 160, 160])
Layer 10: torch.Size([1, 48, 160, 160])
Layer 11: torch.Size([1, 48, 80, 80])
Layer 12: torch.Size([1, 16, 80, 80])
Layer 13: torch.Size([1, 16, 80, 80])
Layer 14: torch.Size([1, 96, 80, 80])
Layer 15: torch.Size([1, 96, 80, 80])
Layer 16: torch.Size([1, 24, 80, 80])
Layer 17: torch.Size([1, 24, 80, 80])
Layer 18: torch.Size([1, 144, 80, 80])
Layer 19: torch.Size([1, 144, 40, 40])
Layer 20: torch.Size([1, 40, 40, 40])
Layer 21: torch.Size([1, 40, 40, 40])
Layer 22: torch.Size([1, 80, 40, 40])
Layer 23: torch.Size([1, 80, 80, 80])
Layer 24: torch.Size([1, 224, 80, 80])
Layer 25: torch.Size([1, 48, 80, 80])
Layer 26: torch.Size([1, 48, 80, 80])
Layer 27: torch.Size([1, 40, 80, 80])
Layer 28: torch.Size([1, 18, 80, 80])
Layer 29: torch.Size([1, 224, 160, 160])
Layer 30: torch.Size([1, 272, 160, 160])
Layer 31: torch.Size([1, 24, 160, 160])
Layer 32: torch.Size([1, 24, 160, 160])
Layer 33: torch.Size([1, 24, 160, 160])
Layer 34: torch.Size([1, 18, 160, 160])
Layer 35: torch.Size([1, 272, 320, 320])
Layer 36: torch.Size([1, 312, 320, 320])
Layer 37: torch.Size([1, 18, 320, 320])
Layer 38: torch.Size([1, 18, 320, 320])
Layer 39: torch.Size([1, 24, 320, 320])
Layer 40: torch.Size([1, 18, 320, 320])

I wonder what makes the output channels different. It seems that the YOLOv5 framework modified the last several layers of the new model automatically: in my design the output channels of the last four layers are 18, 18, 24, 24, but in the first log above they are 24, 24, 24, 24. Why does this change take place?

Additional

No response

@tobymuller233 tobymuller233 added the question Further information is requested label Nov 7, 2024
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Nov 7, 2024
@UltralyticsAssistant
Member

👋 Hello @tobymuller233, thank you for reaching out to us with your concern about the feature map channels in YOLOv5! 🚀 This is an automated response to acknowledge your issue and provide some guidance, and an Ultralytics engineer will be with you shortly to assist further.

For your configuration and layer setup, it's crucial to ensure compatibility and consistency across layers. It seems there might be a discrepancy between the defined model and the output you're observing when running detect.py. To help us investigate further, could you please provide a minimum reproducible example (MRE)? This example should include any relevant scripts, configurations, and the complete command you are using.

In the meantime, you might find our Tutorials helpful, which cover various aspects of customizing and troubleshooting YOLOv5 models, including insights into Custom Data Training and similar issues.

Quick Check

Ensure you meet the following requirements and setup steps:

  1. Python>=3.8.0 with all requirements.txt installed, including PyTorch>=1.8.
  2. Clone the repository and install dependencies:
    git clone https://github.com/ultralytics/yolov5  # clone
    cd yolov5
    pip install -r requirements.txt  # install

Environment Options

Feel free to run your experiments in any of the verified environments provided, like:

  • Google Colab or Paperspace, which offer free GPU support.
  • Google Cloud, Amazon AWS, or via a Docker Image. Links for quickstart guides are available within the repository documentation.

We look forward to your detailed input to facilitate an effective resolution. Meanwhile, keep exploring and experimenting! 😊

@tobymuller233
Author

I found that this is because I accidentally set anchors in hyp.yaml to 2.0 when it should have been 3.0. 🤣
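
In case anyone else hits this, my understanding of the mechanism (based on parse_model in models/yolo.py; the exact code may differ between versions) is that the anchors value in hyp.yaml overrides the anchor list in the model YAML. That changes na, the number of anchors per detection layer, and therefore no = na * (nc + 5), the channel count parse_model treats as an "output" width. Any Conv whose requested out_channels is not equal to no gets rounded up to a multiple of 8 via make_divisible, which is what appears to have turned my 18-channel layers into 24-channel ones:

import math

def make_divisible(x, divisor=8):
    # round up to the nearest multiple of divisor, as in utils/general.py
    return math.ceil(x / divisor) * divisor

nc, gw = 1, 1.0                        # classes and width_multiple from my model YAML
for na in (3, 2):                      # 3 anchors per layer (YAML list) vs. the bad hyp.yaml override of 2
    no = na * (nc + 5)                 # channels the Detect head expects per output layer
    c2 = 18                            # out_channels requested for layers 27, 33 and 39
    if c2 != no:                       # parse_model only rounds convs that are not "output" layers
        c2 = make_divisible(c2 * gw, 8)
    print(f"na={na}: no={no}, actual out_channels={c2}")
# na=3: no=18, actual out_channels=18  -> matches the 18-channel layers in the second log
# na=2: no=12, actual out_channels=24  -> matches the 24-channel layers in the first log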

@pderrenger
Member

@tobymuller233 thank you for identifying the issue with the anchor settings in your hyp.yaml. If you have any further questions or need assistance, feel free to ask!
