Replies: 1 comment
Hi @JULIANEMI, Apologies for the delay here! I told the team I'd take this question but then totally dropped the ball on it.
None of the default models in SLEAP are pretrained. There are the "pretrained encoder" backbone models that do include ImageNet-pretrained checkpoints for standard architectures, but these typically underperform relative to the UNet.
Not sure where this came from, but we don't use RefineNet in SLEAP at all -- the "RF" refers to the receptive field size of the architecture. As presented in Fig. 4 of the SLEAP paper, we found that varying the architecture parameters that control the receptive field size had a major impact on model performance, so we provide a few preset variants out of the box. In your case, you probably want to use the large RF variants.
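To make the "RF" naming concrete, here's a back-of-the-envelope sketch of how the theoretical receptive field grows with each downsampling block. This assumes a generic UNet-style encoder (two 3x3 convs per block followed by 2x2 max pooling), not SLEAP's exact layer counts:

```python
def receptive_field(max_stride, convs_per_block=2):
    """Theoretical receptive field (in input pixels) of a toy UNet encoder.

    Assumes blocks of `convs_per_block` 3x3 convs followed by 2x2 max
    pooling, ending with bottleneck convs at the max stride.
    """
    n_down = max_stride.bit_length() - 1  # max_stride = 2 ** n_down
    rf, jump = 1, 1
    for _ in range(n_down):
        for _ in range(convs_per_block):
            rf += 2 * jump  # 3x3 conv: (kernel - 1) * jump
        rf += jump          # 2x2 max pool
        jump *= 2           # stride-2 pooling doubles the effective step
    for _ in range(convs_per_block):
        rf += 2 * jump      # bottleneck convs at the deepest level
    return rf

for s in (4, 8, 16, 32):
    print(f"max_stride={s:>2}: receptive field ~{receptive_field(s)} px")
```

The takeaway is that each extra downsampling block roughly doubles how far apart two pixels can be while still influencing the same output, which is why the medium vs. large RF presets behave so differently on animals of different sizes.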
As mentioned above, the pretrained models have an ImageNet-pretrained encoder, but you still need to train it to predict the specific keypoints you're trying to annotate.
Right, so the max stride determines the number of downsampling blocks in the encoder portion of the model, which in turn determines the max receptive field size, which has a big impact on performance. You do lose resolution as this increases, but this is recovered by the decoder (upsampling stack). This doesn't mean you lose resolution of the original image -- it just means that the spatial resolution of the image features gets coarser the deeper you go in the network. Technically, increasing the max stride adds more layers -- and therefore computation and memory requirements.

What you're thinking of is the input scale, which rescales your images down before any processing happens. Given the resolution of your images, I'd recommend setting this to 0.5 to mitigate GPU memory issues. You can also increase the output stride to 4 or 8, which makes the keypoint localization less precise but decreases memory usage.

I'd also recommend double-checking your annotations. I see in your image that you have a lot of nodes marked as "not visible" (grey box, italicized text). These will be considered as not present in the image regardless of where they're placed, which will have a huge impact on performance. You can toggle the visibility by right-clicking on those nodes.

Let us know if you're confused about any of this, happy to clarify!

Cheers,

Talmo
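To illustrate the distinction between max stride, input scale, and output stride, here's a rough sketch (a hypothetical helper, not SLEAP code) of how each one affects the sizes involved:

```python
def encoder_summary(image_size, max_stride, input_scale=1.0, output_stride=1):
    """Report feature-map sizes for a toy UNet-style encoder/decoder.

    max_stride = 2 ** n_down_blocks, so a larger max stride means a
    deeper encoder (more layers, more memory), not a smaller input.
    Only input_scale actually shrinks the image before processing.
    """
    size = int(image_size * input_scale)   # input scaling shrinks the image itself
    n_down = max_stride.bit_length() - 1   # e.g., 16 -> 4 downsampling blocks
    sizes = [size // (2 ** i) for i in range(n_down + 1)]
    # The decoder upsamples back up to size // output_stride for the
    # confidence maps, so a larger output stride means coarser localization.
    return {
        "n_down_blocks": n_down,
        "encoder_feature_sizes": sizes,
        "output_map_size": size // output_stride,
    }

# 1024 px frames at full scale with max stride 16:
print(encoder_summary(1024, max_stride=16))
# Halving the input scale and coarsening the output stride cuts memory:
print(encoder_summary(1024, max_stride=16, input_scale=0.5, output_stride=4))
```

Note how only the second call changes the sizes the network actually processes; raising the max stride alone just makes the encoder deeper.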
I have trained multiple models with mixed results: some good, some bad. However, I am wondering whether all of these models are pretrained. Based on the information I have gathered, it seems likely. Could you confirm whether this is correct?
Models:
baseline_medium_rf.single
→ A pretrained medium-sized model that uses a moderate number of filters and layers.
→ Uses RefineNet (rf) to improve image accuracy.
→ Designed to track a single animal per frame.
baseline_large_rf.single
→ A larger model with more filters and layers than the medium version.
→ May be more accurate, but requires more memory and computation time.
→ Also uses RefineNet and detects a single animal per frame.
pretrained.single
→ A model already trained on another dataset.
→ Can be used without retraining from scratch, saving time.
→ Also detects a single animal per frame.
Max Stride parameter:
I understand that Max Stride controls how much the network downscales the image at each step:
A high value (e.g., 16) means the image is downsampled more, making the network faster but potentially losing details.
A low value (e.g., 4) means the image is kept larger, allowing for more details, but increasing memory and computation requirements.
However, in my case, a lower Max Stride consumes less memory. If I set it higher, I get an error because the GPU runs out of memory. Is this expected behavior in SLEAP?
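As a sanity check on the memory point, here's a toy calculation (assuming a generic UNet scheme where filter counts double at each downsampling block; these are not SLEAP's actual layer counts) showing why a higher max stride can use more GPU memory even though the feature maps shrink:

```python
def approx_params(max_stride, base_filters=16, convs_per_block=2, kernel=3):
    """Rough encoder weight count for a toy UNet as max stride grows.

    Each extra downsampling block adds convs whose filter counts double,
    so the parameter count grows quickly with depth.
    """
    n_down = max_stride.bit_length() - 1  # max_stride = 2 ** n_down
    params = 0
    in_ch = 1  # grayscale input
    for i in range(n_down + 1):
        out_ch = base_filters * (2 ** i)  # filters double each block
        for _ in range(convs_per_block):
            params += kernel * kernel * in_ch * out_ch + out_ch  # weights + biases
            in_ch = out_ch
    return params

for s in (4, 8, 16, 32):
    print(f"max_stride={s:>2}: ~{approx_params(s):,} encoder weights")
```

Under these assumptions, the deepest blocks dominate the weight count, which would explain an out-of-memory error appearing only at higher max stride values.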