Visualizing gradients tutorial (issue #3186) #3389

Open

wants to merge 1 commit into base: main
Conversation

j-silv

@j-silv j-silv commented Jun 9, 2025

Fixes #3186

Description

Adds a draft of the visualizing gradients tutorial. Link is here, but the content is old and the files need to be rebuilt.

Checklist

  • The issue being fixed is referenced in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.


pytorch-bot bot commented Jun 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3389

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9887ab8 with merge base b5637fa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the advanced, docathon-h1-2025, hard, and tutorial-proposal labels Jun 9, 2025
@sekyondaMeta
Contributor

sekyondaMeta commented Jun 10, 2025

Generally seems to be headed in the right direction in terms of tone and organization from my perspective.
Can you add prerequisite knowledge for this?

@sekyondaMeta sekyondaMeta requested review from svekars and albanD June 10, 2025 14:04
@svekars svekars requested a review from soulitzer June 10, 2025 17:19
Contributor

@soulitzer soulitzer left a comment


Thanks for working on this tutorial. Overall, though, I'd say that this section (prior to the actual visualizing-gradients part) can be much shorter.

By the end of this tutorial, you will be able to:

Differentiate between leaf and non-leaf tensors

have a diagram from https://github.com/szagoruyko/pytorchviz, point to the leaves

Know when to use `retain_grad` vs. `requires_grad`

"use requires_grad for leaf, use retain_grad for non-leaf"

@j-silv
Author

j-silv commented Jun 13, 2025

Thank you for the comments, they were really helpful. Let me know if you think the first section is still too long.

Concerning the "visualizing gradients" section with an actual example, I'm not sure if I'm going about retaining the gradients for intermediate tensors correctly. My thought process was to use a forward hook, call retain_grad() on the output tensor of that module, and then store that output tensor in a list. Later, after calling loss.backward(), I could then pluck out the grad attribute of that tensor and plot it.
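For concreteness, the approach described above might be sketched like this. The tiny `nn.Sequential` model is a hypothetical stand-in for illustration only, not the network from the PR:

```python
import torch
import torch.nn as nn

outputs = []  # intermediate output tensors saved during the forward pass

def save_output_hook(module, args, output):
    # retain_grad() asks autograd to populate .grad on this non-leaf tensor
    output.retain_grad()
    outputs.append(output)

# Toy model purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
handles = [m.register_forward_hook(save_output_hook)
           for m in model if isinstance(m, nn.Linear)]

loss = model(torch.randn(2, 4)).sum()
loss.backward()

# After backward(), each saved tensor carries its gradient in .grad
grads = [o.grad for o in outputs]
for h in handles:
    h.remove()
```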

Initially I tried using a backward-pass hook like register_full_backward_hook(), but this didn't work because the ResNet model performs some in-place operations (an in-place ReLU and one += addition) and PyTorch complains about it:

RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

I know that I can plot the gradients for the parameters by just looping through the named_parameters() but I would like to also plot the gradients for the intermediate tensors.
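Looping over named_parameters() as mentioned is straightforward, since parameter gradients need no hooks at all. A minimal sketch, again with a hypothetical toy model:

```python
import torch
import torch.nn as nn

# Toy model purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# .grad on each parameter is filled in by backward(); collect them by name
param_grads = {name: p.grad for name, p in model.named_parameters()}
```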

If anyone sees a problem with my method let me know. The current state of the code isn't doing what I expected so I still have to debug it.

EDIT 1: I stumbled upon this issue. Perhaps it's better to switch to using tensor hooks as suggested by alban, instead of storing the outputs through a forward pass and then later accessing their .grad
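A sketch of that alternative: register a tensor hook (Tensor.register_hook) on each module's output from inside a forward hook, so the gradient is captured during the backward pass itself rather than read from .grad afterwards. The toy model is again hypothetical:

```python
import torch
import torch.nn as nn

grads = {}  # module name -> gradient of the loss w.r.t. that module's output

def make_forward_hook(name):
    def forward_hook(module, args, output):
        def save_grad(grad):
            # Tensor hook: fires during backward with the incoming gradient.
            # Returning None leaves the gradient flow unchanged.
            grads[name] = grad
        output.register_hook(save_grad)
    return forward_hook

# Toy model purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
for name, module in model.named_children():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_forward_hook(name))

loss = model(torch.randn(2, 4)).sum()
loss.backward()
```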

EDIT 2: I decided not to use ResNet but instead a simplified fully connected network, as described in the BatchNorm paper. It is purely for educational purposes, but it actually shows the results I was expecting. With the ResNet implementation, I believe the residual connections and ReLU non-linearity mask the negative effect on the gradients when BatchNorm is absent. I'll push an updated PR sometime today.

@j-silv j-silv force-pushed the 3186-gradient-tutorial branch from 0b9f56a to cc1aa32 Compare June 15, 2025 21:41
@j-silv j-silv changed the title Add work-in-progress for visualizing gradients tutorial (issue #3186) Visualizing gradients tutorial (issue #3186) Jun 15, 2025
@sekyondaMeta sekyondaMeta requested a review from soulitzer June 17, 2025 14:25
@sekyondaMeta
Contributor

@soulitzer what are your thoughts on these new updates?

Contributor

@soulitzer soulitzer left a comment


Thanks for the updates! Looking pretty good, but added some comments on wording, etc.

@albanD albanD removed their request for review June 20, 2025 21:40
@j-silv j-silv force-pushed the 3186-gradient-tutorial branch from f7f365b to 88e03c1 Compare June 23, 2025 17:39
@j-silv
Author

j-silv commented Jun 23, 2025

@soulitzer here's my latest draft. Let me know if you would like anything changed!

@j-silv j-silv force-pushed the 3186-gradient-tutorial branch from 88e03c1 to 5d44d2f Compare June 25, 2025 18:19
@j-silv j-silv requested a review from soulitzer June 27, 2025 18:57
@j-silv
Author

j-silv commented Jul 10, 2025

@soulitzer sorry to bug you as I know you are busy! When you get a chance can you review these changes? I'd love to get the tutorial merged :)

Contributor

@soulitzer soulitzer left a comment


Sorry for the delay, and thanks for your patience addressing the comments!
Overall, I think this is quite good!
My only comment would be that it should actually be two separate tutorials, since the second part is now kind of unrelated to the first.
Would you mind splitting it? Sorry for the churn here.

@j-silv
Author

j-silv commented Jul 10, 2025

> Sorry for the delay, and thanks for your patience addressing the comments! Overall, I think this is quite good! My only comment would be that it should actually be two separate tutorials, since the second part is now kind of unrelated to the first. Would you mind splitting it? Sorry for the churn here.

No worries I can do that!

What difficulty do you think there should be for each tutorial? Also should I open up a separate PR for one of the splits?

@soulitzer
Contributor

> What difficulty do you think there should be for each tutorial? Also should I open up a separate PR for one of the splits?

Maybe beginner for both? WDYT

> Also should I open up a separate PR for one of the splits?

Yes, thanks!

@j-silv
Author

j-silv commented Jul 11, 2025

> Maybe beginner for both? WDYT

I took a look through most of the tutorials on the PyTorch docs and I think that beginner is reasonable. I could maybe see intermediate for the gradients one, but I'm on the fence. @sekyondaMeta do you think beginner for both?

@soulitzer
Contributor

> could maybe see intermediate for the gradients one, but I'm on the fence.

Either is fine to me!

@j-silv j-silv force-pushed the 3186-gradient-tutorial branch from 66cfc21 to b45b2ae Compare July 25, 2025 00:31
The original PR had a combined visualizing gradients tutorial and a section
on understanding leaf vs. non-leaf tensors and requires_grad vs. retain_grad. I've broken
these two apart, and the latter is in a second PR.

I moved the visualizing gradients tutorial into the intermediate section.

Another change from the last PR is renaming the forward/backward hook
functions to be clearer.
@j-silv j-silv force-pushed the 3186-gradient-tutorial branch from b45b2ae to 9887ab8 Compare July 25, 2025 01:01
j-silv added a commit to j-silv/tutorials that referenced this pull request Jul 25, 2025
…orial

This was originally bundled with PR pytorch#3389, but now broken into two
separate tutorials after discussing with PyTorch team.
Labels
advanced, cla signed, docathon-h1-2025, hard, tutorial-proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Writing a gradient tutorial, focused on leaf vs non leaf tensors.
4 participants