
feat(skills): add pytorch-patterns skill #550

Open
code-with-idrees wants to merge 1 commit into affaan-m:main from code-with-idrees:feat/pytorch-patterns

Conversation


@code-with-idrees code-with-idrees commented Mar 17, 2026

Summary

Adds a comprehensive pytorch-patterns skill covering best practices for PyTorch development.
It includes device-agnostic code, reproducibility, model patterns, training loops (with AMP), DataLoader best practices, checkpointing, and common anti-patterns.
This serves as the knowledge foundation complementing the pytorch-build-resolver agent.

Type

  • Skill

Testing

Tested formatting and structure against existing skills (python-patterns, golang-patterns). Code examples verified for syntax and logic.

Checklist

  • Follows format guidelines
  • Tested with Claude Code
  • No sensitive info (API keys, paths)
  • Clear descriptions

Summary by cubic

Adds the pytorch-patterns skill with clear, practical PyTorch best practices for models, training, data, performance, and checkpointing. This complements the pytorch-build-resolver by providing a reference for idiomatic, reproducible code.

  • New Features
    • Device-agnostic patterns and full reproducibility setup.
    • Clean nn.Module structure and explicit weight init.
    • Training/validation loops with AMP, grad clipping, and zero_grad(set_to_none=True).
    • DataLoader best practices (workers, pinning, persistence) and custom collate for variable-length data.
    • Checkpointing that saves full training state and uses weights_only=True on load.
    • Performance tips: AMP via torch.amp, gradient checkpointing, and torch.compile.
    • Quick-reference idioms and common anti-patterns to avoid.
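To make the training-loop bullets above concrete (AMP, gradient clipping, `zero_grad(set_to_none=True)`), here is a minimal sketch of one such step. This is an illustration, not the skill's actual code: the loss function, `max_norm` value, and `train_step` name are placeholders.

```python
import torch
from torch import nn


def train_step(model, batch, optimizer, scaler, device):
    """One AMP-aware training step; falls back to FP32 when scaler is None."""
    inputs, targets = (t.to(device) for t in batch)
    # device.type ("cuda", "cpu", ...) keeps autocast device-agnostic.
    with torch.amp.autocast(device_type=device.type, enabled=scaler is not None):
        loss = nn.functional.cross_entropy(model(inputs), targets)
    if scaler is not None:
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # unscale before clipping
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(optimizer)
        scaler.update()
    else:
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    optimizer.zero_grad(set_to_none=True)  # frees grads instead of zeroing
    return loss.item()
```

On CUDA one would pass `scaler = torch.amp.GradScaler("cuda")`; on CPU or other backends, passing `None` disables autocast and scaling in one place.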

Written for commit 958c81e. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guide for PyTorch development patterns covering core design principles (device-agnostic code, reproducibility), model architecture patterns, training and evaluation loops, data pipeline configuration, and checkpointing strategies.
    • Includes performance optimization techniques, code examples illustrating best practices, anti-patterns to avoid, and guidance on profiling and resource monitoring.


coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

A new documentation file detailing PyTorch development patterns has been added to the skills directory. It covers core design principles, model architecture, training loops, data pipelines, checkpointing, performance optimizations, and practical idioms with code examples and anti-patterns.

Changes

Cohort / File(s): PyTorch Patterns Documentation (skills/pytorch-patterns/SKILL.md)
Summary: Added comprehensive guide covering PyTorch development patterns including device-agnostic code, nn.Module structure, training/evaluation loops, custom datasets, checkpointing strategies, mixed precision training, and common pitfalls with code examples.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • affaan-m

Poem

🐰 Hoppy patterns emerge from PyTorch's deep,
Devices dance with reproducible leaps,
Training loops bound with checkpoints so neat,
Performance optimizations—no retreat! 🚀
A skill-filled feast for learners to keep!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat(skills): add pytorch-patterns skill' directly and clearly summarizes the main change: adding new PyTorch patterns skill documentation.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate, so the docstring coverage check was skipped.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/pytorch-patterns/SKILL.md`:
- Around line 11-17: Rename the "When to Activate" heading to "When to Use" and
add a new "How It Works" section that briefly describes the skill's approach to
common PyTorch patterns (e.g., model/module structuring, training loop flow,
optimizer/scheduler handling, device/GPU management, and reproducibility
mechanisms), then reorganize the existing examples under a clearly labeled
"Examples" section (group snippets into labeled sub-examples if needed). Locate
and update the headings in SKILL.md (the current "When to Activate" block),
insert the "How It Works" explanatory paragraph between "When to Use" and
"Examples", and move or relabel the code snippets so they appear under the
"Examples" header for clarity.
- Line 284: The use of weights_only=True in the torch.load call (checkpoint =
torch.load(path, map_location="cpu", weights_only=True)) requires PyTorch 2.0+;
update the documentation to explicitly state this version requirement (either
add a Prerequisites bullet or an inline comment near the torch.load line),
similar to the existing note for torch.compile, so users running PyTorch < 2.0
won't hit a TypeError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27ba21bd-bdbc-49ae-92a9-7bc584788899

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf07ca and 958c81e.

📒 Files selected for processing (1)
  • skills/pytorch-patterns/SKILL.md


greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR adds a new skills/pytorch-patterns/SKILL.md skill that provides comprehensive PyTorch best practices covering device-agnostic code, reproducibility, model architecture, training/validation loops with AMP, data loading, checkpointing, gradient checkpointing, torch.compile, and anti-patterns. The skill follows the established repo format (YAML frontmatter, ## When to Activate, Good/Bad code examples, a quick-reference table) and is well within the 500-line limit at 396 lines.

Three issues were found:

  • Hardcoded device string in AMP examples (P1): Both the train_one_epoch function (line 152) and the standalone AMP "Good" example (lines 294–296) hardcode "cuda" in torch.amp.autocast and torch.amp.GradScaler, even though the function already accepts a device parameter and the skill's first principle is device-agnostic code. On MPS (Apple Silicon) or other non-CUDA backends this will fail when AMP is enabled.
  • Inplace ReLU contradiction (P2): nn.ReLU(inplace=True) is used inside the "Good: Well-organized module" example (line 87) but is simultaneously flagged as a "Bad" anti-pattern in the Anti-Patterns section (line 358), with no explanation of the nuance. This will confuse readers.
  • Incomplete reproducibility setup (P2): The set_seed function is labelled "Full reproducibility setup" but omits torch.use_deterministic_algorithms(True), which is required in PyTorch ≥ 1.8 to prevent non-deterministic CUDA operations not covered by cudnn.deterministic.
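A sketch of what a more complete set_seed could look like per the third point; the exact contents of the skill's function are not reproduced here, so treat this as an illustration of the review's suggestion rather than a drop-in fix:

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch, and opt into deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Beyond the cuDNN flags: error out on any remaining nondeterministic op.
    torch.use_deterministic_algorithms(True)
    # Needed for deterministic cuBLAS matmuls on recent CUDA toolkits.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
```

Note that use_deterministic_algorithms(True) raises at runtime for ops with no deterministic implementation, which is arguably the point: silent nondeterminism becomes a visible error.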

Confidence Score: 3/5

  • Safe to merge after fixing the hardcoded device string in AMP examples and resolving the inplace ReLU contradiction, as these would mislead users following the skill's guidance.
  • The skill is well-structured and adds genuine value. The format is correct, the content breadth is good, and most advice is sound. The confidence deduction comes from a P1 correctness issue (hardcoded "cuda" contradicts the skill's own first principle and will cause runtime errors on non-CUDA devices) and two P2 consistency/accuracy issues (inplace ReLU contradiction and incomplete reproducibility setup).
  • skills/pytorch-patterns/SKILL.md — three issues in the code examples: AMP device string, inplace ReLU contradiction, and set_seed completeness.

Important Files Changed

Filename Overview
skills/pytorch-patterns/SKILL.md New skill covering PyTorch best practices. Well-structured and comprehensive, but has three issues: (1) hardcoded "cuda" device string in the training loop and AMP example directly contradicts the first principle of device-agnostic code; (2) nn.ReLU(inplace=True) is used in a "Good" example yet flagged as a "Bad" anti-pattern later, creating a confusing contradiction; (3) the set_seed function is labelled "Full reproducibility setup" but omits torch.use_deterministic_algorithms(True) required in modern PyTorch for true determinism.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Training Script] --> B{scaler is not None?}
    B -- Yes AMP enabled --> C["torch.amp.autocast(device.type)"]
    B -- No FP32 --> D[Standard forward pass]
    C --> E[model forward + loss]
    D --> E
    E --> F{AMP enabled?}
    F -- Yes --> G["scaler.scale(loss).backward()"]
    F -- No --> H["loss.backward()"]
    G --> I["scaler.unscale_(optimizer)"]
    I --> J["clip_grad_norm_()"]
    J --> K["scaler.step(optimizer)"]
    K --> L["scaler.update()"]
    H --> M["clip_grad_norm_()"]
    M --> N["optimizer.step()"]
    L --> O["optimizer.zero_grad(set_to_none=True)"]
    N --> O
    O --> P[Accumulate loss]
    P --> Q{More batches?}
    Q -- Yes --> E
    Q -- No --> R[Return avg loss]
```

Last reviewed commit: 958c81e


@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="skills/pytorch-patterns/SKILL.md">

<violation number="1" location="skills/pytorch-patterns/SKILL.md:41">
P1: The doc claims a “Full reproducibility setup” but omits multi-worker DataLoader seeding guidance (`worker_init_fn`/`generator`), making reproducibility incomplete for the recommended `num_workers=4` pattern.</violation>

<violation number="2" location="skills/pytorch-patterns/SKILL.md:48">
P2: If this helper is presented as a full reproducibility setup, also enable deterministic algorithms; CuDNN flags alone can still allow non-deterministic kernels and lead to run-to-run differences.</violation>

<violation number="3" location="skills/pytorch-patterns/SKILL.md:152">
P2: AMP examples hardcode CUDA (`"cuda"`) even though the skill presents the surrounding training patterns as device-agnostic, leading to inconsistent/non-portable guidance.</violation>

<violation number="4" location="skills/pytorch-patterns/SKILL.md:222">
P2: Dataset example advertises `torch.Tensor` output but can return a PIL image when `transform` is `None`, violating its own type contract.</violation>

<violation number="5" location="skills/pytorch-patterns/SKILL.md:264">
P2: Checkpoint example overclaims "all training state" while only persisting model/optimizer/epoch/loss, which can mislead users into incomplete resume behavior.</violation>
</file>
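Violation 1's worker-seeding point can be sketched roughly as follows; the dataset, batch size, and base seed are placeholders, and this mirrors the worker_init_fn/generator pattern the review names rather than the skill's actual file:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    """Derive per-worker seeds from the base seed PyTorch hands each worker."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


g = torch.Generator()
g.manual_seed(42)  # fixes the shuffle order across runs

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=2,           # multi-worker loading stays reproducible
    worker_init_fn=seed_worker,
    generator=g,
)
```

Without worker_init_fn, each worker inherits NumPy/random state from the parent, so worker-side augmentations can repeat or diverge between runs even when the model seed is fixed.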

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
