Bump 3rdparty/NeMo from b685967 to b5952a6#1

Closed
dependabot[bot] wants to merge 484 commits into main from dependabot/submodules/main/3rdparty/NeMo-b5952a6

Conversation


@dependabot (bot) commented on behalf of GitHub on Mar 31, 2026

Bumps 3rdparty/NeMo from b685967 to b5952a6.

Commits
  • b5952a6 ci: upgrade GitHub Actions for Node.js 24 compatibility (#15537)
  • 29f3884 Ignore PnC for WER calculation: streaming ASR inference (#15550)
  • b83d1db Rename index for attention prior weights (#15551)
  • db8e21d ci: Update docs build job to exclude cu12 extra (#15553)
  • 7a81456 docs: Fix docs build by setting uv conflicts for cu12 vs cu13 (#15548)
  • 801051a [TTS][MagpieTTS] Remove HF dependencies from CI Tests (#15544)
  • da57b01 Add support for partial transcription prefix in the prompt (#15449)
  • c12a31d Changed the documentation getting started structure (#15460)
  • ed49067 Fix IsADirectoryError when cleaning up unfinished distributed checkpoints (#1...
  • d9e94a1 Clean ASR older and non-maintained models and its respective modules (#15507)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

polinabinder1 and others added 30 commits November 1, 2024 11:57
I have added functionality for paginated loading of an anndata file.
This enables building up an in-memory sparse matrix structure of the
file when it is converted into the SingleCellMemMapDataset structure.
This is used in cases where the anndata file will not fit into memory.

The user can specify a file size cutoff, paginated_load_cutoff, which is
the minimum anndata file size at which paginated loading will occur.
load_block_size is the number of rows (cells) that will be loaded into
memory at a given time.

**A line by line analysis of memory usage with regular data loading vs.
the paginated data-loading with a block-size of 100_000 on an h5ad file
of 967 MB:**

[memory.txt](https://github.com/user-attachments/files/17514163/memory.txt)

 
Regular data-loading adds 4825.8 MB to memory. The majority of this
(3258.4 MB) is during the loading of the entire h5ad file into memory.
The paginated data-loading with a block size of 100_000 adds at most
1382 MB to memory. A block size of 10_000 adds at most 297.6 MB to
memory.

This is a line-by-line breakdown of regular loading vs. paginated
loading with a block size of 100_000.
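The decision logic described above can be sketched in a few lines; `paginated_load_cutoff` and `load_block_size` are the parameter names from this commit, but the helper functions themselves are purely illustrative, not the SingleCellMemMapDataset code:

```python
# Illustrative sketch of the paginated-load decision and block iteration;
# the real anndata -> SingleCellMemMapDataset conversion code differs.

def choose_loading_mode(file_size_mb, paginated_load_cutoff_mb):
    """Files at or above the cutoff are loaded block by block."""
    return "paginated" if file_size_mb >= paginated_load_cutoff_mb else "regular"

def iter_row_blocks(n_rows, load_block_size):
    """Yield (start, stop) row ranges so that at most `load_block_size`
    rows (cells) are resident in memory at a time."""
    for start in range(0, n_rows, load_block_size):
        yield start, min(start + load_block_size, n_rows)
```

With the 967 MB file above, any cutoff at or below 967 MB selects paginated mode, and the block size bounds the peak extra memory, matching the measurements quoted.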

---------

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Co-authored-by: Malcolm Greaves <malcolmgreaves@users.noreply.github.com>
ESM2 Inference Script
## Summary
Inference+fine-tuning notebook and associated scripts covering the 10M
and 106M geneformer models.

## Changes
* New notebook for geneformer inference+fine-tuning that runs/passes
tests in CI
* Move scripts for geneformer training and inference into the geneformer
sub-package, and install them in the CLI.

Depends on #384

---------

Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>
Co-authored-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
Documentation for an example model along with python training scripts.

---------

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Puts a slimmed-down copy of the internal `infra-bionemo` into the `internal/` directory. This contains the original `license_check.py` program, functionality for creating new bionemo `sub-packages`, and functionality to create new standalone Python projects. New Python projects can either be namespaced (PEP 420) or not (aka "simple"). Adds a new GitHub CI stage that runs `pytest` and calculates code coverage in `infra-bionemo`.

New bionemo sub-package creation is exposed as the CLI program `create-bionemo-project`. New namespaced project creation is handled by the CLI tool `create-namespaced-project`. Simple projects are created with `create-py-project`.

Developers **MUST** install this new internal Python project by executing `pip install -e internal/infra-bionemo`.
The top-level bionemo meta package (`pyproject.toml`) has been updated to add `infra-bionemo` to the workspace.

Note that the license check script has moved to `infra-bionemo`. It is now accessible as a standalone CLI program: `license-check`.

---------

Signed-off-by: Malcolm Greaves <mgreaves@nvidia.com>
Signed-off-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
This addresses QA Bug https://nvbugspro.nvidia.com/bug/4946953
by changing the output of the predict method from a `list` to a `dict`
using `batch_collator`.
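A minimal sketch of the list-to-dict change; `collate_predictions` below is a hypothetical stand-in for `batch_collator` (the actual BioNeMo collator may behave differently), merging per-batch result dicts by output name:

```python
# Hypothetical stand-in for `batch_collator`: merge a list of per-batch
# prediction dicts into one dict keyed by output name.
def collate_predictions(batches):
    collated = {}
    for batch in batches:
        for key, value in batch.items():
            collated.setdefault(key, []).append(value)
    return collated

# Before the fix, predict returned a list of per-batch dicts;
# after the fix, callers get a single dict.
preds = collate_predictions([{"embeddings": [0.1]}, {"embeddings": [0.2]}])
```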
## Summary
Introduces a pydantic based configuration and execution of bionemo2.

Pydantic based configuration introduces a few changes (and some
unresolved changes) to our execution and configuration workflow.

1) All model submodules should have their own entrypoints for generating
a json config (alternatively, we can commit config templates).
2) All model submodules should also have their own `train` entrypoint.
If these are BioBertModels, they may use the `train` function defined in
`bionemo.llm`.
## Core additions for this PR
- config_models.py (base, ESM2, and Geneformer). These define the
pydantic models that are ultimately parsed, including the use of
generics for swapping ModelConfig and DataConfigs. The structure of
these is important for identifying which things are coupled.
- main.py this is the argparse entrypoint, which does the underlying
parsing. There is one for each sub-package. There is some weirdness
around defaults that I'd appreciate feedback on.
- sub-packages/bionemo-llm/src/bionemo/llm/train.py - Unified training
entrypoint for both ESM2 and Geneformer. This consumes the pydantic
configs and kicks off a NeMo2 lightning training job. (**IMPORTANT** I
think we can add an inference hook similar to this).
- recipes.py - a minor feature, used to populate json files that users
can run. There isn't much parametric code, but it is useful to review
for correctness.

## Usage
Substitute the data-dir argument with wherever you house your test data.

```bash
bionemo-geneformer-recipe --dest config.json --data-dir /workspaces/bionemo-fw-ea/data/cellxgene_2023-12-15_small/processed_data
bionemo-geneformer-train --config config.json --model-config-t bionemo.geneformer.run.config_models.ExposedGeneformerPretrainConfig --resume-if-exists=False
```
## Changes
- We now construct 'config' objects that define execution.
- Configs are currently generic over MasterConfig, meaning as long as
the associated configs are provided, it can execute any valid
permutation of model configs, data configs, and parallel configs using
`nemo.lightning.api.llm.train`
- Example of serializing and deserializing configs.
- Example using discriminated unions (sum types) for generalizing over
configs.
- Example of `model_validator` for validating parameters across configs.
- Example where we pair `field_validator` and `field_serializer` to
provide a serde interface for arbitrary types (in this case, activation
functions)
- Patterns for creating families of generic configs
(`ExposedModelConfig[ModelConfigT]` and `DataConfig[DataModuleT]`) by
combining generics and abc interfaces.
- Direct test of using the CLI to create/test recipes.
- Added recipes for ESM2 8M, 650M, and 3B.
- Added recipes for Geneformer 10M and 106M.
- Added a hook for global validators in children of DataConfig and
ExposedModelConfig.
- Updated the README to reflect these changes.
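The patterns listed above (discriminated unions, generics over model configs, and cross-config `model_validator` hooks) can be illustrated with a pydantic v2 sketch; every class and field name here is a placeholder, not the actual BioNeMo definition:

```python
# Illustrative pydantic v2 sketch of the config patterns described above.
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, model_validator


class ExposedGeneformerConfig(BaseModel):
    kind: Literal["geneformer"] = "geneformer"
    num_layers: int = 6


class ExposedESM2Config(BaseModel):
    kind: Literal["esm2"] = "esm2"
    num_layers: int = 33


# Discriminated union (sum type): the `kind` field selects the concrete
# config class when deserializing a JSON recipe.
AnyModelConfig = Annotated[
    Union[ExposedGeneformerConfig, ExposedESM2Config],
    Field(discriminator="kind"),
]


class DataConfig(BaseModel):
    micro_batch_size: int = 2
    seq_length: int = 2048


class MasterConfig(BaseModel):
    model: AnyModelConfig
    data: DataConfig = DataConfig()

    @model_validator(mode="after")
    def validate_across_configs(self):
        # Example of validating parameters across sub-configs.
        if self.data.seq_length <= 0:
            raise ValueError("seq_length must be positive")
        return self


# Deserializing a recipe picks the right config class via `kind`.
cfg = MasterConfig.model_validate({"model": {"kind": "esm2"}, "data": {}})
```

Serializing `cfg` with `model_dump_json()` and validating it back round-trips through the discriminated union, which is the serde property the recipes rely on.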

---------

Signed-off-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
## Summary
Port the geneformer loss eval script over from bionemo1, and update the
model card with new numbers and results.

---------

Signed-off-by: John St. John <jstjohn@users.noreply.github.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
I'm adding a few changes to the SCDL documentation to make it clearer.
I am also adding an explanation of paginated loading.

---------

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
`bionemo-testing` re-exports the same values defined in `__all__`: they
are imported from `bionemo-core` as the implementations have moved.
Note that the tests have also moved.

Now, all sub-packages can use `load` at runtime, not just during tests.

All previous imports of `bionemo.testing.load` have been changed to
`bionemo.core.load`.

Additionally, moves the YAML resource files from bionemo-testing
into bionemo-core and adjusts the `get_all_resources` function.

This PR also fixes an error in the naming convention for
`bionemo-core`'s tests by adding a needed underscore.
Corrects the config by setting optional fields to actually be optional.

---------

Signed-off-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
…… (#416)

This fixes a console issue that was showing up:


![image](https://github.com/user-attachments/assets/9f73e71a-95e6-4e62-b748-560acc71f89b)

The issue was that we were referencing a custom font (`NVIDIA Sans`)
via the `text:` field in mkdocs.yml. Custom fonts should be handled
in custom stylesheets (as they currently are); referencing a font by
name in that field makes the theme try to fetch it from Google Fonts,
hence the weird issue.

I also formatted the text to use double-quotes.
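A hedged sketch of the resulting configuration, assuming the Material for MkDocs theme (the stylesheet path is illustrative, not the project's exact file):

```yaml
# Illustrative mkdocs.yml fragment: disable the theme's Google Fonts
# lookup and load NVIDIA Sans from a local stylesheet instead.
theme:
  name: "material"
  font: false                  # don't pull anything from Google Fonts
extra_css:
  - "stylesheets/fonts.css"    # defines and applies NVIDIA Sans locally
```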

Co-authored-by: Tyler Shimko <tshimko@nvidia.com>
Add Frequently Asked Questions section to docs.

---------

Signed-off-by: Tyler Shimko <tshimko@nvidia.com>
Small update to default settings for `mike` docs build tool.
Uses `nest_asyncio` to allow this to run in a Jupyter notebook, and
calls `client.configure()` to ensure that the output type is set to
`json`.
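A minimal sketch of the event-loop part of this fix, assuming the `nest_asyncio` package (the `client.configure()` call and its client object are from the commit and are not reproduced here):

```python
# Jupyter already runs an event loop; nest_asyncio patches asyncio so a
# nested asyncio.run() inside a notebook cell no longer raises.
import asyncio

import nest_asyncio

nest_asyncio.apply()

async def fetch_status():
    return "ok"

# Works both inside and outside a notebook after apply().
result = asyncio.run(fetch_status())
```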
See slack thread
https://nvidia.slack.com/archives/C074Z808N05/p1730301123648729

The main issue I was hitting was the `SBATCH --overcommit` setting in my
personal slurm script. As a side effect of this exploration, this PR
also:
* Identifies that a bug in some combination of NeMo2 and Megatron-LM
results in some fused kernels getting regularly re-compiled
* Verifies that even with this re-compilation bug, we still get the
best performance when using these kernels
* Adds an option for geneformer to turn on the torch debugger that
errors out if a kernel is getting recompiled; this is how I found which
kernels were at fault and where/why the recompilations were happening.


## Performance summary of different settings
| Name | Replicate | Num GPUs | Time per 10 steps | Average Timing |
|-------------------------------|-----------|----------|-------------------|----------------|
| no_recompile | 0 | 1 | 2.487 | 2.50875 |
| no_recompile | 0 | 2 | 2.527 | |
| no_recompile | 1 | 1 | 2.503 | |
| no_recompile | 1 | 2 | 2.518 | |
| fused_bias_act | 0 | 1 | 2.489 | 2.496666667 |
| fused_bias_act | 0 | 2 | 2.514 | |
| fused_bias_act | 1 | 1 | 2.487 | |
| fused_bias_act | 1 | 2 | | |
| fused_bias_act_do | 0 | 1 | 2.459 | 2.47425 |
| fused_bias_act_do | 0 | 2 | 2.478 | |
| fused_bias_act_do | 1 | 1 | 2.471 | |
| fused_bias_act_do | 1 | 2 | 2.489 | |
| fused_loss | 0 | 1 | 2.312 | 2.326 |
| fused_loss | 0 | 2 | 2.335 | |
| fused_loss | 1 | 1 | 2.323 | |
| fused_loss | 1 | 2 | 2.334 | |
| fused_bias_do | 0 | 1 | 2.467 | 2.4845 |
| fused_bias_do | 0 | 2 | 2.499 | |
| fused_bias_do | 1 | 1 | 2.472 | |
| fused_bias_do | 1 | 2 | 2.5 | |
| fused_bias_loss | 0 | 1 | 2.282 | 2.28775 |
| fused_bias_loss | 0 | 2 | 2.297 | |
| fused_bias_loss | 1 | 1 | 2.277 | |
| fused_bias_loss | 1 | 2 | 2.295 | |
| fused_bias_loss_arange_expand | 0 | 1 | 2.277 | 2.29075 |
| fused_bias_loss_arange_expand | 0 | 2 | 2.298 | |
| fused_bias_loss_arange_expand | 1 | 1 | 2.285 | |
| fused_bias_loss_arange_expand | 1 | 2 | 2.303 | |
## Summary
Run `python -m bionemo.geneformer.data.singlecell.dataset`
### Baseline:
Processed 31208 rows in 47.10491418838501 seconds
Processed 31208 rows in 47.388004779815674 seconds
### After vectorization: 
Processed 31208 rows in 44.2215359210968 seconds
Processed 31208 rows in 44.54389500617981 seconds
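For flavor, the generic shape of such a vectorization (the actual dataset code is different): a per-element Python loop replaced with a single NumPy call that produces the same ordering.

```python
# Illustrative only: rank values in descending order, once with a
# Python-level sort and once vectorized with a stable NumPy argsort.
import numpy as np

def rank_descending_loop(values):
    """Baseline: Python-level sort, stable on ties."""
    return sorted(range(len(values)), key=lambda i: -values[i])

def rank_descending_vectorized(values):
    """Vectorized equivalent using a stable argsort."""
    return np.argsort(-np.asarray(values), kind="stable").tolist()

vals = [3.0, 1.0, 2.0, 3.0]
loop_order = rank_descending_loop(vals)
vec_order = rank_descending_vectorized(vals)
```

Both paths agree element for element, which is the property to check before swapping a loop for the vectorized version.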

---------

Signed-off-by: John St. John <jstjohn@users.noreply.github.com>
Co-authored-by: Malcolm Greaves <malcolmgreaves@users.noreply.github.com>
Update minimum driver version per @ohadmo's findings that CI stalls with
535, but not with 560.
Cherry picking changes from the release-v.20 branch back into main.

---------

Signed-off-by: Tyler Shimko <tshimko@nvidia.com>
The example notebook depends on pooch, which is not installed.
Additionally, the SCDL output is written to a permanent directory
that is not deleted.
Pooch automatically calculates a filename for downloads that is a
function of the hash and the URL, but we want to ensure that downloads
from NGC and PBSS have identical local filenames so that their cache is
shared, and the URLs for both resources don't typically match.

Here we set the fname parameter manually so that we don't download
duplicate objects from ngc and pbss
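The caching behavior can be mimicked in pure Python; pooch's real default filename is also derived from the hash and URL, but the scheme below is only illustrative:

```python
# Illustrative model of the cache-filename behavior described above.
from hashlib import sha256

def cache_filename(url, fname=None):
    """Default: a URL-derived name, so the same object fetched from two
    mirrors lands in two cache entries. Pinning `fname` overrides that."""
    if fname is not None:
        return fname
    return sha256(url.encode()).hexdigest()[:16] + "-" + url.rsplit("/", 1)[-1]

# Placeholder URLs: with fname pinned, both sources share one cache entry.
ngc = cache_filename("https://ngc.example/v1/geneformer.ckpt", fname="geneformer.ckpt")
pbss = cache_filename("https://pbss.example/bucket/geneformer.ckpt", fname="geneformer.ckpt")
```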
I realized that the same `SBATCH --overcommit` setting was affecting
both the old and new geneformer slurm runs I was comparing. I found a
run that didn't have this overcommit issue and described relative
performance in the context of that run instead.
See wandb runs here:
https://wandb.ai/clara-discovery/geneformer_bionemo2_timing2

See the results below: we can precisely control whether or not there is
a grad norm instability by setting or unsetting the two NVTE env
variables. Adding the NVTE env variables to our container is a recent
change as well. Based on these results, we are unsetting these variables
for now. There is no significant performance hit from making this
change.
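Whether done in the slurm script (`unset NVTE_FUSED_ATTN`) or in Python before TransformerEngine is imported, the mitigation is just removing the overrides so the library picks attention backends itself; a minimal sketch:

```python
# Drop the NVTE attention-backend overrides from the environment so
# TransformerEngine selects backends itself, matching the stable runs.
import os

for var in ("NVTE_FUSED_ATTN", "NVTE_FLASH_ATTN"):
    os.environ.pop(var, None)
```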

## Old run where this was not an issue:
<img width="457" alt="Screenshot 2024-11-12 at 9 42 45 AM"
src="https://github.com/user-attachments/assets/7571ec4a-7bf1-4f86-901a-4dc983b53149">

## Representative new run where we see a spike in grad norm
<img width="730" alt="Screenshot 2024-11-12 at 9 43 25 AM"
src="https://github.com/user-attachments/assets/c9069d1d-3cc7-43e3-93d0-1a3ff07ecfe3">

## We can make this spike go away by unsetting `NVTE_FUSED_ATTN` and
`NVTE_FLASH_ATTN`
<img width="731" alt="Screenshot 2024-11-12 at 9 43 44 AM"
src="https://github.com/user-attachments/assets/3883383a-e943-4d26-a12a-956f7240bd45">

## We can introduce this spike on the old image that didn't have these
env variables by setting them
<img width="728" alt="Screenshot 2024-11-12 at 9 44 16 AM"
src="https://github.com/user-attachments/assets/d5daeb16-57be-4e8e-bde6-8b275bf53a46">

## Example longer/larger batch run that fails with these env variables
set
<img width="729" alt="Screenshot 2024-11-12 at 9 45 07 AM"
src="https://github.com/user-attachments/assets/00cdb307-1863-47e1-b93e-3227cbc7259b">

## We can stabilize this run by unsetting these env variables
<img width="729" alt="Screenshot 2024-11-12 at 9 45 30 AM"
src="https://github.com/user-attachments/assets/2cd370e3-5cdc-4385-9294-cdab068d6a8b">




It seems to be relatively recent so this PR is going to test some recent
changes to see if any of them is causing this.

- [x] Check if the arange change is causing this?
- [x] Check if the grad buffer change (should not be enabled) is causing
this
- [x] bias fusions
- [x] garbage collection callback

Find out when this worked:
- [x] PR 409 right before second perf change and dset change
- [x] PR 410 after first perf change, CLI refactor, and wandb fix
- [x] PR 404 right before new CLI
- [x] PR 362 (2 weeks ago) but restarting job before the gradients start
to increase
- [x] PR 362 (2 weeks ago)
- [x] **worked**
https://wandb.ai/clara-discovery/geneformer_bionemo2/runs/0sSIf3tl?nw=nwusernvjstjohn
**worked** uses
`bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d`
- [x] bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d but with
`NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0` set in my script **did not
work**
- [x] bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d but with
`NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0` `unset` in my script
**WORKED!!**
- [x] bionemo2-pr419--f2599382e4afaf061c9948628f3f72bb8e233fd6 (most
recent PR merged) but manually unsetting `NVTE_FUSED_ATTN=1` and
`NVTE_FLASH_ATTN=0`


Notes on differences between TOT and
`pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d`

- `env` doesn't have `NVTE_FUSED*` env settings. Unclear if slurm script
adds them properly or not.
- `NVTE_FUSED_ATTN` and `NVTE_FLASH_ATTN` are set in
`bionemo2-pr373--db2fe9cc240b12bfaf045654fc5350a7b985c9de` for example.
- in slurm `--export=ALL` is default and passes all env variables.
Perhaps this happens then, so the run where I have those env variables
added might fail if those are causing the issue.
- Successful run was bs=32 vs 64. I'm running a test now that has the
NVTE* settings in the docker script but not in the image.
- This was a closed branch, maybe some key changes didn't make it to
main.
- No `pip freeze` differences pop out that distinguish the branch that
passes from the set that fail.
- NOTE: See the experiments above around `NVTE_FUSED_ATTN=1` and
`NVTE_FLASH_ATTN=0` . I am pretty sure these settings are what cause the
training instability in geneformer. Unsetting them works in the old PR
and setting them causes that old PR to not work with this explosion of
gradients.
- Currently I'm rerunning tests on a TOT branch but calling `unset` in
my script on those variables so that they are removed from the container
env prior to executing the script. If this fixes the TOT training curve
I will feel very confident that this is what's going on, and we can
focus on purging references to these variables from our docs, other than
maybe highlighting how they result in training instability.
## Summary
Fixing all headers to be Apache license, ahead of open source release.

## Details
Should not affect any executed code.

Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Mounts the local netrc file so wandb credentials are available inside
the container.

Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Move and refactor ESM2 scripts into a package
dorotat-nv and others added 20 commits March 21, 2025 16:35
### Description
Parallelises the testing stages in GitHub CI so they run at the same
time, speeding up the pipeline.
Added `verify-tests-status` as a final job that checks the status of all
test jobs and succeeds if each of them succeeded or was skipped.

Checked that everything works:
* when the if-statement is moved to the main body of the job, there is
no image squashing if the label is not selected; the test-status job
simply collects test statuses, so we can add it as a single required
check for pipeline success
* Here is pipeline with skipped notebooks (Verify All Tests Status
passes)

https://github.com/NVIDIA/bionemo-framework/actions/runs/13976419866/job/39135551206
* Here is a pipeline with failed notebooks (Verify All Tests Status
fails since notebooks fail)

https://github.com/NVIDIA/bionemo-framework/actions/runs/13991973092/job/39177843004?pr=768

EDIT:
Removed flag `INCLUDE_NOTEBOOKS_TESTS` since some of the notebooks do
not pass CI pipeline, see
[failed job
logs](https://github.com/NVIDIA/bionemo-framework/actions/runs/13949683812/job/39045773477?pr=768).

Fix is being merged in
NVIDIA/bionemo-framework#743

Current workflow in CI

* run-test jobs run sequentially on one runner
* image squashing needs to run before the first (unit tests) job only;
duration: 18 min, see
[log](https://github.com/NVIDIA/bionemo-framework/actions/runs/13971056783/job/39114676552?pr=768)
* the pipeline duration, if not SKIP_CI:
  * minimum: image_squashing_time + duration_L0 ~ 50 min
  * maximum: image_squashing_time + duration_L0 + duration_L1 +
duration_docs ~ 2 h


![image](https://github.com/user-attachments/assets/af0180f5-ffca-4af6-80ba-457f6e17112c)


Proposed parallelization:


* run-test jobs run in parallel since we have access to multiple runners
* image squashing needs to run before each of the test jobs;
duration: 18 min, see
[log](https://github.com/NVIDIA/bionemo-framework/actions/runs/13971056783/job/39114676552?pr=768)
* the pipeline duration, if not SKIP_CI:
  * minimum: image_squashing_time + duration_L0 ~ 50 min
  * maximum: image_squashing_time + max(duration_L0, duration_L1,
duration_docs) ~ 1 h 10 min
* no duration change to the default PR pipeline; other pipelines have a
shorter duration



![image](https://github.com/user-attachments/assets/7593a04d-ce1a-4cb5-9232-e825da628c9e)
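The fan-out plus final status-check pattern described above can be sketched in workflow YAML; job names and script paths here are placeholders, not the repository's actual workflow:

```yaml
# Illustrative GitHub Actions fragment: parallel test jobs plus a final
# gate that runs even when upstream jobs are skipped.
jobs:
  run-tests-unit:
    runs-on: self-hosted
    steps:
      - run: ./ci/run_unit_tests.sh
  run-tests-slow:
    runs-on: self-hosted
    steps:
      - run: ./ci/run_slow_tests.sh
  verify-tests-status:
    if: always()                         # run even if upstream skipped
    needs: [run-tests-unit, run-tests-slow]
    runs-on: ubuntu-latest
    steps:
      - name: Fail unless every job succeeded or was skipped
        run: |
          if [[ "${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
            exit 1
          fi
```

Branch protection then only needs to require the single `verify-tests-status` check instead of every individual test job.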




### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [x]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebooks validation tests are skipped unless
explicitly enabled.

#### Authorizing CI Runs

We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.

### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [ ] I have tested these changes locally
 - [ ] I have updated the documentation accordingly
 - [ ] I have added/updated tests as needed
 - [x] All existing tests pass successfully

---------

Signed-off-by: dorotat <dorotat@nvidia.com>
### Description
Turning off TF32 so the D3PM tests pass on Blackwell.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)

Tests pass on computelab.
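The standard PyTorch knobs for this look as follows; the commit doesn't show its exact code, so treat this as the generic form rather than the change itself:

```python
# Disable TF32 matmul/conv paths so results match the FP32 tolerances
# that numerically sensitive tests (like D3PM's) tend to assume.
import torch

torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```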

---------

Signed-off-by: Danny <dreidenbach@nvidia.com>
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
…e_tutorial.ipynb (#779)

### Description
Skipping execution of this notebook in GitHub CI due to an execution
error:

`pytest -v --nbval-lax -p no:python
--ignore="docs/docs/user-guide/examples/bionemo-geneformer/geneformer_cellxgene_tutorial.ipynb"
docs/ sub-packages/`

Issue: NVIDIA/bionemo-framework#778

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

Removes the xformers install in the framework container, which is only
used for running the huggingface AMPLIFY baselines.

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
### Description
Delete outdated tutorial doc path. Docs for evo2 are here
https://github.com/NVIDIA/bionemo-framework/tree/main/sub-packages/bionemo-evo2/examples

Signed-off-by: John St John <jstjohn@nvidia.com>
### Description
Blackwell compatibility.

Requires a manual patch to causal-conv1d to enable sm100 support.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [x]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Adds a new mock datamodule class and a new infer entrypoint for AMPLIFY.
Also moves the `mock_xformers` logic from `test_convert.py` to
`convert.py`, so it can be used in the infer script to infer directly
from a huggingface tag.

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Peter St. John <pstjohn@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
### Description
We've observed a divergence on ToT main for Evo2 1B 8K pretraining. We
suspect the cause might be either the change in the PyTorch container
(from 24.12 to 25.01) or the TransformerEngine version (from v1.13 to
v1.14). To investigate, we reran the same Evo2 training using Docker
images built with different combinations of TE and PyTorch versions. The
results are summarized below.

The above experiments result in no divergence for:
* pytorch 25.01-py3 (default
[TE_TAG=v1.14](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html))
with hardcoded TE_TAG=v1.13, and
* pytorch 24.12-py3 (by default
[TE_TAG=v1.13](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-12.html))

but these experiments do result in divergence for:
* (ToT main) pytorch 25.01-py3 (default
[TE_TAG=v1.14](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html))
* pytorch 25.02-py3 (by default
[TE_TAG=v2.0](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-02.html))

#### Training-loss plots for Evo2 1B 8K on ToT main and the tested configurations


![image](https://github.com/user-attachments/assets/727d0ac8-e57b-41fa-a033-976501529318)


![image](https://github.com/user-attachments/assets/309cdb6a-18c2-4ed1-bdb5-14ed07a61317)


### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, notebook validation tests are skipped unless explicitly enabled.

#### Authorizing CI Runs

We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.

### Description
Updates the TFLOPS per GPU chart for Geneformer. The source is
https://docs.google.com/spreadsheets/d/1OB28ArwR_-huNyfi4M2I_Q8jKEpvcINNqLhd-LGqKBY/edit?gid=0#gid=0

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [X]  Documentation update
- [ ]  Other (please describe):


Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
### Description

We have the following pattern in the callback of train.py:
```python
filename="{epoch}-{val_loss:.2f}-{step}-{consumed_samples}"
```
The issue arises from `{val_loss:.2f}`: under distributed training, each rank
computes a slightly different validation loss, so each rank writes to a
different checkpoint directory. We can avoid this by switching to the following
pattern, which directs every rank's data into the same directory, used as the
aggregated checkpoint.
```python
filename="checkpoint-{step}-{consumed_samples}"
```
This works because both `step` and `consumed_samples` are global
counters that stay synchronized across all ranks during distributed
training.
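A minimal sketch of why the first template diverges across ranks while the second does not. This is a pure-Python illustration using `str.format` with made-up metric values; the real checkpoint callback substitutes logged metrics into the template in a similar way.

```python
def ckpt_name(template: str, **metrics) -> str:
    # Each rank formats the filename from its *own* metric values.
    return template.format(**metrics)


# Slightly different validation losses per rank, as happens in
# distributed training before any cross-rank reduction.
rank_val_losses = [2.314, 2.299]

old = "{epoch}-{val_loss:.2f}-{step}-{consumed_samples}"
new = "checkpoint-{step}-{consumed_samples}"

old_names = {
    ckpt_name(old, epoch=1, val_loss=loss, step=500, consumed_samples=64000)
    for loss in rank_val_losses
}
new_names = {
    ckpt_name(new, step=500, consumed_samples=64000)
    for _ in rank_val_losses
}

print(old_names)  # two distinct directories: rounding still differs per rank
print(new_names)  # a single shared directory: {'checkpoint-500-64000'}
```

Because `step` and `consumed_samples` are the same on every rank, the second template collapses to one directory name no matter how many ranks format it.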

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description
Reverting
NVIDIA/bionemo-framework@67a869b

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Description
<!-- Provide a detailed description of the changes in this PR -->

- Add GPU runners to BioNeMo Sub-Package CI workflow.
- https://jirasw.nvidia.com/browse/BIONEMO-1340

### Details

- GHA GPU Runner Documentation:
https://docs.gha-runners.nvidia.com/runners/#gpu-runners
- NVIDIA CUDA Container Images:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags
  - Installing CUDA is presumed out-of-scope of BioNeMo Framework.
- BioNeMo Sub-Package Updates - GPU Testing PASS Checklist
  - `bionemo-moco` ✅ @nvdreidenbach 
  - `bionemo-webdatamodule` ✅ @DejunL 
  - `bionemo-size-aware-batching` ✅ @DejunL 
  - `bionemo-core` ✅ 
  - `bionemo-testing` ✅ 
- TODO: enable once TransformerEngine is installed; until then these
sub-packages are not functional. (https://jirasw.nvidia.com/browse/BIONEMO-1341)
  - `bionemo-llm` 🚧 
    - `bionemo-amplify`
    - `bionemo-esm2`
    - `bionemo-evo2`
    - `bionemo-geneformer`

### Usage
<!--- How does a user interact with the changed code -->

- Refer to the BioNeMo Sub-Package CI documentation to run the GHA
workflow:
https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#contributing-python-sub-packages-to-bionemo-framework

### Testing

- Latest:
https://github.com/NVIDIA/bionemo-framework/actions/runs/14030539337

---------

Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: cspades <cory0ye@gmail.com>
Updates network-pulled dependencies (AWS CLI, NGC).



### Description
This keeps both the x86 and ARM builds in a single Dockerfile, so the
diff is easy to understand.

Some dependencies (ngc/aws) had to be updated to get this to work on
ARM.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [x]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):



### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [x] I have tested these changes locally
 - [x] I have updated the documentation accordingly
 - [x] I have added/updated tests as needed
 - [ ] All existing tests pass successfully

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
### Description
Removes llama-index from the container to fix two HIGH and one CRITICAL
CVE. We do not use this package; within the repo it is only used by the
NeMo RAG examples here:
```
./3rdparty/NeMo/examples/nlp/rag/rag_generating.py:15:from llama_index.core import Settings, StorageContext, load_index_from_storage
./3rdparty/NeMo/examples/nlp/rag/rag_indexing.py:15:from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
./3rdparty/NeMo/examples/nlp/rag/rag_indexing.py:16:from llama_index.core.node_parser import SentenceSplitter
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_bert_embedder.py:19:from llama_index.core.bridge.pydantic import PrivateAttr
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_bert_embedder.py:20:from llama_index.core.embeddings import BaseEmbedding
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:18:from llama_index.core.bridge.pydantic import PrivateAttr
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:19:from llama_index.core.llms import CompletionResponse, CompletionResponseGen, CustomLLM, LLMMetadata
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:20:from llama_index.core.llms.callbacks import llm_completion_callback
```


### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Bumps [3rdparty/NeMo](https://github.com/NVIDIA/NeMo) from `cc8ff45` to
`384ff02`.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/384ff022f4b6740b732ecdf149fabc2d2c686992"><code>384ff02</code></a>
ci: Move scripts fully down to files (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12802">#12802</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/f1ddc97a9846bb291f63899ef32f5d36ed30be69"><code>f1ddc97</code></a>
ci: Remove <code>--branch</code> (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12809">#12809</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/a0b4590dcad2068e2ef3c203ca9e610273eabbee"><code>a0b4590</code></a>
Remove cuda graph code in TransformerBlock (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12779">#12779</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/dc8c441fda39206601006c0d9144c262ffe18067"><code>dc8c441</code></a>
Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12785">#12785</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/79c9e9ab9304a1e610cd2dd9f1545194a5b18944"><code>79c9e9a</code></a>
ci: Fix flaky LLM tests (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12807">#12807</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/f2a1746d4fe2262b095ebda67bbbb0b6fb9f752f"><code>f2a1746</code></a>
ci: Measure multiprocessing (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12778">#12778</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/57f336235f6ec152d89249c9c0f97c426de21315"><code>57f3362</code></a>
Fixes for audio doc warnings (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12736">#12736</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/fae8897686d7460381b016de005b44ff27bd1092"><code>fae8897</code></a>
build: Add trtllm (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12672">#12672</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/d126c5f94e129cdae8316136aef4ccebd40e2258"><code>d126c5f</code></a>
Use NeMo quick_gelu (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12787">#12787</a>)</li>
<li><a
href="https://github.com/NVIDIA/NeMo/commit/262b5ad4858b871168e9011658e56c156c6d9864"><code>262b5ad</code></a>
Add pruning recipe (<a
href="https://redirect.github.com/NVIDIA/NeMo/issues/12602">#12602</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/NVIDIA/NeMo/compare/cc8ff45aaf678d2b0f7439863f5f17ae06303f06...384ff022f4b6740b732ecdf149fabc2d2c686992">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
Add script to clone bionemo-moco sub-package directly.

Also cleaned up dependencies in tests.

---------

Signed-off-by: Danny <dreidenbach@nvidia.com>
### Description
ARM builds are still broken; this should fix them. It is mostly
dependency juggling.

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [x] I have tested these changes locally
 - [x] I have updated the documentation accordingly
 - [x] I have added/updated tests as needed
 - [x] All existing tests pass successfully

---------

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Bumps [3rdparty/NeMo](https://github.com/NVIDIA/NeMo) from `b685967` to `b5952a6`.
- [Release notes](https://github.com/NVIDIA/NeMo/releases)
- [Commits](NVIDIA-NeMo/NeMo@b685967...b5952a6)

---
updated-dependencies:
- dependency-name: 3rdparty/NeMo
  dependency-version: b5952a6eb4d59648da381bcde1f66f192d3c1073
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added dependencies Pull requests that update a dependency file submodules Pull requests that update submodules code labels Mar 31, 2026
@dependabot dependabot bot requested a review from skothenhill-nv as a code owner March 31, 2026 04:50

dependabot bot commented on behalf of github Apr 2, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

@dependabot dependabot bot deleted the dependabot/submodules/main/3rdparty/NeMo-b5952a6 branch April 2, 2026 18:10