Bump 3rdparty/NeMo from b685967 to b5952a6 (#1)
(Closed) dependabot[bot] wants to merge 484 commits into main.
I have added functionality for paginated loading of an anndata file. This enables building up an in-memory sparse matrix structure of the file as it is converted into the SingleCellMemMapDataset structure. This is used in the case where the anndata file will not fit into memory. The user can specify a file-size cutoff, `paginated_load_cutoff`, the minimum anndata file size at which paginated loading will occur. `load_block_size` refers to the number of rows (cells) that will be loaded into memory at a given time. **A line-by-line analysis of memory usage with regular data loading vs. paginated data loading with a block size of 100_000 on an h5ad file of 967 MB:** [memory.txt](https://github.com/user-attachments/files/17514163/memory.txt) Regular data loading adds 4825.8 MB to memory; the majority of this (3258.4 MB) comes from loading the entire h5ad file into memory. Paginated data loading with a block size of 100_000 adds at most 1382 MB to memory, and a block size of 10_000 adds at most 297.6 MB. --------- Signed-off-by: polinabinder1 <pbinder@nvidia.com> Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Malcolm Greaves <malcolmgreaves@users.noreply.github.com>
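The block-wise build-up described above can be sketched as follows. This is a minimal illustration, not the SingleCellMemMapDataset code: `row_loader` is a hypothetical stand-in for whatever reads one slice of rows out of the h5ad file.

```python
import numpy as np
import scipy.sparse as sp

def paginated_load(row_loader, n_rows, load_block_size):
    """Accumulate a sparse matrix block by block instead of reading every row at once.

    `row_loader(start, stop)` is a hypothetical stand-in for the code that
    reads rows [start, stop) of the h5ad file into a dense array.
    """
    blocks = []
    for start in range(0, n_rows, load_block_size):
        stop = min(start + load_block_size, n_rows)
        # Only one block of rows is resident in dense form at any time.
        blocks.append(sp.csr_matrix(row_loader(start, stop)))
    return sp.vstack(blocks, format="csr")
```

With a small `load_block_size`, peak memory scales with the block size rather than the file size, at the cost of more read calls.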
ESM2 Inference Script
## Summary
Inference+fine-tuning notebook and associated scripts covering the 10M and 106M geneformer models.

## Changes
* New notebook for geneformer inference+fine-tuning that runs/passes tests in CI
* Move scripts for geneformer training and inference into the geneformer sub-package, and install them in the CLI.

Depends on #384 --------- Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com> Co-authored-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com> Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
Documentation for an example model along with Python training scripts. --------- Signed-off-by: polinabinder1 <pbinder@nvidia.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Puts a slimmed-down copy of the internal `infra-bionemo` into the `internal/` directory. This contains the original `license_check.py` program, functionality for creating new bionemo `sub-packages`, and functionality to create new standalone Python projects. New Python projects can either be namespaced (PEP 420) or not (aka "simple"). Adds a new GitHub CI stage for running `pytest` & calculates code coverage in `infra-bionemo`. New bionemo sub-package creation is exposed as the CLI program `create-bionemo-project`. New namespaced project creation is handled by the CLI tool `create-namespaced-project`. Simple projects are created with `create-py-project`. Developers **MUST** install this new internal Python project by executing `pip install -e internal/infra-bionemo`. The top-level bionemo meta package (`pyproject.toml`) has been updated to add `infra-bionemo` to the workspace. Note that the license check script has moved to `infra-bionemo`. It is now accessible as a standalone CLI program: `license-check`. --------- Signed-off-by: Malcolm Greaves <mgreaves@nvidia.com>
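The namespaced (PEP 420) layout that `create-namespaced-project` produces can be sketched roughly as follows; the package name and paths here are illustrative, not what the tool actually emits:

```shell
# Sketch of a PEP 420 namespaced sub-package layout (hypothetical name).
# The shared bionemo/ directory has NO __init__.py: that is what lets many
# sub-packages contribute modules to the same "bionemo" namespace.
mkdir -p bionemo-example/src/bionemo/example
touch bionemo-example/src/bionemo/example/__init__.py
```

A "simple" (non-namespaced) project would instead put an `__init__.py` at the top-level package directory.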
Signed-off-by: Dorota Toczydlowska <115542912+dorotat-nv@users.noreply.github.com>
This addresses QA Bug https://nvbugspro.nvidia.com/bug/4946953 by changing the output of the predict method from a `list` to a `dict` using `batch_collator`.
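The actual change uses bionemo's `batch_collator`; as a rough illustration of the list-to-dict aggregation it performs, here is a sketch in numpy (the function name and key names are made up for the example):

```python
import numpy as np

def collate_predictions(batches):
    """Merge a list of per-batch dicts into a single dict of stacked arrays.

    `batches` is the shape of what predict used to return: one dict per
    batch, all sharing the same keys.
    """
    keys = batches[0].keys()
    # Concatenate each field across batches along the batch axis.
    return {key: np.concatenate([batch[key] for batch in batches], axis=0) for key in keys}
```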
## Summary
Introduces a pydantic-based configuration and execution of bionemo2. Pydantic-based configuration introduces a few changes (and some unresolved changes) to our execution and configuration workflow:
1) All model submodules should have their own entrypoints for generating a JSON (alternatively, we can commit config templates).
2) All model submodules should also have their own `train` entrypoint. If these are BioBertModels, they may use the `train` function defined in `bionemo.llm`.

## Core additions for this PR
- config_models.py (base, ESM2, and Geneformer). These define the pydantic models that are ultimately parsed, including the use of generics for swapping ModelConfigs and DataConfigs. The structure of these is important for identifying which things are coupled.
- main.py: the argparse entrypoint, which does the underlying parsing. There is one for each sub-package. There is some weirdness around defaults that I'd appreciate feedback on.
- sub-packages/bionemo-llm/src/bionemo/llm/train.py: unified training entrypoint for both ESM2 and Geneformer. This consumes the pydantic configs and kicks off a NeMo2 lightning training job. (**IMPORTANT**: I think we can add an inference hook similar to this.)
- recipes.py: a minor feature, used to populate JSON files that can be run by users. There isn't much parametric code, but it is useful to review for correctness.

## Usage
Substitute the data-dir argument with wherever you house your test data.
```bash
bionemo-geneformer-recipe --dest config.json --data-dir /workspaces/bionemo-fw-ea/data/cellxgene_2023-12-15_small/processed_data
bionemo-geneformer-train --config config.json --model-config-t bionemo.geneformer.run.config_models.ExposedGeneformerPretrainConfig --resume-if-exists=False
```

## Changes
- We now construct 'config' objects that define execution.
- Configs are currently generic over MasterConfig, meaning as long as the associated configs are provided, it can execute any valid permutation of model configs, data configs, and parallel configs using `nemo.lightning.api.llm.train`.
- Example of serializing and deserializing configs.
- Example using discriminated unions (sum types) for generalizing over configs.
- Example of `model_validator` for validating parameters across configs.
- Example where we pair `field_validator` and `field_serializer` to provide a serde interface for arbitrary types (in this case, activation functions).
- Patterns for creating families of generic configs (`ExposedModelConfig[ModelConfigT]` and `DataConfig[DataModuleT]`) by combining generics and abc interfaces.
- Direct test of using the CLI to create/test recipes.
- Added recipes for ESM2 8m, 650m, 3B.
- Added recipes for Geneformer 10m, 106m.
- Added hook for global validators in children of DataConfig and ExposedModelConfig.
- Updated README to reflect these changes.
--------- Signed-off-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
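The generics-plus-abc pattern for families of configs can be sketched without pydantic; the class names below mirror the PR but the bodies are illustrative stand-ins, not the real configs:

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

ModelConfigT = TypeVar("ModelConfigT")

class ExposedModelConfig(ABC, Generic[ModelConfigT]):
    """Family of user-facing configs, each tied to an internal config type."""

    @abstractmethod
    def exposed_to_internal(self) -> ModelConfigT:
        """Map the serializable, user-facing fields onto the internal config."""

class GeneformerConfig:
    """Stand-in for the internal (non-serializable) model config."""
    def __init__(self, num_layers: int):
        self.num_layers = num_layers

class ExposedGeneformerConfig(ExposedModelConfig[GeneformerConfig]):
    def __init__(self, num_layers: int = 6):
        self.num_layers = num_layers

    def exposed_to_internal(self) -> GeneformerConfig:
        return GeneformerConfig(num_layers=self.num_layers)
```

The abc interface forces every concrete config to say how it produces its internal counterpart, while the generic parameter keeps the pairing checkable by a type checker.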
## Summary Port the geneformer loss eval script over from bionemo1, and update the model card with new numbers and results. --------- Signed-off-by: John St. John <jstjohn@users.noreply.github.com> Co-authored-by: Peter St. John <pstjohn@nvidia.com>
I'm adding a few changes to the SCDL documentation to make it a bit clearer. I am also adding an explanation of paginated loading. --------- Signed-off-by: polinabinder1 <pbinder@nvidia.com>
`bionemo-testing` re-exports the same values defined in `__all__`: they are imported from `bionemo-core` as the implementations have moved. Note that the tests have also moved. Now, all sub-packages can use `load` at runtime, not just during tests. All previous imports of `bionemo.testing.load` have been changed to `bionemo.core.load`. Additionally moves over the YAML resource files from bionemo-testing into bionemo-core & adjusted the `get_all_resources` function. This PR also fixes an error in the naming convention for `bionemo-core`'s tests.
Adding a needed underscore.
Corrects the config by setting optional fields to actually be optional. --------- Signed-off-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com> Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
…… (#416) This fixes a console issue that was showing up. The issue was that we were referencing a custom font (`NVIDIA Sans`) via the `text:` field in the mkdocs.yml. Custom fonts should be handled in custom stylesheets (which they currently are); referencing a font in that field will try to grab it from Google Fonts, hence the weird issue. I also formatted the text to use double quotes. Co-authored-by: Tyler Shimko <tshimko@nvidia.com>
Add Frequently Asked Questions section to docs. --------- Signed-off-by: Tyler Shimko <tshimko@nvidia.com>
Small update to default settings for `mike` docs build tool.
Uses `nest_asyncio` to allow this to run in a jupyter notebook, and calls `client.configure()` to ensure that the output type is set to `json`.
See slack thread https://nvidia.slack.com/archives/C074Z808N05/p1730301123648729

The main issue I was hitting was the setting `SBATCH --overcommit` in my personal slurm script. As a side effect of this exploration I did a few things, which this PR has:
* Identified that a bug in some combination of NeMo2 and Megatron-LM results in some fused kernels getting regularly re-compiled
* Verified that even with this re-compilation bug, we still have the best performance when using these kernels
* Added an option for geneformer to turn on the torch debugger that errors out if a kernel is getting recompiled; this is how I found which kernels were at fault and where/why the recompilations were happening

## Performance summary of different settings

| Name | Replicate | Num GPUs | Time per 10 steps | Average Timing |
|-------------------------------|-----------|----------|-------------------|----------------|
| no_recompile | 0 | 1 | 2.487 | 2.50875 |
| no_recompile | 0 | 2 | 2.527 | |
| no_recompile | 1 | 1 | 2.503 | |
| no_recompile | 1 | 2 | 2.518 | |
| fused_bias_act | 0 | 1 | 2.489 | 2.496666667 |
| fused_bias_act | 0 | 2 | 2.514 | |
| fused_bias_act | 1 | 1 | 2.487 | |
| fused_bias_act | 1 | 2 | | |
| fused_bias_act_do | 0 | 1 | 2.459 | 2.47425 |
| fused_bias_act_do | 0 | 2 | 2.478 | |
| fused_bias_act_do | 1 | 1 | 2.471 | |
| fused_bias_act_do | 1 | 2 | 2.489 | |
| fused_loss | 0 | 1 | 2.312 | 2.326 |
| fused_loss | 0 | 2 | 2.335 | |
| fused_loss | 1 | 1 | 2.323 | |
| fused_loss | 1 | 2 | 2.334 | |
| fused_bias_do | 0 | 1 | 2.467 | 2.4845 |
| fused_bias_do | 0 | 2 | 2.499 | |
| fused_bias_do | 1 | 1 | 2.472 | |
| fused_bias_do | 1 | 2 | 2.5 | |
| fused_bias_loss | 0 | 1 | 2.282 | 2.28775 |
| fused_bias_loss | 0 | 2 | 2.297 | |
| fused_bias_loss | 1 | 1 | 2.277 | |
| fused_bias_loss | 1 | 2 | 2.295 | |
| fused_bias_loss_arange_expand | 0 | 1 | 2.277 | 2.29075 |
| fused_bias_loss_arange_expand | 0 | 2 | 2.298 | |
| fused_bias_loss_arange_expand | 1 | 1 | 2.285 | |
| fused_bias_loss_arange_expand | 1 | 2 | 2.303 | |
## Summary
Run `python -m bionemo.geneformer.data.singlecell.dataset`

### Baseline:
Processed 31208 rows in 47.10491418838501 seconds
Processed 31208 rows in 47.388004779815674 seconds

### After vectorization:
Processed 31208 rows in 44.2215359210968 seconds
Processed 31208 rows in 44.54389500617981 seconds

--------- Signed-off-by: John St. John <jstjohn@users.noreply.github.com> Co-authored-by: Malcolm Greaves <malcolmgreaves@users.noreply.github.com>
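The kind of win vectorization buys here can be illustrated with a toy gather; the function names and the operation are illustrative only, not the actual dataset code:

```python
import numpy as np

def map_tokens_loop(gene_indices, lookup_table):
    # One Python-level iteration per element, as in a pre-vectorization path.
    return np.array([lookup_table[i] for i in gene_indices])

def map_tokens_vectorized(gene_indices, lookup_table):
    # A single fancy-indexing gather; the loop happens in C inside numpy.
    return lookup_table[gene_indices]
```

Both return the same array; the vectorized form avoids per-row Python overhead, which is where the few-seconds speedup over 31208 rows comes from.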
Update minimum driver version per @ohadmo's findings that CI stalls with 535, but not with 560.
Cherry picking changes from the release-v.20 branch back into main. --------- Signed-off-by: Tyler Shimko <tshimko@nvidia.com>
The example notebook depends on pooch, which is not installed. Additionally, the SCDL output is written to a permanent directory that is not deleted.
Pooch automatically calculates a filename for downloads as a function of the hash and the URL, but we want downloads from NGC and PBSS to have identical local filenames so that their cache is shared, and the URLs for the two resources don't typically match. Here we set the `fname` parameter manually so that we don't download duplicate objects from NGC and PBSS.
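A sketch of the idea: derive the local filename from the content hash alone, so both mirrors share one cache entry. `shared_cache_fname` is a hypothetical helper written for this example; `fname` and `known_hash` are real `pooch.retrieve` parameters, but the URL and hash in the commented usage are placeholders.

```python
def shared_cache_fname(resource_name: str, sha256_hex: str) -> str:
    """Hypothetical helper: name the local file by content hash, not by URL,
    so NGC and PBSS downloads of the same object land in the same cache entry
    regardless of which mirror served it."""
    return f"{resource_name}-{sha256_hex[:8]}"

# Usage with pooch (not executed here):
# pooch.retrieve(url=ngc_or_pbss_url, known_hash=f"sha256:{sha256_hex}",
#                fname=shared_cache_fname("cellxgene_small", sha256_hex))
```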
I realized that the same `SBATCH --overcommit` was getting in the way of both old and new versions of geneformer slurm logs I was comparing. I found a run that didn't have this overcommit issue and described relative performance in the context of that run instead.
See wandb runs here: https://wandb.ai/clara-discovery/geneformer_bionemo2_timing2

See the results below: we can precisely control whether or not there is a grad norm instability by setting or unsetting the two NVTE env variables. Adding the NVTE env variables to our container is a recent change as well. Based on these results we are unsetting these variables for now. There is not a significant hit to performance by making this change.

## Old run where this was not an issue:
<img width="457" alt="Screenshot 2024-11-12 at 9 42 45 AM" src="https://github.com/user-attachments/assets/7571ec4a-7bf1-4f86-901a-4dc983b53149">

## Representative new run where we see a spike in grad norm
<img width="730" alt="Screenshot 2024-11-12 at 9 43 25 AM" src="https://github.com/user-attachments/assets/c9069d1d-3cc7-43e3-93d0-1a3ff07ecfe3">

## We can make this spike go away by unsetting `NVTE_FUSED_ATTN` and `NVTE_FLASH_ATTN`
<img width="731" alt="Screenshot 2024-11-12 at 9 43 44 AM" src="https://github.com/user-attachments/assets/3883383a-e943-4d26-a12a-956f7240bd45">

## We can introduce this spike on the old image that didn't have these env variables by setting them
<img width="728" alt="Screenshot 2024-11-12 at 9 44 16 AM" src="https://github.com/user-attachments/assets/d5daeb16-57be-4e8e-bde6-8b275bf53a46">

## Example longer/larger batch run that fails with these env variables set
<img width="729" alt="Screenshot 2024-11-12 at 9 45 07 AM" src="https://github.com/user-attachments/assets/00cdb307-1863-47e1-b93e-3227cbc7259b">

## We can stabilize this run by unsetting these env variables
<img width="729" alt="Screenshot 2024-11-12 at 9 45 30 AM" src="https://github.com/user-attachments/assets/2cd370e3-5cdc-4385-9294-cdab068d6a8b">

It seems to be relatively recent, so this PR is going to test some recent changes to see if any of them is causing this.
- [x] Check if the arange change is causing this?
- [x] Check if the grad buffer change (should not be enabled) is causing this
- [x] bias fusions
- [x] garbage collection callback

Find out when this worked:
- [x] PR 409 right before second perf change and dset change
- [x] PR 410 after first perf change, CLI refactor, and wandb fix
- [x] PR 404 right before new CLI
- [x] PR 362 (2 weeks ago) but restarting job before the gradients start to increase
- [x] PR 362 (2 weeks ago) **worked**: https://wandb.ai/clara-discovery/geneformer_bionemo2/runs/0sSIf3tl?nw=nwusernvjstjohn uses `bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d`
- [x] bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d but with `NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0` set in my script **did not work**
- [x] bionemo2-pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d but with `NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0` `unset` in my script **WORKED!!**
- [x] bionemo2-pr419--f2599382e4afaf061c9948628f3f72bb8e233fd6 (most recent PR merged) but manually unsetting `NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0`

Notes on differences between TOT and `pr312--136b1889fc390d9dad04f077b32b8fbecf50e25d`:
- `env` doesn't have `NVTE_FUSED*` env settings. Unclear if the slurm script adds them properly or not.
- `NVTE_FUSED_ATTN` and `NVTE_FLASH_ATTN` are set in `bionemo2-pr373--db2fe9cc240b12bfaf045654fc5350a7b985c9de`, for example.
- In slurm, `--export=ALL` is the default and passes all env variables. Perhaps this happens then, so the run where I have those env variables added might fail if those are causing the issue.
- The successful run was bs=32 vs 64. I'm running a test now that has the NVTE* settings in the docker script but not in the image.
- This was a closed branch; maybe some key changes didn't make it to main.
- No `pip freeze` differences pop out that distinguish the branch that passes from the set that fail.
- NOTE: See the experiments above around `NVTE_FUSED_ATTN=1` and `NVTE_FLASH_ATTN=0`. I am pretty sure these settings are what cause the training instability in geneformer. Unsetting them works in the old PR, and setting them causes that old PR to not work, with this explosion of gradients.
- Currently I'm rerunning tests on a TOT branch but calling `unset` in my script on those variables so that they are removed from the container env prior to executing the script. If this fixes the TOT training curve I will feel very confident that this is what's going on, and we can focus on purging references to these variables from our docs, other than maybe highlighting how they result in training instability.
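In a launch script the fix is the shell `unset` builtin; done from Python before training starts, the equivalent step is a sketch like this:

```python
import os

# Drop the TransformerEngine attention toggles from the environment before
# launching training, so neither fused nor flash attention is forced on.
for var in ("NVTE_FUSED_ATTN", "NVTE_FLASH_ATTN"):
    os.environ.pop(var, None)  # no-op if the variable was never set
```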
## Summary Fixing all headers to be Apache license, ahead of open source release. ## Details Should not affect any executed code. Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Mounts the local netrc file so wandb credentials are available inside the container. Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Move and refactor ESM2 scripts into a package
### Description
Parallelising testing stages in GitHub CI to run at the same time to speed up the pipeline.
Added `verify-tests-status` as a final job that checks the test statuses; if each is success or skipped, it succeeds.
Checked that everything works:
* when the if-statement is moved to the main body of the job, there is no image squashing if the label is not selected; the test-status job simply collects test statuses, so we will be able to add it as a check for pipeline success
* here is a pipeline with skipped notebooks (Verify All Tests Status passes): https://github.com/NVIDIA/bionemo-framework/actions/runs/13976419866/job/39135551206
* here is a pipeline with failed notebooks (Verify All Tests Status fails since the notebooks fail): https://github.com/NVIDIA/bionemo-framework/actions/runs/13991973092/job/39177843004?pr=768

EDIT: Removed the flag `INCLUDE_NOTEBOOKS_TESTS` since some of the notebooks do not pass the CI pipeline, see [failed job logs](https://github.com/NVIDIA/bionemo-framework/actions/runs/13949683812/job/39045773477?pr=768).
Fix is being merged in NVIDIA/bionemo-framework#743

Current workflow in CI:
* run-test jobs run sequentially on one runner
* image squashing needs to run before the first (unit tests) job only; duration: 18 min, see [log](https://github.com/NVIDIA/bionemo-framework/actions/runs/13971056783/job/39114676552?pr=768)
* the pipeline duration if not SKIP_CI: minimum image_squashing_time + duration_L0 ~ 50 mins; maximum image_squashing_time + duration_L0 + duration_L1 + duration_docs ~ 2h

Proposed parallelization:
* run-test jobs run in parallel since we have access to multiple runners
* image squashing needs to run before each of the test jobs; duration: 18 min, see [log](https://github.com/NVIDIA/bionemo-framework/actions/runs/13971056783/job/39114676552?pr=768)
* the pipeline duration if not SKIP_CI: minimum image_squashing_time + duration_L0 ~ 50 mins; maximum image_squashing_time + max(duration_L0, duration_L1, duration_docs) ~ 1h10mins
* no duration change to the default PR pipeline; other pipelines have a smaller duration

### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the 
notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [x] All existing tests pass successfully --------- Signed-off-by: dorotat <dorotat@nvidia.com>
### Description Turning off TF32 so the D3PM tests pass on Blackwell ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) Tests pass on computelab --------- Signed-off-by: Danny <dreidenbach@nvidia.com>
### Description <!-- Provide a detailed description of the changes in this PR --> ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
…e_tutorial.ipynb (#779) ### Description Skipping execution of this notebook in GitHub CI due to an execution error. `pytest -v --nbval-lax -p no:python --ignore="docs/docs/user-guide/examples/bionemo-geneformer/geneformer_cellxgene_tutorial.ipynb" docs/ sub-packages/` Issue: NVIDIA/bionemo-framework#778 ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. 
### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully
Removes the xformers install in the framework container, which is only used for running the huggingface AMPLIFY baselines. --------- Signed-off-by: Peter St. John <pstjohn@nvidia.com>
### Description Delete outdated tutorial doc path. Docs for evo2 are here https://github.com/NVIDIA/bionemo-framework/tree/main/sub-packages/bionemo-evo2/examples Signed-off-by: John St John <jstjohn@nvidia.com>
### Description Blackwell compatibility. Requires a manual patch to causal-conv1d to enable sm100 support. ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [x] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Adds a new mock datamodule class and a new infer entrypoint for AMPLIFY. Also moves the `mock_xformers` logic from `test_convert.py` to `convert.py`, so it can be used in the infer script to infer directly from a huggingface tag. --------- Signed-off-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: Peter St. John <pstjohn@login-eos01.eos.clusters.nvidia.com> Co-authored-by: Peter St. John <pstjohn@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
### Description We've observed a divergence on ToT main for Evo2 1B 8K pretraining. We suspect the cause might be either the change in the PyTorch container (from 24.12 to 25.01) or the TransformerEngine version (from v1.13 to v1.14). To investigate, we reran the same Evo2 training using Docker images built with different combinations of TE and PyTorch versions. The results are summarized below. These experiments result in no divergence for * pytorch 25.01-py3 (default [TE_TAG=v1.14](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html)), with hardcoded TE_TAG=v1.13, and * pytorch 24.12-py3 (by default [TE_TAG=v1.13](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-12.html)) but result in the divergence for * (ToT main) pytorch 25.01-py3 (default [TE_TAG=v1.14](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html)) * pytorch 25.02-py3 (by default [TE_TAG=v2.0](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-02.html)) #### Plots presenting Evo2 1b 8k training for ToT main and configurations ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - 
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. ### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully
### Description
Updates the TFLOPS-per-GPU chart for Geneformer. Source: https://docs.google.com/spreadsheets/d/1OB28ArwR_-huNyfi4M2I_Q8jKEpvcINNqLhd-LGqKBY/edit?gid=0#gid=0

### Type of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [X] Documentation update
- [ ] Other (please describe):

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
### Description
We have the following filename pattern in the checkpoint callback of `train.py`:
```python
filename="{epoch}-{val_loss:.2f}-{step}-{consumed_samples}"
```
The issue arises from `{val_loss:.2f}`: different ranks compute slightly
different validation-loss values under distributed training, which leads
to a different checkpoint directory per rank. We can avoid this by
switching to the following pattern, which writes every rank's data into
the same directory, used as the aggregated checkpoint.
```python
filename="checkpoint-{step}-{consumed_samples}"
```
This works because both `step` and `consumed_samples` are global
counters that stay synchronized across all ranks during distributed
training.
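The difference can be illustrated with a small stand-alone sketch. This mimics, rather than calls, Lightning's filename interpolation; `format_ckpt` and the metric values here are hypothetical, chosen only to show why a rank-dependent metric in the filename diverges while global counters do not:

```python
# Illustrative sketch (not the actual Lightning callback internals): mimic
# how a metrics-based checkpoint filename pattern expands on each rank.

def format_ckpt(pattern: str, metrics: dict) -> str:
    """Substitute {name} and {name:.2f} placeholders with metric values."""
    name = pattern
    for key, value in metrics.items():
        name = name.replace("{" + key + ":.2f}", f"{value:.2f}")
        name = name.replace("{" + key + "}", str(value))
    return name

# Two ranks finish validation with slightly different loss values.
rank0 = {"epoch": 3, "val_loss": 0.412, "step": 1000, "consumed_samples": 64000}
rank1 = {"epoch": 3, "val_loss": 0.418, "step": 1000, "consumed_samples": 64000}

old_pattern = "{epoch}-{val_loss:.2f}-{step}-{consumed_samples}"
new_pattern = "checkpoint-{step}-{consumed_samples}"

# Old pattern: the rank-dependent val_loss yields two different directories.
print(format_ckpt(old_pattern, rank0))  # 3-0.41-1000-64000
print(format_ckpt(old_pattern, rank1))  # 3-0.42-1000-64000

# New pattern: only globally synchronized counters, so every rank agrees.
print(format_ckpt(new_pattern, rank0))  # checkpoint-1000-64000
print(format_ckpt(new_pattern, rank1))  # checkpoint-1000-64000
```

Because `step` and `consumed_samples` are identical on every rank, all ranks resolve to the same directory and the distributed checkpoint shards land in one place.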
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing
> [!NOTE]
> By default, the notebooks validation tests are skipped unless
explicitly enabled.
#### Authorizing CI Runs
We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.
* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.
### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
Signed-off-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
### Description
Reverting NVIDIA/bionemo-framework@67a869b
### Description
- Add GPU runners to the BioNeMo Sub-Package CI workflow.
- https://jirasw.nvidia.com/browse/BIONEMO-1340

### Details
- GHA GPU Runner documentation: https://docs.gha-runners.nvidia.com/runners/#gpu-runners
- NVIDIA CUDA container images: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags
  - Installing CUDA is presumed out of scope for BioNeMo Framework.
- BioNeMo sub-package GPU testing PASS checklist:
  - `bionemo-moco` ✅ @nvdreidenbach
  - `bionemo-webdatamodule` ✅ @DejunL
  - `bionemo-size-aware-batching` ✅ @DejunL
  - `bionemo-core` ✅
  - `bionemo-testing` ✅
  - TODO after TransformerEngine is installed; until then these are not functional (https://jirasw.nvidia.com/browse/BIONEMO-1341):
    - `bionemo-llm` 🚧
    - `bionemo-amplify`
    - `bionemo-esm2`
    - `bionemo-evo2`
    - `bionemo-geneformer`

### Usage
- Refer to the BioNeMo Sub-Package CI documentation to run the GHA workflow: https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#contributing-python-sub-packages-to-bionemo-framework

### Testing
- Latest run: https://github.com/NVIDIA/bionemo-framework/actions/runs/14030539337

Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: cspades <cory0ye@gmail.com>
### Description
Updates network-pull dependencies (AWS CLI, NGC). This keeps both the x86 and ARM builds in a single Dockerfile, so the diff is easy to understand. Some dependencies (ngc/aws) had to be updated to get this to work on ARM.

### Type of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [x] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### Pre-submit Checklist
- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [x] I have added/updated tests as needed
- [ ] All existing tests pass successfully

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
### Description
Removes llama-index from the container to fix 2 HIGH and 1 CRITICAL CVE. We do not use this package; it is only used by the NeMo RAG examples:

```
./3rdparty/NeMo/examples/nlp/rag/rag_generating.py:15:from llama_index.core import Settings, StorageContext, load_index_from_storage
./3rdparty/NeMo/examples/nlp/rag/rag_indexing.py:15:from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
./3rdparty/NeMo/examples/nlp/rag/rag_indexing.py:16:from llama_index.core.node_parser import SentenceSplitter
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_bert_embedder.py:19:from llama_index.core.bridge.pydantic import PrivateAttr
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_bert_embedder.py:20:from llama_index.core.embeddings import BaseEmbedding
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:18:from llama_index.core.bridge.pydantic import PrivateAttr
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:19:from llama_index.core.llms import CompletionResponse, CompletionResponseGen, CustomLLM, LLMMetadata
./3rdparty/NeMo/nemo/collections/nlp/models/rag/custom_gpt_llm.py:20:from llama_index.core.llms.callbacks import llm_completion_callback
```

### Type of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
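A minimal sketch of how such a removal could look in the image build. The actual Dockerfile change is not shown in this PR description, so the build stage and the `llama-index-core` package name here are assumptions:

```dockerfile
# Illustrative only: drop llama-index from the final image. Package names
# and placement in the build are assumptions, not the PR's actual diff.
RUN pip uninstall -y llama-index llama-index-core
```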
Bumps [3rdparty/NeMo](https://github.com/NVIDIA/NeMo) from `cc8ff45` to `384ff02`.

Commits:
- [`384ff02`](https://github.com/NVIDIA/NeMo/commit/384ff022f4b6740b732ecdf149fabc2d2c686992) ci: Move scripts fully down to files (#12802)
- [`f1ddc97`](https://github.com/NVIDIA/NeMo/commit/f1ddc97a9846bb291f63899ef32f5d36ed30be69) ci: Remove `--branch` (#12809)
- [`a0b4590`](https://github.com/NVIDIA/NeMo/commit/a0b4590dcad2068e2ef3c203ca9e610273eabbee) Remove cuda graph code in TransformerBlock (#12779)
- [`dc8c441`](https://github.com/NVIDIA/NeMo/commit/dc8c441fda39206601006c0d9144c262ffe18067) Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests (#12785)
- [`79c9e9a`](https://github.com/NVIDIA/NeMo/commit/79c9e9ab9304a1e610cd2dd9f1545194a5b18944) ci: Fix flaky LLM tests (#12807)
- [`f2a1746`](https://github.com/NVIDIA/NeMo/commit/f2a1746d4fe2262b095ebda67bbbb0b6fb9f752f) ci: Measure multiprocessing (#12778)
- [`57f3362`](https://github.com/NVIDIA/NeMo/commit/57f336235f6ec152d89249c9c0f97c426de21315) Fixes for audio doc warnings (#12736)
- [`fae8897`](https://github.com/NVIDIA/NeMo/commit/fae8897686d7460381b016de005b44ff27bd1092) build: Add trtllm (#12672)
- [`d126c5f`](https://github.com/NVIDIA/NeMo/commit/d126c5f94e129cdae8316136aef4ccebd40e2258) Use NeMo quick_gelu (#12787)
- [`262b5ad`](https://github.com/NVIDIA/NeMo/commit/262b5ad4858b871168e9011658e56c156c6d9864) Add pruning recipe (#12602)
- Additional commits viewable in the [compare view](https://github.com/NVIDIA/NeMo/compare/cc8ff45aaf678d2b0f7439863f5f17ae06303f06...384ff022f4b6740b732ecdf149fabc2d2c686992)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options (comment on this PR to trigger):
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
Adds a script to clone the bionemo-moco sub-package directly. Also cleaned up test dependencies.

Signed-off-by: Danny <dreidenbach@nvidia.com>
### Description
ARM builds are still broken; this should fix them. Just various dependency juggling.

### Type of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### Pre-submit Checklist
- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [x] I have added/updated tests as needed
- [x] All existing tests pass successfully

Signed-off-by: Timur Rvachov <trvachov@nvidia.com>
Bumps [3rdparty/NeMo](https://github.com/NVIDIA/NeMo) from `b685967` to `b5952a6`.
- [Release notes](https://github.com/NVIDIA/NeMo/releases)
- Commits: NVIDIA-NeMo/NeMo@b685967...b5952a6

updated-dependencies:
- dependency-name: 3rdparty/NeMo
  dependency-version: b5952a6eb4d59648da381bcde1f66f192d3c1073
  dependency-type: direct:production

Signed-off-by: dependabot[bot] <support@github.com>
**dependabot[bot]** (Author): OK, I won't notify you again about this release, but will get in touch when a new version is available. If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
Bumps 3rdparty/NeMo from `b685967` to `b5952a6`.

Commits:
- `b5952a6` ci: upgrade GitHub Actions for Node.js 24 compatibility (#15537)
- `29f3884` Ignore PnC for WER calculation: streaming ASR inference (#15550)
- `b83d1db` Rename index for attention prior weights (#15551)
- `db8e21d` ci: Update docs build job to exclude cu12 extra (#15553)
- `7a81456` docs: Fix docs build by setting uv conflicts for cu12 vs cu13 (#15548)
- `801051a` [TTS][MagpieTTS] Remove HF dependencies from CI Tests (#15544)
- `da57b01` Add support for partial transcription prefix in the prompt (#15449)
- `c12a31d` Changed the documentation getting started structure (#15460)
- `ed49067` Fix IsADirectoryError when cleaning up unfinished distributed checkpoints (#1...)
- `d9e94a1` Clean ASR older and non-maintained models and its respective modules (#15507)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.