feat(install): slim install for remote/NIM-only inference on Mac/Windows #1830
Merged
Changes from 18 commits
35 commits
0ef1eae Add workflow_dispatch integration test for library mode on Windows an… (charlesbluca)
6a21df3 Split torch cu130 deps into explicit group (charlesbluca)
bbe51d3 ci: increase RAY_raylet_start_wait_time_s for macOS integration tests (charlesbluca)
027c154 Use inprocess run mode for now (charlesbluca)
79e8f0b Pass API key via --api-key flag using NGC_NV_DEVELOPER_NVCF secret (charlesbluca)
1f31946 Initial plan & refactors for slimmer instal (charlesbluca)
8c45923 Drop nv-ingest as dep (charlesbluca)
531015e Make heavy optional deps lazy for slim Intel Mac install (charlesbluca)
6190f04 Fix Intel Mac slim-install blockers in PDF/image/embed pipeline (charlesbluca)
264ab24 Merge remote-tracking branch 'upstream/main' into slim-install (charlesbluca)
fd53df6 Use CUDA torch index for Windows as well as Linux (charlesbluca)
5bef6f7 Merge branch 'slim-install' (charlesbluca)
26475c8 Add macOS x64 to workflow (charlesbluca)
4b120f0 torch cuda index rename (charlesbluca)
9f2035b Merge branch 'slim-install' (charlesbluca)
45f2732 Try switching to macos-26-intel (charlesbluca)
9e01344 Modify unit test install (charlesbluca)
326de9a Linting (charlesbluca)
0722e6f Guard optional imports and restore graceful embedding failure handling (charlesbluca)
4591f62 Fix test failures from lazy import change and network-dependent token… (charlesbluca)
4273b5a Fix misplaced docstrings and remove invalid uv conflicts block (charlesbluca)
25622df Simplify dependency groups; move remote and lancedb to core (charlesbluca)
48a8953 Drop agent doc (charlesbluca)
03dc39f Fix README install instructions to reflect simplified dependency groups (charlesbluca)
ec99f30 ci: add nightly schedule trigger and fix secret name in library mode … (charlesbluca)
92f777a Compat code for ray[data] 2.49 (charlesbluca)
eabdf74 Merge upstream/main into slim-install (charlesbluca)
1085a2c Merge remote-tracking branch 'upstream/main' into slim-install (charlesbluca)
9e8dbeb fix(embed): avoid doubling /embeddings on HTTP embedding URLs (charlesbluca)
dec707d test(embed): align BatchEmbedCPUActor test with local HF default (charlesbluca)
fec7dd8 Merge branch 'main' into slim-install (charlesbluca)
0988bd7 fix(embed): remote-only CPU embed actor; drop inprocess debug logs (charlesbluca)
e51cfd9 Merge branch 'main' into slim-install (charlesbluca)
e35a7de Merge branch 'main' into slim-install (jdye64)
cf71664 Merge branch 'main' into slim-install (charlesbluca)
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,77 @@ | ||||||||||||||||||||||||||||||||||||
| # SPDX-FileCopyrightText: Copyright (c) 2024-25, NVIDIA CORPORATION & AFFILIATES. | ||||||||||||||||||||||||||||||||||||
| # All rights reserved. | ||||||||||||||||||||||||||||||||||||
| # SPDX-License-Identifier: Apache-2.0 | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| name: Library Mode Integration Tests (Windows & macOS) | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| on: | ||||||||||||||||||||||||||||||||||||
| workflow_dispatch: | ||||||||||||||||||||||||||||||||||||
| inputs: | ||||||||||||||||||||||||||||||||||||
| source-ref: | ||||||||||||||||||||||||||||||||||||
| description: 'Git ref to test (branch, tag, or SHA). Defaults to the dispatched branch.' | ||||||||||||||||||||||||||||||||||||
| required: false | ||||||||||||||||||||||||||||||||||||
| type: string | ||||||||||||||||||||||||||||||||||||
| default: '' | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| jobs: | ||||||||||||||||||||||||||||||||||||
| integration-test: | ||||||||||||||||||||||||||||||||||||
| name: Integration Tests (${{ matrix.os-label }}) | ||||||||||||||||||||||||||||||||||||
| runs-on: ${{ matrix.runner }} | ||||||||||||||||||||||||||||||||||||
| timeout-minutes: 90 | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| strategy: | ||||||||||||||||||||||||||||||||||||
| fail-fast: false | ||||||||||||||||||||||||||||||||||||
| matrix: | ||||||||||||||||||||||||||||||||||||
| include: | ||||||||||||||||||||||||||||||||||||
| - runner: windows-latest | ||||||||||||||||||||||||||||||||||||
| os-label: windows-x64 | ||||||||||||||||||||||||||||||||||||
| - runner: macos-26 | ||||||||||||||||||||||||||||||||||||
| os-label: macos-arm64 | ||||||||||||||||||||||||||||||||||||
| - runner: macos-26-intel | ||||||||||||||||||||||||||||||||||||
| os-label: macos-x64 | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| env: | ||||||||||||||||||||||||||||||||||||
| # NIM endpoint URLs — edit these directly to point at different deployments | ||||||||||||||||||||||||||||||||||||
| PAGE_ELEMENTS_INVOKE_URL: "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-page-elements-v3" | ||||||||||||||||||||||||||||||||||||
| OCR_INVOKE_URL: "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1" | ||||||||||||||||||||||||||||||||||||
| GRAPHIC_ELEMENTS_INVOKE_URL: "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-graphic-elements-v1" | ||||||||||||||||||||||||||||||||||||
| TABLE_STRUCTURE_INVOKE_URL: "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-table-structure-v1" | ||||||||||||||||||||||||||||||||||||
| EMBED_INVOKE_URL: "https://integrate.api.nvidia.com/v1" | ||||||||||||||||||||||||||||||||||||
| EMBED_MODEL_NAME: "nvidia/llama-nemotron-embed-1b-v2" | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| steps: | ||||||||||||||||||||||||||||||||||||
| - name: Check out repository code | ||||||||||||||||||||||||||||||||||||
| uses: actions/checkout@v4 | ||||||||||||||||||||||||||||||||||||
| with: | ||||||||||||||||||||||||||||||||||||
| ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }} | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| - name: Set up Python 3.12 | ||||||||||||||||||||||||||||||||||||
| uses: actions/setup-python@v5 | ||||||||||||||||||||||||||||||||||||
| with: | ||||||||||||||||||||||||||||||||||||
| python-version: '3.12' | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| - name: Install uv | ||||||||||||||||||||||||||||||||||||
| run: pip install uv | ||||||||||||||||||||||||||||||||||||
|
Comment on lines
+50
to
+57
Contributor
Path: .github/workflows/integration-test-library-mode.yml
Line: 47-54
Comment:
**GitHub Actions not pinned to commit SHA**
Both `actions/checkout@v4` and `actions/setup-python@v5` use mutable version tags. Per the repository's `github-actions-security` rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks.
```suggestion
      - name: Check out repository code
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }}
      - name: Set up Python 3.12
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: '3.12'
```
|
||||||||||||||||||||||||||||||||||||
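The pinning rule in the comment above can be checked mechanically. The following is a minimal sketch, not part of the PR: `find_unpinned_actions` is a hypothetical helper that scans workflow text with a regular expression (rather than a YAML parser) and reports every `uses:` reference whose ref is not a full 40-character commit SHA.

```python
import re

# Matches `uses: owner/repo@ref` (optionally with a sub-path) and captures both parts.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w./-]+)@(\S+)", re.MULTILINE)
# A full commit SHA is exactly 40 lowercase hex characters.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` strings whose ref is not a full commit SHA."""
    unpinned = []
    for action, ref in USES_RE.findall(workflow_text):
        if not SHA_RE.match(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned


# Illustrative input: one mutable tag, one SHA pin taken from the suggestion above.
SAMPLE = """
steps:
  - name: Check out repository code
    uses: actions/checkout@v4
  - name: Set up Python 3.12
    uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
"""

print(find_unpinned_actions(SAMPLE))
```

A regex scan like this will miss unusual YAML layouts (for example a `uses:` value split across lines), so it is best treated as a quick lint rather than a guarantee.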
|
|
||||||||||||||||||||||||||||||||||||
| - name: Install nemo-retriever and dependencies | ||||||||||||||||||||||||||||||||||||
|
charlesbluca marked this conversation as resolved.
|
||||||||||||||||||||||||||||||||||||
| shell: bash | ||||||||||||||||||||||||||||||||||||
| run: | | ||||||||||||||||||||||||||||||||||||
| uv pip install --system -e api/ -e client/ -e "nemo_retriever[remote]" | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| - name: Run graph pipeline on PDFs | ||||||||||||||||||||||||||||||||||||
| shell: bash | ||||||||||||||||||||||||||||||||||||
| env: | ||||||||||||||||||||||||||||||||||||
| PYTHONPATH: nemo_retriever/src | ||||||||||||||||||||||||||||||||||||
| run: | | ||||||||||||||||||||||||||||||||||||
| python -m nemo_retriever.examples.graph_pipeline ./data \ | ||||||||||||||||||||||||||||||||||||
| --run-mode inprocess \ | ||||||||||||||||||||||||||||||||||||
| --input-type pdf \ | ||||||||||||||||||||||||||||||||||||
| --api-key "${{ secrets.NGC_NV_DEVELOPER_NVCF }}" \ | ||||||||||||||||||||||||||||||||||||
| --page-elements-invoke-url "$PAGE_ELEMENTS_INVOKE_URL" \ | ||||||||||||||||||||||||||||||||||||
| --ocr-invoke-url "$OCR_INVOKE_URL" \ | ||||||||||||||||||||||||||||||||||||
| --use-graphic-elements \ | ||||||||||||||||||||||||||||||||||||
| --graphic-elements-invoke-url "$GRAPHIC_ELEMENTS_INVOKE_URL" \ | ||||||||||||||||||||||||||||||||||||
| --use-table-structure \ | ||||||||||||||||||||||||||||||||||||
| --table-structure-invoke-url "$TABLE_STRUCTURE_INVOKE_URL" \ | ||||||||||||||||||||||||||||||||||||
| --embed-invoke-url "$EMBED_INVOKE_URL" \ | ||||||||||||||||||||||||||||||||||||
| --embed-model-name "$EMBED_MODEL_NAME" | ||||||||||||||||||||||||||||||||||||
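The `ref:` value in the checkout step of this workflow uses the `cond && a || b` form that GitHub Actions expressions offer in place of a ternary. As a rough illustration only, the same fallback can be written in Python, whose `and`/`or` also return operand values; `resolve_ref` is a hypothetical name, not something defined in the PR.

```python
def resolve_ref(source_ref: str, github_ref: str) -> str:
    """Mirror `inputs.source-ref != '' && inputs.source-ref || github.ref`."""
    # A non-empty source_ref wins; otherwise the left side evaluates to False
    # and `or` falls through to github_ref (the dispatched branch).
    return (source_ref != "" and source_ref) or github_ref


print(resolve_ref("", "refs/heads/slim-install"))
print(resolve_ref("v1.2.3", "refs/heads/slim-install"))
```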
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| # Dependency Layering Plan | ||
|
|
||
| This document describes the restructured optional-extras model for `nemo_retriever/pyproject.toml`. | ||
|
|
||
| ## Problem | ||
|
|
||
| The previous `pyproject.toml` listed ~50 packages as required dependencies, meaning every install | ||
| pulled in torch, vLLM, CUDA wheels, nemotron models, GPU monitoring tooling, etc. — regardless of | ||
| whether the user intended to run local models or simply call remote NIM endpoints. This made the | ||
| package impossible to install on Intel Macs and unnecessarily heavy everywhere. | ||
|
|
||
| ## Solution: Layered Optional Extras | ||
|
|
||
| Dependencies are now split into a slim base plus composable optional extras. Each tier builds on | ||
| the previous via self-referencing extras. | ||
|
|
||
| ### Tier hierarchy | ||
|
|
||
| ``` | ||
| nemo_retriever ← slim base: ray, fastapi, pydantic, HTTP clients, nv-ingest* | ||
| └── [remote] ← adds: pypdfium2, pillow, nltk, markitdown, langchain-nvidia-ai-endpoints | ||
| └── [local-cpu] ← adds: torch CPU, transformers, nemotron models (ARM Mac compatible) | ||
| └── [local-gpu] ← adds: nvidia-ml-py, vLLM (Linux/CUDA only) | ||
| └── [multimedia] ← adds: soundfile + scipy (ASR), cairosvg (SVG) | ||
| (can also be combined with any tier independently) | ||
|
|
||
| [stores] ← lancedb, duckdb, duckdb-engine, neo4j (independent, add to any tier) | ||
| [benchmarks] ← datasets, open-clip-torch (BEIR evaluation only) | ||
| [dev] ← build, pytest | ||
| [all] ← local-gpu + multimedia + stores + benchmarks | ||
| ``` | ||
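To make the "each tier builds on the previous" idea concrete, here is a small sketch that models the three inference tiers as a chain and resolves what a given extra pulls in. The `EXTRAS` table is a hand-written approximation of the diagram above (abbreviated package lists, `[multimedia]` and the independent extras left out), and `resolve_extra` is a hypothetical helper, not how `uv` or the real `pyproject.toml` resolves extras.

```python
# Hypothetical model of the tiers: each extra maps to
# (packages it adds, the extra it builds on or None).
EXTRAS = {
    "remote": (
        ["pypdfium2", "pillow", "nltk", "markitdown", "langchain-nvidia-ai-endpoints"],
        None,
    ),
    "local-cpu": (["torch", "transformers"], "remote"),
    "local-gpu": (["nvidia-ml-py", "vllm"], "local-cpu"),
}


def resolve_extra(name: str) -> list[str]:
    """Return every package an extra pulls in, parent tiers first."""
    packages, parent = EXTRAS[name]
    inherited = resolve_extra(parent) if parent else []
    return inherited + packages


print(resolve_extra("local-gpu"))
```

Under this model, asking for `local-gpu` yields the `remote` and `local-cpu` packages as well, which is the behaviour the self-referencing extras are meant to provide.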
|
|
||
| ### Install commands by use case | ||
|
|
||
| | Use case | Platform | Command | | ||
| |---|---|---| | ||
| | All remote (NIM) inference | Intel Mac, any | `uv pip install "nemo_retriever[remote,stores]"` | | ||
| | Local PDF ingestion, CPU | ARM Mac | `uv pip install "nemo_retriever[local-cpu,stores]"` | | ||
| | Local PDF ingestion, GPU | Linux + CUDA | `uv pip install "nemo_retriever[local-gpu,stores]"` | | ||
| | Full multimedia (GPU + audio + SVG) | Linux + CUDA | `uv pip install "nemo_retriever[local-gpu,multimedia,stores]"` | | ||
| | Everything | Linux + CUDA | `uv pip install "nemo_retriever[all]"` | | ||
|
|
||
| ## What Each Extra Contains | ||
|
|
||
| ### Base (always installed) | ||
| Pure framework infrastructure — no ML, no storage. | ||
|
|
||
| - `ray[data,serve]` — pipeline orchestration | ||
| - `pandas`, `numpy`, `tqdm` — data handling | ||
| - `fastapi`, `uvicorn`, `python-multipart` — service API | ||
| - `httpx`, `requests`, `urllib3` — HTTP clients | ||
| - `pydantic`, `typer`, `pyyaml`, `rich` — config, CLI, output | ||
| - `universal-pathlib`, `debugpy` — utilities | ||
| - `nv-ingest`, `nv-ingest-api`, `nv-ingest-client` — core ingest packages | ||
|
|
||
| ### `[remote]` | ||
| Everything needed to run the full pipeline via remote NIM endpoints. No GPU, no local models. | ||
| Installs cleanly on Intel Macs. | ||
|
|
||
| - `pypdfium2` — PDF page splitting and rendering | ||
| - `pillow` — image I/O | ||
| - `nltk` — text splitting utilities | ||
| - `markitdown` — HTML/document-to-markdown conversion | ||
| - `langchain-nvidia-ai-endpoints` — LLM/SQL via NVIDIA NIM | ||
|
|
||
| ### `[local-cpu]` | ||
| Adds local HuggingFace model inference. On Linux, torch resolves to a CUDA wheel from the | ||
| PyTorch index; on Mac it falls through to the PyPI CPU wheel. | ||
|
|
||
| - `transformers`, `tokenizers`, `accelerate==1.12.0` — HuggingFace model loading | ||
| - `torch~=2.9.1`, `torchvision` — PyTorch (CPU on Mac, CUDA on Linux) | ||
| - `einops`, `easydict`, `addict`, `timm`, `albumentations`, `scikit-learn` — model utilities | ||
| - `nemotron-page-elements-v3`, `nemotron-graphic-elements-v1`, `nemotron-table-structure-v1` — layout/table/chart detection | ||
| - `nemotron-ocr` — end-to-end OCR (Linux only) | ||
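Because these packages are absent from a slim install, code that touches them has to tolerate a missing module. The sketch below shows one common way to do that with the standard library; `require_extra` is a hypothetical helper written for illustration and is not taken from the PR.

```python
import importlib
import importlib.util
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import a top-level module on demand, or raise an error naming the extra to install."""
    # find_spec returns None when a top-level module is not installed,
    # so the check does not import the (possibly heavy) package itself.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f'try: uv pip install "nemo_retriever[{extra}]"'
        )
    return importlib.import_module(module_name)


# Usage: call inside the function that needs the dependency, not at module import time.
json_module = require_extra("json", "remote")  # stdlib module used as a stand-in
print(json_module.dumps({"ok": True}))
```

Calling the helper inside the function that needs the dependency keeps module import cheap and defers the failure to the point where the optional feature is actually used.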
|
|
||
| ### `[local-gpu]` | ||
| Adds GPU monitoring and fast LLM inference on top of `[local-cpu]`. | ||
|
|
||
| - `nvidia-ml-py` — GPU memory and utilization monitoring | ||
| - `vllm==0.16.0` — fast GPU-accelerated LLM inference (Linux only) | ||
|
|
||
| ### `[multimedia]` | ||
| Specialized media format support. Can be combined with any inference tier. | ||
|
|
||
| - `soundfile`, `scipy` — audio file I/O and resampling for local Parakeet ASR | ||
| - `cairosvg` — SVG-to-image rendering (requires `libcairo` system library) | ||
|
|
||
| ### `[stores]` | ||
| Vector, SQL, and graph storage backends. Independent of inference tier. | ||
|
|
||
| - `lancedb` — vector database for embedding storage and hybrid search | ||
| - `duckdb`, `duckdb-engine` — SQL execution on structured/tabular data | ||
| - `neo4j` — graph database for knowledge graph ingestion | ||
|
|
||
| ### `[benchmarks]` | ||
| BEIR evaluation tools. Not needed for production use. | ||
|
|
||
| - `datasets` — HuggingFace datasets (used in `recall/beir.py`) | ||
| - `open-clip-torch` — OpenAI CLIP implementation | ||
|
|
||
| ## Torch Index Configuration | ||
|
|
||
| `[tool.uv.sources]` uses a platform marker so the right torch wheel is resolved automatically: | ||
|
|
||
| ```toml | ||
| torch = [ | ||
| { index = "pytorch-cu130", marker = "sys_platform == 'linux'" }, | ||
| # Mac: falls through to PyPI CPU wheel | ||
| ] | ||
| ``` | ||
|
|
||
| No manual intervention needed — `uv` picks the right wheel per platform. | ||
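The effect of the marker can be pictured as a simple platform switch. This is only a sketch of the documented behaviour, with a hypothetical function name; the real selection is done by `uv` when it evaluates the `sys_platform` marker.

```python
import sys
from typing import Optional


def torch_index_for(platform: str = sys.platform) -> Optional[str]:
    """Return the named index the marker selects, or None for the default index (PyPI)."""
    # Mirrors `marker = "sys_platform == 'linux'"` from the snippet above.
    if platform == "linux":
        return "pytorch-cu130"
    # No source entry matches (for example "darwin"), so resolution falls through to PyPI.
    return None


print(torch_index_for("linux"))
print(torch_index_for("darwin"))
```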
|
|
||
| ## Cleanups Applied | ||
|
|
||
| The following bugs in the original flat deps list were fixed: | ||
|
|
||
| - `accelerate` was listed twice (`>=1.1.0` and `==1.12.0`) — kept `==1.12.0` only | ||
| - `tqdm` was listed twice — deduplicated | ||
| - `typer` was listed twice — deduplicated | ||
| - `[svg]` extra merged into `[multimedia]` (cairosvg is a media format conversion tool) |
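Duplicate entries like the ones listed above are easy to catch automatically. The following sketch, which assumes a plain list of requirement strings and a hypothetical helper name, extracts the leading package name from each entry and reports names that occur more than once.

```python
import re
from collections import Counter


def duplicate_requirements(requirements: list[str]) -> list[str]:
    """Return normalized package names that appear more than once."""
    names = []
    for req in requirements:
        # Take the leading name, dropping extras, version specifiers and markers.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            # Normalize as package indexes do: lowercase, runs of -_. become one dash.
            names.append(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative input modelled on the duplicates described above.
deps = [
    "accelerate>=1.1.0",
    "tqdm",
    "typer",
    "accelerate==1.12.0",
    "tqdm",
    "typer",
    "ray[data,serve]",
]
print(duplicate_requirements(deps))
```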
`permissions:` block on new workflow
Both `integration-test-library-mode.yml` (new) and `retriever-unit-tests.yml` (modified) are missing an explicit `permissions:` block. Without one, the `GITHUB_TOKEN` inherits repository-default permissions, which can be write-scoped depending on org settings. Per the `github-actions-security` rule, every workflow must declare least-privilege scope. These workflows only need `contents: read`.
Add a `permissions:` block granting `contents: read` at the workflow (or job) level. The same fix applies to `retriever-unit-tests.yml`.
Rule Used: GitHub Actions workflows must: pin third-party act... (source)
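A check for this rule could look like the sketch below. It is an assumption-laden illustration: `has_top_level_permissions` is a hypothetical helper that only looks for a workflow-level `permissions:` key at column 0, so a job-level block would not be detected. The sample text shows roughly where such a block would sit in the new workflow.

```python
import re


def has_top_level_permissions(workflow_text: str) -> bool:
    """True if the workflow declares `permissions:` at column 0 (workflow level)."""
    return re.search(r"^permissions:", workflow_text, re.MULTILINE) is not None


# Illustrative workflow text with a least-privilege block added.
WITH_BLOCK = """\
name: Library Mode Integration Tests (Windows & macOS)
on:
  workflow_dispatch:
permissions:
  contents: read
jobs:
  integration-test:
    runs-on: windows-latest
"""

# The same text with the two permissions lines removed.
WITHOUT_BLOCK = WITH_BLOCK.replace("permissions:\n  contents: read\n", "")

print(has_top_level_permissions(WITH_BLOCK))
print(has_top_level_permissions(WITHOUT_BLOCK))
```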