feat(install): slim install for remote/NIM-only inference on Intel Mac by charlesbluca · Pull Request #1830 · NVIDIA/NeMo-Retriever

charlesbluca · 2026-04-09T15:15:12Z

Description

Simplifies the nemo_retriever install story by collapsing the dependency tier hierarchy and fixing several optional-import bugs surfaced during code review.

Dependency restructuring

[remote] extra removed — its packages (pypdfium2, pillow, nltk, markitdown, langchain-nvidia-ai-endpoints) are now in the base install. A bare pip install nemo_retriever is now the correct remote/NIM-inference install.
lancedb moved to core — it is the default VDB solution and is now always present; [stores] retains only the supplementary backends (duckdb, neo4j).
local-cpu + local-gpu merged into [local] — GPU is assumed for local model inference; the CPU-only tier is removed.
conflicts block removed from pyproject.toml — it referenced undefined cpu/cu130 extras; torch wheel selection is already handled by sys_platform markers.

Import guard cleanup

Removed try/except ImportError around import lancedb in lancedb_store.py — no longer needed since lancedb is a core dep.
Removed if lancedb is None guard in handle_lancedb and the redundant lazy local import in _write_rows_to_lancedb.
Restored handle_lancedb as a module-level import in graph_pipeline.py (was made lazy as a workaround; safe now that lancedb is always installed).

Other code review fixes

Added None sentinels for all optional pymilvus/minio/scipy names in milvus.py except ImportError block (previously only CONSISTENCY_BOUNDED was set).
Guarded cv2.INTER_LANCZOS4 / cv2.resize in _resize_image_opencv with a PIL fallback when cv2 is unavailable.
Restored graceful degradation in embed_text_main_text_embed: returns an error-payload DataFrame on failure instead of re-raising (restores original pipeline contract).
Fixed misplaced docstrings in txt_file_to_chunks_df and txt_bytes_to_chunks_df (appeared after variable assignments, not recognized by Python).
Added early return in txt_file_to_chunks_df for empty files before the tokenizer is loaded.

Test fixes

Mocked _get_tokenizer / _get_txt_tokenizer in txt and html unit tests to avoid HuggingFace network calls.
Skips audio benchmark test gracefully when Ray cannot write to the filesystem (EROFS) rather than failing.

Docs / CI

Updated DEPENDENCY_LAYERS.md to reflect new structure.
Updated integration-test-library-mode.yml to install bare nemo_retriever (no extra needed).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

…d macOS Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

macOS CI runners (Apple Silicon) can be slow to start the Ray raylet, causing the node to time out during startup. Setting the wait to 120s prevents premature failures on resource-constrained runners. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Guard cv2, sklearn, langdetect, minio, pymilvus, and unstructured_client behind try/except or deferred imports so nemo_retriever and nv_ingest_api can be imported without GPU/local-inference packages installed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- pdfium.py: numpy channel-swap fallback when cv2 is None - extract.py: PIL encode fallback for page image rendering without cv2 - transforms.py: PIL fallbacks for numpy_to_base64, base64_to_numpy - table_and_chart.py (api + nemo_retriever): guard bare sklearn DBSCAN imports with ImportError fallback using naive y-grid row grouping - ocr/shared.py: same guard for _blocks_to_pseudo_markdown DBSCAN - lancedb_store.py: skip index creation when no embedding rows written - text_embed/runtime.py: raise on embedding failure instead of silent no-op - graph/executor.py: per-stage debug logging for inprocess pipeline - text_embed/cpu_operator.py: debug logging for embed input/output stats Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Windows also supports CUDA torch wheels; extend the platform marker to sys_platform == 'linux' or sys_platform == 'win32' so Windows users get the GPU-enabled wheel from the PyTorch CUDA index rather than falling through to the CPU-only PyPI wheel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps · 2026-04-09T15:20:13Z

Greptile Summary

This PR restructures the nemo_retriever package to support a slim remote/NIM-only install (bare pip install nemo_retriever) suitable for Intel Mac. Key changes include: moving lancedb and several previously-optional packages to core deps, merging local-cpu/local-gpu into a single [local] extra, removing the [remote] extra tier, fixing try/except ImportError guards for cv2, sklearn, langdetect, and pymilvus, and restoring graceful embedding failure handling (narrowed from BaseException to Exception with proper structured logging).

Confidence Score: 5/5

PR is safe to merge — all previously raised P0/P1 issues are addressed and remaining findings are non-blocking P2 style suggestions.

All critical issues from prior review threads (milvus None sentinels, cv2 attribute error, lancedb import, GitHub Actions SHA, BaseException swallowing) are resolved in this PR. The three remaining comments are P2: logger-inside-method style, a missing early-return for empty bytes, and a missing call-site guard in the legacy unstructured_io extractor. None of these block functionality on the target slim install path.

No files require special attention; the most impactful changes in pyproject.toml and lancedb_store.py look correct.

Vulnerabilities

No security concerns identified. No hardcoded credentials, secrets, or tokens were found. The new workflow file passes API keys via ${{ secrets.NGC_NV_DEVELOPER_NVCF }} correctly. No SQL injection vectors were introduced.

Important Files Changed

Filename	Overview
nemo_retriever/pyproject.toml	Dependency restructuring looks correct: lancedb promoted to core, [remote] collapsed into base, [local] merged, conflicts block removed; torch/torchvision platform markers properly applied.
nemo_retriever/src/nemo_retriever/text_embed/runtime.py	Error handling correctly narrowed from BaseException to Exception; print() replaced with structured logger.error(); error payload uses empty list embedding which is safely skipped by _build_lancedb_rows_from_df.
nemo_retriever/src/nemo_retriever/text_embed/cpu_operator.py	Added diagnostic debug logging for embedding operator; `import logging as _logging` placed inside process() method body rather than at module level, which is re-executed on every call.
nemo_retriever/src/nemo_retriever/graph/executor.py	Added per-stage debug logging to InprocessExecutor; logger imported and instantiated inside the run method body rather than at module level.
api/src/nv_ingest_api/util/image_processing/transforms.py	cv2 import now guarded with try/except; _resize_image_opencv correctly falls back to PIL when cv2 is None, using the hardcoded constant 4 (INTER_LANCZOS4) and Image.LANCZOS.
client/src/nv_ingest_client/util/vdb/milvus.py	All optional imports (Minio, MilvusClient, AnnSearchRequest, csr_array, etc.) now have None sentinels in the except ImportError block; previously only CONSISTENCY_BOUNDED was handled.
nemo_retriever/src/nemo_retriever/txt/split.py	Docstrings moved to correct position; early return added in txt_file_to_chunks_df for empty files; txt_bytes_to_chunks_df lacks the symmetric early return for empty byte content, causing unnecessary tokenizer load.
.github/workflows/integration-test-library-mode.yml	New workflow for cross-platform integration tests; bare nemo_retriever install (no extras) confirmed correct since lancedb is now a core dep; actions unpinned to mutable tags (pre-existing pattern).
api/src/nv_ingest_api/internal/extract/pdf/engines/unstructured_io.py	unstructured_client imports guarded with try/except; no guard at call site means TypeError instead of a clear ImportError when the package is absent and the extractor is invoked.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["pip install nemo_retriever\n(bare / slim install)"] --> B["Core deps always present:\nlancedb, pypdfium2, pillow,\nnltk, markitdown,\nlangchain-nvidia-ai-endpoints,\nnv-ingest-api, nv-ingest-client"]

    A --> C["[local] extra\ntorch, transformers,\nnemotron models, vllm"]
    A --> D["[multimedia] extra\nsoundfile, scipy, cairosvg"]
    A --> E["[stores] extra\nduckdb, neo4j"]
    A --> F["[benchmarks] extra\ndatasets, open-clip-torch"]
    C & D & E & F --> G["[all] convenience\nnemo_retriever[local,multimedia,stores,benchmarks]"]

    B --> H["Import guards\n(cv2=None, sklearn=None, langdetect=None)"]
    H --> I["PIL fallback in\n_resize_image_opencv\n& _render_page_to_base64"]
    H --> J["Grid fallback in\nDBSCAN clustering"]
    H --> K["UNKNOWN fallback in\nlanguage detection"]

Prompt To Fix All With AI

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/graph/executor.py
Line: 132-134

Comment:
**Logger instantiated inside method body**

`import logging as _logging` and `_logging.getLogger(__name__)` are called on every invocation of `execute()`. While Python caches both the module and the logger, this is unconventional — the logger should be declared at module level (or class level) once. The same pattern appears inside `_BatchEmbedCPUActor.process()` in `cpu_operator.py`.

```suggestion
import logging

logger = logging.getLogger(__name__)
```

Then replace `_exec_log` references with `logger`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/txt/split.py
Line: 269-272

Comment:
**`txt_bytes_to_chunks_df` missing early return for empty content**

`txt_file_to_chunks_df` gained an early return before loading the tokenizer when the file is empty. `txt_bytes_to_chunks_df` decodes the bytes but then immediately loads the tokenizer even for empty/whitespace-only content, causing an unnecessary HuggingFace model lookup. Adding the same guard keeps both functions consistent.

```suggestion
    path = str(Path(path).resolve())
    raw = content_bytes.decode(encoding, errors="replace")
    if not raw or not raw.strip():
        return pd.DataFrame(
            columns=["text", "path", "page_number", "metadata"],
        ).astype({"page_number": "int64"})
    model_id = tokenizer_model_id or DEFAULT_TOKENIZER_MODEL_ID
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: api/src/nv_ingest_api/internal/extract/pdf/engines/unstructured_io.py
Line: 191-195

Comment:
**Missing call-site guard for optional `UnstructuredClient`**

When `unstructured_client` is not installed, all five sentinels (`UnstructuredClient`, `operations`, `shared`, `BackoffStrategy`, `RetryConfig`) are `None`. Calling `UnstructuredClient(...)` directly will raise `TypeError: 'NoneType' object is not callable`, which is harder to diagnose than an explicit error. A short guard at the top of the function body would give the caller an actionable message:

```python
if UnstructuredClient is None:
    raise ImportError(
        "unstructured_client is required for the unstructured_io extractor. "
        "Install it with: pip install unstructured-client"
    )
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (4): Last reviewed commit: "Fix README install instructions to refle..." | Re-trigger Greptile}

client/src/nv_ingest_client/util/vdb/milvus.py

api/src/nv_ingest_api/util/image_processing/transforms.py

nemo_retriever/src/nemo_retriever/text_embed/runtime.py

greptile-apps · 2026-04-09T15:20:23Z

.github/workflows/integration-test-library-mode.yml

+
+      - name: Set up Python 3.12
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install uv
+        run: pip install uv


GitHub Actions not pinned to commit SHA

Both actions/checkout@v4 and actions/setup-python@v5 use mutable version tags. Per the repository's github-actions-security rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks.

Suggested change

- name: Set up Python 3.12

uses: actions/setup-python@v5

with:

python-version: '3.12'

- name: Install uv

run: pip install uv

- name: Check out repository code

uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

with:

ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }}

- name: Set up Python 3.12

uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0

with:

python-version: '3.12'

Prompt To Fix With AI

This is a comment left during a code review. Path: .github/workflows/integration-test-library-mode.yml Line: 47-54 Comment: **GitHub Actions not pinned to commit SHA** Both `actions/checkout@v4` and `actions/setup-python@v5` use mutable version tags. Per the repository's `github-actions-security` rule, third-party actions must be pinned to a full commit SHA to prevent supply-chain attacks. ```suggestion - name: Check out repository code uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 with: ref: ${{ inputs.source-ref != '' && inputs.source-ref || github.ref }} - name: Set up Python 3.12 uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 with: python-version: '3.12' ``` How can I resolve this? If you propose a fix, please make it concise.

.github/workflows/integration-test-library-mode.yml

- Wrap bare `import lancedb` in try/except in lancedb_store.py; add None guard in handle_lancedb for [remote]-only installs - Make graph_pipeline.py import of handle_lancedb lazy to avoid startup crash when lancedb is not installed - Add None sentinels for all optional pymilvus/minio/scipy imports in milvus.py (previously only CONSISTENCY_BOUNDED was set) - Guard cv2.INTER_LANCZOS4 and cv2.resize in _resize_image_opencv with PIL fallback when cv2 is unavailable - Restore graceful degradation in embed_text_main_text_embed: return error-payload DataFrame instead of re-raising on embedding failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…izer - Restore handle_lancedb as module-level import in graph_pipeline.py so monkeypatch.setattr works in batch_pipeline tests (safe now that lancedb_store.py guards its own import lancedb) - Early-return in txt_file_to_chunks_df before calling tokenizer when file content is empty, matching the pattern in html/convert.py - Mock _get_tokenizer/_get_txt_tokenizer in txt and html unit tests to avoid hitting HuggingFace (proxy-blocked in this environment) - Skip audio benchmark test gracefully when Ray cannot write to the filesystem (EROFS) rather than failing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Move docstrings in txt_file_to_chunks_df and txt_bytes_to_chunks_df to immediately follow the def line so they are recognized as actual docstrings (previously appeared after variable assignments) - Remove conflicts block in pyproject.toml that referenced undefined cpu/cu130 extras; torch wheel selection is already handled by sys_platform markers in [tool.uv.sources] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Merge remote extra into base dependencies (document parsing and NIM client libs are now core; no extra needed for remote-only installs) - Move lancedb into base dependencies (it is the default VDB solution) - Merge local-cpu and local-gpu into a single local extra (GPU assumed) - Drop lancedb from [stores]; update [all] to reference [local] - Remove now-unnecessary lancedb import guards in lancedb_store.py - Update integration-test workflow to install bare nemo_retriever - Update DEPENDENCY_LAYERS.md to reflect new structure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Replace core-only install with [local] extra for local GPU path; add separate remote-only install command for NIM inference users - Reframe Step 2 torch override: clarifies it is a PyPI/CUDA workaround, not a completion of the GPU stack; mark it skip-able for remote users - Add note before pipeline examples that they require [local] + CUDA torch - Fix [svg] extra reference to [multimedia] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

charlesbluca and others added 16 commits April 8, 2026 17:56

Add workflow_dispatch integration test for library mode on Windows an…

0ef1eae

…d macOS Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Split torch cu130 deps into explicit group

6a21df3

Use inprocess run mode for now

027c154

Pass API key via --api-key flag using NGC_NV_DEVELOPER_NVCF secret

79e8f0b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Initial plan & refactors for slimmer instal

1f31946

Drop nv-ingest as dep

8c45923

Merge remote-tracking branch 'upstream/main' into slim-install

264ab24

Merge branch 'slim-install'

5bef6f7

Add macOS x64 to workflow

26475c8

torch cuda index rename

4b120f0

Merge branch 'slim-install'

9f2035b

Try switching to macos-26-intel

45f2732

charlesbluca requested review from a team as code owners April 9, 2026 15:15

charlesbluca requested a review from nkmcalli April 9, 2026 15:15

greptile-apps bot reviewed Apr 9, 2026

View reviewed changes

Modify unit test install

9e01344

charlesbluca marked this pull request as draft April 9, 2026 15:33

Linting

326de9a

greptile-apps bot reviewed Apr 9, 2026

View reviewed changes

.github/workflows/integration-test-library-mode.yml Show resolved Hide resolved

charlesbluca and others added 5 commits April 9, 2026 15:53

Drop agent doc

48a8953

charlesbluca and others added 3 commits April 9, 2026 16:52

ci: add nightly schedule trigger and fix secret name in library mode …

ec99f30

…workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Compat code for ray[data] 2.49

92f777a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(install): slim install for remote/NIM-only inference on Intel Mac#1830

feat(install): slim install for remote/NIM-only inference on Intel Mac#1830
charlesbluca wants to merge 26 commits intoNVIDIA:mainfrom
charlesbluca:slim-install

charlesbluca commented Apr 9, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Apr 9, 2026 •

edited

Loading

Confidence Score: 5/5

Vulnerabilities

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

charlesbluca commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Dependency restructuring

Import guard cleanup

Other code review fixes

Test fixes

Docs / CI

Checklist

Uh oh!

greptile-apps bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Vulnerabilities

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

charlesbluca commented Apr 9, 2026 •

edited

Loading

greptile-apps bot commented Apr 9, 2026 •

edited

Loading