[Env] Reduce the size of default deps to speed up the installation.#959

Open: HYLcool wants to merge 14 commits into main from opt/default_deps

@HYLcool (Collaborator) commented Apr 1, 2026

As the title says, this PR moves 15 default dependencies into optional dependency sets and imports them with LazyLoader.

Tested on macOS with Python 3.12, uv, and --no-cache, using the Aliyun pip mirror:

| | Installation time | Installed packages | site-packages size |
|---|---|---|---|
| Before | ~1min45s | 155 | ~950MB |
| After | ~25s | 88 | ~350MB |
  • regression test
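To illustrate the lazy-import idea described above, here is a minimal sketch of a proxy that defers the real import until the first attribute access. It is not Data-Juicer's actual LazyLoader: the class name `LazyModule`, the `extra` parameter, and the error message are all hypothetical, and the stdlib `json` module stands in for a heavy optional package.

```python
import importlib


class LazyModule:
    """Hypothetical lazy-import proxy; not Data-Juicer's actual LazyLoader."""

    def __init__(self, module_name, extra=None):
        self._module_name = module_name  # import path, e.g. "matplotlib"
        self._extra = extra              # hypothetical name of an optional set
        self._module = None              # filled in on first attribute access

    def _load(self):
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                hint = (
                    f" Install the '{self._extra}' optional dependencies."
                    if self._extra
                    else ""
                )
                raise ImportError(
                    f"'{self._module_name}' is not installed.{hint}"
                ) from exc
        return self._module

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for module attributes.
        if name in ("_module_name", "_extra", "_module"):
            raise AttributeError(name)
        return getattr(self._load(), name)


# Creating the proxy imports nothing; json stands in for a heavy package.
json = LazyModule("json")
print(json.dumps({"a": 1}))  # the real import happens on this line
```

With this pattern, a package that is no longer installed by default only raises an error when code actually touches it, and the message can point the user at the optional set to install.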

@HYLcool requested review from ShenQianli, cmgzn and yxdyc on Apr 1, 2026 07:34
@HYLcool self-assigned this on Apr 1, 2026
@HYLcool added labels on Apr 1, 2026:
- dj:ci/cd: issues/PRs about CI/CD of Data-Juicer
- dj:efficiency: regarding efficiency issues and enhancements
- environment: related to third-party dependency, DJ-pypi, DJ-docker, etc.
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request implements a comprehensive lazy loading strategy to optimize startup performance and manage optional dependencies. It introduces a post_import hook in the LazyLoader utility, moves heavy libraries like matplotlib, spacy, librosa, and av to optional dependency groups in pyproject.toml, and replaces direct imports with lazy alternatives across the codebase. Additionally, it refactors Ray-based deduplicators to delay actor registration, fixes regex string literals, and disables the resource monitor by default. Feedback recommends moving LazyLoader class imports to the module level for better PEP 8 compliance and consistency.
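The post_import hook mentioned in the review can be pictured roughly as follows. This is a sketch under assumptions, not the PR's implementation: the class `HookedLazyModule`, the `post_import` parameter shape (a callable that receives the loaded module), and the `record_setup` helper are hypothetical, and the stdlib `math` module stands in for a library that needs one-time setup after import.

```python
import importlib


class HookedLazyModule:
    """Hypothetical sketch of a lazy loader with a post-import hook."""

    def __init__(self, module_name, post_import=None):
        self._module_name = module_name
        self._post_import = post_import  # callable taking the loaded module
        self._module = None

    def __getattr__(self, name):
        if name in ("_module_name", "_post_import", "_module"):
            raise AttributeError(name)
        if self._module is None:
            module = importlib.import_module(self._module_name)
            # Run one-time setup right after the first real import.
            if self._post_import is not None:
                self._post_import(module)
            self._module = module
        return getattr(self._module, name)


configured = []


def record_setup(module):
    # Stand-in for real setup work, e.g. selecting a backend.
    configured.append(module.__name__)


math = HookedLazyModule("math", post_import=record_setup)
math.sqrt(4.0)   # first access: imports math, then runs the hook
math.floor(2.5)  # later accesses reuse the cached module
print(configured)
```

The hook runs once, on first use, so setup that previously ran at import time of the parent package is deferred along with the import itself.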
