Implement Lazy Loading of Submodules for faster import #3732

arjxn-py · 2024-01-16T10:26:19Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #3490

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist:

No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
All tests pass: $ python run-tests.py --all (or $ nox -s tests)
The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov · 2024-01-16T10:48:42Z

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (4484514) 99.59% compared to head (2b979af) 99.56%.

❗ Current head 2b979af differs from pull request most recent head ec571ba. Consider uploading reports for the commit ec571ba to get more accurate results

Files	Patch %	Lines
pybamm/util.py	72.22%	5 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3732      +/-   ##
===========================================
- Coverage    99.59%   99.56%   -0.03%     
===========================================
  Files          258      258              
  Lines        20823    20839      +16     
===========================================
+ Hits         20738    20749      +11     
- Misses          85       90       +5


arjxn-py · 2024-01-16T14:57:54Z

I tried handling wildcard (i.e. *) imports with this function but wasn't able to, so each attribute has to be defined explicitly and imported separately (mostly for expression_tree). I would be grateful for some suggestions here.
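One possible direction, sketched under assumptions rather than taken from this PR: a module-level __getattr__ (PEP 562) can resolve names on first access, and a star-import reads __all__, so the lazy names have to be listed there explicitly. The names demo_pkg, _LAZY_ATTRS and make_lazy_module are hypothetical, and standard-library targets stand in for PyBaMM submodules. Note that a star-import touches every name in __all__, so it loads everything eagerly anyway.

```python
import importlib
import sys
import types

# Hypothetical table: public name -> (module that defines it, attribute name).
# Standard-library targets stand in for real submodules here.
_LAZY_ATTRS = {
    "sqrt": ("math", "sqrt"),
    "OrderedDict": ("collections", "OrderedDict"),
}


def make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup fails (PEP 562).
        try:
            target_module, target_attr = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so __getattr__ is not hit again
        return value

    module.__getattr__ = __getattr__
    # Star-imports read __all__, so the lazy names must be listed explicitly.
    module.__all__ = sorted(lazy_attrs)
    return module


demo_pkg = make_lazy_module("demo_pkg", _LAZY_ATTRS)
sys.modules["demo_pkg"] = demo_pkg

namespace = {}
exec("from demo_pkg import *", namespace)  # resolves every name in __all__
print(namespace["sqrt"](9.0))
```

With a table like this, flat names on the top-level package keep working while the defining module is only imported when the name is first used.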

agriyakhetarpal · 2024-01-16T15:49:35Z

I think the importlib.util or pkgutil modules from the Python standard library might have something that we could use. Ideally, we should define everything that should be available to the user namespace in a list named __all__ for every package and module. We haven't done that, but this PR could be a good start for doing so PyBaMM-wide.

This might or might not help to construct a function whose output can provide lazy_loader with importable paths for all submodules:

import pkgutil
import pybamm

print([i for i in pkgutil.walk_packages(pybamm.__path__)])

returns

paths to modules under pybamm/

[ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='batch_study', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='callbacks', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='citations', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='discretisations', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='doc_utils', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='experiment', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='expression_tree', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='geometry', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='input', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='install_odes', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='logger', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='meshes', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='models', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='parameters', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='parameters_cli', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='plotting', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='settings', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='simulation', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='solvers', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='spatial_methods', ispkg=True),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='util', ispkg=False),
 ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm'), name='version', ispkg=False)]

You should be able to run this on packages under pybamm.* too. This is under pybamm.expression_tree as an example:

print([i for i in pkgutil.walk_packages(pybamm.expression_tree.__path__)])

returns

For the expression tree

ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='array', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='averages', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='binary_operators', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='broadcasts', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='concatenations', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='exceptions', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='functions', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='independent_variable', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='input_parameter', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='interpolant', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='matrix', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='operations', ispkg=True)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='parameter', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='printing', ispkg=True)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='scalar', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='state_vector', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='symbol', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='unary_operators', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='variable', ispkg=False)
ModuleInfo(module_finder=FileFinder('/Users/agriyakhetarpal/Desktop/PyBaMM/pybamm/expression_tree'), name='vector', ispkg=False)

and you can now manipulate this import machinery to describe your own module paths by joining the module_finder and name attributes for pybamm.util.lazy_loader! This might or might not work for our case (but at least TBH this is a great start for diving into all of the core details and the fundamentals of Pythonic imports, and I managed to learn something too!).

Note: (there might be a better solution of course, I was just reporting on what I have found by now)
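As a rough sketch of the idea above (an assumption about how it could be wired up, not code from this PR): pkgutil.walk_packages accepts a prefix argument, so the dotted import paths can be produced directly instead of joining module_finder and name by hand. The helper name list_submodules is hypothetical, and the standard-library json package stands in for pybamm.

```python
import json
import pkgutil


def list_submodules(package):
    """Return the dotted names of all modules and subpackages under `package`."""
    prefix = package.__name__ + "."
    # walk_packages imports each subpackage it finds in order to descend into
    # it, so this is a discovery tool rather than something to run at import
    # time.
    return [info.name for info in pkgutil.walk_packages(package.__path__, prefix)]


# `json` from the standard library stands in for `pybamm` here.
names = list_submodules(json)
print(names)
```

The resulting list could be generated once and written into an __init__.py, rather than computed on every import.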

agriyakhetarpal · 2024-01-18T15:39:39Z

I was reading up on making imports faster, and I think we don't need to essentially redefine the import pybamm mechanism by using lazy_loader everywhere; just making it faster would be great in itself. That is, we can isolate what is causing the slow import and target that module specifically for a lazy import, while keeping the other imports as they are. For example, something like

from .expression_tree.operations.jacobian import Jacobian

or

from .util import (
    get_parameters_filepath,
    have_jax,
    install_jax,
    have_optional_dependency,
    is_jax_compatible,
    get_git_commit_info,
)

will never be slow, but something like this

from .models.submodels import (
    active_material,
    convection,
    current_collector,
    electrolyte_conductivity,
    electrolyte_diffusion,
    electrode,
    external_circuit,
    interface,
    oxygen_diffusion,
    particle,
    porosity,
    thermal,
    transport_efficiency,
    particle_mechanics,
    equivalent_circuit_elements,
)

has too many things going on in one statement. I figure that it will be just the *-imports and those for the battery models like the above one that are currently causing the bottleneck, so it would be great if we can do some profiling in this PR to gauge what parts are slow – we can explicitly focus on those and leave the rest (and leave out the parts that cause unit and integration tests to fail – just a case of tedious trial and error).

It might be worth scouting SciPy to see how they manage it (despite being a large monorepo, import scipy takes just about a second at most). In the footnotes for SPEC-0001, I found this project that can help automate a few things related to *-imports: https://github.com/Erotemic/mkinit

N.B. For profiling to be effective, we will have to delete the __pycache__ directories and their files (compiled bytecode) everywhere since Python imports get cached very efficiently.
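For the profiling step, one option is CPython's -X importtime flag, which reports per-module import times on stderr. The sketch below is only an illustration: the helper name import_time_report is hypothetical, and json stands in for pybamm. It runs the import in a fresh interpreter and ranks modules by cumulative time.

```python
import subprocess
import sys


def import_time_report(module_name, top=10):
    """Run `python -X importtime` in a fresh interpreter and rank the imports.

    Returns a list of (cumulative_microseconds, imported_name) tuples,
    slowest first.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # -X importtime writes lines such as
    # "import time: self [us] | cumulative | imported package" to stderr.
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skip the header row
        rows.append((int(fields[1]), fields[2].strip()))
    return sorted(rows, reverse=True)[:top]


# `json` stands in for `pybamm`; replace it to profile the real package.
for cumulative_us, name in import_time_report("json"):
    print(f"{cumulative_us:>10} us  {name}")
```

Because each call starts a new process, nothing is cached in sys.modules between runs; only the bytecode cache on disk carries over.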

arjxn-py · 2024-01-19T08:44:51Z

Thanks a lot @agriyakhetarpal, I'll look into the segregation first, as you suggested, and get back to you in some time.

agriyakhetarpal · 2024-01-20T19:54:07Z

I might have solved it: for modules that further have sub-modules (pybamm.models.*), we might need to modify every __init__.py file, not just the base one. I tried this command in the root directory which set everything up for me automatically:

mkinit --lazy_loader --inplace --recursive pybamm  # also accepts --noall so as to not add __all__ in the files

and I presumably removed the import caches (all __pycache__ folders) recursively by doing

rm -rf $(find . -type d -name '__pycache__')

But as I understand, this still does not remove all caches – so %timeit importlib.import_module("pybamm") in an interactive REPL might be returning an incorrect calculation and could be unreliable. It does feel faster, though – but it could be a case of placebo.

Outputs from importing PyBaMM

Without lazy_loader

%timeit importlib.import_module("pybamm")

The slowest run took 13.77 times longer than the fastest. This could mean that an intermediate result is being cached.
2.33 µs ± 3.05 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With lazy_loader

%timeit importlib.import_module("pybamm")

The slowest run took 11.64 times longer than the fastest. This could mean that an intermediate result is being cached.
2.29 µs ± 2.84 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

This isn't helpful TBH, so I started deleting the entries from the sys.modules dictionary:

import sys
keys_to_delete = [key for key in sys.modules if 'pybamm' in key]
for key in keys_to_delete:
    del sys.modules[key]

in addition to removing the __pycache__ directories, but I haven't found a substantial increase yet. Maybe this can help act as a precursor for your experiments as we go further. A reliable method, though a bit tedious, would be to re-clone the repository, re-install it from source, apply the lazy_loader changes, and then benchmark the import pybamm statement – maybe try that and see what you get?
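The microsecond timings above are what one would expect if importlib.import_module is simply returning the entry already cached in sys.modules. A less tedious alternative to re-cloning, sketched here as an assumption rather than an agreed method, is to time the import in a fresh interpreter each time and subtract the bare interpreter start-up. The helper names are hypothetical and json stands in for pybamm.

```python
import statistics
import subprocess
import sys
import time


def time_in_fresh_interpreter(code, repeats=5):
    """Median wall-clock seconds to run `code` in a new Python process."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_cost(module_name, repeats=5):
    """Estimate import time as (start-up + import) minus bare start-up."""
    baseline = time_in_fresh_interpreter("pass", repeats)
    with_import = time_in_fresh_interpreter(f"import {module_name}", repeats)
    return with_import - baseline


# `json` stands in for `pybamm`.
print(f"approximate import cost: {import_cost('json'):.3f} s")
```

For very cheap imports the estimate is dominated by noise and can even come out slightly negative; for an import that takes seconds it should be a usable comparison between branches.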

P.S. You can consider putting out a testimonial (scientific-python/lazy-loader#50) after we manage to do this


arjxn-py · 2024-01-21T05:39:19Z

This does the work, thanks for suggesting @agriyakhetarpal.
The import for me is superfast now as compared to before 🎉
I tested it however I could by iteratively re-cloning PyBaMM again and again and also in new environments including Python 3.9, 3.10, 3.11 & 3.12. The results below are from 3.12 -

Without lazy-import :

The slowest run took 7.07 times longer than the fastest. This could mean that an intermediate result is being cached.
58 µs ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With lazy-import :

The slowest run took 14.75 times longer than the fastest. This could mean that an intermediate result is being cached.
2.74 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

However, this big change has caused a number of CI failures, which will be fixed iteratively.

arjxn-py · 2024-01-21T06:03:48Z

But as I understand, this still does not remove all caches – so %timeit importlib.import_module("pybamm") in an interactive REPL might be returning an incorrect calculation and could be unreliable. It does feel faster, though – but it could be a case of placebo.

I am also seeing %timeit importlib.import_module("pybamm") return an incorrect measurement (possibly due to caching); however, I can easily recognize a clear difference when importing PyBaMM with and without lazy-import.

arjxn-py · 2024-01-21T08:28:22Z

Now that I'm trying to fix the CI failures, I first tried to resolve the imports in the __init__.py files, but I am realizing that this leads back to defining the imports recursively like before, which is not ideal.

The other way around is to fix the imports in the code, i.e.

pybamm.Symbol should be pybamm.expression_tree.Symbol
pybamm.Parameter should be pybamm.expression_tree.parameter.Parameter
pybamm.multiply should be pybamm.expression_tree.binary_operators.multiply
& so on

But this approach leads to a big API change. So I'd be more than happy to have suggestions here, and I would also like to know if I'm missing anything or whether there is something else I could try instead.

valentinsulzer · 2024-01-21T08:56:28Z

Definitely don't make that API change. Also having to keep the __init__.py files up to date like that will make development more challenging. What are the benefits of lazy loading that would justify this?

agriyakhetarpal · 2024-01-21T09:02:34Z

Usually import pybamm can take you 15 to 20 seconds in a Jupyter notebook (mostly the case for me) – lazy loading is supposed to just import PyBaMM and then import the modules under it at runtime dynamically when they are first used in a script.

I don't think we are using lazy_loader correctly, since it was defined for scenarios like this (to not bring a breaking change in terms of the public API). Can we skip lazy loading pybamm.Symbol and others (whatever ones break the tests)?
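The behaviour described above (import the package itself quickly and run a submodule's code only when it is first used) can be illustrated with the standard library alone. This is a minimal sketch based on importlib.util.LazyLoader, not the lazy_loader package's API; the helper name lazy_import is hypothetical and colorsys is just a stand-in module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up deferral; the module body has not run
    return module


# `colorsys` is a small standard-library module used purely as a stand-in.
colorsys = lazy_import("colorsys")
# The module body is executed here, on the first attribute lookup.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Callers see an ordinary module object either way, which is why this style of lazy loading does not have to change the public API.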

arjxn-py · 2024-01-21T09:05:59Z

Without lazy-import :

The slowest run took 7.07 times longer than the fastest. This could mean that an intermediate result is being cached.
58 µs ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With lazy-import :

The slowest run took 14.75 times longer than the fastest. This could mean that an intermediate result is being cached.
2.74 µs ± 3.74 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

One known benefit of lazy loading is reduced import time. It is also expected to improve performance, since many of the attributes loaded at import time come from expensive operations, e.g. from .expression_tree.symbol import *

arjxn-py · 2024-01-21T09:09:46Z

Can we skip lazy loading pybamm.Symbol and others (whatever ones break the tests)?

I've tried doing that, but it led to a chain of further imports that also had to skip lazy loading. But yes, we can try this until we reach a point with no (or minimal) failures and the lowest import time.

pybamm/models/submodels/convection/transverse/__init__.py

arjxn-py · 2024-02-20T08:59:27Z

Closing this as discussed in the last developer meeting.

arjxn-py added 3 commits January 15, 2024 16:25

define a lazy_import function

dfee63a

Modify lazy_import to have attribute

c9ae592

Implement lazy import to idaklu solver

40d1994

arjxn-py added the infrastructure Packaging, distribution, and releases label Jan 16, 2024

arjxn-py marked this pull request as draft January 16, 2024 10:27

Add handling for *

2e4bdad

arjxn-py added 10 commits January 17, 2024 20:59

Start reolving .util imports

c7108fc

Resolve .util imports lazily

26a78d5

Resolve .logger, .settings & .citations imports lazily

258bf3f

Resolve .expression_tree imports lazily

948381b

Resolve remaining operational .expression_tree imports lazily

05bf354

Resolve model imports lazily

13ac84c

Resolve submodels imports lazily

9b54d71

Remaining submodel interfaces

983e0f8

Resolve .geometry imports lazily

b48169c

Resolve .parameter imports lazily

bae2485

Revert breaking imports

fd9a4e0

arjxn-py force-pushed the lazy-imports branch from edab7af to fd9a4e0 Compare January 20, 2024 14:24

arjxn-py added 3 commits January 20, 2024 20:21

Try lazy_loader instead

3fe12fd

Revert breaking .expression_tree, .models & .submodels imports

fcdf41d

Resolve other utility imports

2b979af

arjxn-py added 3 commits January 21, 2024 09:36

Generate every __init__ using [mkinit](https://github.com/Erotemic/…

9c43ac6

…mkinit)

Remove existing imports

94a6a9e

Merge branch 'develop' into lazy-imports

5e033d3

Defined lazy_import function is now redundant

4d0d480

arjxn-py added 2 commits January 21, 2024 11:52

.logger, .settings, .citations causing CI issues

1c0d832

Resolve error causing imports

fb660ec

arjxn-py force-pushed the lazy-imports branch from d327af2 to fb660ec Compare January 21, 2024 09:52

arjxn-py added 2 commits January 21, 2024 15:26

Set EAGER_IMPORT blank for tests

c7e3da2

Manage to fix import atleeast

074616e

arjxn-py force-pushed the lazy-imports branch from d013f0a to 074616e Compare January 21, 2024 12:13

Resolve Min, Max & other imports causing failures

ec571ba

kratman reviewed Jan 22, 2024

pybamm/models/submodels/convection/transverse/__init__.py

arjxn-py closed this Feb 20, 2024

agriyakhetarpal mentioned this pull request Mar 5, 2024

Make private functions and classes more explicit #2427

Open
