Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] update Python-package installation guide #6767

Open
wants to merge 115 commits into
base: master
Choose a base branch
from

Conversation

StrikerRUS
Copy link
Collaborator

@StrikerRUS StrikerRUS commented Dec 26, 2024

Every installation method and some their combinations were tested with the help of our CI (Appveyor and GitHub Actions). Everything is working except --sanitizers flag: package installation is successful but it's import fails:

log:
Fatal Python error: Aborted

Current thread 0x00007ff856c629c0 (most recent call first):
  File "/Users/runner/miniforge/lib/python3.12/ctypes/__init__.py", line 379 in __init__
  File "/Users/runner/miniforge/lib/python3.12/ctypes/__init__.py", line 460 in LoadLibrary
  File "/Users/runner/miniforge/lib/python3.12/site-packages/lightgbm/libpath.py", line 49 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 999 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/Users/runner/miniforge/lib/python3.12/site-packages/lightgbm/basic.py", line 9 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 999 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/Users/runner/miniforge/lib/python3.12/site-packages/lightgbm/__init__.py", line 11 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 999 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/Users/runner/work/LightGBM/LightGBM/tests/python_package_test/test_basic.py", line 15 in <module>
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/assertion/rewrite.py", line 184 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1387 in _gcd_import
  File "/Users/runner/miniforge/lib/python3.12/importlib/__init__.py", line 90 in import_module
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/pathlib.py", line 587 in import_path
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/python.py", line 493 in importtestmodule
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/python.py", line 546 in _getobj
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/python.py", line 284 in obj
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/python.py", line 562 in _register_setup_module_fixture
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/python.py", line 549 in collect
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/runner.py", line 389 in collect
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/runner.py", line 391 in pytest_make_collect_report
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_hooks.py", line [513](https://github.com/microsoft/LightGBM/actions/runs/12499103876/job/34873729965#step:3:514) in __call__
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/runner.py", line 567 in collect_one_node
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 835 in _collect_one_node
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 970 in genitems
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 975 in genitems
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 809 in perform_collect
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 347 in pytest_collection
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 336 in _main
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/runner/miniforge/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/Users/runner/miniforge/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/Users/runner/miniforge/bin/pytest", line 8 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, sklearn.__check_build._check_build, psutil._psutil_osx, psutil._psutil_posix, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, sklearn.utils._random, sklearn.utils._vector_sentinel, sklearn.feature_extraction._hashing_fast, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast (total: 164)
/Users/runner/work/LightGBM/LightGBM/.ci/setup.sh: line 41:  8194 Abort trap: 6           pytest $GITHUB_WORKSPACE/tests/python_package_test

I'm not sure we ever checked even building LightGBM (not testlightgbm) with sanitizers.

@StrikerRUS StrikerRUS added the doc label Dec 26, 2024
@@ -34,10 +34,14 @@
# OpenCL include directory.
# --opencl-library=FILEPATH
# Path to OpenCL library.
# --sanitizers=LIST_OF_SANITIZERS
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not excited about adding --sanitizers, --debug, and --nohomebrew to this script when no one has asked for them.

This build-python.sh script is already more complex than I'm comfortable with and is duplicating functionality that'd be better-handled by CMake.

Rather than trying to have a flag here for every option() in this project's CMake, I think we should just rely on documentation describing how to use --precompile to use a shared library built with whatever customizations you want.

I'd love to put our energy into instead working towards eliminating this script eventually with heavier use of CMake + all of the improvements that have gone into scikit-build-core over the last year (some of which I sort of describe in #6774).

What do you think?

# --user
# Install into user-specific instead of global site-packages directory.
# Only used with 'install' command.

set -e -u

echo "building lightgbm"
echo "[INFO] building lightgbm"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the consistency with

echo "[INFO] Attempting to build 32-bit version of LightGBM, which is only supported on Windows with generator '${CMAKE_GENERATOR}'."

and to better distinguish between our own logs and scikit-build-core ones.

#########
# flags #
#########
--bit32)
export CMAKE_GENERATOR="Visual Studio 17 2022"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow to use any default Visual Studio version.

Comment on lines +343 to +345
elif test -f ../lib_lightgbm.dll; then
echo "[INFO] found pre-compiled lib_lightgbm.dll"
cp ../lib_lightgbm.dll ./lightgbm/lib/lib_lightgbm.dll
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +353 to +356
elif test -f ../windows/x64/Debug_DLL/lib_lightgbm.dll; then
echo "[INFO] found pre-compiled windows/x64/Debug_DLL/lib_lightgbm.dll"
cp ../windows/x64/Debug_DLL/lib_lightgbm.dll ./lightgbm/lib/lib_lightgbm.dll
cp ../windows/x64/Debug_DLL/lib_lightgbm.lib ./lightgbm/lib/lib_lightgbm.lib
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be located there after compilation with Debug_DLL config:

<ProjectConfiguration Include="Debug_DLL|x64">
<Configuration>Debug_DLL</Configuration>
<Platform>x64</Platform>

@@ -197,77 +271,76 @@ Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_

conda install -c conda-forge lightgbm

These are precompiled packages that are fast to install.
Use them instead of ``pip install`` if any of the following are true:
These packages support **CPU**, **GPU** and **CUDA** versions out of the box.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify a little bit.

@@ -197,77 +271,76 @@ Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_

conda install -c conda-forge lightgbm

These are precompiled packages that are fast to install.
Use them instead of ``pip install`` if any of the following are true:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, what is the official recommended (preferred) method for installing LightGBM? pip or conda?
If conda, then we should change the order of them in the doc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that the conda-forge packages offer a lot of benefits. But it's not as clear as one approach being "recommended" and the other being "not recommended".

Situations where conda-forge's lightgbm should be preferred:

Situations where pip install lightgbm should be preferred

  • you want to customize the build (e.g. use a different Boost version, build a variant like the MPI package, compile with debug symbols, etc.)
  • you want to (or are required to!) use pip for non-LightGBM reasons

All that said... I'm indifferent about the order. If you want to put conda before pip, that's fine. But at this point, I don't think we should say that one is "preferred" or "recommended".


Run ``sh ./build-python.sh install --nomp`` to disable **OpenMP** support. All requirements from `Build Threadless Version section <#build-threadless-version>`__ apply for this installation option as well.

Run ``sh ./build-python.sh install --mpi`` to enable **MPI** support. All requirements from `Build MPI Version section <#build-mpi-version>`__ apply for this installation option as well.

Run ``sh ./build-python.sh install --mingw``, if you want to use **MinGW-w64** on **Windows** instead of **Visual Studio**. All requirements from `Build with MinGW-w64 on Windows section <#build-with-mingw-w64-on-windows>`__ apply for this installation option as well.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the same order as in the Install from PyPI -> Build from Sources section.


Build With MSBuild
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building with MSBuild is a part of "build dynamic library from sources by any method you prefer", so there is no need in a separate section for it.

Comment on lines -69 to -71
cmake.args = [
"-D__BUILD_FOR_PYTHON:BOOL=ON"
]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use define instead of args. This allows to use --config-setting=cmake.args=-G'MinGW Makefiles' from comand line because defines are appended and args are overwriten.

Warning
Setting defines through cmake.args in pyproject.toml is discouraged because this cannot be later altered via command line. Use cmake.define instead.
https://scikit-build-core.readthedocs.io/en/latest/configuration.html#configuring-cmake-arguments-and-defines

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, did not know this! Really nice improvement, this is much better than using the CMAKE_GENERATOR environment variable, I think.

--config-settings=cmake.define.USE_GPU=ON \
--config-settings=cmake.define.OpenCL_INCLUDE_DIR="/usr/local/cuda/include/" \
--config-settings=cmake.define.OpenCL_LIBRARY="/usr/local/cuda/lib64/libOpenCL.so"
pip install lightgbm --no-binary lightgbm --config-settings=cmake.define.USE_GPU=ON --config-settings=cmake.define.OpenCL_INCLUDE_DIR="/usr/local/cuda/include/" --config-settings=cmake.define.OpenCL_LIBRARY="/usr/local/cuda/lib64/libOpenCL.so"
Copy link
Collaborator Author

@StrikerRUS StrikerRUS Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot predict which shell user is using. For example, to split long line in PowerShell, ` char should be used. So for better UX I think it better to provide universal copy-pastable commands.

@StrikerRUS StrikerRUS marked this pull request as ready for review January 19, 2025 19:04
@StrikerRUS StrikerRUS changed the title [WIP][docs] update Python-package installation guide [docs] update Python-package installation guide Jan 19, 2025
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I left some suggestions for your consideration.

@@ -54,77 +61,98 @@ To install all dependencies needed to use ``pandas`` in LightGBM, append ``[pand

pip install 'lightgbm[pandas]'

|

Use LightGBM with Matplotlib and Graphviz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use LightGBM with Matplotlib and Graphviz
Use LightGBM Plotting Capabilities

For the other blocks, "Use LightGBM with {library}" is a helpful title, because that's how users are likely to be thinking about it. e.g. "I have a Dask cluster, how do I use LightGBM with Dask".

For plotting, I don't think that's true. I suspect that for most users, they are not thinking "how do I use LightGBM with graphviz?" and instead are just interested in "how do I visualize the structure of my LightGBM models?".

In other words... the fact that these functions happen to use matplotlib and graphviz isn't central to their value.

Would you consider renaming this?


Build with MinGW-w64 on Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sh

# in sh.exe, git bash, or other Unix-like shell
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice!

@@ -197,77 +271,76 @@ Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_

conda install -c conda-forge lightgbm

These are precompiled packages that are fast to install.
Use them instead of ``pip install`` if any of the following are true:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that the conda-forge packages offer a lot of benefits. But it's not as clear as one approach being "recommended" and the other being "not recommended".

Situations where conda-forge's lightgbm should be preferred:

Situations where pip install lightgbm should be preferred

  • you want to customize the build (e.g. use a different Boost version, build a variant like the MPI package, compile with debug symbols, etc.)
  • you want to (or are required to!) use pip for non-LightGBM reasons

All that said... I'm indifferent about the order. If you want to put conda before pip, that's fine. But at this point, I don't think we should say that one is "preferred" or "recommended".


That script requires some dependencies like ``build``, ``scikit-build-core``, and ``wheel``.
In environments with restricted or no internet access, install those tools and then pass ``--no-isolation``.
If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__). For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__). For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.
If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__).
For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.

(general comment for this whole PR)

Can we please split separate sentences onto their own lines? Consecutive lines should render exactly the same in on GitHub and readthedocs... they should be consolidated into one paragraph with wrapping.

But having them on separate lines makes git diffs easier to review, the files easier to edit in a text editor, and the git diff more informative as a history of changes to lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing! Do you want me to split lines in the whole file or only in the touched paragraphs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with it being every line in the file, I don't mind reviewing that.

But also fine if you want to do that in a later PR (or want me to do it, since it's my annoying suggestion 😅 )

Comment on lines -69 to -71
cmake.args = [
"-D__BUILD_FOR_PYTHON:BOOL=ON"
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool, did not know this! Really nice improvement, this is much better than using the CMAKE_GENERATOR environment variable, I think.

@@ -34,10 +34,14 @@
# OpenCL include directory.
# --opencl-library=FILEPATH
# Path to OpenCL library.
# --sanitizers=LIST_OF_SANITIZERS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not excited about adding --sanitizers, --debug, and --nohomebrew to this script when no one has asked for them.

This build-python.sh script is already more complex than I'm comfortable with and is duplicating functionality that'd be better-handled by CMake.

Rather than trying to have a flag here for every option() in this project's CMake, I think we should just rely on documentation describing how to use --precompile to use a shared library built with whatever customizations you want.

I'd love to put our energy into instead working towards eliminating this script eventually with heavier use of CMake + all of the improvements that have gone into scikit-build-core over the last year (some of which I sort of describe in #6774).

What do you think?

Comment on lines +216 to +226
Build without Searching in Homebrew Folders for Dependencies on macOS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sh

pip install lightgbm --no-binary lightgbm --config-settings=cmake.define.USE_HOMEBREW_FALLBACK=OFF

Use this option to stop looking into Homebrew standard folders for finding dependencies (e.g. OpenMP) during the build on macOS.

|

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Build without Searching in Homebrew Folders for Dependencies on macOS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --no-binary lightgbm --config-settings=cmake.define.USE_HOMEBREW_FALLBACK=OFF
Use this option to stop looking into Homebrew standard folders for finding dependencies (e.g. OpenMP) during the build on macOS.
|

I think we should omit this section. If you're using pip install (not pip wheel or python -m build) like this, then you're not building a package for redistribution... you're just installing one for runtime use on the same system.

The OpenMP found at build time will become the first RPATH entry in lib_lightgbm.dylib, and so it'll be found when you run import lightgbm.

As long as that library is still there later, it doesn't matter whether you later brew install libomp on the same system.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just do not think we need 1 paragraph-size section like this matching every option() in CMakeLists.txt. It creates duplication across the project and and adds to the size of the doc, which I think is intimidating for some users.

I think it is fine to only have specific sections for:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the same way about other new sections below... do not think we need to have a separate section in this doc describing how to build the Python package with sanitizers or with USE_DEBUG enabled.

plotting = [
"graphviz",
"matplotlib"
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! Every optional dependency should have an extra. Totally support this.

Comment on lines +83 to +89
[tool.scikit-build.cmake]
version = "CMakeLists.txt"
build-type = "Release"

[tool.scikit-build.cmake.define]
__BUILD_FOR_PYTHON = "ON"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of moving these things down here? Are the equivalent fields in [tool.scikit-build] deprecated?

Copy link
Collaborator Author

@StrikerRUS StrikerRUS Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, these nested fields are identical, I simply don't know how to specify nested mapping tool.scikit-build.cmake.define in existing setup 😬

And I think that

[tool.scikit-build.cmake.define]
__BUILD_FOR_PYTHON = "ON"
some_new_option = "value"

is more readable than

[tool.scikit-build]
cmake.define.__BUILD_FOR_PYTHON = "ON"
cmake.define.some_new_option = "value"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you about that, I'd expected that this form would work:

[tool.scikit-build]
cmake.define = [
  "__BUILD_FOR_PYTHON=ON",
  "some_new_option=value"
]

But when I search around on GitHub, I don't see any examples like that: https://github.com/search?q=%22cmake.define%22+language%3ATOML&type=code&p=1

Let's leave this as you have it in the PR, I like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants