[docs] update Python-package installation guide #6767
base: master
@@ -34,10 +34,14 @@
# OpenCL include directory.
# --opencl-library=FILEPATH
# Path to OpenCL library.
# --sanitizers=LIST_OF_SANITIZERS

Add missed options from https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst.
I'm not excited about adding `--sanitizers`, `--debug`, and `--nohomebrew` to this script when no one has asked for them.
This `build-python.sh` script is already more complex than I'm comfortable with and is duplicating functionality that'd be better handled by CMake.
Rather than trying to have a flag here for every `option()` in this project's CMake, I think we should just rely on documentation describing how to use `--precompile` to use a shared library built with whatever customizations you want.
I'd love to put our energy instead into working towards eventually eliminating this script, with heavier use of CMake + all of the improvements that have gone into `scikit-build-core` over the last year (some of which I sort of describe in #6774).
What do you think?
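A rough sketch of the `--precompile` route mentioned above, assuming the working directory is a checkout of the LightGBM repository. `USE_DEBUG` stands in for "whatever customizations you want"; the `run` helper, the `build_with_precompile` function, and the `DRY_RUN` variable are hypothetical names introduced only for this illustration. By default the sketch prints the commands instead of executing them.

```sh
# Hypothetical helper: print a command and, unless DRY_RUN=1, execute it.
# Dry-run is the default so the sketch is safe to paste anywhere.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" != "1" ]; then
        "$@"
    fi
}

build_with_precompile() {
    # 1. Build lib_lightgbm with CMake, passing any option() directly.
    # 2. Package the already-built library instead of recompiling it.
    run cmake -B build -S . -DUSE_DEBUG=ON &&
        run cmake --build build &&
        run sh ./build-python.sh install --precompile
}

build_with_precompile
```

Running it with `DRY_RUN=0` from the repository root would execute the three steps for real; any other CMake option could replace `-DUSE_DEBUG=ON` without a matching flag in the script.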
# --user
# Install into user-specific instead of global site-packages directory.
# Only used with 'install' command.

set -e -u

- echo "building lightgbm"
+ echo "[INFO] building lightgbm"

For consistency with line 141 in 60b0155:
echo "[INFO] Attempting to build 32-bit version of LightGBM, which is only supported on Windows with generator '${CMAKE_GENERATOR}'."
and to better distinguish our own logs from the scikit-build-core ones.
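One way to keep that prefix consistent is a tiny logging helper. This is only a sketch: `log_info` is a hypothetical name, not something the script in this PR defines.

```sh
# Hypothetical helper: prefix our own messages so they stand out from
# scikit-build-core's output.
log_info() {
    echo "[INFO] $*"
}

log_info "building lightgbm"
# prints: [INFO] building lightgbm
```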
#########
# flags #
#########
--bit32)
    export CMAKE_GENERATOR="Visual Studio 17 2022"

Allow any default Visual Studio version to be used.
elif test -f ../lib_lightgbm.dll; then
    echo "[INFO] found pre-compiled lib_lightgbm.dll"
    cp ../lib_lightgbm.dll ./lightgbm/lib/lib_lightgbm.dll

The DLL can be located there after compilation with MinGW: https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#mingw-w64
elif test -f ../windows/x64/Debug_DLL/lib_lightgbm.dll; then
    echo "[INFO] found pre-compiled windows/x64/Debug_DLL/lib_lightgbm.dll"
    cp ../windows/x64/Debug_DLL/lib_lightgbm.dll ./lightgbm/lib/lib_lightgbm.dll
    cp ../windows/x64/Debug_DLL/lib_lightgbm.lib ./lightgbm/lib/lib_lightgbm.lib

The DLL can be located there after compilation with the `Debug_DLL` config (LightGBM/windows/LightGBM.vcxproj, lines 4 to 6 in 60b0155):
<ProjectConfiguration Include="Debug_DLL|x64">
    <Configuration>Debug_DLL</Configuration>
    <Platform>x64</Platform>
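The lookup order discussed in these two threads can be sketched as a loop over candidate paths. This is a simplified illustration, not the script's actual code: `find_precompiled_dll` is a hypothetical function, and it checks only the two locations that appear in this diff.

```sh
# find_precompiled_dll ROOT
# Print the path of the first pre-compiled lib_lightgbm.dll found under ROOT.
# Log messages go to stderr so that stdout carries only the path.
# Returns 1 if neither candidate location contains the file.
find_precompiled_dll() {
    root="$1"
    for candidate in \
        "lib_lightgbm.dll" \
        "windows/x64/Debug_DLL/lib_lightgbm.dll"
    do
        if test -f "${root}/${candidate}"; then
            echo "[INFO] found pre-compiled ${candidate}" >&2
            echo "${root}/${candidate}"
            return 0
        fi
    done
    echo "[ERROR] no pre-compiled lib_lightgbm.dll found under ${root}" >&2
    return 1
}
```

A caller could then write something like `dll="$(find_precompiled_dll ..)" && cp "${dll}" ./lightgbm/lib/`, which keeps the list of locations in one place instead of one `elif` branch per path.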
@@ -197,77 +271,76 @@ Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_

conda install -c conda-forge lightgbm

These are precompiled packages that are fast to install.
Use them instead of ``pip install`` if any of the following are true:
These packages support **CPU**, **GPU** and **CUDA** versions out of the box.

Simplify a little bit.
@@ -197,77 +271,76 @@ Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_

conda install -c conda-forge lightgbm

These are precompiled packages that are fast to install.
Use them instead of ``pip install`` if any of the following are true:

By the way, what is the officially recommended (preferred) method for installing LightGBM: pip or conda?
If conda, then we should change their order in the doc.
I do think that the conda-forge packages offer a lot of benefits. But it's not as clear-cut as one approach being "recommended" and the other being "not recommended".
Situations where conda-forge's `lightgbm` should be preferred:
- you want to use the CUDA version (ref: "CUDA builds should depend on __cuda", conda-forge/lightgbm-feedstock#58)
- you want to run on the `ppc64le` architecture
- you want to use many libraries that all rely on OpenMP at the same time (ref: "[python-package] SegFault on MacOS when pytorch is installed", #6595)
- you want to (or are required to!) use conda for non-LightGBM reasons
Situations where `pip install lightgbm` should be preferred:
- you want to customize the build (e.g. use a different Boost version, build a variant like the MPI package, compile with debug symbols, etc.)
- you want to (or are required to!) use `pip` for non-LightGBM reasons
All that said... I'm indifferent about the order. If you want to put conda before `pip`, that's fine. But at this point, I don't think we should say that one is "preferred" or "recommended".

Run ``sh ./build-python.sh install --nomp`` to disable **OpenMP** support. All requirements from `Build Threadless Version section <#build-threadless-version>`__ apply for this installation option as well.

Run ``sh ./build-python.sh install --mpi`` to enable **MPI** support. All requirements from `Build MPI Version section <#build-mpi-version>`__ apply for this installation option as well.

Run ``sh ./build-python.sh install --mingw``, if you want to use **MinGW-w64** on **Windows** instead of **Visual Studio**. All requirements from `Build with MinGW-w64 on Windows section <#build-with-mingw-w64-on-windows>`__ apply for this installation option as well.

Keep the same order as in the "Install from PyPI -> Build from Sources" section.

Build With MSBuild

Building with MSBuild is part of "build dynamic library from sources by any method you prefer", so there is no need for a separate section for it.
cmake.args = [
    "-D__BUILD_FOR_PYTHON:BOOL=ON"
]

Use `define` instead of `args`. This allows using `--config-setting=cmake.args=-G'MinGW Makefiles'` from the command line, because defines are appended while args are overwritten.
Warning: Setting defines through `cmake.args` in `pyproject.toml` is discouraged because this cannot be later altered via command line. Use `cmake.define` instead.
https://scikit-build-core.readthedocs.io/en/latest/configuration.html#configuring-cmake-arguments-and-defines
Oh cool, did not know this! Really nice improvement, this is much better than using the `CMAKE_GENERATOR` environment variable, I think.
--config-settings=cmake.define.USE_GPU=ON \
--config-settings=cmake.define.OpenCL_INCLUDE_DIR="/usr/local/cuda/include/" \
--config-settings=cmake.define.OpenCL_LIBRARY="/usr/local/cuda/lib64/libOpenCL.so"
pip install lightgbm --no-binary lightgbm --config-settings=cmake.define.USE_GPU=ON --config-settings=cmake.define.OpenCL_INCLUDE_DIR="/usr/local/cuda/include/" --config-settings=cmake.define.OpenCL_LIBRARY="/usr/local/cuda/lib64/libOpenCL.so"

We cannot predict which shell the user is running. For example, splitting a long line in PowerShell requires the ` character rather than \. So for better UX, I think it is better to provide universal copy-pastable commands.
Thanks for this! I left some suggestions for your consideration.
@@ -54,77 +61,98 @@ To install all dependencies needed to use ``pandas`` in LightGBM, append ``[pand

pip install 'lightgbm[pandas]'

|

Use LightGBM with Matplotlib and Graphviz

Suggested change:
- Use LightGBM with Matplotlib and Graphviz
+ Use LightGBM Plotting Capabilities
For the other blocks, "Use LightGBM with {library}" is a helpful title, because that's how users are likely to be thinking about it, e.g. "I have a Dask cluster, how do I use LightGBM with Dask".
For plotting, I don't think that's true. I suspect that most users are not thinking "how do I use LightGBM with graphviz?" and instead are just interested in "how do I visualize the structure of my LightGBM models?".
In other words... the fact that these functions happen to use `matplotlib` and `graphviz` isn't central to their value.
Would you consider renaming this?

Build with MinGW-w64 on Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sh

    # in sh.exe, git bash, or other Unix-like shell

oh nice!

That script requires some dependencies like ``build``, ``scikit-build-core``, and ``wheel``.
In environments with restricted or no internet access, install those tools and then pass ``--no-isolation``.
If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__). For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.

Suggested change:
- If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__). For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.
+ If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__).
+ For example, you can use ``MSBuild`` tool and `solution file <https://github.com/microsoft/LightGBM/blob/master/windows/LightGBM.sln>`__ from the repo.
(general comment for this whole PR)
Can we please split separate sentences onto their own lines? Consecutive lines should render exactly the same on GitHub and readthedocs... they should be consolidated into one paragraph with wrapping.
But having them on separate lines makes git diffs easier to review, the files easier to edit in a text editor, and the git diff more informative as a history of changes to lines.
Sure thing! Do you want me to split lines in the whole file or only in the touched paragraphs?
I'm ok with it being every line in the file, I don't mind reviewing that.
But also fine if you want to do that in a later PR (or want me to do it, since it's my annoying suggestion 😅 )
Build without Searching in Homebrew Folders for Dependencies on macOS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sh

    pip install lightgbm --no-binary lightgbm --config-settings=cmake.define.USE_HOMEBREW_FALLBACK=OFF

Use this option to stop looking into Homebrew standard folders for finding dependencies (e.g. OpenMP) during the build on macOS.

|

I think we should omit this section. If you're using `pip install` (not `pip wheel` or `python -m build`) like this, then you're not building a package for redistribution... you're just installing one for runtime use on the same system.
The OpenMP found at build time will become the first RPATH entry in `lib_lightgbm.dylib`, and so it'll be found when you run `import lightgbm`.
As long as that library is still there later, it doesn't matter whether you later `brew install libomp` on the same system.
I just do not think we need a one-paragraph section like this matching every `option()` in `CMakeLists.txt`. It creates duplication across the project and adds to the size of the doc, which I think is intimidating for some users.
I think it is fine to only have specific sections for:
- build variations that we expect to be popular
- build variations that require significant explanation (e.g., all the customization options for `USE_GPU=ON`)
I feel the same way about other new sections below... do not think we need to have a separate section in this doc describing how to build the Python package with sanitizers or with `USE_DEBUG` enabled.
plotting = [
    "graphviz",
    "matplotlib"
]

Agree! Every optional dependency should have an extra. Totally support this.
[tool.scikit-build.cmake]
version = "CMakeLists.txt"
build-type = "Release"

[tool.scikit-build.cmake.define]
__BUILD_FOR_PYTHON = "ON"

What is the benefit of moving these things down here? Are the equivalent fields in `[tool.scikit-build]` deprecated?
Huh, these nested fields are identical; I simply don't know how to specify the nested mapping `tool.scikit-build.cmake.define` in the existing setup 😬
And I think that
[tool.scikit-build.cmake.define]
__BUILD_FOR_PYTHON = "ON"
some_new_option = "value"
is more readable than
[tool.scikit-build]
cmake.define.__BUILD_FOR_PYTHON = "ON"
cmake.define.some_new_option = "value"
I agree with you about that, I'd expected that this form would work:
[tool.scikit-build]
cmake.define = [
"__BUILD_FOR_PYTHON=ON",
"some_new_option=value"
]
But when I search around on GitHub, I don't see any examples like that: https://github.com/search?q=%22cmake.define%22+language%3ATOML&type=code&p=1
Let's leave this as you have it in the PR, I like it.
Every installation method and some of their combinations were tested with the help of our CI (Appveyor and GitHub Actions). Everything is working except the `--sanitizers` flag: package installation is successful, but its import fails.
I'm not sure we ever checked even building LightGBM (not `testlightgbm`) with sanitizers.