Releases: uxlfoundation/oneDAL
Intel® oneAPI Data Analytics Library 2021.4
The release introduces the following changes:
📚 Support Materials
The following additional materials were created:
- Medium blogs:
- Anaconda blogs:
- Oracle blogs:
- Kaggle kernels:
- [Tabular Playground Series - Jul 2021] Fast RandomForest with sklearnex
- [Tabular Playground Series - Jul 2021] RF with Intel Extension for Scikit-learn
- [Tabular Playground Series - Jul 2021] Stacking with scikit-learn-intelex
- [Tabular Playground Series - Aug 2021] NuSVR with Intel Extension for Sklearn
- [Predict Future Sales] Stacking with scikit-learn-intelex
- [House Prices - Advanced Regression Techniques] NuSVR sklearn-intelex 4x speedup
- Added demo samples comparing the usage of Intel® Extension for Scikit-learn and the original Scikit-learn for KNN, Logistic Regression, SVM and Random Forest algorithms
🛠️ Library Engineering
- Introduced new functionality for Intel® Extension for Scikit-learn*:
- Enabled patching for all Scikit-learn applications at once:
- You can enable global patching via command line:
python -m sklearnex.glob patch_sklearn
- Or via code:
from sklearnex import patch_sklearn
patch_sklearn(global_patch=True)
- Read more in Intel® Extension for Scikit-learn documentation.
- Added the support of Python 3.9 for both Intel® Extension for Scikit-learn and daal4py. The packages are available from PyPI and the Intel Channel on Anaconda Cloud.
- Introduced new oneDAL functionality:
- Added pkg-config support for Linux, macOS, Windows and for static/dynamic, thread/sequential configurations of oneDAL applications.
- Reduced the size of the oneDAL library by approximately 30%.
🚨 What's New
Introduced new oneDAL functionality:
- General:
- Basic statistics (Low order moments) algorithm in oneDAL interfaces
- Result options for kNN Brute-force in oneDAL interfaces: using a single function call to return any combination of responses, indices, and distances
- CPU:
- Sigmoid kernel of SVM algorithm
- Model converter from CatBoost to oneDAL representation
- Louvain Community Detection algorithm technical preview
- Connected Components algorithm technical preview
- Search task and cosine distance for kNN Brute-force
- GPU:
- The full range support of Minkowski distances in kNN Brute-force
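The search task with cosine distance listed above returns the nearest neighbors of a query without predicting a label. The following is a minimal pure-Python sketch of that idea, not the oneDAL implementation; the function names `cosine_distance` and `brute_force_search` are hypothetical, and the sketch assumes no input vector is all zeros.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(train, query, k):
    """Compare the query against every training row and keep the k closest."""
    scored = sorted((cosine_distance(row, query), i) for i, row in enumerate(train))
    nearest = scored[:k]
    indices = [i for _, i in nearest]
    distances = [d for d, _ in nearest]
    return indices, distances


train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
indices, distances = brute_force_search(train, [2.0, 0.1], k=2)
print(indices, distances)
```

Cosine distance depends only on direction, so the query `[2.0, 0.1]` is closest to `[1.0, 0.0]` even though the two vectors differ in length.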
Improved oneDAL performance for the following algorithms:
- CPU:
- Decision Forest training and prediction
- Brute-force kNN
- KMeans
- NuSVMs and SVR training
Introduced new functionality in Intel® Extension for Scikit-learn:
- General:
- Enabled the global patching of all Scikit-learn applications
- Provided an integration with dpctl for heterogeneous computing (the support of dpctl.tensor.usm_ndarray for input and output)
- Extended API with set_config and get_config methods. Added the support of target_offload and allow_fallback_to_host options for device offloading scenarios
- Added the support of predict_proba in RandomForestClassifier estimator
- CPU:
- Added the support of Sigmoid kernel in SVM algorithms
- GPU:
- Added binary SVC support with Linear and RBF kernels
Improved the performance of the following scikit-learn estimators via scikit-learn patching:
- SVR algorithm training
- NuSVC and NuSVR algorithms training
- RandomForestRegressor and RandomForestClassifier algorithms training and prediction
- KMeans
🐛 Bug Fixes
- General:
- Fixed an incorrectly raised exception during the patching of Random Forest algorithm when the number of trees was more than 7000.
- CPU:
- Fixed an accuracy issue in Random Forest algorithm caused by the exclusion of constant features.
- Fixed an issue in NuSVC Multiclass.
- Fixed an issue with KMeans convergence inconsistency.
- Fixed incorrect work of train_test_split with specific subset sizes.
- GPU:
- Fixed incorrect bias calculation in SVM.
❗ Known Issues
- GPU:
- For most algorithms, performance degradations were observed when the 2021.4 version of Intel® oneAPI DPC++ Compiler was used.
- Examples are failing when run with Visual Studio Solutions on hardware that does not support double precision floating-point operations.
Intel® oneAPI Data Analytics Library 2021.3
The release introduces the following changes:
📚 Support Materials
The following additional materials were created:
- Medium blogs:
- Superior Machine Learning Performance on the Latest Intel Xeon Scalable Processors
- Leverage Intel Optimizations in Scikit-Learn (SVM Performance Training and Inference)
- Optimizing CatBoost Performance
- Performance Optimizations for End-to-End AI Pipelines
- Optimizing the End-to-End Training Pipeline on Apache Spark Clusters
- Kaggle kernels:
- [Tabular Playground Series - Apr 2021] RF with Intel Extension for Scikit-learn
- [Tabular Playground Series - Apr 2021] SVM with Intel Extension for Scikit-learn
- [Tabular Playground Series - Apr 2021] SVM with scikit-learn-intelex
- Samples that illustrate the usage of Intel Extension for Scikit-learn
🛠️ Library Engineering
- Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains scikit-learn patching functionality that was originally available in daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using scikit-learn-intelex package instead of daal4py.
- Download the extension using one of the following commands:
pip install scikit-learn-intelex
conda install scikit-learn-intelex -c conda-forge
- Enable Scikit-learn patching:
from sklearnex import patch_sklearn
patch_sklearn()
- Introduced optional dependencies on DPC++ runtime to daal4py. To enable DPC++ backend, install dpcpp_cpp_rt package. Making this dependency optional reduces the default package size with all dependencies from 1.2 GB to 400 MB.
- Added the support of building oneDAL-based applications with /MD and /MDd options on Windows. The -d suffix is used in the names of oneDAL libraries that are built with debug run-time (/MDd).
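The patch_sklearn() call shown above affects only estimators imported after it runs. The following is a minimal sketch of that ordering, assuming scikit-learn is installed; the helper name `load_svc` is hypothetical, and the function falls back to stock scikit-learn when scikit-learn-intelex is missing.

```python
import importlib.util


def load_svc():
    """Return scikit-learn's SVC, patched when scikit-learn-intelex is available."""
    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn

        # Patch first: names imported from sklearn before this call
        # keep pointing at the stock implementations.
        patch_sklearn()

    # Imported after patching, so this resolves to the accelerated class
    # when the extension is installed, and to stock scikit-learn otherwise.
    from sklearn.svm import SVC

    return SVC
```

Code that uses the returned class stays the same in both cases, because the accelerated estimator keeps the scikit-learn API.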
🚨 What's New
Introduced new oneDAL and daal4py functionality:
- CPU:
- SVM Regression algorithm
- NuSVM algorithm for both Classification and Regression tasks
- Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
- Minkowski and Chebyshev distances for kNN Brute-force
- The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
- Multiclass support for SVM algorithms in oneDAL interfaces
- CSR-matrix support for SVM algorithms in oneDAL interfaces
- Subgraph Isomorphism algorithm technical preview
- Single Source Shortest Path (SSSP) algorithm technical preview
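For reference, the Minkowski and Chebyshev distances added for kNN Brute-force are related: Minkowski with p = 1 is the Manhattan distance, p = 2 is the Euclidean distance, and the value approaches the Chebyshev distance as p grows. The following pure-Python sketch illustrates the definitions only; it is not the library's implementation, and the function names are hypothetical.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))   # Manhattan distance
print(minkowski(a, b, 2))   # Euclidean distance
print(minkowski(a, b, 50))  # already very close to the Chebyshev distance
print(chebyshev(a, b))
```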
Improved oneDAL and daal4py performance for the following algorithms:
- CPU:
- Support Vector Machines training and prediction
- Linear, Ridge, ElasticNet, and LASSO regressions prediction
- GPU:
- Decision Forest training and prediction
- Principal Components Analysis training
Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn.
- The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
Introduced new functionality for Intel Extension for Scikit-learn:
- General:
- The support of patch_sklearn for all algorithms
- CPU:
- Acceleration of SVR estimator
- Acceleration of NuSVC and NuSVR estimators
- Polynomial kernel support in SVM algorithms
Improved the performance of the following scikit-learn estimators via scikit-learn patching:
- SVM algorithms training and prediction
- Linear, Ridge, ElasticNet, and Lasso regressions prediction
Fixed the following issues:
- General:
- Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
- Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
- Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
- CPU:
- Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
- Reduced the memory consumption of SVM prediction
- GPU:
- Fixed an issue with kernel compilation on the platforms without hardware FP64 support
❗ Known Issues
- Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the packages:
import sys
import os
import site
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))
Intel® oneAPI Data Analytics Library 2021.2
The release introduces the following changes:
Library Engineering:
- Enabled new PyPI distribution channel for daal4py:
- The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
- Support of both CPU and GPU is included in the package.
- You can download daal4py using the following command:
pip install daal4py
- Introduced CMake support for oneDAL examples
Support Materials
The following additional materials were created:
- Medium blogs:
- Kaggle kernels:
What's New
Introduced new oneDAL and daal4py functionality:
- CPU:
- Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
- Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, kNN Brute Force method, Decision Forest Classification and Regression
- GPU:
- Multi-node multi-GPU algorithms: KMeans (batch), Covariance (batch and online), Low order moments (batch and online) and PCA
- Sparsity support for SVM algorithm
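Bit-to-bit reproducibility, listed among the CPU features above, is a stricter guarantee than agreement within a tolerance: repeated runs must return identical bytes. One common reason it is hard to achieve is that floating-point addition is not associative, so a change in summation order can change the result. The sketch below illustrates this in plain Python and says nothing about how oneDAL achieves reproducibility; the helper names `bits` and `accumulate` are hypothetical.

```python
import struct


def bits(value):
    """Return the raw IEEE 754 double-precision bytes of a float."""
    return struct.pack("<d", value)


def accumulate(values):
    """Add the values strictly left to right, as a sequential loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1, 0.2, 0.3]
forward = accumulate(values)             # (0.1 + 0.2) + 0.3
backward = accumulate(reversed(values))  # (0.3 + 0.2) + 0.1

print(forward, backward)
print("same order, same bits:", bits(accumulate(values)) == bits(forward))
print("different order, same bits:", bits(forward) == bits(backward))
```

Comparing packed bytes rather than using a tolerance check is what makes this a bit-to-bit test.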
Improved oneDAL and daal4py performance for the following algorithms:
- CPU:
- Decision Forest training Classification and Regression
- Support Vector Machines training and prediction
- Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
- GPU:
- Decision Forest training Classification and Regression
- All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
- Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
Added technical preview features in Graph Analytics:
- CPU:
- Local and Global Triangle Counting
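As an illustration of the two quantities, local triangle counting reports, for each vertex, the number of triangles that contain it, and global triangle counting reports the number of triangles in the whole graph. The following is a minimal pure-Python sketch for a simple undirected graph without self-loops; it is not the oneDAL implementation, and the function name `triangle_counts` is hypothetical.

```python
def triangle_counts(edges):
    """Return (per-vertex triangle counts, total triangle count)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    local = {}
    for vertex, neighbors in adjacency.items():
        # Every pair of mutually connected neighbors closes a triangle
        # through this vertex; the sum sees each such pair twice.
        links = sum(len(adjacency[n] & neighbors) for n in neighbors)
        local[vertex] = links // 2

    # Each triangle is counted once at each of its three vertices.
    total = sum(local.values()) // 3
    return local, total


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
local, total = triangle_counts(edges)
print(local, total)
```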
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
- Acceleration of roc_auc_score function
- Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
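For context on the roc_auc_score acceleration, the area under the ROC curve for binary labels equals the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The following is a minimal pure-Python sketch of that definition; it is quadratic in the number of samples, is not how the accelerated function is implemented, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(y_true, y_score):
    """ROC AUC for binary labels (0 or 1) via pairwise comparison of scores."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```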
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
- Principal Component Analysis (PCA) scikit-learn estimator: training
- Support Vector Classification (SVC) scikit-learn estimators: training and prediction
- Support Vector Classification (SVC) scikit-learn estimator with the probability==True parameter: training and prediction
Fixed the following issues:
- Scikit-learn patching:
- Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Fixed patching issues with pairwise_distances
- Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
- Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
- Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
- Fixed performance issue for interoperability with Modin
- daal4py:
- Fixed the crash of SVM and kNN algorithms on Windows on GPU
- oneDAL:
- Improved accuracy of Decision Forest Classification and Regression on CPU
- Improved accuracy of KMeans algorithm on GPU
- Improved stability of Linear Regression and Logistic Regression algorithms on GPU
Known Issues
- The oneDAL vars.sh script does not support KornShell
Intel® oneAPI Data Analytics Library 2021.1
The release contains all functionality of Intel® DAAL. See Intel® DAAL release notes for more details.
What's New
Library Engineering:
- Renamed the library from Intel® Data Analytics Acceleration Library to Intel® oneAPI Data Analytics Library and changed the package names to reflect this.
- Deprecated 32-bit version of the library.
- Introduced Intel GPU support for both OpenCL and Level Zero backends.
- Introduced Unified Shared Memory (USM) support
Introduced new Intel® oneDAL and daal4py functionality:
- GPU:
- Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
- Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
- Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source
Improved Intel® oneDAL and daal4py performance for the following algorithms:
- CPU:
- Logistic Regression training and prediction
- k-Nearest Neighbors prediction with Brute Force method
- Logistic Loss and Cross Entropy objective functions
Added Technical Preview Features in Graph Analytics:
- CPU:
- Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), where vertex indices can only be of type int32
- Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
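For reference, the Jaccard similarity coefficient of two vertices is the size of the intersection of their neighbor sets divided by the size of the union. The following is a minimal pure-Python sketch for an unweighted undirected graph; it processes all pairs at once rather than by blocks, keeps only nonzero coefficients (an assumption of this sketch), and is not the oneDAL implementation. The function name `jaccard_all_pairs` is hypothetical.

```python
from itertools import combinations


def jaccard_all_pairs(adjacency):
    """Map each vertex pair (u, v), u < v, to its nonzero Jaccard coefficient."""
    result = {}
    for u, v in combinations(sorted(adjacency), 2):
        shared = adjacency[u] & adjacency[v]
        if shared:
            combined = adjacency[u] | adjacency[v]
            result[(u, v)] = len(shared) / len(combined)
    return result


# Neighbor sets of a small graph with edges 0-1, 0-2, 1-2 and 2-3.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coefficients = jaccard_all_pairs(adjacency)
print(coefficients)
```

Vertices 2 and 3 are adjacent but share no neighbors, so the pair does not appear in the result.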
Aligned the library with Intel® oneDAL Specification 1.0 for the following algorithms:
- CPU/GPU:
- K-means, PCA, kNN
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
- Acceleration of TSNE scikit-learn estimator
- GPU:
- Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- LogisticRegression fit, predict and predict_proba methods
- KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method
Known Issues
- Intel® oneDAL DPC++ APIs do not work on GEN12 graphics with OpenCL backend. Use Level Zero backend for such cases.
- train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*
Intel® DAAL 2020 Update 3
What's New in Intel® DAAL 2020 Update 3:
Introduced new Intel® DAAL and daal4py functionality:
- Brute Force method for k-Nearest Neighbors classification algorithm, which performs better than the existing K-D tree method for datasets with more than 13 features
- k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices
Extended existing Intel® DAAL and daal4py functionality:
- Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
- New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
- Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
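The two voting methods for k-Nearest Neighbors can give different answers on the same neighbors: uniform weighting counts one vote per neighbor, while inverse-distance weighting lets closer neighbors count for more. The following is a minimal pure-Python sketch using a brute-force Euclidean search; it is not the Intel® DAAL implementation, and the function name `knn_vote` and its `weights` values are hypothetical.

```python
import math
from collections import defaultdict


def knn_vote(train, labels, query, k, weights="uniform"):
    """Predict a label from the k nearest training rows using the chosen voting."""
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")

    # Brute force: compute the distance from the query to every training row.
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train, labels)
    )[:k]

    votes = defaultdict(float)
    for distance, label in nearest:
        if weights == "uniform":
            votes[label] += 1.0
        else:
            # Guard against division by zero for exact matches.
            votes[label] += 1.0 / max(distance, 1e-12)
    return max(votes, key=votes.get)


train = [[0.0], [3.0], [4.0]]
labels = ["a", "b", "b"]
print(knn_vote(train, labels, [1.0], k=3, weights="uniform"))
print(knn_vote(train, labels, [1.0], k=3, weights="distance"))
```

In this example the two "b" neighbors outvote the single "a" neighbor under uniform weighting, but the much closer "a" neighbor wins under inverse-distance weighting.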
Improved Intel® DAAL and daal4py performance for the following algorithms:
- SVM training and prediction
- Decision Forest classification training
- RBF and Linear kernel functions
Introduced new daal4py functionality:
- Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
- Support of Modin* DataFrame as an input
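The model conversion can be sketched as below for an XGBoost classifier. This assumes the daal4py functions get_gbt_model_from_xgboost and gbt_classification_prediction with an nClasses parameter, and that the prediction result exposes a prediction attribute; the helper name `predict_with_daal4py` and its arguments are hypothetical, and the import is deferred so the definition loads without daal4py installed.

```python
def predict_with_daal4py(xgb_booster, features, n_classes):
    """Convert a trained XGBoost booster and run classification prediction in daal4py."""
    import daal4py as d4p  # deferred import: only needed when the function is called

    # Convert the trained booster into a daal4py Gradient Boosted Trees model.
    daal_model = d4p.get_gbt_model_from_xgboost(xgb_booster)

    # Run prediction with the converted model.
    algorithm = d4p.gbt_classification_prediction(nClasses=n_classes)
    return algorithm.compute(features, daal_model).prediction
```

For a model trained through the XGBoost scikit-learn wrapper, the booster would typically come from the wrapper's get_booster() method.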
Introduced new functionality for scikit-learn patching through daal4py:
- Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
- Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
- Prediction of probabilities for SVC scikit-learn estimator
- Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators
Improved performance of the following functionality for scikit-learn patching through daal4py:
- train_test_split()
- Support Vector Classification (SVC) fit and prediction