Commit 4885d32

Merge pull request #79 from HongW2019/doc-1.1.1-r
[ML-71] Backport master docs to branch-1.1-spark-3.1.1
2 parents 5f4a75e + fc243b9 commit 4885d32

12 files changed: +13,808 −986 lines

CHANGELOG.md: +422 −1 (large diff not rendered)

LICENSE: +1,957 (large diff not rendered)

LICENSE.txt: −201 (file deleted)

README.md: +17 −8

@@ -1,3 +1,7 @@
+##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
+
+##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
+
 # OAP MLlib
 
 ## Overview
@@ -17,13 +21,13 @@ You can find all the OAP MLlib documents on the [project web page](https://o
 
 ### Java/Scala Users Preferred
 
-Use a pre-built OAP MLlib JAR to get started. You can firstly download OAP package from [OAP-JARs-Tarball](https://github.com/Intel-bigdata/OAP/releases/download/v1.1.0-spark-3.0.0/oap-1.1.0-bin-spark-3.0.0.tar.gz) and extract this Tarball to get `oap-mllib-x.x.x-with-spark-x.x.x.jar` under `oap-1.1.0-bin-spark-3.0.0/jars`.
+Use a pre-built OAP MLlib JAR to get started. First download the OAP package from [OAP-JARs-Tarball](https://github.com/oap-project/oap-tools/releases/download/v1.1.1-spark-3.1.1/oap-1.1.1-bin-spark-3.1.1.tar.gz) and extract the tarball to get `oap-mllib-x.x.x.jar` under `oap-1.1.1-bin-spark-3.1.1/jars`.
 
 Then you can refer to the following [Running](#running) section to try out.
 
 ### Python/PySpark Users Preferred
 
-Use a pre-built JAR to get started. If you have finished [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find compiled OAP MLlib JAR `oap-mllib-x.x.x-with-spark-x.x.x.jar` in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
+Use a pre-built JAR to get started. If you have finished the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md), you can find the compiled OAP MLlib JAR `oap-mllib-x.x.x.jar` in `$HOME/miniconda2/envs/oapenv/oap_jars/`.
 
 Then you can refer to the following [Running](#running) section to try out.
 
@@ -49,13 +53,17 @@ Users usually run Spark application on __YARN__ with __client__ mode. In that ca
 
 ```
 # absolute path of the jar for uploading
-spark.files /path/to/oap-mllib-x.x.x-with-spark-x.x.x.jar
+spark.files /path/to/oap-mllib-x.x.x.jar
 # absolute path of the jar for driver class path
-spark.driver.extraClassPath /path/to/oap-mllib-x.x.x-with-spark-x.x.x.jar
+spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
 # relative path to spark.files, just specify jar name in current dir
-spark.executor.extraClassPath ./oap-mllib-x.x.x-with-spark-x.x.x.jar
+spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
 ```
 
+#### OAP MLlib Specific Configuration
+
+OAP MLlib adopts oneDAL as its implementation backend. oneDAL requires enough native memory allocated for each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value to larger than __dataset size / executor number__ is a good starting point.
+
 ### Sanity Check
 
 #### Setup `env.sh`
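The `spark.executor.memoryOverhead` rule of thumb added in this hunk can be sketched as a quick calculation. The dataset size and executor count below are hypothetical inputs, not values from the commit:

```shell
# Rule of thumb from the README diff: memoryOverhead > dataset size / executor count.
DATASET_SIZE_GB=96    # hypothetical total input dataset size
NUM_EXECUTORS=12      # hypothetical number of executors

# Integer ceiling division: (a + b - 1) / b
PER_EXECUTOR_GB=$(( (DATASET_SIZE_GB + NUM_EXECUTORS - 1) / NUM_EXECUTORS ))
echo "spark.executor.memoryOverhead ${PER_EXECUTOR_GB}g"
```

Rounding up rather than down keeps each executor at or above the dataset-size/executor-count floor the README recommends.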
@@ -103,10 +111,10 @@ Intel® oneAPI Toolkits and its components can be downloaded and installed from [h
 
 More details about oneAPI can be found [here](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html).
 
-You can refer to [this script](dev/install-build-deps-centos.sh) to install correct dependencies.
-
 Scala and Java dependency descriptions are already included in Maven POM file.
 
+***Note:*** You can refer to [this script](dev/install-build-deps-centos.sh) to install the correct dependencies: DPC++/C++, oneDAL, oneTBB, oneCCL.
+
 ### Build
 
 #### Building oneCCL
@@ -161,12 +169,13 @@ To build, run the following commands:
 $ cd mllib-dal
 $ ./build.sh
 ```
+
 The target can be built against different Spark versions by specifying a profile with <spark-x.x.x>. E.g.
 ```
 $ ./build.sh spark-3.1.1
 ```
 If no profile parameter is given, Spark version 3.0.0 will be activated by default.
-The built JAR package will be placed in `target` directory with the name `oap-mllib-x.x.x-with-spark-x.x.x.jar`.
+The built JAR package will be placed in the `target` directory with the name `oap-mllib-x.x.x.jar`.
 
 ## Examples
 