Commit 85ca879

Hong authored
[ML-172] Update documentation for OAP 1.4.0 (#220)
* [ML-172] Update documentation for OAP 1.4.0 (#218)
* [ML-172] Update documentation for OAP 1.4.0
* Update Changelog
* Update release version
1 parent e282ee1 commit 85ca879

8 files changed: +996 -17 lines changed

CHANGELOG.md

+126-1
@@ -1,5 +1,130 @@
 # Change log
-Generated on 2022-04-10
+Generated on 2022-07-05
+
+## Release 1.4.0
+
+### Gazelle Plugin
+
+#### Features
+|||
+|:---|:---|
+|[#781](https://github.com/oap-project/gazelle_plugin/issues/781)|Add spark eventlog analyzer for advanced analyzing|
+|[#927](https://github.com/oap-project/gazelle_plugin/issues/927)|Column2Row further enhancement|
+|[#913](https://github.com/oap-project/gazelle_plugin/issues/913)|Add Hadoop 3.3 profile to pom.xml|
+|[#869](https://github.com/oap-project/gazelle_plugin/issues/869)|implement first agg function|
+|[#926](https://github.com/oap-project/gazelle_plugin/issues/926)|Support UDF URLDecoder|
+|[#856](https://github.com/oap-project/gazelle_plugin/issues/856)|[SHUFFLE] manually split of Variable length buffer (String likely)|
+|[#886](https://github.com/oap-project/gazelle_plugin/issues/886)|Add pmod function support|
+|[#855](https://github.com/oap-project/gazelle_plugin/issues/855)|[SHUFFLE] HugePage support in shuffle|
+|[#872](https://github.com/oap-project/gazelle_plugin/issues/872)|implement replace function|
+|[#867](https://github.com/oap-project/gazelle_plugin/issues/867)|Add substring_index function support|
+|[#818](https://github.com/oap-project/gazelle_plugin/issues/818)|Support length, char_length, locate, regexp_extract|
+|[#864](https://github.com/oap-project/gazelle_plugin/issues/864)|Enable native parquet write by default|
+|[#828](https://github.com/oap-project/gazelle_plugin/issues/828)|CoalesceBatches native implementation|
+|[#800](https://github.com/oap-project/gazelle_plugin/issues/800)|Combine datasource and columnar core jar|
+
+#### Performance
+|||
+|:---|:---|
+|[#848](https://github.com/oap-project/gazelle_plugin/issues/848)|Optimize Columnar2Row performance|
+|[#943](https://github.com/oap-project/gazelle_plugin/issues/943)|Optimize Row2Columnar performance|
+|[#854](https://github.com/oap-project/gazelle_plugin/issues/854)|Enable skipping columnarWSCG for queries with small shuffle size|
+|[#857](https://github.com/oap-project/gazelle_plugin/issues/857)|[SHUFFLE] split by reducer by column|
+
+#### Bugs Fixed
+|||
+|:---|:---|
+|[#827](https://github.com/oap-project/gazelle_plugin/issues/827)|Github action is broken|
+|[#987](https://github.com/oap-project/gazelle_plugin/issues/987)|TPC-H q7, q8, q9 run failed when using String for Date|
+|[#892](https://github.com/oap-project/gazelle_plugin/issues/892)|Q47 and q57 failed on ubuntu 20.04 OS without open-jdk.|
+|[#784](https://github.com/oap-project/gazelle_plugin/issues/784)|Improve Sort Spill|
+|[#788](https://github.com/oap-project/gazelle_plugin/issues/788)|Spark UT of "randomSplit on reordered partitions" encountered "Invalid: Map array child array should have no nulls" issue|
+|[#821](https://github.com/oap-project/gazelle_plugin/issues/821)|Improve Wholestage Codegen check|
+|[#831](https://github.com/oap-project/gazelle_plugin/issues/831)|Support more expression types in getting attribute|
+|[#876](https://github.com/oap-project/gazelle_plugin/issues/876)|Write arrow hang with OutputWriter.path|
+|[#891](https://github.com/oap-project/gazelle_plugin/issues/891)|Spark executor lost while DatasetFileWriter failed with speculation|
+|[#909](https://github.com/oap-project/gazelle_plugin/issues/909)|"INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y LIMIT 2" drains 4 rows into table x using Arrow write extension|
+|[#889](https://github.com/oap-project/gazelle_plugin/issues/889)|Failed to write with ParquetFileFormat while using ArrowWriteExtension|
+|[#910](https://github.com/oap-project/gazelle_plugin/issues/910)|TPCDS failed, segfault caused by PR903|
+|[#852](https://github.com/oap-project/gazelle_plugin/issues/852)|Unit test fix for NSE-843|
+|[#843](https://github.com/oap-project/gazelle_plugin/issues/843)|ArrowDataSouce: Arrow dataset inspect() is called every time a file is read|
+
+#### PRs
+|||
+|:---|:---|
+|[#1005](https://github.com/oap-project/gazelle_plugin/pull/1005)|[NSE-800] Fix an assembly warning|
+|[#1002](https://github.com/oap-project/gazelle_plugin/pull/1002)|[NSE-800] Pack the classes into one single jar|
+|[#988](https://github.com/oap-project/gazelle_plugin/pull/988)|[NSE-987] fix string date|
+|[#977](https://github.com/oap-project/gazelle_plugin/pull/977)|[NSE-126] set default codegen opt to O1|
+|[#975](https://github.com/oap-project/gazelle_plugin/pull/975)|[NSE-927] Add macro AVX512BW check for different CPU architecture|
+|[#962](https://github.com/oap-project/gazelle_plugin/pull/962)|[NSE-359] disable unit tests on spark32 package|
+|[#966](https://github.com/oap-project/gazelle_plugin/pull/966)|[NSE-913] Add support for Hadoop 3.3.1 when packaging|
+|[#936](https://github.com/oap-project/gazelle_plugin/pull/936)|[NSE-943] Optimize IsNULL() function for Row2Columnar|
+|[#937](https://github.com/oap-project/gazelle_plugin/pull/937)|[NSE-927] Implement AVX512 optimization selection in Runtime and merge two C2R code files into one.|
+|[#951](https://github.com/oap-project/gazelle_plugin/pull/951)|[DNM] update sparklog|
+|[#938](https://github.com/oap-project/gazelle_plugin/pull/938)|[NSE-581] implement rlike/regexp_like|
+|[#946](https://github.com/oap-project/gazelle_plugin/pull/946)|[DNM] update on sparklog script|
+|[#939](https://github.com/oap-project/gazelle_plugin/pull/939)|[NSE-581] adding ShortType/FloatType in ColumnarLiteral|
+|[#934](https://github.com/oap-project/gazelle_plugin/pull/934)|[NSE-927] Extract and inline functions for native ColumnartoRow|
+|[#933](https://github.com/oap-project/gazelle_plugin/pull/933)|[NSE-581] Improve GetArrayItem(Split()) performance|
+|[#922](https://github.com/oap-project/gazelle_plugin/pull/922)|[NSE-912] Remove extra handleSafe costs|
+|[#925](https://github.com/oap-project/gazelle_plugin/pull/925)|[NSE-926] Support a UDF: URLDecoder|
+|[#924](https://github.com/oap-project/gazelle_plugin/pull/924)|[NSE-927] Enable AVX512 in Binary length calculation for native ColumnartoRow|
+|[#918](https://github.com/oap-project/gazelle_plugin/pull/918)|[NSE-856] Optimize of string/binary split|
+|[#908](https://github.com/oap-project/gazelle_plugin/pull/908)| [NSE-848] Optimize performance for Column2Row|
+|[#900](https://github.com/oap-project/gazelle_plugin/pull/900)|[NSE-869] Add 'first' agg function support|
+|[#917](https://github.com/oap-project/gazelle_plugin/pull/917)|[NSE-886] Add pmod expression support|
+|[#916](https://github.com/oap-project/gazelle_plugin/pull/916)|[NSE-909] fix slow test|
+|[#915](https://github.com/oap-project/gazelle_plugin/pull/915)|[NSE-857] Further optimizations of validity buffer split|
+|[#912](https://github.com/oap-project/gazelle_plugin/pull/912)|[NSE-909] "INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y L…|
+|[#896](https://github.com/oap-project/gazelle_plugin/pull/896)|[NSE-889] Failed to write with ParquetFileFormat while using ArrowWriteExtension|
+|[#911](https://github.com/oap-project/gazelle_plugin/pull/911)|[NSE-910] fix bug of PR903|
+|[#901](https://github.com/oap-project/gazelle_plugin/pull/901)|[NSE-891] Spark executor lost while DatasetFileWriter failed with speculation|
+|[#907](https://github.com/oap-project/gazelle_plugin/pull/907)|[NSE-857] split validity buffer by reducer|
+|[#902](https://github.com/oap-project/gazelle_plugin/pull/902)|[NSE-892] Allow to use jar cmd not in PATH|
+|[#898](https://github.com/oap-project/gazelle_plugin/pull/898)|[NSE-867][FOLLOWUP] Add substring_index function support|
+|[#894](https://github.com/oap-project/gazelle_plugin/pull/894)|[NSE-855] allocate large block of memory for all reducer #881|
+|[#880](https://github.com/oap-project/gazelle_plugin/pull/880)|[NSE-857] Fill destination buffer by reducer|
+|[#839](https://github.com/oap-project/gazelle_plugin/pull/839)|[DNM] some optimizations to shuffle's split function|
+|[#879](https://github.com/oap-project/gazelle_plugin/pull/879)|[NSE-878]Wip get phyplan bugfix|
+|[#877](https://github.com/oap-project/gazelle_plugin/pull/877)|[NSE-876] Fix writing arrow hang with OutputWriter.path|
+|[#873](https://github.com/oap-project/gazelle_plugin/pull/873)|[NSE-872] implement replace function|
+|[#850](https://github.com/oap-project/gazelle_plugin/pull/850)|[NSE-854] Small Shuffle Size disable wholestagecodegen|
+|[#868](https://github.com/oap-project/gazelle_plugin/pull/868)|[NSE-867] Add substring_index function support|
+|[#847](https://github.com/oap-project/gazelle_plugin/pull/847)|[NSE-818] Support length, char_length, locate & regexp_extract|
+|[#865](https://github.com/oap-project/gazelle_plugin/pull/865)|[NSE-864] Enable native parquet write by default|
+|[#811](https://github.com/oap-project/gazelle_plugin/pull/811)|[NSE-810] disable codegen for SMJ with local limit|
+|[#860](https://github.com/oap-project/gazelle_plugin/pull/860)|remove sensitive info from physical plan|
+|[#853](https://github.com/oap-project/gazelle_plugin/pull/853)|[NSE-852] Unit test fix for NSE-843|
+|[#844](https://github.com/oap-project/gazelle_plugin/pull/844)|[NSE-843] ArrowDataSouce: Arrow dataset inspect() is called every tim…|
+|[#842](https://github.com/oap-project/gazelle_plugin/pull/842)|fix in eventlog script|
+|[#841](https://github.com/oap-project/gazelle_plugin/pull/841)|fix bug of script|
+|[#829](https://github.com/oap-project/gazelle_plugin/pull/829)|[NSE-828] Add native CoalesceBatches implementation|
+|[#830](https://github.com/oap-project/gazelle_plugin/pull/830)|[NSE-831] Support more expression types in getting attribute|
+|[#815](https://github.com/oap-project/gazelle_plugin/pull/815)|[NSE-610] Shrink hashmap to use less memory|
+|[#822](https://github.com/oap-project/gazelle_plugin/pull/822)|[NSE-821] Fix Wholestage Codegen on unsupported pattern|
+|[#824](https://github.com/oap-project/gazelle_plugin/pull/824)|[NSE-823] Use `SPARK_VERSION_SHORT` instead of `SPARK_VERSION` to find SparkShims|
+|[#826](https://github.com/oap-project/gazelle_plugin/pull/826)|[NSE-827] fix GHA|
+|[#819](https://github.com/oap-project/gazelle_plugin/pull/819)|[DNM] complete sparklog script|
+|[#802](https://github.com/oap-project/gazelle_plugin/pull/802)|[NSE-794] Fix count() with decimal value|
+|[#801](https://github.com/oap-project/gazelle_plugin/pull/801)|[NSE-786] Adding docs for shim layers|
+|[#790](https://github.com/oap-project/gazelle_plugin/pull/790)|[NSE-781]Add eventlog analyzer tool|
+|[#789](https://github.com/oap-project/gazelle_plugin/pull/789)|[NSE-788] Quick fix for randomSplit on reordered partitions|
+|[#780](https://github.com/oap-project/gazelle_plugin/pull/780)|[NSE-784] fallback Sort after SortHashAgg|
+
+
+### OAP MLlib
+
+#### Performance
+|||
+|:---|:---|
+|[#204](https://github.com/oap-project/oap-mllib/issues/204)|Intel-MLlib require more memory to run Bayes algorithm.|
+
+#### PRs
+|||
+|:---|:---|
+|[#208](https://github.com/oap-project/oap-mllib/pull/208)|[ML-204][NaiveBayes] Remove cache from NaiveBayes|
+

 ## Release 1.3.1

README.md

+3-3
@@ -24,7 +24,7 @@ You can find the all the OAP MLlib documents on the [project web page](https://o

 ### Java/Scala Users Preferred

-Use a pre-built OAP MLlib JAR to get started. You can firstly download OAP package from [OAP-JARs-Tarball](https://github.com/oap-project/oap-tools/releases/download/v1.3.1/oap-1.3.1-bin.tar.gz) and extract this Tarball to get `oap-mllib-x.x.x.jar` under `oap-x.x.x-bin-spark-x.x.x/jars`.
+Use a pre-built OAP MLlib JAR to get started, you can download OAP MLlib JAR from [Release Page](https://github.com/oap-project/oap-mllib/releases/download/v1.4.0/oap-mllib-1.4.0.jar).

 Then you can refer to the following [Running](#running) section to try out.

@@ -56,7 +56,7 @@ OAP MLlib's latest version supports multiple Spark versions as below.
 * Java JRE 8.0+ Runtime
 * Apache Spark 3.1.1, 3.1.2, 3.1.3, 3.2.0 or 3.2.1

-Generally, our common system requirements are the same with Intel® oneAPI Toolkit, please refer to [here](https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-base-toolkit-system-requirements.html) for details.
+Generally, our common system requirements are the same with Intel® oneAPI Toolkit, please refer to [Intel® oneAPI Base Toolkit System Requirements](https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-base-toolkit-system-requirements.html) for details.

 Intel® oneAPI Toolkits components used by the project are already included into JAR package mentioned above. There are no extra installations for cluster nodes.

@@ -117,7 +117,7 @@ Edit related variables in "`Minimun Settings`" of `env.sh`

 ### PySpark Support

-As PySpark-based applications call their Scala couterparts, they shall be supported out-of-box. Examples can be found in the [Examples](#examples) section.
+As PySpark-based applications call their Scala counterparts, they shall be supported out-of-box. Examples can be found in the [Examples](#examples) section.

 ## Building

RELEASE

+1-1
@@ -1,3 +1,3 @@
-OAP_MLLIB_VERSION=1.3.1
+OAP_MLLIB_VERSION=1.4.0
 SPARK_VERSION=3.2.0
 PLATFORM_PROFILE=CPU_ONLY_PROFILE
