Commit 85c8af6

Author: Hong
[ML-172] Update documents for OAP 1.5.0 (#241)
1 parent 82235e0 commit 85c8af6

6 files changed, +177 -25 lines changed

CHANGELOG.md

+143 -1
@@ -1,5 +1,147 @@
 # Change log
-Generated on 2022-07-05
+Generated on 2022-12-14
+
+## Release 1.5.0
+
+### Gazelle Plugin
+
+#### Features
+|||
+|:---|:---|
+|[#931](https://github.com/oap-project/gazelle_plugin/issues/931)|Reuse partition vectors for arrow scan|
+|[#955](https://github.com/oap-project/gazelle_plugin/issues/955)|implement missing expressions|
+|[#1120](https://github.com/oap-project/gazelle_plugin/issues/1120)|Support aggregation window functions with order by|
+|[#1135](https://github.com/oap-project/gazelle_plugin/issues/1135)|Supports Spark 3.2.2 shims|
+|[#1114](https://github.com/oap-project/gazelle_plugin/issues/1114)|Remove tmp directory after application exits|
+|[#862](https://github.com/oap-project/gazelle_plugin/issues/862)|implement row_number window function|
+|[#1007](https://github.com/oap-project/gazelle_plugin/issues/1007)|Document how to test columnar UDF|
+|[#942](https://github.com/oap-project/gazelle_plugin/issues/942)|Use hash aggregate for string type input|
+
+#### Performance
+|||
+|:---|:---|
+|[#1144](https://github.com/oap-project/gazelle_plugin/issues/1144)|Optimize cast WSCG performance|
+
+#### Bugs Fixed
+|||
+|:---|:---|
+|[#1170](https://github.com/oap-project/gazelle_plugin/issues/1170)|Segfault on data source v2|
+|[#1164](https://github.com/oap-project/gazelle_plugin/issues/1164)|Limit the column num in WSCG|
+|[#1166](https://github.com/oap-project/gazelle_plugin/issues/1166)|Peers' values should be considered in window function for CURRENT ROW in range mode|
+|[#1149](https://github.com/oap-project/gazelle_plugin/issues/1149)|Vulnerability issues|
+|[#1112](https://github.com/oap-project/gazelle_plugin/issues/1112)|Validate Error: “Invalid: Length spanned by binary offsets (21) larger than values array (size 20)”|
+|[#1103](https://github.com/oap-project/gazelle_plugin/issues/1103)|wrong hashagg results|
+|[#929](https://github.com/oap-project/gazelle_plugin/issues/929)|Failed to add user extension while using gazelle|
+|[#1100](https://github.com/oap-project/gazelle_plugin/issues/1100)|Wildcard in json path is not supported|
+|[#1079](https://github.com/oap-project/gazelle_plugin/issues/1079)|Like function gets wrong result when default escape char is contained|
+|[#1046](https://github.com/oap-project/gazelle_plugin/issues/1046)|Fall back to use row-based operators, error is makeStructField is unable to parse from conv|
+|[#1053](https://github.com/oap-project/gazelle_plugin/issues/1053)|Exception when there is function expression in pos or len of substring|
+|[#1024](https://github.com/oap-project/gazelle_plugin/issues/1024)|ShortType is not supported in ColumnarLiteral|
+|[#1034](https://github.com/oap-project/gazelle_plugin/issues/1034)|Exception when there is unix_timestamp in CaseWhen |
+|[#1032](https://github.com/oap-project/gazelle_plugin/issues/1032)|Missing WSCG check for ExistenceJoin|
+|[#1027](https://github.com/oap-project/gazelle_plugin/issues/1027)|partition by literal in window function|
+|[#1019](https://github.com/oap-project/gazelle_plugin/issues/1019)|Support more date formats for from_unixtime & unix_timestamp|
+|[#999](https://github.com/oap-project/gazelle_plugin/issues/999)|The performance of using ColumnarSort operator to sort string type is significantly lower than that of native spark Sortexec|
+|[#984](https://github.com/oap-project/gazelle_plugin/issues/984)|concat_ws|
+|[#958](https://github.com/oap-project/gazelle_plugin/issues/958)|JVM/Native R2C and CoalesceBatcth process time inaccuracy|
+|[#979](https://github.com/oap-project/gazelle_plugin/issues/979)|Failed to find column while reading parquet with case insensitive|
+
+#### PRs
+|||
+|:---|:---|
+|[#1175](https://github.com/oap-project/gazelle_plugin/pull/1175)|[NSE-1171] Support merge parquet schema and read missing schema|
+|[#1178](https://github.com/oap-project/gazelle_plugin/pull/1178)|[NSE-1161][FOLLOWUP] Remove extra compression type check|
+|[#1162](https://github.com/oap-project/gazelle_plugin/pull/1162)|[NSE-1161] Support read-write parquet conversion to read-write arrow|
+|[#1014](https://github.com/oap-project/gazelle_plugin/pull/1014)|[NSE-956] allow to write parquet with compression|
+|[#1176](https://github.com/oap-project/gazelle_plugin/pull/1176)|bump h2/pgsql version|
+|[#1173](https://github.com/oap-project/gazelle_plugin/pull/1173)|[NSE-1171] Throw RuntimeException when reading duplicate fields in case-insensitive mode|
+|[#1172](https://github.com/oap-project/gazelle_plugin/pull/1172)|[NSE-1170] Setting correct row number in batch scan w/ partition columns|
+|[#1169](https://github.com/oap-project/gazelle_plugin/pull/1169)|[NSE-1161] Format sql config string key|
+|[#1167](https://github.com/oap-project/gazelle_plugin/pull/1167)|[NSE-1166] Cover peers' values in sum window function in range mode|
+|[#1165](https://github.com/oap-project/gazelle_plugin/pull/1165)|[NSE-1164] Limit the max column num in WSCG|
+|[#1160](https://github.com/oap-project/gazelle_plugin/pull/1160)|[NSE-1149] upgrade guava to 30.1.1|
+|[#1158](https://github.com/oap-project/gazelle_plugin/pull/1158)|[NSE-1149] upgrade guava to 30.1.1|
+|[#1152](https://github.com/oap-project/gazelle_plugin/pull/1152)|[NSE-1149] upgrade guava to 24.1.1|
+|[#1153](https://github.com/oap-project/gazelle_plugin/pull/1153)|[NSE-1149] upgrade pgsql to 42.3.3|
+|[#1150](https://github.com/oap-project/gazelle_plugin/pull/1150)|[NSE-1149] Remove log4j in shims module|
+|[#1146](https://github.com/oap-project/gazelle_plugin/pull/1146)|[NSE-1135] Introduce shim layer for supporting spark 3.2.2|
+|[#1145](https://github.com/oap-project/gazelle_plugin/pull/1145)|[NSE-1144] Optimize cast wscg performance|
+|[#1136](https://github.com/oap-project/gazelle_plugin/pull/1136)|Remove project from wscg when it's the child of window|
+|[#1122](https://github.com/oap-project/gazelle_plugin/pull/1122)|[NSE-1120] Support sum window function with order by statement|
+|[#1131](https://github.com/oap-project/gazelle_plugin/pull/1131)|[NSE-1114] Remove temp directory without FileUtils.forceDeleteOnExit|
+|[#1129](https://github.com/oap-project/gazelle_plugin/pull/1129)|[NSE-1127] Use larger buffer for hash agg|
+|[#1130](https://github.com/oap-project/gazelle_plugin/pull/1130)|[NSE-610] fix hashjoin build time metric|
+|[#1126](https://github.com/oap-project/gazelle_plugin/pull/1126)|[NSE-1125] Add status check for hashing GetOrInsert|
+|[#1056](https://github.com/oap-project/gazelle_plugin/pull/1056)|[NSE-955] Support window function lag|
+|[#1123](https://github.com/oap-project/gazelle_plugin/pull/1123)|[NSE-1118] fix codegen on TPCDS Q88|
+|[#1119](https://github.com/oap-project/gazelle_plugin/pull/1119)|[NSE-1118] adding more checks for SMJ codegen|
+|[#1058](https://github.com/oap-project/gazelle_plugin/pull/1058)|[NSE-981] Add a test suite for projection codegen|
+|[#1117](https://github.com/oap-project/gazelle_plugin/pull/1117)|[NSE-1116] Disable columnar url_decoder|
+|[#1113](https://github.com/oap-project/gazelle_plugin/pull/1113)|[NSE-1112] Fix Arrow array meta data validating issue when writing parquet files|
+|[#1039](https://github.com/oap-project/gazelle_plugin/pull/1039)|[NSE-1019] fix codegen for all expressions|
+|[#1115](https://github.com/oap-project/gazelle_plugin/pull/1115)|[NSE-1114] Remove tmp directory after application exits|
+|[#1111](https://github.com/oap-project/gazelle_plugin/pull/1111)|remove debug log|
+|[#1098](https://github.com/oap-project/gazelle_plugin/pull/1098)|[NSE-1108] allow to use different cases in column names|
+|[#1082](https://github.com/oap-project/gazelle_plugin/pull/1082)|[NSE-1071] Refactor vector resizing in hash aggregate|
+|[#1036](https://github.com/oap-project/gazelle_plugin/pull/1036)|[NSE-987] fix string date|
+|[#948](https://github.com/oap-project/gazelle_plugin/pull/948)|[NSE-947] Add a whole stage fallback strategy|
+|[#1099](https://github.com/oap-project/gazelle_plugin/pull/1099)|[NSE-1104] fix hashagg w/ empty string|
+|[#1102](https://github.com/oap-project/gazelle_plugin/pull/1102)|[NSE-400] Fix memory leak for native C2R and R2C.|
+|[#1101](https://github.com/oap-project/gazelle_plugin/pull/1101)|[NSE-1100] Fall back get_json_object when wildcard is contained in json path|
+|[#1090](https://github.com/oap-project/gazelle_plugin/pull/1090)|[NSE-1065] fix on count distinct w/ keys|
+|[#1097](https://github.com/oap-project/gazelle_plugin/pull/1097)|Ignore two unit tests|
+|[#1081](https://github.com/oap-project/gazelle_plugin/pull/1081)|[NSE-1075] Support dynamic merge file partition|
+|[#1080](https://github.com/oap-project/gazelle_plugin/pull/1080)|[NSE-1079] Set the default escape char for like function|
+|[#1078](https://github.com/oap-project/gazelle_plugin/pull/1078)|[NSE-610] support big keys in hashagg|
+|[#1072](https://github.com/oap-project/gazelle_plugin/pull/1072)|[NSE-1071] Add tiny optimizations for hash aggregation functions|
+|[#1069](https://github.com/oap-project/gazelle_plugin/pull/1069)|[NSE-800] Remove spark-arrow-datasource-parquet in assembly|
+|[#1066](https://github.com/oap-project/gazelle_plugin/pull/1066)|[NSE-1065] Adding hashagg w/ filter support|
+|[#1067](https://github.com/oap-project/gazelle_plugin/pull/1067)|[NSE-958] Fix JVM R2C operator metrics|
+|[#935](https://github.com/oap-project/gazelle_plugin/pull/935)|[NSE-931] Reuse partition vectors for arrow scan|
+|[#1064](https://github.com/oap-project/gazelle_plugin/pull/1064)|[NSE-955] Implement parse_url|
+|[#1063](https://github.com/oap-project/gazelle_plugin/pull/1063)|[NSE-955] Support more date format in unix timestamp|
+|[#930](https://github.com/oap-project/gazelle_plugin/pull/930)|[NSE-929] Support user defined spark extensions|
+|[#1038](https://github.com/oap-project/gazelle_plugin/pull/1038)|[NSE-928] allow to sort with big partitions |
+|[#1057](https://github.com/oap-project/gazelle_plugin/pull/1057)|[NSE-1019] fix codegen for unixtimestamp|
+|[#1055](https://github.com/oap-project/gazelle_plugin/pull/1055)|[NSE-955] Support md5/sha1/sha2 functions|
+|[#903](https://github.com/oap-project/gazelle_plugin/pull/903)|[NSE-610] hashagg opt#3|
+|[#1044](https://github.com/oap-project/gazelle_plugin/pull/1044)|[NE-400] fix memory leakage in native columnartorow|
+|[#1041](https://github.com/oap-project/gazelle_plugin/pull/1041)|[NSE-1023] [NSE-1046] Cover more supported expressions in getting AttributeReference|
+|[#1054](https://github.com/oap-project/gazelle_plugin/pull/1054)|[NSE-1053] Support function in substring's pos and len|
+|[#1049](https://github.com/oap-project/gazelle_plugin/pull/1049)|[NSE-955] Support bin function|
+|[#1048](https://github.com/oap-project/gazelle_plugin/pull/1048)|[NSE-955] Support power function|
+|[#1042](https://github.com/oap-project/gazelle_plugin/pull/1042)|[NSE-955] Support find_in_set function|
+|[#1025](https://github.com/oap-project/gazelle_plugin/pull/1025)|[NSE-1024] Support ShortType in ColumnarLiteral|
+|[#1037](https://github.com/oap-project/gazelle_plugin/pull/1037)|[NSE-955] Turn on the support for get_json_object|
+|[#1033](https://github.com/oap-project/gazelle_plugin/pull/1033)|[NSE-1032] Adding WSCG check for keys in Join|
+|[#1035](https://github.com/oap-project/gazelle_plugin/pull/1035)|[NSE-1034] Add timeZoneId in ColumnarUnixTimestamp|
+|[#1028](https://github.com/oap-project/gazelle_plugin/pull/1028)|[NSE-1027] Problem with Literal in window function|
+|[#1017](https://github.com/oap-project/gazelle_plugin/pull/1017)|[NSE-999] use TimSort for STRING/DECIMAL onekey based sorting|
+|[#1022](https://github.com/oap-project/gazelle_plugin/pull/1022)|[NSE-955] Support remainder function|
+|[#1021](https://github.com/oap-project/gazelle_plugin/pull/1021)|[NSE-1019] [NSE-1020] Support more date formats and be aware of local time zone in handling unix timestamp|
+|[#1009](https://github.com/oap-project/gazelle_plugin/pull/1009)|[NSE-999] s/string/string_view in sort|
+|[#990](https://github.com/oap-project/gazelle_plugin/pull/990)|[NSE-943] Improve rowtocolumn operator|
+|[#1000](https://github.com/oap-project/gazelle_plugin/pull/1000)|[NSE-862] improve row_number()|
+|[#1013](https://github.com/oap-project/gazelle_plugin/pull/1013)|[NSE-955] Add Murmur3Hash expression support|
+|[#995](https://github.com/oap-project/gazelle_plugin/pull/995)|[NSE-981] Add more codegen checking in BHJ & SHJ|
+|[#1006](https://github.com/oap-project/gazelle_plugin/pull/1006)|[NSE-1007] Add a test guide for columnar UDF|
+|[#969](https://github.com/oap-project/gazelle_plugin/pull/969)|[NSE-943] Optimize data conversion for String/Binary type in Row2Columnar|
+|[#973](https://github.com/oap-project/gazelle_plugin/pull/973)|[NSE-928] Add ARROW_CHECK for batch_size check|
+|[#992](https://github.com/oap-project/gazelle_plugin/pull/992)|[NSE-984] fix concat_ws|
+|[#991](https://github.com/oap-project/gazelle_plugin/pull/991)|[NSE-981] check all expressions in HashAgg|
+|[#993](https://github.com/oap-project/gazelle_plugin/pull/993)|[NSE-979] fix data source|
+|[#980](https://github.com/oap-project/gazelle_plugin/pull/980)|[NSE-979] Support reading parquet with case sensitive|
+|[#985](https://github.com/oap-project/gazelle_plugin/pull/985)|[NSE-981] Implement supportColumnarCodegen to reflect the actual support state|
+|[#964](https://github.com/oap-project/gazelle_plugin/pull/964)|[NSE-955] implement lpad/rpad|
+|[#963](https://github.com/oap-project/gazelle_plugin/pull/963)|[NSE-955] implement concat_ws|
+|[#971](https://github.com/oap-project/gazelle_plugin/pull/971)|[NSE-955] Support hex expression|
+|[#968](https://github.com/oap-project/gazelle_plugin/pull/968)|[NSE-955] implement lower function |
+|[#965](https://github.com/oap-project/gazelle_plugin/pull/965)|[NSE-955] Support expression conv|
+|[#949](https://github.com/oap-project/gazelle_plugin/pull/949)|[NSE-862] implement row_number function|
+|[#960](https://github.com/oap-project/gazelle_plugin/pull/960)|[NSE-955] doc: Add columnar expression development guide|
+|[#941](https://github.com/oap-project/gazelle_plugin/pull/941)|[NSE-942] Force to use hash aggregate for string type input|
+|[#959](https://github.com/oap-project/gazelle_plugin/pull/959)|[NSE-958] Fix SQLMetrics inaccuracy in JVM/Native R2C and CoalesceBatcth|
+
 
 ## Release 1.4.0
 

README.md

+1 -1
@@ -38,7 +38,7 @@ You can find the all the OAP MLlib documents on the [project web page](https://o
 
 ## Java/Scala Users Preferred
 
-Use a pre-built OAP MLlib JAR to get started, you can download OAP MLlib JAR from [Release Page](https://github.com/oap-project/oap-mllib/releases/download/v1.4.0/oap-mllib-1.4.0.jar).
+Use a pre-built OAP MLlib JAR to get started, you can download OAP MLlib JAR from [Release Page](https://github.com/oap-project/oap-mllib/releases/download/v1.5.0/oap-mllib-1.5.0.jar).
 
 Then you can refer to the following [Running](#running) section to try out.
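
Note on the README change: the only difference between the old and new download links is the version string. A minimal sketch of deriving the link from a version number follows. The helper name `oap_mllib_jar_url` is hypothetical, and the URL pattern is inferred from the v1.4.0 and v1.5.0 links in this diff only, so other releases may not follow it.

```shell
#!/bin/sh
# Hypothetical helper: build the release-asset URL for an OAP MLlib version.
# Assumption: the pattern matches the two links shown in the README diff.
oap_mllib_jar_url() {
    version="$1"
    printf 'https://github.com/oap-project/oap-mllib/releases/download/v%s/oap-mllib-%s.jar\n' \
        "$version" "$version"
}

# Print the URL for the release this commit documents.
# To fetch the JAR, one could run: curl -fLO "$(oap_mllib_jar_url 1.5.0)"
oap_mllib_jar_url 1.5.0
```

The script only prints the URL; the download itself is left as a commented command so that nothing touches the network unless the reader opts in.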

docs/OAP-Developer-Guide.md

+4 -4
@@ -4,8 +4,8 @@ This document contains the instructions & scripts on installing necessary depend
 You can get more detailed information from OAP each module below.
 
 
-* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.4.0)
-* [Gazelle Plugin](https://github.com/oap-project/gazelle_plugin/tree/v1.4.0)
+* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.5.0)
+* [Gazelle Plugin](https://github.com/oap-project/gazelle_plugin/tree/v1.5.0)
 
 ## Building OAP
 
@@ -18,14 +18,14 @@ We provide scripts to help automatically install dependencies required, please c
 # cd oap-tools
 # sh dev/install-compile-time-dependencies.sh
 ```
-*Note*: oap-tools tag version `v1.4.0` corresponds to all OAP modules' tag version `v1.4.0`.
+*Note*: oap-tools tag version `v1.5.0` corresponds to all OAP modules' tag version `v1.5.0`.
 
 Then the dependencies below will be installed:
 
 * [Cmake](https://cmake.org/install/)
 * [GCC > 9](https://gcc.gnu.org/wiki/InstallingGCC)
 * [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
-* [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.4.0)
+* [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.5.0)
 * [LLVM](https://llvm.org/)
 
 
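
Note on the Developer Guide change: the guide pins the module repositories and the Arrow fork to tags that share one release number. The sketch below prints matching `git clone` commands for a release. The function names are hypothetical, the repository URLs are derived from the links in this diff, and the `4.0.0` Arrow base version is copied from the guide and assumed fixed for this release only.

```shell
#!/bin/sh
# Hypothetical helper: tag shared by the OAP modules, e.g. v1.5.0.
oap_module_tag() {
    printf 'v%s\n' "$1"
}

# Hypothetical helper: tag of the OAP Arrow fork, e.g. v4.0.0-oap-1.5.0.
# Assumption: the 4.0.0 base is taken from the link in the guide.
oap_arrow_tag() {
    printf 'v4.0.0-oap-%s\n' "$1"
}

release=1.5.0

# Print (do not run) the clone commands for the module repositories.
for repo in oap-mllib gazelle_plugin; do
    echo "git clone -b $(oap_module_tag "$release") https://github.com/oap-project/$repo.git"
done

# Print the clone command for the Arrow fork at the matching tag.
echo "git clone -b $(oap_arrow_tag "$release") https://github.com/oap-project/arrow.git"
```

Keeping the release number in one variable mirrors the guide's note that every module tag tracks the same version.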

docs/OAP-Installation-Guide.md

+1 -1
@@ -28,7 +28,7 @@ To test your installation, run the command `conda list` in your terminal window
 Create a Conda environment and install OAP Conda package.
 
 ```bash
-$ conda create -n oapenv -c conda-forge -c intel -y oap=1.4.0.spark32
+$ conda create -n oapenv -c conda-forge -c intel -y oap=1.5.0.spark32
 ```
 
 Once finished steps above, you have completed OAP dependencies installation and OAP building, and will find built OAP jars under `$HOME/miniconda2/envs/oapenv/oap_jars`
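
Note on the Installation Guide change: the Conda package spec combines the OAP version with a Spark build suffix, and the guide names the directory where the jars end up. A small sketch of both follows. The helper names are hypothetical, and the `miniconda2` prefix is taken from the guide and may differ on other installations.

```shell
#!/bin/sh
# Hypothetical helper: compose the Conda package spec, e.g. oap=1.5.0.spark32.
oap_conda_spec() {
    printf 'oap=%s.%s\n' "$1" "$2"
}

# Hypothetical helper: directory where the guide says the OAP jars are placed.
# Assumption: Conda lives under $HOME/miniconda2, as in the guide.
oap_jars_dir() {
    printf '%s/miniconda2/envs/%s/oap_jars\n' "$HOME" "$1"
}

# Print the install command from the guide and a follow-up check of the jar directory.
echo "conda create -n oapenv -c conda-forge -c intel -y $(oap_conda_spec 1.5.0 spark32)"
echo "ls $(oap_jars_dir oapenv)"
```

The commands are printed rather than executed, so the script can be reviewed before anything is installed.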
