Generated on 2021-08-12
#1584 | [FEA] Support rank as window function |
#1859 | [FEA] Optimize row_number/rank for memory usage |
#2976 | [FEA] support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec |
#2398 | [FEA] GpuIf and GpuCoalesce supports ArrayType |
#2445 | [FEA] Support literal arrays in case/when statements |
#2757 | [FEA] Profiling tool display input data types |
#2860 | [FEA] Minimal support for LEGACY timeParserPolicy |
#2693 | [FEA] Profiling Tool: Print GDS + UCX related parameters |
#2334 | [FEA] Record GPU time and Fetch time separately, instead of recording Total Time |
#2685 | [FEA] Profiling compare mode for table SQL Duration and Executor CPU Time Percent |
#2742 | [FEA] include App Name from profiling tool output |
#2712 | [FEA] Display job and stage info in the dot graph for profiling tool |
#2562 | [FEA] Implement KnownNotNull on the GPU |
#2557 | [FEA] support sort_array on GPU |
#2307 | [FEA] Enable Parquet writing for arrays |
#1856 | [FEA] Create a batch chunking iterator and integrate it with GpuWindowExec |
#866 | [FEA] combine window operations into single call |
#2800 | [FEA] Support ORC small files coalescing reading |
#737 | [FEA] handle peer timeouts in shuffle |
#1590 | Rapids Shuffle - UcpListener |
#2275 | [FEA] UCP error callback deal with cleanup |
#2799 | [FEA] Support ORC multi-file cloud reading |
#3135 | [BUG] Regression seen in concatenate in NDS with RAPIDS Shuffle Manager enabled |
#3017 | [BUG] orc_write_test failed in databricks runtime |
#3060 | [BUG] ORC read can corrupt data when specified schema does not match file schema ordering |
#3065 | [BUG] window exec tries to do too much on the GPU |
#3066 | [BUG] Profiling tool generate dot file fails to convert |
#3038 | [BUG] leak in getDeviceMemoryBuffer for the unspill case |
#3007 | [BUG] data mess up reading from ORC |
#3029 | [BUG] udf_test failed in ucx standalone env |
#2723 | [BUG] test failures in CI build (observed in UCX job) after starting to use 21.08 |
#3016 | [BUG] databricks script failed to return correct exit code |
#3002 | [BUG] writing parquet with partitionBy() loses sort order |
#2959 | [BUG] Resolve common code source incompatibility with supported Spark versions |
#2589 | [BUG] RapidsShuffleHeartbeatManager needs to remove executors that are stale |
#2964 | [BUG] IGNORE ORDER, WITH DECIMALS: [Window] [MIXED WINDOW SPECS] FAILED in spark 3.0.3+ |
#2942 | [BUG] Cache of Array using ParquetCachedBatchSerializer failed with "DATA ACCESS MUST BE ON A HOST VECTOR" |
#2965 | [BUG] test_round_robin_sort_fallback failed with ValueError: 'a_1' is not in list |
#2891 | [BUG] Discrepancy in getting count before and after caching |
#2972 | [BUG] When using timeout option (-t) of qualification tool, it does not print anything in output after timeout. |
#2958 | [BUG] When AQE=on, SMJ with a Map in SELECTed list fails with "key not found: numPartitions" |
#2929 | [BUG] No validation of format strings when formatting dates in legacy timeParserPolicy mode |
#2900 | [BUG] CAST string to float/double produces incorrect results in some cases |
#2957 | [BUG] Builds failing due to breaking changes in SPARK-36034 |
#2901 | [BUG] GpuCompressedColumnVector cannot be cast to GpuColumnVector with AQE |
#2899 | [BUG] CAST string to integer produces incorrect results in some cases |
#2937 | [BUG] Fix more edge cases when parsing dates in legacy timeParserPolicy |
#2939 | [BUG] Window integration tests failing with Lead expected at least 3 but found 0 |
#2912 | [BUG] Profiling compare mode fails when comparing spark 2 eventlog to spark 3 event log |
#2892 | [BUG] UCX error Message truncated observed with UCX 1.11 RC in Q77 NDS |
#2807 | [BUG] Use UCP_AM_FLAG_WHOLE_MSG and UCP_AM_FLAG_PERSISTENT_DATA for receive handlers |
#2930 | [BUG] Profiling tool does not show "Potential Problems" for dataset API in section "SQL Duration and Executor CPU Time Percent" |
#2902 | [BUG] CAST string to bool produces incorrect results in some cases |
#2850 | [BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime |
#2856 | [BUG] cache of struct does not work on databricks 8.2ML |
#2790 | [BUG] In Comparison mode health check does not show the application id |
#2713 | [BUG] profiling tool does not error or warn if incompatible options are given |
#2477 | [BUG] test_single_sort_in_part is failed in nightly UCX and AQE (no UCX) integration |
#2868 | [BUG] to_date produces wrong value on GPU for some corner cases |
#2907 | [BUG] incorrect expression to detect previously set --master |
#2893 | [BUG] TransferRequest request transactions are getting leaked |
#120 | [BUG] GPU InitCap supports too much white space. |
#2786 | [BUG] [initCap function] There is an issue converting the uppercase character to lowercase on GPU. |
#2754 | [BUG] cudf_udf tests failed w/ 21.08 |
#2820 | [BUG] Metrics are inconsistent for GpuRowToColumnarToExec |
#2710 | [BUG] dot file generation can go over the limits of dot |
#2772 | [BUG] new integration test failures w/ maxFailures=1 |
#2739 | [BUG] CBO causes less efficient plan for NDS q84 |
#2717 | [BUG] CBO forces joins back onto CPU in some cases |
#2718 | [BUG] CBO falls back to CPU to write to Parquet in some cases |
#2692 | [BUG] Profiling tool: Add error handling for comparison functions |
#2711 | [BUG] reused stages should not appear multiple times in dot |
#2746 | [BUG] test_single_nested_sort_in_part integration test failure 21.08 |
#2690 | [BUG] Profiling tool doesn't properly read rolled log files |
#2546 | [BUG] Build Failure when building from source |
#2750 | [BUG] nightly test failed with lists: testStringReplaceWithBackrefs |
#2644 | [BUG] test event logs should be compressed |
#2725 | [BUG] Heartbeat from unknown executor when running with UCX shuffle in local mode |
#2715 | [BUG] Part of the plan is not columnar class com.databricks.sql.execution.window.RunningWindowFunc |
#2521 | [BUG] cudf_udf failed in all spark release intermittently |
#1712 | [BUG] Scala UDF compiler can decompile UDFs with RAPIDS implementation |
#3214 | Update download and databricks doc for 21.06.2 [skip ci] |
#3210 | Update 21.08.0 changelog to latest [skip ci] |
#3197 | Databricks parquetFilters api change in db 8.2 runtime |
#3168 | Update 21.08 changelog to latest [skip ci] |
#3146 | update cudf Java binding version to 21.08.2 |
#3080 | Update docs for 21.08 release |
#3136 | Update tool docs to explain default filesystem [skip ci] |
#3128 | Fix merge conflict 3126 from branch-21.06 [skip ci] |
#3124 | Fix merge conflict 3122 from branch-21.06 [skip ci] |
#3100 | Update databricks 3.0.1 shim to new ParquetFilter api |
#3083 | Initial CHANGELOG.md update for 21.08 |
#3079 | Remove the struct support in ORC reader |
#3062 | Fix ORC read corruption when specified schema does not match file order |
#3064 | Tweak scaladoc to callout the GDS+unspill case in copyBuffer |
#3049 | Handle mmap exception more gracefully in RapidsShuffleServer |
#3067 | Update to UCX 1.11.0 |
#3024 | Check validity of any() or all() results that could be null |
#3069 | Fall back to the CPU on window partition by struct or array |
#3068 | Profiling tool generate dot file fails on unescaped html characters |
#3048 | Apply unique committer job ID fix from SPARK-33230 |
#3050 | Updates for google analytics [skip ci] |
#3015 | Fix ORC read error when read schema reorders file schema columns |
#3053 | cherry-pick #3028 [skip ci] |
#2887 | ORC reader supports struct |
#3032 | Add disorder read schema test case for Parquet |
#3022 | Add in docs to describe window performance |
#3018 | [BUG] fix db script hides error issue |
#2953 | Add in support for rank and dense_rank |
#3009 | Propagate child output ordering in GpuCoalesceBatches |
#2989 | Re-enable Array support in Cartesian Joins, Broadcast Nested Loop Joins |
#2999 | Remove unused configuration setting spark.rapids.sql.castStringToInteger.enabled |
#2967 | Resolve hidden source incompatibility between Spark30x and Spark31x Shims |
#2982 | Add FAQ entry for timezone error |
#2839 | GpuIf and GpuCoalesce support array and struct types |
#2987 | Update documentation for unsupported edge cases when casting from string to timestamp |
#2977 | Expire executors from the RAPIDS shuffle heartbeat manager on timeout |
#2985 | Move tools README to docs/additional-functionality/qualification-profiling-tools.md with some modification |
#2992 | Remove commented/redundant window-function tests. |
#2994 | Tweak RAPIDS Shuffle Manager configs for 21.08 |
#2984 | Avoid comparing window range canonicalized plans on Spark 3.0.x |
#2970 | Put the GPU data back on host before processing cache on CPU |
#2986 | Avoid struct aliasing in test_round_robin_sort_fallback |
#2935 | Read the complete batch before returning when selectedAttributes is empty |
#2826 | CaseWhen supports scalar of list and struct |
#2978 | enable auto-merge from branch 21.08 to 21.10 [skip ci] |
#2946 | ORC reader supports list |
#2947 | Qualification tool: Filter based on timestamp in event logs |
#2973 | Assert that CPU and GPU row fields match when present |
#2974 | Qualification tool: fix performance regression |
#2948 | Remove unnecessary copies of ParquetCachedBatchSerializer |
#2968 | Fix AQE CustomShuffleReaderExec not seeing ShuffleQueryStageExec |
#2969 | Make the dir for spark301 shuffle shim match package name |
#2933 | Improve CAST string to float implementation to handle more edge cases |
#2963 | Add override getParquetFilters for shim 304 |
#2956 | Profile Tool: make order consistent between runs |
#2924 | Fix bug when collecting directly from a GPU shuffle query stage with AQE on |
#2950 | Fix shutdown bugs in the RAPIDS Shuffle Manager |
#2922 | Improve UCX assertion to show the failed assertion |
#2961 | Fix ParquetFilters issue |
#2951 | Qualification tool: Allow app start and app name filtering and test with filesystem filters |
#2941 | Make test event log compression codec configurable |
#2919 | Fix bugs in CAST string to integer |
#2944 | Fix childExprs list for GpuWindowExpression, for Spark 3.1.x. |
#2917 | Refine GpuHashAggregateExec.setupReference |
#2909 | Support orc coalescing reading |
#2938 | Qualification tool: Add negation filter |
#2940 | qualification tool: add filtering by app start time |
#2928 | Qualification tool support recognizing decimal operations |
#2934 | Qualification tool: Add filter based on appName |
#2904 | Qualification and Profiling tool handle Read formats and datatypes |
#2927 | Restore aggregation sorted data hint |
#2932 | Profiling tool: Fix comparing spark2 and spark3 event logs |
#2926 | GPU Active Messages for all buffer types |
#2888 | Type check with the information from RapidsMeta |
#2903 | Fix cast string to bool |
#2895 | Add in running window optimization using scan |
#2859 | Add spillable batch caching and sort fallback to hash aggregation |
#2898 | Add fuzz tests for cast from string to other types |
#2881 | fix orc readers leak issue for ORC PERFILE type |
#2842 | Support STRUCT/STRING for LEAD()/LAG() |
#2880 | Added ParquetCachedBatchSerializer support for Databricks |
#2911 | Add in ID as sort for Job + Stage level aggregated task metrics |
#2914 | Profiling tool: add app index to tables that don't have it |
#2906 | Fix compiler warning |
#2890 | Fix cast to date bug |
#2908 | Fixes bad string contains in run_pyspark_from_build |
#2886 | Use UCP Listener for UCX connections and enable peer error handling |
#2875 | Add support for timeParserPolicy=LEGACY |
#2894 | Fixes a JVM leak for UCX TransactionRequests |
#2854 | Qualification Tool to output only the 'k' highest-ranked or 'k' lowest-ranked applications |
#2873 | Fix infinite loop in MultiFileCloudPartitionReaderBase |
#2838 | Replace toTitle with capitalize for GpuInitCap |
#2870 | Avoid readers acquiring GPU on next batch query if not first batch |
#2882 | Refactor window operations to do them in the exec |
#2874 | Update audit script to clone branch-3.2 instead of master |
#2843 | Qualification/Profiling tool add tests for Spark2 event logs |
#2828 | add cloud reading for orc |
#2721 | Check-list for corner cases in testing. |
#2675 | Support for Decimals with negative scale for Parquet Cached Batch Serializer |
#2849 | Update release notes to include qualification and profiling tool |
#2852 | Fix hash aggregate tests leaking configs into other tests |
#2845 | Split window exec into multiple stages if needed |
#2853 | Tag last batch when coalescing |
#2851 | Fix build failure - update ucx profiling test to fix parameter type to getEventLogInfo |
#2785 | Profiling tool: Print UCX and GDS parameters |
#2840 | Fix Gpu -> GPU |
#2844 | Document Qualification tool Spark requirements |
#2787 | Add metrics definition link to tool README.md [skip ci] |
#2841 | Add a threadpool to Qualification tool to process logs in parallel |
#2833 | Stop running so many versions of Spark unit tests for premerge |
#2837 | Append new authorized user to blossom-ci whitelist [skip ci] |
#2822 | Rewrite Qualification tool for better performance |
#2823 | Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec |
#2829 | Fix filtering directories on compression extension match |
#2720 | Add metrics documentation to the tuning guide |
#2816 | Improve some existing collectTime handling |
#2821 | Truncate long plan labels and refer to "print-plans" |
#2827 | Update cmake to build udf native [skip ci] |
#2793 | Report equivalent stages/sql ids as a part of compare |
#2810 | Use SecureRandom for UCPListener TCP port choice |
#2798 | Mirror apache repos to urm |
#2788 | Update the type signatures for some expressions |
#2792 | Automatically set spark.task.maxFailures and local[*, maxFailures] |
#2805 | Revert "Use UCX Active Messages for all shuffle transfers (#2735)" |
#2796 | show disk bytes spilled when GDS spill is enabled |
#2801 | Update pre-merge to use reserved_pool [skip ci] |
#2795 | Improve CBO debug logging |
#2794 | Prevent integer overflow when estimating data sizes in cost-based optimizer |
#2784 | Make spark303 shim version w/o snapshot and add shim layer for spark304 |
#2744 | Cost-based optimizer: Implement simple cost model that demonstrates benefits with NDS queries |
#2762 | Profiling tool: Update comparison mode output format and add error handling |
#2761 | Update dot graph to include stages and remove some duplication |
#2760 | Add in application timeline to profiling tool |
#2735 | Use UCX Active Messages for all shuffle transfers |
#2732 | qualification and profiling tool support rolled and compressed event logs for CSPs and Apache Spark |
#2768 | Make window function test results deterministic. |
#2769 | Add developer documentation for Adaptive Query Execution |
#2532 | date_format should not suggest enabling incompatibleDateFormats for formats we cannot support |
#2743 | Disable dynamicAllocation and set maxFailures to 1 in integration tests |
#2749 | Revert "Add in support for lists in some joins (#2702)" |
#2181 | abstract the parquet coalescing reading |
#2753 | Merge branch-21.06 to branch-21.08 [skip ci] |
#2751 | remove invalid blossom-ci users [skip ci] |
#2707 | Support KnownNotNull running on GPU |
#2747 | Fix num_slices for test_single_nested_sort_in_part |
#2729 | fix 301db-shim typecheck typo |
#2726 | Fix local mode starting RAPIDS shuffle heartbeats |
#2722 | Support aggregation on NullType in RunningWindowExec |
#2719 | Avoid executing child plan twice in CoalesceExec |
#2586 | Update metrics use in GpuUnionExec and GpuCoalesceExec |
#2716 | Add file size check to pre-merge CI |
#2554 | Upload build failure log to Github for external contributors access |
#2596 | Initial running window memory optimization |
#2702 | Add in support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec |
#2699 | Add a pre-commit hook to reject large files |
#2700 | Set numSlices and use parallelize to build dataframe for partition-se… |
#2548 | support collect_set in rolling window |
#2661 | Make tools inherit common dependency versions from parent pom |
#2668 | Remove CUDA 10.x from getting started guide [skip ci] |
#2676 | Profiling tool: Print Job Information in compare mode |
#2679 | Merge branch-21.06 to branch-21.08 [skip ci] |
#2677 | Add pre-merge independent stage timeout [skip ci] |
#2616 | support GpuSortArray |
#2582 | support parquet write arrays |
#2609 | Fix automerge failure from branch-21.06 to branch-21.08 |
#2570 | Added nested structs to UnionExec |
#2581 | Fix merge conflict 2580 [skip ci] |
#2458 | Split batch by key for window operations |
#2565 | Merge branch-21.06 into branch-21.08 |
#2563 | Document: git commit twice when copyright year updated by hook |
#2561 | Fixing the merge of 21.06 to 21.08 for comment changes in Profiling tool |
#2558 | Fix cdh shim version in 21.08 [skip ci] |
#2543 | Init branch-21.08 |
#3191 | [BUG] Databricks parquetFilters build failure in db 8.2 runtime |
#3209 | Update 21.06.2 changelog [skip ci] |
#3208 | Update rapids plugin version to 21.06.2 [skip ci] |
#3207 | Disable auto-merge from 21.06 to 21.08 [skip ci] |
#3205 | Branch 21.06 databricks update [skip ci] |
#3198 | Databricks parquetFilters api change in db 8.2 runtime |
#3098 | [BUG] Databricks parquetFilters build failure |
#3127 | Update CHANGELOG for the release v21.06.1 [skip ci] |
#3123 | Update rapids plugin version to 21.06.1 [skip ci] |
#3118 | Fix databricks 3.0.1 for ParquetFilters api change |
#3119 | Branch 21.06 databricks update [skip ci] |
#2483 | [FEA] Profiling and qualification tool |
#951 | [FEA] Create Cloudera shim layer |
#2481 | [FEA] Support Spark 3.1.2 |
#2530 | [FEA] Add support for Struct columns in CoalesceExec |
#2512 | [FEA] Report gpuOpTime not totalTime for expand, generate, and range execs |
#63 | [FEA] support ConcatWs sql function |
#2501 | [FEA] Add support for scalar structs to named_struct |
#2286 | [FEA] update UCX documentation for branch 21.06 |
#2436 | [FEA] Support nested types in CreateNamedStruct |
#2461 | [FEA] Report gpuOpTime instead of totalTime for project, filter, window, limit |
#2465 | [FEA] GpuFilterExec should report gpuOpTime not totalTime |
#2013 | [FEA] Support concatenating ArrayType columns |
#2425 | [FEA] Support for casting array of floats to array of doubles |
#2012 | [FEA] Support Window functions(lead & lag) for ArrayType |
#2011 | [FEA] Support creation of 2D array type |
#1582 | [FEA] Allow StructType as input and output type to InMemoryTableScan and InMemoryRelation |
#216 | [FEA] Range window-functions must support non-timestamp order-by expressions |
#2390 | [FEA] CI/CD for databricks 8.2 runtime |
#2273 | [FEA] Enable struct type columns for GpuHashAggregateExec |
#20 | [FEA] Support out of core joins |
#2160 | [FEA] Support Databricks 8.2 ML Runtime |
#2330 | [FEA] Enable hash partitioning with arrays |
#1103 | [FEA] Support date_format on GPU |
#1125 | [FEA] explode() can take expressions that generate arrays |
#1605 | [FEA] Support sorting on struct type keys |
#1445 | [FEA] GDS Integration |
#1588 | Rapids shuffle - UCX active messages |
#2367 | [FEA] CBO: Implement costs for memory access and launching kernels |
#2431 | [FEA] CBO should show benefits with q24b with decimals enabled |
#2652 | [BUG] No Job Found. Exiting. |
#2659 | [FEA] Group profiling tool "Potential Problems" |
#2680 | [BUG] cast can throw NPE |
#2628 | [BUG] failed to build plugin in databricks runtime 8.2 |
#2605 | [BUG] test_pandas_map_udf_nested_type failed in Yarn integration |
#2622 | [BUG] compressed event logs are not processed |
#2478 | [BUG] When tasks complete, cancel pending UCX requests |
#1953 | [BUG] Could not allocate native memory when running DLRM ETL with --output_ordering input on A100 |
#2495 | [BUG] scaladoc warning GpuParquetScan.scala:727 "discarding unmoored doc comment" |
#2368 | [BUG] Mismatched number of columns while performing GpuSort |
#2407 | [BUG] test_round_robin_sort_fallback failed |
#2497 | [BUG] GpuExec failed to find metric totalTime in databricks env |
#2473 | [BUG] enable test_window_aggs_for_rows_lead_lag_on_arrays and make the order unambiguous |
#2489 | [BUG] Queries with window expressions fail when cost-based optimizer is enabled |
#2457 | [BUG] test_window_aggs_for_rows_lead_lag_on_arrays failed |
#2371 | [BUG] Performance regression for crossjoin on 0.6 comparing to 0.5 |
#2372 | [BUG] FAILED ../../src/main/python/udf_cudf_test.py::test_window |
#2404 | [BUG] test_hash_pivot_groupby_nan_fallback failed on Dataproc |
#2474 | [BUG] when ucp listener enabled we bind 16 times always |
#2427 | [BUG] test_union_struct_missing_children[(Struct(not_null) failed in databricks310 and spark 311 |
#2455 | [BUG] CaseWhen crashes on literal arrays |
#2421 | [BUG] NPE when running mapInPandas Pandas UDF in 0.5GA |
#2428 | [BUG] Intermittent ValueError in test_struct_groupby_count |
#1628 | [BUG] TPC-DS-like query 24a and 24b at scale=3TB fails with OOM |
#2276 | [BUG] SPARK-33386 - ansi-mode changed ElementAt/Elt/GetArray behavior in Spark 3.1.1 - fallback to cpu |
#2309 | [BUG] legacy cast of a struct column to string with a single nested null column yields null instead of '[]' |
#2315 | [BUG] legacy struct cast to string crashes on a two field struct |
#2406 | [BUG] test_struct_groupby_count failed |
#2378 | [BUG] java.lang.ClassCastException: GpuCompressedColumnVector cannot be cast to GpuColumnVector |
#2355 | [BUG] convertDecimal64ToDecimal32Wrapper leaks ColumnView instances |
#2346 | [BUG] segfault when using UcpListener in TCP-only setup |
#2364 | [BUG] qa_nightly_select_test.py::test_select integration test fails |
#2302 | [BUG] Int96 are not being written as expected |
#2359 | [BUG] Alias is different in spark 3.1.0 but our canonicalization code doesn't handle |
#2277 | [BUG] spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED or LEGACY still fails to read LEGACY date from parquet |
#2320 | [BUG] TypeChecks diagnostics outputs column ids instead of unsupported types |
#2238 | [BUG] Unnecessary to cache the batches that will be sent to Python in FlatMapGroupInPandas . |
#1811 | [BUG] window_function_test.py::test_multi_types_window_aggs_for_rows_lead_lag[partBy failed |
#2817 | Update changelog for v21.06.0 release [skip ci] |
#2806 | Noted testing for A10, noted that min driver ver is HW specific |
#2797 | Update documentation for InitCap incompatibility |
#2774 | Update changelog for 21.06 release [skip ci] |
#2770 | [Doc] add more for Alluxio page [skip ci] |
#2745 | Add link to Mellanox RoCE documentation and mention --without-ucx installation option |
#2740 | Update cudf Java bindings to 21.06.1 |
#2664 | Update changelog for 21.06 release [skip ci] |
#2697 | fix GDS spill bug when copying from the batch write buffer |
#2691 | Update properties to check if table there |
#2687 | Remove CUDA 10.x from getting started guide (#2668) |
#2686 | Profiling tool: Print Job Information in compare mode |
#2657 | Print CPU and GPU output when _assert_equal fails to help debug given… |
#2681 | Avoid NPE when casting empty strings to ints |
#2669 | Fix multiple problems reported and improve error handling |
#2666 | [DOC] Update custom image guide in GCP dataproc to reduce cluster startup time |
#2665 | Update docs to move RAPIDS Shuffle out of beta [skip ci] |
#2671 | Clean profiling&qualification tool README |
#2673 | Profiling tool: Enable tests and update compressed event log |
#2672 | Update cudfjni dependency version to 21.06.0 |
#2663 | Qualification tool - add in estimating the App end time when the event log missing application end event |
#2600 | Accelerate RunningWindow queries on GPU |
#2651 | Profiling tool - fix reporting contains dataset when sql time 0 |
#2623 | Fixed minor mistakes in documentation |
#2631 | Update docs for Databricks 8.2 ML |
#2638 | Add an init script for databricks 7.3ML with CUDA11.0 installed |
#2643 | Profiling tool: Health check follow on |
#2640 | Add physical plan to the dot file as the graph label |
#2637 | Fix databricks for 3.1.1 |
#2577 | Update download.md and FAQ.md for 21.06.0 |
#2636 | Profiling tool - Fix file writer for generating dot graphs, supporting writing sql plans to a file, change output to subdirectory |
#2625 | Exclude failed jobs/queries from Qualification tool output |
#2626 | Enable processing of compressed Spark event logs |
#2632 | Profiling tool: Add support for health check. |
#2627 | Ignore order for map udf test |
#2620 | Change aggregation of executor CPU and run time for Qualification tool to speed up query |
#2618 | Correct an issue for README for tools and also correct s3 solution in Args.scala |
#2612 | Profiling tool, add in job to stage, duration, executor cpu time, fix writing to HDFS |
#2614 | change rapids-4-spark-tools directory to tools in deploy script [skip ci] |
#2611 | Revert "disable cudf_udf tests for #2521" |
#2604 | Profile/qualification tool error handling improvements and support spark < 3.1.1 |
#2598 | Rename rapids-4-spark-tools directory to tools |
#2576 | Add filter support for qualification and profiling tool. |
#2603 | Add the doc for -g option of the profiling tool. |
#2594 | Change the README of the qualification and profiling tool to match the current version. |
#2591 | Implement test for qualification tool sql metric aggregates |
#2590 | Profiling tool support for collection and analysis |
#2587 | Handle UCX connection timeouts from heartbeats more gracefully |
#2588 | Fix package name |
#2574 | Add Qualification tool support |
#2571 | Change test_single_sort_in_part to print source data frame on failure |
#2569 | Remove -SNAPSHOT in documentation in preparation for release |
#2429 | Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation |
#2553 | Cancel requests that are queued for a client/handler on error |
#2566 | expose unspill config option |
#2460 | align GDS reads/writes to 4 KiB |
#2515 | Remove fetchTime and standardize on collectTime |
#2523 | Not compile RapidsUDF when udf compiler is enabled |
#2538 | Fixed code indentation in ParquetCachedBatchSerializer |
#2559 | Release profiling tool jar to maven central |
#2423 | Add cloudera shim layer |
#2520 | Add event logs for integration tests |
#2525 | support interval.microseconds for range window TimeStampType |
#2536 | Don't do an extra shuffle in some TopN cases |
#2508 | Refactor the code for conditional expressions |
#2542 | enable auto-merge from 21.06 to 21.08 [skip ci] |
#2540 | Update spark 312 shim, and Add spark 313-SNAPSHOT shim |
#2539 | disable cudf_udf tests for #2521 |
#2514 | Add Struct support for ParquetWriter |
#2534 | Remove scaladoc on an internal method to avoid warning during build |
#2537 | Add CentOS documentation and improve dockerfiles for UCX |
#2531 | Add nested types and decimals to CoalesceExec |
#2513 | Report opTime not totalTime for expand, range, and generate execs |
#2533 | Fix concat_ws test specifying only a separator for databricks |
#2528 | Make GenerateDot test more robust |
#2529 | Change Databricks 310 shim to be 311 to match reported spark.version |
#2479 | Support concat with separator on GPU |
#2507 | Improve test coverage for sorting structs |
#2526 | Improve debug print to include addresses and null counts |
#2463 | Add EMR 6.3 documentation |
#2516 | Avoid listener race collecting wrong plan in assert_gpu_fallback_collect |
#2505 | Qualification tool updates for datasets, udf, and misc fixes |
#2509 | Added in basic support for scalar structs to named_struct |
#2449 | Add code for generating dot file visualizations |
#2475 | Update shuffle documentation for branch-21.06 and UCX 1.10.1 |
#2500 | Update Dockerfile for native UDF |
#2506 | Support creating Scalars/ColumnVectors from utf8 strings directly. |
#2502 | Remove work around for nulls in semi-anti joins |
#2503 | Remove temporary logging and adjust test column names |
#2499 | Fix regression in TOTAL_TIME metrics for Databricks |
#2498 | Add in basic support for scalar maps and allow nesting in named_struct |
#2496 | Add comments for lazy binding in WindowInPandas |
#2493 | improve window agg test for range numeric types |
#2491 | Fix regression in cost-based optimizer when calculating cost for Window operations |
#2482 | Window tests with smaller batches |
#2490 | Add temporary logging for Dataproc round robin fallback issue |
#2486 | Remove the null replacement in computePredicate |
#2469 | Adding additional functionalities to profiling tool |
#2462 | Report gpuOpTime instead of totalTime for project, filter, limit, and window |
#2484 | Fix the failing test test_window on Databricks |
#2472 | Fix hash_aggregate_test |
#2476 | Fix for UCP Listener created spark.port.maxRetries times |
#2471 | skip test_window_aggs_for_rows_lead_lag_on_arrays |
#2446 | Update plugin version to 21.06.0 |
#2409 | Change shuffle metadata messages to use UCX Active Messages |
#2397 | Include memory access costs in cost models (cost-based optimizer) |
#2442 | fix GpuCreateNamedStruct not serializable issue |
#2379 | support GpuConcat on ArrayType |
#2456 | Fall back to the CPU for literal array values on case/when |
#2447 | Filter out the nulls after slicing the batches. |
#2426 | Implement cast of nested arrays |
#2299 | support creating array of array |
#2451 | Update tuning docs to add batch size recommendations. |
#2435 | support lead/lag on arrays |
#2448 | support creating list ColumnVector for Literal(ArrayType(NullType)) |
#2402 | Add profiling tool |
#2313 | Supports GpuLiteral of array type |
#938 | [FEA] Have hashed shuffle match spark |
#1604 | [FEA] Support casting structs to strings |
#1920 | [FEA] Support murmur3 hashing of structs |
#2018 | [FEA] A way for user to find out the plugin version and cudf version in REPL |
#77 | [FEA] Support ArrayContains |
#1721 | [FEA] build cudf jars with NVTX enabled |
#1782 | [FEA] Shim layers to support spark versions |
#1625 | [FEA] Support Decimal Casts to String and String to Decimal |
#166 | [FEA] Support get_json_object |
#1698 | [FEA] Support casting structs to string |
#1912 | [FEA] Let Scalar Pandas UDF support array of struct type. |
#1136 | [FEA] Audit: Script to list commits between different Spark versions/tags |
#1921 | [FEA] cudf version check should be lenient on later patch version |
#19 | [FEA] Out of core sorts |
#2090 | [FEA] Make row count estimates available to the cost-based optimizer |
#1341 | Optimize unnecessary columnar->row->columnar transitions with AQE |
#1558 | [FEA] Initialize UCX early |
#1633 | [FEA] Implement a cost-based optimizer |
#1727 | [FEA] Put RangePartitioner data path on the GPU |
#2279 | [BUG] Hash Partitioning can fail for very small batches |
#2314 | [BUG] v0.5.0 pre-release pytests join_test.py::test_hash_join_array FAILED on SPARK-EGX Yarn Cluster |
#2317 | [BUG] GpuColumnarToRowIterator can stop after receiving an empty batch |
#2244 | [BUG] Executors hanging when running NDS benchmarks |
#2278 | [BUG] FullOuter join can produce too many results |
#2220 | [BUG] csv_test.py::test_csv_fallback FAILED on the EMR Cluster |
#2225 | [BUG] GpuSort fails on tables containing arrays. |
#2232 | [BUG] hash_aggregate_test.py::test_hash_grpby_pivot FAILED on the Databricks Cluster |
#2231 | [BUG] string_test.py::test_re_replace FAILED on the Dataproc Cluster |
#2042 | [BUG] NDS q14a fails with "GpuColumnarToRow does not implement doExecuteBroadcast" |
#2203 | [BUG] Spark nightly cache tests fail with --master flag |
#2230 | [BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster |
#1711 | [BUG] find a way to stop allocating from RMM on the shuffle-client thread |
#2109 | [BUG] Fix high priority violations detected by code analysis tools |
#2217 | [BUG] qa_nightly_select_test failure in test_select |
#2127 | [BUG] Parsing with two-digit year should fall back to CPU |
#2078 | [BUG] java.lang.ArithmeticException: divide by zero when spark.sql.ansi.enabled=true |
#2048 | [BUG] split function + repartition results in "ai.rapids.cudf.CudaException: device-side assert triggered" |
#2036 | [BUG] Stackoverflow when writing wide parquet files. |
#1973 | [BUG] generate_expr_test FAILED on Dataproc Cluster |
#2079 | [BUG] koalas.sql fails with java.lang.ArrayIndexOutOfBoundsException |
#217 | [BUG] CudaUtil should be removed |
#1550 | [BUG] The ORC output data of a query is not readable |
#2074 | [BUG] Intermittent NPE in RapidsBufferCatalog when running test suite |
#2027 | [BUG] udf_cudf_test.py integration tests fail |
#1899 | [BUG] Some queries fail when cost-based optimizations are enabled |
#1914 | [BUG] Add in float, double, timestamp, and date support to murmur3 |
#2014 | [BUG] earlyStart option added in 0.5 can cause errors when starting UCX |
#1984 | [BUG] NDS q58 Decimal scale (59) cannot be greater than precision (38). |
#2001 | [BUG] RapidsShuffleManager didn't pass dirs to getBlockData from a wrapped ShuffleBlockResolver |
#1797 | [BUG] occasional crashes in CI |
#1861 | Encountered column data outside the range of input buffer |
#1905 | [BUG] Large concat task time in GpuShuffleCoalesce with pinned memory pool |
#1638 | [BUG] Tests test_window_aggs_for_rows_collect_list fails when there are null values in columns. |
#1864 | [BUG] HostColumnarToGPU inefficient when only doing count() |
#1862 | [BUG] spark 3.2.0-snapshot integration test failed due to conf change |
#1844 | [BUG] branch-0.5 nightly IT FAILED on the mortgage ETL test "Could not read footer for file: file:/xxx/xxx.snappy.parquet" |
#1627 | [BUG] GDS exception when restoring spilled buffer |
#1802 | [BUG] Many decimal integration test failures for 0.5 |
#2326 | Update changelog for 0.5.0 release |
#2316 | Update doc to note that single quoted json strings are not ok |
#2319 | Disable hash partitioning on arrays |
#2318 | Fix ColumnarToRowIterator handling of empty batches |
#2304 | Update CHANGELOG.md |
#2301 | Update doc to reflect nanosleep problem with 460.32.03 |
#2298 | Update changelog for v0.5.0 release [skip ci] |
#2293 | update cudf version to 0.19.2 |
#2289 | Update docs to warn against 450.80.02 driver with 10.x toolkit |
#2285 | Require single batch for full outer join streaming |
#2281 | Remove download section for unreleased 0.4.2 |
#2264 | Add spark312 and spark320 versions of cache serializer |
#2254 | updated gcp docs with custom dataproc image instructions |
#2247 | Allow specifying a superclass for non-GPU execs |
#2235 | Fix distributed cache to read requested schema |
#2261 | Make CBO row count test more robust |
#2237 | update cudf version to 0.19.1 |
#2240 | Get the correct 'PIPESTATUS' in bash [skip ci] |
#2242 | Add shuffle doc section on the periodicGC configuration |
#2251 | Fix issue when out of core sorting nested data types |
#2204 | Run nightly tests for ParquetCachedBatchSerializer |
#2245 | Fix pivot bug for decimalType |
#2093 | Initial implementation of row count estimates in cost-based optimizer |
#2188 | Support GPU broadcast exchange reuse to feed CPU BHJ when AQE is enabled |
#2227 | ParquetCachedBatchSerializer broadcast AllConfs instead of SQLConf to fix distributed mode |
#2223 | Adds subquery aggregate tests from SPARK-31620 |
#2222 | Remove groupId already specified in parent pom |
#2209 | Fixed a few issues with out of core sort |
#2218 | Fix incorrect RegExpReplace children handling on Spark 3.1+ |
#2207 | fix batch size default values in the tuning guide |
#2208 | Revert "add nightly cache tests (#2083)" |
#2206 | Fix shim301db build |
#2192 | Fix index-based access to the head elements |
#2210 | Avoid redundant collection conversions |
#2190 | JNI fixes for StringWordCount native UDF example |
#2086 | Updating documentation for data format support |
#2172 | Remove easy unused symbols |
#2089 | Update PandasUDF doc |
#2195 | fix cudf 0.19.0 download link [skip ci] |
#2175 | Branch 0.5 doc update |
#2168 | Simplify GpuExpressions w/ withResourceIfAllowed |
#2055 | Support PivotFirst |
#2183 | GpuParquetScan#readBufferToTable remove dead code |
#2129 | Fall back to CPU when parsing two-digit years |
#2083 | add nightly cache tests |
#2151 | add corresponding close call for HostMemoryOutputStream |
#2169 | Work around bug in Spark for integration test |
#2130 | Fix divide-by-zero in GpuAverage with ansi mode |
#2149 | Auto generate the supported types for the file formats |
#2072 | Disable CSV parsing by default and update tests to better show what is left |
#2157 | fix merge conflict for 0.4.2 [skip ci] |
#2144 | Allow array and struct types to pass thru when doing join |
#2145 | Avoid GPU shuffle for round-robin of unsortable types |
#2021 | Add in support for murmur3 hashing of structs |
#2128 | Add in Partition type check support |
#2116 | Add dynamic Spark configuration for Databricks |
#2132 | Log plugin and cudf versions on startup |
#2135 | Disable Spark 3.2 shim by default |
#2125 | enable auto-merge from 0.5 to 0.6 [skip ci] |
#2120 | Materialize Stream before serialization |
#2119 | Add more comprehensive documentation on supported date formats |
#1717 | Decimal32 support |
#2114 | Modified the Download page for 0.4.1 and updated doc to point to K8s guide |
#2106 | Fix some buffer leaks |
#2097 | fix the bound row project empty issue in row frame |
#2099 | Remove verbose log prints to make the build/test log clean |
#2105 | Cleanup prior Spark sessions in tests consistently |
#2104 | Clone apache spark source code to parse the git commit IDs |
#2095 | fix refcount when materializing device buffer from GDS |
#2100 | [BUG] add wget for fetching conda [skip ci] |
#2096 | Adjust images for integration tests |
#2094 | Changed name of parquet files for Mortgage ETL Integration test |
#2035 | Accelerate data transfer for map Pandas UDF plan |
#2050 | stream shuffle buffers from GDS to UCX |
#2084 | Enable ORC write by default |
#2088 | Upgrade ScalaTest plugin to respect JAVA_HOME |
#1932 | Create a getting started on K8s page |
#2080 | Improve error message after failed RMM shutdown |
#2064 | Optimize unnecessary columnar->row->columnar transitions with AQE |
#2025 | Update the doc for pandas udf on databricks |
#2059 | Add the flag 'TEST_TYPE' to avoid integration tests silently skipping some test cases |
#2075 | Remove debug println from CBO test |
#2046 | support casting Decimal to String |
#1812 | allow spilled buffers to be unspilled |
#2061 | Run the pandas udf using cudf on Databricks |
#1893 | Plug-in support for get_json_object |
#2044 | Use partition for GPU hash partitioning |
#1954 | Fix CBO bug where incompatible plans were produced with AQE on |
#2049 | Remove incompatible int overflow checking |
#2056 | Remove Spark 3.2 from premerge and nightly CI run |
#1814 | Struct to string casting functionality |
#2037 | Fix warnings from use of deprecated cudf methods |
#2033 | Bump up pre-merge OS from ubuntu 16 to ubuntu 18 [skip ci] |
#1883 | Enable sort for single-level nesting struct columns on GPU |
#2016 | Refactor logic for parallel testing |
#2022 | Update order by to not load native libraries when sorting |
#2017 | Add in murmur3 support for float, double, date and timestamp |
#1981 | Fix GpuSize |
#1999 | support casting string to decimal |
#2006 | Enable windowed collect_list by default |
#2000 | Use Spark's HybridRowQueue to avoid MemoryConsumer API shim |
#2015 | Fix bug where rkey buffer is getting advanced after the first handshake |
#2007 | Fix unknown column name error when filtering ORC file with no names |
#2005 | Update to new is_before_spark_311 function name |
#1944 | Support running scalar pandas UDF with array type. |
#1991 | Fixes creation of invalid DecimalType in GpuDivide.tagExprForGpu |
#1958 | Support legacy behavior of parameterless count |
#1919 | Add support for Structs for UnionExec |
#2002 | Pass dirs to getBlockData for a wrapped shuffle resolver |
#1983 | document building against different CUDA Toolkit versions |
#1994 | Merge 0.4 to 0.5 [skip ci] |
#1982 | Update ORC pushdown filter building to latest Spark logic |
#1978 | Add audit script to list commits from Spark |
#1976 | Temp fix for parquet write changes |
#1970 | add maven profiles for supported CUDA versions |
#1951 | Branch 0.5 doc remove numpartitions |
#1967 | Update FAQ for Dataset API and format supported versions |
#1972 | support GpuSize |
#1966 | add xml report for codecov |
#1955 | Fix typo in Arrow optimization config |
#1956 | Fix NPE in plugin shutdown |
#1930 | Relax cudf version check for patch-level versions |
#1787 | support distributed file path in cloud environment |
#1961 | change premerge GPU_TYPE from secret to global env [skip ci] |
#1957 | Update Spark 3.1.2 shim for float upcast behavior |
#1889 | Decimal DIV changes |
#1947 | Move doc of Pandas UDF to additional-functionality |
#1938 | Add spark.executor.resource.gpu.amount=1 to YARN and K8s docs |
#1937 | Fix merge conflict with branch-0.4 |
#1878 | spillable cache for GpuCartesianRDD |
#1843 | Refactor GpuGenerateExec and Explode |
#1933 | Split DB scripts to make them common for the build and IT pipeline |
#1935 | Update Alias SQL quoting and float-to-timestamp casting to match Spark 3.2 |
#1926 | Consolidate RAT settings in parent pom |
#1918 | Minor code cleanup in dateTimeExpressions |
#1906 | Remove get call on timeZoneId |
#1908 | Remove the Scala version of Mortgage ETL tests from nightly test |
#1894 | Modified Download Page to re-order the items and change the format of download links |
#1909 | Avoid pinned memory for shuffle host buffers |
#1891 | Connect UCX endpoints early during app startup |
#1877 | remove docker build in pre-merge [skip ci] |
#1830 | Enable the tests for collect over window. |
#1882 | GpuArrowColumnarBatchBuilder retains the references of ArrowBuf until HostToGpuCoalesceIterator put them into device |
#1868 | Increase row limit when doing count() for HostColumnarToGpu |
#1855 | Expose row count statistics in GpuShuffleExchangeExec |
#1875 | Fix merge conflict with branch-0.4 |
#1841 | Add in support for DateAddInterval |
#1869 | Fix tests for Spark 3.2.0 shim |
#1858 | fix shuffle manager doc on ucx library path |
#1836 | Add shim for Spark 3.1.2 |
#1852 | Fix Part Suite Tests |
#1616 | Cost-based optimizer |
#1834 | Add shim for Spark 3.0.3 |
#1839 | Refactor join code to reduce duplicated code |
#1848 | Fix merge conflict with branch-0.4 |
#1796 | Have most of range partitioning run on the GPU |
#1845 | Fix fails on the mortgage ETL test |
#1829 | Cleanup unused Jenkins files and scripts |
#1704 | Create a shim for Spark 3.2.0 development |
#1838 | Make databricks build.sh more convenient for dev |
#1835 | Fix merge conflict with branch-0.4 |
#1808 | Update mortgage tests to support reading multiple dataset formats |
#1822 | Fix conflict 0.4 to 0.5 |
#1807 | Fix merge conflict between branch-0.4 and branch-0.5 |
#1788 | Spill metrics everywhere |
#1719 | Add in out of core sort |
#1728 | Skip RAPIDS accelerated Java UDF tests if UDF fails to load |
#1689 | Update docs for plugin 0.5.0-SNAPSHOT and cudf 0.19-SNAPSHOT |
#1682 | init CI/CD dependencies branch-0.5 |
#1985 | [BUG] broadcast exchange can fail on 0.4 |
#1995 | update changelog 0.4.1 [skip ci] |
#1990 | Prepare for v0.4.1 release |
#1988 | broadcast exchange can fail when job group set |
#1773 | [FEA] Spark 3.0.2 release support |
#80 | [FEA] Support the struct SQL function |
#76 | [FEA] Support CreateArray |
#1635 | [FEA] RAPIDS accelerated Java UDF |
#1333 | [FEA] Support window operations on Decimal |
#1419 | [FEA] Support GPU accelerated UDF alternative for higher order function "aggregate" over window |
#1580 | [FEA] Support Decimal for ParquetCachedBatchSerializer |
#1600 | [FEA] Support ScalarSubquery |
#1072 | [FEA] Support for a custom DataSource V2 which supplies Arrow data |
#906 | [FEA] Clarify query explanation to directly state what will run on GPU |
#1335 | [FEA] Support CollectLimitExec for decimal |
#1485 | [FEA] Decimal Support for Parquet Write |
#1329 | [FEA] Decimal support for multiply, int div, add, subtract and null safe equals |
#1351 | [FEA] Execute UDFs that provide a RAPIDS execution path |
#1330 | [FEA] Support Decimal Casts |
#1353 | [FEA] Example of RAPIDS UDF using custom GPU code |
#1487 | [FEA] Change spark 3.1.0 to 3.1.1 |
#1334 | [FEA] Add support for count aggregate on decimal |
#1325 | [FEA] Add in join support for decimal |
#1326 | [FEA] Add in Broadcast support for decimal values |
#37 | [FEA] round and bround SQL functions |
#78 | [FEA] Support CreateNamedStruct function |
#1331 | [FEA] UnionExec and ExpandExec support for decimal |
#1332 | [FEA] Support CaseWhen, Coalesce and IfElse for decimal |
#937 | [FEA] have murmur3 hash function that matches exactly with spark |
#1324 | [FEA] Support Parquet Read of Decimal FIXED_LENGTH_BYTE_ARRAY |
#1428 | [FEA] Add support for unary decimal operations abs, floor, ceil, unary - and unary + |
#1375 | [FEA] Add log statement for what the concurrentGpuTasks tasks is set to on executor startup |
#1352 | [FEA] Example of RAPIDS UDF using cudf Java APIs |
#1328 | [FEA] Support sorting and shuffle of decimal |
#1316 | [FEA] Support simple DECIMAL aggregates |
#1435 | [FEA] Improve the file reading by using local file caching |
#1738 | [FEA] Reduce regex usage in CAST string to date/timestamp |
#987 | [FEA] Optimize CAST from string to temporal types by using cuDF is_timestamp function |
#1594 | [FEA] RAPIDS accelerated ScalaUDF |
#103 | [FEA] GPU version of TakeOrderedAndProject |
#1024 | Cleanup RAPIDS transport calls to receive |
#1366 | Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file |
#1200 | [FEA] Accelerate the scan speed for coalescing parquet reader when reading files from multiple partitioned folders |
#1885 | [BUG] natural join on string key results in a data frame with spurious NULLs |
#1785 | [BUG] Rapids pytest integration tests FAILED on Yarn cluster with unrecognized arguments: --std_input_path=src/test/resources/ |
#999 | [BUG] test_multi_types_window_aggs_for_rows_lead_lag fails against Spark 3.1.0 |
#1818 | [BUG] unmoored doc comment warnings in GpuCast |
#1817 | [BUG] Developer build with local modifications fails during verify phase |
#1644 | [BUG] test_window_aggregate_udf_array_from_python fails on databricks |
#1771 | [BUG] Databricks AWS CI/CD failing to create cluster |
#1157 | [BUG] Fix regression supporting to_date on GPU with Spark 3.1.0 |
#716 | [BUG] Cast String to TimeStamp issues |
#1117 | [BUG] CAST string to date returns wrong values for dates with out-of-range values |
#1670 | [BUG] Some TPC-DS queries fail with AQE when decimal types enabled |
#1730 | [BUG] Range Partitioning can crash when processing is in the order-by |
#1726 | [BUG] java url decode test failing on databricks, emr, and dataproc |
#1651 | [BUG] GDS exception when writing shuffle file |
#1702 | [BUG] check all tests marked xfail for Spark 3.1.1 |
#575 | [BUG] Spark 3.1 FAILED join_test.py::test_broadcast_join_mixed[FullOuter][IGNORE_ORDER] failed |
#577 | [BUG] Spark 3.1 log arithmetic functions fail |
#1541 | [BUG] Tests fail in integration in distributed mode after allowing nested types through in sort and shuffle |
#1626 | [BUG] TPC-DS-like query 77 at scale=3TB fails with maxResultSize exceeded error |
#1576 | [BUG] loading SPARK-32639 example parquet file triggers a JVM crash |
#1643 | [BUG] TPC-DS-Like q10, q35, and q69 - slow or hanging at leftSemiJoin |
#1650 | [BUG] BenchmarkRunner does not include query name in JSON summary filename when running multiple queries |
#1654 | [BUG] TPC-DS-like query 59 at scale=3TB with AQE fails with join mismatch |
#1274 | [BUG] OutOfMemoryError - Maximum pool size exceeded while running 24 day criteo ETL Transform stage |
#1497 | [BUG] Spark-rapids v0.3.0 pytest integration tests with UCX on FAILED on Yarn cluster |
#1534 | [BUG] Spark 3.1.1 test failure in writing due to removal of InMemoryFileIndex.shouldFilterOut |
#1155 | [BUG] on shutdown don't print Socket closed exception when shutting down UCX.scala |
#1510 | [BUG] IllegalArgumentException during shuffle |
#1513 | [BUG] executor not fully initialized may get calls from Spark, in the process setting the catalog incorrectly |
#1466 | [BUG] Databricks build must run before the rapids nightly |
#1456 | [BUG] Databricks 0.4 parquet integration tests fail |
#1400 | [BUG] Regressions in spark-shell usage of benchmark utilities |
#1119 | [BUG] inner join fails with Column size cannot be negative |
#1079 | [BUG] The Scala UDF function cannot invoke the UDF compiler when it's passed to "explode" |
#1298 | TPCxBB query16 failed at UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary |
#1271 | [BUG] CastOpSuite and AnsiCastOpSuite failing with ArithmeticException on Spark 3.1 |
#84 | [BUG] sort does not match spark for -0.0 and 0.0 |
#578 | [BUG] Spark 3.1 qa_nightly_select_test.py Full join test failures |
#586 | [BUG] Spark 3.1 tpch failures |
#837 | [BUG] Distinct count of floating point values differs with regular spark |
#953 | [BUG] 3.1.0 pos_explode tests are failing |
#127 | [BUG] String CSV parsing does not respect nullValues |
#1203 | [BUG] tpcds query 51 fails with join error on Spark 3.1.0 |
#750 | [BUG] udf_cudf_test::test_with_column fails with IPC error |
#1348 | [BUG] Host columnar decimal conversions are failing |
#1270 | [BUG] Benchmark runner fails to produce report if benchmark fails due to an invalid query plan |
#1179 | [BUG] SerializeConcatHostBuffersDeserializeBatch may have thread issues |
#1115 | [BUG] Unchecked type warning in SparkQueryCompareTestSuite |
#1963 | Update changelog 0.4 [skip ci] |
#1960 | Replace sonatype staging link with maven central link |
#1945 | Update changelog 0.4 [skip ci] |
#1910 | Make hash partitioning match CPU |
#1927 | Change cuDF dependency to 0.18.1 |
#1934 | Update documentation to use cudf version 0.18.1 |
#1871 | Disable coalesce batch spilling to avoid cudf contiguous_split bug |
#1849 | Update changelog for 0.4 |
#1744 | Fix NullPointerException on null partition insert |
#1842 | Update to note support for 3.0.2 |
#1832 | Spark 3.1.1 shim no longer a snapshot shim |
#1831 | Spark 3.0.2 shim no longer a snapshot shim |
#1826 | Remove benchmarks |
#1828 | Update cudf dependency to 0.18 |
#1813 | Fix LEAD/LAG failures in Spark 3.1.1 |
#1819 | Fix scaladoc warning in GpuCast |
#1820 | [BUG] make modified check pre-merge only |
#1780 | Remove SNAPSHOT from test and integration_test READMEs |
#1809 | check if modified files after update_config/supported |
#1804 | Update UCX documentation for RX_QUEUE_LEN and Docker |
#1810 | Pandas UDF: Sort the data before computing the sum. |
#1751 | Exclude foldable expressions from GPU if constant folding is disabled |
#1798 | Add documentation about explain not on GPU when AQE is on |
#1766 | Branch 0.4 release docs |
#1794 | Build python output schema from udf expressions |
#1783 | Fix the collect_list over window tests failures on db |
#1781 | Better float/double cases for casting tests |
#1790 | Record row counts in benchmark runs that call collect |
#1779 | Add support of DateType and TimestampType for GetTimestamp expression |
#1768 | Updating getting started Databricks docs |
#1742 | Fix regression supporting to_date with Spark-3.1 |
#1775 | Fix ambiguous ordering for some tests |
#1760 | Update GpuDataSourceScanExec and GpuBroadcastExchangeExec to fix audit issues |
#1750 | Detect task failures in benchmarks |
#1767 | Consistent Spark version for test and production |
#1741 | Reduce regex use in CAST |
#1756 | Skip RAPIDS accelerated Java UDF tests if UDF fails to load |
#1716 | Update RapidsShuffleManager documentation for branch 0.4 |
#1740 | Disable ORC writes until bug can be fixed |
#1747 | Fix resource leaks in unit tests |
#1725 | Branch 0.4 FAQ reorg |
#1718 | CAST string to temporal type now calls isTimestamp |
#1734 | Disable range partitioning if computation is needed |
#1723 | Removed StructTypes support for ParquetCachedBatchSerializer as cudf doesn't support it yet |
#1714 | Add support for RAPIDS accelerated Java UDFs |
#1713 | Call GpuDeviceManager.shutdown when the executor plugin is shutting down |
#1596 | Added in Decimal support to ParquetCachedBatchSerializer |
#1706 | cleanup unused is_before_spark_310 |
#1685 | Fix CustomShuffleReader replacement when decimal types enabled |
#1699 | Add docs about Spark 3.1 in standalone modes not needing extra class path |
#1701 | remove xfail for orc test_input_meta for spark 3.1.0 |
#1703 | Remove xfail for spark 3.1.0 test_broadcast_join_mixed FullOuter |
#1676 | BenchmarkRunner option to generate query plan diagrams in DOT format |
#1695 | support alternate jar paths |
#1694 | increase mem and limit parallelism for pre-merge |
#1691 | add validate_execs_in_gpu_plan to pytest.ini |
#1692 | Add the integration test resources to the test tarball |
#1677 | When PTDS is enabled, print warning if the allocator is not ARENA |
#1683 | update changelog to verify automerge 0.5 setup [skip ci] |
#1673 | support auto-merge for branch 0.5 [skip ci] |
#1681 | Xfail the collect_list tests for databricks |
#1678 | Fix array/struct checks in Sort and HashAggregate and sorting tests in distributed mode |
#1671 | Allow metrics to be configurable by level |
#1675 | add run_pyspark_from_build.sh to the pytest distribution tarball |
#1548 | Support executing collect_list on GPU with windowing. |
#1593 | Avoid unnecessary Table instances after contiguous split |
#1592 | Add in support for Decimal divide |
#1668 | Implement way for python integration tests to validate Exec is in GPU plan |
#1669 | Add FAQ entries for executor-per-GPU questions |
#1661 | Enable Parquet test for file containing map struct key |
#1664 | Filter nulls for left semi and left anti join to work around cudf |
#1665 | Add better automated tests for Arrow columnar copy in HostColumnarToGpu |
#1614 | add alluxio getting started document |
#1639 | support GpuScalarSubquery |
#1656 | Move UDF to Catalyst Expressions to its own document |
#1663 | BenchmarkRunner - Include query name in JSON summary filename |
#1655 | Fix extraneous shuffles added by AQE |
#1652 | Fix typo in arrow optimized config name - spark.rapids.arrowCopyOptimizationEnabled |
#1645 | Run Databricks IT with python-xdist parallel, includes test fixes and xfail |
#1649 | Move building from source docs to contributing guide |
#1637 | Fail DivModLike on zero divisor in ANSI mode |
#1646 | Update links in rapids-udfs.md after moving to subfolder |
#1641 | Xfail struct and array order by tests on Dataproc |
#1565 | Add GPU accelerated array_contains operator |
#1617 | Enable nightly test checks for Apache Spark |
#1636 | RAPIDS accelerated Spark Scala UDF support |
#1634 | Fix databricks build since Arrow code added |
#1599 | Add division by zero tests for Spark 3.1 behavior |
#1619 | Update GpuFileSourceScanExec to be in sync with DataSourceScanExec |
#1631 | Explicitly add maven-jar-plugin version to improve incremental build time. |
#1624 | Update explain format to show what will and will not run on the GPU |
#1622 | Support faster copy for a custom DataSource V2 which supplies Arrow data |
#1621 | Additional functionality docs |
#1618 | update blossom-ci for security updates [skip ci] |
#1562 | add alluxio support |
#1597 | Documentation for Parquet serializer |
#1611 | Add in flag for integration tests to not skip required tests |
#1609 | Disable float round/bround by default |
#1615 | Add in window support for average |
#1610 | Limit length of spark app name in BenchmarkRunner |
#1579 | Support TakeOrderedAndProject |
#1581 | Support Decimal type for CollectLimitExec |
#1591 | Add support for running multiple queries in BenchmarkRunner |
#1595 | Fix Github documentation issue template |
#1577 | rename directory from spark310 to spark311 |
#1578 | Test to track RAPIDS-side issues re SPARK-32639 |
#1583 | fix request-action issue [skip ci] |
#1555 | Enable ANSI mode for CAST string to timestamp |
#1531 | Decimal Support for writing Parquet |
#1545 | Support comparing ORC data |
#1570 | Branch 0.4 doc cleanup |
#1569 | Add shim method shouldIgnorePath |
#1564 | Add in support for Decimal Multiply and DIV |
#1561 | Decimal support for add and subtract |
#1560 | support sum in window aggregation for decimal |
#1546 | Cleanup shutdown logging for UCX shuffle |
#1551 | RAPIDS-accelerated Hive UDFs support all types |
#1543 | Shuffle/transport enabled by default |
#1552 | Disable blackduck signature check |
#1540 | Handle ShuffleManager api calls when plugin is not fully initialized |
#1547 | Cleanup shuffle transport receive calls |
#1512 | Support window operations on Decimal |
#1532 | Support casting from decimal to decimal |
#1542 | Change the number of partitions to zero when a range is empty |
#1506 | Add --use-decimals flag to TPC-DS ConvertFiles |
#1511 | Remove unused Jenkinsfiles [skip ci] |
#1505 | Add least, greatest and eqNullSafe support for DecimalType |
#1484 | add doc for nsight systems bundled with cuda toolkit |
#1478 | Documentation for RAPIDS-accelerated Hive UDFs |
#1477 | Allow structs and arrays to pass through for Shuffle and Sort |
#1489 | Adds in some support for the array sql function |
#1438 | Cast from numeric types to decimal type |
#1493 | Moved ParquetRecordMaterializer to the shim package to follow convention |
#1495 | Fix merge conflict, merge branch 0.3 to branch 0.4 [skip ci] |
#1472 | Add an example RAPIDS-accelerated Hive UDF using native code |
#1488 | Rename Spark 3.1.0 shim to Spark 3.1.1 to match community |
#1474 | Fix link |
#1476 | DecimalType support for Aggregate Count |
#1475 | Join support for DecimalType |
#1244 | Support round and bround SQL functions |
#1458 | Add in support for struct and named_struct |
#1465 | DecimalType support for UnionExec and ExpandExec |
#1450 | Add dynamic configs for the spark-rapids IT pipelines |
#1207 | Spark SQL hash function using murmur3 |
#1457 | Support reading decimal columns from parquet files on Databricks |
#1455 | Upgrade Scala Maven Plugin to 4.3.0 |
#1453 | DecimalType support for IfElse and Coalesce |
#1452 | Support DecimalType for CaseWhen |
#1444 | Improve UX when running benchmarks from Spark shell |
#1294 | Support reading decimal columns from parquet files |
#1153 | Scala UDF will compile children expressions in Project |
#1416 | Optimize mvn dependency download scripts |
#1430 | Add project for testing code that requires Spark 3.1.0 or later |
#1425 | Add in Decimal support for abs, floor, ceil, unary - and unary + |
#1427 | Revert "Make the multi-threaded parquet reader the default" |
#1420 | Add udf jar to nightly integration tests |
#1422 | Log the number of concurrent gpu tasks allowed on Executor startup |
#1401 | Accelerate the coalescing parquet reader when reading files from multiple partitioned folders |
#1413 | Add config for cast float to integral types |
#1313 | Support spilling to disk directly via cuFile/GDS |
#1411 | Add udf-examples jar to databricks build |
#1412 | Fix a lot of tests marked with xfail for Spark 3.1.0 that no longer fail |
#1414 | Build merged code of HEAD and BASE branch for pre-merge [skip ci] |
#1409 | Add option to use decimals in tpc-ds csv to parquet conversion |
#1410 | Add Decimal support for In, InSet, AtLeastNNonNulls, GetArrayItem, GetStructField, and GenerateExec |
#1408 | Support RAPIDS-accelerated HiveGenericUDF |
#1407 | Update docs and tests for null CSV support |
#1393 | Support RAPIDS-accelerated HiveSimpleUDF |
#1392 | Turn on hash partitioning for decimal support |
#1402 | Better GPU Cast type checks |
#1404 | Fix branch 0.4 merge conflict |
#1323 | More advanced type checking and documentation |
#1391 | Remove extra null join filtering because cudf is fast for this now. |
#1395 | Fix branch-0.3 -> branch-0.4 automerge |
#1382 | Handle "MM[/-]dd" and "dd[/-]MM" datetime formats in UnixTimeExprMeta |
#1390 | Accelerated columnar to row/row to columnar for decimal |
#1380 | Adds in basic support for decimal sort, sum, and some shuffle |
#1367 | Reuse gpu expression conversion rules when checking sort order |
#1349 | Add canonicalization tests |
#1368 | Move to cudf 0.18-SNAPSHOT |
#1361 | Use the correct precision when reading spark columnar data. |
#1273 | Update docs and scripts to 0.4.0-SNAPSHOT |
#1321 | Refactor to stop inheriting from HashJoin |
#1311 | ParquetCachedBatchSerializer code cleanup |
#1303 | Add explicit outputOrdering for BHJ and SHJ in spark310 shim |
#1299 | Benchmark runner improved error handling |
#1002 | [FEA] RapidsHostColumnVectorCore should verify cudf data with respect to the expected spark type |
#444 | [FEA] Pluggable Cache |
#1158 | [FEA] Better documentation on type support |
#57 | [FEA] Support INT96 for parquet reads and writes |
#1003 | [FEA] Reduce overlap between RapidsHostColumnVector and RapidsHostColumnVectorCore |
#913 | [FEA] In Pluggable Cache Support CalendarInterval while creating CachedBatches |
#1092 | [FEA] In Pluggable Cache handle nested types having CalendarIntervalType and NullType |
#670 | [FEA] Support NullType |
#50 | [FEA] support spark.sql.legacy.timeParserPolicy |
#1144 | [FEA] Remove Databricks 3.0.0 shim layer |
#1096 | [FEA] Implement parquet CreateDataSourceTableAsSelectCommand |
#688 | [FEA] udf compiler should be auto-appended to spark.sql.extensions |
#502 | [FEA] Support Databricks 7.3 LTS Runtime |
#764 | [FEA] Sanity checks for cudf jar mismatch |
#1018 | [FEA] Log details related to GPU memory fragmentation on GPU OOM |
#619 | [FEA] log whether libcudf and libcudfjni were built for PTDS |
#905 | [FEA] create AWS EMR 3.0.1 shim |
#838 | [FEA] Support window count for a column |
#864 | [FEA] config option to enable RMM arena memory resource |
#430 | [FEA] Audit: Parquet Writer support for TIMESTAMP_MILLIS |
#818 | [FEA] Create shim layer for AWS EMR |
#608 | [FEA] Parquet small file optimization improve handle merge schema |
#446 | [FEA] Test jucx in 1.9.x branch |
#1038 | [FEA] Accelerate the data transfer for plan WindowInPandasExec |
#533 | [FEA] Improve PTDS performance |
#849 | [FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances |
#784 | [FEA] Allow Host Spilling to be more dynamic |
#627 | [FEA] Further parquet reading small file improvements |
#5 | [FEA] Support Adaptive Execution |
#1423 | [BUG] Mortgage ETL sample failed with spark.sql.adaptive enabled on AWS EMR 6.2 |
#1369 | [BUG] TPC-DS Query Failing on EMR 6.2 with AQE |
#1344 | [BUG] Spark-rapids Pytests failed on Databricks cluster spark standalone mode |
#1279 | [BUG] TPC-DS query 2 failing with NPE |
#1280 | [BUG] TPC-DS query 93 failing with UnsupportedOperationException |
#1308 | [BUG] TPC-DS query 14a runs much slower on 0.3 |
#1284 | [BUG] TPC-DS query 77 at scale=1TB fails with maxResultSize exceeded error |
#1061 | [BUG] orc_test.py is failing |
#1197 | [BUG] java.lang.NullPointerException when exporting delta table |
#685 | [BUG] In ParquetCachedBatchSerializer, serializing parquet buffers might blow up in certain cases |
#1269 | [BUG] GpuSubstring is not expected to be a part of a SortOrder |
#1246 | [BUG] Many TPC-DS benchmarks fail when writing to Parquet |
#961 | [BUG] ORC predicate pushdown should work with case-insensitive analysis |
#962 | [BUG] Loading columns from an ORC file without column names returns no data |
#1245 | [BUG] Code adding buffers to the spillable store should synchronize |
#570 | [BUG] Continue debugging OOM after ensuring device store is empty |
#972 | [BUG] total time metric is redundant with scan time |
#1039 | [BUG] UNBOUNDED window ranges on null timestamp columns produce incorrect results. |
#1195 | [BUG] AcceleratedColumnarToRowIterator queue empty |
#1177 | [BUG] leaks possible in the rapids shuffle if batches are received after the task completes |
#1216 | [BUG] Failure to recognize ORC file format when loaded via Hive |
#898 | [BUG] count reductions are failing on databricks because of lack of Complete support |
#1184 | [BUG] test_window_aggregate_udf_array_from_python fails on databricks 3.0.1 |
#1151 | [BUG] Add databricks 3.0.1 shim layer for GpuWindowInPandasExec. |
#1199 | [BUG] No data size in Input column in Stages page from Spark UI when using Parquet as file source |
#1031 | [BUG] dependency info properties file contains error messages |
#1149 | [BUG] Scaladoc warnings in GpuDataSource |
#1185 | [BUG] test_hash_multiple_mode_query failing |
#724 | [BUG] PySpark test_broadcast_nested_loop_join_special_case intermittent failure |
#1164 | [BUG] ansi_cast tests are failing in 3.1.0 |
#1110 | [BUG] Special date "now" has wrong value on GPU |
#1139 | [BUG] Host columnar to GPU can be very slow |
#1094 | [BUG] unix_timestamp on GPU returns invalid data for special dates |
#1098 | [BUG] unix_timestamp on GPU returns invalid data for bad input |
#1082 | [BUG] string to timestamp conversion fails with split |
#1140 | [BUG] ConcurrentModificationException error after scala test suite completes |
#1073 | [BUG] java.lang.RuntimeException: BinaryExpressions must override either eval or nullSafeEval |
#975 | [BUG] BroadcastExchangeExec fails to fall back to CPU on driver node on GCP Dataproc |
#773 | [BUG] Investigate high task deserialization |
#1035 | [BUG] TPC-DS query 90 with AQE enabled fails with doExecuteBroadcast exception |
#825 | [BUG] test_window_aggs_for_ranges intermittently fails |
#1008 | [BUG] limit function is producing inconsistent result when type is Byte, Long, Boolean and Timestamp |
#996 | [BUG] TPC-DS benchmark via spark-submit does not provide option to disable appending .dat to path |
#1006 | [BUG] Spark3.1.0 changed BasicWriteTaskStats breaks BasicColumnarWriteTaskStatsTracker |
#985 | [BUG] missing metric dataSize |
#881 | [BUG] cannot disable Sort by itself |
#812 | [BUG] Test failures for 0.2 when run with multiple executors |
#925 | [BUG] Range window-functions with non-timestamp order-by expressions not falling back to CPU |
#852 | [BUG] BenchUtils.compareResults cannot compare partitioned files when ignoreOrdering=false |
#868 | [BUG] Rounding error when casting timestamp to string for timestamps before 1970 |
#880 | [BUG] doing a window operation with an orderby for a single constant crashes |
#776 | [BUG] Integration test fails on spark 3.1.0-SNAPSHOT |
#874 | [BUG] RapidsConf.scala has some inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead |
#860 | [BUG] we need to mark columns from received shuffle buffers as GpuColumnVectorFromBuffer |
#122 | [BUG] CSV Timestamp parsing is broken for TS < 1902 and TS > 2038 |
#810 | [BUG] UDF Integration tests fail if pandas is not installed |
#746 | [BUG] cudf_udf_test.py is flaky |
#811 | [BUG] 0.3 nightly is timing out |
#574 | [BUG] Fix GpuTimeSub for Spark 3.1.0 |
#1496 | Update changelog for v0.3.0 release [skip ci] |
#1473 | Update documentation for 0.3 release |
#1371 | Start Guide for RAPIDS on AWS EMR 6.2 |
#1446 | Update changelog for 0.3.0 release [skip ci] |
#1439 | when AQE enabled we fail to fix up exchanges properly and EMR |
#1433 | fix pandas 1.2 compatibility issue |
#1424 | Make the multi-threaded parquet reader the default since coalescing doesn't handle partitioned files well |
#1389 | Update project version to 0.3.0 |
#1387 | Update cudf version to 0.17 |
#1370 | [REVIEW] init changelog 0.3 [skip ci] |
#1376 | MetaUtils.getBatchFromMeta should return batches with GpuColumnVectorFromBuffer |
#1358 | auto-merge: instant merge after creation [skip ci] |
#1359 | Use SortOrder from shims. |
#1343 | Do not run UDFs when the partition is empty. |
#1342 | Fix and edit docs for standalone mode |
#1350 | fix GpuRangePartitioning canonicalization |
#1281 | Documentation added for testing |
#1336 | Fix missing post-shuffle coalesce with AQE |
#1318 | Fix copying GpuFileSourceScanExec node |
#1337 | Use UTC instead of GMT |
#1307 | Fallback to cpu when reading Delta log files for stats |
#1310 | Fix canonicalization of GpuFileSourceScanExec, GpuShuffleCoalesceExec |
#1302 | Add GpuSubstring handling to SortOrder canonicalization |
#1265 | Chunking input before writing a ParquetCachedBatch |
#1278 | Add a config to disable decimal types by default |
#1272 | Add Alias to shims |
#1268 | Adds in support docs for 0.3 release |
#1235 | Trigger reading and handling control data. |
#1266 | Updating Databricks getting started for 0.3 release |
#1291 | Increase pre-merge resource requests [skip ci] |
#1275 | Temporarily disable more CAST tests for Spark 3.1.0 |
#1264 | Fix race condition in batch creation |
#1260 | Update UCX license info in NOTICE-binary for 1.9 and RAPIDS plugin copyright dates |
#1247 | Ensure column names are valid when writing benchmark query results to file |
#1240 | Fix loading from ORC file with no column names |
#1242 | Remove compatibility documentation about unsupported INT96 |
#1192 | [REVIEW] Support GpuFilter and GpuCoalesceBatches for decimal data |
#1170 | Add nested type support to MetaUtils |
#1194 | Drop redundant total time metric from scan |
#1248 | At BatchedTableCompressor.finish synchronize to allow for "right-size… |
#1169 | Use CUDF's "UNBOUNDED" window boundaries for time-range queries. |
#1204 | Avoid empty batches on columnar to row conversion |
#1133 | Refactor batch coalesce to be based solely on batch data size |
#1237 | In transport, limit pending transfer requests to fit within a bounce |
#1232 | Move SortOrder creation to shims |
#1068 | Write int96 to parquet |
#1193 | Verify shuffle of decimal columns |
#1180 | Remove batches if they are received after the iterator detects that t… |
#1173 | Support relational operators for decimal type |
#1220 | Support replacing ORC format when Hive is configured |
#1219 | Upgrade to jucx 1.9.0 |
#1081 | Add option to upload benchmark summary JSON file |
#1217 | Aggregate reductions in Complete mode should use updateExpressions |
#1218 | Remove obsolete HiveStringType usage |
#1214 | changelog update 2020-11-30. Trigger automerge check [skip ci] |
#1210 | Support auto-merge for branch-0.4 [skip ci] |
#1202 | Fix a bug with the support for java.lang.StringBuilder.append. |
#1213 | Skip casting StringType to TimestampType for Spark 310 |
#1201 | Replace only window expressions on databricks. |
#1208 | [BUG] Fix GHSL2020-239 [skip ci] |
#1205 | Fix missing input bytes read metric for Parquet |
#1206 | Update Spark 3.1 shim for ShuffleOrigin shuffle parameter |
#1196 | Rename ShuffleCoalesceExec to GpuShuffleCoalesceExec |
#1191 | Skip window array tests for databricks. |
#1183 | Support for CalendarIntervalType and NullType |
#1150 | udf spec |
#1188 | Add in tests for parquet nested pruning support |
#1189 | Enable NullType for First and Last in 3.0.1+ |
#1181 | Fix resource leaks in unit tests |
#1186 | Fix compilation and scaladoc warnings |
#1187 | Updated documentation for distinct count compatibility |
#1182 | Close buffer catalog on device manager shutdown |
#1137 | Let GpuWindowInPandas declare ArrayType supported. |
#1176 | Add in support for null type |
#1174 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1175 | Fix leaks seen in shuffle tests |
#1138 | [REVIEW] Support decimal type for GpuProjectExec |
#1162 | Set job descriptions in benchmark runner |
#1172 | Revert "Fix race condition (#1165)" |
#1060 | Show partition metrics for custom shuffler reader |
#1152 | Add spark301db shim layer for WindowInPandas. |
#1167 | Nulls out the dataframe if --gc-between-runs is set |
#1165 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1163 | Add in support for GetStructField |
#1166 | Fix the cast tests for 3.1.0+ |
#1159 | fix bug where 'now' had same value as 'today' for timestamps |
#1161 | Fix nightly build pipeline failure. |
#1160 | Fix some performance problems with columnar to columnar conversion |
#1105 | [REVIEW] Change ColumnViewAccess usage to work with ColumnView |
#1148 | Add in tests for Maps and extend map support where possible |
#1154 | Mark test as xfail until we can get a fix in |
#1113 | Support unix_timestamp on GPU for subset of formats |
#1156 | Fix warning introduced in iterator suite |
#1095 | Dependency info |
#1145 | Remove support for databricks 7.0 runtime - shim spark300db |
#1147 | Change the assert to require for handling TIMESTAMP_MILLIS in isDateTimeRebaseNeeded |
#1132 | Add in basic support to read structs from parquet |
#1121 | Shuffle/better error handling |
#1134 | Support saveAsTable for writing orc and parquet |
#1124 | Add shim layers for GpuWindowInPandasExec. |
#1131 | Add in some basic support for Structs |
#1127 | Add in basic support for reading lists from parquet |
#1129 | Fix resource leaks with new shuffle optimization |
#1116 | Optimize normal shuffle by coalescing smaller batches on host |
#1102 | Auto-register UDF extension when main plugin is set |
#1108 | Remove integration test pipelines on NGCC |
#1123 | Mark Pandas udf over window tests as xfail on databricks until they can be fixed |
#1120 | Add in support for filtering ArrayType |
#1080 | Support for CalendarIntervalType and NullType for ParquetCachedSerializer |
#994 | Packs bounce buffers for highly partitioned shuffles |
#1112 | Remove bad config from pytest setup |
#1107 | closeOnExcept -> withResources in MetaUtils |
#1104 | Support lists to/from the GPU |
#1106 | Improve mechanism for expected exceptions in tests |
#1069 | Accelerate the data transfer between JVM and Python for the plan 'GpuWindowInPandasExec' |
#1099 | Update how we deal with type checking |
#1077 | Improve AQE transitions for shuffle and coalesce batches |
#1097 | Cleanup some instances of excess closure serialization |
#1090 | Fix the integration build |
#1086 | Speed up test performance using pytest-xdist |
#1084 | Avoid issues where more scalars than expected show up in an expression |
#1076 | [FEA] Support Databricks 7.3 LTS Runtime |
#1083 | Revert "Get cudf/spark dependency from the correct .m2 dir" |
#1062 | Get cudf/spark dependency from the correct .m2 dir |
#1078 | Another round of fixes for mapping of DataType to DType |
#1066 | More fixes for conversion to ColumnarBatch |
#1029 | BenchmarkRunner should produce JSON summary file even when queries fail |
#1055 | Fix build warnings |
#1064 | Use array instead of List for from(Table, DataType) |
#1057 | Fix empty table broadcast requiring a GPU on driver node |
#1047 | Sanity checks for cudf jar mismatch |
#1044 | Accelerated row to columnar and columnar to row transitions |
#1056 | Add query number to Spark app name when running benchmarks |
#1054 | Log total RMM allocated on GPU OOM |
#1053 | Remove isGpuBroadcastNestedLoopJoin from shims |
#1052 | Allow for GPUCoalesceBatch to deal with Map |
#1051 | Add simple retry for URM dependencies [skip ci] |
#1046 | Fix broken links |
#1017 | Log whether PTDS is enabled |
#1040 | Update to cudf 0.17-SNAPSHOT and fix tests |
#1042 | Fix inconsistencies in AQE support for broadcast joins |
#1037 | Add in support for the SQL functions Least and Greatest |
#1036 | Increase number of retries when waiting for databricks cluster |
#1034 | [BUG] To honor spark.rapids.memory.gpu.pool=NONE |
#854 | Arbitrary function call in UDF |
#1028 | Update to cudf-0.16 |
#1023 | Add --gc-between-run flag for TPC* benchmarks. |
#1001 | ColumnarBatch to CachedBatch and back |
#990 | Parquet coalesce file reader for local filesystems |
#1014 | Add --append-dat flag for TPC-DS benchmark |
#991 | Updated GCP Dataproc Mortgage-ETL-GPU.ipynb |
#886 | Spark BinaryType and cast to BinaryType |
#1016 | Change Hash Aggregate to allow pass-through on MapType |
#984 | Add support for MapType in selected operators |
#1012 | Update for new position parameter in Spark 3.1.0 RegExpReplace |
#995 | Add shim for EMR 3.0.1 and EMR 3.0.1-SNAPSHOT |
#998 | Update benchmark automation script |
#1000 | Always use RAPIDS shuffle when running TPCH and Mortgage tests |
#981 | Change databricks build to dynamically create a cluster |
#986 | Fix missing dataSize metric when using RAPIDS shuffle |
#914 | Write InternalRow to CachedBatch |
#934 | Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle |
#992 | Skip post-clean if aborted before the image build stage in pre-merge [skip ci] |
#988 | Change in Spark caused the 3.1.0 CI to fail |
#983 | clean jenkins file for premerge on NGCC |
#964 | Refactor TPC benchmarks to reduce duplicate code |
#978 | Enable scalastyle checks for udf-compiler module |
#949 | Fix GpuWindowExec to work with a CPU SortExec |
#973 | Stop reporting totalTime metric for GpuShuffleExchangeExec |
#968 | XFail pos_explode tests until final fix can be put in |
#970 | Add legacy config to clear active Spark 3.1.0 session in tests |
#918 | Benchmark runner script |
#915 | Add option to control number of partitions when converting from CSV to Parquet |
#944 | Fix some issues with non-determinism |
#935 | Add in support/tests for a window count on a column |
#940 | Fix closeOnExcept suppressed exception handling |
#942 | fix github action env setup [skip ci] |
#933 | Update first/last tests to avoid non-determinism and ordering differences |
#931 | Fix checking for nullable columns in window range query |
#924 | Benchmark guide update for command-line interface / spark-submit |
#926 | Move pandas_udf functions into the tests functions |
#929 | Pick a default tableId to use that is non 0 so that flatbuffers allow… |
#928 | Fix RapidsBufferStore NPE when no spillable buffers are available |
#820 | Benchmarking guide |
#859 | Compare partitioned files in order |
#916 | create new sparkContext explicitly in CPU notebook |
#917 | create new SparkContext in GPU notebook explicitly. |
#919 | Add label benchmark to performance subsection in changelog |
#850 | Add in basic support for lead/lag |
#843 | [REVIEW] Cache plugin to handle reading CachedBatch to an InternalRow |
#904 | Add command-line argument for benchmark result filename |
#909 | GCP preview version image name update |
#903 | update getting-started-gcp.md with new component list |
#900 | Turn off CollectLimitExec replacement by default |
#907 | remove configs from databricks that shouldn't be used by default |
#893 | Fix rounding error when casting timestamp to string for timestamps before 1970 |
#899 | Mark reduction corner case tests as xfail on databricks until they can be fixed |
#894 | Replace whole-buffer slicing with direct refcounting |
#891 | Add config to dump heap on GPU OOM |
#890 | Clean up CoalesceBatch to use withResource |
#892 | Only manifest the current batch in cached block shuffle read iterator |
#871 | Add support for using the arena allocator |
#889 | Fix crash on scalar only orderby |
#879 | Update SpillableColumnarBatch to remove buffer from catalog on close |
#888 | Shrink detect scope to compile only [skip ci] |
#885 | [BUG] fix IT dockerfile arguments [skip ci] |
#883 | [BUG] fix IT dockerfile args ordering [skip ci] |
#875 | fix the inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead in RapidsConf.scala |
#862 | Migrate nightly&integration pipelines to blossom [skip ci] |
#872 | Ensure that receive-side batches use GpuColumnVectorFromBuffer to avoid |
#833 | Add nvcomp LZ4 codec support |
#870 | Cleaned up tests and documentation for csv timestamp parsing |
#823 | Add command-line interface for TPC-* for use with spark-submit |
#856 | Move GpuWindowInPandasExec in shims layers |
#756 | Add stream-time metric |
#832 | Skip pandas tests if pandas cannot be found |
#841 | Fix a hanging issue when processing empty data. |
#840 | [REVIEW] Fixed failing cache tests |
#848 | Update task memory and disk spill metrics when buffer store spills |
#851 | Use contiguous table when deserializing columnar batch |
#857 | fix pvc scheduling issue |
#853 | Remove nodeAffinity from premerge pipeline |
#796 | Record spark plan SQL metrics to JSON when running benchmarks |
#781 | Add AQE unit tests |
#824 | Skip cudf_udf test by default |
#839 | First/Last reduction and cleanup of agg APIs |
#827 | Add Spark 3.0 EMR Shim layer |
#816 | [BUG] fix nightly is timing out |
#782 | Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences |
#813 | Revert "Enable tests in udf_cudf_test.py" |
#788 | [FEA] Persist workspace data on PVC for premerge |
#805 | [FEA] nightly build trigger both IT on spark 300 and 301 |
#797 | Allow host spill store to fit a buffer larger than configured max size |
#807 | Deploy integration-tests javadoc and sources |
#777 | Enable tests in udf_cudf_test.py |
#790 | CI: Update cudf python to 0.16 nightly |
#772 | Add support for empty array construction. |
#783 | Improved GpuArrowEvalPythonExec |
#771 | Various improvements to benchmarks |
#763 | [REVIEW] Allow CoalesceBatch to spill data that is not in active use |
#727 | Update cudf dependency to 0.16-SNAPSHOT |
#726 | parquet writer support for TIMESTAMP_MILLIS |
#674 | Unit test for GPU exchange re-use with AQE |
#723 | Update code coverage to find source files in new places |
#766 | Update the integration Dockerfile to reduce the image size |
#762 | Fixing conflicts in branch-0.3 |
#738 | [auto-merge] branch-0.2 to branch-0.3 - resolve conflict |
#722 | Initial code changes to support spilling outside of shuffle |
#693 | Update jenkins files for 0.3 |
#692 | Merge shims dependency to spark-3.0.1 into branch-0.3 |
#690 | Update the version to 0.3.0-SNAPSHOT |
#696 | [FEA] run integration tests against SPARK-3.0.1 |
#455 | [FEA] Support UCX shuffle with optimized AQE |
#510 | [FEA] Investigate libcudf features needed to support struct schema pruning during loads |
#541 | [FEA] Scala UDF: Support for null Value operands |
#542 | [FEA] Scala UDF: Support for Date and Time |
#499 | [FEA] disable any kind of warnings about ExecutedCommandExec not being on the GPU |
#540 | [FEA] Scala UDF: Support for String replaceFirst() |
#340 | [FEA] widen the rendered Jekyll pages |
#602 | [FEA] don't release with any -SNAPSHOT dependencies |
#579 | [FEA] Auto-merge between branches |
#515 | [FEA] Write tests for AQE skewed join optimization |
#452 | [FEA] Update HashSortOptimizerSuite to work with AQE |
#454 | [FEA] Update GpuCoalesceBatchesSuite to work with AQE enabled |
#354 | [FEA] Spark 3.1 FileSourceScanExec adds parameter optionalNumCoalescedBuckets |
#566 | [FEA] Add support for StringSplit with an array index. |
#524 | [FEA] Add GPU specific metrics to GpuFileSourceScanExec |
#494 | [FEA] Add some AQE-specific tests to the PySpark test suite |
#146 | [FEA] Python tests should support running with Adaptive Query Execution enabled |
#465 | [FEA] Audit: Update script to audit multiple versions of Spark |
#488 | [FEA] Ability to limit total GPU memory used |
#70 | [FEA] Support StringSplit |
#403 | [FEA] Add in support for GetArrayItem |
#493 | [FEA] Implement shuffle optimization when AQE is enabled |
#500 | [FEA] Add maven profiles for testing with AQE on or off |
#471 | [FEA] create a formal process for updating the github-pages branch |
#233 | [FEA] Audit DataWritingCommandExec |
#240 | [FEA] Audit Api validation script follow on - Optimize StringToTypeTag |
#388 | [FEA] Audit WindowExec |
#425 | [FEA] Add tests for configs in BatchScan Readers |
#453 | [FEA] Update HashAggregatesSuite to work with AQE |
#184 | [FEA] Enable NoScalaDoc scalastyle rule |
#438 | [FEA] Enable StringLPad |
#232 | [FEA] Audit SortExec |
#236 | [FEA] Audit ShuffleExchangeExec |
#355 | [FEA] Support Multiple Spark versions in the same jar |
#385 | [FEA] Support RangeExec on the GPU |
#317 | [FEA] Write test wrapper to run SQL queries via pyspark |
#235 | [FEA] Audit BroadcastExchangeExec |
#234 | [FEA] Audit BatchScanExec |
#238 | [FEA] Audit ShuffledHashJoinExec |
#237 | [FEA] Audit BroadcastHashJoinExec |
#316 | [FEA] Add some basic Dataframe tests for CoalesceExec |
#145 | [FEA] Scala tests should support running with Adaptive Query Execution enabled |
#231 | [FEA] Audit ProjectExec |
#229 | [FEA] Audit FileSourceScanExec |
#326 | [DISCUSS] Shuffle read-side error handling |
#601 | [FEA] Optimize unnecessary sorts when replacing SortAggregate |
#333 | [FEA] Better handling of reading lots of small Parquet files |
#511 | [FEA] Connect shuffle table compression to shuffle exec metrics |
#15 | [FEA] Multiple threads sharing the same GPU |
#272 | [DOC] Getting started guide for UCX shuffle |
#780 | [BUG] Inner Join dropping data with bucketed Table input |
#569 | [BUG] left_semi_join operation is abnormal and seriously time-consuming |
#744 | [BUG] TPC-DS query 6 now produces incorrect results. |
#718 | [BUG] GpuBroadcastHashJoinExec ArrayIndexOutOfBoundsException |
#698 | [BUG] batch coalesce can fail to appear between columnar shuffle and subsequent columnar operation |
#658 | [BUG] GpuCoalesceBatches collectTime metric can be underreported |
#59 | [BUG] enable tests for string literals in a select |
#486 | [BUG] GpuWindowExec does not implement requiredChildOrdering |
#631 | [BUG] Rows are dropped when AQE is enabled in some cases |
#671 | [BUG] Databricks hash_aggregate_test fails trying to canonicalize a WrappedAggFunction |
#218 | [BUG] Window function COUNT(x) includes null-values, when it shouldn't |
#153 | [BUG] Incorrect output from partial-only hash aggregates with multiple distincts and non-distinct functions |
#656 | [BUG] integration tests produce hive metadata files |
#607 | [BUG] Fix misleading "cannot run on GPU" warnings when AQE is enabled |
#630 | [BUG] GpuCustomShuffleReader metrics always show zero rows/batches output |
#643 | [BUG] race condition while registering a buffer and spilling at the same time |
#606 | [BUG] Multiple scans for same data source with TPC-DS query59 with delta format |
#626 | [BUG] parquet_test showing leaked memory buffer |
#155 | [BUG] Incorrect output from averages with filters in partial only mode |
#277 | [BUG] HashAggregateSuite failure when AQE is enabled |
#276 | [BUG] GpuCoalesceBatchSuite failure when AQE is enabled |
#598 | [BUG] Non-deterministic output from MapOutputTracker.getStatistics() with AQE on GPU |
#192 | [BUG] test_read_merge_schema fails on Databricks |
#341 | [BUG] Document compression formats for readers/writers |
#587 | [BUG] Spark3.1 changed FileScan which means our GpuScans need to be added to shim layer |
#362 | [BUG] Implement getReaderForRange in the RapidsShuffleManager |
#528 | [BUG] HashAggregateSuite "Avg Distinct with filter" no longer valid when testing against Spark 3.1.0 |
#416 | [BUG] Fix Spark 3.1.0 integration tests |
#556 | [BUG] NPE when removing shuffle |
#553 | [BUG] GpuColumnVector build warnings from raw type access |
#492 | [BUG] Re-enable AQE integration tests |
#275 | [BUG] TpchLike query 2 fails when AQE is enabled |
#508 | [BUG] GpuUnion publishes metrics on the UI that are all 0 |
#269 | Needed to add --conf spark.driver.extraClassPath= |
#473 | [BUG] PartMerge:countDistinct:sum fails sporadically |
#531 | [BUG] Temporary RMM workaround needs to be removed |
#532 | [BUG] NPE when enabling shuffle manager |
#525 | [BUG] GpuFilterExec reports incorrect nullability of output in some cases |
#483 | [BUG] Multiple scans for the same parquet data source |
#382 | [BUG] Spark3.1 StringFallbackSuite regexp_replace null cpu fall back test fails. |
#489 | [FEA] Fix Spark 3.1 GpuHashJoin since it now requires CodegenSupport |
#441 | [BUG] test_broadcast_nested_loop_join_special_case fails on databricks |
#347 | [BUG] Failed to read Parquet file generated by GPU-enabled Spark. |
#433 | InSet operator produces an error for Strings |
#144 | [BUG] spark.sql.legacy.parquet.datetimeRebaseModeInWrite is ignored |
#323 | [BUG] GpuBroadcastNestedLoopJoinExec can fail if there are no columns |
#356 | [BUG] Integration cache test for BroadcastNestedLoopJoin failure |
#280 | [BUG] Full Outer Join does not work on nullable keys |
#149 | [BUG] Spark driver fails to load native libs when running on node without CUDA |
#826 | Fix link to cudf-0.15-cuda11.jar |
#815 | Update documentation for Scala UDFs in 0.2 since you need two things |
#802 | Update 0.2 CHANGELOG |
#793 | Update Jenkins scripts for release |
#798 | Fix shims provider override config not being seen by executors |
#785 | Make shuffle run on CPU if we do a join where we read from bucketed table |
#765 | Add config to override shims provider class |
#759 | Add CHANGELOG for release 0.2 |
#758 | Skip the udf test that fails periodically. |
#752 | Fix snapshot plugin jar version in docs |
#751 | Correct the channel for cudf installation |
#754 | Filter nulls from joins where possible to improve performance |
#732 | Add a timeout for RapidsShuffleIterator to prevent jobs to hang infin… |
#637 | Documentation changes for 0.2 release |
#747 | Disable udf tests that fail periodically |
#745 | Revert Null Join Filter |
#741 | Fix issue with parquet partitioned reads |
#733 | Remove GPU Types from github |
#720 | Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled |
#729 | Fix collect time metric in CoalesceBatches |
#640 | Support running Pandas UDFs on GPUs in Python processes. |
#721 | Add some more checks to databricks build scripts |
#714 | Move spark 3.0.1-shims out of snapshot-shims |
#711 | fix blossom checkout repo |
#709 | [BUG] fix unexpected indentation issue in blossom yml |
#642 | Init workflow for blossom-ci |
#705 | Enable configuration check for cast string to timestamp |
#702 | Update slack channel for Jenkins builds |
#701 | fix checkout-ref for automerge |
#695 | Fix spark-3.0.1 shim to be released |
#668 | refactor automerge to support merge for protected branch |
#687 | Include the UDF compiler in the dist jar |
#689 | Change shims dependency to spark-3.0.1 |
#677 | Use multi-threaded parquet read with small files |
#638 | Add Parquet-based cache serializer |
#613 | Enable UCX + AQE |
#684 | Enable test for literal string values in a select |
#686 | Remove sorts when replacing sort aggregate if possible |
#675 | Added TimeAdd |
#645 | [window] Add GpuWindowExec requiredChildOrdering |
#676 | fixUpJoinConsistency rule now works when AQE is enabled |
#683 | Fix issues with canonicalization of WrappedAggFunction |
#682 | Fix path to start-slave.sh script in docs |
#673 | Increase build timeouts on nightly and premerge builds |
#648 | add signoff-check use github actions |
#593 | Add support for isNaN and datetime related instructions in UDF compiler |
#666 | [window] Disable GPU for COUNT(exp) queries |
#655 | Implement AQE unit test for InsertAdaptiveSparkPlan |
#614 | Fix for aggregation with multiple distinct and non distinct functions |
#657 | Fix verify build after integration tests are run |
#660 | Add in neverReplaceExec and several rules for it |
#639 | BooleanType test shouldn't xfail |
#652 | Mark UVM config as internal until supported |
#653 | Move to the cudf-0.15 release |
#647 | Improve warnings about AQE nodes not supported on GPU |
#646 | Stop reporting zero metrics for GpuCustomShuffleReader |
#644 | Small fix for race in catalog where a buffer could get spilled while … |
#623 | Fix issues with canonicalization |
#599 | [FEA] changelog generator |
#563 | cudf and spark version info in artifacts |
#633 | Fix leak if RebaseHelper throws during Parquet read |
#632 | Copy function isSearchableType from Spark because signature changed in 3.0.1 |
#583 | Add udf compiler unit tests |
#617 | Documentation updates for branch 0.2 |
#616 | Add config to reserve GPU memory |
#612 | [REVIEW] Fix incorrect output from averages with filters in partial only mode |
#609 | fix minor issues with instructions for building ucx |
#611 | Added in profile to enable shims for SNAPSHOT releases |
#595 | Parquet small file reading optimization |
#582 | fix #579 Auto-merge between branches |
#536 | Add test for skewed join optimization when AQE is enabled |
#603 | Fix data size metric always 0 when using RAPIDS shuffle |
#600 | Fix calculation of string data for compressed batches |
#597 | Remove the xfail for parquet test_read_merge_schema on Databricks |
#591 | Add ucx license in NOTICE-binary |
#596 | Add Spark 3.0.2 to Shim layer |
#594 | Filter nulls from joins where possible to improve performance. |
#590 | Move GpuParquetScan/GpuOrcScan into Shim |
#588 | xfail the tpch spark 3.1.0 tests that fail |
#572 | Update buffer store to return compressed batches directly, add compression NVTX ranges |
#558 | Fix unit tests when AQE is enabled |
#580 | xfail the Spark 3.1.0 integration tests that fail |
#565 | Minor improvements to TPC-DS benchmarking code |
#567 | Explicitly disable AQE in one test |
#571 | Fix Databricks shim layer for GpuFileSourceScanExec and GpuBroadcastExchangeExec |
#564 | Add GPU decode time metric to scans |
#562 | getCatalog can be called from the driver, and can return null |
#555 | Fix build warnings for ColumnViewAccess |
#560 | Fix databricks build for AQE support |
#557 | Fix tests failing on Spark 3.1 |
#547 | Add GPU metrics to GpuFileSourceScanExec |
#462 | Implement optimized AQE support so that exchanges run on GPU where possible |
#550 | Document Parquet and ORC compression support |
#539 | Update script to audit multiple Spark versions |
#543 | Add metrics to GpuUnion operator |
#549 | Move spark shim properties to top level pom |
#497 | Add UDF compiler implementations |
#487 | Add framework for batch compression of shuffle partitions |
#544 | Add in driverExtraClassPath for standalone mode docs |
#546 | Fix Spark 3.1.0 shim build error in GpuHashJoin |
#537 | Use fresh SparkSession when capturing to avoid late capture of previous query |
#538 | Revert "Temporary workaround for RMM initial pool size bug (#530)" |
#517 | Add config to limit maximum RMM pool size |
#527 | Add support for split and getArrayIndex |
#534 | Fixes bugs around GpuShuffleEnv initialization |
#529 | [BUG] Degenerate table metas were not getting copied to the heap |
#530 | Temporary workaround for RMM initial pool size bug |
#526 | Fix bug with nullability reporting in GpuFilterExec |
#521 | Fix typo with databricks shim classname SparkShimServiceProvider |
#522 | Use SQLConf instead of SparkConf when looking up SQL configs |
#518 | Fix init order issue in GpuShuffleEnv when RAPIDS shuffle configured |
#514 | Added clarification of RegExpReplace, DateDiff, made descriptive text consistent |
#506 | Add in basic support for running tpcds like queries |
#504 | Add ability to ignore tests depending on spark shim version |
#503 | Remove unused async buffer spill support |
#501 | disable codegen in 3.1 shim for hash join |
#466 | Optimize and fix Api validation script |
#481 | Codeowners |
#439 | Check a PR has been committed using git signoff |
#319 | Update partitioning logic in ShuffledBatchRDD |
#491 | Temporarily ignore AQE integration tests |
#490 | Fix Spark 3.1.0 build for HashJoin changes |
#482 | Prevent bad practice in python tests |
#485 | Show plan in assertion message if test fails |
#480 | Fix link from README to getting-started.md |
#448 | Preliminary support for keeping broadcast exchanges on GPU when AQE is enabled |
#478 | Fall back to CPU for binary as string in parquet |
#477 | Fix special case joins in broadcast nested loop join |
#469 | Update HashAggregateSuite to work with AQE |
#475 | Udf compiler pom followup |
#434 | Add UDF compiler skeleton |
#474 | Re-enable noscaladoc check |
#461 | Fix comments style to pass scala style check |
#468 | fix broken link |
#456 | Add closeOnExcept to clean up code that closes resources only on exceptions |
#464 | Turn off noscaladoc rule until codebase is fixed |
#449 | Enforce NoScalaDoc rule in scalastyle checks |
#450 | Enable scalastyle for shuffle plugin |
#451 | Databricks remove unneeded files and fix build to not fail on rm when file missing |
#442 | Shim layer support for Spark 3.0.0 Databricks |
#447 | Add scalastyle plugin to shim module |
#426 | Update BufferMeta to support multiple codec buffers per table |
#440 | Run mortgage test both with AQE on and off |
#445 | Added in StringRPad and StringLPad |
#422 | Documentation updates |
#437 | Fix bug with InSet and Strings |
#435 | Add in checks for Parquet LEGACY date/time rebase |
#432 | Fix batch use-after-close in partitioning, shuffle env init |
#423 | Fix duplicate includes in assembly jar |
#418 | CI Add unit tests running for Spark 3.0.1 |
#421 | Make it easier to run TPCxBB benchmarks from spark shell |
#413 | Fix download link |
#414 | Shim Layer to support multiple Spark versions |
#406 | Update cast handling to deal with new libcudf casting limitations |
#405 | Change slave->worker |
#395 | Databricks doc updates |
#401 | Extended the FAQ |
#398 | Add tests for GpuPartition |
#352 | Change spark tgz package name |
#397 | Fix small bug in ShuffleBufferCatalog.hasActiveShuffle |
#286 | [REVIEW] Updated join tests for cache |
#393 | Contributor license agreement |
#389 | Added in support for RangeExec |
#390 | Ucx getting started |
#391 | Hide slack channel in Jenkins scripts |
#387 | Remove the term whitelist |
#365 | [REVIEW] Timesub tests |
#383 | Test utility to compare SQL query results between CPU and GPU |
#380 | Fix databricks notebook link |
#378 | Added in FAQ and fixed spelling |
#377 | Update heading in configs.md |
#373 | Modifying branch name to conform with rapidsai branch name change |
#376 | Add our session extension correctly if there are other extensions configured |
#374 | Fix rat issue for notebooks |
#364 | Update Databricks patch for changes to GpuSortMergeJoin |
#371 | fix typo and use regional bucket per GCP's update |
#359 | Karthik changes |
#353 | Fix broadcast nested loop join for the no column case |
#313 | Additional tests for broadcast hash join |
#342 | Implement build-side rules for shuffle hash join |
#349 | Updated join code to treat null equality properly |
#335 | Integration tests on spark 3.0.1-SNAPSHOT & 3.1.0-SNAPSHOT |
#346 | Update the Title Header for Fine Tuning |
#344 | Fix small typo in readme |
#331 | Adds iterator and client unit tests, and prepares for more fetch failure handling |
#337 | Fix Scala compile phase to allow Java classes referencing Scala classes |
#332 | Match GPU overwritten functions with SQL functions from FunctionRegistry |
#339 | Fix databricks build |
#338 | Move GpuPartitioning to a separate file |
#310 | Update release Jenkinsfile for Databricks |
#330 | Hide private info in Jenkins scripts |
#324 | Add in basic support for GpuCartesianProductExec |
#328 | Enable slack notification for Databricks build |
#321 | update databricks patch for GpuBroadcastNestedLoopJoinExec |
#322 | Add oss.sonatype.org to download the cudf jar |
#320 | Don't mount passwd/group to the container |
#258 | Enable running TPCH tests with AQE enabled |
#318 | Build docker image with Dockerfile |
#309 | Update databricks patch to latest changes |
#312 | Trigger branch-0.2 integration test |
#307 | [Jenkins] Update the release script and Jenkinsfile |
#304 | [DOC][Minor] Fix typo in spark config name. |
#303 | Update compatibility doc for -0.0 issues |
#301 | Add info about branches in README.md |
#296 | Added in basic support for broadcast nested loop join |
#297 | Databricks CI improvements and support runtime env parameter to xfail certain tests |
#292 | Move artifacts version in version-def.sh |
#254 | Cleanup QA tests |
#289 | Clean up GpuCollectLimitMeta and add in metrics |
#287 | Add in support for right join and fix issues with build right |
#273 | Added releases to the README.md |
#285 | modify run_pyspark_from_build.sh to be bash 3 friendly |
#281 | Add in support for Full Outer Join on non-null keys |
#274 | Add RapidsDiskStore tests |
#259 | Add RapidsHostMemoryStore tests |
#282 | Update Databricks patch for 0.2 branch |
#261 | Add conditional xfail test for DISTINCT aggregates with NaN |
#263 | More time ops |
#256 | Remove special cases for contains, startsWith, and endsWith |
#253 | Remove GpuAttributeReference and GpuSortOrder |
#271 | Update the versions for 0.2.0 properly for the databricks build |
#162 | Integration tests for corner cases in window functions. |
#264 | Add a local mvn repo for nightly pipeline |
#262 | Refer to branch-0.2 |
#255 | Revert change to make dependencies of shaded jar optional |
#257 | Fix link to RAPIDS cudf in index.md |
#252 | Update to 0.2.0-SNAPSHOT and cudf-0.15-SNAPSHOT |
#74 | [FEA] Support ToUnixTimestamp |
#21 | [FEA] NormalizeNansAndZeros |
#105 | [FEA] integration tests for equi-joins |
#116 | [BUG] calling replace with a NULL throws an exception |
#168 | [BUG] GpuUnitTests Date tests leak column vectors |
#209 | [BUG] Developers section in pom need to be updated |
#204 | [BUG] Code coverage docs are out of date |
#154 | [BUG] Incorrect output from partial-only averages with nulls |
#61 | [BUG] Cannot disable Parquet, ORC, CSV reading when using FileSourceScanExec |
#249 | Compatability -> Compatibility |
#247 | Add index.md for default doc page, fix table formatting for configs |
#241 | Set default branch to master per the release rule |
#177 | Fixed leaks in unit test and use ColumnarBatch for testing |
#243 | Jenkins file for Databricks release |
#225 | Make internal project dependencies optional for shaded artifact |
#242 | Add site pages |
#221 | Databricks Build Support |
#215 | Remove CudfColumnVector |
#213 | Add RapidsDeviceMemoryStore tests |
#214 | [REVIEW] Test failure to pass Attribute as GpuAttribute |
#211 | Add project leads to pom developer list |
#210 | Updated coverage docs |
#195 | Support public release for plugin jar |
#208 | Remove unneeded comment from pom.xml |
#191 | WindowExec handle different spark distributions |
#181 | Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized |
#196 | Update Spark dependency to the released 3.0.0 artifacts |
#206 | Change groupID to 'com.nvidia' in IT scripts |
#202 | Fixed issue for contains when searching for an empty string |
#201 | Fix name of scan |
#200 | Fix issue with GpuAttributeReference not overriding references |
#197 | Fix metrics for writes |
#186 | Fixed issue with nullability on concat |
#193 | Add RapidsBufferCatalog tests |
#188 | rebrand to com.nvidia instead of ai.rapids |
#189 | Handle AggregateExpression having resultIds parameter instead of a single resultId |
#190 | FileSourceScanExec can have logicalRelation parameter on some distributions |
#185 | Update type of parameter of GpuExpandExec to make it consistent |
#172 | Merge qa test to integration test |
#180 | Add MetaUtils unit tests |
#171 | Cleanup scaladoc warnings about missing links |
#176 | Updated join tests to cover more data. |
#169 | Remove dependency on shaded Spark artifact |
#174 | Added in fallback tests |
#165 | Move input metadata tests to pyspark |
#173 | Fix setting local mode for tests |
#160 | Integration tests for normalizing NaN/zeroes. |
#163 | Ignore the order locally for repartition tests |
#157 | Add partial and final only hash aggregate tests and fix nulls corner case for Average |
#159 | Add integration tests for joins |
#158 | Orc merge schema fallback and FileScan format configs |
#164 | Fix compiler warnings |
#152 | Moved cudf to 0.14 for CI |
#151 | Switch CICD pipelines to Github |