Changes from all commits (826 commits)
a72fd41
[SPARK-52394][PS] Fix autocorr divide-by-zero error under ANSI mode
xinrong-meng Aug 7, 2025
46fd525
[SPARK-53162][SQL][TESTS] Mark `DynamicPartitionPruningHive*Suite*` a…
dongjoon-hyun Aug 7, 2025
7214029
[SPARK-53165][CORE] Add `SparkExitCode.CLASS_NOT_FOUND`
dongjoon-hyun Aug 7, 2025
7e7f967
[SPARK-53166][CORE] Use `SparkExitCode.EXIT_FAILURE` in `SparkPipelin…
dongjoon-hyun Aug 7, 2025
f62724d
[SPARK-52828][SQL] Make hashing for collated strings collation agnostic
uros-db Aug 7, 2025
88dbe42
[SPARK-53066][SQL] Improve EXPLAIN output for DSv2 Join pushdown
PetarVasiljevic-DB Aug 7, 2025
4430cd8
[SPARK-53164][CORE][K8S][DSTREAM] Use Java `Files.readAllBytes` inste…
dongjoon-hyun Aug 7, 2025
598f310
[SPARK-53170][CORE] Improve `SparkUserAppException` to have `cause` p…
dongjoon-hyun Aug 7, 2025
89973d0
[SPARK-53171][CORE] Improvement UTF8String repeat
yaooqinn Aug 7, 2025
71afa46
[SPARK-53163][PYTHON][INFRA] Upgrade PyArrow to 21.0.0
zhengruifeng Aug 7, 2025
2420fe9
[SPARK-53167][DEPLOY] Spark launcher isRemote also respects propertie…
pan3793 Aug 7, 2025
3aa8c9d
[SPARK-53155][SQL] Global lower agggregation should not be replaced w…
viirya Aug 7, 2025
cffb8f6
[SPARK-53141][CORE] Add APIs to get overhead memory size and offheap …
PHILO-HE Aug 7, 2025
dd925c0
[SPARK-53168][CORE][TESTS] Change default value of the input paramete…
LuciferYang Aug 7, 2025
a609db3
[SPARK-53169][SQL] Remove comments related to "`Set the logger level …
LuciferYang Aug 7, 2025
b968d2a
[SPARK-53177][K8S] Use Java `Base64` instead of `com.google.common.io…
dongjoon-hyun Aug 7, 2025
84858f3
[SPARK-52215][PYTHON][TESTS][FOLLOW-UP] Fix `test_arrow_udf_output_ne…
zhengruifeng Aug 7, 2025
1068413
[SPARK-53179][CORE][TESTS] Use `SparkStreamUtils.toString` instead of…
dongjoon-hyun Aug 7, 2025
5d295c8
[SPARK-53178][BUILD] Upgrade `curator` to 5.9.0
dongjoon-hyun Aug 7, 2025
c77f316
[SPARK-53180][CORE] Use Java `InputStream.skipNBytes` instead of `Byt…
dongjoon-hyun Aug 7, 2025
44fd358
[SPARK-53185][CORE][YARN][TESTS] Use `SparkStreamUtils.toString` inst…
dongjoon-hyun Aug 8, 2025
6b1f1a6
[SPARK-52976][PYTHON] Fix Python UDF not accepting collated string as…
ilicmarkodb Aug 8, 2025
69b45c6
[SPARK-53183][SQL] Use Java `Files.readString` instead of `o.a.s.sql.…
dongjoon-hyun Aug 8, 2025
76d2878
[SPARK-53188][CORE][SQL] Support `readFully` in `SparkStreamUtils` an…
dongjoon-hyun Aug 8, 2025
8b80ea0
[SPARK-53190][CORE] Use Java `InputStream.transferTo` instead of `Byt…
dongjoon-hyun Aug 8, 2025
96edcce
[SPARK-53191][CORE][SQL][MLLIB][YARN] Use Java `InputStream.readAllBy…
dongjoon-hyun Aug 8, 2025
f7eee4e
[SPARK-53194][INFRA] Set -XX:ErrorFile to build/target directory for …
yaooqinn Aug 8, 2025
8eaad00
[SPARK-53195][CORE] Use Java `InputStream.readNBytes` instead of `Byt…
dongjoon-hyun Aug 8, 2025
50948e4
[SPARK-53196][CORE] Use Java `OutputStream.nullOutputStream` instead …
dongjoon-hyun Aug 8, 2025
5cab612
[SPARK-53199][SQL][TESTS] Use Java `Files.copy` instead of `com.googl…
dongjoon-hyun Aug 8, 2025
16b70ba
[SPARK-53200][CORE] Use Java `Files.newInputStream` instead of `Files…
dongjoon-hyun Aug 8, 2025
bc44933
[SPARK-53202][SQL][TESTS] Use `SparkFileUtils.touch` instead of `File…
dongjoon-hyun Aug 8, 2025
87af200
[SPARK-53201][CORE] Use `SparkFileUtils.contentEquals` instead of `Fi…
dongjoon-hyun Aug 8, 2025
babc78d
[SPARK-53206][CORE] Use `SparkFileUtils.move` instead of `com.google.…
dongjoon-hyun Aug 8, 2025
5c99c2b
[SPARK-53205][CORE][SQL] Support `createParentDirs` in `SparkFileUtils`
dongjoon-hyun Aug 8, 2025
f2568a8
[SPARK-53208][SQL][TESTS] Use `Hex.unhex` instead of `o.a.commons.cod…
dongjoon-hyun Aug 8, 2025
d3f2054
[SPARK-53181][PS] Enable doc tests under ANSI
xinrong-meng Aug 8, 2025
ccff998
[SPARK-53210][CORE][SQL][DSTREAM][YARN] Use Java `Files.write(String)…
dongjoon-hyun Aug 9, 2025
a8b8381
[SPARK-53197][CORE][SQL] Use `java.util.Objects#requireNonNull` inste…
LuciferYang Aug 9, 2025
4693e09
[SPARK-53218][BUILD] Upgrade `bouncycastle` to 1.81
dongjoon-hyun Aug 9, 2025
f251007
[SPARK-53211][TESTS] Ban `com.google.common.io.Files`
dongjoon-hyun Aug 9, 2025
e36f9de
[SPARK-53213][CORE][SQL][K8S] Use Java `Base64` instead of `Base64.(d…
dongjoon-hyun Aug 9, 2025
423bca6
[SPARK-53214][CORE][SQL][K8S] Use Java `HexFormat` instead of `Hex.en…
dongjoon-hyun Aug 10, 2025
4838f99
[SPARK-53216][CORE] Move `is*(Blank|Empty)` from `object SparkStringU…
dongjoon-hyun Aug 10, 2025
8f843a9
[SPARK-53215][CORE][TESTS] Use `JavaUtils.listFiles` in `CleanupNonSh…
dongjoon-hyun Aug 10, 2025
e6e79b0
[SPARK-53217][CORE][DSTREAM] Use Java `Set.of` instead of `Sets.newHa…
dongjoon-hyun Aug 10, 2025
d5905af
[SPARK-53227][SQL][TESTS] Use Java `HashMap.equals` instead of `Maps.…
dongjoon-hyun Aug 10, 2025
0e34870
[SPARK-53220][BUILD] Upgrade `dev.ludovic.netlib` to 3.0.4
dongjoon-hyun Aug 10, 2025
cb28e2c
[SPARK-53219][BUILD] Upgrade `Dropwizard` metrics to 4.2.33
dongjoon-hyun Aug 10, 2025
0b0944c
[SPARK-53222][BUILD] Upgrade `commons-compress` to 1.28.0
dongjoon-hyun Aug 10, 2025
69a65b9
[SPARK-53223][BUILD] Upgrade `jersey` to 3.0.18
dongjoon-hyun Aug 10, 2025
e994667
[SPARK-53228][CORE][SQL] Use Java `Map` constructors instead of `Maps…
dongjoon-hyun Aug 10, 2025
ca19902
[SPARK-53229][CORE][SQL][EXAMPLES][TESTS] Use Java `Map.of` instead o…
dongjoon-hyun Aug 10, 2025
99b45b3
[SPARK-53221][BUILD] Upgrade `commons-codec` to 1.19.0
dongjoon-hyun Aug 10, 2025
08fe88a
[SPARK-53224][BUILD] Upgrade `joda-time` to 2.14.0
dongjoon-hyun Aug 10, 2025
3bdcb2c
[SPARK-53231][TESTS] Ban `com.google.common.collect.Sets`
dongjoon-hyun Aug 10, 2025
f4d1012
[SPARK-53232][SQL][TESTS] Use Java `Map.copyOf` instead of `Immutable…
dongjoon-hyun Aug 10, 2025
5444354
[SPARK-53234][CORE][SQL][MLLIB][YARN] Use `java.util.Objects` instead…
LuciferYang Aug 10, 2025
6b71b22
[SPARK-53235][CORE][TESTS] Use Java `Set.of` instead of `ImmutableSet…
LuciferYang Aug 10, 2025
e915312
[SPARK-53192][CONNECT] Always cache a DataSource in the Spark Connect…
dillitz Aug 11, 2025
6eb0780
[SPARK-53237][SQL] Use Java `Base64` instead of `org.apache.commons.c…
dongjoon-hyun Aug 11, 2025
7d96524
[SPARK-53239][SQL] Improve `MapSort` and `SortArray` performance via …
dongjoon-hyun Aug 11, 2025
ba14965
[SPARK-53236][CORE][EXAMPLE] Use Java `ArrayList` constructors instea…
dongjoon-hyun Aug 11, 2025
d2dc055
[SPARK-53238][CORE][TESTS] Improve `SorterBenchmark` to include `Arra…
dongjoon-hyun Aug 11, 2025
3c4c2ee
[SPARK-53240][SQL] Ban `com.google.common.collect.(ArrayList)?Multimap`
dongjoon-hyun Aug 11, 2025
027d5c7
[SPARK-52996][TESTS] Update brace-expansion to 1.1.12
eschcam Aug 11, 2025
f9093f2
[SPARK-53147][SQL] Log generated JDBC query in JDBC connector
urosstan-db Aug 11, 2025
221e076
[SPARK-53241][CORE] Support `createArray` in `SparkCollectionUtils`
dongjoon-hyun Aug 11, 2025
b7ada56
[MINOR][PYTHON] Remove two obsolete TODO items
zhengruifeng Aug 11, 2025
1bc8ce0
[SPARK-53244][SQL] Don't store dual-run enabled and tentative mode en…
mihailoale-db Aug 11, 2025
73f8a84
[SPARK-53184][PS] `melt` when "value" has MultiIndex column labels
xinrong-meng Aug 11, 2025
8d5e602
[SPARK-53242][CORE][DSTREAM] Move `stackTraceToString` to `JavaUtils`…
LuciferYang Aug 11, 2025
66ff752
[SPARK-52008][FOLLOWUP] Fixing StateStoreCoordinator `warn` compilati…
ericm-db Aug 11, 2025
0c0fd94
[SPARK-53247][CORE][SQL][MLLIB][TESTS] Use `createArray` for large te…
dongjoon-hyun Aug 11, 2025
b82957c
[SPARK-53176][DEPLOY] Spark launcher should respect `--load-spark-def…
pan3793 Aug 11, 2025
7fe2f5e
[SPARK-53248][CORE] Support `checkedCast` in `JavaUtils`
dongjoon-hyun Aug 11, 2025
7c2c84a
[SPARK-53243][PYTHON][SQL] List the supported eval types in arrow nodes
zhengruifeng Aug 12, 2025
12e700c
[SPARK-53249][INFRA] Run `build_maven_java21_arm.yml` every two days
dongjoon-hyun Aug 12, 2025
3f9917a
[SPARK-53250][BUILD] Remove unused `Guava` dependency from `unsafe` m…
dongjoon-hyun Aug 12, 2025
1ae3d68
[SPARK-53252][TESTS] Use Java `IntStream` instead of `ParSeq` in `Col…
dongjoon-hyun Aug 12, 2025
07c85a5
[SPARK-53253][PYTHON] Fix register UDF of type `SQL_SCALAR_ARROW_ITER…
zhengruifeng Aug 12, 2025
6537153
[SPARK-53138][CORE][BUILD] Split common-utils Java code into a new mo…
pan3793 Aug 12, 2025
b248ba5
[SPARK-53255][SQL] Ban `org.apache.parquet.Preconditions`
dongjoon-hyun Aug 12, 2025
df3d8e4
[SPARK-53256][CORE] Promote `check(Argument|State)` to `JavaUtils`
dongjoon-hyun Aug 12, 2025
c910667
[SPARK-52482][SQL][CORE] Improve exception handling for reading certa…
mzhang Aug 12, 2025
96a4f50
[SPARK-53074][SQL] Avoid partial clustering in SPJ to meet a child's …
chirag-s-db Aug 12, 2025
69185dc
[SPARK-53258][CORE][SQL] Use `JavaUtils`'s `check(Argument|State)`
dongjoon-hyun Aug 12, 2025
d8dcfe7
[SPARK-53233][SQL][SS][MLLIB][CONNECT] Make the code related to `stre…
LuciferYang Aug 12, 2025
205ed98
[SPARK-53259][PYTHON] Correct the message for INVALID_UDF_EVAL_TYPE
zhengruifeng Aug 12, 2025
554f6b6
[SPARK-53246][TEST] remove class files for ReplSuite
cloud-fan Aug 12, 2025
d2b4966
[SPARK-52981][PYTHON] Add table argument support for Arrow Python UDTFs
allisonwang-db Aug 12, 2025
04a2d00
[SPARK-53110][SQL][PYTHON][CONNECT] Implement the time_trunc function…
uros-db Aug 12, 2025
19c8f90
[SPARK-53257][PYTHON][TESTS] Deduplicate have_graphviz and graphviz_r…
zhengruifeng Aug 12, 2025
2fef901
[MINOR][DOCS] Updated the docstring of DataStreamWriter.foreach() method
nagaarjun-p Aug 13, 2025
68fdc9b
[SPARK-53263][PYTHON] Support TimeType in df.toArrow
zhengruifeng Aug 13, 2025
aa1f7f1
[SPARK-53261][CORE][SQL] Use Java `String.join|StringJoiner` instead …
LuciferYang Aug 13, 2025
645ed16
[SPARK-53173][SQL][TESTS] Improve `Owner` regex pattern in the replac…
wangyum Aug 13, 2025
8163fa4
[SPARK-53124][SQL] Prune unnecessary fields from JsonTuple
wangyum Aug 13, 2025
ebf4dd1
[MINOR][PYTHON][TESTS] Use different temp table name in foreachBatch …
HyukjinKwon Aug 13, 2025
b3f12af
[SPARK-53266][TESTS] Regenerate benchmark results
dongjoon-hyun Aug 13, 2025
5c52a00
[SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables…
asl3 Aug 13, 2025
2d4f7c3
Revert "[SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercio…
zhengruifeng Aug 13, 2025
cf43735
[SPARK-53267][DOCS] Update the javadoc for Arrow UDF physical plans
zhengruifeng Aug 13, 2025
a656596
[MINOR][PYTHON][DOCS] Update an UDF example with specified eval type
zhengruifeng Aug 13, 2025
72c1d41
[SPARK-53270][SQL][TESTS] Disable oracle datetime pushdown tests in n…
dengziming Aug 13, 2025
43f650e
[SPARK-52844][PYTHON] Update numpy to 1.22
eschcam Aug 13, 2025
977bc7c
[SPARK-53106][SS] Add schema evolution tests for TWS Scala spark conn…
zeruibao Aug 13, 2025
14f3004
[SPARK-53271][PYTHON][INFRA] Show Python Versions in PySpark Jobs
zhengruifeng Aug 13, 2025
9297712
[SPARK-53272][SQL] Refactor SPJ pushdown logic out of BatchScanExec
chirag-s-db Aug 14, 2025
81850af
[SPARK-53278][INFRA] Improve `merge_spark_pr.py` to accept PR numbers…
dongjoon-hyun Aug 14, 2025
b7a9b42
[SPARK-53277][INFRA] Improve `merge_spark_pr.py` to stop early in cas…
dongjoon-hyun Aug 14, 2025
4aa3a36
[SPARK-53279][INFRA] Improve `determine_modules_for_files` to ignore …
dongjoon-hyun Aug 14, 2025
ca75a0e
[SPARK-53269][PYTHON][TESTS] Centralize connect dependency checks
zhengruifeng Aug 14, 2025
e20c21a
Revert "[SPARK-53277][INFRA] Improve `merge_spark_pr.py` to stop earl…
dongjoon-hyun Aug 14, 2025
a3a394e
[SPARK-53280][CORE] Use Java `instanceof` instead of `Throwables.thro…
dongjoon-hyun Aug 14, 2025
eba5381
[MINOR][DOCS] Fix an Arrow UDF example
zhengruifeng Aug 14, 2025
bdc4243
[SPARK-49984][CORE] Fix `supplementJava(Module|IPv6)Options` to updat…
Kimahriman Aug 14, 2025
0289833
[SPARK-52988][SQL] Fix race conditions at CREATE TABLE and FUNCTION w…
attilapiros Aug 14, 2025
2f27838
[SPARK-53276][SS] Checking if we own the stamp before closing RocksDB
ericm-db Aug 14, 2025
d453902
[SPARK-53050][PS] Enable MultiIndex.to_series() to return struct for …
xinrong-meng Aug 14, 2025
7f3c704
[SPARK-53269][PYTHON][FOLLOWUP] Fix GRPC and GRPCStatus check
dongjoon-hyun Aug 14, 2025
4544090
[SPARK-53284][PS] Adjust imports of Spark config in tests
xinrong-meng Aug 15, 2025
e5ed226
[SPARK-53285][INFRA] Run `Java 17/25` Maven install tests if necessary
dongjoon-hyun Aug 15, 2025
9414e46
[SPARK-53251][PYTHON] Enable DataFrame API testing with asTable() for…
allisonwang-db Aug 15, 2025
4b72478
[SPARK-53282][PYTHON][TESTS] Add test for arrow udf type hints
zhengruifeng Aug 15, 2025
337a67f
[SPARK-52741][SQL] RemoveFiles ShuffleCleanup mode doesnt work with n…
karuppayya Aug 15, 2025
cd8fdbc
[SPARK-53274][SQL] Support left and right join pushdown in JDBCScanBu…
PetarVasiljevic-DB Aug 15, 2025
f983940
[SPARK-52998][CORE] Multiple variables inside declare
TeodorDjelic Aug 15, 2025
efd7c85
[SPARK-53268][BUILD][TESTS] Update Oracle free version from 23.7 to 23.9
LucaCanali Aug 15, 2025
959f424
Revert "[SPARK-49872][CORE] allow unlimited json size again"
cloud-fan Aug 15, 2025
c68cf94
[SPARK-53290][SQL][CONNECT] Fix Metadata backward-compatibility breaking
yaooqinn Aug 15, 2025
923d70f
[SPARK-52307][PYTHON][FOLLOW-UP] Fix type hint for Scalar Arrow Itera…
zhengruifeng Aug 15, 2025
fd77ec6
[SPARK-53291][SQL] Fix nullability for value column
cashmand Aug 16, 2025
7831671
[SPARK-53297][SDP] Fix StreamingTable Declarative Pipelines API docst…
calilisantos Aug 18, 2025
87fc2ff
[SPARK-53299][INFRA] Rebalance the test modules of pandas API on connect
zhengruifeng Aug 18, 2025
2be3e54
[SPARK-53300][PYTHON][TESTS] Fix field names in test_unpivot
zhengruifeng Aug 18, 2025
59556b1
[SPARK-53302][PYTHON][TESTS] Make doctest of df.unpivot deterministic
zhengruifeng Aug 18, 2025
9e06a50
[SPARK-51920][SS][PYTHON] Fix composite/nested type in value state fo…
zeruibao Aug 18, 2025
c86093f
[SPARK-53146][CONNECT][SQL] Make MergeIntoTable in SparkConnectPlanne…
heyihong Aug 18, 2025
05101e9
[SPARK-53304][BUILD] Upgrade commons-text to 1.14.0
LuciferYang Aug 18, 2025
8ede68b
[SPARK-53307][CONNECT][CLIENT][PYTHON][SCALA] Remove RetriesExceeded …
khakhlyuk Aug 18, 2025
f951800
[SPARK-53305][PYTHON] Support TimeType in createDataFrame
zhengruifeng Aug 18, 2025
26dbf65
Revert "[SPARK-52709][SQL] Fix parsing of STRUCT<>"
cloud-fan Aug 18, 2025
b34b950
[SPARK-53306][SQL][CONNECT][YARN][TESTS] Fix wrong package statements
LuciferYang Aug 18, 2025
a58c5e1
[SPARK-53288][SS] Fix assertion error with streaming global limit
Aug 18, 2025
83e7b4f
[SPARK-53012][PYHTON] Support Arrow Python UDTF in Spark Connect
allisonwang-db Aug 18, 2025
7782a70
[SPARK-53301][PYTHON] Differentiate type hints of Pandas UDF and Arro…
zhengruifeng Aug 18, 2025
9f63d1d
[SPARK-53303][SS][CONNECT] Use the empty state encoder when the initi…
huanliwang-db Aug 18, 2025
076618a
[SPARK-49872][CORE] Remove jackson JSON string length limitation
cloud-fan Aug 19, 2025
9a62f7d
[SPARK-53311][SQL][PYTHON][CORE] Make PullOutNonDeterministic use can…
benhurdelhey Aug 19, 2025
2a5b097
[SPARK-53295][PS] Turn on ANSI by default for Pandas API on Spark
xinrong-meng Aug 19, 2025
d2e550f
[SPARK-53326][BUILD] Upgrade ORC Format to 1.1.1
williamhyun Aug 19, 2025
f41c538
[SPARK-52837][SQL][PYTHON][FOLLOW-UP] Specify the BitWidth of Arrow T…
zhengruifeng Aug 19, 2025
af0b444
[SPARK-53144][CONNECT][SQL] Make CreateViewCommand in SparkConnectPla…
heyihong Aug 19, 2025
77413d4
[SPARK-51874][SQL][FOLLOW-UP] Revert ParquetOptions rebase methods to…
cloud-fan Aug 19, 2025
9920b22
[SPARK-53030][PYTHON] Support Arrow writer for streaming Python data …
allisonwang-db Aug 19, 2025
5ce657f
[SPARK-53287][PS] Add ANSI Migration Guide
xinrong-meng Aug 20, 2025
4aa8b67
[SPARK-53331][PS] Re-enable SPARK_ANSI_SQL_MODE during doc generation
xinrong-meng Aug 20, 2025
594d26c
[SPARK-53015][BUILD] Upgrade log4j to 2.25.1
LuciferYang Aug 20, 2025
549c30a
[SPARK-52582][SQL] Improve the memory usage of XML parser
xiaonanyang-db Aug 20, 2025
967f2b6
[SPARK-53308][SQL] Don't remove aliases in RemoveRedundantAliases tha…
mihailoale-db Aug 20, 2025
ee619d3
[SPARK-53260][SQL] Reducing number of JDBC overhead connections creation
vanja-vujovic-db Aug 20, 2025
33df1b6
[SPARK-51874][SQL][FOLLOW-UP] Revert API changes of rebase methods in…
cloud-fan Aug 20, 2025
5660dba
[SPARK-53334][CONNECT] `LiteralValueProtoConverter` should keep the o…
zhengruifeng Aug 20, 2025
2e6f0ec
[MINOR][PYTHON][DOCS] Add TimeType to API reference
zhengruifeng Aug 20, 2025
44cdd26
[SPARK-52482][DOCS][FOLLOW-UP] Mention behavior changes in migration …
mzhang Aug 21, 2025
7e73d0e
[SPARK-53336][ML][CONNECT] Reset `MLCache.totalMLCacheSizeBytes` when…
WeichenXu123 Aug 21, 2025
12c87ce
[SPARK-53328][ML][CONNECT] Improve debuggability for SparkML-connect
WeichenXu123 Aug 21, 2025
8530444
[SPARK-53346][CONNECT] Avoid creating temporary collections in toCata…
zhengruifeng Aug 21, 2025
f0e8999
[SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables…
asl3 Aug 22, 2025
7007e1c
[SPARK-53345][SS][TESTS] Use withTempDir for consistent directory acr…
Aug 22, 2025
a322e0c
[SPARK-53348][SQL] Always persist ANSI value when creating a view or …
mihailoale-db Aug 22, 2025
1d84810
[SPARK-52873][SQL] Further restrict when SHJ semi/anti join can ignor…
bersprockets Aug 22, 2025
ce646b3
[SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 …
szehon-ho Aug 22, 2025
e2eb540
[SPARK-53103][SS] Throw an error if state directory is not empty when…
Aug 22, 2025
6ab0df9
[SPARK-53044] Change Declarative Pipelines import alias convention fr…
sryza Aug 22, 2025
dab3464
[SPARK-52982][PYTHON] Disallow lateral join with Arrow Python UDTFs
allisonwang-db Aug 22, 2025
c863717
[SPARK-53358] Improve arrow Python UDTF output type mismatch error me…
allisonwang-db Aug 22, 2025
e28f427
[SPARK-53360][SQL] Once strategy with ConstantFolding's idempotence s…
viirya Aug 23, 2025
2a01925
[SPARK-53354][CONNECT] Simplify LiteralValueProtoConverter.toCatalyst…
heyihong Aug 24, 2025
8178f88
[SPARK-53353][PYTHON] Fail Scalar Iterator Arrow UDF with 0-arg
zhengruifeng Aug 24, 2025
f5f590b
[SPARK-53359][PYTHON] Fix Arrow UDTF to handle the results as iterator
ueshin Aug 24, 2025
1ff987f
[SPARK-53352][PYTHON] Refine the error message for unsupported return…
zhengruifeng Aug 24, 2025
50a2ebe
[SPARK-53344][DOCS] Add user guide for Arrow Python UDTFs
allisonwang-db Aug 25, 2025
ef9322f
[SPARK-53357][PYTHON] Update `pandas` to 2.3.2
bjornjorgensen Aug 25, 2025
c13c10f
[SPARK-53362][ML][CONNECT] Fix IDFModel local loader bug
WeichenXu123 Aug 25, 2025
bc36a7d
[SPARK-53275][SQL] Handle stateful expressions when ordering in inter…
bersprockets Aug 25, 2025
f441da4
[SPARK-52110][SDP][SQL][FOLLOWUP] Move optionsClause to before tableA…
jackywang-db Aug 25, 2025
79a0ca7
[SPARK-53366][CONNECT] Apply formatting rules to sql/connect/shims
zhengruifeng Aug 26, 2025
b5840e1
[SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals
heyihong Aug 26, 2025
f0a3a2e
[SPARK-53349][SQL] Optimized XML parser can't handle corrupted files …
xiaonanyang-db Aug 26, 2025
5424514
[SPARK-53382][SQL] Fix rCTE bug with malformed recursion
Pajaraja Aug 26, 2025
fc1da93
[SPARK-53365][SQL] Unify code for persisting of configs in views and …
mihailoale-db Aug 26, 2025
20a6af7
[SPARK-52873][SQL][TESTS][FOLLOWUP] Fix test for non-ansi mode
bersprockets Aug 26, 2025
e74b77e
[SPARK-53342][SQL] Fix Arrow converter to handle multiple record batc…
grundprinzip Aug 26, 2025
ae7178c
[SPARK-53381][CONNECT] Avoid creating temporary collections in `toCat…
zhengruifeng Aug 26, 2025
67f9d37
[SPARK-53383][PYTHON][TESTS] Add tests to check the timezone handling…
zhengruifeng Aug 26, 2025
1dfb6a2
[SPARK-53330][SQL][PYTHON] Fix Arrow UDF with DayTimeIntervalType (bo…
benhurdelhey Aug 26, 2025
01667f1
[SPARK-53369][PYTHON] Fix error message for UDFs with `CHAR/VARCHAR` …
ilicmarkodb Aug 26, 2025
a9d6919
[SPARK-53367][PYTHON][SQL] add int to decimal coercion for Arrow UDFs
benhurdelhey Aug 27, 2025
b04245f
[SPARK-53388][SQL][TESTS] Split `collations.sql`
ilicmarkodb Aug 27, 2025
e7fb070
[SPARK-53384][SQL] Refactor variable resolution out
vladimirg-db Aug 27, 2025
994fc65
[SPARK-53109][SQL] Support TIME in the make_timestamp_ntz and try_mak…
uros-db Aug 27, 2025
21b1d11
[SPARK-53385][SQL] Refactor Identifier evaluation out
vladimirg-db Aug 27, 2025
2a4c188
[SPARK-53393][PYTHON] Disable memory profiler for Arrow Scalar Iterat…
zhengruifeng Aug 27, 2025
0c26d7a
[SPARK-53395][PYTHON][CONNECT][TESTS] Add tests for combinations of d…
zhengruifeng Aug 27, 2025
fb8df34
[SPARK-53391][CORE] Remove unused PrimitiveKeyOpenHashMap
yaooqinn Aug 27, 2025
8b1a748
[SPARK-53392][ML][CONNECT] Move SpecializedArray handling to connect-…
zhengruifeng Aug 27, 2025
5337a57
[SPARK-52777][SQL] Enable shuffle cleanup mode configuration in Spark…
karuppayya Aug 27, 2025
ac45be2
[SPARK-53294][SS] Enable StateDataSource with state checkpoint v2 (on…
Aug 27, 2025
86dad83
[SPARK-53348][SQL][FOLLOWUP] Don't run `AlwaysPersistedConfigsSuite` …
mihailoale-db Aug 27, 2025
51b5f30
[SPARK-53390][PS] Raise error when bools with None `astype` to ints u…
xinrong-meng Aug 27, 2025
0c9af99
[SPARK-51585][SQL][FOLLOWUP] Turn on ANSI mode in DockerJDBCIntegrati…
cloud-fan Aug 27, 2025
6204746
[SPARK-53397][PYTHON][TESTS] Fix UDTF with collations test indentation
ilicmarkodb Aug 28, 2025
e921a74
[SPARK-53414][PYTHON][TESTS] Add tests for Arrow UDF with profiler
zhengruifeng Aug 28, 2025
2167693
[SPARK-53408][SQL] Remove unused functions from `QueryCompilationErrors`
LuciferYang Aug 28, 2025
824da27
[SPARK-53391][CORE][FOLLOWUP] Add comments for PrimitiveKeyOpenHashMa…
yaooqinn Aug 28, 2025
316d06b
[SPARK-53318][SQL] Support the time type by make_timestamp_ltz()
uros-db Aug 28, 2025
2de0248
[SPARK-53417][PYTHON][TESTS] Add test for Arrow UDF with TimeType
zhengruifeng Aug 28, 2025
0c5797a
[SPARK-53416][SS][TESTS] Use `createOrReplaceTempView` instead of `re…
LuciferYang Aug 28, 2025
1f1bacc
[SPARK-53143][SQL] Fix self join in DataFrame API - Join is not the o…
davidm-db Aug 28, 2025
d233607
[SPARK-53412][K8S][INFRA][DOCS] Upgrade Volcano to 1.12.2
dongjoon-hyun Aug 28, 2025
596d03f
[SPARK-53415][SQL] Simply options for builtin FileFormats
yaooqinn Aug 28, 2025
7b8186a
[SPARK-53418][SQL] Support `TimeType` in `ColumnAccessor`
yaooqinn Aug 28, 2025
54b53f9
[SPARK-53398][SS] Ensure that RocksDBMemoryManager metrics reporting …
ericm-db Aug 28, 2025
9b2592c
[SPARK-53423][SQL] Move all the single-pass resolver related tags to …
mihailoale-db Aug 28, 2025
0f5204c
[SPARK-53403][PS] Improve add/sub tests under ANSI
xinrong-meng Aug 28, 2025
7b8877f
[SPARK-53355][PYTHON][SQL] test python udf type behavior
benhurdelhey Aug 28, 2025
78871d7
[SPARK-53394][CORE] UninterruptibleLock.isInterruptible should avoid …
Ngone51 Aug 29, 2025
5b2c4cf
[SPARK-53341][CORE] Expand golden test coverage on multivariable DECLARE
TeodorDjelic Aug 29, 2025
5bf4a29
[SPARK-53433][PYTHON][TESTS] Add test for Arrow UDF with VariantType
zhengruifeng Aug 29, 2025
b177b65
[SPARK-53431][PYTHON] Fix Python UDTF with named table arguments in D…
ueshin Aug 29, 2025
5c1c6e3
[MINOR] Fix redundant brace in log
WangGuangxin Aug 29, 2025
a68ac48
[SPARK-53419][SQL][TEST] Move common SqlScriptingContextManager initi…
vladimirg-db Aug 29, 2025
7bbc5d2
[SPARK-53386][SQL] Support query parameter ending with semicolon in J…
alekjarmov Aug 29, 2025
59b34dc
[SPARK-51168][BUILD] Upgrade to Hadoop 3.4.2
pan3793 Aug 29, 2025
eca8c62
[SPARK-53427][PS][TESTS] Test divisor 0 in truediv/floordiv/mod tests…
xinrong-meng Aug 30, 2025
08e39c3
[SPARK-53436][BUILD] Upgrade `Netty` to 4.1.124.Final
bjornjorgensen Aug 30, 2025
7e7380f
[SPARK-53424][PYTHON][TESTS] Hide traceback in `assertSchemaEqual/ass…
allisonwang-db Aug 30, 2025
38a3d32
[SPARK-53417][PYTHON][TESTS][FOLLOW-UP] Add more tests for aggregatio…
zhengruifeng Aug 31, 2025
6d36560
[SPARK-53422][SPARK-30269][SQL][TEST] Make test case robust
pan3793 Sep 1, 2025
5a0e5b1
[SPARK-48547][DEPLOY] Add opt-in flag to have SparkSubmit automatical…
JoshRosen Sep 1, 2025
871fe3d
[SPARK-53435][SQL] Fix race condition in CachedRDDBuilder
liuzqt Sep 1, 2025
c459d71
[SPARK-53421][SPARK-53377][SDP] Propagate Logical Plan ID in SDP Anal…
jackywang-db Sep 1, 2025
f93eff3
[SPARK-53329][CONNECT] Improve exception handling when adding artifacts
HendrikHuebner Sep 1, 2025
9c50156
[SPARK-53108][SQL] Implement the time_diff function in Scala
uros-db Sep 1, 2025
1485295
[SPARK-53437][SQL] InterpretedUnsafeProjection shall setNull4Bytes fo…
yaooqinn Sep 1, 2025
a74d50b
[SPARK-53156][CORE] Track Driver Memory Metrics when the Application …
Sep 1, 2025
688a30b
[SPARK-53433][TESTS][FOLLOW-UP] Make the test compatible with PyArrow…
HyukjinKwon Sep 2, 2025
11f8c36
FallbackStorage retries FileNotFoundExceptions
EnricoMi Nov 26, 2024
2 changes: 1 addition & 1 deletion .asf.yaml
@@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

- # https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
+ # https://github.com/apache/infrastructure-asfyaml/blob/main/README.md
---
github:
description: "Apache Spark - A unified analytics engine for large-scale data processing"
30 changes: 28 additions & 2 deletions .github/workflows/benchmark.yml
@@ -50,6 +50,11 @@ on:
description: 'Number of job splits'
required: true
default: '1'
create-commit:
type: boolean
description: 'Commit the benchmark results to the current branch'
required: true
default: false

jobs:
matrix-gen:
@@ -195,10 +200,31 @@ jobs:
# To keep the directory structure and file permissions, tar them
# See also https://github.com/actions/upload-artifact#maintaining-file-permissions-and-case-sensitive-files
echo "Preparing the benchmark results:"
- tar -cvf benchmark-results-${{ inputs.jdk }}-${{ inputs.scala }}.tar `git diff --name-only` `git ls-files --others --exclude=tpcds-sf-1 --exclude=tpcds-sf-1-text --exclude-standard`
+ tar -cvf target/benchmark-results-${{ inputs.jdk }}-${{ inputs.scala }}.tar `git diff --name-only` `git ls-files --others --exclude=tpcds-sf-1 --exclude=tpcds-sf-1-text --exclude-standard`
- name: Create a pull request with the results
if: ${{ inputs.create-commit && success() }}
run: |
git config --local user.name "${{ github.actor }}"
git config --local user.email "${{ github.event.pusher.email || format('{0}@users.noreply.github.com', github.actor) }}"
git add -A
git commit -m "Benchmark results for ${{ inputs.class }} (JDK ${{ inputs.jdk }}, Scala ${{ inputs.scala }}, split ${{ matrix.split }} of ${{ inputs.num-splits }})"
for i in {1..5}; do
echo "Attempt $i to push..."
git fetch origin ${{ github.ref_name }}
git rebase origin/${{ github.ref_name }}
if git push origin ${{ github.ref_name }}:${{ github.ref_name }}; then
echo "Push successful."
exit 0
else
echo "Push failed, retrying in 3 seconds..."
sleep 3
fi
done
echo "Error: Failed to push after 5 attempts."
exit 1
- name: Upload benchmark results
uses: actions/upload-artifact@v4
with:
name: benchmark-results-${{ inputs.jdk }}-${{ inputs.scala }}-${{ matrix.split }}
- path: benchmark-results-${{ inputs.jdk }}-${{ inputs.scala }}.tar
+ path: target/benchmark-results-${{ inputs.jdk }}-${{ inputs.scala }}.tar

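The new push step in benchmark.yml retries a fetch-rebase-push cycle up to five times before giving up. The same retry pattern can be extracted as a standalone POSIX shell sketch; here the `flaky` function and its counter file are hypothetical stand-ins for `git push`, not part of the workflow:

```shell
#!/bin/sh
# Retry a command up to 5 times, sleeping between attempts,
# mirroring the push loop in the workflow above.
retry() {
  for i in 1 2 3 4 5; do
    echo "Attempt $i..."
    if "$@"; then
      echo "Succeeded on attempt $i."
      return 0
    fi
    echo "Failed, retrying in 3 seconds..."
    sleep 3
  done
  echo "Error: failed after 5 attempts." >&2
  return 1
}

# Demo stand-in for `git push`: fails twice, then succeeds.
count_file=$(mktemp)
echo 0 > "$count_file"
flaky() {
  n=$(($(cat "$count_file") + 1))
  echo "$n" > "$count_file"
  [ "$n" -ge 3 ]
}

retry flaky
```

The bounded loop plus a short sleep is a common way to ride out transient push rejections when several jobs commit to the same branch concurrently, which is exactly the race the workflow's `git fetch`/`git rebase` before each push attempt is there to resolve.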
94 changes: 80 additions & 14 deletions .github/workflows/build_and_test.yml
@@ -112,7 +112,7 @@ jobs:
ui=false
docs=false
fi
- build=`./dev/is-changed.py -m "core,unsafe,kvstore,avro,utils,network-common,network-shuffle,repl,launcher,examples,sketch,variant,api,catalyst,hive-thriftserver,mllib-local,mllib,graphx,streaming,sql-kafka-0-10,streaming-kafka-0-10,streaming-kinesis-asl,kubernetes,hadoop-cloud,spark-ganglia-lgpl,profiler,protobuf,yarn,connect,sql,hive,pipelines"`
+ build=`./dev/is-changed.py -m "core,unsafe,kvstore,avro,utils,utils-java,network-common,network-shuffle,repl,launcher,examples,sketch,variant,api,catalyst,hive-thriftserver,mllib-local,mllib,graphx,streaming,sql-kafka-0-10,streaming-kafka-0-10,streaming-kinesis-asl,kubernetes,hadoop-cloud,spark-ganglia-lgpl,profiler,protobuf,yarn,connect,sql,hive,pipelines"`
precondition="
{
\"build\": \"$build\",
@@ -122,6 +122,8 @@
\"tpcds-1g\": \"$tpcds\",
\"docker-integration-tests\": \"$docker\",
\"lint\" : \"true\",
\"java17\" : \"$build\",
\"java25\" : \"$build\",
\"docs\" : \"$docs\",
\"yarn\" : \"$yarn\",
\"k8s-integration-tests\" : \"$kubernetes\",
@@ -240,7 +242,7 @@ jobs:
# Note that the modules below are from sparktestsupport/modules.py.
modules:
- >-
- core, unsafe, kvstore, avro, utils,
+ core, unsafe, kvstore, avro, utils, utils-java,
network-common, network-shuffle, repl, launcher,
examples, sketch, variant
- >-
@@ -360,7 +362,7 @@
- name: Install Python packages (Python 3.11)
if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') || contains(matrix.modules, 'yarn')
run: |
- python3.11 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1'
+ python3.11 -m pip install 'numpy>=1.22' pyarrow pandas scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1'
python3.11 -m pip list
# Run the tests.
- name: Run tests
@@ -519,13 +521,9 @@ jobs:
- >-
pyspark-pandas-slow
  - >-
-   pyspark-pandas-connect-part0
+   pyspark-pandas-connect-part0, pyspark-pandas-connect-part3
  - >-
-   pyspark-pandas-connect-part1
- - >-
-   pyspark-pandas-connect-part2
- - >-
-   pyspark-pandas-connect-part3
+   pyspark-pandas-connect-part1, pyspark-pandas-connect-part2
exclude:
# Always run if pyspark == 'true', even infra-image is skip (such as non-master job)
# In practice, the build will run in individual PR, but not against the individual commit
@@ -605,8 +603,9 @@
run: |
for py in $(echo $PYTHON_TO_TEST | tr "," "\n")
do
echo $py
$py --version
$py -m pip list
echo ""
done
- name: Install Conda for pip packaging test
if: contains(matrix.modules, 'pyspark-errors')
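The loop above iterates over a comma-separated interpreter list by translating commas to newlines with `tr` before word-splitting. A minimal standalone sketch of the same splitting technique (the `PYTHON_TO_TEST` value here is a made-up example, not a value from the workflow):

```shell
#!/bin/sh
# Split a comma-separated list into words, as the workflow does
# with the PYTHON_TO_TEST environment variable.
PYTHON_TO_TEST="python3.9,python3.11"
for py in $(echo "$PYTHON_TO_TEST" | tr "," "\n"); do
  echo "would test with: $py"
done
```

This relies on the shell's default whitespace word-splitting of the unquoted command substitution, so it is suitable only when the list items themselves contain no spaces, which holds for interpreter names like these.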
@@ -919,6 +918,42 @@
- name: R linter
run: ./dev/lint-r

java17:
needs: [precondition]
if: fromJson(needs.precondition.outputs.required).java17 == 'true'
name: Java 17 build with Maven
runs-on: ubuntu-latest
timeout-minutes: 120
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
with:
distribution: zulu
java-version: 17
- name: Build with Maven
run: |
export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g -XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl clean install

java25:
needs: [precondition]
if: fromJson(needs.precondition.outputs.required).java25 == 'true'
name: Java 25 build with Maven
runs-on: ubuntu-latest
timeout-minutes: 120
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
with:
distribution: zulu
java-version: 25-ea
- name: Build with Maven
run: |
export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g -XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pjvm-profiler -Pspark-ganglia-lgpl -Pkinesis-asl clean install

# Documentation build
docs:
needs: [precondition, infra-image]
@@ -998,10 +1033,14 @@ jobs:
# Should unpin 'sphinxcontrib-*' after upgrading sphinx>5
python3.9 -m pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5'
python3.9 -m pip install ipython_genutils # See SPARK-38517
python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly<6.0.0'
python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.22' pyarrow pandas 'plotly<6.0.0'
python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
- name: List Python packages
- name: List Python packages for branch-3.5 and branch-4.0
if: inputs.branch == 'branch-3.5' || inputs.branch == 'branch-4.0'
run: python3.9 -m pip list
- name: List Python packages
if: inputs.branch != 'branch-3.5' && inputs.branch != 'branch-4.0'
run: python3.11 -m pip list
- name: Install dependencies for documentation generation
run: |
# Keep the version of Bundler here in sync with the following locations:
@@ -1010,7 +1049,8 @@
gem install bundler -v 2.4.22
cd docs
bundle install --retry=100
- name: Run documentation build
- name: Run documentation build for branch-3.5 and branch-4.0
if: inputs.branch == 'branch-3.5' || inputs.branch == 'branch-4.0'
run: |
# We need this link to make sure `python3` points to `python3.9` which contains the prerequisite packages.
ln -s "$(which python3.9)" "/usr/local/bin/python3"
@@ -1031,6 +1071,30 @@
echo "SKIP_SQLDOC: $SKIP_SQLDOC"
cd docs
bundle exec jekyll build
- name: Run documentation build
if: inputs.branch != 'branch-3.5' && inputs.branch != 'branch-4.0'
run: |
# We need this link to make sure `python3` points to `python3.11` which contains the prerequisite packages.
ln -s "$(which python3.11)" "/usr/local/bin/python3"
# Build docs first with SKIP_API to ensure they are buildable without requiring any
# language docs to be built beforehand.
cd docs; SKIP_ERRORDOC=1 SKIP_API=1 bundle exec jekyll build; cd ..
if [ -f "./dev/is-changed.py" ]; then
# Skip PySpark and SparkR docs while keeping Scala/Java/SQL docs
pyspark_modules=`cd dev && python3.11 -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
if [ `./dev/is-changed.py -m $pyspark_modules` = false ]; then export SKIP_PYTHONDOC=1; fi
if [ `./dev/is-changed.py -m sparkr` = false ]; then export SKIP_RDOC=1; fi
fi
export PYSPARK_DRIVER_PYTHON=python3.11
export PYSPARK_PYTHON=python3.11
# Print the values of environment variables `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC` and `SKIP_SQLDOC`
echo "SKIP_ERRORDOC: $SKIP_ERRORDOC"
echo "SKIP_SCALADOC: $SKIP_SCALADOC"
echo "SKIP_PYTHONDOC: $SKIP_PYTHONDOC"
echo "SKIP_RDOC: $SKIP_RDOC"
echo "SKIP_SQLDOC: $SKIP_SQLDOC"
cd docs
bundle exec jekyll build
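The documentation steps above set `SKIP_PYTHONDOC`/`SKIP_RDOC` only when `./dev/is-changed.py` reports no changes for the corresponding modules; a minimal sketch of that gate, where the `is_changed` stub stands in for the real script:

```shell
#!/bin/sh
# Sketch of the doc-skip gate above: a SKIP_* flag is exported only
# when the change checker prints "false" for the module set.
is_changed() { echo "false"; }   # stub for ./dev/is-changed.py -m <modules>
if [ "$(is_changed pyspark)" = false ]; then export SKIP_PYTHONDOC=1; fi
echo "SKIP_PYTHONDOC: $SKIP_PYTHONDOC"   # prints: SKIP_PYTHONDOC: 1
```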
- name: Tar documentation
if: github.repository != 'apache/spark'
run: tar cjf site.tar.bz2 docs/_site
@@ -1279,8 +1343,10 @@ jobs:
kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts || true
if [[ "${{ inputs.branch }}" == 'branch-3.5' ]]; then
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.7.0/installer/volcano-development.yaml || true
else
elif [[ "${{ inputs.branch }}" == 'branch-4.0' ]]; then
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.11.0/installer/volcano-development.yaml || true
else
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/v1.12.2/installer/volcano-development.yaml || true
fi
eval $(minikube docker-env)
build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.exclude.tags=local "kubernetes-integration-tests/test"
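The if/elif/else chain above picks which Volcano installer manifest to apply per branch; the same selection can be sketched as a `case` over the branch name (versions copied from the step above, the example branch is assumed):

```shell
#!/bin/sh
# Pick the Volcano installer manifest version by target branch,
# mirroring the if/elif/else chain in the step above.
branch="branch-4.0"   # example input, assumed
case "$branch" in
  branch-3.5) version=v1.7.0 ;;
  branch-4.0) version=v1.11.0 ;;
  *)          version=v1.12.2 ;;
esac
echo "https://raw.githubusercontent.com/volcano-sh/volcano/$version/installer/volcano-development.yaml"
```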
14 changes: 0 additions & 14 deletions .github/workflows/build_infra_images_cache.yml
@@ -33,7 +33,6 @@ on:
- 'dev/spark-test-image/python-minimum/Dockerfile'
- 'dev/spark-test-image/python-ps-minimum/Dockerfile'
- 'dev/spark-test-image/pypy-310/Dockerfile'
- 'dev/spark-test-image/python-309/Dockerfile'
- 'dev/spark-test-image/python-310/Dockerfile'
- 'dev/spark-test-image/python-311/Dockerfile'
- 'dev/spark-test-image/python-311-classic-only/Dockerfile'
@@ -153,19 +152,6 @@ jobs:
- name: Image digest (PySpark with PyPy 3.10)
if: hashFiles('dev/spark-test-image/pypy-310/Dockerfile') != ''
run: echo ${{ steps.docker_build_pyspark_pypy_310.outputs.digest }}
- name: Build and push (PySpark with Python 3.9)
if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
id: docker_build_pyspark_python_309
uses: docker/build-push-action@v6
with:
context: ./dev/spark-test-image/python-309/
push: true
tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}-static
cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }}
cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-309-cache:${{ github.ref_name }},mode=max
- name: Image digest (PySpark with Python 3.9)
if: hashFiles('dev/spark-test-image/python-309/Dockerfile') != ''
run: echo ${{ steps.docker_build_pyspark_python_309.outputs.digest }}
- name: Build and push (PySpark with Python 3.10)
if: hashFiles('dev/spark-test-image/python-310/Dockerfile') != ''
id: docker_build_pyspark_python_310
2 changes: 1 addition & 1 deletion .github/workflows/build_maven_java21_arm.yml
@@ -21,7 +21,7 @@ name: "Build / Maven (master, Scala 2.13, Hadoop 3, JDK 21, ARM)"

on:
schedule:
- cron: '0 15 * * *'
- cron: '0 15 */2 * *'
workflow_dispatch:

jobs:
1 change: 1 addition & 0 deletions .github/workflows/build_non_ansi.yml
@@ -40,6 +40,7 @@ jobs:
"PYSPARK_IMAGE_TO_TEST": "python-311",
"PYTHON_TO_TEST": "python3.11",
"SPARK_ANSI_SQL_MODE": "false",
"SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED": "true"
}
jobs: >-
{
47 changes: 0 additions & 47 deletions .github/workflows/build_python_3.9.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/build_python_connect.yml
@@ -72,7 +72,7 @@ jobs:
python packaging/client/setup.py sdist
cd dist
pip install pyspark*client-*.tar.gz
pip install 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1' 'googleapis-common-protos==1.65.0' 'graphviz==0.20.3' 'six==1.16.0' 'pandas==2.2.3' scipy 'plotly<6.0.0' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' 'graphviz==0.20.3' 'torch<2.6.0' torchvision torcheval deepspeed unittest-xml-reporting
pip install 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1' 'googleapis-common-protos==1.65.0' 'graphviz==0.20.3' 'six==1.16.0' 'pandas==2.3.2' scipy 'plotly<6.0.0' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' 'graphviz==0.20.3' 'torch<2.6.0' torchvision torcheval deepspeed unittest-xml-reporting
- name: List Python packages
run: python -m pip list
- name: Run tests (local)
2 changes: 1 addition & 1 deletion .github/workflows/build_python_connect35.yml
@@ -68,7 +68,7 @@ jobs:
./build/sbt -Phive Test/package
- name: Install Python dependencies
run: |
pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy unittest-xml-reporting 'plotly<6.0.0' 'mlflow>=2.3.1' coverage 'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
pip install 'numpy==1.25.1' 'pyarrow>=18.0.0' 'pandas<=2.0.3' scipy unittest-xml-reporting 'plotly<6.0.0' 'mlflow>=2.3.1' coverage 'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'

# Add Python deps for Spark Connect.
pip install 'grpcio==1.67.0' 'grpcio-status==1.67.0' 'protobuf==5.29.1' 'googleapis-common-protos==1.65.0' 'graphviz==0.20.3'
2 changes: 1 addition & 1 deletion .github/workflows/build_python_minimum.yml
@@ -38,7 +38,7 @@ jobs:
envs: >-
{
"PYSPARK_IMAGE_TO_TEST": "python-minimum",
"PYTHON_TO_TEST": "python3.9"
"PYTHON_TO_TEST": "python3.10"
}
jobs: >-
{
2 changes: 1 addition & 1 deletion .github/workflows/build_python_ps_minimum.yml
@@ -38,7 +38,7 @@ jobs:
envs: >-
{
"PYSPARK_IMAGE_TO_TEST": "python-ps-minimum",
"PYTHON_TO_TEST": "python3.9"
"PYTHON_TO_TEST": "python3.10"
}
jobs: >-
{
4 changes: 2 additions & 2 deletions .github/workflows/build_sparkr_window.yml
@@ -16,7 +16,7 @@
# specific language governing permissions and limitations
# under the License.
#
name: "Build / SparkR-only (master, 4.4.3, windows-2022)"
name: "Build / SparkR-only (master, 4.4.3, windows-2025)"

on:
schedule:
@@ -26,7 +26,7 @@ on:
jobs:
build:
name: "Build module: sparkr"
runs-on: windows-2022
runs-on: windows-2025
timeout-minutes: 120
if: github.repository == 'apache/spark'
steps: