Spark 4: Update GHA & Get Tests Running Again #140
Pull Request Overview
Updates the project to support Spark 4, upgrading GitHub Actions workflow versions and resolving compatibility issues to get tests running again.
- Updates Spark version from 3.5.2 to 3.5.3 and Iceberg version from 1.6.0 to 1.9.2
- Removes deprecated SQLContext usage and replaces it with SparkSession throughout the codebase
- Updates GitHub Actions workflow to use newer action versions and improved formatting
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| env_setup.sh | Updates Spark and Iceberg version defaults |
| run_pyspark_examples.sh | Adds temporary hack and Spark version parameter for assembly |
| core/src/main/scala/com/high-performance-spark-examples/ml/SimplePipeline.scala | Removes unused sqlContext variables |
| core/src/main/scala/com/high-performance-spark-examples/dataframe/NullabilityFilterOptimizer.scala | Removes unused NullIntolerant import |
| core/src/main/scala/com/high-performance-spark-examples/dataframe/HappyPandas.scala | Replaces SQLContext with SparkSession |
| core/src/main/java/com/highperformancespark/examples/dataframe/JavaHappyPandas.java | Replaces SQLContext/HiveContext with SparkSession |
| accelerators/setup_comet.sh | Adds protobuf compiler installation check |
| .github/workflows/ci.yml | Updates action versions and improves workflow formatting |
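
Two of the changes summarized in the table, overridable version defaults and a check for the protobuf compiler, follow a common shell pattern. The sketch below is only an illustration of that pattern, not the code from this PR: the variable names `SPARK_VERSION` and `ICEBERG_VERSION`, the fallback values, and the helper `have_cmd` are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: variable names, fallback values, and have_cmd are
# illustrative assumptions, not taken from env_setup.sh or setup_comet.sh.

# Version defaults that a caller can override from the environment,
# e.g. `SPARK_VERSION=3.5.3 ./some_script.sh`.
SPARK_VERSION="${SPARK_VERSION:-4.0.0}"        # placeholder fallback
ICEBERG_VERSION="${ICEBERG_VERSION:-1.9.2}"    # placeholder fallback

# have_cmd NAME: succeeds if NAME resolves to a command on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check for the protobuf compiler before starting a build that needs it.
if have_cmd protoc; then
  echo "protoc found: $(command -v protoc)"
else
  echo "protoc not found; install the protobuf compiler before building" >&2
fi
```

Using `${VAR:-default}` keeps the default in one place while still letting CI set a different version per job.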
run_pyspark_examples.sh
Outdated
| EXAMPLE_JAR="./core/target/scala-2.13/core-assembly-0.1.0-SNAPSHOT.jar" |
| # Iceberg JAR not yet available for SPark 4. |
Copilot AI, Sep 2, 2025
Typo in comment: 'SPark' should be 'Spark'.
Suggested change:
| - # Iceberg JAR not yet available for SPark 4. |
| + # Iceberg JAR not yet available for Spark 4. |
| # Iceberg JAR not yet available for SPark 4. |
| if [ ! -f "${EXAMPLE_JAR}" ]; then |
| sbt core/assembly |
| rm ./core/src/main/scala/com/high-performance-spark-examples/dataframe/LoadSave.scala # temp hack no merge in Spark 3. |
Copilot AI, Sep 2, 2025
This temporary file deletion hack should be documented with a TODO or issue reference explaining when it can be removed, and why this specific file needs to be deleted for Spark compatibility.
Suggested change:
| - rm ./core/src/main/scala/com/high-performance-spark-examples/dataframe/LoadSave.scala # temp hack no merge in Spark 3. |
| + # TODO: Remove this hack once Spark 3 compatibility issue is resolved. |
| + # See issue: https://github.com/high-performance-spark-examples/issues/123 |
| + # We delete LoadSave.scala because it uses APIs not compatible with Spark 3, causing the build to fail. |
| + rm ./core/src/main/scala/com/high-performance-spark-examples/dataframe/LoadSave.scala |
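
One possible way to make the hack less destructive, beyond documenting it, is to delete the file only when the build targets Spark 3. The sketch below assumes that the script has a `SPARK_VERSION` variable and that the incompatibility is specific to Spark 3, as the comments above suggest; the helper `spark_major` and the `LOAD_SAVE` variable are hypothetical names, not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch only: SPARK_VERSION, LOAD_SAVE, and spark_major are illustrative
# names; the PR itself deletes the file unconditionally.

# spark_major VERSION: print the major component of a dotted version string.
spark_major() {
  printf '%s\n' "${1%%.*}"
}

SPARK_VERSION="${SPARK_VERSION:-4.0.0}"  # placeholder fallback
LOAD_SAVE="./core/src/main/scala/com/high-performance-spark-examples/dataframe/LoadSave.scala"

# Only drop the file when building against Spark 3; other builds keep it.
if [ "$(spark_major "${SPARK_VERSION}")" = "3" ]; then
  # rm -f does not fail if the file has already been removed.
  rm -f "${LOAD_SAVE}"
fi
```

Even with a guard like this, deleting a tracked source file leaves the working tree dirty, so the TODO and issue reference requested in the review still apply.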
No description provided.