@dgokeeffe

Add Databricks integration

Adds PySpark support for processing OpenElectricity data in Databricks environments.

What this enables

  • Convert API responses to PySpark DataFrames for Databricks processing
  • Run ETL workflows on electricity data in Databricks
  • Process large-scale electricity datasets using Spark

Usage in Databricks

from openelectricity import OEClient

client = OEClient()
facilities = client.get_facilities()

# Convert to PySpark DataFrame for Databricks
df = facilities.to_pyspark()
df.write.mode("overwrite").saveAsTable("facilities_data")
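
When PySpark isn't installed, the conversion methods should fail with a clear error rather than at import time. A minimal sketch of the guard pattern, assuming a module layout and error message that are hypothetical rather than the PR's actual code:

# Hypothetical sketch of the optional-import guard; the real module may differ.
try:
    from pyspark.sql import SparkSession
    HAS_PYSPARK = True
except ImportError:
    HAS_PYSPARK = False

def to_pyspark(records: list[dict]):
    # Raise a descriptive error when the optional dependency is absent.
    if not HAS_PYSPARK:
        raise ImportError("PySpark is required for to_pyspark(); install it as an optional extra.")
    spark = SparkSession.builder.getOrCreate()
    return spark.createDataFrame(records)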

Files added

  • openelectricity/pyspark_datasource.py - Core PySpark integration
  • openelectricity/spark_utils.py - Spark utilities
  • examples/databricks/ - Databricks examples and ETL workflows
  • examples/pyspark_simple.py - Basic PySpark usage
  • tests/ - PySpark test files

Modified

  • openelectricity/client.py - Added PySpark support
  • openelectricity/models/timeseries.py - Added to_pyspark() methods
  • openelectricity/models/facilities.py - Added PySpark integration
  • pyproject.toml - Added PySpark as optional dependency
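
Since PySpark ships as an optional dependency, it would be installed via an extra. A sketch, assuming the extra is named pyspark (the exact extra name isn't shown in this PR):

pip install "openelectricity[pyspark]"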

Notes

  • PySpark is optional - the SDK works without it installed
  • Fully backward compatible
  • Includes Databricks-specific examples and ETL workflows

- Add PySpark data source integration with automatic schema detection (see the schema sketch after this list)
- Implement to_pyspark() methods for all response types with graceful fallbacks
- Add comprehensive PySpark test suite covering facilities, market, and network data
- Include Databricks integration examples and ETL workflows
- Add performance optimization utilities and error handling
- Support for both local PySpark and Databricks environments
- PySpark is completely optional - SDK works without it installed
- Add examples demonstrating PySpark functionality and fallbacks
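
A minimal sketch of what automatic schema detection can look like, mapping Python value types to Spark SQL types; the type map and function name here are assumptions, not the PR's actual implementation:

# Hypothetical sketch of schema inference from a sample record.
from datetime import datetime
from pyspark.sql.types import (
    DoubleType, LongType, StringType, StructField, StructType, TimestampType
)

_TYPE_MAP = {
    str: StringType(),
    int: LongType(),
    float: DoubleType(),
    datetime: TimestampType(),
}

def infer_schema(record: dict) -> StructType:
    # Fall back to StringType for anything the map doesn't cover.
    return StructType([
        StructField(name, _TYPE_MAP.get(type(value), StringType()), nullable=True)
        for name, value in record.items()
    ])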
dgokeeffe changed the title from "feat: add PySpark integration for large-scale data processing" to "feat: add Databricks integration" on Sep 6, 2025
dgokeeffe and others added 10 commits September 6, 2025 14:18
- Add test_facilities_data.py: Comprehensive testing of facility data parsing and validation with real API responses
- Add test_market_metrics.py: Test market metrics functionality and API response handling
- Add test_sync_client.py: Complete test suite for synchronous OEClient implementation including error handling, session management, and API methods
- Add test_timezone_handling.py: Test timezone handling in PySpark DataFrame conversions
- Add tests/conftest.py: Centralized pytest fixtures for API keys, clients, and test configuration
- Add tests/README.md: Comprehensive documentation for test suite setup, running, and fixture usage
- Update pyproject.toml: Register custom pytest markers (slow, integration) to eliminate warnings

The test suite includes:
- Unit tests for client initialization and configuration
- Integration tests for API endpoints (facilities, market, network data)
- PySpark DataFrame conversion tests with timezone handling
- Error handling and edge case testing
- Proper fixture management with graceful skipping when dependencies unavailable
- Comprehensive documentation for test setup and execution

All tests pass with proper skipping for missing API keys or dependencies.
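
A sketch of the graceful-skipping fixtures described above; the environment variable and fixture names are assumptions rather than the PR's exact conftest.py:

import os
import pytest

@pytest.fixture
def api_key():
    # Skip API-dependent tests instead of failing when no key is configured.
    key = os.environ.get("OPENELECTRICITY_API_KEY")
    if not key:
        pytest.skip("OPENELECTRICITY_API_KEY not set")
    return key

@pytest.fixture
def spark():
    # Skip PySpark tests when the optional dependency isn't installed.
    pytest.importorskip("pyspark")
    from pyspark.sql import SparkSession
    return SparkSession.builder.master("local[1]").getOrCreate()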
@dgokeeffe
Author

Any chance we can merge this in, @nc9?

@dgokeeffe
Author

@nc9

I can resolve the merge conflicts, but after that, can we proceed with the merge, please?

- Remove pydantic-settings dependency
- Simplify conftest.py fixtures
- Add new examples and type exports
- Update to version 0.9.3
- Remove settings_schema.py module
- Fixed the _build_url method that was duplicating /v4 in endpoint URLs (see the sketch after this list)
- Resolves 404 errors when calling market API endpoints
- Added diagnostic logging to databricks_etl.py
- Corrected demand_energy unit label from GWh to MWh
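
A sketch of the kind of fix the _build_url commit describes: avoid emitting /v4 twice when the base URL already carries it. The base URL and implementation here are assumptions:

from urllib.parse import urljoin

BASE_URL = "https://api.openelectricity.org.au/v4/"

def _build_url(endpoint: str) -> str:
    # Normalise the endpoint so urljoin keeps the /v4 base path
    # and /v4 never appears twice in the final URL.
    endpoint = endpoint.lstrip("/")
    if endpoint.startswith("v4/"):
        endpoint = endpoint[len("v4/"):]
    return urljoin(BASE_URL, endpoint)
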
Extract location data from nested location object in facilities API
response to enable geospatial analysis.

Changes:
- Update to_records() and to_pyspark() to extract latitude/longitude
- Handle missing location data gracefully (None values)
- Add 9 tests for location extraction functionality
- Update existing tests to expect new columns

DataFrame output now includes latitude and longitude columns.
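
A sketch of the nested-location extraction with graceful None handling; the lat/lng key names inside the location object are assumed, not confirmed by this PR:

def flatten_location(facility: dict) -> dict:
    # A missing or null location yields None latitude/longitude rather than an error.
    location = facility.get("location") or {}
    flat = {k: v for k, v in facility.items() if k != "location"}
    flat["latitude"] = location.get("lat")
    flat["longitude"] = location.get("lng")
    return flat
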
Add `upload-databricks` target that builds and uploads the wheel to a
Unity Catalog volume using the upload_wheel_to_volume.py script.

Usage:
  make upload-databricks
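
A sketch of what the upload step could look like with the Databricks SDK; the volume path handling and script internals are assumptions, not the actual upload_wheel_to_volume.py:

from pathlib import Path
from databricks.sdk import WorkspaceClient

def upload_wheel(volume_path: str) -> None:
    # Pick the most recently built wheel from dist/ and push it to the volume.
    wheel = max(Path("dist").glob("*.whl"), key=lambda p: p.stat().st_mtime)
    client = WorkspaceClient()  # credentials resolved from the environment
    with wheel.open("rb") as f:
        client.files.upload(f"{volume_path}/{wheel.name}", f, overwrite=True)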