# Add Nessie catalog support in docs #4180
base: main
## Conversation
@somratdutta is attempting to deploy a commit to the ClickHouse Team on Vercel. A member of the Team first needs to authorize it.
### Testing Instructions

This PR depends on a recently merged fix that is not yet available as a Docker image. Below are comprehensive testing instructions to validate the changes locally using Nessie as the REST catalog backend.

#### Prerequisites

Download the appropriate ClickHouse binary from the build artifacts based on your platform. For macOS on Apple Silicon, use the

#### Environment Setup
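The environment details did not survive here, so the following is a minimal sketch of one possible local setup, assuming Docker is available and matching the hostnames, ports, and credentials used by the snippets below (`nessie` on 19120, an S3-compatible store on host port 9002, `admin`/`password`). The image names and flags are assumptions; adjust to your environment.

```shell
# Hypothetical local setup: Nessie REST catalog plus a MinIO object store.
# Image names, ports, and credentials are assumptions; adjust as needed.
docker network create nessie-net

# Nessie serves the Iceberg REST API on port 19120
docker run -d --name nessie --network nessie-net -p 19120:19120 \
  ghcr.io/projectnessie/nessie:latest

# MinIO provides S3-compatible storage; map its API port (9000) to host 9002
docker run -d --name minio --network nessie-net -p 9002:9000 \
  -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=password \
  minio/minio server /data
```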
#### Data Ingestion via PySpark

Create the notebook:

```python
from pyspark.sql import SparkSession

# Initialize SparkSession with Nessie, Iceberg, and S3 configuration
spark = (
    SparkSession.builder.appName("Nessie-Iceberg-PySpark")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,"
        "software.amazon.awssdk:bundle:2.24.8,"
        "software.amazon.awssdk:url-connection-client:2.24.8",
    )
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/iceberg/main/")
    .config("spark.sql.catalog.nessie.warehouse", "s3://my-bucket/")
    .config("spark.sql.catalog.nessie.type", "rest")
    .getOrCreate()
)

# Create a namespace in Nessie
spark.sql("CREATE NAMESPACE IF NOT EXISTS nessie.demo").show()

# Create a table in the `nessie.demo` namespace using Iceberg
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS nessie.demo.sample_table (
        id BIGINT,
        name STRING
    ) USING iceberg
    """
).show()

# Insert data into the sample_table
spark.sql(
    """
    INSERT INTO nessie.demo.sample_table VALUES
        (1, 'Alice'),
        (2, 'Bob')
    """
).show()

# Query the data from the table
spark.sql("SELECT * FROM nessie.demo.sample_table").show()

# Stop the Spark session
spark.stop()
```

#### Integration Testing

After executing the notebook, connect to ClickHouse and validate the DataLakeCatalog integration with Nessie:

```shell
./clickhouse client
```

Execute the following SQL commands to verify functionality:

```sql
-- Enable experimental Iceberg support
SET allow_experimental_database_iceberg = 1;

-- Configure DataLakeCatalog with Nessie REST catalog backend
CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://localhost:19120/iceberg', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://localhost:9002/my-bucket',
    warehouse = 'warehouse';

-- Verify table discovery
SHOW TABLES FROM demo;

-- Validate data retrieval
SELECT * FROM demo.`demo.sample_table`;
```

#### Expected Results

The integration should successfully:

- discover the table created via Spark (`SHOW TABLES FROM demo` lists `demo.sample_table`)
- return the inserted rows `(1, 'Alice')` and `(2, 'Bob')` from the final `SELECT`
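One detail worth calling out: ClickHouse folds the catalog namespace into the table name, so the table created as `nessie.demo.sample_table` surfaces as the backtick-quoted `demo.sample_table` inside the `demo` database. A small hypothetical helper (not part of this PR) that builds such references:

```python
def clickhouse_table_ref(database: str, namespace: str, table: str) -> str:
    """Build a fully qualified ClickHouse reference for an Iceberg table
    exposed via DataLakeCatalog: the catalog namespace becomes part of
    the table name, which therefore needs backtick quoting."""
    return f"{database}.`{namespace}.{table}`"

# The table created as nessie.demo.sample_table above:
print(clickhouse_table_ref("demo", "demo", "sample_table"))
# → demo.`demo.sample_table`
```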
## Summary

## Checklist