-
Notifications
You must be signed in to change notification settings - Fork 352
RELATIONAL-JDBC: Add support for cockroach DB #3352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CockroachDB uses PostgreSQL wire protocol and is indistinguishable from PostgreSQL at the JDBC driver level. Compatibility is ensured through INT4 schema declarations and comprehensive test coverage. All PostgreSQL tests run against both PostgreSQL and CockroachDB.
- Created 5 new integration test classes that mirror PostgreSQL tests - CockroachApplicationIT, CockroachManagementServiceIT, CockroachPolicyServiceIT, CockroachRestCatalogIT, CockroachViewFileIT - All tests extend the same base test classes as PostgreSQL tests - Use CockroachRelationalJdbcProfile for CockroachDB container lifecycle - Tests run with same logic as PostgreSQL to ensure compatibility - Enabled parallel test execution (maxParallelForks = 2) - PostgreSQL and CockroachDB tests can now run concurrently This ensures comprehensive test coverage for CockroachDB backend.
CockroachDB Support: - Added COCKROACHDB to DatabaseType enum (maps to "cockroachdb" directory) - Created separate schema directory: cockroachdb/schema-v1.sql - CockroachDB schema v1 based on PostgreSQL schema v3 (includes all tables) - Uses INT4 explicitly for integer columns (required for CockroachDB JDBC driver) Database Type Detection and Validation: - Implemented DatabaseType.inferFromConnection() to detect database from JDBC metadata - If database-type is configured: uses configured type and validates against connection - If configured type mismatches connection: throws IllegalStateException with clear error - If no configured type: auto-detects from connection (CockroachDB, PostgreSQL, H2) - Validation ensures configuration matches actual database Configuration: - New property: polaris.persistence.relational.jdbc.database-type - Supported values: "postgresql", "cockroachdb", "h2" - If set, configuration is authoritative and validated against connection - If not set, auto-detects from JDBC connection metadata Bootstrap Support: - Added DatabaseType.getLatestSchemaVersion() method - PostgreSQL: latest version is 3 - CockroachDB: latest version is 1 - Updated JdbcBootstrapUtils to use database-specific latest version - Bootstrap automatically selects correct schema for each database type Test Configuration: - CockroachDB test profiles explicitly set database-type=cockroachdb - Ensures proper identification even with PostgreSQL JDBC driver - Added 5 integration test classes for server module - Enabled parallel test execution (PostgreSQL and CockroachDB tests run concurrently) PostgreSQL Schema: - Kept original INTEGER/INT types (no changes to existing schemas) - Removed trailing newlines from schema files
5a57873 to
b25ebae
Compare
| Set<Long> seenEntityIds = new HashSet<>(); | ||
| for (PolarisBaseEntity entity : entities) { | ||
| if (!seenEntityIds.add(entity.getId())) { | ||
| throw new RetryOnConcurrencyException( | ||
| "Multiple updates to entity id '%s' in the same transaction", entity.getId()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the difference and why its required:
PostgreSQL SERIALIZABLE:
- Uses Serializable Snapshot Isolation (SSI)
- When it detects a write-write conflict within the same transaction, it typically throws SQLSTATE 40001 (serialization_failure) immediately
- It's designed to fail fast rather than retry
CockroachDB SERIALIZABLE:
- Has automatic retry logic built-in at the database level
- When it detects a conflict, it enters an internal retry loop hoping the conflict resolves
- For conflicts within the same transaction (unresolvable), it can get stuck retrying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why / when would this code be invoked with multiple entities having the same ID? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
precisely this test got stuck : https://github.com/apache/polaris/blob/main/integration-tests/src/main/java/org/apache/polaris/service/it/test/PolarisRestCatalogIntegrationBase.java#L1088
i think it more about the implementation of how transaction api is implemented in Polaris
Line 1045 in 4cf09ec
| commitTransactionRequest.tableChanges().stream() |
if you see we could collect all the changes for one table and then apply then one shot rather than making then two calls ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting ... 🤔
Are you saying that our impl. of commitTransaction() will cause multiple PolarisBaseEntity objects to be written via the Persistence interface, while some of those objects share the same ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
precisely :) atleast thats my understanding is, we should group by table-uuid collect all the changes and then call commit for individual tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, that looks like a bug in the commitTransaction() impl. 😅
+1 to grouping changes by table (ID) and writing in one batch.... However, it might be best to make that fix before introducing CockroachDB... just for the sake of clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you go : #3360
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet - let's merge #3360 , rebase and re-test whether this particular change is still necessary for CockroachDB, if you do not mind.
Please use the standard |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to supporting CockroachDB and thanks for picking this work up, @singhpk234 !
Some comments below.
| @Override | ||
| public Map<String, String> start() { | ||
| // CockroachDB testcontainers uses PostgreSQL wire protocol compatibility | ||
| cockroach = new CockroachContainer("cockroachdb/cockroach:v24.3.0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ContainerSpecHelper to enable managing docker image versions via Renovate.
Cf.
Line 46 in 5d77feb
| containerSpecHelper("postgres", PostgresRelationalJdbcLifeCycleManagement.class) |
| PRIMARY KEY (event_id) | ||
| ); | ||
|
|
||
| -- INT4 type used directly in table definitions for CockroachDB JDBC compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably preferable to move this comment to the top of this file for visibility. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CockroachDB uses mostly the same Polaris code a PostgreSQL, I believe the schema versions are best kept in sync. I do not think we have to support old schemas in CockroachDB, but the current one should probably be v3 as our PostgreSQL schema... WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about the same though it make it a bit tricky in sense where we take schema as part of bootstrap, hence thought its better to keep it v1 thought the contents are of v3 sql (as it would just have v3 no v1 / v2).
I additionally also thought how about we do this
ALTER COLUMN FROM INT to INT4 to make it work with existing pg version scripts but it seems like we the feature to enable this
Caused by: java.sql.SQLException: Failed due to 'ERROR: ALTER COLUMN TYPE from int to int4 is only supported experimentally
Hint: See: https://go.crdb.dev/issue-v/49329/v24.3
apparently
SET experimental_enable_alter_column_type_general = true;
ALTER TABLE your_table_name ALTER COLUMN your_column_name TYPE INT4;
setting this flag would be confusing.
Considering we are above i resorted to v1 name, please let me know your thoughts considering above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] a bit tricky in sense where we take schema as part of bootstrap,
Sorry, @singhpk234 , I do not quite understand this part 🤔 what exactly is "tricky" in case CockroachDB uses v3 for the schema version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries at all (apologies for not being clear), what i meant is what happens some one does bootstrap with v2 in this case, since we give schema version as arg for bootstrapping
Line 31 in 4cf09ec
| * Determines the correct schema version to use for bootstrapping a realm. |
may be we can abstract from the user in the handling ?
secondly can be confusing why there is no v1 / v2 unlike other persistence impl.
I believe when we add mariadb / mysql support we would have to work with an assumption that the column name are same and types (for example JSONB vs JSON) are handled accordingly so i just thought of starting fresh with v1 for cockroach which is exactly same as v3 of postgres schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but what about this?
Lines 75 to 81 in b1142b5
| public static List<String> getAllColumnNames(int schemaVersion) { | |
| if (schemaVersion < 2) { | |
| return ALL_COLUMNS; | |
| } else { | |
| return ALL_COLUMNS_V2; | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, i missed it, let me think this more
| Set<Long> seenEntityIds = new HashSet<>(); | ||
| for (PolarisBaseEntity entity : entities) { | ||
| if (!seenEntityIds.add(entity.getId())) { | ||
| throw new RetryOnConcurrencyException( | ||
| "Multiple updates to entity id '%s' in the same transaction", entity.getId()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why / when would this code be invoked with multiple entities having the same ID? 🤔
| Set<Long> seenEntityIds = new HashSet<>(); | ||
| for (PolarisBaseEntity entity : entities) { | ||
| if (!seenEntityIds.add(entity.getId())) { | ||
| throw new RetryOnConcurrencyException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a "retry" exception as opposed to an IllegalArgumentException? I believe the behaviour of this method with multiple objects having the same ID is not well-defined and such situations should be avoided on the caller side.
| public int getLatestSchemaVersion() { | ||
| return switch (this) { | ||
| case POSTGRES -> 3; // PostgreSQL has schemas v1, v2, v3 | ||
| case COCKROACHDB -> 1; // CockroachDB currently has only schema v1 | ||
| case H2 -> 3; // H2 uses same schemas as PostgreSQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment on the schema file: I believe all "latest" JDBC-based schemas should have the same version and evolve in sync. This is because from the caller side (Polaris JDBC persistence code), the "interface" of all JDBC schemas should be the same (column names and type compatibility).
Maintaining different column names and non-trivial type conversions separately for all JDBC backends is going to be too much overhead and will complicate Polaris code unnecessarily.
If the schemas can be assumed to be compatible from the client's (Polaris) perspective, using the same version number will only be natural. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to think this through more, i went ahead with the current one with the following in the mind :
#3352 (comment)
| "quarkus.datasource.active", | ||
| "true", | ||
| "quarkus.datasource.db-kind", | ||
| "postgresql"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be cockroachdb? or quarkus.datasource.db-kind simplifies all postgresql-like ones as postgresql.
| "postgresql"); | |
| "cockroachdb"); |
About the change
Add support of cockroach DB in relational-jdbc persistence
There was pr long ago :
seems like the author got busy, all pending was to testing it E2E with cockroach with all the integ tests along with making it work with schema evolution logic. also marking them as co-author for their contribution.
Happy to close it if they wanna resume their original pr
co-author : @sathesuraj
Checklist
CHANGELOG.md(if needed) -- will add latersite/content/in-dev/unreleased(if needed) - will add later