Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom transformation for Cassandra to Spanner SourceDB #2201

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

taherkl
Copy link
Contributor

@taherkl taherkl commented Feb 18, 2025

This pull request introduces the following key enhancements:

  1. Enhanced Null Handling:
  • Improved handling of null values for all column types, including String columns, ensuring consistent data integrity and preventing unintended null conversions.
  • This enhancement covers all possible column data types, providing a more robust and error-free data transformation process.
  1. Custom Transformation Logic:
  • Added support for custom transformation logic specifically for Spanner to Cassandra Reverse migrations.
  • This includes the ability to handle complex data type mappings and custom transformations needed for seamless data synchronization between Cassandra and Spanner.

* sync upstream/main (#98)

* Custom transformation fixes

* Added Custom Transformation

* Added Custom Transformation

* Added Fixes

* Address null to all columns

* Added Null Assert

* Added Timeout fixes

* Added Spotless fixes

* reverse merge the main

* Added Custom Fixes

* Added Drop Keys

---------
Move null test case into previous tests
taherkl and others added 2 commits February 18, 2025 13:16
Referesh `SpannerToSourceDbCustomTransformationIT` tables for re-runs…
@taherkl taherkl marked this pull request as ready for review February 18, 2025 12:35
@taherkl taherkl requested a review from a team as a code owner February 18, 2025 12:35
@@ -78,6 +79,7 @@ public class SpannerToCassandraSourceDbIT extends SpannerToSourceDbITBase {
private static final String USER_TABLE = "Users";
private static final String USER_TABLE_2 = "Users2";
private static final String ALL_DATA_TYPES_TABLE = "AllDatatypeColumns";
private static final String ALL_DATA_TYPES_TABLE_FOR_NULL_KEY = "AllDataTypeColumnsForNullKey";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi went unknowingly, I have removed the same will able to reflect soon

Mutation.newInsertOrUpdateBuilder(ALL_DATA_TYPES_TABLE)
.set("varchar_column")
.to("SampleVarcharForNull") // Only this column has a value
.set("tinyint_column")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why set null for others, we can simply skip setting other columns and it will be NULL automatically, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi as per our discussion we are going to add another case where we are going to add only value for the primary key alone

@@ -538,8 +631,8 @@ private void assertAllDataTypeRowsInCassandraDB()
PipelineOperator.Result result =
pipelineOperator()
.waitForCondition(
createConfig(jobInfo, Duration.ofMinutes(10)),
() -> getRowCount(ALL_DATA_TYPES_TABLE) == 1);
createConfig(jobInfo, Duration.ofMinutes(20)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it taking more than 10 min?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @shreyakhajanchi Will reduce to 10

Comment on lines 5 to 6
"first_name" text,
"last_name" text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these two columns in quotes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can remove

DROP TABLE IF EXISTS customers;
CREATE TABLE IF NOT EXISTS customers (
id INT64 NOT NULL,
full_name STRING(125),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full_name column not required in spanner schema?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @shreyakhajanchi as we are trying to populate full name in Cassandra based on FirstName and lastName

@@ -210,7 +212,8 @@ private void assertDeleteRowInCassandraDB() throws InterruptedException {
PipelineOperator.Result result =
pipelineOperator()
.waitForCondition(
createConfig(jobInfo, Duration.ofMinutes(10)), () -> getRowCount(USER_TABLE) == 0);
createConfig(jobInfo, Duration.ofMinutes(10)),
() -> getRowCount(USER_TABLE_2) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should still be USER_TABLE and and the table in below should change because the writes to users2 is with txn tag and is filtered:

private void writeDeleteInSpanner() {

    Mutation insertOrUpdateMutation =
        Mutation.newInsertOrUpdateBuilder(USER_TABLE_2)
            .set("id")
            .to(4)
            .set("full_name")
            .to("GG")
            .build();
    spannerResourceManager.write(insertOrUpdateMutation);

    KeySet allRows = KeySet.all();
    Mutation deleteAllMutation = Mutation.delete(USER_TABLE_2, allRows);
    spannerResourceManager.write(deleteAllMutation);
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi Address the same

spannerResourceManager.write(m1);
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these elaborate comments, the methods are self explainatory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 58 to 63
if (request.getEventType().equals("UPDATE")) {
return new MigrationTransformationResponse(null, false);
}
if (request.getEventType().equals("DELETE")) {
return new MigrationTransformationResponse(null, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi as per current logic in IT we are populating full name based on first name and last name so I guess it is need to be here

Comment on lines +41 to +47
if (request.getTableName().equals("Customers")) {
Map<String, Object> row = new HashMap<>(request.getRequestRow());
row.put("full_name", row.get("first_name") + " " + row.get("last_name"));
row.put("migration_shard_id", request.getShardId() + "_" + row.get("id"));
MigrationTransformationResponse response = new MigrationTransformationResponse(row, false);
return response;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi as per current logic in IT we are populating full name based on first name and last name so I guess it is need to be here

@@ -0,0 +1,70 @@
/*
* Copyright (C) 2024 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2025

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyakhajanchi address the same

Copy link

codecov bot commented Feb 19, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 26 lines in your changes missing coverage. Please review.

Project coverage is 47.05%. Comparing base (810e553) to head (a8008cf).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...om/CustomTransformationWithCassandraForLiveIT.java 0.00% 22 Missing ⚠️
...2/templates/dbutils/dml/CassandraDMLGenerator.java 92.30% 0 Missing and 2 partials ⚠️
...v2/templates/dbutils/dml/CassandraTypeHandler.java 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2201      +/-   ##
============================================
- Coverage     47.06%   47.05%   -0.01%     
- Complexity     4048     4378     +330     
============================================
  Files           876      877       +1     
  Lines         52223    52270      +47     
  Branches       5505     5515      +10     
============================================
+ Hits          24577    24594      +17     
- Misses        25885    25911      +26     
- Partials       1761     1765       +4     
Components Coverage Δ
spanner-templates 68.83% <50.00%> (-0.08%) ⬇️
spanner-import-export 65.64% <ø> (-0.04%) ⬇️
spanner-live-forward-migration 76.48% <ø> (ø)
spanner-live-reverse-replication 78.49% <50.00%> (-0.31%) ⬇️
spanner-bulk-migration 87.94% <ø> (ø)
Files with missing lines Coverage Δ
...2/templates/dbutils/dml/CassandraDMLGenerator.java 91.15% <92.30%> (+0.75%) ⬆️
...v2/templates/dbutils/dml/CassandraTypeHandler.java 83.58% <50.00%> (-0.77%) ⬇️
...om/CustomTransformationWithCassandraForLiveIT.java 0.00% <0.00%> (ø)

... and 3 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants