fix: Exception Handling for Retryable errors #2196

Open
wants to merge 1 commit into
base: main

@taherkl taherkl commented Feb 17, 2025

This pull request extends the retry logic in Dataflow to cover additional exceptions, improving reliability and fault tolerance. The following exceptions are now handled by the retry mechanism:

  • ConnectionInitException: Handles scenarios where the initial connection to a Cassandra node fails.
  • DriverTimeoutException: Retries when the driver times out while awaiting a response, ensuring transient network delays are managed gracefully.
  • AllNodesFailedException: Retries when all nodes fail to respond, which can occur due to temporary network partitions or high load.
  • BusyConnectionException: Addresses situations where all connections are busy, enhancing request handling during peak loads.
  • NodeUnavailableException: Retries when a specific node is temporarily unavailable, improving availability in distributed environments.
  • QueryExecutionException: Previously classified as a permanent error, now retried. A QueryExecutionException often results from temporary conditions such as timeouts or resource contention, even when the query itself is valid.

Key Changes:

  • Added Retry Logic: Included the above exceptions in the retry mechanism to improve fault tolerance and reduce failures caused by transient conditions.
  • Updated Error Handling: Moved QueryExecutionException from the permanent-error classification to the retry logic, so that valid queries that fail for temporary reasons are retried.
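
The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: `RetryClassifier` and `isRetryable` are hypothetical names, and the nested exception classes are local stand-ins that only share their simple names with the driver exceptions listed above, so that the example compiles without the Cassandra driver on the classpath.

```java
import java.util.List;

/** Sketch: decide whether a failure should be retried or treated as permanent. */
final class RetryClassifier {

  // Hypothetical stand-ins for the driver exceptions named in the description.
  // In real code these would be the driver's own classes, not local ones.
  static class ConnectionInitException extends RuntimeException {}
  static class DriverTimeoutException extends RuntimeException {}
  static class AllNodesFailedException extends RuntimeException {}
  static class BusyConnectionException extends RuntimeException {}
  static class NodeUnavailableException extends RuntimeException {}
  static class QueryExecutionException extends RuntimeException {}

  // Exception types treated as transient and therefore retryable.
  private static final List<Class<? extends RuntimeException>> RETRYABLE =
      List.of(
          ConnectionInitException.class,
          DriverTimeoutException.class,
          AllNodesFailedException.class,
          BusyConnectionException.class,
          NodeUnavailableException.class,
          QueryExecutionException.class);

  private RetryClassifier() {}

  /** Returns true if the throwable is an instance of any retryable type. */
  static boolean isRetryable(Throwable t) {
    for (Class<? extends RuntimeException> type : RETRYABLE) {
      if (type.isInstance(t)) {
        return true;
      }
    }
    // Anything not listed is treated as a permanent error in this sketch.
    return false;
  }
}
```

Under these assumptions, a caller would check `RetryClassifier.isRetryable(e)` in its catch block and route the record to a retry path or a permanent-error path accordingly. Keeping the retryable types in one list makes a reclassification, such as the one for QueryExecutionException, a one-line change.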

codecov bot commented Feb 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.06%. Comparing base (7b7081b) to head (8873d7f).
Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2196      +/-   ##
============================================
- Coverage     47.06%   47.06%   -0.01%     
+ Complexity     4376     4374       -2     
============================================
  Files           876      876              
  Lines         52218    52218              
  Branches       5505     5505              
============================================
- Hits          24578    24574       -4     
- Misses        25881    25884       +3     
- Partials       1759     1760       +1     

Components                          Coverage Δ
spanner-templates                   68.93% <100.00%> (-0.01%) ⬇️
spanner-import-export               65.72% <ø> (-0.02%) ⬇️
spanner-live-forward-migration      76.54% <ø> (ø)
spanner-live-reverse-replication    78.80% <100.00%> (ø)
spanner-bulk-migration              87.94% <ø> (ø)

Files with missing lines                                  Coverage Δ
...leport/v2/templates/transforms/SourceWriterFn.java     85.21% <100.00%> (ø)

... and 2 files with indirect coverage changes
