
Conversation


@Mrhs121 Mrhs121 commented Nov 13, 2025

Purpose of this pull request

close #10059

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Contributor

@davidzollo davidzollo left a comment


Good job.
CI is not successful; you can refer to https://github.com/apache/seatunnel/pull/10060/checks?check_run_id=55259688662.

CI takes a few hours per run ^_^

Can you add an E2E test for enabling 2PC?

@davidzollo
Contributor

When will the flushing=false state be reset?

Author

Mrhs121 commented Nov 14, 2025

When will the flushing=false state be reset?

Thanks for pointing this out. The flushing flag should be reset to false immediately after the flush operation completes. I'll include the fix for resetting the flushing state along with the new E2E test in the next commit.

FYI, I noticed the CI failure was due to the job exceeding the 10-minute timeout limit.
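A minimal sketch of that reset, assuming a volatile flushing flag on a stream-load class (the class and method names here are illustrative stand-ins, not the connector's exact API). The key point is that the reset lives in a finally block, so a failed flush can never leave the flag stuck at true:

```java
// Hypothetical sketch: reset the "flushing" flag in a finally block so it
// cannot remain true forever when the flush fails with an exception.
public class FlushingFlagSketch {
    private volatile boolean flushing = false;

    // Simulates the flush path: set the flag, do the work, always reset.
    public void flush(boolean failMidFlush) {
        flushing = true;
        try {
            if (failMidFlush) {
                throw new RuntimeException("stream load request failed");
            }
            // ... send buffered rows to Doris and wait for the response ...
        } finally {
            // Reset immediately after the flush completes, even on failure,
            // so callers polling the flag never spin forever.
            flushing = false;
        }
    }

    public boolean isFlushing() {
        return flushing;
    }
}
```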

Contributor

davidzollo commented Nov 14, 2025

When will the flushing=false state be reset?

Thanks for pointing this out. The flushing flag should be reset to false immediately after the flush operation completes. I'll include the fix for resetting the flushing state along with the new E2E test in the next commit.

FYI, I noticed the CI failure was due to the job exceeding the 10-minute timeout limit.

Good. You can also help fix CI ^_^
Usually reviewers will review a new PR once CI passes.

By the way, I think we could have a more in-depth conversation to help you get familiar with SeaTunnel. Feel free to contact me on LinkedIn (David Zollo) or WeChat (taskflow). When adding me, please let me know your GitHub ID.

Author

Mrhs121 commented Nov 14, 2025

I have provided a pure test case #10069 to reproduce #10059.

@github-actions github-actions bot added the e2e label Nov 15, 2025
}
} catch (Exception e) {
throw new RuntimeException(e);
} finally {
Author

@Mrhs121 Mrhs121 Nov 15, 2025


In Spark 2.4, the DataWriter interface does not extend Closeable, so when the test case runs on the Spark 2.4 engine and the job fails, the close() method of the DorisSinkWriter is never invoked. As a result, the threads inside the DorisSinkWriter remain alive and prevent the SeaTunnel job from terminating. Therefore, I release the resources here.
I'm not sure if this is a good fix.
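The idea can be sketched as follows, with a simplified stand-in interface rather than the actual SeaTunnel Spark 2.4 classes: because the engine never calls close() for us on failure, abort() releases the writer's resources in a finally block, so cleanup runs even when the abort logic itself throws.

```java
// Hypothetical sketch: abort() guarantees close() runs even when
// abortPrepare() throws, so no leftover thread keeps the job alive.
// The interface is a simplified stand-in, not the real SeaTunnel type.
public class AbortCleanupSketch {
    interface SinkWriter {
        void abortPrepare() throws Exception; // may itself fail
        void close() throws Exception;        // stops background threads
    }

    static void abort(SinkWriter writer) throws Exception {
        try {
            writer.abortPrepare();
        } finally {
            // Spark 2.4's DataWriter is not Closeable, so the engine never
            // calls close() on failure; release resources here ourselves.
            writer.close();
        }
    }
}
```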

Author


@davidzollo I made some new changes, please help review it when you have time, thank you.

Member


It is not allowed to perform close in abortPrepare, and other connectors do not have such an implementation.
How about implementing Closeable in Spark 2.4?


@zhangshenghang
Member

I have provided a pure test case #10069 to reproduce #10059

We can merge #10069 into the current PR to verify that this issue will not occur.


public RespContent stopLoad() throws IOException {
    loading = false;
    flushing = true;
Member


Why add a separate "flushing" instead of just using "loading"?

Author

@Mrhs121 Mrhs121 Nov 18, 2025


FYI, you can take a look at the description of the root cause in #10059 (comment) first.

As shown in the following code, the error message in the HTTP response is only obtained while in the loading state, and loading is reset to false during flush. Therefore, if the flush action is executed before the HTTP response is returned, errorMessage will always be null, an infinite loop occurs, and the SeaTunnel task never stops.

public String getLoadFailedMsg() {
    if (!loading) {
        return null;
    }
    if (this.getPendingLoadFuture() != null && this.getPendingLoadFuture().isDone()) {
        String errorMessage;
        try {
            errorMessage = handlePreCommitResponse(pendingLoadFuture.get()).getMessage();
        } catch (Exception e) {
            errorMessage = ExceptionUtils.getMessage(e);
        }
        recordStream.setErrorMessageByStreamLoad(errorMessage);
        return errorMessage;
    } else {
        return null;
    }
}

Another way to fix it is to move the reset of loading to false to after endInput; that is, loading is only considered finished once the stream is truly closed.
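That second option can be sketched like this, with an illustrative class (not the connector's real one): flush() no longer clears the loading flag, so getLoadFailedMsg() can still surface a stream-load error during shutdown, and only endInput() marks the load as finished.

```java
// Hypothetical sketch of the second fix: keep loading == true across
// flushes; only endInput() ends the load. The pendingError field stands in
// for checking the completed HTTP response future.
public class LoadingLifecycleSketch {
    private volatile boolean loading = true;
    private volatile String pendingError;

    public void flush(String errorFromResponse) {
        // Keep loading == true here: the load is not over until endInput().
        pendingError = errorFromResponse;
    }

    public void endInput() {
        // The stream is truly closed now, so the load is finished.
        loading = false;
    }

    public String getLoadFailedMsg() {
        if (!loading) {
            return null; // mirrors the guard in the snippet above
        }
        return pendingError;
    }
}
```

With this ordering, an error returned by Doris during a flush is still visible to the polling RecordBuffer instead of being hidden behind the loading guard.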

Member


I approve of your second plan.

Author


I approve of your second plan.

Done

Author


I approve of your second plan.

@zhangshenghang I made some new changes, please help review it when you have time, thank you.

Author

Mrhs121 commented Nov 21, 2025

@zhangshenghang I made some new changes, please help review it when you have time, thank you.

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a critical issue where DorisStreamLoad's loading state mismanagement caused RecordBuffer to enter an infinite loop during shutdown, particularly when Doris returns parsing errors (e.g., ANALYSIS_ERROR). The fix ensures proper cleanup and state management across multiple components.

Key Changes

  • Moved the loading flag update to a finally block in DorisStreamLoad to ensure consistent state even when exceptions occur
  • Added try-catch-finally blocks in DorisSinkWriter.close() and SparkDataWriter.abort() to guarantee resource cleanup
  • Moved sinkWriter.close() from commit() to close() in Spark 3.3 DataWriter for proper lifecycle management
  • Added E2E tests to verify graceful failure handling when Doris returns cast errors
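The third change above (moving cleanup out of commit()) can be sketched with simplified stand-in types (not the actual SeaTunnel/Spark interfaces): commit() only commits, and all resource cleanup lives in close(), which the engine invokes on both the success and failure paths.

```java
// Hypothetical sketch of the Spark 3.3 writer lifecycle change: commit()
// does no cleanup; close() is the single cleanup point on every code path.
public class DataWriterLifecycleSketch implements AutoCloseable {
    private boolean committed = false;
    private boolean closed = false;

    public void commit() {
        // Only produce the commit result; no resource cleanup here.
        committed = true;
    }

    @Override
    public void close() {
        // Single place for cleanup, reached on success and on failure.
        closed = true;
    }

    public boolean isCommitted() { return committed; }
    public boolean isClosed() { return closed; }
}
```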

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

  • DorisStreamLoad.java: Moved loading = false to finally block to prevent infinite loops in RecordBuffer when exceptions occur during stopLoad()
  • DorisSinkWriter.java: Added try-catch-finally to ensure scheduledExecutorService and dorisStreamLoad are closed even if flush() fails
  • SeaTunnelSparkDataWriter.java: Moved sinkWriter.close() and WriterCloseEvent from commit() to close() method for proper resource lifecycle
  • SparkDataWriter.java: Enhanced abort() with try-catch-finally to ensure sinkWriter.close() is called even when abort operations fail
  • doris_source_and_sink_with_cast_error.conf: Test configuration for cast error scenario with 2PC disabled
  • doris_source_and_sink_with_cast_error_2pc_true.conf: Test configuration for cast error scenario with 2PC enabled
  • DorisIT.java: Added testDorisCastError() to verify graceful failure when Doris returns type cast errors, and createTypeCastErrorSinkTableForTest() to create incompatible schema


flush();
}
} catch (Exception e) {
log.error("Flush data failed when close doris writer.", e);

Copilot AI Dec 5, 2025


Grammar error in log message. Should be "when closing" instead of "when close".

Suggested change
log.error("Flush data failed when close doris writer.", e);
log.error("Flush data failed when closing doris writer.", e);

Comment on lines -95 to -96
sinkWriter.close();
context.getEventListener().onEvent(new WriterCloseEvent());
Member


@Hisoka-X Will there be a problem? Why was it closed here before?

Author


It looks like this snippet was copied from the Spark 2 template and the author missed the subtle difference: the DataWriter interface in Spark 2 doesn't implement Closeable, so it must be closed manually here. (¬_¬)

Author

Mrhs121 commented Dec 10, 2025

@zhangshenghang I made some new changes, please help review it when you have time, thank you.



Development

Successfully merging this pull request may close these issues.

[Bug] [connector-doris] DorisStreamLoad loading state mismanagement causes RecordBuffer infinite loop during shutdown
