@github-actions github-actions bot commented Dec 4, 2025

Cherry-picked from #57198

…nsistent tablet replica information in one txn (#57198)

Problem Summary:

While loading into an auto-partition table, the following case can occur:

Let rpc(2025-10-21) denote a `create partition rpc` that needs to create
partition "2025-10-21". Two sinks of the same load send it to the FE:

sink1 ----- rpc(2025-10-21) -----> FE
sink2 ----- rpc(2025-10-21) -----> FE

The FE's processing of these two RPCs is not atomic within a single
import transaction, so we need to ensure the idempotence of this RPC.

Suppose the FE processes the RPC from sink1 first: it creates partition
"2025-10-21", creates several tablets under it, and selects BEs for them.
Assume tablet1's replica is assigned to [BE1].

----> At this point a tablet migration occurs (for load balancing), moving
tablet1 from [BE1] to [BE2]. When the FE then processes the second RPC,
it returns the post-migration information, so the two RPC initiators
observe inconsistent replica distribution information for tablet1.

This indicates that the createPartition RPC is not idempotent.
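
The sequence above can be reproduced in miniature. The following is only a sketch under stated assumptions: `LiveTabletMetadata`, `place`, and `handleCreatePartitionRpc` are hypothetical names invented for illustration and are not taken from the Doris code base. It models an FE that answers every RPC from the current tablet metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FE tablet metadata; not Doris code.
final class LiveTabletMetadata {
    // tabletId -> name of the BE that currently hosts the replica
    private final Map<Long, String> tabletToBackend = new HashMap<>();

    // Used both for the initial placement and to simulate a migration.
    void place(long tabletId, String backend) {
        tabletToBackend.put(tabletId, backend);
    }

    // Models the behavior described above: each RPC reads the live
    // metadata, so the answer depends on when the RPC is processed.
    String handleCreatePartitionRpc(long tabletId) {
        return tabletToBackend.get(tabletId);
    }
}
```

If tablet 1 is placed on "BE1", sink1's RPC is handled, the tablet is then moved to "BE2", and sink2's RPC is handled, the two callers receive different backends for the same tablet within one transaction.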

Fix for this case:
When the first RPC creates a partition and selects BE1 to host the replica
of tablet1, we record this information on a per-transaction basis. For
any later RPC in the same transaction, we simply return the recorded
information.
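
A minimal sketch of the per-transaction record described above, assuming a simple in-memory map keyed by transaction id and tablet id. `TxnTabletReplicaCache`, `getOrRecord`, and `clear` are hypothetical names for illustration; the actual change in #57198 may be structured differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; class and method names are illustrative, not Doris APIs.
final class TxnTabletReplicaCache {
    // txnId -> (tabletId -> backend ids first returned within that transaction)
    private final Map<Long, Map<Long, List<Long>>> recorded = new ConcurrentHashMap<>();

    // Returns the backends first recorded for (txnId, tabletId). If nothing
    // has been recorded yet, stores a snapshot of currentBackends and returns it.
    List<Long> getOrRecord(long txnId, long tabletId, List<Long> currentBackends) {
        Map<Long, List<Long>> perTxn =
                recorded.computeIfAbsent(txnId, id -> new ConcurrentHashMap<>());
        return perTxn.computeIfAbsent(tabletId,
                id -> Collections.unmodifiableList(new ArrayList<>(currentBackends)));
    }

    // Drops the record once the transaction has finished (assumed cleanup hook).
    void clear(long txnId) {
        recorded.remove(txnId);
    }
}
```

With this shape, the second RPC of a transaction gets the same answer as the first even if the tablet has migrated in between, while a different transaction still sees the current placement.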
@github-actions github-actions bot requested a review from morrySnow as a code owner December 4, 2025 06:45
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Dec 4, 2025
@dataroaring dataroaring reopened this Dec 4, 2025
@hello-stephen
Contributor

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.99% (1266/1544)
Line Coverage 65.91% (22722/34473)
Region Coverage 67.34% (11318/16807)
Branch Coverage 57.02% (5981/10490)

@0AyanamiRei
Contributor

run buildall

@0AyanamiRei
Contributor

rerun buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.99% (1266/1544)
Line Coverage 65.94% (22730/34473)
Region Coverage 67.37% (11323/16807)
Branch Coverage 57.09% (5989/10490)

@morrySnow
Contributor

compile failed

@morrySnow morrySnow closed this Dec 5, 2025