Skip to content

Commit 35331a2

Browse files
committed
merge: reduce prevalence of nested conflicts with diff3
This is a pretty small code change, but one that perhaps deserves a lengthy explanation... When merging, it is possible to have nested conflicts. This most frequently happens when using merge.conflictStyle=diff3 (or zdiff3) and doing so in a case where there is more than one merge base. For example: L1---L2 / \ / \ B X ? \ / \ / R1---R2 Here on branches L and R there are many commits omitted, but L1 and R1 are both valid merge bases for a merge of L2 and R2. This reason we end up with two valid merge bases is because we have both a merge from L into R and a merge from R into L (each merge occurring before or at L2 and R2, respectively). When merging L2 and R2 using the diff3 conflict style, today you might get a conflict of the form: Non-conflicting leading content <<<<<<< e11e11e1 (First line of commit message of L2) L2:conflicting region ||||||| merged common ancestors <<<<<<<<< Temporary merge branch 1 L1:conflicting region ||||||||| ba5eba11 M:conflicting region ========= R1:conflicting region ======= R2:conflicting region >>>>>>> 52525252 (First line of commit message of R2) Non-conflicting trailing content where "COMMIT: conflicting region" above stands for several (or even hundreds) of lines of content from the (relevant file of) the relevant commit. You could get another layer of nesting here, if you found that there was more than one merge base of the merge bases. In fact, the number of layers of nesting is not limited. In effect, the higher the depth of recursion needed for merging, the more the "base" version in the diff3 output expands. Reports over the years suggest the presence of nested conflicts diminish the value of having the base version available; the greater the nesting (and perhaps also the longer the length of each region of lines when there is a nested conflict), the more diminished the value is. In fact, it might be preferable for these particular conflicts to have used merge.conflictStyle=merge instead, i.e. to provide 0 context lines for the base version, while still using merge.conflictStyle=diff3 for other cases that don't have conflicts between the merge bases. However, there is an alternative way to handle the recursive merges that would approximate merge.conflictStyle=merge as the number of nesting levels increases: resolve the merge of merge bases not by using a conflicted merge of the two merge bases, but by using their base version. This alternative strategy works because we have some latitude in how the virtual merge base is selected. Using the base version of the merge bases is something we have done before in specific contexts, and in each case doing so fixed actual bugs. For more details, see: 816147e (merge-recursive: add a bunch of FIXME comments documenting known bugs, 2021-03-20) -- particularly the cases where resolution for merge bases are wrong 4ef88fc (merge-ort: add handling for different types of files at same path, 2021-01-01) c73cda7 (merge-ort: copy and adapt merge_submodule() from merge-recursive.c, 2021-01-01) 62fdec1 (merge-ort: flesh out implementation of handle_content_merge(), 2021-01-01) ec61d14 (merge-recursive: Fix modify/delete resolution in the recursive case, 2011-08-11) a129d96 (Allow specifying specialized merge-backend per path., 2007-04-16) -- particularly the "common ancestor" comment and associated code If this explanation feels like "magic" to you, there's an alternative rules-based approach by which we can evaluate the choice of how to create a virtual merge base. We want any virtual merge base to follow these rules: Rule 1) If within a certain range of lines, all merge bases match each other, then use those lines from any of them in the virtual merge base. Rule 2) If within a certain range of lines, there is at most one version of those lines that does not match the merge base of the merge bases, then use that unique version of those lines in the virtual merge base. Rule 3) In lines of the file that disagree between two or more merge bases (and which also disagree with the base of the merge bases), fill those lines in the virtual merge base with something that matches none of the merge bases. The first two rules simply let us resolve cases that are clearly unambiguous. The third rule may look funny but is necessary to avoid the virtual merge base accidentally matching one of the two sides in the outer merge. (If the virtual merge base matches one of the two sides in the outer merge, the merge machinery will think that side of the outer merge made no change and thus that there is no conflict in the outer merge, despite the fact that the two sides of the outer merge may disagree with each other.) If we are using merge.conflictStyle=merge, then these three rules are sufficient; anything else we do will be irrelevant to the end result. In that case, we could even satisfy rule 3 by ignoring the conflicting lines and replacing them with totally random lines. However, for merge.conflictStyle=diff3, we want something that looks more like a "base version" of the relevant file. That gives us a goal for the virtual merge base: Goal 4) In lines of the file falling under rule 3, try to pick something that looks like a base version. For Goal 4, both merging the conflicted portions of the merge bases and taking the base of the merge bases satisfy this goal. Both have their plusses and minuses. But both become less and less useful when there is a deeply nested recursive merge. For a deeply nested recursive merge, the conflicted contents gives a highly nested conflict showing every version of the file going back to the eventual common point in history. In contrast, the "base of the merge bases" strategy instead only gives the single version of the file from that final common point in history. Since codebases tend to grow over time, odds are that the more deeply recursive the merge has to go, the smaller the context that will be provided with the "base of the merge bases" strategy. In the limit, the original version of the lines far enough back in history may have been empty, so the "base of the merge bases" strategy effectively makes recursive merges look like merge.conflictStyle=merge for deep recursions, while still providing some "base version" context for more shallow recursions. As noted near the beginning of this commit message, having something that approaches no context in the special cases of deep recursions is exactly what we'd prefer. So: Goal 5) In lines of the file falling under Rule 3, the more deep the recursion is, the less likely relevant context can be kept; prefer small (or even empty) context regions over very complicated ones. This all sounds great, but there is one gotcha -- since we iteratively merge the merge-bases pairwise, we don't have an easy way to distinguish between Rule 2 and Rule 3 at times. For example, if we have three merge bases and they all disagree on some line, the conflicted-content solution avoids an ambiguity, but taking the "base of the merge bases" introduces one. In particular, for this case of three merge bases that disagree on some line, merging the first two merge bases yields an interim virtual merge base that matches the base, making it look like the virtual merge base has not been modified relative to its base. Then when we merge the third merge base with the interim merge base, we'd think it cleanly resolved to that line from the third merge base, against our wishes. Since all three merge bases differed on that line, we'd want to use the base of the merge bases, but the pairwise merging made that difficult. To avoid this problem, only use the "base of the merge bases" strategy when we have two merge bases. That will limit where this new virtual merge base strategy will help us, but since two merge bases is the most common case for recursive merges, it should still provide significant benefit. Signed-off-by: Elijah Newren <[email protected]>
1 parent 47291da commit 35331a2

File tree

3 files changed

+49
-36
lines changed

3 files changed

+49
-36
lines changed

merge-ll.h

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -61,9 +61,9 @@ struct ll_merge_options {
6161
* Resolve local conflicts automatically in favor of one side or the other
6262
* (as in 'git merge-file' `--ours`/`--theirs`/`--union`). Can be `0`,
6363
* `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`,
64-
* or `XDL_MERGE_FAVOR_UNION`.
64+
* or `XDL_MERGE_FAVOR_UNION`, `XDL_MERGE_FAVOR_BASE`.
6565
*/
66-
unsigned variant : 2;
66+
unsigned variant : 3;
6767

6868
/**
6969
* Resmudge and clean the "base", "theirs" and "ours" files before merging.

merge-ort.c

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -401,6 +401,9 @@ struct merge_options_internal {
401401
/* call_depth: recursion level counter for merging merge bases */
402402
int call_depth;
403403

404+
/* vmb_favor: preferred resolution variant for virtual merge base */
405+
unsigned vmb_favor;
406+
404407
/* field that holds submodule conflict information */
405408
struct string_list conflicted_submodules;
406409
};
@@ -2070,7 +2073,7 @@ static int merge_3way(struct merge_options *opt,
20702073

20712074
if (opt->priv->call_depth) {
20722075
ll_opts.virtual_ancestor = 1;
2073-
ll_opts.variant = 0;
2076+
ll_opts.variant = opt->priv->vmb_favor;
20742077
} else {
20752078
switch (opt->recursive_variant) {
20762079
case MERGE_VARIANT_OURS:
@@ -5186,6 +5189,17 @@ static void merge_ort_internal(struct merge_options *opt,
51865189
ancestor_name = "empty tree";
51875190
} else if (merge_bases) {
51885191
ancestor_name = "merged common ancestors";
5192+
/*
5193+
* If there were more than two virtual merge bases, we just
5194+
* fall back to our virtual merge base having conflict markers
5195+
* between versions. But if there are only two, then we can
5196+
* resolve conflicts by taking the version from the merge
5197+
* base of our merge bases.
5198+
*/
5199+
if (merge_bases->next)
5200+
opt->priv->vmb_favor = 0;
5201+
else
5202+
opt->priv->vmb_favor = XDL_MERGE_FAVOR_BASE;
51895203
} else {
51905204
strbuf_add_unique_abbrev(&merge_base_abbrev,
51915205
&merged_merge_bases->object.oid,

t/t6416-recursive-corner-cases.sh

Lines changed: 32 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -213,13 +213,12 @@ test_expect_success 'git detects differently handled merges conflict' '
213213
git cat-file -p C:new_a >ours &&
214214
git cat-file -p C:a >theirs &&
215215
>empty &&
216-
test_must_fail git merge-file \
216+
git merge-file --base \
217217
-L "Temporary merge branch 1" \
218218
-L "" \
219219
-L "Temporary merge branch 2" \
220220
ours empty theirs &&
221-
sed -e "s/^\([<=>]\)/\1\1\1/" ours >ours-tweaked &&
222-
git hash-object ours-tweaked >expect &&
221+
git hash-object ours >expect &&
223222
git rev-parse >>expect \
224223
D:new_a E:new_a &&
225224
git rev-parse >actual \
@@ -275,13 +274,12 @@ test_expect_success 'git detects differently handled merges conflict, swapped' '
275274
git cat-file -p C:a >ours &&
276275
git cat-file -p C:new_a >theirs &&
277276
>empty &&
278-
test_must_fail git merge-file \
277+
git merge-file --base \
279278
-L "Temporary merge branch 1" \
280279
-L "" \
281280
-L "Temporary merge branch 2" \
282281
ours empty theirs &&
283-
sed -e "s/^\([<=>]\)/\1\1\1/" ours >ours-tweaked &&
284-
git hash-object ours-tweaked >expect &&
282+
git hash-object ours >expect &&
285283
git rev-parse >>expect \
286284
D:new_a E:new_a &&
287285
git rev-parse >actual \
@@ -1564,11 +1562,13 @@ test_expect_failure 'check conflicting modes for regular file' '
15641562
# - R2 renames 'a' to 'm'
15651563
#
15661564
# In the end, in file 'm' we have four different conflicting files (from
1567-
# two versions of 'b' and two of 'a'). In addition, if
1568-
# merge.conflictstyle is diff3, then the base version also has
1569-
# conflict markers of its own, leading to a total of three levels of
1570-
# conflict markers. This is a pretty weird corner case, but we just want
1571-
# to ensure that we handle it as well as practical.
1565+
# two versions of 'b' and two of 'a'). In addition, if we didn't use
1566+
# XDL_MERGE_FAVOR_BASE and merge.conflictstyle is diff3, then the base
1567+
# version would also have conflict markers of its own, leading to a total of
1568+
# three levels of conflict markers. However, we do use XDL_MERGE_FAVOR_BASE
1569+
# for recursive merges, so this should keep conflict nestings to two. This
1570+
# is a pretty weird corner case, but we just want to ensure that we handle
1571+
# it as well as practical.
15721572

15731573
test_expect_success 'setup nested conflicts' '
15741574
git init nested_conflicts &&
@@ -1646,7 +1646,7 @@ test_expect_success 'setup nested conflicts' '
16461646
)
16471647
'
16481648

1649-
test_expect_success 'check nested conflicts' '
1649+
test_expect_success 'check nested merges without nested conflicts' '
16501650
(
16511651
cd nested_conflicts &&
16521652
@@ -1670,26 +1670,28 @@ test_expect_success 'check nested conflicts' '
16701670
git cat-file -p main:a >base &&
16711671
git cat-file -p L1:a >ours &&
16721672
git cat-file -p R1:a >theirs &&
1673-
test_must_fail git merge-file --diff3 \
1673+
git merge-file --base \
16741674
-L "Temporary merge branch 1" \
16751675
-L "$MAIN" \
16761676
-L "Temporary merge branch 2" \
16771677
ours \
16781678
base \
16791679
theirs &&
1680-
sed -e "s/^\([<|=>]\)/\1\1/" ours >vmb_a &&
1680+
mv ours vmb_a &&
1681+
test_cmp base vmb_a &&
16811682
16821683
git cat-file -p main:b >base &&
16831684
git cat-file -p L1:b >ours &&
16841685
git cat-file -p R1:b >theirs &&
1685-
test_must_fail git merge-file --diff3 \
1686+
git merge-file --base \
16861687
-L "Temporary merge branch 1" \
16871688
-L "$MAIN" \
16881689
-L "Temporary merge branch 2" \
16891690
ours \
16901691
base \
16911692
theirs &&
1692-
sed -e "s/^\([<|=>]\)/\1\1/" ours >vmb_b &&
1693+
mv ours vmb_b &&
1694+
test_cmp base vmb_b &&
16931695
16941696
# Compare :2:m to expected values
16951697
git cat-file -p L2:m >ours &&
@@ -1749,14 +1751,15 @@ test_expect_success 'check nested conflicts' '
17491751
# L<n> and R<n> resolve the conflicts differently.
17501752
#
17511753
# X<n> is an auto-generated merge-base used when merging L<n+1> and R<n+1>.
1752-
# By construction, X1 has conflict markers due to conflicting versions.
1753-
# X2, due to using merge.conflictstyle=3, has nested conflict markers.
1754+
# By construction, X1 has two handle conflicting versions. X2 is both
1755+
# based on a merge base that had to ahndle conflicting versions, and is
1756+
# trying to resolve conflicting versions between L2 and R2.
17541757
#
1755-
# So, merging R3 into L3 using merge.conflictstyle=3 should show the
1756-
# nested conflict markers from X2 in the base version -- that means we
1757-
# have three levels of conflict markers. Can we distinguish all three?
1758+
# So, merging R3 into L3 using merge.conflictstyle=3 has a merge base where
1759+
# conflicts had to dealth with multiple levels back. Do we get a reasonable
1760+
# diff3 output for it?
17581761

1759-
test_expect_success 'setup virtual merge base with nested conflicts' '
1762+
test_expect_success 'setup virtual merge base where multiple levels back had conflicts' '
17601763
git init virtual_merge_base_has_nested_conflicts &&
17611764
(
17621765
cd virtual_merge_base_has_nested_conflicts &&
@@ -1815,7 +1818,7 @@ test_expect_success 'setup virtual merge base with nested conflicts' '
18151818
)
18161819
'
18171820

1818-
test_expect_success 'check virtual merge base with nested conflicts' '
1821+
test_expect_success 'check virtual merge base where multiple levels back had conflicts' '
18191822
(
18201823
cd virtual_merge_base_has_nested_conflicts &&
18211824
@@ -1839,34 +1842,30 @@ test_expect_success 'check virtual merge base with nested conflicts' '
18391842
git rev-parse :2:content :3:content >actual &&
18401843
test_cmp expect actual &&
18411844
1842-
# Imitate X1 merge base, except without long enough conflict
1843-
# markers because a subsequent sed will modify them. Put
1844-
# result into vmb.
1845+
# Imitate X1 merge base, Put result into vmb.
18451846
git cat-file -p main:content >base &&
18461847
git cat-file -p L:content >left &&
18471848
git cat-file -p R:content >right &&
18481849
cp left merged-once &&
1849-
test_must_fail git merge-file --diff3 \
1850+
git merge-file --base \
18501851
-L "Temporary merge branch 1" \
18511852
-L "$MAIN" \
18521853
-L "Temporary merge branch 2" \
18531854
merged-once \
18541855
base \
18551856
right &&
1856-
sed -e "s/^\([<|=>]\)/\1\1\1/" merged-once >vmb &&
1857+
mv merged-once vmb &&
18571858
1858-
# Imitate X2 merge base, overwriting vmb. Note that we
1859-
# extend both sets of conflict markers to make them longer
1860-
# with the sed command.
1859+
# Imitate X2 merge base, overwriting vmb.
18611860
cp left merged-twice &&
1862-
test_must_fail git merge-file --diff3 \
1861+
git merge-file --base \
18631862
-L "Temporary merge branch 1" \
18641863
-L "merged common ancestors" \
18651864
-L "Temporary merge branch 2" \
18661865
merged-twice \
18671866
vmb \
18681867
right &&
1869-
sed -e "s/^\([<|=>]\)/\1\1\1/" merged-twice >vmb &&
1868+
mv merged-twice vmb &&
18701869
18711870
# Compare :1:content to expected value
18721871
git cat-file -p :1:content >actual &&

0 commit comments

Comments
 (0)