
Commit 3f2ac8c

[#24692, #24784] YSQL: Refactor INSERT ... ON CONFLICT batching - Part 1 + Bug Fixes
Summary:
D38354 introduced batching support for INSERT ... ON CONFLICT queries. This revision addresses a subset of the design comments on that revision. In particular, this refactor aims to:
- Better integrate with and reuse the existing YBCTID libraries for identifying unique index tuples.
- Fix inconsistent results produced by different batch sizes.
- Fix visibility issues observed in queries with a pipeline pattern (multiple CTEs, each doing a ModifyTable, each with a RETURNING clause, feeding into the next).

**Summary of changes**
- Reuse the existing YBCTID container infrastructure in pggate for storing the tuple IDs of “on-conflict” tuples.
- Use separate containers for storing batch-read tuples and “just-inserted” tuples.
- Move the logic that updates the map of “just-inserted” tuples from execIndexing (at the time of enqueueing the index request) to nodeModifyTable.

**Reusing existing YBCTID container infrastructure**
This revision reuses the YBCTID infrastructure that the foreign key and explicit row locking buffers already use. To do this, the key columns of unique (and non-NULL) indexes are treated as YBCTIDs. A session-level container is created to store the mapping between the IDs of tuples read by the index scan and their corresponding table slots. This design allows us to:
- Reuse existing YBCTID helper functions for equality comparisons.
- Mitigate tuple visibility issues through the creation of the global map when multiple INSERT … ON CONFLICT CTEs are chained together.
- Inherit benefits from future improvements to the YBCTID processing logic.

**Use of separate containers to store batch-read tuples and “just-inserted” tuples**
The index batch-read workflow requires mapping the key columns of a unique index to the tuple returned by the index scan. To check whether the same row is modified twice, the identifiers (i.e. key columns) of index tuples must be stored.
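
As a rough illustration of the mapping described above (key columns of a unique, non-NULL index acting as a tuple ID that maps to a table slot), here is a minimal sketch in plain C. It is not the pggate implementation: `KeyTupleId`, `KeyToSlotMap`, and all helper functions are hypothetical names, key columns are modeled as 64-bit integers, and the capacity is fixed only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 4
#define MAP_CAPACITY 64         /* fixed size, an assumption of this sketch */

/* Hypothetical stand-in for a tuple ID built from index key columns. */
typedef struct
{
    int     nkeys;
    int64_t values[MAX_KEYS];
} KeyTupleId;

typedef struct
{
    bool        used;
    KeyTupleId  id;
    const void *slot;           /* stand-in for the table slot */
} MapEntry;

typedef struct
{
    MapEntry entries[MAP_CAPACITY];
} KeyToSlotMap;

void
key_to_slot_map_init(KeyToSlotMap *map)
{
    memset(map, 0, sizeof(*map));
}

/* Build an ID from the key columns; refuse keys that contain a NULL. */
bool
key_tuple_id_build(KeyTupleId *id, int nkeys,
                   const int64_t *values, const bool *isnull)
{
    if (nkeys < 1 || nkeys > MAX_KEYS)
        return false;
    memset(id, 0, sizeof(*id));
    id->nkeys = nkeys;
    for (int i = 0; i < nkeys; i++)
    {
        if (isnull[i])
            return false;
        id->values[i] = values[i];
    }
    return true;
}

bool
key_tuple_id_equal(const KeyTupleId *a, const KeyTupleId *b)
{
    return a->nkeys == b->nkeys &&
        memcmp(a->values, b->values,
               sizeof(int64_t) * (size_t) a->nkeys) == 0;
}

/* FNV-style mixing, one 64-bit key column at a time. */
uint64_t
key_tuple_id_hash(const KeyTupleId *id)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    for (int i = 0; i < id->nkeys; i++)
        h = (h ^ (uint64_t) id->values[i]) * 0x100000001b3ULL;
    return h;
}

/* Insert or overwrite; returns false only when the table is full. */
bool
key_to_slot_map_put(KeyToSlotMap *map, const KeyTupleId *id, const void *slot)
{
    uint64_t start = key_tuple_id_hash(id);

    for (uint64_t probe = 0; probe < MAP_CAPACITY; probe++)
    {
        MapEntry *e = &map->entries[(start + probe) % MAP_CAPACITY];

        if (e->used && !key_tuple_id_equal(&e->id, id))
            continue;           /* occupied by a different key: keep probing */
        e->used = true;
        e->id = *id;
        e->slot = slot;
        return true;
    }
    return false;
}

/* Returns the stored slot, or NULL when the key is absent. */
const void *
key_to_slot_map_get(const KeyToSlotMap *map, const KeyTupleId *id)
{
    uint64_t start = key_tuple_id_hash(id);

    for (uint64_t probe = 0; probe < MAP_CAPACITY; probe++)
    {
        const MapEntry *e = &map->entries[(start + probe) % MAP_CAPACITY];

        if (!e->used)
            return NULL;
        if (key_tuple_id_equal(&e->id, id))
            return e->slot;
    }
    return NULL;
}
```

A lookup with the key columns of an incoming row then answers "does a conflicting row exist, and which slot holds it" without a per-row index read. Keys containing a NULL never enter the map, which mirrors the non-NULL restriction stated above.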
While the data stored is similar (a hash of the key columns of the index), it can be stored in two separate structures: a **hash map** for the index batch-read, and a **hash set** for the list of modified tuples. This design allows us to:
- Scale the two data structures independently. If no restriction is placed on the size of the hash set, we can detect "command cannot affect row a second time" errors across batches of writes.
- Remain consistent with the design of other buffers such as the foreign key buffer, which currently places no restriction on how many YBCTID intents are placed in its hash set.
- Cleanly remove the **hash set** once DocDB natively supports detection of tuples that are modified multiple times by the same query.

**Rework logic of when entries are added/deleted to on-conflict map**
The previous design adds entries to and deletes entries from the on-conflict map when the payload for the index insert/delete operation is prepared (in execIndexing.c). This suffers from the following drawbacks:
- Index updates (DELETEs + INSERTs) may be skipped. If the update of an arbiter index is skipped, the corresponding entry is incorrectly not added to/deleted from the on-conflict map.
- Primary key and secondary index modifications (insert/delete/update) take two distinct code paths. The previous design would have had to replicate this logic in both paths, which it did not. This duplication is not required in the current design.

This revision reworks the above design to add/delete entries to the map in two different places (in nodeModifyTable.c):
- During on-conflict checking, if it is decided that the tuple will be inserted. Note that at this stage, the DO UPDATE projections have not been carried out (so we don’t know whether the updated values change the index keys).
- After the updated values have been projected, if it is decided that the tuple will be updated.
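
The interaction between the two containers can be sketched as follows. This is a simplified model, not the code in nodeModifyTable.c: all names are hypothetical, keys are reduced to a single 64-bit hash, the tables have a fixed capacity (the design above leaves the hash set unbounded), and a DO UPDATE is assumed not to change the index key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TABLE_CAPACITY 128      /* fixed size only to keep the sketch short */

/* Open-addressing table of key hashes, used for both containers. */
typedef struct
{
    bool     used[TABLE_CAPACITY];
    uint64_t key[TABLE_CAPACITY];
} KeyTable;

bool
key_table_contains(const KeyTable *t, uint64_t key)
{
    for (uint64_t probe = 0; probe < TABLE_CAPACITY; probe++)
    {
        uint64_t idx = (key + probe) % TABLE_CAPACITY;

        if (!t->used[idx])
            return false;
        if (t->key[idx] == key)
            return true;
    }
    return false;
}

/* Adding an existing key is a no-op; returns false if the table is full. */
bool
key_table_add(KeyTable *t, uint64_t key)
{
    for (uint64_t probe = 0; probe < TABLE_CAPACITY; probe++)
    {
        uint64_t idx = (key + probe) % TABLE_CAPACITY;

        if (t->used[idx] && t->key[idx] == key)
            return true;
        if (!t->used[idx])
        {
            t->used[idx] = true;
            t->key[idx] = key;
            return true;
        }
    }
    return false;
}

typedef enum { ROW_INSERT, ROW_UPDATE, ROW_ERROR_SECOND_TIME } RowAction;

typedef struct
{
    KeyTable batch_read;        /* keys returned by the batched index read */
    KeyTable modified;          /* keys written so far by this statement */
} OnConflictState;

void
on_conflict_state_init(OnConflictState *s)
{
    memset(s, 0, sizeof(*s));
}

/* Each batch replaces the batch-read container; 'modified' is kept. */
void
on_conflict_start_batch(OnConflictState *s, const uint64_t *existing, int n)
{
    memset(&s->batch_read, 0, sizeof(s->batch_read));
    for (int i = 0; i < n; i++)
        (void) key_table_add(&s->batch_read, existing[i]);
}

RowAction
on_conflict_check_row(OnConflictState *s, uint64_t key)
{
    bool added;

    /* Already written by this statement, possibly in an earlier batch. */
    if (key_table_contains(&s->modified, key))
        return ROW_ERROR_SECOND_TIME;

    /* Record the key whether the row is inserted or updated. */
    added = key_table_add(&s->modified, key);
    assert(added);
    (void) added;

    return key_table_contains(&s->batch_read, key) ? ROW_UPDATE : ROW_INSERT;
}
```

Because `modified` survives `on_conflict_start_batch()`, a key inserted in one batch and proposed again in a later batch is still reported as `ROW_ERROR_SECOND_TIME`, regardless of the batch size.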
This design currently has the drawback that the index tuple has to be constructed one extra time (`FormIndexDatum()`) per tuple for tuples that are to be updated. This is particularly significant for expression indexes, as it involves evaluating the expression an extra time. This can be fixed for non-expression indexes by directly copying over the index key columns from the tuple slot. However, the fix for expression indexes requires storing and propagating the index tuple, which is a large change relative to the benefit it offers.

**Future Work**
A second revision is planned that will:
- Add support for the RETURNING clause
- Add limited support for foreign key relationships
- Fix bugs related to partitioned tables

Jira: DB-13766, DB-13883

Test Plan:
On AlmaLinux 8:
```
#!/usr/bin/env bash
set -euo pipefail
./yb_build.sh fastdebug --gcc11
find java/yb-pgsql/src/test/java/org/yb/pgsql -name 'TestPgRegressInsertOnConflict*' \
| grep -oE 'TestPgRegress\w+' \
| while read -r testname; do
  ./yb_build.sh fastdebug --gcc11 --java-test "$testname" --sj
done
```

Reviewers: amartsinchyk, smishra, jason, telgersma

Reviewed By: jason, telgersma

Subscribers: jason, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D39023
1 parent 841f7bf commit 3f2ac8c

23 files changed: +1381 -510 lines changed

src/postgres/src/backend/executor/Makefile

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ OBJS = \
 	nodeYbBitmapIndexscan.o \
 	nodeYbBitmapTablescan.o \
 	nodeYbSeqscan.o \
-	ybInsertOnConflictBatchingMap.o \
 	ybOptimizeModifyTable.o \
 	ybcExpr.o \
 	ybcFunction.o \

src/postgres/src/backend/executor/execIndexing.c

Lines changed: 80 additions & 87 deletions
@@ -118,7 +118,6 @@
 
 /* Yugabyte includes */
 #include "catalog/pg_am_d.h"
-#include "executor/ybInsertOnConflictBatchingMap.h"
 #include "executor/ybcModifyTable.h"
 #include "funcapi.h"
 #include "utils/relcache.h"
@@ -286,8 +285,7 @@ YbExecDoInsertIndexTuple(ResultRelInfo *resultRelInfo,
 						 bool *specConflict,
 						 List *arbiterIndexes,
 						 bool update,
-						 ItemPointer tupleid,
-						 struct yb_insert_on_conflict_batching_hash *ybConflictMap)
+						 ItemPointer tupleid)
 {
 	bool applyNoDupErr;
 	IndexUniqueCheck checkUnique;
@@ -312,18 +310,6 @@ YbExecDoInsertIndexTuple(ResultRelInfo *resultRelInfo,
 				   values,
 				   isnull);
 
-	if (ybConflictMap)
-	{
-		int indnkeyatts =
-			IndexRelationGetNumberOfKeyAttributes(indexRelation);
-
-		YbInsertOnConflictBatchingMapInsert(ybConflictMap,
-											indnkeyatts,
-											values,
-											isnull,
-											NULL /* slot */);
-	}
-
 	/*
 	 * After updating INSERT ON CONFLICT batching map, PK is no longer
 	 * relevant from here on.
@@ -549,8 +535,7 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		 * TODO(neil) The following YB check might not be needed due to later work on indexes.
 		 * We keep this check for now as this bugfix will be backported to ealier releases.
 		 */
-		if (isYBRelation && YBIsCoveredByMainTable(indexRelation) &&
-			!YbIsInsertOnConflictReadBatchingEnabled(resultRelInfo))
+		if (isYBRelation && YBIsCoveredByMainTable(indexRelation))
 			continue;
 
 		/* If the index is marked as read-only, ignore it */
@@ -567,10 +552,7 @@ ExecInsertIndexTuples(ResultRelInfo *resultRelInfo,
 		if (YbExecDoInsertIndexTuple(resultRelInfo, indexRelation, indexInfo,
 									 slot, estate, noDupErr,
 									 specConflict, arbiterIndexes, update,
-									 tupleid,
-									 (resultRelInfo->ri_YbConflictMap ?
-									  resultRelInfo->ri_YbConflictMap[i] :
-									  NULL)))
+									 tupleid))
 			result = lappend_oid(result, RelationGetRelid(indexRelation));
 	}
 
@@ -595,8 +577,7 @@ YbExecDoDeleteIndexTuple(ResultRelInfo *resultRelInfo,
 						 IndexInfo *indexInfo,
 						 TupleTableSlot *slot,
 						 Datum ybctid,
-						 EState *estate,
-						 struct yb_insert_on_conflict_batching_hash *ybConflictMap)
+						 EState *estate)
 {
 	Datum values[INDEX_MAX_KEYS];
 	bool isnull[INDEX_MAX_KEYS];
@@ -621,17 +602,6 @@ YbExecDoDeleteIndexTuple(ResultRelInfo *resultRelInfo,
 					 indexInfo); /* index AM may need this */
 		MemoryContextSwitchTo(oldContext);
 	}
-
-	if (ybConflictMap)
-	{
-		int indnkeyatts =
-			IndexRelationGetNumberOfKeyAttributes(indexRelation);
-
-		YbInsertOnConflictBatchingMapDelete(ybConflictMap,
-											indnkeyatts,
-											values,
-											isnull);
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -696,8 +666,7 @@ ExecDeleteIndexTuples(ResultRelInfo *resultRelInfo, Datum ybctid, HeapTuple tupl
 		 * - As a result, we don't need distinguish between Postgres and YugaByte here.
 		 * I update this code only for clarity.
 		 */
-		if (isYBRelation && YBIsCoveredByMainTable(indexRelation) &&
-			!YbIsInsertOnConflictReadBatchingEnabled(resultRelInfo))
+		if (isYBRelation && YBIsCoveredByMainTable(indexRelation))
 			continue;
 
 		indexInfo = indexInfoArray[i];
@@ -725,10 +694,7 @@ ExecDeleteIndexTuples(ResultRelInfo *resultRelInfo, Datum ybctid, HeapTuple tupl
 		}
 
 		YbExecDoDeleteIndexTuple(resultRelInfo, indexRelation, indexInfo,
-								 slot, ybctid, estate,
-								 (resultRelInfo->ri_YbConflictMap ?
-								  resultRelInfo->ri_YbConflictMap[i] :
-								  NULL));
+								 slot, ybctid, estate);
 	}
 
 	/* Drop the temporary slot */
@@ -849,8 +815,7 @@ YbExecUpdateIndexTuples(ResultRelInfo *resultRelInfo,
 		 * Primary key is a part of the base relation in Yugabyte and does not
 		 * need to be updated here.
 		 */
-		if (YBIsCoveredByMainTable(indexRelation) &&
-			!YbIsInsertOnConflictReadBatchingEnabled(resultRelInfo))
+		if (YBIsCoveredByMainTable(indexRelation))
 			continue;
 
 		indexInfo = indexInfoArray[i];
@@ -1039,10 +1004,7 @@ YbExecUpdateIndexTuples(ResultRelInfo *resultRelInfo,
 		index = lfirst_int(lc);
 		YbExecDoDeleteIndexTuple(resultRelInfo, relationDescs[index],
 								 indexInfoArray[index], deleteSlot, ybctid,
-								 estate,
-								 (resultRelInfo->ri_YbConflictMap ?
-								  resultRelInfo->ri_YbConflictMap[index] :
-								  NULL));
+								 estate);
 	}
 
 	econtext->ecxt_scantuple = slot;
@@ -1055,10 +1017,7 @@ YbExecUpdateIndexTuples(ResultRelInfo *resultRelInfo,
 									 NULL /* specConflict */,
 									 NIL /* arbiterIndexes */,
 									 true /* update */,
-									 tupleid,
-									 (resultRelInfo->ri_YbConflictMap ?
-									  resultRelInfo->ri_YbConflictMap[index] :
-									  NULL)))
+									 tupleid))
 			result = lappend_oid(result, RelationGetRelid(relationDescs[index]));
 	}
 
@@ -1129,36 +1088,13 @@ ExecCheckIndexConstraints(ResultRelInfo *resultRelInfo, TupleTableSlot *slot,
 	for (i = 0; i < numIndices; i++)
 	{
 		Relation indexRelation = relationDescs[i];
-		IndexInfo *indexInfo;
+		IndexInfo *indexInfo = indexInfoArray[i];
 		bool satisfiesConstraint;
 
-		if (indexRelation == NULL)
+		if (!YbShouldCheckUniqueOrExclusionIndex(indexInfo, indexRelation,
+												 heapRelation, arbiterIndexes))
 			continue;
 
-		indexInfo = indexInfoArray[i];
-		Assert(indexInfo->ii_ReadyForInserts ==
-			   indexRelation->rd_index->indisready);
-
-		if (!indexInfo->ii_Unique && !indexInfo->ii_ExclusionOps)
-			continue;
-
-		/* If the index is marked as read-only, ignore it */
-		if (!indexInfo->ii_ReadyForInserts)
-			continue;
-
-		/* When specific arbiter indexes requested, only examine them */
-		if (arbiterIndexes != NIL &&
-			!list_member_oid(arbiterIndexes,
-							 indexRelation->rd_index->indexrelid))
-			continue;
-
-		if (!indexRelation->rd_index->indimmediate)
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					 errmsg("ON CONFLICT does not support deferrable unique constraints/exclusion constraints as arbiters"),
-					 errtableconstraint(heapRelation,
-										RelationGetRelationName(indexRelation))));
-
 		checkedIndex = true;
 
 		if (YbIsInsertOnConflictReadBatchingEnabled(resultRelInfo))
@@ -1860,12 +1796,6 @@ yb_batch_fetch_conflicting_rows(int idx, ResultRelInfo *resultRelInfo,
 		return;
 	}
 
-	/* Create an ON CONFLICT batching map. */
-	resultRelInfo->ri_YbConflictMap[idx] =
-		YbInsertOnConflictBatchingMapCreate(estate->es_query_cxt,
-											resultRelInfo->ri_BatchSize,
-											index->rd_att);
-
 	/*
 	 * Create the array used for the RHS of the batch read RPC.
 	 * Parts copied from ExecEvalArrayExpr.
@@ -1968,6 +1898,7 @@ yb_batch_fetch_conflicting_rows(int idx, ResultRelInfo *resultRelInfo,
 	{
 		Datum existing_values[INDEX_MAX_KEYS];
 		bool existing_isnull[INDEX_MAX_KEYS];
+		MemoryContext oldcontext;
 
 		/*
 		 * Extract the index column values and isnull flags from the existing
@@ -1976,11 +1907,19 @@ yb_batch_fetch_conflicting_rows(int idx, ResultRelInfo *resultRelInfo,
 		FormIndexDatum(indexInfo, existing_slot, estate,
 					   existing_values, existing_isnull);
 
-		YbInsertOnConflictBatchingMapInsert(resultRelInfo->ri_YbConflictMap[idx],
-											indnkeyatts,
-											existing_values,
-											existing_isnull,
-											existing_slot);
+		/*
+		 * Irrespective of how distinctness of NULLs are treated by the index,
+		 * the index keys having NULL values are filtered out above, and will
+		 * not be a part of the index scan result.
+		 */
+		Assert(!YbIsAnyIndexKeyColumnNull(indexInfo, existing_isnull));
+
+		oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+		YBCPgInsertOnConflictKeyInfo info = {existing_slot};
+		YBCPgYBTupleIdDescriptor *descr =
+			YBCBuildNonNullUniqueIndexYBTupleId(index, existing_values);
+		HandleYBStatus(YBCPgAddInsertOnConflictKey(descr, &info));
+		MemoryContextSwitchTo(oldcontext);
 
 		existing_slot = table_slot_create(heap, NULL);
 		econtext->ecxt_scantuple = existing_slot;
@@ -1991,3 +1930,57 @@ yb_batch_fetch_conflicting_rows(int idx, ResultRelInfo *resultRelInfo,
 	econtext->ecxt_scantuple = save_scantuple;
 	ExecDropSingleTupleTableSlot(existing_slot);
 }
+
+bool
+YbIsAnyIndexKeyColumnNull(IndexInfo *indexInfo, bool isnull[INDEX_MAX_KEYS])
+{
+	for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+	{
+		if (isnull[i])
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * YbShouldCheckUniqueOrExclusionIndex
+ *
+ * Function to determine if the given index satisfies prerequisites for a
+ * unique or exclusion constraint check.
+ * Logic has been lifted from ExecCheckIndexConstraints.
+ */
+bool
+YbShouldCheckUniqueOrExclusionIndex(IndexInfo *indexInfo,
+									Relation indexRelation,
+									Relation heapRelation,
+									List *arbiterIndexes)
+{
+	if (indexRelation == NULL)
+		return false;
+
+	Assert(indexInfo->ii_ReadyForInserts ==
+		   indexRelation->rd_index->indisready);
+
+	if (!indexInfo->ii_Unique && !indexInfo->ii_ExclusionOps)
+		return false;
+
+	/* If the index is marked as read-only, ignore it */
+	if (!indexInfo->ii_ReadyForInserts)
+		return false;
+
+	/* When specific arbiter indexes requested, only examine them */
+	if (arbiterIndexes != NIL &&
+		!list_member_oid(arbiterIndexes,
+						 indexRelation->rd_index->indexrelid))
+		return false;
+
+	if (!indexRelation->rd_index->indimmediate)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("ON CONFLICT does not support deferrable unique constraints/exclusion constraints as arbiters"),
+				 errtableconstraint(heapRelation,
+									RelationGetRelationName(indexRelation))));
+
+	return true;
+}
