schemachanger: handle duplicate key error from addsstable during backfills #151688

@rafiss

We saw the following error in monitoring:

SCHEMA CHANGE job 1096949175289937921: stepping through state failed with unexpected error: addsstable [/Tenant/3594/Table/1013/555/‹×›/‹×›,/Tenant/3594/Table/1013/555/‹×›/‹×›/‹×›): checking for key collisions: ‹×›
(1)
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).addSSTable
  | 	pkg/kv/bulk/sst_batcher.go:976
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).doFlush.func1
  | 	pkg/kv/bulk/sst_batcher.go:731
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).doFlush
  | 	pkg/kv/bulk/sst_batcher.go:781
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).flushIfNeeded
  | 	pkg/kv/bulk/sst_batcher.go:522
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).AddMVCCKey
  | 	pkg/kv/bulk/sst_batcher.go:395
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*BufferingAdder).doFlush
  | 	pkg/kv/bulk/buffering_adder.go:322
  | github.com/cockroachdb/cockroach/pkg/kv/bulk.(*BufferingAdder).Add
  | 	pkg/kv/bulk/buffering_adder.go:205
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*bulkRowWriter).ingestLoop.func1
  | 	pkg/sql/rowexec/bulk_row_writer.go:162
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*bulkRowWriter).ingestLoop
  | 	pkg/sql/rowexec/bulk_row_writer.go:175
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*bulkRowWriter).work.func1
  | 	pkg/sql/rowexec/bulk_row_writer.go:112
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*bulkRowWriter).work.Group.GoCtx.func3
  | 	pkg/util/ctxgroup/ctxgroup.go:199
  | golang.org/x/sync/errgroup.(*Group).Go.func1
  | 	external/org_golang_x_sync/errgroup/errgroup.go:78
  | runtime.goexit
  | 	src/runtime/asm_amd64.s:1700
Wraps: (3) addsstable [/Tenant/3594/Table/1013/555/‹×›/‹×›,/Tenant/3594/Table/1013/555/‹×›/‹×›/‹×›)
Wraps: (4)
  | (opaque error wrapper)
  | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
  | reportable 0:
  |
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.EvalAddSSTable
  | 	pkg/kv/kvserver/batcheval/cmd_add_sstable.go:236
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand
  | 	pkg/kv/kvserver/replica_evaluate.go:539
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch
  | 	pkg/kv/kvserver/replica_evaluate.go:355
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWrapper
  | 	pkg/kv/kvserver/replica_write.go:751
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatchWithServersideRefreshes
  | 	pkg/kv/kvserver/replica_write.go:716
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateWriteBatch
  | 	pkg/kv/kvserver/replica_write.go:479
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evaluateProposal
  | 	pkg/kv/kvserver/replica_proposal.go:999
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).requestToProposal
  | 	pkg/kv/kvserver/replica_proposal.go:1095
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).evalAndPropose
  | 	pkg/kv/kvserver/replica_raft.go:116
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeWriteBatch
  | 	pkg/kv/kvserver/replica_write.go:188
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries
  | 	pkg/kv/kvserver/replica_send.go:498
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).SendWithWriteBytes
  | 	pkg/kv/kvserver/replica_send.go:177
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).SendWithWriteBytes
  | 	pkg/kv/kvserver/store_send.go:188
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).SendWithWriteBytes
  | 	pkg/kv/kvserver/stores.go:199
  | github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal
  | 	pkg/server/node.go:1646
  | github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch
  | 	pkg/server/node.go:1839
  | github.com/cockroachdb/cockroach/pkg/server.(*Node).batchStreamImpl
  | 	pkg/server/node.go:1894
  | github.com/cockroachdb/cockroach/pkg/server.(*Node).BatchStream
  | 	pkg/server/node.go:1865
  | github.com/cockroachdb/cockroach/pkg/kv/kvpb._Internal_BatchStream_Handler
  | 	bazel-out/k8-opt/bin/pkg/kv/kvpb/kvpb_go_proto_/github.com/cockroachdb/cockroach/pkg/kv/kvpb/api.pb.go:10849
  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.StreamServerInterceptor.func13
  | 	pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:143
  | google.golang.org/grpc.getChainStreamHandler.func1
  | 	external/org_golang_google_grpc/server.go:1504
  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func4
  | 	pkg/rpc/context.go:181
  | google.golang.org/grpc.getChainStreamHandler.func1
  | 	external/org_golang_google_grpc/server.go:1504
  | github.com/cockroachdb/cockroach/pkg/rpc.kvAuth.streamInterceptor
  | 	pkg/rpc/auth.go:152
  | google.golang.org/grpc.getChainStreamHandler.func1
  | 	external/org_golang_google_grpc/server.go:1504
  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func2.1
  | 	pkg/rpc/context.go:142
  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr
  | 	pkg/util/stop/stopper.go:350
  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func2
  | 	pkg/rpc/context.go:141
  | google.golang.org/grpc.NewServer.chainStreamServerInterceptors.chainStreamInterceptors.func2
  | 	external/org_golang_google_grpc/server.go:1495
  | google.golang.org/grpc.(*Server).processStreamingRPC
  | 	external/org_golang_google_grpc/server.go:1659
  | google.golang.org/grpc.(*Server).handleStream
  | 	external/org_golang_google_grpc/server.go:1739
  | google.golang.org/grpc.(*Server).serveStreams.func1.1
  | 	external/org_golang_google_grpc/server.go:970
Wraps: (5) checking for key collisions
Wraps: (6) ‹×›
  |
  | (opaque error leaf)
  | type name: github.com/cockroachdb/cockroach/pkg/kv/kvpb/*kvpb.KeyCollisionError
Error types: (1) *colexecerror.notInternalError (2) *withstack.withStack (3) *errutil.withPrefix (4) *errbase.opaqueWrapper (5) *errutil.withPrefix (6) *errbase.opaqueLeaf

Since this is a duplicate key error, we should handle this more cleanly without logging an overly verbose stack trace.

This is similar to #110826.

Jira issue: CRDB-53419

Metadata

Labels

C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions)
