Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

github.com/cockroachdb/pebble/internal/metamorphic: TestMeta failed #4267

Closed
cockroach-teamcity opened this issue Jan 17, 2025 · 4 comments · Fixed by #4280
Closed

github.com/cockroachdb/pebble/internal/metamorphic: TestMeta failed #4267

cockroach-teamcity opened this issue Jan 17, 2025 · 4 comments · Fixed by #4280

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jan 17, 2025

github.com/cockroachdb/pebble/internal/metamorphic.TestMeta failed with artifacts on refs/heads/master @ 7c863c9c3bd8:

=== CONT  TestMeta/execution/standard-020
=== RUN   TestMeta/execution/random-026
=== PAUSE TestMeta/execution/random-026
=== CONT  TestMeta/execution/random-026
=== RUN   TestMeta/execution/random-029
=== PAUSE TestMeta/execution/random-029
=== CONT  TestMeta/execution/random-029
=== RUN   TestMeta/execution/standard-000
=== PAUSE TestMeta/execution/standard-000
=== CONT  TestMeta/execution/standard-000
=== RUN   TestMeta/execution/random-009
=== PAUSE TestMeta/execution/random-009
=== CONT  TestMeta/execution/random-009
=== RUN   TestMeta/execution/random-010
=== PAUSE TestMeta/execution/random-010
=== CONT  TestMeta/execution/random-010
=== RUN   TestMeta/execution/random-018
=== PAUSE TestMeta/execution/random-018
=== CONT  TestMeta/execution/random-018
=== RUN   TestMeta/execution/random-021
=== PAUSE TestMeta/execution/random-021
=== CONT  TestMeta/execution/random-021
=== RUN   TestMeta/compare
=== RUN   TestMeta/execution/random-013
=== PAUSE TestMeta/execution/random-013
=== CONT  TestMeta/execution/random-013
=== RUN   TestMeta/execution/random-025
=== PAUSE TestMeta/execution/random-025
=== CONT  TestMeta/execution/random-025
=== RUN   TestMeta/execution/standard-017
=== PAUSE TestMeta/execution/standard-017
=== CONT  TestMeta/execution/standard-017
=== RUN   TestMeta/execution/random-008
=== PAUSE TestMeta/execution/random-008
=== CONT  TestMeta/execution/random-008
=== RUN   TestMeta/execution/random-019
=== PAUSE TestMeta/execution/random-019
=== CONT  TestMeta/execution/random-019
=== RUN   TestMeta/execution/random-007
=== PAUSE TestMeta/execution/random-007
=== CONT  TestMeta/execution/random-007
=== RUN   TestMeta/execution/standard-019
=== PAUSE TestMeta/execution/standard-019
=== CONT  TestMeta/execution/standard-019
=== RUN   TestMeta/execution/standard-022
=== PAUSE TestMeta/execution/standard-022
=== CONT  TestMeta/execution/standard-022
=== RUN   TestMeta/execution/standard-023
=== PAUSE TestMeta/execution/standard-023
=== CONT  TestMeta/execution/standard-023
Help

To reproduce, try:

go test -tags 'invariants' -exec 'stress -p 1' -timeout 0 -test.v -run 'TestMeta$' ./internal/metamorphic -seed 1737097969327854277 -ops "uniform:5000-10000"

This test on roachdash | Improve this report!

Jira issue: PEBBLE-323

@jbowens
Copy link
Collaborator

jbowens commented Jan 17, 2025

===== SEED =====
1737097969327854277
===== DIFF =====
/artifacts/meta/250117-071249.3271015605685/{standard-000,standard-001}
@@ -6664,11 +6664,11 @@
 iter53.Next("") // [false] <nil> #6663
 snap39.Get("vxlslamcvnd@7") // [""] pebble: not found #6664
 iter51.First() // [true,"vtsaosdcczgv@14",<no point>,["vtsaosdcczgv@14","xlqednvas@2")=>{"@6"="mguxyjxp"}*] <nil> #6665
 iter53.Next("") // [false] <nil> #6666
 iter55.SetBounds("aweksfzsekn@8", "pxpgeqeec") // <nil> #6667
-iter55.SeekLT("pxpgeqeec", "") // [true,"myczrfvl@1","dmmhbuqdmmhbuqdmmhbuqdmmhbuq",<no range>] <nil> #6668
+iter55.SeekLT("pxpgeqeec", "") // [true,"myczrfvl@1","dmmhbuqdmmhbuq",<no range>] <nil> #6668
 db1.Merge("rxobdmazvuw@8", "nveec") // <nil> #6669
 db1.IngestExternalFiles(external0, "aimaueep" /* start */, "aimaujurkssjkqj" /* end */, "" /* syntheticSuffix */, "aima" /* syntheticPrefix */) // <nil> #6670
 db1.SingleDelete("owjzkzrbawcf@5", false /* maybeReplaceDelete */) // <nil> #6671
 iter51.SeekGE("iejsi", "") // [true,"vtsaosdcczgv@14",<no point>,["vtsaosdcczgv@14","xlqednvas@2")=>{"@6"="mguxyjxp"}] <nil> #6672
 db1.Delete("lkadcy@2") // <nil> #6673

@jbowens
Copy link
Collaborator

jbowens commented Jan 21, 2025

// 16:03:18.589 INFO: possible API misuse: ineffectual SINGLEDEL (key="myczrfvl@2")
// 16:03:18.589 INFO: possible API misuse: nondeterministic SINGLEDEL (key="myczrfvl@1")

Well that's helpful.

jbowens added a commit to jbowens/pebble that referenced this issue Jan 21, 2025
Terminate and fail the metamorphic test if a nondeterministic single delete is
observed. The generator is responsible for generating sequences of operations
that ensure determinism across all operations, including single deletes.

Informs cockroachdb#4267.
@jbowens
Copy link
Collaborator

jbowens commented Jan 21, 2025

db1.Get("myczrfvl@1") // [""] pebble: not found #223
...
db1.IngestExternalFiles(external0, "ljildvpc" /* start */, "xlqednvas" /* end */, "" /* syntheticSuffix */, "" /* syntheticPrefix */) // <nil> #261
...
db1.IngestExternalFiles(external0, "ljildvpc" /* start */, "smhndaclxvq" /* end */, "" /* syntheticSuffix */, "" /* syntheticPrefix */, external0, "wnljildvpc" /* start */, "wnmyczrfvl" /* end */, "" /* syntheticSuffix */, "wn" /* syntheticPrefix */, external0, "hxkrdktf" /* start */, "ljildvpc" /* end */, "" /* syntheticSuffix */, "" /* syntheticPrefix */) // <nil> #270
...
db1.SingleDelete("myczrfvl@1", false /* maybeReplaceDelete */) // <nil> #288

...
// 16:37:34.621 INFO: possible API misuse: nondeterministic SINGLEDEL (key="myczrfvl@1")

The single delete op 288 is invalid because we 'Merge'd the key once in op 261 and once on the first external object ingestion in op 270. Not sure yet where the key manager went wrong.

@jbowens jbowens self-assigned this Jan 21, 2025
@jbowens
Copy link
Collaborator

jbowens commented Jan 22, 2025

Oof, this is a result of the way we discretize range deletions:

case *deleteRangeOp:
		// We track the history of discrete point keys, but a range deletion
		// applies over a continuous key span of infinite keys. However, the key
		// manager knows all keys that have been used in all operations, so we
		// can discretize the range tombstone by adding it to every known key
		// within the range.
		//
		// TODO(jackson): If s.writerID is a batch, we may not know the set of
		// all keys that WILL exist by the time the batch is committed. This
		// means the delete range history is incomplete. Fix this.
		keyRange := pebble.KeyRange{Start: s.start, End: s.end}
		for _, key := range k.knownKeysInRange(keyRange) {
			meta := k.getOrInit(s.writerID, key)
			if meta.objID.tag() == dbTag {
				meta.clear()
			} else {
				meta.history = append(meta.history, keyHistoryItem{
					opType: OpWriterDeleteRange,
				})
			}
		}
		k.expandBounds(s.writerID, k.makeEndExclusiveBounds(s.start, s.end))
		k.objKeyMeta(s.writerID).hasRangeDels = true

Adding a range deletion to a batch can cause it to add a tracked key to represent the discretized delete. If later a real point key that sorts after the discretized key is added with the same prefix and then the batch is converted to an external object, we'll incorrectly record the discretized key as the key defined in the object. At runtime when actually building the external object, we'll create the object with the real point key instead.

Maybe time to address this TODO:

// We track the history of discrete point keys, but a range deletion
// applies over a continuous key span of infinite keys. However, the key
// manager knows all keys that have been used in all operations, so we
// can discretize the range tombstone by adding it to every known key
// within the range.
//
// TODO(jackson): If s.writerID is a batch, we may not know the set of
// all keys that WILL exist by the time the batch is committed. This
// means the delete range history is incomplete. Fix this.

jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2025
When constructing an external object, only the first point key with each unique
prefix is added to the sstable. The key manager strives to maintain consistency
with this behavior to ensure that the it maintains correct key histories after
an ingestion of an external object. The key manager however also will
discretize range deletions, sometimes creating keyMetas for keys that do not
exist as point keys on the associated writer. The presence of these was
confusing the key manager's calculation of which keys would end up in the
external object.

This commit updates the external object calculation to account for the
existence of the range deletions. Future work may refactor the tracking of
range deletions to track range deletions separately from point keys and as
continuous spans.

Fix cockroachdb#4267.
jbowens added a commit to jbowens/pebble that referenced this issue Jan 22, 2025
When constructing an external object, only the first point key with each unique
prefix is added to the sstable. The key manager strives to maintain consistency
with this behavior to ensure that the it maintains correct key histories after
an ingestion of an external object. The key manager however also will
discretize range deletions, sometimes creating keyMetas for keys that do not
exist as point keys on the associated writer. The presence of these was
confusing the key manager's calculation of which keys would end up in the
external object.

This commit updates the external object calculation to account for the
existence of the range deletions. Future work may refactor the tracking of
range deletions to track range deletions separately from point keys and as
continuous spans.

Fix cockroachdb#4267.
jbowens added a commit to jbowens/pebble that referenced this issue Jan 23, 2025
When constructing an external object, only the first point key with each unique
prefix is added to the sstable. The key manager strives to maintain consistency
with this behavior to ensure that the it maintains correct key histories after
an ingestion of an external object. The key manager however also will
discretize range deletions, sometimes creating keyMetas for keys that do not
exist as point keys on the associated writer. The presence of these was
confusing the key manager's calculation of which keys would end up in the
external object.

This commit updates the external object calculation to account for the
existence of the range deletions. Future work may refactor the tracking of
range deletions to track range deletions separately from point keys and as
continuous spans.

Fix cockroachdb#4267.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants