[CI] TimeSeriesDataStreamsIT testSearchableSnapshotAction failing #125867

Closed
elasticsearchmachine opened this issue Mar 28, 2025 · 3 comments · Fixed by #126605
Assignees
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management low-risk An open issue or test failure that is a low risk to future releases Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:ilm:qa:multi-node:javaRestTest" --tests "org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT.testSearchableSnapshotAction" -Dtests.seed=B64E5CCDB22BF95B -Dtests.locale=cs -Dtests.timezone=Europe/Istanbul -Druntime.java=24

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: null

Issue Reasons:

  • [main] 2 failures in test testSearchableSnapshotAction (0.3% fail rate in 651 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :Data Management/ILM+SLM Index and Snapshot lifecycle management >test-failure Triaged test failures from CI labels Mar 28, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test testSearchableSnapshotAction (0.3% fail rate in 651 executions)

Build Scans:

@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Mar 28, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-data-management (Team:Data Management)

@nielsbauman nielsbauman self-assigned this Mar 28, 2025
@nielsbauman nielsbauman added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Mar 28, 2025
@nielsbauman
Contributor

I already encountered this in #125752 (comment). It's caused by ILM getting stuck in the `wait-for-index-color` step. We're still discussing the best way to tackle this issue: I opened #125812 for a different test class, but we're discussing whether to take a different approach.

omricohenn pushed a commit to omricohenn/elasticsearch that referenced this issue Mar 28, 2025
nielsbauman added a commit that referenced this issue Apr 1, 2025
ILM sometimes skips a policy/index for a cluster state update if the
step was still running/enqueued while the update came in. That on its
own isn't a problem, but in very quiet clusters, this would mean that
it could take arbitrarily long for the policy step to be run -
i.e. when the next cluster state comes in. We saw this happening in
a few tests, but it could potentially happen in production too.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
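
To make the scenario in the commit message above concrete, here is a minimal toy model of an update being skipped while a step is still in flight. It is only a sketch: `ToyRunner`, `onClusterStateUpdate`, and `finish` are hypothetical names invented for illustration and are not ILM's real classes or methods.

```java
import java.util.HashSet;
import java.util.Set;

public class SkippedUpdateSketch {

    /** Hypothetical toy runner; it only mimics "skip while a step is in flight". */
    static final class ToyRunner {
        private final Set<String> inFlight = new HashSet<>();
        int evaluations = 0;
        int skipped = 0;

        /** Called once per cluster state update; returns true if the step was evaluated. */
        boolean onClusterStateUpdate(String index) {
            if (inFlight.contains(index)) {
                // A step for this index is still running/enqueued, so this update is skipped.
                skipped++;
                return false;
            }
            inFlight.add(index);
            evaluations++;
            return true;
        }

        /** Marks the in-flight step for the index as finished. */
        void finish(String index) {
            inFlight.remove(index);
        }
    }

    public static void main(String[] args) {
        ToyRunner runner = new ToyRunner();
        runner.onClusterStateUpdate("my-index"); // update 1: evaluation starts
        runner.onClusterStateUpdate("my-index"); // update 2: arrives while in flight, skipped
        runner.finish("my-index");
        // In a quiet cluster no update 3 arrives, so whatever changed in
        // update 2 is not evaluated until some later update happens to come in.
        System.out.println("evaluations=" + runner.evaluations + ", skipped=" + runner.skipped);
    }
}
```

In this toy model nothing re-triggers the evaluation after `finish`, which mirrors the "arbitrarily long" wait described above.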
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 1, 2025
ILM sometimes skips a policy/index for a cluster state update if the
step was still running/enqueued while the update came in. That on its
own isn't a problem, but in very quiet clusters, this would mean that
it could take arbitrarily long for the policy step to be run -
i.e. when the next cluster state comes in. We saw this happening in
a few tests, but it could potentially happen in production too.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 1, 2025
ILM sometimes skips a policy/index for a cluster state update if the
step is still running/enqueued when the update comes in. That on its own
isn't a problem, but in very quiet clusters, this would mean that
it could take arbitrarily long for the policy step to be run -
i.e. when the next cluster state comes in. We saw this happening in
a few tests, but it could potentially happen in production too.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
nielsbauman added a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354
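
The equality problem described in the commit message above can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (`BrokenStep`, `FixedStep`) with a `key` field; only the `indexNameSupplier` name and its `BiFunction` type come from the commit message, and the type parameters are assumed. Lambdas inherit `Object.equals`, so two separately created lambdas with identical bodies do not compare equal on typical JVMs.

```java
import java.util.Objects;
import java.util.function.BiFunction;

public class StepEqualitySketch {

    /** Hypothetical step whose equals() includes the lambda-typed field. */
    static final class BrokenStep {
        final String key;
        final BiFunction<String, String, String> indexNameSupplier;

        BrokenStep(String key, BiFunction<String, String, String> indexNameSupplier) {
            this.key = key;
            this.indexNameSupplier = indexNameSupplier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BrokenStep)) return false;
            BrokenStep other = (BrokenStep) o;
            // The supplier comparison falls back to reference identity for lambdas.
            return key.equals(other.key) && Objects.equals(indexNameSupplier, other.indexNameSupplier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, indexNameSupplier);
        }
    }

    /** Same shape, but the supplier is excluded from equals() and hashCode(). */
    static final class FixedStep {
        final String key;
        final BiFunction<String, String, String> indexNameSupplier;

        FixedStep(String key, BiFunction<String, String, String> indexNameSupplier) {
            this.key = key;
            this.indexNameSupplier = indexNameSupplier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FixedStep)) return false;
            return key.equals(((FixedStep) o).key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key);
        }
    }

    public static void main(String[] args) {
        BrokenStep a = new BrokenStep("wait-for-index-color", (index, state) -> "restored-" + index);
        BrokenStep b = new BrokenStep("wait-for-index-color", (index, state) -> "restored-" + index);
        System.out.println("broken: " + a.equals(b)); // two distinct lambda objects

        FixedStep c = new FixedStep("wait-for-index-color", (index, state) -> "restored-" + index);
        FixedStep d = new FixedStep("wait-for-index-color", (index, state) -> "restored-" + index);
        System.out.println("fixed: " + c.equals(d)); // compared by key only
    }
}
```

Excluding the supplier is a trade-off: two steps with the same key but different suppliers now compare equal, which is the behaviour the commit message says is intended.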
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
Fixes elastic#126354
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes elastic#125683
Fixes elastic#125789
Fixes elastic#125867
Fixes elastic#125911
Fixes elastic#126053
Fixes elastic#126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this issue Apr 10, 2025
The `indexNameSupplier` was included in the equality and is of type
`BiFunction`, which doesn't implement a proper `equals` method by
default - and thus neither do the lambdas. This meant that two instances
of this step would only be considered equal if they were the same
instance. By excluding `indexNameSupplier` from the `equals` method, we
ensure the method works as intended and is able to properly tell the
equality between two instances.

As a side effect, we expect/hope this change will fix a number of tests
that were failing because `WaitForIndexColorStep` missed the last
cluster state update in the test, causing ILM to get stuck and the test
to time out.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354

(cherry picked from commit 3231eb2)

# Conflicts:
#	muted-tests.yml
#	x-pack/plugin/ilm/qa/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/SearchableSnapshotActionIT.java
2 participants