
CI Failure (500 Server Error) in IdempotencySnapshotDelivery.test_recovery_after_snapshot_is_delivered #16202

Closed
vbotbuildovich opened this issue Jan 20, 2024 · 7 comments
Labels
area/storage · auto-triaged · ci-failure · sev/low

Comments


vbotbuildovich commented Jan 20, 2024

https://buildkite.com/redpanda/redpanda/builds/43869

Module: rptest.tests.idempotency_test
Class: IdempotencySnapshotDelivery
Method: test_recovery_after_snapshot_is_delivered
test_id:    IdempotencySnapshotDelivery.test_recovery_after_snapshot_is_delivered
status:     FAIL
run time:   105.788 seconds

HTTPError('500 Server Error: Internal Server Error for url: http://docker-rp-12:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 159, in wrapped
    self.redpanda.maybe_do_internal_scrub()
  File "/root/tests/rptest/services/redpanda.py", line 3955, in maybe_do_internal_scrub
    results = self.wait_for_internal_scrub(cloud_partitions)
  File "/root/tests/rptest/services/redpanda.py", line 4062, in wait_for_internal_scrub
    self._admin.reset_scrubbing_metadata(
  File "/root/tests/rptest/services/admin.py", line 1170, in reset_scrubbing_metadata
    return self._request(
  File "/root/tests/rptest/services/admin.py", line 363, in _request
    r.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://docker-rp-12:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0

JIRA Link: CORE-1729

@vbotbuildovich added the auto-triaged and ci-failure labels Jan 20, 2024
@dotnwat dotnwat changed the title CI Failure (key symptom) in IdempotencySnapshotDelivery.test_recovery_after_snapshot_is_delivered CI Failure (500 Server Error) in IdempotencySnapshotDelivery.test_recovery_after_snapshot_is_delivered Jan 20, 2024

andrwng commented Jan 22, 2024

TRACE 2024-01-18 06:41:20,649 [shard 0:admi] admin_api_server - server.cc:615 - Attempting to audit authn for /v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0
TRACE 2024-01-18 06:41:20,649 [shard 0:admi] admin_api_server - server.cc:571 - Attempting to audit authz for /v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0
TRACE 2024-01-18 06:41:20,649 [shard 1:au  ] http - [/5a8073d5/kafka/topic-mmltvskecb/0_24/41009-46188-1098169-1-v1.log.2] - client.cc:89 - client.make_request HEAD /5a8073d5/kafka/topic-mmltvskecb/0_24/41009-46188-1098169-1-v1.log.2 HTTP/1.1
User-Agent: redpanda.vectorized.io
Host: panda-bucket-5bcff224-b5cc-11ee-bbd3-0242ac10101c.minio-s3
Content-Length: 0
x-amz-date: 20240118T064120Z
x-amz-content-sha256: [secret]
Authorization: [secret]


DEBUG 2024-01-18 06:41:20,650 [shard 0:admi] admin_api_server - server.cc:647 - [admin] POST http://docker-rp-12:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0
DEBUG 2024-01-18 06:41:20,650 [shard 1:au  ] http - [/5a8073d5/kafka/topic-mmltvskecb/0_24/41009-46188-1098169-1-v1.log.2] - client.cc:101 - reusing connection, age 1389585
TRACE 2024-01-18 06:41:20,650 [shard 1:au  ] http - /5a8073d5/kafka/topic-mmltvskecb/0_24/41009-46188-1098169-1-v1.log.2 - client.cc:434 - request_stream.send_some 0
DEBUG 2024-01-18 06:41:20,650 [shard 1:admi] cluster - ntp: {kafka/topic-mmltvskecb/0} - archival_metadata_stm.cc:411 - command_batch_builder::replicate called
WARN  2024-01-18 06:41:20,651 [shard 1:admi] archival - [fiber4 kafka/topic-mmltvskecb/0] - ntp_archiver_service.cc:603 - Failed to replicate reset scrubbing metadata command: Current node is not a leader for partition
TRACE 2024-01-18 06:41:20,652 [shard 1:au  ] http - /5a8073d5/kafka/topic-mmltvskecb/0_24/41009-46188-1098169-1-v1.log.2 - client.cc:296 - chunk received, chunk length 485
ERROR 2024-01-18 06:41:20,652 [shard 0:admi] admin_api_server - server.cc:680 - [admin] exception intercepted - url: [http://docker-rp-12:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/topic-mmltvskecb/0] http_return_status[500] reason - seastar::httpd::server_error_exception (Failed to replicate or apply scrubber metadata reset command)

Looks like there was a leadership change just as the reset-scrubbing-metadata command came in, so the replicate step failed with "Current node is not a leader for partition" and the admin server surfaced it as a 500.
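A straightforward mitigation on the test-harness side would be to treat this 500 as transient and retry the reset call until leadership settles. Below is a minimal sketch of that idea; the helper name, exception type, and retry policy are hypothetical illustrations, not the actual rptest code:

```python
import time


class TransientAdminError(Exception):
    """Hypothetical stand-in for the HTTP 500 the admin API returns
    while partition leadership is in flux."""


def retry_on_leadership_change(request_fn, attempts=3, backoff_s=0.0):
    """Call request_fn, retrying a few times on transient admin errors.

    A reset_scrubbing_metadata request that races a leadership change
    fails with "Current node is not a leader for partition"; retrying
    after the new leader is elected should succeed.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return request_fn()
        except TransientAdminError as exc:
            last_exc = exc
            time.sleep(backoff_s)
    raise last_exc


# Simulated admin call: fails twice while leadership is in flux,
# then succeeds once a leader is available.
calls = {"n": 0}


def fake_reset_scrubbing_metadata():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientAdminError("500: not a leader for partition")
    return "ok"
```

An equivalent server-side fix would be to map this specific failure to a 503 with a retry hint rather than a bare 500, so callers can distinguish a transient leadership race from a genuine internal error.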

@andrwng added the sev/low label Jan 22, 2024
@dotnwat added the area/storage and sev/low labels and removed the sev/low label Apr 4, 2024
@piyushredpanda
Contributor

Not seen in at least two months; closing.

@abhijat
Contributor

abhijat commented Sep 21, 2024

DEBUG 2024-09-20 12:07:38,835 [shard 0:admi] cluster - ntp: {kafka/__consumer_offsets/12} - archival_metadata_stm.cc:478 - command_batch_builder::replicate called
WARN  2024-09-20 12:07:38,835 [shard 0:admi] archival - [fiber27 kafka/__consumer_offsets/12] - ntp_archiver_service.cc:622 - Failed to replicate reset scrubbing metadata command: Current node is not a leader for partition
ERROR 2024-09-20 12:07:38,835 [shard 0:admi] admin_api_server - server.cc:657 - [admin] exception intercepted - url: [http://docker-rp-13:9644/v1/cloud_storage/reset_scrubbing_metadata/kafka/__consumer_offsets/12] http_return_status[500] reason - seastar::httpd::server_error_exception (Failed to replicate or apply scrubber metadata reset command)

This occurrence was on the __consumer_offsets topic, with the same leadership-race symptom.

@piyushredpanda
Contributor

Closing older-bot-filed CI issues as we transition to a more reliable system.

5 participants