Improve performance of segment metadata cache on Overlord #17785
Conversation
@@ -66,19 +67,22 @@ public class OverlordDataSourcesResource
  private static final Logger log = new Logger(OverlordDataSourcesResource.class);

  private final SegmentsMetadataManager segmentsMetadataManager;
  private final IndexerMetadataStorageCoordinator metadataStorageCoordinator;
I just realized that the SegmentsMetadataManager is the Coordinator version of a metadata cache. Do you think we're moving towards getting rid of it completely on the Overlord side of things, in favor of using only IndexerMetadataStorageCoordinator?
It looks like after this patch, the SegmentsMetadataManager is still used in two places on the Overlord:
- this class, for markAsUsedNonOvershadowedSegmentsInInterval, markAsUsedNonOvershadowedSegments, and markSegmentAsUsed.
- OverlordCompactionScheduler, for getSnapshotOfDataSourcesWithAllUsedSegments. To me it seems like this could already be replaced by IndexerSQLMetadataStorageCoordinator#retrieveAllUsedSegments. (Although the SegmentsMetadataManager version would perform better, since it caches the snapshot, while the IndexerSQLMetadataStorageCoordinator version needs to build a timeline. I am not sure how much this matters.)
Yes, @gianm, I plan to get rid of SqlSegmentsMetadataManager from the Overlord completely.
Coordinator will continue to use it for the time being, but eventually we might just get rid of it altogether.
- The usage of SqlSegmentsMetadataManager by OverlordCompactionScheduler does matter for performance, because of the timeline and caching reasons as you guessed, but I should be able to resolve this soon.
@@ -167,4 +169,24 @@ public static SegmentId getValidSegmentId(String dataSource, String serializedSe
    return parsedSegmentId;
  }
}

/**
 * Tries to parse the given serialized ID as {@link SegmentId}s of the given
This method is suspicious to me, since it encourages ignoring seemingly-invalid segment IDs. Where are the places that we are encountering segment IDs that may not be valid? Is it ok to ignore them?
Yeah, I guess it would be better to error out on invalid segment IDs.
The only place these can come from is the REST APIs.
Updated.
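For illustration only, a strict variant of this kind of parsing helper might look like the sketch below. SegmentId.tryParse is an existing Druid parser that returns null for unparseable IDs, but the class name, method name, and error message here are hypothetical and not the code from this patch:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.druid.timeline.SegmentId;

public class SegmentIdValidationSketch
{
  /**
   * Parses every serialized ID or fails fast. Since the IDs come in only
   * through REST APIs, an invalid ID indicates bad input from the caller
   * and should surface as an error rather than being silently skipped.
   */
  public static List<SegmentId> parseOrThrow(String dataSource, List<String> serializedIds)
  {
    final List<SegmentId> parsedIds = new ArrayList<>();
    for (String serializedId : serializedIds) {
      final SegmentId parsedId = SegmentId.tryParse(dataSource, serializedId);
      if (parsedId == null) {
        // Error out instead of ignoring the entry, as suggested in the review.
        throw new IllegalArgumentException(
            "Could not parse segment ID[" + serializedId + "] for datasource[" + dataSource + "]"
        );
      }
      parsedIds.add(parsedId);
    }
    return parsedIds;
  }
}
```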
      }
    }
    catch (Exception e) {
      log.noStackTrace().error(
When might this happen? Does it mean we will ignore segments that we can't read? That would be worrying, because Exception
is very broad. Is there some kind of scenario you have in mind where a failure is possible?
Also, .noStackTrace().error
is not a combination I expect to see. Typically error
is reserved for serious problems, and for serious problems we should always have a stack trace.
When might this happen? Does it mean we will ignore segments that we can't read? That would be worrying, because Exception is very broad. Is there some kind of scenario you have in mind where a failure is possible?
This can happen in the rare case when the segment payload has been tampered with or some other column was not parseable. It is not frequent, but it can happen, as I only recently encountered this in a prod DB.
We are not throwing an error here so that the processing can continue with the rest of the segments.
Even though this segment is ignored here, it actually increments a skippedCount in the code where this class is used, and we raise an alert if skippedCount > 0. The log here is just to allow the operator to go through Overlord logs and see which segment IDs failed and why.
Do you think it would be better to create an alert for each failing segment ID separately (similar to SqlSegmentsMetadataManager)? I felt it would be too noisy, so I alerted at the total count level instead.
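A minimal sketch of the skip-and-alert pattern described above; the class, method, and generic parser names are hypothetical, while Logger.error and Logger.makeAlert are the usual Druid logging calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

import org.apache.druid.java.util.common.logger.Logger;

public class SkipAndAlertSketch
{
  private static final Logger log = new Logger(SkipAndAlertSketch.class);

  /**
   * Maps each raw metadata record to a parsed segment. Records that cannot be
   * parsed (e.g. a tampered payload) are logged and counted rather than
   * failing the whole sync; a single alert is raised if anything was skipped.
   */
  public static <R, S> List<S> readSegments(List<R> records, Function<R, S> parser)
  {
    final List<S> segments = new ArrayList<>();
    int skippedCount = 0;

    for (R record : records) {
      try {
        segments.add(parser.apply(record));
      }
      catch (Exception e) {
        ++skippedCount;
        // Include the stack trace so the operator can see why the record failed.
        log.error(e, "Could not parse segment record[%s], skipping it.", record);
      }
    }

    if (skippedCount > 0) {
      // One alert for the total, rather than one alert per bad segment.
      log.makeAlert("Skipped [%d] unparseable segment records during sync.", skippedCount).emit();
    }
    return segments;
  }
}
```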
Also, .noStackTrace().error is not a combination I expect to see. Typically error is reserved for serious problems, and for serious problems we should always have a stack trace.
Yeah, I see your point. I will include the stack trace here.
Updated.
Do you think it would be better to create an alert for each failing segment ID separately (similar to SqlSegmentsMetadataManager)? I felt it would be too noisy, so alerted at total count level instead.
I think the way you did it here is OK. It's good that there is an alert, and IMO an alert with a number of bad segments is ok.
Verified the changes in this patch on the following clusters:
@kfaraz Also, could you provide some description around what the delta sync is?
@cryptoe, those are two separate clusters and the results are not really comparable between the two, since the underlying metadata store is different. The comparison is to be done between the old sync time and the new sync time for any given cluster. Also, the 850s for 600k segments is a little pessimistic, since a real production cluster with, say, 1M segments would have a much beefier metadata store and master node.
Full sync: fetch all used segment payloads
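For illustration only, a rough sketch of how a delta sync could differ from a full sync, under the assumption (suggested by syncSegmentIds() in this PR, but not confirmed here) that it fetches only segment IDs plus last-updated times and pulls full payloads just for new or changed entries; every type and method name below is hypothetical, not the actual cache code:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SyncSketch
{
  /** Hypothetical view of a segment row: its ID and the time it was last updated in the DB. */
  public record SegmentIdWithUpdateTime(String segmentId, Instant lastUpdated) {}

  /** Hypothetical read-only view of the metadata store. */
  public interface MetadataStore
  {
    List<String> fetchAllUsedSegmentPayloads(String dataSource);

    List<SegmentIdWithUpdateTime> fetchUsedSegmentIds(String dataSource);

    String fetchSegmentPayload(String segmentId);
  }

  /** Full sync: read every used segment payload from the metadata store. */
  public static List<String> fullSync(MetadataStore store, String dataSource)
  {
    return store.fetchAllUsedSegmentPayloads(dataSource);
  }

  /**
   * Delta sync: read only the (much smaller) list of IDs and update times,
   * then fetch payloads only for segments the cache does not have or whose
   * row has been updated since it was cached.
   */
  public static List<String> deltaSync(
      MetadataStore store,
      String dataSource,
      Map<String, Instant> cachedUpdateTimes
  )
  {
    final List<String> refreshedPayloads = new ArrayList<>();
    for (SegmentIdWithUpdateTime id : store.fetchUsedSegmentIds(dataSource)) {
      final Instant cachedTime = cachedUpdateTimes.get(id.segmentId());
      if (cachedTime == null || cachedTime.isBefore(id.lastUpdated())) {
        refreshedPayloads.add(store.fetchSegmentPayload(id.segmentId()));
      }
    }
    return refreshedPayloads;
  }
}
```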
LGTM
Thanks for the review, @gianm!
Description
#17653 introduces a cache for segment metadata on the Overlord.
This patch is a follow-up to that one, to make the cache more robust, performant, and debug-friendly.
Changes
- Do not cache unused segments. This significantly reduces sync time in cases where the cluster has a lot of unused segments. Unused segments are needed only during segment allocation, to ensure that a duplicate ID is not allocated. This is a rare DB query which is supported by sufficient indexes and thus need not be cached at the moment.
- Update the HeapMemoryDatasourceSegmentCache using the methods syncSegmentIds() and syncPendingSegments() rather than updating one entry at a time. This ensures that the locks are held for a shorter period and that the update made to the cache is atomic (see the sketch after this list).
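As referenced above, a minimal sketch of why the batched update helps, using a plain ReentrantReadWriteLock and hypothetical names (not the actual cache class): one write-lock acquisition covers the whole batch, so readers see either the old state or the fully updated one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedCacheUpdateSketch
{
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> segmentIdToPayload = new HashMap<>();

  /**
   * Updating one entry at a time acquires and releases the write lock once per
   * segment, and readers may observe a partially applied sync in between.
   */
  public void updateOneByOne(Map<String, String> polledSegments)
  {
    for (Map.Entry<String, String> entry : polledSegments.entrySet()) {
      lock.writeLock().lock();
      try {
        segmentIdToPayload.put(entry.getKey(), entry.getValue());
      }
      finally {
        lock.writeLock().unlock();
      }
    }
  }

  /**
   * Updating the whole batch under a single write-lock acquisition keeps the
   * lock held for one short, contiguous period and makes the sync atomic from
   * the point of view of readers.
   */
  public void updateInBatch(Map<String, String> polledSegments)
  {
    lock.writeLock().lock();
    try {
      segmentIdToPayload.putAll(polledSegments);
    }
    finally {
      lock.writeLock().unlock();
    }
  }
}
```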
Classes to review
IndexerMetadataStorageCoordinator
OverlordDataSourcesResource
HeapMemorySegmentMetadataCache
HeapMemoryDatasourceSegmentCache
Cleaner cache sync
In every sync, the following steps are performed for each datasource:
Testing
Verified the changes in this patch on the following clusters:
This PR has: