-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(ecs): Narrowing the cache search for the ECS provider on views #6256
base: master
Are you sure you want to change the base?
Conversation
@dbyron0 @deverton would appreciate your feedback on this change. There are still improvements to be made as the current implementation of ECS goes through every region per account to retrieve the necessary data from cache which is far from ideal when there are hundreds of accounts. The perf of alarms is still a problem as right now it goes through all the alarms and tries to match with a service but this will be addressed in a future PR. |
@dbyron-sf @jasonmcintosh Added some results from an internal testing related to this change. Would appreciate any feedback! |
@@ -65,6 +65,11 @@ public Collection<T> getAll(String account, String region) { | |||
return convertAll(data); | |||
} | |||
|
|||
public Collection<T> getAll(Collection<String> identifiers) { | |||
Collection<CacheData> allData = cacheView.getAll(keyNamespace, identifiers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: merge the variable dont' need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -480,6 +485,15 @@ public Map<String, Set<EcsServerCluster>> getClusters() { | |||
return clusterMap; | |||
} | |||
|
|||
public Map<String, Set<EcsServerCluster>> getClusters0(String application) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getClusters0 seems like a bad function name. I'm... wondering why this method vs. the getClusterSummaries up above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im following the same pattern/naming convention established here for the AWS provider https://github.com/spinnaker/clouddriver/blob/0226100a95b16e1176a296198b1f892c4514e4d6/clouddriver-aws/src/main/groovy/com/netflix/spinnaker/clouddriver/aws/provider/view/AmazonClusterProvider.groovy#L70:L78
But for the ECS provider the Cluster view never implemented the clusterSummaries. It was always returning the clusterDetails.
Which is another area of improvement. getClusterSummaries should return only the names of the serverGroups and loadbalancers. getClusterDetails should return the full details of the serverGroups and loadbalancers.
Does it make sense to go on addressing this in this PR or maybe a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key differences:
- getClusters0 in AWS is a parameterized private method which takes an argument to control whether details are added. In this case, there's no difference between the getDetails & getSummaries calls (which is a question - WHY there isn't a difference, but not looked at the data returned).
- As such, it's extra code that doesn't help code deduplication. AN alternative is making getClusters0 private but I'd prefer merge the logic and just have one call the other vs. a separate method that's private that both call increasing the call stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! No objection in implementing the getDetails and getSummaries better :) Just pushed the changes + i verified that the API responses before and after the change are the same.
With the difference that previously getClusterSummaries was returning All the ECS clusters/services for all the accounts (no application filtered). After the change it will return only the application clusters/services.
getSummaries is called via the applicationController -> GET /applications/
getDetails is called via the serverGroup controller -> GET /applications//serverGroups
...ver-ecs/src/main/java/com/netflix/spinnaker/clouddriver/ecs/view/EcsApplicationProvider.java
Outdated
Show resolved
Hide resolved
Few minor things but overall looks good. |
36aa330
to
6d6ec79
Compare
@jasonmcintosh planning to push the Alarm caching/lookup perf improvements as well tomorrow. |
|
||
Collection<EcsMetricAlarm> allMetricAlarms = getAll(accountName, region); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before All the alarms for an ECS account/region where fetched and iterated through to match the service. This is extremely costly.
After the change the ECSCluster is added during the caching cycles to the cache key id for the ECS provider in the alarms. We retrieve the IDs with ECS account/region/EcsClusterName and then try to match the service.
metricAlarms.add(metricAlarm); | ||
continue outLoop; | ||
} | ||
if (metricAlarm.getAlarmActions().stream().anyMatch(action -> action.contains(serviceName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small refactoring here to make it more readable
@@ -118,7 +118,13 @@ Map<String, Collection<CacheData>> generateFreshData(Set<MetricAlarm> cacheableM | |||
Map<String, Collection<CacheData>> newDataMap = new HashMap<>(); | |||
|
|||
for (MetricAlarm metricAlarm : cacheableMetricAlarm) { | |||
String key = Keys.getAlarmKey(accountName, region, metricAlarm.getAlarmArn()); | |||
String cluster = | |||
metricAlarm.getDimensions().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the AWS SDK a cloudwatch alarm for the ECS contains 2 dimensions depending for the type:
- Service alarm contains the dimension ECSCluster and ServiceName
- Autoscaling group alarm of an ECS cluster contains the ECSCluster and the Capacity provider.
This change includes the ECSClusterName in the cached key id to make the search less costly
.setMoniker(moniker); | ||
|
||
EcsServerGroup serverGroup = new EcsServerGroup(); | ||
if (includeDetails) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
includeDetails is false only for the getSummaries. The rest of the logic remains the same
Attempt to address some of the issues described in spinnaker/spinnaker#6084
Improving the response times on:
Endpoints when ECS is enabled and a substantial amount of accounts/services exist in cache.
The perf issue with the Alarms still exists and will be addressed in a future PR
Adding some results from a performance test clouddriver response times: