-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ecs): update applicationcachingagent to store applications/relationships #5377
base: master
Are you sure you want to change the base?
Conversation
services.forEach( | ||
key -> { | ||
Map<String, String> parsedKey = Keys.parse(key); | ||
if (application.getClusterNames().get(parsedKey.get("account")) != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have a null check on getClusterNames
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but the existing logic doesn't, so didn't lean that way. That being said, I'm not opposed to adding it.
Map<String, Map<String, Collection<String>>> appRelationships = new HashMap<>(); | ||
|
||
for (Service service : services) { | ||
String applicationKey = service.getApplicationName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this with monikers? in that case is the returned getApplicationName()
the actual application (as determined by tags) or the service prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for determining applications is the same as it exists today. The only difference being that we're performing this logic and writing it to the cache rather than performing it at query runtime.
EcsApplication application = new EcsApplication(appName, attributes, clusterNames); | ||
|
||
Set<String> services = getServiceRelationships(cacheData); | ||
log.info("Found {} services for app {}", services.size(), appName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest we make this debug
level instead of info
- it's useful but produces quite a bit of noise in the logs (tested this locally myself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
Set<Application> retrievedApplication = client.getApplications(true); | ||
|
||
// Then | ||
assertTrue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're asserting it's the right application, but shouldn't we also assert on the expected service relationship(s) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The application object contains the services as the cluster names, so it's actually comparing the entire application object not just the app name itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see we're comparing the HashSets - nvm!
98bca2a
to
6726ee7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with the caveat that in-progress testing does not reveal any issues. I'll be unavailable for the next week and don't want to hold up merging this in if all is green.
We've run a short test of this patch (along with #5375 ) and unfortunately it doesn't seem to fix the issue. We're still seeing a large number of queries in the form of SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_alarms` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_alarms_rel` WHERE `ID` IN (...) where the and SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_loadBalancers` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_loadBalancers_rel` WHERE ( `ID` IN (...) AND `rel_type` LIKE ? ) where the Seeing plenty of messages like |
@deverton Thanks so much for giving this change a shot and reporting back! Can you elaborate a little bit on what you specifically did to test (e.g., used the general |
The primary way this shows for us is the general search endpoint from the front page of Deck. From a user perspective the search never returns and we see long running queries against the From the Clouddriver side this shows up as multi-hour queries as you can see in this chart from our deployment. We only have five AWS accounts on-boarded at the moment with 1 region each and ECS enabled for all five. What might be making the difference is that those accounts have a large number of alarms and load balancers (not Spinnaker managed) which might be causing the slow down. Looking at the type of resource queried by Clouddriver we see a lot of calls for those types: We did grab a quick flamegraph of one of the Clouddriver pods which I've attached. |
This change updates the application caching agent to store the application name as well as the services associated to that application as relationships. Storing these objects allows the EcsApplicationProvider to be able to query and retrieve all applications and their related services. Improving the search experience and returning the records quicker.
Previously, if users had too many services in their associated AWS accounts, the search would time out, and throw an exception.
Testing:
Testing by using the current logic, to perform multiple application searches (both using the application search and well as the shared search modal), as well as clicking through an application and deploying to ECS. Then deployed these changes and redid the same tests to validate that the previous behaviour continued to work and the search was able to function as expected.
Fixes: spinnaker/spinnaker#6084