feat(ecs): update applicationcachingagent to store applications/relationships #5377

piradeepk · 2021-06-01T23:38:04Z

This change updates the application caching agent to store the application name, along with the services associated with that application, as relationships. Storing these objects allows the EcsApplicationProvider to query and retrieve all applications and their related services from the cache, which improves the search experience and returns records more quickly.

Previously, if users had too many services in their associated AWS accounts, the search would time out and throw an exception.

Testing:

IN PROGRESS
Tested with the current logic by performing multiple application searches (using both the application search and the shared search modal), as well as clicking through an application and deploying to ECS. Then deployed these changes and repeated the same tests to validate that the previous behaviour continued to work and that search functioned as expected.

Fixes: spinnaker/spinnaker#6084

allisaurus · 2021-06-04T16:21:39Z

...src/main/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/ApplicationCacheClient.java

+    services.forEach(
+        key -> {
+          Map<String, String> parsedKey = Keys.parse(key);
+          if (application.getClusterNames().get(parsedKey.get("account")) != null) {


should we have a null check on getClusterNames here?

I thought about it, but the existing logic doesn't have one, so I didn't lean that way. That being said, I'm not opposed to adding it.
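For reference, a minimal sketch of what such a null check could look like. This is not the code in the PR: `ClusterNameGuard` and `servicesFor` are hypothetical names, and the `Map<String, Set<String>>` shape for the cluster names is an assumption about what `getClusterNames()` returns.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class ClusterNameGuard {
  // Hypothetical helper: returns the service names recorded for an account,
  // or an empty set when the map itself or the account entry is missing.
  public static Set<String> servicesFor(Map<String, Set<String>> clusterNames, String account) {
    if (clusterNames == null) {
      return Collections.emptySet();
    }
    Set<String> services = clusterNames.get(account);
    return services == null ? Collections.emptySet() : services;
  }
}
```

With a guard like this, the caller can iterate over the result without a separate `!= null` branch for either the map or the per-account entry.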

allisaurus · 2021-06-04T16:44:08Z

.../main/java/com/netflix/spinnaker/clouddriver/ecs/provider/agent/ApplicationCachingAgent.java

+    Map<String, Map<String, Collection<String>>> appRelationships = new HashMap<>();
+
+    for (Service service : services) {
+      String applicationKey = service.getApplicationName();


how does this work with monikers? in that case, is the returned getApplicationName() the actual application (as determined by tags) or the service prefix?

The logic for determining applications is the same as it exists today. The only difference is that we now perform this logic in the caching agent and write the result to the cache, rather than performing it at query time.
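A rough sketch of the cache-time grouping described here, assuming each service exposes an application name and a cache key. `ServiceRecord`, `groupByApplication`, and the `"services"` relationship type are illustrative names only, not taken from the PR; only the nested map type matches the diff above.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class AppRelationshipSketch {
  // Hypothetical stand-in for a cached ECS service.
  public static final class ServiceRecord {
    final String applicationName;
    final String serviceKey;

    public ServiceRecord(String applicationName, String serviceKey) {
      this.applicationName = applicationName;
      this.serviceKey = serviceKey;
    }
  }

  // Builds application name -> relationship type -> related cache keys.
  public static Map<String, Map<String, Collection<String>>> groupByApplication(
      List<ServiceRecord> services) {
    Map<String, Map<String, Collection<String>>> appRelationships = new HashMap<>();
    for (ServiceRecord service : services) {
      appRelationships
          .computeIfAbsent(service.applicationName, k -> new HashMap<>())
          .computeIfAbsent("services", k -> new HashSet<>())
          .add(service.serviceKey);
    }
    return appRelationships;
  }
}
```

The work done per service is the same either way; doing it once per caching cycle means a search only reads the precomputed relationships instead of walking every service.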

allisaurus · 2021-06-04T17:25:38Z

...src/main/java/com/netflix/spinnaker/clouddriver/ecs/cache/client/ApplicationCacheClient.java

+    EcsApplication application = new EcsApplication(appName, attributes, clusterNames);
+
+    Set<String> services = getServiceRelationships(cacheData);
+    log.info("Found {} services for app {}", services.size(), appName);


suggest we make this debug level instead of info - it's useful but produces quite a bit of noise in the logs (tested this locally myself)

allisaurus · 2021-06-04T17:31:34Z

...cs/src/test/java/com/netflix/spinnaker/clouddriver/ecs/cache/ApplicationCacheClientTest.java

+    Set<Application> retrievedApplication = client.getApplications(true);
+
+    // Then
+    assertTrue(


we're asserting it's the right application, but shouldn't we also assert on the expected service relationship(s) ?

The application object contains the services as the cluster names, so it's actually comparing the entire application object not just the app name itself.

oh I see we're comparing the HashSets - nvm!
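To illustrate why comparing the sets is enough, here is a small sketch. It assumes the application type's `equals` covers both the name and the cluster names; `App` and `sameApplications` are hypothetical stand-ins, not the classes used in the test.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class SetEqualitySketch {
  // Hypothetical application type whose equality includes its cluster names.
  public static final class App {
    final String name;
    final Map<String, Set<String>> clusterNames;

    public App(String name, Map<String, Set<String>> clusterNames) {
      this.name = name;
      this.clusterNames = clusterNames;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof App)) return false;
      App other = (App) o;
      return name.equals(other.name) && clusterNames.equals(other.clusterNames);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, clusterNames);
    }
  }

  // Set.equals compares elements with equals(), so differing services make the sets unequal.
  public static boolean sameApplications(Set<App> expected, Set<App> actual) {
    return expected.equals(actual);
  }
}
```

Two applications with the same name but different services would therefore fail the comparison, which is the check the review comment was asking about.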

allisaurus

Approved with the caveat that in-progress testing does not reveal any issues. I'll be unavailable for the next week and don't want to hold up merging this in if all is green.

deverton · 2021-06-07T02:41:42Z

We've run a short test of this patch (along with #5375) and unfortunately it doesn't seem to fix the issue. We're still seeing a large number of queries of the form

SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_alarms` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_alarms_rel` WHERE `ID` IN (...)

where the IN values look like ecs;alarms;ecs-account-id;us-west-2;arn:aws:cloudwatch:us-west-2:1234567890:alarm:nameofalarm-MemoryAlarmScalingOutPolicy-ID

and

SELECT `body` AS `body` , ? AS `id` , ? AS `rel_id` , ? AS `rel_type` FROM `cats_v1_loadBalancers` WHERE `ID` IN (...) UNION ALL SELECT ? AS `body` , `id` AS `id` , `rel_id` AS `rel_id` , `rel_type` AS `rel_type` FROM `cats_v1_loadBalancers_rel` WHERE ( `ID` IN (...) AND `rel_type` LIKE ? )

where the IN values look like aws:loadBalancers:aws-account-idp:us-west-2:lb-id:vpc-ID:application

Seeing plenty of messages like Cached 115 applications for 974 services and Found 974 ECS services for which to cache applications in the logs from the com.netflix.spinnaker.clouddriver.ecs.provider.agent.ApplicationCachingAgent so I assume it's doing its thing.

allisaurus · 2021-06-14T22:46:26Z

@deverton Thanks so much for giving this change a shot and reporting back! Can you elaborate a little on what you specifically did to test (e.g., used the general /search field in Deck, hit a specific Gate endpoint, etc.) so we can work on repro/validation of further changes?

deverton · 2021-06-14T23:01:30Z

The primary way this shows for us is the general search endpoint from the front page of Deck. From a user perspective the search never returns and we see long running queries against the /search endpoint in Gate and Clouddriver. Search from the Application tab is fine so presumably this is specific to the infrastructure search.

From the Clouddriver side this shows up as multi-hour queries as you can see in this chart from our deployment.

We only have five AWS accounts on-boarded at the moment with 1 region each and ECS enabled for all five. What might be making the difference is that those accounts have a large number of alarms and load balancers (not Spinnaker managed) which might be causing the slow down. Looking at the type of resource queried by Clouddriver we see a lot of calls for those types:

We did grab a quick flamegraph of one of the Clouddriver pods which I've attached.

piradeepk requested a review from allisaurus June 1, 2021 23:38

piradeepk assigned allisaurus Jun 1, 2021

piradeepk added the do not merge label Jun 1, 2021

allisaurus reviewed Jun 4, 2021

piradeepk and others added 2 commits June 4, 2021 13:44

feat(ecs): update applicationcachingagent to store applications/relationships

62d309b

chore(ecs): change log level to debug

6726ee7

piradeepk force-pushed the search branch 2 times, most recently from 98bca2a to 6726ee7 Compare June 4, 2021 21:20

allisaurus approved these changes Jun 5, 2021

deverton mentioned this pull request Jun 16, 2021

clouddriver-ecs: applications search very slow spinnaker/spinnaker#6084

Open

allisaurus removed their assignment Mar 7, 2022
