
Conversation

@ebarskiy

The COMMAND command is expensive to process (too many string operations), but its result rarely changes, so it makes sense to add caching.

Signed-off-by: Evgeny Barskiy <[email protected]>
Member

@JimB123 JimB123 left a comment


What evidence is there to suggest that COMMAND

is expensive to process (too much string operations)

How meaningful is the benefit of caching this information? I can't tell if this represents premature optimization (adding complexity without sufficient value).

Signed-off-by: Evgeny Barskiy <[email protected]>
@ebarskiy
Author

What evidence is there to suggest that COMMAND

is expensive to process (too much string operations)

How meaningful is the benefit of caching this information? I can't tell if this represents premature optimization (adding complexity without sufficient value).

@PingXie can you share internal latency of COMMAND?

Collaborator

@hpatro hpatro left a comment


I have a similar question along the lines of what @JimB123 asked.

Why do we want to optimize this particular command? It is not a heavily used command AFAIK, apart from clients invoking it upfront to build certain metadata. Does this lead to tail latency on your end for datapath commands?

Member

@roshkhatri roshkhatri left a comment


I agree with the reviews; overall this looks good.
A few things:

  1. Can we also add some tests for cache invalidation?
  2. Can we also add some benchmark numbers to quantify the improvement?

Signed-off-by: Evgeny Barskiy <[email protected]>
@ebarskiy
Author

Regarding the performance:
Internally, with default Valkey it takes 2 ms (little to no variance) to regenerate a result.
Add a bunch of modules (with extra commands) and it shoots up to 5 ms.
For comparison, serving a cached result takes 50 usec internally.

When it's important:

  1. Small cluster (low request distribution)
  2. Lots of clients (it adds up: 1000 clients per node makes it 5 sec of wall time)
  3. Clients doing a rolling update/deployment (so COMMAND calls are localized in time)

In general, we have identified 3 similar issues responsible for the tail latency:

  1. Topology refresh (also takes ~5 ms)
  2. AUTH + IAM token (vendor/module specific)
  3. COMMAND

While COMMAND has the lowest impact of these three, it's the easiest to address.

Signed-off-by: Evgeny Barskiy <[email protected]>
Contributor

@zuiderkwast zuiderkwast left a comment


Internally, with default Valkey it takes 2 ms (little to no variance) to regenerate a result.
Add a bunch of modules (with extra commands) and it shoots up to 5 ms.
For comparison, serving a cached result takes 50 usec internally.

These numbers look convincing. Thank you.

It looks solid. I have only a few minor comments.

    }

    /* Forward declaration */
    void addReplyCommandInfo(client *c, struct serverCommand *cmd);
Contributor


There is a block of forward declarations near the top of this file where we could move this declaration, under

    /*============================ Internal prototypes ========================== */

Comment on lines +5437 to +5442

    sds cache = server.command_response_cache[cache_idx];

    if (cache == NULL) {
        cacheCommandResponse(c->resp);
        cache = server.command_response_cache[cache_idx];
    }
Contributor


Here we rely on cacheCommandResponse populating a specific global which we then fetch afterwards. It looks like an implicit dependency that we can avoid.

Consider returning the response from cacheCommandResponse, like we do in generateClusterSlotResponse, where the corresponding call looks like this:

    sds cached_reply = server.cached_cluster_slot_info[conn_type];
    if (!cached_reply) {
        cached_reply = generateClusterSlotResponse(c->resp);
        server.cached_cluster_slot_info[conn_type] = cached_reply;
    }

Comment on lines +5274 to +5276

    if (cache == NULL) {
        cacheCommandInfo(cmd, c->resp);
        cache = cmd->info_cache[cache_idx];
Contributor


Similar here. We can let the other function just return the response instead of caching it, to avoid the dependency on side effects between these two functions:

Suggested change

    -if (cache == NULL) {
    -    cacheCommandInfo(cmd, c->resp);
    -    cache = cmd->info_cache[cache_idx];
    +if (cache == NULL) {
    +    cache = generateCommandInfoResponse(cmd, c->resp);
    +    cmd->info_cache[cache_idx] = cache;
