Demo zero copy grpc #966

duongkame · 2023-11-14T18:09:46Z

What changes were proposed in this pull request?

Demonstrates zero-copy in GrpcService, covering GrpcClientProtocolService and GrpcServerProtocolService (appendEntries).
This PR is for early review, to get suggestions on how to shape the code correctly.

Zero-copy is done with a simple trick: any protobuf ByteString produced by parsing refers to the original Netty buffers instead of holding a separate copy on the heap. This avoids copying data into heap memory, which saves both the cost of the buffer copy and the GC cost of the intermediate heap buffers.
Yet it comes with a challenge: the application must explicitly close the original Netty buffers once it knows that the original protobuf objects (and their descendants) are no longer needed. In Ratis, this means deciding when a LogEntryProto is no longer used.
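As an illustration only, the sketch below mimics the trick with plain java.nio instead of protobuf and Netty (an assumption made so it runs with the JDK alone): a slice of a direct buffer shares memory with its source, much like a zero-copy ByteString, while an explicit copy lands on the heap. The class name ZeroCopyAnalogy is hypothetical. In protobuf-java, a comparable no-copy wrapper is UnsafeByteOperations.unsafeWrap(ByteBuffer); the description does not say which mechanism the PR relies on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only analogy (hypothetical, not the PR's code): a "zero-copy" view
// shares memory with the source buffer; a "copy" duplicates bytes onto the heap.
class ZeroCopyAnalogy {
    /** Returns a view of [offset, offset + length) that shares the source's memory. */
    static ByteBuffer view(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate(); // own position/limit, same memory
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice(); // still no byte copy
    }

    /** Returns an independent heap copy of [offset, offset + length). */
    static byte[] copy(ByteBuffer source, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = source.duplicate();
        dup.position(offset);
        dup.get(out); // bytes are copied here
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer network = ByteBuffer.allocateDirect(16); // stands in for a Netty buffer
        network.put("hello-ratis".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer zeroCopy = view(network, 0, 5);
        byte[] heapCopy = copy(network, 0, 5);

        network.put(0, (byte) 'J'); // simulate the source being reused

        System.out.println((char) zeroCopy.get(0)); // J: the view sees the change
        System.out.println((char) heapCopy[0]);     // h: the copy is unaffected
    }
}
```

The shared memory is exactly why the owner has to control the buffer's lifetime: once the source is released or reused, every view taken from it is affected.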

Today, Ratis caches LogEntryProto objects in SegmentedRaftLogCache. However, for data-intensive applications like Apache Ozone, the cached log entries have their StateMachine data truncated, and Ratis relies on the StateMachine to cache that data. This behavior is controlled by the config raft.server.log.statemachine.data.caching.enabled.

This demo solves the cleanup problem with a DirectBufferCleaner that keeps track of all open original buffers (each accessed through an InputStream). The cleaner is invoked when:

SegmentedRaftLogCache evicts a LogEntryProto: while this sounds like the point at which we are sure Ratis no longer needs a particular log entry, it doesn't release memory fast enough for Raft groups with raft.server.log.statemachine.data.caching.enabled. Because the StateMachine data is truncated, the size of a cached entry does not reflect the size of the original buffer, and this defers cache eviction. We need another strategy for data-intensive StateMachines.
On the leader replica, once both followers have caught up with a particular index and that index has been applied to the StateMachine, it is safe to discard the original buffers of that log entry. On a follower replica, it is safe to release the buffers once a particular index has been applied. This strategy is used for data-intensive StateMachines.
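A minimal sketch of the bookkeeping described above, assuming the cleaner keys each tracked InputStream by the log index of the entry it backs. DirectBufferCleanerSketch, track, releaseUpTo, leaderReleasableIndex and followerReleasableIndex are hypothetical names, not the PR's code; the two index helpers only encode the leader and follower conditions stated in the second case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the PR's DirectBufferCleaner: tracks the
// InputStreams whose buffers back zero-copy log entries, keyed by log index.
class DirectBufferCleanerSketch {
    private final TreeMap<Long, List<InputStream>> open = new TreeMap<>();

    /** Registers the stream whose buffers back the entry at logIndex. */
    synchronized void track(long logIndex, InputStream stream) {
        open.computeIfAbsent(logIndex, k -> new ArrayList<>()).add(stream);
    }

    /** Closes and forgets every stream at an index <= upToIndex; returns how many it tried to close. */
    synchronized int releaseUpTo(long upToIndex) {
        int closed = 0;
        Iterator<Map.Entry<Long, List<InputStream>>> it =
                open.headMap(upToIndex, true).entrySet().iterator();
        while (it.hasNext()) {
            for (InputStream stream : it.next().getValue()) {
                try {
                    stream.close(); // in the real setup, closing returns the original buffers
                } catch (IOException e) {
                    // A real implementation would log this; the sketch ignores it.
                }
                closed++;
            }
            it.remove(); // also removes the entry from the backing TreeMap
        }
        return closed;
    }

    /** Number of streams still tracked. */
    synchronized int openCount() {
        int n = 0;
        for (List<InputStream> streams : open.values()) {
            n += streams.size();
        }
        return n;
    }

    /** Leader: releasable once applied locally and acknowledged by every follower. */
    static long leaderReleasableIndex(long appliedIndex, Collection<Long> followerMatchIndices) {
        long minMatch = Long.MAX_VALUE; // with no followers, only the applied index matters
        for (long match : followerMatchIndices) {
            minMatch = Math.min(minMatch, match);
        }
        return Math.min(appliedIndex, minMatch);
    }

    /** Follower: releasable once applied. */
    static long followerReleasableIndex(long appliedIndex) {
        return appliedIndex;
    }
}
```

On the leader, a caller would pass leaderReleasableIndex(applied, matchIndices) to releaseUpTo; on a follower, the applied index itself. Taking the minimum match index means a lagging follower holds back release, presumably because the leader may still need those entries to send to it.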

A quick thought: since a data-intensive StateMachine may cache data that refers to the original buffers, we may need a new StateMachine API that tells the StateMachine when it should evict data (up to a particular index).
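One hypothetical shape for such an API, sketched as a default interface method so that existing StateMachines would be unaffected. StateMachineDataEviction, evictDataUpTo and CachingStateMachineData are invented names, not existing Ratis APIs, and byte[] stands in for whatever buffer type a StateMachine actually caches.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical extension point (not an existing Ratis API): Ratis would call
 * evictDataUpTo once no replica will ask the StateMachine for data at or below
 * logIndex, so cached data still referencing the original buffers can be dropped.
 */
interface StateMachineDataEviction {
    /** Default: nothing is cached, so there is nothing to evict. */
    default void evictDataUpTo(long logIndex) {}
}

/** Example StateMachine-side cache honouring the hook (byte[] is a simplification). */
class CachingStateMachineData implements StateMachineDataEviction {
    private final ConcurrentSkipListMap<Long, byte[]> cache = new ConcurrentSkipListMap<>();

    void put(long logIndex, byte[] data) { cache.put(logIndex, data); }

    byte[] get(long logIndex) { return cache.get(logIndex); }

    int size() { return cache.size(); }

    @Override
    public void evictDataUpTo(long logIndex) {
        // headMap(..., true) is a live view; clearing it removes those entries from the cache.
        cache.headMap(logIndex, true).clear();
    }
}
```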

This demo also includes a fix that keeps Ratis IDs such as RaftPeerId and RaftGroupId from referring to the original data source, because that is not zero-copy friendly. This fix will go in as a separate PR.
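The idea behind that fix can be sketched with a long-lived identifier that copies its bytes out of a zero-copy buffer, so it never keeps the buffer reachable or reads it after release. DetachedId is a hypothetical class, not RaftPeerId or RaftGroupId, and a java.nio.ByteBuffer stands in for the protobuf/Netty data source; in protobuf-java, ByteString.copyFrom(ByteBuffer) plays a similar copying role.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch: an identifier built from bytes in a zero-copy buffer
 * takes its own copy, so it stays valid after the buffer is released or reused.
 */
final class DetachedId {
    private final byte[] bytes;

    private DetachedId(byte[] bytes) { this.bytes = bytes; }

    /** Copies length bytes starting at the buffer's current position. */
    static DetachedId copyOf(ByteBuffer source, int length) {
        byte[] copy = new byte[length];
        source.duplicate().get(copy); // duplicate() leaves the caller's position untouched
        return new DetachedId(copy);
    }

    /** Returns a defensive copy so callers cannot mutate the ID. */
    byte[] toByteArray() { return bytes.clone(); }

    @Override
    public boolean equals(Object o) {
        return o instanceof DetachedId && Arrays.equals(bytes, ((DetachedId) o).bytes);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(bytes); }
}
```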

The code in this demo is not well structured. For ease of making the demo, I put DirectBufferCleaner in ratis-server so that it can be invoked directly from ratis-server code. The component should live in ratis-grpc and be invoked by subscribing to events from RaftServer.
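The intended structure could look roughly like this minimal publish/subscribe sketch, assuming the server publishes an "applied up to index N" event and a component in another module (such as the buffer cleaner in ratis-grpc) subscribes to it. AppliedIndexEvents, subscribe and publish are hypothetical names; the description does not specify which RaftServer events the cleaner would use.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch: the server publishes applied-index events and other
 * modules subscribe, so ratis-server needs no compile-time dependency on them.
 */
final class AppliedIndexEvents {
    // Copy-on-write: publishing can iterate safely while listeners come and go.
    private final List<LongConsumer> listeners = new CopyOnWriteArrayList<>();

    /** Registers a listener; the returned Runnable unsubscribes it. */
    Runnable subscribe(LongConsumer listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    /** Called by the server after it applies the entry at appliedIndex. */
    void publish(long appliedIndex) {
        for (LongConsumer listener : listeners) {
            listener.accept(appliedIndex);
        }
    }
}
```

Keeping the dependency pointed from ratis-grpc toward ratis-server, rather than the reverse, is the point of the event-based design the description proposes.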

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-1925
https://issues.apache.org/jira/browse/RATIS-1934

How was this patch tested?

To be tested.

duongkame · 2023-11-14T18:48:24Z

@szetszwo

duongkame added 3 commits November 10, 2023 17:15

POC of ZeroCopy in GrpcService

34852d1

Decouple ratis internal data from the original input data buffer.

9553ee3

More aggressive in cleaning up buffer: right after follower catch up.

9c022d6

duongkame marked this pull request as draft November 14, 2023 18:09

Decouple cache commitinfo from original datasource.

1d96703
