-
@mmaslankaprv and @rystsov please take a look
-
Without having thought about this deeply, it seems like option 3 is the most attractive, with option 1 potentially being the quickest to implement. Could you add a pro/con list above for a solution 4 where we roll the controller segment more often? For example, we could have a policy that if the controller log has received an update (a change to users, a change to partition assignments, a topic creation), we'd roll the controller segment within 10 minutes or so. @Lazin I'm wondering if this general problem encountered with the controller log may apply to normal partitions for some workloads. For example, a system with, say, 100 partitions each receiving around 1 KB/s would have upwards of 100 GB not backed up for a long time until they all reached 1 GB and rolled (at 1 KB/s, filling a 1 GB segment takes roughly 12 days). How do we handle this case today? Would option 3 or 4 handle it?
-
We have the following problem. To be able to run disaster recovery at the cluster level we need information stored inside the controller log. But the controller log grows slowly (in normal circumstances), and the archival storage subsystem only uploads “sealed” segments that won’t receive any new updates. For the controller log we don’t want to wait until the last segment is sealed; we want to upload recent data as frequently as is rationally possible.
This means that the controller log upload has to be a special case (the general archival mechanism is currently disabled for it, so it isn’t uploaded at all).
S3 doesn’t have an “append” operation; we can only upload and re-upload whole objects. This means that a naive implementation would have to re-upload the last segment many times as it grows, eventually to a large size. That would waste a lot of traffic and take resources away from other subsystems in redpanda.
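To make the re-upload overhead concrete, here's a quick back-of-the-envelope (the 1 MiB update granularity and 1 GiB roll size are illustrative assumptions, not measurements):

```python
# Naive approach: re-upload the whole open segment after every 1 MiB
# of new controller data, until the segment rolls at 1 GiB.
chunk_mib = 1
segment_mib = 1024
uploads = segment_mib // chunk_mib
total_mib = sum(i * chunk_mib for i in range(1, uploads + 1))
print(f"{uploads} uploads, {total_mib / 1024:.0f} GiB transferred "
      f"for 1 GiB of data")  # ~512 GiB, i.e. ~512x write amplification
```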
Possible solution 1:
This is the simplest solution to this problem. We can generate manifest files with the information that we need to recover the cluster and upload these manifest files instead of the controller log. (A minimal sketch follows the pro/con list below.)
Pros:
Cons:
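A minimal sketch of what this could look like with boto3. The manifest fields here (topics, users, a revision counter) are assumptions for illustration, not a settled schema:

```python
import json
import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")

# Hypothetical recovery manifest; the exact contents are an assumption.
manifest = {
    "revision": 42,
    "topics": [{"name": "payments", "partitions": 100, "replication": 3}],
    "users": ["admin", "app-writer"],
}

# Re-upload the whole manifest whenever cluster metadata changes; unlike
# the controller log it stays small, so re-uploading it is cheap.
s3.put_object(
    Bucket="redpanda-archive",  # placeholder bucket name
    Key="cluster-recovery/manifest.json",
    Body=json.dumps(manifest).encode(),
)
```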
Possible solution 2:
Use multipart upload for the controller log. We can create parts and upload them as new data is added to the controller log. When the log is sealed we complete the multipart upload. During the recovery phase we need to find all unfinished multipart uploads and complete them. (A minimal sketch follows the pro/con list below.)
Pros:
Cons:
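A minimal sketch with boto3. The `chunks` iterable stands in for however the controller log would hand new batches to the uploader, which is an assumption about the integration, not the actual internal API:

```python
import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")

def upload_segment_multipart(bucket, key, chunks):
    """Stream a slowly growing segment to S3 as multipart-upload parts."""
    # Start the multipart upload when the segment is opened.
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    # Note: S3 requires every part except the last to be at least 5 MiB,
    # which is awkward for a log that grows by a few KB at a time.
    for part_number, chunk in enumerate(chunks, start=1):
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    # When the segment is sealed, stitch the parts into one object.
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
```

During recovery, `list_multipart_uploads` (plus `list_parts` for each unfinished upload) would let us find and complete whatever was in flight when the cluster died.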
Possible solution 3:
Upload the controller log in chunks without doing a multipart upload. Basically, this means reimplementing the multipart upload feature on the redpanda side. When all segment data is uploaded we will have to merge the chunks into one object, either using the multipart copy API (won’t work in GCS) or by re-uploading the full segment and deleting the chunks (will work in GCS).
Pros:
Cons:
This is how it would look in practice. Right now we upload segments using names like this:
<prefix>/<ns>/<topic>/<partition-id>_<revision>/<base-offset>_<term>_v1.log
For the controller log, instead of an object name with this structure we would have the following:
<prefix>/<ns>/<topic>/<partition-id>_<revision>/<base-offset>_<term>_v1.log/<start-offset>_<end-offset>.part
The prefix for all parts of the same segment and for the segment itself will be the same, so it will be possible to quickly locate all related components. A sketch of the chunk upload and the final merge follows.
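A minimal sketch with boto3, using the S3 merge path (`upload_part_copy`); the bucket and segment names are placeholders:

```python
import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")
bucket = "redpanda-archive"  # placeholder
segment_key = "prefix/kafka/controller/0_1/0_1_v1.log"  # placeholder

def upload_chunk(start_offset, end_offset, data):
    # Each chunk is an ordinary object under the segment's prefix,
    # named by its offset range, e.g. ".../0_1_v1.log/0_4096.part".
    key = f"{segment_key}/{start_offset}_{end_offset}.part"
    s3.put_object(Bucket=bucket, Key=key, Body=data)

def merge_chunks():
    # When the segment is sealed, merge the chunks server-side via the
    # multipart copy API. (On GCS we'd instead re-upload the full
    # segment and delete the chunks.)
    objs = s3.list_objects_v2(Bucket=bucket, Prefix=segment_key + "/")
    # Sort numerically by start offset; plain string order would put
    # "10_..." before "9_...".
    keys = sorted((o["Key"] for o in objs.get("Contents", [])),
                  key=lambda k: int(k.rsplit("/", 1)[1].split("_")[0]))
    upload_id = s3.create_multipart_upload(
        Bucket=bucket, Key=segment_key)["UploadId"]
    parts = []
    for i, key in enumerate(keys, start=1):
        resp = s3.upload_part_copy(
            Bucket=bucket, Key=segment_key, UploadId=upload_id,
            PartNumber=i, CopySource={"Bucket": bucket, "Key": key})
        parts.append({"PartNumber": i,
                      "ETag": resp["CopyPartResult"]["ETag"]})
    s3.complete_multipart_upload(Bucket=bucket, Key=segment_key,
                                 UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
    for key in keys:
        s3.delete_object(Bucket=bucket, Key=key)
```

Note that `upload_part_copy` has the same 5 MiB minimum on all parts except the last, so chunks may need to be batched before merging.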
Possible solution 4:
We can roll segments of the controller log more often, e.g. on a timer once the log has received new updates. (A minimal sketch follows the pro/con list below.)
Pros:
Cons:
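A minimal sketch of the timer-based roll policy suggested above. All names here (`TimedRollPolicy`, the `roll_segment` callback, `on_append` hook) are hypothetical, not redpanda's actual internals:

```python
import threading
import time

ROLL_INTERVAL = 10 * 60  # seconds; the "within 10 minutes" from the thread

class TimedRollPolicy:
    """Roll the open controller segment at most ROLL_INTERVAL after it
    first receives an update, instead of waiting for the size threshold."""

    def __init__(self, roll_segment):
        self._roll_segment = roll_segment  # callback into the log layer
        self._dirty_since = None
        self._lock = threading.Lock()
        threading.Thread(target=self._loop, daemon=True).start()

    def on_append(self):
        # Called whenever the controller log receives an update
        # (user change, partition reassignment, topic creation, ...).
        with self._lock:
            if self._dirty_since is None:
                self._dirty_since = time.monotonic()

    def _loop(self):
        while True:
            time.sleep(1)
            with self._lock:
                due = (self._dirty_since is not None and
                       time.monotonic() - self._dirty_since >= ROLL_INTERVAL)
                if due:
                    self._dirty_since = None
            if due:
                self._roll_segment()  # sealed segment is now archivable
```

The upside of this approach is that the existing archival path stays unchanged: a rolled segment is sealed, so the regular upload machinery picks it up with no special-casing.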