Documentation: Nakadi Platform Management #1064
Comments
Hi @wsalembi, thank you for your interest in Nakadi! There are no open source documents on how to operate Nakadi, nor forums to discuss it, but the team is happy to collaborate to clarify any questions you have and maybe open source (partially) what we have on the topic. Feel free to ask in this issue or file another one. I will cover your point in this issue.
|
Thanks for your post here, it helped me better understand it. Since I know Zalando is running quite a few services in k8s (besides the vintage DC and maybe still some STUPS services around?) I wonder if you have experience running Nakadi (maybe together with Kafka?) in k8s. Or together with managed Kafka such as Amazon MSK. Besides that, are there others running Nakadi you know of outside of Zalando? I am happy to read names or numbers, but a simple "yes/no" would help as well. |
@lenalebt unfortunately, the answer is no to your questions for now |
Too bad! But thanks for the update :). |
I am also interested in this subject. @lenalebt We are now trying to run Nakadi in EKS using MSK (and the ZooKeeper it provides). |
@mateimicu that's very interesting, how is it working for you? We'd love to know more about your efforts, and any issues you may be running into! |
@lmontrieux it works, we can use it. The problem we had is ensuring backups for it: keeping all Nakadi data stores (MSK, ZooKeeper, DB) in sync. We are now trying to understand what we should back up and how to do it. We want to use it for a mission-critical system, so we can't afford to lose messages. Also, we need to be able to do disaster recovery and migrations (maybe even keep a hot replica in another AWS region). |
We keep zookeeper and DB backups separately - the two are not synchronized, but that should not be an issue. In the extremely unlikely event that we would lose both the entire zookeeper ensemble and the database (and all its replicas) at the same time, we can recover from the backups and start again from there. Regarding Kafka, we have replication on 3 brokers, each in a different availability zone, and ack=all to make sure that Nakadi does not confirm publishing until the data is on all brokers. The Kafka brokers use EBS volumes, so the risks of data loss are very, very small. But we also have an application that consumes all data from all Nakadi event types and persists it to S3 for our data lake, so we have another copy in case everything goes south. |
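To make the durability reasoning above concrete, here is a minimal Python sketch of how replication and `acks=all` (written `ack=all` above) interact. It is a toy model, not Nakadi or Kafka code: the function name is hypothetical, and `min_insync_replicas` is an assumption, since the comment above does not say which value is used. It relies only on the general Kafka behaviour that, with `acks=all`, a write is acknowledged once all current in-sync replicas have it, and is rejected if fewer than `min.insync.replicas` are in sync.

```python
def durability_margins(replication_factor: int, min_insync_replicas: int) -> dict:
    """Toy model of a topic written with acks=all.

    Both arguments are illustrative inputs; the thread only states
    that replication spans 3 brokers.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    return {
        # Brokers that may be down while producers can still publish.
        "brokers_down_while_writable": replication_factor - min_insync_replicas,
        # Lower bound on the copies of any acknowledged record.
        "min_copies_of_acked_record": min_insync_replicas,
    }


if __name__ == "__main__":
    # 3 replicas as described above; min_insync_replicas=2 is an assumed value.
    print(durability_margins(replication_factor=3, min_insync_replicas=2))
```

With the assumed values, one broker (or one availability zone) can be lost without blocking publishing, and every acknowledged event exists on at least two brokers.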
re: keeping a hot-replica in another AWS region: we don't do it, but I guess you could use Kafka MirrorMaker to replicate to another cluster, or even an application that reads from Nakadi and writes to another Nakadi. I see potential issues around offsets and timelines, though. |
Thanks for sharing @lmontrieux. If the DB and ZK backups are not synchronised, you will probably need to do some 'magic' after you restore from the backups to deal with the data which is stored in both, e.g. subscriptions and event types? Is it possible to create a subscription which will get all the events, even as new event types get registered? Or do you update that 'sink' subscription explicitly with every new event type? |
Yes, you'd probably have to do some cleanup, as there could be subscriptions in the DB that aren't in Zookeeper, or the other way around. But since subscription or event type creation are relatively rare events, we take it as an acceptable risk that, in the case of a complete meltdown, we'll have to clear up a few subscriptions or event types. It isn't possible to create a subscription which gets all events from all event types, for 2 reasons:
To archive everything, we wrote a (not yet open source unfortunately) application that periodically lists all the event types, and then creates subscriptions to read events from them. |
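The archiver itself is not public, so the following is only a sketch of the loop described above: list the event types, then create a subscription for each one that is not covered yet. Everything here is an assumption for illustration. The function names, the `archiver` application name and the `archive` consumer group are hypothetical, and the payload fields (`owning_application`, `event_types`, `consumer_group`, `read_from`) reflect my understanding of Nakadi's subscription API and should be checked against the API definition. HTTP calls are injected as plain callables so the planning logic can be read and tested without a running Nakadi.

```python
from typing import Callable, Dict, Iterable, List


def plan_archive_subscriptions(
    event_type_names: Iterable[str],
    already_subscribed: Iterable[str],
    owning_application: str = "archiver",  # hypothetical application name
    consumer_group: str = "archive",       # hypothetical consumer group
) -> List[Dict]:
    """Build one subscription payload per event type that is not covered yet."""
    missing = sorted(set(event_type_names) - set(already_subscribed))
    return [
        {
            "owning_application": owning_application,
            "event_types": [name],
            "consumer_group": consumer_group,
            "read_from": "begin",  # archive from the oldest available event
        }
        for name in missing
    ]


def run_once(
    list_event_types: Callable[[], List[str]],
    list_subscribed: Callable[[], List[str]],
    create_subscription: Callable[[Dict], None],
) -> int:
    """One pass of the periodic job; returns how many subscriptions were created."""
    payloads = plan_archive_subscriptions(list_event_types(), list_subscribed())
    for payload in payloads:
        create_subscription(payload)
    return len(payloads)
```

In a real deployment the three callables would wrap authenticated HTTP requests, and `run_once` would be invoked on a schedule so that newly registered event types are picked up.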
Hi @lmontrieux, I see a line mentioning SLO monitoring support in the documentation, but I can't find more information, is there another place I should look at? |
Hi @mateimicu For SLO monitoring, we use Nakadi itself. Basically, Nakadi produces its own access log to an event type, |
I think we need to come up with a proper admin documentation - that would be the right place to answer these questions, and also highlight configuration options |
@lmontrieux sounds like a good idea. Another question: who creates the
|
I think I figured this out, just starting a new process worked :). |
:) |
If you do not mind, I will ask here instead of starting a new thread: We have two timelines for event type X: What should happen if we create a new subscription for X with read_from=begin? Should it start streaming from e1 or eN+1? It looks like it starts from eN+1 in my case, which is not what I expected. |
It should start at the oldest available offset of the oldest available timeline, if you read from begin. |
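The rule above can be sketched in a few lines of Python. This is a simplified model, not Nakadi's implementation: the `Timeline` class, its fields and `begin_position` are hypothetical names, and a timeline whose topic has already been cleaned up is represented by `oldest_offset=None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Timeline:
    order: int                    # position of the timeline within the event type
    oldest_offset: Optional[int]  # None if no data is available any more


def begin_position(timelines: Sequence[Timeline]) -> Optional[Tuple[int, int]]:
    """Return (timeline order, offset) where a read_from=begin stream would start."""
    available = [t for t in timelines if t.oldest_offset is not None]
    if not available:
        return None
    oldest = min(available, key=lambda t: t.order)
    return (oldest.order, oldest.oldest_offset)
```

Under this model, if the first timeline still holds data the stream starts at e1; if its topic has been deleted, the stream starts at the first event of the second timeline (eN+1), which matches the behaviour reported in the question.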
Thanks. In my case the retention time for the event type was set to -1, so the old topic was cleaned up almost immediately because of:
Is this a bug or a feature? =) |
Looks like a bug. Could you please open an issue (and maybe a PR if you feel like it) ? |
I'm looking for any documentation that provides more information or recommendations on managing the Nakadi platform in a production environment.
Is there any community forum to discuss the Nakadi platform?