Preserve routed messages if closed actor is supposed to restart #132
Comments
Hi, thanks for the library. I tried several other actor systems (ractor and coerce) and found this one best suited to my purpose. I'm building a system with this library and found that I need this feature for at-least-once delivery semantics. Can you share some insights on it (e.g., what it is, what the design considerations are, how it will be implemented, whether the mailbox will survive operating system process restarts, things like that)? Curiously enough, I used to work with Erlang and don't remember needing such a feature.
Having nice documentation would obviously be better, but generally a one-sentence summary is enough if the function and data type names describe what they do or encode. We already have the example directory to get started and already have a good idea of how the system works. Documentation is needed more where a function does something unexpected. I picked the 0.2 version from the start; in a functional language, the code tends to be reliable even for alpha versions, in my experience.

About the problem you described and the proposed solution, if I remember correctly, in Erlang

My problem is twofold. The first part is to provide at-least-once delivery semantics to the consumers. A polls messages from a data source (in my case, a Redis stream) and distributes them to the consumers B. The distribution is done using cast, since the processing can take a long time (seconds) and I do not want to prevent A from distributing messages to other consumers. However, this introduces a problem: B might fail (B also sends the message to other actors C, and C can also fail, transiently or permanently). When that happens, I want B to receive the message again; if the previous error was transient, then hopefully the second attempt succeeds. If it fails too many times, maybe there is an error counter associated with the message, and after a few retries the message is dropped (by A).

The second part is to ack the message to A somehow. Redis does not delete the message from memory upon delivery, so A has to delete it, which requires some coordination between the consumers of the message (A, B, C). I'm thinking of starting a timer in A, redelivering the message if an ack is not received in time, and dropping the message if it errors too many times (so there is a counter associated with the message).
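The timer-plus-counter idea could be sketched roughly as below. This is only an illustration under assumptions: `AckTracker`, `Action`, and the numeric message ids are hypothetical names, nothing here is elfo API, and the actual sending, acking, and Redis deletion are left to the caller.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// What the distributor (A) should do with a message whose ack is overdue.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Redeliver(u64),
    Drop(u64),
}

struct Pending {
    attempts: u32,
    deadline: Instant,
}

/// Tracks unacknowledged messages by a hypothetical numeric message id.
pub struct AckTracker {
    pending: BTreeMap<u64, Pending>,
    ack_timeout: Duration,
    max_attempts: u32,
}

impl AckTracker {
    pub fn new(ack_timeout: Duration, max_attempts: u32) -> Self {
        Self { pending: BTreeMap::new(), ack_timeout, max_attempts }
    }

    /// Call right after casting message `id` to a consumer for the first time.
    pub fn sent(&mut self, id: u64, now: Instant) {
        let deadline = now + self.ack_timeout;
        self.pending.insert(id, Pending { attempts: 1, deadline });
    }

    /// Call when an ack arrives; returns true if the id was still pending.
    pub fn acked(&mut self, id: u64) -> bool {
        self.pending.remove(&id).is_some()
    }

    /// Call periodically (e.g. from a timer). Overdue messages are either
    /// scheduled for redelivery or dropped once `max_attempts` is reached.
    pub fn tick(&mut self, now: Instant) -> Vec<Action> {
        let mut actions = Vec::new();
        let timeout = self.ack_timeout;
        let max_attempts = self.max_attempts;
        self.pending.retain(|&id, p| {
            if now < p.deadline {
                return true; // still waiting for the ack
            }
            if p.attempts >= max_attempts {
                actions.push(Action::Drop(id));
                false
            } else {
                p.attempts += 1;
                p.deadline = now + timeout;
                actions.push(Action::Redeliver(id));
                true
            }
        });
        actions
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

Because a message can be redelivered after a consumer already finished its work (the ack was merely late or lost), consumers in such a scheme need to tolerate duplicates.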
The only PIA is that I have to coordinate the ack in my code (maybe the library can send back an ack to A automatically when the handler function returns

Another approach is to implement some kind of persistence (Coerce has this: a persistent mailbox) and restart the whole supervision tree if there is an error (maybe not this brutal, but something like it).

I imagine these two problems are fairly common in systems using asynchronous message passing, but maybe they are too much to implement in a library and are best left to the application layer. Including them as a design pattern in the book would be well received by the community, I suppose.

Anyhow, the reason I opened this issue is just to look for inspiration, so feel free to ignore it :) I sincerely wish this project success and some serious adoption. I miss the Erlang approach of doing things, but the lack of static typing is a pain point and the ecosystem is not that good.

PS. Your approach of defining the routes in a topology.rs is really interesting. The libraries I have seen all use the send-to-an-address approach, I think. It's very much like defining a static routing table, which makes me wonder whether you have a background in networking. It also solves the problem of "having no way of knowing the path the message took to get here, or no idea where it originated" (when the application gets complicated).
@qwfy Thanks for your thoughts! You mentioned common and essential issues in message-passing systems. In the spirit of "just brainstorming" (so feel free to ignore it), I would like to share my opinion and the principles I use to design such systems, how elfo helps now, and what should be added to improve the situation further.

Firstly, I'm convinced that reliable systems shouldn't use regular messaging at all, except in some rare cases. Instead, any communication should use either request-response or subscriptions.

Requests

Everything is evident with requests. The most popular cases are fetching historical data or asking the data owner to change it. Elfo provides first-class support for requests and guarantees that either the request will be successfully delivered and handled or an error will be returned. Of course, the responder actor can freeze (e.g., in an infinite loop), but that's another topic.

Note that repeating a failed request gives at-least-once semantics, because a failure can happen after the responder replies but before the requester receives the response (e.g., the network connection fails before delivering the response).

However, elfo doesn't support "non-waiting" requests directly, so the actor should call

Subscriptions

A subscription is a long-term virtual connection between actors. Both peers should know when this connection is broken so they can recover (e.g., renew the subscription).

Initially, we implemented subscriptions by making an opening request (e.g.,

Now, we have a dedicated (internal) library called "elfo-subs", which implements such virtual connections by using never-ending requests (so the response token is held until an actor dies or explicitly unsubscribes) and provides an API to make some sort of

However, never-ending requests feel like a hack, so we're going to introduce linking mechanics #113 (see a sketch there; that's not the final API but very close to it).
Note: "link" differs from the similar term in Erlang because links must be attached to some requests (to perform routing and to avoid some races). This is actually the most wanted feature for elfo v0.2.

Ideally, elfo should provide not only links for implementing subscriptions but also subscriptions themselves as a first-class feature. However, it's unclear now which API should be provided, so I will show subscriptions (as examples and descriptions in the actoromicon) as a pattern based on links and, later (v0.3), think about providing more high-level mechanics.

Types of subscriptions

I like to separate two types of subscriptions: log subs and snapshot subs.

Snapshot subs

Here, the subscriber is only interested in the newest state. Such subscriptions are implemented by sending a snapshot in the first response and then sending incremental updates:

#[message(ret = BalancesReport)]
struct SubscribeToBalances;
#[message]
struct BalancesReport {
    entities: Vec<Balance>,
    is_snapshot: bool,
}

This is the simplest type of subscription in terms of reliability. If the producer detects an error, it drops the subscription (because the subscriber will re-request it later). If the subscriber detects an error, it drops its state and tries to resubscribe (obviously with some backoff).

Log subs

Here the subscriber wants to observe all events, so the recovery process should retransmit events on failures. Usually, this pattern requires adding some

#[message(ret = TradesReport)] // consumer -> producer
struct SubscribeToTrades {
    sequence_no: SequenceNo,
}
#[message] // producer -> consumer
struct TradesReport {
    entities: Vec<Balance>,
    sequence_no: SequenceNo,
}
#[message] // consumer -> producer
struct TradesAck(SequenceNo);

We have both cases: one where we want to persist events on disk (where the destination is the database), and one where we want to store them in memory until some threshold is reached, at which point we decide this is an emergency and use another way to repair subscribers. For the persistent case, we have the "keepers" and "collectors" actor groups: keepers are located on essential nodes and implement the persistence logic and such a subscription, while collectors are located on different nodes and insert events into the database.

Routing

The topology's main purpose is to route requests (one-shot requests and subscription requests). We don't know the exact address (it can be some local actor, a remote one, or both in some cases), so we perform inter-group routing. Because routing happens often for regular requests (resubscriptions are rarer), it should be fast, so elfo has first-class support for stateless routers.

So, routing allows loosely coupled groups to be implemented. Another solution is, obviously, actor locators, but I think that's a less flexible and more error-prone approach. However, this comment is already too long to explain my points. Anyway, loose coupling is essential to recover after failures by locating (or starting) a new producer.

The current implementation is fine for local nodes, but if groups are located on different nodes, it's not, because in this case providing "NodeNo" is required (or
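To make the log-subscription messages above a bit more concrete, here is a rough sketch of the in-memory buffer a producer might keep behind them. Everything in it is an assumption for illustration: `EventLog` and its methods are hypothetical names, `SequenceNo` is taken to be a plain `u64`, there is a single consumer, and none of this is elfo or elfo-subs API.

```rust
use std::collections::VecDeque;

/// Assumption: sequence numbers are plain integers starting at 0.
type SequenceNo = u64;

/// Producer-side buffer of events that the consumer has not acked yet.
pub struct EventLog<T> {
    /// Sequence number of the oldest retained event.
    first_seq: SequenceNo,
    events: VecDeque<T>,
}

impl<T: Clone> EventLog<T> {
    pub fn new() -> Self {
        Self { first_seq: 0, events: VecDeque::new() }
    }

    /// Appends an event and returns the sequence number assigned to it.
    pub fn push(&mut self, event: T) -> SequenceNo {
        let seq = self.first_seq + self.events.len() as SequenceNo;
        self.events.push_back(event);
        seq
    }

    /// Handles a (re)subscription starting at `from`: returns all retained
    /// events with a sequence number >= `from`, or `None` if some of them
    /// were already trimmed and the consumer must be repaired another way.
    pub fn replay_from(&self, from: SequenceNo) -> Option<Vec<(SequenceNo, T)>> {
        if from < self.first_seq {
            return None;
        }
        let skip = (from - self.first_seq) as usize;
        Some(
            self.events
                .iter()
                .skip(skip)
                .cloned()
                .enumerate()
                .map(|(i, event)| (from + i as SequenceNo, event))
                .collect(),
        )
    }

    /// Handles an ack: every event with a sequence number <= `acked`
    /// no longer needs to be retransmitted and is dropped.
    pub fn ack(&mut self, acked: SequenceNo) {
        while self.first_seq <= acked && !self.events.is_empty() {
            self.events.pop_front();
            self.first_seq += 1;
        }
    }

    pub fn retained(&self) -> usize {
        self.events.len()
    }
}
```

On resubscription, the consumer would send the next sequence number it expects; a `None` result corresponds to the "emergency" case where the retained window no longer covers what the consumer missed.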