
Moving queue size and making node flume queue bigger #724

Open · wants to merge 2 commits into base: main
Conversation

haixuanTao
Collaborator

Moving the queue size into the node fixes long-latency issues for Python nodes and makes it possible to set the right queue_size.

I also made the flume queue within nodes bigger to make this possible.

This seems to fix some of the graceful-stop issues in Python, but further investigation is required.

@haixuanTao
Collaborator Author

After double-checking, it seems that we're back to the problem that there is no deadlock only if event["value"] within Python and all its references get deleted first.

Meaning that the current DelayedCleanup solution within Rust does not work.

I think we should replace our current DelayedCleanup solution with a Python-based solution.

I have opened #726 to temporarily ignore this issue.

I have double-checked that this does not actually create a memory leak, as the DropTokens are reported back to the origin nodes, which clean up the shared memory.

        }
        let event = loop {
            if !self.queue.is_empty() {
                let event = self.queue.remove(0);
Collaborator

This call shifts all remaining items in the queue one slot to the left, so it is O(n). It would be better to make self.queue a VecDeque instead of a Vec and use the more efficient push_back/pop_front operations.
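To illustrate the suggestion, a minimal sketch with hypothetical function names: the same front-removal expressed with a VecDeque, where pop_front is O(1) instead of the O(n) element shift done by Vec::remove(0).

```rust
use std::collections::VecDeque;

// Appends at the back, matching the original Vec's push behavior.
fn push_newest(queue: &mut VecDeque<u32>, event: u32) {
    queue.push_back(event);
}

// Removes and returns the oldest (front) element in O(1),
// without shifting the remaining elements.
fn pop_oldest(queue: &mut VecDeque<u32>) -> Option<u32> {
    queue.pop_front()
}
```

A Vec can be converted in place with `VecDeque::from(vec)`, so call sites that only push and pop need minimal changes.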

                    break;
                }
            } else {
                match select(Delay::new(Duration::from_nanos(1000)), self.receiver.next()).await {
Collaborator

We could use the flume::Receiver::try_recv function here; then we wouldn't need the timeout dance.
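For illustration, here is the try_recv pattern sketched with the standard library's analogous API (std::sync::mpsc; flume's try_recv has the same Ok/Empty/Disconnected shape). This is a hypothetical sketch, not the PR's actual code:

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

// Poll the channel without any timeout: either an event is already
// queued, or we return immediately with None.
fn poll_event(rx: &Receiver<u32>) -> Option<u32> {
    match rx.try_recv() {
        Ok(event) => Some(event),                // an event was already queued
        Err(TryRecvError::Empty) => None,        // nothing pending, return at once
        Err(TryRecvError::Disconnected) => None, // sender dropped, treat as no event
    }
}
```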

Collaborator Author

@haixuanTao Dec 3, 2024

We're using self.receiver, which is a RecvStream; there is no such function on it. Replacing it with a plain receiver would require significant refactoring.

Collaborator

I can prepare a PR against this PR if you like.

Comment on lines +167 to +216
match &event {
    EventItem::TimeoutError(_) => break Some(event),
    EventItem::FatalError(_) => break Some(event),
    EventItem::NodeEvent {
        event: node_event,
        ack_channel: _,
    } => match node_event {
        NodeEvent::Stop => break Some(event),
        NodeEvent::Reload { operator_id: _ } => break Some(event),
        NodeEvent::InputClosed { id: _ } => break Some(event),
        NodeEvent::AllInputsClosed => break Some(event),
        NodeEvent::Input {
            id: current_event_id,
            metadata: _,
            data: _,
        } => {
            if self
                .queue
                .iter()
                .filter(|queue_event| {
                    if let EventItem::NodeEvent {
                        event:
                            NodeEvent::Input {
                                id,
                                data: _,
                                metadata: _,
                            },
                        ack_channel: _,
                    } = queue_event
                    {
                        id == current_event_id
                    } else {
                        false
                    }
                })
                .count()
                > self
                    .input_config
                    .get(current_event_id)
                    .map(|input| input.queue_size.unwrap_or(1))
                    .unwrap_or(1)
                    - 1
            {
                continue;
            } else {
                break Some(event);
            }
        }
    },
}
Collaborator

Could we reuse the code from the drop_oldest_inputs function from binaries/daemon/src/node_communication/mod.rs here? This function has the advantage that we do the cleanup for all input IDs in a single traversal of the list. For readability I would prefer to keep the cleanup code in a separate function.

We should also do the cleanup directly after receiving all the events. Then we only need to traverse the queue once per batch of events, instead of once per event.

In addition to reducing the number of queue traversals, this approach also has the advantage that we drop events earlier, thereby freeing memory and allowing the sender to cleanup/reuse the shared memory.
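The single-traversal cleanup suggested above can be sketched as follows. This is a simplified, hypothetical version (generic over an input-id type `I`, with a `limit` closure standing in for the per-input queue_size lookup), not the actual drop_oldest_inputs from binaries/daemon/src/node_communication/mod.rs:

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

// Keep at most `limit(id)` of the newest events per input id, dropping
// the oldest excess events, in a single traversal of the queue.
fn drop_oldest<I: Eq + Hash + Clone>(queue: &mut VecDeque<I>, limit: impl Fn(&I) -> usize) {
    let mut seen: HashMap<I, usize> = HashMap::new();
    // Walk from newest to oldest, marking which entries survive.
    let keep: Vec<bool> = queue
        .iter()
        .rev()
        .map(|id| {
            let count = seen.entry(id.clone()).or_insert(0);
            *count += 1;
            *count <= limit(id)
        })
        .collect();
    // `keep` was built in reverse order; re-reverse it while filtering.
    let mut keep_iter = keep.into_iter().rev();
    queue.retain(|_| keep_iter.next().unwrap());
}
```

Because all input ids are counted in the same pass, one call handles the whole batch, and excess events are dropped as early as possible so their shared memory can be reclaimed.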

Collaborator Author

@haixuanTao Dec 3, 2024

I understand. I have rewritten drop_oldest_inputs with a scheduler in #728 so that we go through the queue only once and create fairness between inputs.

This new scheduler should free memory as soon as possible.

2 participants