Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[orchagent] implement ring buffer feature with a flag #3242

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

a114j0y
Copy link

@a114j0y a114j0y commented Jul 22, 2024

What I did

  • fix the covariant return type issue of swss::TableBase* Consumer::getConsumerTable() const override
    • it should return swss::ConsumerTableBase *
  • add a ring thread for orchdaemon
  • support ring buffer feature

Why I did it

  • increase the speed for APP_ROUTE_TABLE consumers doing tasks

How I verified it
measure the performance with PerformanceTimer GitHub issue/pull request detail

@siqbal1986
Copy link
Contributor

can you please add sone swss tests for this functionality. There are no tests mentioned in the PR. if you have created a separate PR , Please link it here.

@siqbal1986 siqbal1986 self-requested a review August 23, 2024 18:23
Copy link
Contributor

@siqbal1986 siqbal1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is hte need of rign buffer. Can you please add some detail regarding its need.

@a114j0y a114j0y force-pushed the master branch 2 times, most recently from 80e7a90 to a6b3b0c Compare October 2, 2024 01:17
@a114j0y a114j0y force-pushed the master branch 6 times, most recently from 2ccb3dc to 12b402a Compare October 23, 2024 20:56
@a114j0y
Copy link
Author

a114j0y commented Oct 30, 2024

rebased and conflict resolved @siqbal1986 could you help re-approve it? : )

@a114j0y a114j0y force-pushed the master branch 22 times, most recently from 553594b to 370fc1c Compare November 8, 2024 01:25

while (!ring_thread_exited)
{
gRingBuffer->pause_thread();
Copy link
Contributor

@liuh-80 liuh-80 Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RingBufffer is a singleton share by 2 threads, however there is no multithread protect.
So when RingBufffer instance delete by main thread, the gRingBuffer here will become a dangling pointer.
Which will cause segment fault, the test case failed with code 139 is because segment fault issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the testcase is single-threaded

and in the orchdaemon destructor, we set ring_thread_exited as true to avoid delete gRingBuffer before detach the ring thread

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case may be failed by other reason.

The variable name ring_thread_exited suggests that the ring thread has exited, but this is not actually the case. When ring_thread_exited is true, the while loop might still be executing, and you cannot predict the ring thread or the main thread exist first.

These two threads are scheduled by the system. The system might suspend the ring thread for a few milliseconds at any point in the while loop. During that time, the main thread might exit and destruct the ring buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another possible reason for the UT issue:

The dtor of orchdaemon will access gRingBuffer, also dtor of RingBuffer class will delete the instance. which dtor will invoke first? if RingBuffer class destroy first, then gRingbuffer also will become a dangling pointer.

My suggestion is RingBuffer should not be a singleton and release in the dtor of RingBuffer class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure,

  1. a try catch block is added for the while loop in ring thread, to cover the case ring buffer is released before ring thread is detached.
  2. ring buffer is released by its own dtor
    in orchdaemon dtor , the local variable gRingBuffer is only used in this line to indicate whether we has enabled gRingMode
if (gRingBuffer) {
        ring_thread_exited = true;
        ring_thread.detach();}

if it's not nullptr, it indicates we enabled gRingMode, so need to detach the thread and set ring_thread_exited as true, nothing else, so should be good here

Copy link
Contributor

@liuh-80 liuh-80 Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When access a dangling pointer on linux system, it will trigger a SIGSEGV signal.
And as I know a signal is not a C++ exception. so, try-catch will not work. unless there are some signal handler convert the signal to c++ exception.

Also handle a c++ exception in a thread may have resource release issue.

My suggestion is:

  1. continue use singleton, but in while loop, always use get() method to access singleton. and get() method need improve to handle multithread, also while loop need handle get() return null pointer case (when singleton destructued). which is really difficult and low performance.

2.Don't use singleton, create a global instance and manage by OrchDaemon.

Copy link
Author

@a114j0y a114j0y Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

singleton has API Get() to access the global and unique ring buffer

gRingBuffer is just a variable that stores what Get() returns

we use it to compare with nullptr to check whether ring is enabled, we can always use OrchRing::Get() to access the unique ring buffer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen when dtor of ring buffer called before popRingBuffer() thread exit? popRingBuffer() thread will cause SIGSEGV and make orchagent crash.

This kind crash is very difficult to find root cause.

@a114j0y a114j0y force-pushed the master branch 3 times, most recently from 6ee296e to 3776ca8 Compare November 8, 2024 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants