-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[orchagent] implement ring buffer feature with a flag #3242
base: master
Are you sure you want to change the base?
Conversation
7f23081
to
8062ead
Compare
3d3cde6
to
ba7f1a0
Compare
can you please add sone swss tests for this functionality. There are no tests mentioned in the PR. if you have created a separate PR , Please link it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is hte need of rign buffer. Can you please add some detail regarding its need.
80e7a90
to
a6b3b0c
Compare
2ccb3dc
to
12b402a
Compare
rebased and conflict resolved @siqbal1986 could you help re-approve it? : ) |
553594b
to
370fc1c
Compare
orchagent/orchdaemon.cpp
Outdated
|
||
while (!ring_thread_exited) | ||
{ | ||
gRingBuffer->pause_thread(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RingBufffer is a singleton share by 2 threads, however there is no multithread protect.
So when RingBufffer instance delete by main thread, the gRingBuffer here will become a dangling pointer.
Which will cause segment fault, the test case failed with code 139 is because segment fault issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the testcase is single-threaded
and in the orchdaemon destructor, we set ring_thread_exited as true to avoid delete gRingBuffer before detach the ring thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case may be failed by other reason.
The variable name ring_thread_exited suggests that the ring thread has exited, but this is not actually the case. When ring_thread_exited is true, the while loop might still be executing, and you cannot predict the ring thread or the main thread exist first.
These two threads are scheduled by the system. The system might suspend the ring thread for a few milliseconds at any point in the while loop. During that time, the main thread might exit and destruct the ring buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another possible reason for the UT issue:
The dtor of orchdaemon will access gRingBuffer, also dtor of RingBuffer class will delete the instance. which dtor will invoke first? if RingBuffer class destroy first, then gRingbuffer also will become a dangling pointer.
My suggestion is RingBuffer should not be a singleton and release in the dtor of RingBuffer class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure,
- a try catch block is added for the while loop in ring thread, to cover the case ring buffer is released before ring thread is detached.
- ring buffer is released by its own dtor
in orchdaemon dtor , the local variablegRingBuffer
is only used in this line to indicate whether we has enabled gRingMode
if (gRingBuffer) {
ring_thread_exited = true;
ring_thread.detach();}
if it's not nullptr, it indicates we enabled gRingMode, so need to detach the thread and set ring_thread_exited as true, nothing else, so should be good here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When access a dangling pointer on linux system, it will trigger a SIGSEGV signal.
And as I know a signal is not a C++ exception. so, try-catch will not work. unless there are some signal handler convert the signal to c++ exception.
Also handle a c++ exception in a thread may have resource release issue.
My suggestion is:
- continue use singleton, but in while loop, always use get() method to access singleton. and get() method need improve to handle multithread, also while loop need handle get() return null pointer case (when singleton destructued). which is really difficult and low performance.
2.Don't use singleton, create a global instance and manage by OrchDaemon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
singleton has API Get() to access the global and unique ring buffer
gRingBuffer is just a variable that stores what Get() returns
we use it to compare with nullptr to check whether ring is enabled, we can always use OrchRing::Get() to access the unique ring buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen when dtor of ring buffer called before popRingBuffer() thread exit? popRingBuffer() thread will cause SIGSEGV and make orchagent crash.
This kind crash is very difficult to find root cause.
6ee296e
to
3776ca8
Compare
What I did
swss::TableBase* Consumer::getConsumerTable() const override
swss::ConsumerTableBase *
Why I did it
APP_ROUTE_TABLE
consumers doing tasksHow I verified it
measure the performance with PerformanceTimer