Skip to content

🌱 single connection for grpc options. #97

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

morvencao
Copy link
Member

Summary

This PR ensures one single connection for each grpc/mqtt option.

Related issue(s)

Fixes #

@openshift-ci openshift-ci bot requested review from deads2k and qiujian16 March 7, 2025 01:57
@morvencao
Copy link
Member Author

/assign @qiujian16 @skeeey

@@ -43,6 +43,10 @@ type KeepAliveOptions struct {

// Dial connects to the gRPC server and returns a gRPC client connection.
func (d *GRPCDialer) Dial() (*grpc.ClientConn, error) {
// Return the cached connection if it exists.
Copy link
Member

@qiujian16 qiujian16 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a lock here? and what happens when it is not in ready state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc.ClientConn is thread safe from the doc: https://github.com/grpc/grpc-go/blob/master/Documentation/concurrency.md#clients

If we return a connection which is not ready, the base client will not get a new connection in reconnect cases, see:
https://github.com/open-cluster-management-io/sdk-go/blob/main/pkg/cloudevents/generic/baseclient.go#L63

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some comments here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if the two goroutine are dialing at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood, the GRPCDialer needs a lock to protect the conn since multiple callers (cloudevents clients) may access it concurrently. Updated the code and comments.

Signed-off-by: morvencao <[email protected]>
@morvencao morvencao force-pushed the br_grpc_conn branch 3 times, most recently from 367c920 to 0c3e76f Compare March 7, 2025 09:31
// Return the cached connection if it exists and is ready.
// Should not return a nil or unready connection, otherwise the caller (cloudevents client)
// will not use the connection for sending and receiving events in reconnect scenarios.
d.mu.RLock()
Copy link
Member

@skeeey skeeey Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only lock once at the beginning to avoid building the connection multiple times among the goroutines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@skeeey
Copy link
Member

skeeey commented Mar 10, 2025

LGTM

@qiujian16
Copy link
Member

could add an integration test that starts multiple thread and dial at the same time?

@morvencao
Copy link
Member Author

morvencao commented Mar 10, 2025

@qiujian16 added an integration test:

  1. two work agents start concurrently with the same gRPC options to ensure both can subscribe to the resource spec.
  2. then, one work agent is closed, causing its underlying gRPC connection to close.
  3. the second work agent can still work, this ensure it automatically reconnects by recreating the gRPC connection.

Signed-off-by: morvencao <[email protected]>
@morvencao morvencao changed the title 🌱 single connection for grpc & mqtt options. 🌱 single connection for grpc options. Mar 10, 2025
@qiujian16
Copy link
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 10, 2025
Copy link

openshift-ci bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: morvencao, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit b3879d7 into open-cluster-management-io:main Mar 10, 2025
11 checks passed
@morvencao morvencao deleted the br_grpc_conn branch March 10, 2025 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants