
Conversation

@kingluo (Contributor) commented Jun 26, 2024


close #2117

The leak only occurs when si_wq is full and we continue to process the current skb (which may contain the remaining SSL record), sk->sk_receive_queue, and any skbs that arrive later. The leak is never triggered if we reset the connection and stop processing data immediately. Such a fix is reasonable even apart from the leak, since si_wq is unlikely to become full without flooding.

@kingluo requested review from const-t and krizhanovsky, Jun 26, 2024 10:46
@const-t (Contributor) commented Jul 5, 2024

In general I don't have any corrections; however, I have a few suggestions:

  1. I suggest doing a connection reset on a failed tfw_connection_send() for each protocol, not only for HTTP/2.
  2. Let's replace the DBG message with a warning when pushing to si_wq fails. It's an important event, since we disconnect the client, so it would be good to know about it. It may also be worth adding a statistics counter for this event.
  3. Maybe we should find the exact place where we leak on si_wq overflow, just so we know; perhaps it can be reproduced in another way that is not known at this moment.

@krizhanovsky Please see this comment, we need to know your opinion.

@krizhanovsky modified the milestone: 0.9 - LA, Jul 22, 2024
@kingluo (Contributor, Author) commented Jul 24, 2024

  3. Maybe we should find the exact place where we leak on si_wq overflow, just so we know; perhaps it can be reproduced in another way that is not known at this moment.

It's not so easy to do this. Maybe we should keep that issue open for later investigation.

@const-t (Contributor) left a comment

LGTM. Only one question, maybe we should do the same thing for websockets as well?

@kingluo (Contributor, Author) commented Aug 5, 2024

LGTM. Only one question, maybe we should do the same thing for websockets as well?

Yes, other places should be fixed, too. I'll try to cover them later.

@krizhanovsky (Contributor) left a comment

I have a lot of questions about this PR. Also from #2117 :

The root cause of the tls error is that si_wq (which has a default budget of 10, but even 1,000,000 is not enough to flood a single connection) cannot tolerate the high rate of ping acks being sent and returns -EBUSY

Why? si_wq is supposed to be a very fast lock-free RB and we wake up a target processor on insertions. What's the reason for the slowness on the read (processing) side? There is something fundamentally broken if a Python script can flood the lock-free in-kernel network processing.

Probably it's OK to reset TCP connections that we can't handle, but we should not involve security event handling for this.

Having #1940 (comment) in mind, I'd propose to postpone the fix until #1940
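The queue semantics at the center of this debate can be sketched as follows. This is a toy userspace model, not Tempesta's actual tfw_wq implementation: a bounded queue whose producer fails fast with -EBUSY instead of blocking when the budgeted consumer has not kept up. The names `wq_push` and `wq_pop_budget` and the capacity are illustrative assumptions.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define WQ_SIZE 4  /* deliberately tiny capacity so overflow is easy to show */

struct wq {
	_Atomic size_t head, tail;
	int items[WQ_SIZE];
};

/* Producer side: fail with -EBUSY instead of blocking when the ring is full. */
static int wq_push(struct wq *q, int item)
{
	size_t h = atomic_load(&q->head);
	size_t t = atomic_load(&q->tail);

	if (h - t >= WQ_SIZE)
		return -EBUSY;
	q->items[h % WQ_SIZE] = item;
	atomic_store(&q->head, h + 1);
	return 0;
}

/* Consumer side: drain at most `budget` items per run, like a softirq budget. */
static int wq_pop_budget(struct wq *q, int budget)
{
	int n = 0;

	while (n < budget) {
		size_t t = atomic_load(&q->tail);

		if (t == atomic_load(&q->head))
			break;
		atomic_store(&q->tail, t + 1);
		n++;
	}
	return n;
}
```

The point of contention is visible directly: however fast the push path is, a consumer limited by a per-run budget will eventually return -EBUSY to the producer if insertions outpace the drain rate.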

tls_state_to_str(tls->state), r,
r == -EBADMSG ? "(bad ciphertext)" : "");
return r;
return T_BLOCK_WITH_RST;
Contributor:

It seems we're going to block the client, not just reset its connection. As noted in lib/log.h, this return code is for a security event, not for OOM, which might have different causes.

@kingluo (Contributor, Author) commented Aug 15, 2024

That's purely misleading naming: all I want is to send an RST, but we only have the T_BLOCK_WITH_RST constant. Maybe we should introduce a dedicated constant for a plain RST.

Contributor:

Yeah, if we need to reset connections, then we do need a designated RST constant and appropriate workflow handling of the return codes.
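The split being proposed might look like this. The enum values and the `handle_ret` dispatcher are hypothetical, a sketch of the idea rather than Tempesta's actual API; only the name T_BLOCK_WITH_RST comes from the discussion above.

```c
/* Hypothetical return-code split; T_RST is the proposed new constant. */
enum tfw_ret {
	T_OK = 0,
	T_RST,             /* proposed: plain TCP reset, no security event */
	T_BLOCK_WITH_RST,  /* existing: block the client AND reset */
};

enum action {
	ACT_NONE  = 0,
	ACT_RST   = 1 << 0,  /* send TCP RST */
	ACT_BLOCK = 1 << 1,  /* record a security event / block the client */
};

/* Map a return code to the actions the caller should take. */
static int handle_ret(enum tfw_ret r)
{
	switch (r) {
	case T_RST:
		return ACT_RST;
	case T_BLOCK_WITH_RST:
		return ACT_RST | ACT_BLOCK;
	default:
		return ACT_NONE;
	}
}
```

With such a split, an OOM-style overflow path could return T_RST and reset the connection without also raising a security event against the client.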

}

if (unlikely(SS_CONN_TYPE(sk) & Conn_Reset))
return;
Contributor:

Introducing a Conn_Reset state that is used in only one function seems like a workaround. I'd prefer a clearer solution. It seems this state is only needed to avoid executing skb processing while the socket is queued for closing; don't we already have enough information about the socket state at that moment?

@kingluo (Contributor, Author) commented Aug 15, 2024

It is not a workaround but a bugfix: in case of errors, we should exit the call chain and stop handling the involved TLS records, the unrolled skb, sk->sk_receive_queue, and future sk_data_ready callbacks from the kernel (that's why we need to reset, not close, the socket). Otherwise it's undefined behavior. And yes, we didn't cover the RST case (as opposed to close) in that function; that's exactly why I made the changes here.
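The guard being argued for amounts to this shape of early return. A minimal sketch, assuming a hypothetical `struct conn` with a flags field standing in for `SS_CONN_TYPE(sk) & Conn_Reset`; none of these names are the actual Tempesta types.

```c
#include <stdbool.h>

enum { CONN_RESET = 1 << 0 };  /* stand-in for the Conn_Reset bit */

struct conn {
	unsigned int flags;
	int processed;  /* count of skbs handled, for illustration */
};

/*
 * Receive path: once the connection is marked for reset, every queued
 * and future skb is dropped unprocessed, terminating the call chain.
 */
static bool conn_rx(struct conn *c, int nskbs)
{
	if (c->flags & CONN_RESET)
		return false;  /* exit immediately; nothing else is touched */
	c->processed += nskbs;
	return true;
}
```

The behavior under discussion: after the error path sets the flag, subsequent sk_data_ready-style invocations become no-ops instead of continuing to process (and potentially leak) buffered records.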

Contributor:

The new socket state used only in one function looks awkward.

Can we handle this with a real socket state? Maybe we can even avoid calling sk_data_ready by setting some socket state and/or doing a partial close/reset?

Co-authored-by: Alexander Krizhanovsky <[email protected]>
@kingluo (Contributor, Author) commented Aug 15, 2024

Why? si_wq is supposed to be a very fast lock-free RB and we wake up a target processor on insertions. What's the reason for the slowness on the read (processing) side? There is something fundamentally broken if a Python script can flood the lock-free in-kernel network processing.

The bottleneck is not the locking but the efficiency of work processing in the NET_TX softirq. Sending is clearly much slower than enqueuing, perhaps due to the TLS encryption. Another suspicious point: when we handle sending for the same socket on another CPU, we will most likely block on the sk lock, because the receiving softirq is busy handling many skbs on that sk and producing many ping ACKs, which finally overflows the queue.
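The rate imbalance described above can be modeled in a few lines. This is a toy simulation under assumed numbers, not measurements from Tempesta: a producer enqueues `prod_rate` items per tick while a budgeted consumer drains at most `drain_budget`, and we ask when the queue depth exceeds its capacity.

```c
/*
 * Returns the tick at which queue depth first exceeds `cap` (the point
 * where a push would get -EBUSY), or -1 if the consumer keeps up.
 */
static int ticks_until_overflow(int cap, int prod_rate, int drain_budget)
{
	int depth = 0, tick = 0;

	if (prod_rate <= drain_budget)
		return -1;  /* drain rate matches production: no overflow */

	for (;;) {
		tick++;
		depth += prod_rate;  /* RX softirq enqueues a burst of ACKs */
		if (depth > cap)
			return tick;  /* queue full: push would fail */
		/* budgeted TX side drains at most drain_budget per tick */
		depth -= depth < drain_budget ? depth : drain_budget;
	}
}
```

Whatever the queue capacity, any sustained gap between the enqueue rate and the per-tick drain budget makes overflow a question of when, not if, which matches the observation that even a large si_wq would eventually return -EBUSY under a ping flood.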

@krizhanovsky (Contributor):
Closed in favor of #2257



Development

Successfully merging this pull request may close these issues: Tls errors under ping flood

4 participants