Test connection multiplexing across multiple servers and clients #14
Comments
Thoughts @CMCDragonkai?
Yes, you need to be able to keep track of clients too. You have to track all clients and servers of a given socket...
Lifecycle rules too. We can't let a socket be destroyed while clients that use it still exist.
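A minimal sketch of that lifecycle rule, using hypothetical `Socket` bookkeeping in Rust rather than the project's actual classes (the types and method names here are illustrative only):

```rust
use std::collections::HashSet;

// Hypothetical bookkeeping: the shared socket tracks every client and server
// registered against it and refuses to be destroyed while any remain.
struct Socket {
    clients: HashSet<u64>,
    servers: HashSet<u64>,
}

#[derive(Debug)]
struct SocketInUse;

impl Socket {
    fn register_client(&mut self, id: u64) {
        self.clients.insert(id);
    }

    fn deregister_client(&mut self, id: u64) {
        self.clients.remove(&id);
    }

    fn destroy(&mut self) -> Result<(), SocketInUse> {
        // Lifecycle rule: the socket cannot be destroyed while clients or
        // servers that use it still exist.
        if !self.clients.is_empty() || !self.servers.is_empty() {
            return Err(SocketInUse);
        }
        // ... close the underlying UDP socket here ...
        Ok(())
    }
}
```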
So far, for any tests at the connection level where I need to test a connection to a server, I've been putting them in the
You can separate it with
I have a good basis now for spawning and running servers, clients and streams concurrently with random delays. * Related #14 [ci skip]
Destroying a connection might hang if no streams were created. It might be test-specific; I'll need to dig into this.
Found a bug. In certain scenarios, client streams are not fully cleaning up when they are meant to end. This is similar to the previous bug I fixed where streams are not listed as readable or writable after they have finished, which means we are not checking them after the last state change. Previously this was fixed by adding streams to a short list to be checked when destroying. The problem here, however, is that we're not entering the destroying state; we're waiting for the readable side to end during normal operation. So the previous fix to the logic doesn't apply here. Given this, I think we need to check all active streams indiscriminately. This means an
Notably, the examples don't maintain state for the streams, but they do for the connections. Every time the socket receives a packet, it handles the packet and processes any events on the streams, both readable and writable. After processing all incoming packets, it processes all clients for outgoing packets, then checks all clients again to see if they have closed so it can clean them up. I think that means the only solution here is to check every active stream for a state change after we process a recv for that connection, but we still need to check the readable list for new streams. The available methods for processing events are fine for the example since it processes reads and writes statelessly; if a stream has ended, it simply isn't processed as part of the readable or writable list. Ultimately the solution here is to process the readable list so we can create state for new streams, then iterate over all the existing streams, checking whether they're writable and processing any state changes. There is an unknown here: if a stream has ended with a shutdown before any data was sent to create remote state, is that stream considered readable? Would state for it be created? Do we care if it isn't? Anyway, changes to be made include...
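For reference, a minimal sketch of the per-connection processing described above, written against the quiche Rust API rather than this project's wrapper; the `streams` map standing in for per-stream state is an assumption:

```rust
use std::collections::HashMap;

// Sketch: after a connection has processed incoming packets, walk the stream
// events. `streams` is an assumed per-connection map of stream state.
fn process_streams(conn: &mut quiche::Connection, streams: &mut HashMap<u64, Vec<u8>>) {
    let mut buf = [0u8; 65535];

    // The readable list reports new streams and streams with pending data,
    // so this is where state for new streams gets created.
    for stream_id in conn.readable() {
        streams.entry(stream_id).or_default();
        while let Ok((len, _fin)) = conn.stream_recv(stream_id, &mut buf) {
            streams
                .get_mut(&stream_id)
                .expect("stream state was just created")
                .extend_from_slice(&buf[..len]);
        }
    }

    // Check every tracked stream for a state change, not just the readable
    // list, so streams that finished without delivering new data still get
    // cleaned up.
    let finished: Vec<u64> = streams
        .keys()
        .copied()
        .filter(|&id| conn.stream_finished(id))
        .collect();
    for id in finished {
        streams.remove(&id);
    }
}
```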
I'm just confused now. I've made changes so that after every receive and send event we check the state of every existing stream, so we shouldn't be missing any state changes now. I can see one side sending the fin frame in all cases, but sometimes, for no reason I can tell, the receiving side never sees this fin frame. Whether this happens at all can be very random. Assuming all packets are being properly sent and processed, I have no idea why this is still happening.
I'm going to take a step back from this for a little bit and focus on some of the other issues. Some solutions may come while I mull it over.
This sounds pretty similar to my problem.
Posted an upstream issue: cloudflare/quiche#1507
This will allow easy cleanup after tests to ensure the process isn't held open. Everything else can be handled via garbage collection. I removed the register client method; it's not needed since we check against the connection map before stopping. * Related #14 [ci skip]
Things to try:
Here are some ideas on how to debug this. We need to use
We should do this after calling . This just confirms to us, even if we get an exception, whether the stream is really finished. The next thing is to do a more controlled test. Set up your 2 servers with many clients; maybe reduce this to lots of clients against 1 server. Create a bunch of streams, then sequentially start closing the streams. On each step, check whether the stream is finished on the other side. Then also consider doing it in batches: 2 by 2, 3 by 3, 50 by 50, and so on. We need to ensure that it's not our code that's causing the problem.
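The elided method names above aren't recoverable from this thread, but at the quiche level a plausible way to confirm stream completion on both sides is `stream_finished`; a rough sketch, assuming the connections are already established and driven by the usual send/recv loop:

```rust
// Rough sketch: after closing a stream from the client side, check on both
// sides whether quiche itself considers the stream finished, independent of
// any exception thrown by our wrapper code.
fn check_stream_closed(
    client: &mut quiche::Connection,
    server: &quiche::Connection,
    stream_id: u64,
) -> Result<(), quiche::Error> {
    // Close the writable side from the client with an empty fin write.
    client.stream_send(stream_id, b"", true)?;

    // ... exchange packets between client and server here ...

    println!("client finished: {}", client.stream_finished(stream_id));
    println!("server finished: {}", server.stream_finished(stream_id));
    Ok(())
}
```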
* 1 client with a lot of streams to 1 server.
* Many clients, each with 1 stream, to the 1 server.
* Many clients with multiple streams to 1 server.
Choose only 1 side to close the stream; just close it on the client side. Reduce the amount of randomness. @tegefaulkes, your current theory is that when the client closes the stream, instead of the server receiving a . If closing on the client side doesn't reproduce this problem, then do it on the server side. Scenarios to prove:
Found a new oddity: when the problem happens, we're not receiving ALL of the messages, and only when the problem happens. Running tests with low congestion does not have this problem. This is VERY odd; a quiche stream is meant to maintain data integrity. So is the problem happening between the quiche stream and our ? I'll dig into this further and post my other observations in a moment.
Make sure you're testing with 64 bytes and 64 bytes. There's no equivalence between the number of stream writes and stream reads. Use buffer comparison instead.
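A minimal sketch of that buffer comparison at the quiche level (helper names are illustrative, not the project's API):

```rust
// Sketch: QUIC gives no one-to-one mapping between stream writes and stream
// reads, so compare the concatenated bytes rather than counting calls.
fn drain_stream(conn: &mut quiche::Connection, stream_id: u64) -> Vec<u8> {
    let mut received = Vec::new();
    let mut buf = [0u8; 4096];
    while let Ok((len, _fin)) = conn.stream_recv(stream_id, &mut buf) {
        received.extend_from_slice(&buf[..len]);
    }
    received
}

fn assert_stream_integrity(sent: &[u8], received: &[u8]) {
    // Reads can arrive in different-sized chunks than the writes, but the
    // total byte sequence must be identical.
    assert_eq!(sent, received, "stream data was corrupted or truncated");
}
```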
Notes:
The missing bytes are always at the end of the messages sent. I was expecting the message with the fin frame to be empty, but during congestion this is not the case: the final packet can contain part of the message. This is a simple fix; I just wasn't writing the message that carried the fin frame to the stream. That may be part of the problem, but it doesn't explain the streams not cleaning up. I'm still digging.
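In quiche terms the fix amounts to not discarding the payload of the read that reports `fin`; a rough sketch, with a hypothetical `on_data` callback standing in for the wrapper's stream plumbing:

```rust
// Sketch: the read that reports fin == true can still carry the tail of the
// message, so its payload must be forwarded like any other chunk.
fn read_stream(
    conn: &mut quiche::Connection,
    stream_id: u64,
    on_data: &mut dyn FnMut(&[u8]),
) -> Result<bool, quiche::Error> {
    let mut buf = [0u8; 4096];
    loop {
        match conn.stream_recv(stream_id, &mut buf) {
            Ok((len, fin)) => {
                // Forward the bytes even when this is the final read.
                on_data(&buf[..len]);
                if fin {
                    return Ok(true);
                }
            }
            Err(quiche::Error::Done) => return Ok(false),
            Err(e) => return Err(e),
        }
    }
}
```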
OK, things are fixed up so that I can see all of the data is being sent and received. In one example with the stream closing problem, I can see the following.
So after fixing the missing data, we're back to the original problem.
If I send some data along with the fin frame, in this case . You mentioned cloudflare/quiche@cc98af0, where they patched the quiche code to allow fin frames with a 0-length buffer to be sent. It's possible they missed an edge case where they're not sending or processing a fin frame. I think I can verify this pretty easily if I can modify the . If including data within the fin frame does prevent the problem, then that gives us a stop-gap solution where we provide a finish message and remove it on the receiving side.
I'm reasonably sure that adding data to the finish message is a stop-gap fix for this. That means I can move forward with fixing the tests and resolving this issue. There is still an underlying problem that needs to be investigated, a reproducible example to be found, and an issue to be posted upstream. For the sake of bookkeeping, I'll complete this issue and create a new issue for tracking and working on the remaining problem.
Did you post upstream?
Also, use a single null byte if you need to do that.
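A sketch of what that stop-gap could look like at the quiche level, assuming a single null byte is appended with the fin and stripped again on the receiving side (helper names are illustrative):

```rust
// Sketch: instead of an empty fin write, send one null byte with fin set,
// and strip that trailing byte again on the receiving side.
const FIN_SENTINEL: &[u8] = &[0u8];

fn send_fin(conn: &mut quiche::Connection, stream_id: u64) -> Result<(), quiche::Error> {
    // Work around the suspected 0-length fin frame issue by carrying one byte.
    conn.stream_send(stream_id, FIN_SENTINEL, true)?;
    Ok(())
}

fn strip_fin_sentinel(message: &mut Vec<u8>, fin: bool) {
    // Only the final chunk carries the sentinel byte.
    if fin && message.last() == Some(&0u8) {
        message.pop();
    }
}
```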
Give me a reproducible Rust example to post upstream before closing this issue. We can use the temporary fix for now, but the upstream issue should be posted.
Using a stop-gap measure where we add and remove data on the finish frame message. Likely there is a bug in the quiche code where this packet is not being sent due to it being a 0-length message. It was fixed before in their code, but they may have missed an edge case where a large volume of data is being processed. * Related #14 [ci skip]
Almost ready to merge changes. Just need to clean up the commit history. After that I'll create a new issue for tracking the fin frame/stream clean-up problem and start on the reproducible Rust example and the upstream issue.
* Using a stop-gap measure where we add and remove data on the finish frame message.
* Likely there is a bug in the `quiche` code where this packet is not being sent due to it being a 0-length message. It was fixed before in their code, but they may have missed an edge case where a large volume of data is being processed.
* Fixed a race condition with the stream pulling data, throwing an error, and cleaning up after the stream completes. Somehow, despite the code having no awaits, the readable stream was getting pulled and errored out the stream before cleaning up after the finish frame.
* Added tests for concurrent servers and clients.
* Cleaned up logging for streams.
* Related #14 [ci skip]
New issue created at #20 for tracking the remaining problem.
Merged this into master now, resolving. I'll start on the remaining problem (#20) now.
Specification
We need to create tests that establish an arbitrary number of connections across an arbitrary number of servers and clients.
The ultimate goal is to test that connection multiplexing works and that connections function properly when run concurrently.
All the tests should send at least 1 stream of random data.
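A rough sketch of the shape such a test could take, shown here in Rust against a hypothetical `run_client` helper (the project's real tests are TypeScript; the topology, byte counts, and helper are assumptions):

```rust
use std::thread;

// Hypothetical driver for one client connection; in the real tests this would
// dial the server, open a stream, send `data`, and return the echoed bytes.
fn run_client(server_addr: &str, data: Vec<u8>) -> Vec<u8> {
    // Stubbed out here; the real version drives a quiche connection.
    let _ = server_addr;
    data
}

fn main() {
    let server_addr = "127.0.0.1:4433";
    // Spawn an arbitrary number of concurrent clients, each sending at least
    // one stream of data, and verify the data comes back intact. Real tests
    // would use randomly generated bytes instead of this fixed pattern.
    let handles: Vec<_> = (0..16)
        .map(|i| {
            let data: Vec<u8> = (0..1024).map(|j| ((i * 31 + j) % 256) as u8).collect();
            thread::spawn(move || {
                let received = run_client(server_addr, data.clone());
                assert_eq!(data, received, "client {i} stream data mismatch");
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```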
Additional context
* dgram module #1

Tasks