
Running update concurrently #39

Open
fcollonval opened this issue Nov 10, 2022 · 10 comments

Comments

@fcollonval (Collaborator) commented Nov 10, 2022

The publication of updates is done one task at a time:

for client in [c for c in room.clients if c != websocket]:
    # clients may have disconnected but not yet removed from the room
    # ignore them and continue forwarding to other clients
    try:
        await client.send(message)
    except Exception:
        pass
# update our internal state
update = await process_message(message, room.ydoc, websocket)
if room.ystore and update:
    await room.ystore.write(update)

We should run those updates concurrently with asyncio.gather so they go out as quickly as possible and we are not stuck behind a slow client.

Moreover, we should update the document with higher priority, because it is the reference that is pushed to any new client before it starts receiving deltas.
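
For illustration, here is a minimal sketch of one way to do that, reusing the names from the snippet above (room, websocket, message, process_message); it is an assumption about the shape of the fix, not the actual ypy-websocket code. The internal state is updated first, then the message is broadcast to the other clients concurrently:

import asyncio

async def handle_message(message, room, websocket):
    # higher priority: update the in-memory ydoc and the store first,
    # so that a newly connecting client gets an up-to-date reference
    update = await process_message(message, room.ydoc, websocket)
    if room.ystore and update:
        await room.ystore.write(update)

    async def send_safe(client):
        # a disconnected or slow client must not break the broadcast
        try:
            await client.send(message)
        except Exception:
            pass

    # broadcast concurrently: the sends run in parallel, so one slow
    # client no longer delays the others
    await asyncio.gather(*(send_safe(c) for c in room.clients if c != websocket))

asyncio.gather still waits for all sends to complete, but because they run in parallel and the state update happens first, a slow client can no longer starve the store update.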

I think that part of the code is partly responsible for a data loss case seen when 20 people were collaborating simultaneously. What happened is that some clients (we cannot know if all of them) were still receiving the deltas, but the file on disk (which is a regular dump of the in-memory ydoc) stopped being updated. And when a new client connected, the document at the latest dump version was the one pushed to it.

My best guess (unfortunately we did not see any error) is that one client was blocking and so the code after the client loop was never executed.

cc: @davidbrochart @hbcarlos

@davidbrochart (Collaborator)

These are very good points @fcollonval, thanks for pointing that out.
I'll open a PR for that. It also goes in the same direction as #38.
cc @ellisonbg

@SylvainCorlay (Collaborator) commented Nov 11, 2022

I would go even further than what @fcollonval is describing. I don't think we even need to gather at all between updating the peers and saving to the backend. We could do:

async for message in websocket:
    await self._process_queue.put(message)
    await self._broadcast_queue.put(message)

and then have two completely independent asyncio tasks:

  • One processing _process_queue.
    message = await self._process_queue.get()
    await process_message(message, room.ydoc, websocket)
  • One forwarding updates to the peers.

In doing so, we prevent one of the two tasks from stalling when the other one is (momentarily) choking on updates and building up a longer queue.
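
For concreteness, a rough sketch of what the two consumer tasks could look like (the method names and the sequential broadcast loop are illustrative, not the actual ypy-websocket API):

async def _process_messages(self):
    # consumes _process_queue: updates the in-memory ydoc and the store
    while True:
        message = await self._process_queue.get()
        update = await process_message(message, self.room.ydoc, self.websocket)
        if self.room.ystore and update:
            await self.room.ystore.write(update)

async def _broadcast_messages(self):
    # consumes _broadcast_queue: forwards each update to the other peers
    while True:
        message = await self._broadcast_queue.get()
        for client in [c for c in self.room.clients if c != self.websocket]:
            try:
                await client.send(message)
            except Exception:
                pass

# both started once as independent tasks, e.g.:
# asyncio.create_task(self._process_messages())
# asyncio.create_task(self._broadcast_messages())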

Another thing we could experiment with is to synchronously use put_nowait instead of put. If asyncio.QueueFull is raised for one of the two queues, it means that the queue is not being emptied quickly enough and the server is choking on Y updates. This may be a good way to detect a performance issue in either of those tasks.
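
A sketch of that variant (illustrative only; the function name and the maxsize value are assumptions, and put_nowait can only raise asyncio.QueueFull if the queue was created with a maxsize):

import asyncio
import logging

# the queues must be bounded for put_nowait to ever raise QueueFull
process_queue = asyncio.Queue(maxsize=1024)
broadcast_queue = asyncio.Queue(maxsize=1024)

async def forward_messages(websocket):
    async for message in websocket:
        try:
            process_queue.put_nowait(message)
            broadcast_queue.put_nowait(message)
        except asyncio.QueueFull:
            # one of the consumers is not emptying its queue quickly enough:
            # the server is choking on Y updates
            logging.warning("queue full: cannot keep up with Y updates")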

@afshin

@ellisonbg

Great discussion! I like the idea of @SylvainCorlay of having two async queues for processing and broadcasting the messages. Another concern that I have is the presence of blocking calls in the server or server extensions. Those would quickly slow everything down. We should probably spend some time profiling the server to understand where blocking calls are happening.

@davidbrochart (Collaborator)

Another concern that I have is the presence of blocking calls in the server or server extensions.

It could be interesting to compare with Jupyverse. In particular, the file ID service will be fully async, while jupyter-server-fileid is not async.

@davidbrochart (Collaborator)

Yet another alternative to @SylvainCorlay's proposal is to just create background tasks for processing messages, to update our internal state and to broadcast to clients. They would be like "fire and forget" tasks.
Indeed, the idea behind putting the updates in a queue before processing them seems to be that they have to be processed in order, but one of the characteristics of CRDTs is that they can be processed out of order.
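
A sketch of such a "fire and forget" approach (names are illustrative and not taken from #38; the task set only exists to keep strong references so the tasks are not garbage-collected before they complete):

import asyncio

background_tasks = set()

def fire_and_forget(coro):
    # keep a strong reference to the task so it is not garbage-collected early
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)

async def update_state(message, room, websocket):
    update = await process_message(message, room.ydoc, websocket)
    if room.ystore and update:
        await room.ystore.write(update)

async def send_safe(client, message):
    try:
        await client.send(message)
    except Exception:
        pass

async def on_message(message, room, websocket):
    # neither the state update nor the broadcast is awaited here,
    # relying on the fact that Y updates can be applied out of order
    fire_and_forget(update_state(message, room, websocket))
    for client in [c for c in room.clients if c != websocket]:
        fire_and_forget(send_safe(client, message))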

@davidbrochart (Collaborator)

I made these changes part of #38.

@davidbrochart (Collaborator)

In @SylvainCorlay's proposal, a task consumes the _broadcast_queue and forwards the updates to the peers, I suppose sequentially? If so, then we have the same potential issue of one slow client slowing down the whole broadcast. To solve that, we would need to forward the updates to the peers in individual tasks, which is what I did in #38.
Let me know if I'm missing something.
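
For comparison with the sequential consumer sketched earlier, a per-client variant could look roughly like this (again illustrative, not the actual #38 code; strong references to the send tasks are omitted for brevity):

import asyncio

async def _broadcast_messages(self):
    while True:
        message = await self._broadcast_queue.get()
        for client in [c for c in self.room.clients if c != self.websocket]:
            # one send task per client: a slow or broken peer only delays itself
            asyncio.create_task(self._send(client, message))

async def _send(self, client, message):
    try:
        await client.send(message)
    except Exception:
        pass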

@SylvainCorlay (Collaborator)

Would Websocket.send block the processing of the broadcast queue for a slow client?

@davidbrochart (Collaborator) commented Nov 15, 2022

ypy-websocket is agnostic to the WebSocket implementation.
In jupyverse we use websockets, where send is async, which suggests that it's not that fast, maybe checking the connection and possibly timing out. So I don't think queuing calls to send is a good idea.
In jupyter-server we use Tornado's WebSocket implementation, where write_message is not async. I don't know if they use threads under the hood.
In both implementations, I don't think it's safe to assume that sending data on a WebSocket is fast.

@SylvainCorlay (Collaborator)

In jupyverse we use websockets, where send is async, which suggests that it's not that fast, maybe checking for connection and eventually timing out.

OK, I did not know that send could potentially be slow. Thanks for the explanation.
