Bug Description
When changing the plugin in the example agent multi-user-transcriber.py from deepgram to aws, the transcription starts getting very slow with 3 or more participants connected to the same room. When transcribing different participants from different Rooms, transcriptions behave well and arrive within a reasonable time.
This does not happen for any of the other 17 STT plugins. Only the Amazon Transcribe plugin behaves this way in this sample agent.
The agent log shows this warning message again and again:
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 12.733135948984998, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.293962+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.187841181981316, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.457624+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.44968232001003, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.458104+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.052407222982552, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.645356+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.400238036039632, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.648003+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.929592445012062, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970727+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.66886186598532, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970937+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.905491034980454, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532791+00:00"}
I think this is just the silero plugin complaining about delays in the pipeline. To me it seems that the AWS STT plugin is somehow saturating some kind of buffer when sending audio to Amazon Transcribe or when receiving its events. But again, this does not happen with any other STT plugin.
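To put a number on how far behind the pipeline is, the warnings above can be summarized with a short script. This is only a minimal sketch: the helper name `summarize_delays` is hypothetical, the two sample lines are copied from the log above, and it assumes the `delay` field is expressed in seconds.

```python
import json
from statistics import mean

# Two of the warning lines from the agent log above, copied verbatim.
LOG_LINES = [
    '{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}',
    '{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}',
]


def summarize_delays(lines):
    """Return (count, min, mean, max) of "delay" over silero slow-inference warnings.

    Hypothetical helper for illustration; it is not part of livekit-agents.
    """
    delays = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON log records
        if not isinstance(record, dict):
            continue
        if (
            record.get("name") == "livekit.plugins.silero"
            and record.get("message") == "inference is slower than realtime"
        ):
            # Assumption: "delay" is reported in seconds.
            delays.append(float(record["delay"]))
    if not delays:
        return (0, None, None, None)
    return (len(delays), min(delays), mean(delays), max(delays))


count, lowest, average, highest = summarize_delays(LOG_LINES)
print(f"{count} warnings, delay min={lowest:.2f}s mean={average:.2f}s max={highest:.2f}s")
```

Feeding it the full agent log (for example, the lines of a saved log file) instead of `LOG_LINES` would show whether the delay keeps growing as participants join.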
Expected Behavior
The aws STT plugin should work fine in the multi-user-transcriber.py sample agent, just as well as any of the other STT plugins.
Reproduction Steps
I have a very simple setup to demonstrate the problem at https://github.com/OpenVidu/livekit-agents-transcription-test/tree/multi-user
As its README states:
First of all, clone the sample repository (make sure to check out the multi-user branch):
git clone -b multi-user https://github.com/OpenVidu/livekit-agents-transcription-test.git
cd livekit-agents-transcription-test
Then:
- Build the agent container
  docker build -t livekit/transcription-agent-test:latest agent/.
- Export the required env vars and start the agent
  # Export your LiveKit Cloud credentials
  export LIVEKIT_URL=wss://xxxxxxxx.livekit.cloud
  export LIVEKIT_API_KEY=your_livekit_cloud_api_key
  export LIVEKIT_API_SECRET=your_livekit_cloud_api_secret
  # Export your AWS credentials
  export AWS_ACCESS_KEY_ID=your_access_key_id
  export AWS_SECRET_ACCESS_KEY=your_secret_access_key
  export AWS_DEFAULT_REGION=your_aws_region
  # Start the agent
  docker compose up -d
- Set up your LiveKit Cloud credentials in the webapp HTML right here:
  const LIVEKIT_URL = "wss://xxxxxxxx.livekit.cloud";
  const LIVEKIT_API_KEY = "your_livekit_cloud_api_key";
  const LIVEKIT_API_SECRET = "your_livekit_cloud_api_secret";
- Run the webapp
  cd webapp
  npm install
  npm start
- Open http://localhost:3000 in your browser and test.
When 3 or more participants are added to the same room, speech events arrive with a very long delay, making the transcription unusable.
Operating System
Linux
Package Versions
livekit-agents>=1.2.14