AWS STT plugin extremely slow when using multi-user transcription #3739

@pabloFuente

Bug Description

When I change the STT plugin in the example agent multi-user-transcriber.py from deepgram to aws, transcription becomes very slow as soon as 3 or more participants are connected to the same room. When transcribing participants in different rooms, transcriptions behave well and arrive within a reasonable time.

This does not happen for any of the other 17 STT plugins. Only the Amazon Transcribe plugin behaves this way in this sample agent.

The agent log shows this warning message repeatedly:

{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 12.733135948984998, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.293962+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.187841181981316, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.457624+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.44968232001003, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.458104+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.052407222982552, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.645356+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.400238036039632, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.648003+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.929592445012062, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970727+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.66886186598532, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970937+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.905491034980454, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532791+00:00"}

I think this is just the silero plugin complaining about delays in the pipeline. It looks to me as if the AWS STT plugin is saturating some kind of buffer when sending audio to Amazon Transcribe or when receiving its events. Again, this does not happen with any other STT plugin.
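To quantify how far behind the pipeline is, the warnings can be summarized with a short script. This is a minimal sketch, not part of the sample repository: `summarize_delays` is a hypothetical helper, the two sample lines are copied from the log above, and it assumes the `delay` field is expressed in seconds.

```python
import json
from statistics import mean

# Two of the warning lines from the agent log above, copied verbatim.
LOG_LINES = [
    '{"message": "inference is slower than realtime", "level": "WARNING", '
    '"name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, '
    '"job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}',
    '{"message": "inference is slower than realtime", "level": "WARNING", '
    '"name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, '
    '"job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}',
]


def summarize_delays(lines, logger_name="livekit.plugins.silero"):
    """Return (count, min, mean, max) of the "delay" field for one logger.

    Hypothetical helper: lines that are not JSON objects, or that come
    from another logger, are ignored.
    """
    delays = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text lines mixed into the log
        if not isinstance(record, dict):
            continue
        if record.get("name") == logger_name and "delay" in record:
            delays.append(float(record["delay"]))
    if not delays:
        return (0, None, None, None)
    return (len(delays), min(delays), mean(delays), max(delays))


if __name__ == "__main__":
    count, lowest, average, highest = summarize_delays(LOG_LINES)
    # Assumption: "delay" is in seconds.
    print(f"{count} warnings, delay min={lowest:.2f} mean={average:.2f} max={highest:.2f}")
```

Running the same function over the complete agent log, once per STT plugin, would make it easier to compare the aws plugin against the others.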

Expected Behavior

The aws STT plugin should work fine in the multi-user-transcriber.py sample agent, just as well as any of the other STT plugins.

Reproduction Steps

I have a very simple setup to demonstrate the problem at https://github.com/OpenVidu/livekit-agents-transcription-test/tree/multi-user
As its README states:

First, clone the sample repository (make sure to check out the multi-user branch):

git clone -b multi-user https://github.com/OpenVidu/livekit-agents-transcription-test.git
cd livekit-agents-transcription-test

Then:

  1. Build the agent container

    docker build -t livekit/transcription-agent-test:latest agent/.
  2. Export required env vars and start the agent

    # Export your LiveKit Cloud credentials
    export LIVEKIT_URL=wss://xxxxxxxx.livekit.cloud
    export LIVEKIT_API_KEY=your_livekit_cloud_api_key
    export LIVEKIT_API_SECRET=your_livekit_cloud_api_secret
    
    # Export your AWS credentials
    export AWS_ACCESS_KEY_ID=your_access_key_id
    export AWS_SECRET_ACCESS_KEY=your_secret_access_key
    export AWS_DEFAULT_REGION=your_aws_region
    
    # Start the agent
    docker compose up -d
  3. Set up your LiveKit Cloud credentials in the webapp HTML right here:

    const LIVEKIT_URL = "wss://xxxxxxxx.livekit.cloud";
    const LIVEKIT_API_KEY = "your_livekit_cloud_api_key";
    const LIVEKIT_API_SECRET = "your_livekit_cloud_api_secret";
  4. Run the webapp

    cd webapp
    npm install
    npm start
  5. Open http://localhost:3000 in your browser and test.
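Step 2 depends on six environment variables, and a missing one can be mistaken for the bug itself. The following preflight check is a minimal sketch and is not part of the sample repository; `missing_env_vars` is a hypothetical helper, and the variable names are the ones exported in step 2.

```python
import os

# Variable names taken from step 2 of the reproduction steps.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `env` defaults to the current process
    environment and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Run it in the same shell that exported the variables, before `docker compose up -d`.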

When adding 3 or more participants to the same room, speech events become so delayed that the transcription is unusable.

Operating System

Linux

Package Versions

livekit-agents>=1.2.14

Labels

bug
