AWS STT plugin extremely slow when using multi-user transcription

### Bug Description

When the STT plugin in the example agent [**multi-user-transcriber.py**](https://github.com/livekit/agents/blob/main/examples/other/transcription/multi-user-transcriber.py) is changed from `deepgram` to `aws`, transcription becomes **very slow with 3 or more participants connected to the same room**. When the participants are in different rooms, transcriptions behave well and arrive within a reasonable time.

This does not happen for any of the other [17 STT plugins](https://docs.livekit.io/agents/models/stt/#plugins). Only the Amazon Transcribe plugin behaves this way in this sample agent.

The agent log shows this warning message again and again:

```log
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 12.733135948984998, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.293962+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.187841181981316, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.457624+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.44968232001003, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.458104+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.052407222982552, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.645356+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.400238036039632, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.648003+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.929592445012062, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970727+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.66886186598532, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.970937+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}
{"message": "inference is slower than realtime", "level": "WARNING", "name": "livekit.plugins.silero", "delay": 13.905491034980454, "pid": 52, "job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532791+00:00"}
```

I think this warning is just the Silero plugin complaining about delays in the pipeline. It seems to me that the AWS STT plugin is somehow saturating a buffer, either when sending audio to Amazon Transcribe or when receiving its events. Again, this does not happen with any other STT plugin.
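
To put numbers on the warnings above, here is a minimal sketch that summarizes the `delay` field of the Silero log records. The function name `summarize_delays` and the `SAMPLE` list are hypothetical helpers for illustration; the two sample records are copied from the log excerpt in this issue.

```python
import json
from statistics import mean

# Two records copied from the log excerpt above (hypothetical sample input).
SAMPLE = [
    '{"message": "inference is slower than realtime", "level": "WARNING", '
    '"name": "livekit.plugins.silero", "delay": 13.077118384041238, "pid": 52, '
    '"job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:35.291028+00:00"}',
    '{"message": "inference is slower than realtime", "level": "WARNING", '
    '"name": "livekit.plugins.silero", "delay": 14.252598414036559, "pid": 52, '
    '"job_id": "AJ_zzYdH8TPndaD", "timestamp": "2025-10-28T19:12:36.532489+00:00"}',
]


def summarize_delays(log_lines):
    """Return count/min/mean/max of the Silero 'delay' values, or None if there are none."""
    delays = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON log records
        if not isinstance(record, dict):
            continue
        # Keep only the Silero records that carry a delay value.
        if record.get("name") == "livekit.plugins.silero" and "delay" in record:
            delays.append(float(record["delay"]))
    if not delays:
        return None
    return {
        "count": len(delays),
        "min": min(delays),
        "mean": mean(delays),
        "max": max(delays),
    }


if __name__ == "__main__":
    print(summarize_delays(SAMPLE))
```

Feeding the full agent log to `summarize_delays` (for example, the lines of a saved log file) should show whether the delay keeps growing as participants join.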

### Expected Behavior

The `aws` STT plugin should work in the [**multi-user-transcriber.py**](https://github.com/livekit/agents/blob/main/examples/other/transcription/multi-user-transcriber.py) sample agent as well as any of the other STT plugins.

### Reproduction Steps

I have a very simple setup that demonstrates the problem at https://github.com/OpenVidu/livekit-agents-transcription-test/tree/multi-user.
As its README states:

First, clone the sample repository (make sure to check out the `multi-user` branch):

```bash
git clone -b multi-user https://github.com/OpenVidu/livekit-agents-transcription-test.git
cd livekit-agents-transcription-test
```

Then:

1. Build the agent container

   ```bash
   docker build -t livekit/transcription-agent-test:latest agent/.
   ```
   

2. Export the required env vars and start the agent

   ```bash
   # Export your LiveKit Cloud credentials
   export LIVEKIT_URL=wss://xxxxxxxx.livekit.cloud
   export LIVEKIT_API_KEY=your_livekit_cloud_api_key
   export LIVEKIT_API_SECRET=your_livekit_cloud_api_secret

   # Export your AWS credentials
   export AWS_ACCESS_KEY_ID=your_access_key_id
   export AWS_SECRET_ACCESS_KEY=your_secret_access_key
   export AWS_DEFAULT_REGION=your_aws_region
   
   # Start the agent
   docker compose up -d
   ```
   

3. Set up your LiveKit Cloud credentials in the webapp HTML [right here](https://github.com/OpenVidu/livekit-agents-transcription-test/blob/87c0ea4d1872ee6de645e5642a2135ebaa0cb190/webapp/index.html#L97-L99):

   ```javascript
   const LIVEKIT_URL = "wss://xxxxxxxx.livekit.cloud";
   const LIVEKIT_API_KEY = "your_livekit_cloud_api_key";
   const LIVEKIT_API_SECRET = "your_livekit_cloud_api_secret";
   ```

4. Run the webapp

   ```bash
   cd webapp
   npm install
   npm start
   ```

5. Open [http://localhost:3000](http://localhost:3000) in your browser and test.
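
Before starting the agent in step 2, a quick preflight check can confirm that every variable exported there is actually set in the current shell. This is only a sketch: the helper name `missing_vars` is hypothetical, and the variable list is taken from the export commands in step 2.

```python
import os

# Variables exported in step 2 of the reproduction steps.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it in the same shell that will run `docker compose up -d`, since the exports only apply to that shell session.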

With 3 or more participants in the same room, speech events are delayed so much that the transcription becomes unusable:

<img width="1034" height="921" alt="Image" src="https://github.com/user-attachments/assets/6ca7f30f-5845-4a75-a5f0-f69f4be11009" />

### Operating System

Linux

### Package Versions

```text
livekit-agents>=1.2.14
```
