PSRAM enabled causes OutputMixer performance to plummet to near-unusable levels #2181

Nawor3565 · 2025-09-25T01:41:26Z

Hi! A bit of a followup from #2158. There's a lot going on in my sketch so I'll try to summarize just the relevant parts: I'm using an A2DPSinkQueued to stream audio, which is output into a BufferedStream and then into a CallbackBuffer. In the callback, I'm trying to mix multiple .wav samples with the A2DP stream using an OutputMixer. This all worked fine, until I added additional functionality and ran out of RAM. After getting a new ESP32 board with PSRAM and enabling it in the Arduino IDE, I was finally able to use BLE in tandem with A2DP without connection issues, but had a weird side effect: the OutputMixer is now unbearably slow. Even with just two inputs of identical data, the mixer is barely able to keep up with the A2DPStream and the ring buffer starts dropping packets. With four inputs, the audio is unusable.

I was able to optimize it a little bit by changing output[i] += weight * sample / total_weights; to just output[i] += sample >> 1; in AudioOutput.h, which gets 2 inputs mixing properly but still can't help with all 4. Again, the mixer was perfectly fine with 4 inputs before enabling PSRAM, so I suspect this is just due to the mixer doing so many per-byte calculations on data stored in the much slower PSRAM.
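The weighted formula can also be kept without any float math by turning each weight into a fixed-point gain once, outside the per-sample loop. This is a minimal sketch, assuming 16-bit samples and a 32-bit accumulator; weightToQ15 and accumulateQ15 are hypothetical helper names, not AudioTools API. With two equal weights the gain comes out as 0.5, which is what the sample >> 1 shortcut computes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts weight / total_weights into a Q15 gain
// (32768 == 1.0), so the division happens once instead of per sample.
inline int32_t weightToQ15(int32_t weight, int32_t total_weights) {
  return (weight << 15) / total_weights;  // e.g. 1 of 2 -> 16384 (0.5)
}

// Hypothetical helper: adds one int16 input, scaled by a Q15 gain, into an
// int32 accumulator using integer math only. The >> 15 on negative values is
// an arithmetic shift with GCC (and the ESP32 toolchain); it is formally
// implementation-defined before C++20.
inline void accumulateQ15(int32_t* acc, const int16_t* in, size_t samples,
                          int32_t gain_q15) {
  for (size_t i = 0; i < samples; ++i) {
    acc[i] += (static_cast<int32_t>(in[i]) * gain_q15) >> 15;
  }
}
```

Because the shifts round down, clamping the accumulator to the INT16_MIN..INT16_MAX range when converting it back to int16_t is still a sensible final step.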

I tried to use MemoryManager to put the small mixer-related allocations in DRAM, but no dice. Honestly though, I suspect there's a better way to do this without using an OutputMixer at all since I don't need to do weighted mixing (all 4 inputs should be evenly weighted), but I'm not sure what the best approach would be. Any insight would be greatly appreciated!!
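Since all four inputs carry the same weight, the mix reduces to summing the samples in a wider integer type and dividing by the number of inputs. This is a minimal sketch, assuming every input is already available as an int16_t buffer of the same length; mixEqual is a hypothetical name, not part of AudioTools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mixes n_inputs equally weighted int16 buffers into out.
// Samples are summed in 32 bits and divided by the input count, so the result
// always fits in int16 and no float math is needed.
void mixEqual(const int16_t* const* inputs, size_t n_inputs,
              int16_t* out, size_t samples) {
  assert(n_inputs > 0);
  // Signed divisor: dividing an int32_t by a size_t would convert to unsigned.
  const int32_t divisor = static_cast<int32_t>(n_inputs);
  for (size_t i = 0; i < samples; ++i) {
    int32_t sum = 0;
    for (size_t k = 0; k < n_inputs; ++k) {
      sum += inputs[k][i];
    }
    out[i] = static_cast<int16_t>(sum / divisor);
  }
}
```

The trade-off of dividing by the input count is that a single source becomes quieter even while the other inputs are silent.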

Giant sketch

Note: I removed all the BLE functions for this post because they're not relevant to the issue at hand

#include "BLEDevice.h"
#include "AudioTools.h"
#include "AudioTools/CoreAudio/I2SStream.h"
#include "AudioTools/CoreAudio/AudioStreams.h"
#include "AudioTools/CoreAudio/StreamCopy.h"
#include "AudioTools/CoreAudio/AudioOutput.h"
//#include "AudioTools/AudioLibs/AudioRealFFT.h"
//#include "BluetoothA2DPSink.h"
#include "BluetoothA2DPSinkQueued.h"
#include <AudioTools/AudioLibs/AudioESP32FFT.h>
#include <AudioTools/Disk/AudioSourceLittleFS.h>
#include "AudioTools/Concurrency/RTOS.h"
#include "AudioTools/AudioLibs/MemoryManager.h"

typedef enum {
  DRUM_TOM = 1,
  DRUM_MARACA,
  DRUM_COWBELL,
} drum_t;

// --- Config ---
constexpr int SAMPLE_RATE = 44100;
constexpr int FFT_LENGTH = 256;            // must be power of 2
constexpr size_t a2dp_buffer_size = 3000;  

// Frequency bands (Hz)
constexpr float BASS_MAX_HZ = 250.0f;
constexpr float MIDS_MIN_HZ = 250.0f;
constexpr float MIDS_MAX_HZ = 6000.0f;
constexpr float TREBLE_MIN_HZ = 6000.0f;

// Sensitivity scaling factors (tune to taste)
// Larger divisor = lower sensitivity (smaller bar movement).
constexpr float SENS_BASS = 5000.0f;
constexpr float SENS_MIDS = 5000.0f;
constexpr float SENS_TREBLE = 5000.0f;


I2SStream i2s;
boolean a2dp_connected = false;
esp_a2d_audio_state_t bluetoothAudioState = ESP_A2D_AUDIO_STATE_SUSPEND;
AudioESP32FFT fftESP32;
OutputMixer<int16_t> mixer(i2s, 2);
ConverterScaler<int16_t> volume(1.0, 0, 32767);
#define uint8_to_float(in) (float)((float)in / (float)255)

AudioInfo info(44100, 2, 16);

CallbackStream cb;
BufferedStream buffered(cb, a2dp_buffer_size);
BluetoothA2DPSinkQueued a2dp_sink;      


BufferRTOS<uint8_t> rtos_buffer(10000);
QueueStream<uint8_t> queue_FTT(rtos_buffer);
StreamCopy copierSink(fftESP32, queue_FTT);
Task readTask("read", 10000, 23, 1);

MultiOutput multiOut;
MemoryManager memory(9000);

File audioFile1;
File audioFile2;
File audioFile3;

// Shared results (set in FFT callback)
volatile uint8_t g_bass_level = 0;
volatile uint8_t g_mids_level = 0;
volatile uint8_t g_treble_level = 0;

uint8_t wristband_magnitude = 0;

SET_LOOP_TASK_STACK_SIZE(16 * 1024);



static inline uint8_t clampToByte(float x) {
  if (x <= 0) return 0;
  if (x >= 255) return 255;
  return (uint8_t)lroundf(x);
}

// FFT result callback
void onFFTResult(AudioFFTBase &base) {
  AudioESP32FFT &my = static_cast<AudioESP32FFT &>(base);
  //Serial.printf("onFFTResult Running on core %u\n", xPortGetCoreID());

  float *magnitudes = my.magnitudesFast();
  int bins = my.size();
  if (!magnitudes || bins <= 0) return;

  double sumBass = 0.0;
  double sumMids = 0.0;
  double sumTreble = 0.0;

  for (int i = 0; i < bins; ++i) {
    float freq = my.frequency(i);
    float v = magnitudes[i];
    double power = (double)v * (double)v;  // energy

    if (freq <= BASS_MAX_HZ) {
      sumBass += power;
    } else if (freq <= MIDS_MAX_HZ) {
      sumMids += power;
    } else {
      sumTreble += power;
    }
  }

  // Convert energy level
  // sqrt to approximate RMS magnitude, then apply sensitivity scaling
  float bassLevel = sqrtf(sumBass) / SENS_BASS * 255.0f;
  float midsLevel = sqrtf(sumMids) / SENS_MIDS * 255.0f;
  float trebleLevel = sqrtf(sumTreble) / SENS_TREBLE * 255.0f;

  g_bass_level = clampToByte(bassLevel);
  g_mids_level = clampToByte(midsLevel);
  g_treble_level = clampToByte(trebleLevel);

  //Serial.printf("Bass:%3d  Mids:%3d  Treble:%3d\n", g_bass_level, g_mids_level, g_treble_level);
}


// with each A2DP write: write X bytes of A2DP data to Mixer index 0, and copy X
// bytes of AudioPlayer data to Mixer index 1. Then flush Mixer buffer to I2S
size_t onWrite(const uint8_t *data, size_t len) {
  //LOGW("=>  onWrite: %u", len);
  //Serial.printf("onWrite Running on core %u\n", xPortGetCoreID());
  
  mixer.write(0, data, len);
  mixer.write(1, data, len);
  //mixer.write(2, data, len);
  //mixer.write(3, data, len);

  // int bytes_open1 = mixer.available(0) - mixer.available(1);
  // uint8_t data_from_file1[bytes_open1];
  // memset(data_from_file1, 0, bytes_open1);                     // provide silence by default
  // audioFile1.readBytes((char *)data_from_file1, bytes_open1);  // read bytes if available
  // mixer.write(1, data_from_file1, bytes_open1);                // write data from file or silence
  // //mixer.write(2, data_from_file1, bytes_open1);
  // //mixer.write(3, data_from_file1, bytes_open1);


  // int bytes_open2 = mixer.available(0) - mixer.available(2);
  // uint8_t data_from_file2[bytes_open2];
  // memset(data_from_file2, 0, bytes_open2);                    
  // audioFile2.readBytes((char *)data_from_file2, bytes_open2);  
  // //volume.convert(data_from_file2, bytes_open2);
  // //mixer.write(2, data_from_file2, bytes_open2);  


  // int bytes_open3 = mixer.available(0) - mixer.available(3);
  // uint8_t data_from_file3[bytes_open3];
  // memset(data_from_file3, 0, bytes_open3);                     
  // audioFile3.readBytes((char *)data_from_file3, bytes_open3); 
  // //volume.convert(data_from_file3, bytes_open3);
  // //mixer.write(3, data_from_file3, bytes_open3); 

  mixer.flushMixer();
  //delay(0);
  return len;
}

// Restarts the audio stream from the specified file. Apparently this is
// needed to prevent a memory leak
void playDrumSound(uint8_t voice, drum_t drum_type) {
  Serial.printf("Drum trigger, type %u, voice %u\n", drum_type, voice);
  char filename[100];
  sprintf(filename, "/drum_%u/sample_1_no_pop.wav", drum_type);
  Serial.println(filename);
  switch (voice) {
    case 1: audioFile1.close(); break;
    case 2: audioFile2.close(); break;
    case 3: audioFile3.close(); break;
  }
  File new_file = LittleFS.open(filename);
  if (!new_file) {
    Serial.println("Could not open file");
    return;
  }
  new_file.seek(44);  // ignore wav header
  switch (voice) {
    case 1: audioFile1 = new_file; break;
    case 2: audioFile2 = new_file; break;
    case 3: audioFile3 = new_file; break;
  }
}

void bt_connection_state_changed(esp_a2d_connection_state_t state, void *ptr) {
  Serial.print("A2DP connection changed to ");
  Serial.println(a2dp_sink.to_str(state));
  if (state == ESP_A2D_CONNECTION_STATE_CONNECTED) {
    a2dp_connected = true;
    a2dp_sink.set_volume(128);
  } else {
    a2dp_connected = false;
  }
};

void a2dp_state_changed(esp_a2d_audio_state_t state, void *ptr) {
  Serial.println(a2dp_sink.to_str(state));
  bluetoothAudioState = state;
}

static void notifyCallback(BLERemoteCharacteristic *pBLERemoteCharacteristic, uint8_t *pData, size_t length, bool isNotify) {
  if (pData[0] > 0) {
    wristband_magnitude = pData[0];
    volume.setFactor(uint8_to_float(wristband_magnitude));
    //Serial.printf("Magnitude float: %f\n", uint8_to_float(wristband_magnitude));
  }
  Serial.printf("Magnitude: %u\n", wristband_magnitude);
};


void setup() {
  Serial.begin(115200);

  Serial.println("Starting Arduino BLE Client application...");
  BLEDevice::init("");


  //AudioToolsLogger.begin(Serial, AudioToolsLogLevel::Debug);


  // Start Bluetooth Audio Receiver
  a2dp_sink.set_auto_reconnect(true);
  a2dp_sink.set_default_bt_mode(ESP_BT_MODE_BTDM);
  a2dp_sink.set_on_connection_state_changed(bt_connection_state_changed);
  a2dp_sink.set_on_audio_state_changed(a2dp_state_changed);
  a2dp_sink.set_task_core(1);
  a2dp_sink.set_i2s_stack_size(12000);
  a2dp_sink.set_i2s_ringbuffer_size(100000);
  a2dp_sink.set_i2s_task_priority(configMAX_PRIORITIES -1);
  a2dp_sink.set_output(multiOut);
  a2dp_sink.start("a2dp-i2s");


  // setup output
  auto cfgI2S = i2s.defaultConfig();
  cfgI2S.copyFrom(info);
  cfgI2S.pin_bck = 18;
  cfgI2S.pin_ws = 22;
  cfgI2S.pin_data = 19;
  i2s.begin(cfgI2S);

  AudioFFTConfig cfgFFT = fftESP32.defaultConfig(TX_MODE);
  cfgFFT.length = FFT_LENGTH;
  cfgFFT.copyFrom(info);
  cfgFFT.channel_used = 0;
  cfgFFT.callback = onFFTResult;
  fftESP32.begin(cfgFFT);

  LittleFS.begin();

  mixer.setAutoIndex(false);
  mixer.begin(a2dp_buffer_size * 2);

  cb.setWriteCallback(onWrite);

  queue_FTT.begin();
  
  multiOut.add(buffered);//buffered
  multiOut.add(queue_FTT);
  multiOut.begin();

  readTask.begin([]() {
    copierSink.copy();
    delay(1);
    //Serial.printf("copierSink Running on core %u\n", xPortGetCoreID());
  });

}  // End of setup.

// This is the Arduino main loop function.
void loop() {

  // If A2DP isn't writing to the callback buffer, we do it manually
  if (bluetoothAudioState != ESP_A2D_AUDIO_STATE_STARTED) {
    uint8_t silent_placeholder[a2dp_buffer_size];
    memset(silent_placeholder, 0, a2dp_buffer_size);
    buffered.write(silent_placeholder, a2dp_buffer_size);
  } else {
    delay(5);
  }

  if (wristband_magnitude > 0) {
    playDrumSound(1, DRUM_TOM);
    //Serial.println("played drum");
    wristband_magnitude = 0;
  }

  // If the flag "doConnect" is true then we have scanned for and found the desired
  // BLE Server with which we wish to connect.  Now we connect to it.  Once we are
  // connected we set the connected flag to be true.
  if (doConnect == true) {
    if (connectToServer()) {
      Serial.println("We are now connected to the BLE Server.");
    } else {
      Serial.println("We have failed to connect to the server; there is nothing more we will do.");
    }
    doConnect = false;
  }

  static uint8_t BLE_transmit_array[20];
  if (connected) {
    BLE_transmit_array[0] = g_bass_level;
    BLE_transmit_array[1] = g_mids_level;
    BLE_transmit_array[2] = g_treble_level;
    pRemoteCharacteristicWrite->writeValue(BLE_transmit_array, 20);
    //Serial.println(pRemoteCharacteristic->readValue());
  } else if (doScan) {
    BLEDevice::getScan()->start(0);  
  }


  uint8_t bass = g_bass_level;
  uint8_t mids = g_mids_level;
  uint8_t treble = g_treble_level;

  //Serial.printf("Bass:%3d  Mids:%3d  Treble:%3d\n", bass, mids, treble);


}  // End of loop

pschatzmann · 2025-09-25T02:33:53Z

Maintainer

Just a question for my understanding: is BLE active while you do the mixing? If so, this could slow things down considerably.
You could implement your own optimized mixing directly in your callback, using integer math instead of float.
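One way to do integer mixing directly in the callback is a saturating sum: each source is added at full level in 32 bits and clamped back to the 16-bit range. This is a different trade-off from averaging: a single source keeps its level, but simultaneous loud peaks are hard-clipped instead of wrapping around. A minimal sketch; addSaturating is a hypothetical helper, not library API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds src into dst in place and clamps to int16.
// Without the clamp, overflow would wrap around and be heard as crackle.
void addSaturating(int16_t* dst, const int16_t* src, size_t samples) {
  for (size_t i = 0; i < samples; ++i) {
    int32_t v = static_cast<int32_t>(dst[i]) + src[i];
    if (v > INT16_MAX) {
      v = INT16_MAX;
    } else if (v < INT16_MIN) {
      v = INT16_MIN;
    }
    dst[i] = static_cast<int16_t>(v);
  }
}
```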

Another optimization would be to replace the Files from LittleFS with a MemoryStream.
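The point of that suggestion is to read each drum sample from the filesystem once and then serve it from memory inside the audio callback. AudioTools' MemoryStream is the suggested tool; the sketch below does not use it and shows the same idea in plain C++ instead, so its RamSample type is hypothetical. It assumes the PCM bytes (after the 44-byte WAV header) were already copied into a buffer, for example in setup(). Note that this trades filesystem access for RAM, so it only helps if the samples fit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical RAM-resident drum sample: PCM bytes are loaded once and then
// read from memory in the audio callback, so no filesystem access happens
// on the audio path.
struct RamSample {
  const uint8_t* data = nullptr;  // PCM bytes without the 44-byte WAV header
  size_t size = 0;                // number of PCM bytes in data
  size_t pos = 0;                 // read position; pos >= size means finished

  // Retrigger the sample (replaces the close()/open()/seek(44) sequence).
  void restart() { pos = 0; }

  // Stop playback until the next restart().
  void stop() { pos = size; }

  // Fills exactly len bytes: real sample data first, then silence,
  // mirroring the memset + readBytes pattern in onWrite.
  size_t readOrSilence(uint8_t* out, size_t len) {
    size_t n = (pos < size) ? size - pos : 0;
    if (n > len) n = len;
    if (n > 0) std::memcpy(out, data + pos, n);
    std::memset(out + n, 0, len - n);
    pos += n;
    return n;  // number of real sample bytes copied
  }
};
```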

I will try to run some tests on the impact of PSRAM on this functionality and add some template parameters to the class so that an allocator can be selected.

P.S. Did you try calling delay() in your loop?

7 replies

pschatzmann Sep 25, 2025
Maintainer

Yes, that would help to pin it down to PSRAM. Also experiment to see whether adding some delay() in the loop has any impact.
You could also try the new 1.2.0 release of AudioTools without PSRAM. You should have more RAM available since you are not using any WiFi...

Nawor3565 Sep 25, 2025
Author

Just tested with BLE disabled and with a 5 ms delay in the main loop; it doesn't seem to have made a difference. It starts throwing "ringbuffer is full, drop this packet!" errors as soon as the A2DP stream starts. And this is with the A2DP audio being written to all 4 inputs for testing, so the bottleneck definitely seems to be the mixer and not the file reads.

pschatzmann Sep 25, 2025
Maintainer

I committed some changes that have not been tested yet:

OutputMixer accepts an allocator in the constructor (default is DefaultAllocatorRAM)
- DefaultAllocatorRAM uses standard malloc
- DefaultESP32AllocatorRAM uses MALLOC_CAP_8BIT | MALLOC_CAP_INTERNAL
- DefaultAllocator uses PSRAM when activated (= original behaviour)
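For reference, the caps listed for DefaultESP32AllocatorRAM correspond to ESP-IDF's heap_caps_malloc. Below is a minimal sketch of allocating a scratch buffer that way, assuming ESP_PLATFORM is defined by the ESP32 toolchain; on other targets it falls back to malloc so it also builds on a desktop compiler. allocInternal and freeInternal are hypothetical names. According to the maintainer's later test, the allocator choice did not change the point at which the audio broke up, so this only illustrates what the allocator options mean.

```cpp
#include <cstddef>
#include <cstdlib>
#if defined(ESP_PLATFORM)
#include "esp_heap_caps.h"
#endif

// Hypothetical helper: requests byte-addressable internal DRAM so the buffer
// stays out of PSRAM even when PSRAM is enabled.
void* allocInternal(size_t bytes) {
#if defined(ESP_PLATFORM)
  return heap_caps_malloc(bytes, MALLOC_CAP_8BIT | MALLOC_CAP_INTERNAL);
#else
  return std::malloc(bytes);  // desktop fallback
#endif
}

// Releases memory obtained from allocInternal().
void freeInternal(void* p) {
#if defined(ESP_PLATFORM)
  heap_caps_free(p);
#else
  std::free(p);
#endif
}
```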

pschatzmann Sep 25, 2025
Maintainer

I was running some tests with this output mixer test sketch.

I noticed that, independently of which allocator was selected, it was breaking up with N=6 when PSRAM had been activated, while N=5 worked fine, even with the DefaultAllocator.

To double-check the result, I changed the DefaultAllocator to DefaultESP32AllocatorRAM, with no improvement!

In conclusion: the issue must be somewhere in the ESP32 core and not in my library.

Answer selected by pschatzmann

pschatzmann Sep 25, 2025
Maintainer

To confirm my understanding, I measured the time for each mixing loop sending data to a NullStream:

without PSRAM it was 5.570 ms
with PSRAM it was 9.556 ms

So just activating PSRAM slows things down quite a bit...
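A measurement like this can be reproduced with a small timing harness. This is a minimal sketch in portable C++ using std::chrono; on the ESP32 the same pattern works with micros(). averageMicros is a hypothetical helper. For scale, 9.556 / 5.570 ≈ 1.72, so the loop took about 72% longer with PSRAM enabled. In the original sketch, a 3000-byte chunk of 16-bit stereo audio at 44100 Hz holds 3000 / 4 = 750 frames, roughly 17 ms of sound, so the mixing work for such a chunk has to finish well inside that window.

```cpp
#include <chrono>

// Hypothetical micro-benchmark: runs mixOnce() `iterations` times and returns
// the average wall-clock time per call in microseconds.
template <typename Fn>
double averageMicros(Fn&& mixOnce, int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    mixOnce();
  }
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::micro> total = end - start;
  return total.count() / iterations;
}
```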

Nawor3565 Sep 25, 2025
Author

@pschatzmann Wow, thanks for looking into that! That's really bizarre, I wonder why PSRAM simply being enabled is causing the mixer to slow down so much.

In the meantime, I tried to write my own code to do the mixing without any float math, but I'm having some trouble with array types: i2s.write doesn't accept anything but uint8_t, yet declaring mixerArray as uint32_t and casting it to uint8_t just outputs silence, while declaring mixerArray as uint8_t gives terrible clipping of the audio. Do you have any feedback? This algorithm does seem able to mix 4 inputs quickly enough to keep up with A2DP, at least.

uint8_t mixerArray[a2dp_buffer_size];
constexpr uint8_t shift_by_num = 2; // 1 = two inputs, 2 = four inputs

size_t onWrite(const uint8_t *data, size_t len) {
  memset(mixerArray, 0, len);

  uint8_t data_from_file1[len];
  memset(data_from_file1, 0, len);                     // provide silence by default
  audioFile1.readBytes((char *)data_from_file1, len);  // read bytes if available
  uint8_t data_from_file2[len];
  memset(data_from_file2, 0, len);                     // provide silence by default
  audioFile2.readBytes((char *)data_from_file2, len);  // read bytes if available
  uint8_t data_from_file3[len];
  memset(data_from_file3, 0, len);                     // provide silence by default
  audioFile3.readBytes((char *)data_from_file3, len);  // read bytes if available
  
  for (size_t j = 0; j < len; j++) {
    mixerArray[j] = (data[j] >> shift_by_num) + (data_from_file1[j] >> shift_by_num) + (data_from_file2[j] >> shift_by_num) + (data_from_file3[j] >> shift_by_num);
    //mixerArray[j] >>= shift_by_num;
  }

  i2s.write((uint8_t *)mixerArray, len);
  return len;
}
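The distortion follows from the data layout: the A2DP stream and the WAV files carry 16-bit little-endian PCM, so every sample spans two bytes, and shifting the two bytes separately mangles the sample. A small sketch comparing both treatments, assuming a little-endian CPU (the ESP32 is one); the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Processes one 16-bit sample the way the uint8_t mixer does: as two
// independent bytes (low byte first on a little-endian CPU).
int16_t shiftBytewise(int16_t sample, int shift) {
  uint8_t b[2];
  std::memcpy(b, &sample, sizeof b);
  b[0] = static_cast<uint8_t>(b[0] >> shift);  // low byte
  b[1] = static_cast<uint8_t>(b[1] >> shift);  // high byte, sign bit lost
  int16_t out;
  std::memcpy(&out, b, sizeof out);
  return out;
}

// Processes the same sample as one signed value. Right-shifting a negative
// int is an arithmetic shift with GCC (formally implementation-defined
// before C++20), so the sign is kept.
int16_t shiftAsSample(int16_t sample, int shift) {
  return static_cast<int16_t>(sample >> shift);
}
```

For example, a near-silent sample of -1 (bytes 0xFF 0xFF) becomes 0x3F3F = 16191 when shifted byte by byte, close to half of full scale, which is heard as loud distortion; treated as an int16_t it stays -1.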

pschatzmann · 2025-09-25T12:49:34Z

pschatzmann
Sep 25, 2025
Maintainer

Did you double-check whether you need PSRAM at all with the new release?
Information on how to work with samples can be found in the introduction: you need to cast to int16_t when doing calculations...
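Following that advice, the mixing loop in onWrite can operate on int16_t samples and accumulate in 32 bits. This is a minimal sketch, assuming all four buffers hold len bytes of 16-bit little-endian PCM (the ESP32 is little-endian) and that len is even; mix4ToBytes is a hypothetical name. memcpy is used instead of a pointer cast so that unaligned byte buffers are read safely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: mixes four equally weighted 16-bit PCM byte buffers
// into out, which can then be passed to i2s.write() as uint8_t data.
void mix4ToBytes(const uint8_t* in0, const uint8_t* in1, const uint8_t* in2,
                 const uint8_t* in3, uint8_t* out, size_t len) {
  assert(len % sizeof(int16_t) == 0);
  const uint8_t* inputs[4] = {in0, in1, in2, in3};
  const size_t samples = len / sizeof(int16_t);
  for (size_t i = 0; i < samples; ++i) {
    int32_t sum = 0;
    for (const uint8_t* in : inputs) {
      int16_t s;
      std::memcpy(&s, in + i * sizeof(int16_t), sizeof s);  // alignment-safe load
      sum += s;
    }
    const int16_t mixed = static_cast<int16_t>(sum / 4);  // equal weights, never clips
    std::memcpy(out + i * sizeof(int16_t), &mixed, sizeof mixed);
  }
}
```

In onWrite this would replace the byte loop: mix4ToBytes(data, data_from_file1, data_from_file2, data_from_file3, mixerArray, len), followed by i2s.write(mixerArray, len) and return len.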

0 replies
