UART truncated packets (IDFGH-14948) #15657

Closed
3 tasks done
zekageri opened this issue Mar 26, 2025 · 17 comments
Labels
Status: Opened Issue is new Type: Bug bugs in IDF

@zekageri

zekageri commented Mar 26, 2025

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

ESP-IDF v5.3.2.250210

Espressif SoC revision.

ESP32-WROVER-E (N16R8) (rev 3)

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32-Wrover-Kit

Power Supply used.

External 3.3V

What is the expected behavior?

I expect the UART not to drop or truncate frames.

What is the actual behavior?

The UART misses packets.

Steps to reproduce.

void HsUart::init(int baud, uint8_t rx, uint8_t tx, uint8_t rts, byte byteTimeout) {
    uart_config_t uart_config = {};

    uart_config.baud_rate = baud;
    uart_config.data_bits = UART_DATA_8_BITS;
    uart_config.parity = UART_PARITY_DISABLE;
    uart_config.stop_bits = UART_STOP_BITS_1;
    uart_config.flow_ctrl = UART_HW_FLOWCTRL_DISABLE;
    uart_config.rx_flow_ctrl_thresh = 0;
    uart_config.source_clk = UART_SCLK_DEFAULT;
    uart_config.flags.backup_before_sleep = 0;

    uart_driver_install(
        _uartNum,
        HS_UART_RX_BUF_SIZE,
        HS_UART_TX_BUF_SIZE,
        HS_UART_QUEUE_SIZE,
        &uart_queue,
        ESP_INTR_FLAG_IRAM
    );
    uart_param_config(_uartNum, &uart_config);
    uart_set_pin(_uartNum, tx, rx, rts, UART_PIN_NO_CHANGE);
    uart_set_mode(_uartNum, UART_MODE_RS485_HALF_DUPLEX);
    uart_set_always_rx_timeout(_uartNum, 1);
    uart_set_rx_timeout(_uartNum, byteTimeout);

    worker.offloadToIRAM(
        std::bind(&HsUart::uart_event_task, this),
        3048, 23, "HsUart"
    );
}

void IRAM_ATTR HsUart::handlePacket(uint8_t *data, uint16_t length) {
    if (length < HS_UART_MIN_DATA_SIZE) {
        ESP_LOGE(HS_UART_TAG, "Invalid data length: %d", length);
        //utils.printRawPacket("Raw packet: ", data, length);
        emitErr(HS_UART_LENGTH_ERROR);
        return;
    }

    bool crcOK = isCRCValid(data, length);
    if( !crcOK ){
        ESP_LOGE(HS_UART_TAG, "CRC error");
        //utils.printRawPacket("Raw packet: ", data, length);
        emitErr(HS_UART_CRC_ERROR);
        return;
    }

    bool gotErrorByte = hasErrorByte(data);
    if( gotErrorByte ){
        ESP_LOGE(HS_UART_TAG, "Got error byte: %d", data[1]);
        //utils.printRawPacket("Raw packet: ", data, length);
        emitErr(getErrorCode(data));
        return;
    }

    bool dataLenValid = isDataLenValid(data, length);
    if( !dataLenValid ){
        ESP_LOGE(HS_UART_TAG, "Invalid register length");
        //utils.printRawPacket("Raw packet: ", data, length);
        emitErr(HS_UART_LENGTH_ERROR);
        return;
    }

    if (_packetCB) {
        _packetCB(data, length);
    }
    uart_flush_input(_uartNum);
}

void HsUart::uart_event_task() {
    uart_event_t event;    
    while (true) {
        if (xQueueReceive(uart_queue, (void *)&event, portMAX_DELAY)) {
            if(event.type == UART_DATA){
                handleDataEvent(&event);
            }else if(event.type == UART_BUFFER_FULL){
                emitErr(HS_UART_BUFFER_FULL_ERROR);
            }else if(event.type == UART_FIFO_OVF){
                emitErr(HS_UART_FIFO_OVF_ERROR);
            }else if(event.type == UART_FRAME_ERR){
                emitErr(HS_UART_FRAME_ERROR);
            }else if( event.type == UART_PARITY_ERR){
                emitErr(HS_UART_PARITY_ERROR);
            }else if(event.type == UART_DATA_BREAK){
                emitErr(HS_UART_DATA_BREAK_ERROR);
            }else if(event.type == UART_PATTERN_DET){
                emitErr(HS_UART_PATTERN_ERROR);
            }else if( event.type == UART_BREAK ){
                uart_flush_input(_uartNum);
            }
        }
    }
}

void IRAM_ATTR HsUart::handleDataEvent(uart_event_t *event) {
    uint8_t* buf = static_cast<uint8_t*>(heap_caps_malloc(event->size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT));
    if (buf != nullptr) {
        int len = uart_read_bytes(_uartNum, buf, event->size, portMAX_DELAY);
        handlePacket(buf, len);
        heap_caps_free(buf);
    }
    if(!event->timeout_flag){
        ESP_LOGW(HS_UART_TAG, "RX FIFO overflow data event.");
    }
}
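
The isCRCValid helper called in handlePacket is not shown in the issue. Below is a minimal sketch of what such a check might look like, assuming the bus carries standard Modbus RTU frames (CRC-16 with initial value 0xFFFF, reflected polynomial 0xA001, low byte sent first). The names modbusCrc16 and isCrcValidSketch are hypothetical and are not taken from the author's code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: CRC-16/MODBUS (init 0xFFFF, reflected polynomial 0xA001).
uint16_t modbusCrc16(const uint8_t *data, size_t length) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1) ? static_cast<uint16_t>((crc >> 1) ^ 0xA001)
                            : static_cast<uint16_t>(crc >> 1);
        }
    }
    return crc;
}

// Hypothetical stand-in for isCRCValid(): the last two bytes of the frame
// are assumed to carry the CRC, low byte first.
bool isCrcValidSketch(const uint8_t *frame, size_t length) {
    if (length < 4) return false;  // address + function + 2 CRC bytes at minimum
    const uint16_t expected = modbusCrc16(frame, length - 2);
    const uint16_t received =
        static_cast<uint16_t>(frame[length - 2] | (frame[length - 1] << 8));
    return expected == received;
}
```

A check like this only says the bytes handed to it are not a valid frame. It cannot tell a corrupted frame from a complete frame that was split across two UART_DATA events, so a split frame would also show up as "CRC error" or "Invalid data length" in the log.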

Debug Logs.

E (586659) HS_UART: CRC error

E (587738) HS_UART: CRC error

E (591466) HS_UART: Invalid data length: 4

E (592511) HS_UART: CRC error

E (594114) HS_UART: Invalid data length: 4

E (594658) HS_UART: Invalid data length: 4

E (594666) HS_UART: CRC error

E (596839) HS_UART: CRC error

E (601036) HS_UART: CRC error

Diagnostic report archive.

No response

More Information.

This is how I use it

#define MBUS_BAUD 115200
#define MBUS_RX 35
#define MBUS_TX 32
#define MBUS_RTS 33
#define MBUS_RX_TIMEOUT 2 // I even tried with 127. Did not help.

void Modbus::init() {
    hshUart.init(MBUS_BAUD, MBUS_RX,MBUS_TX, MBUS_RTS, MBUS_RX_TIMEOUT);
    hshUart.onPacket( std::bind(&Modbus::handlePacket, this, std::placeholders::_1, std::placeholders::_2) );
    hshUart.onError( std::bind(&Modbus::handleError, this, std::placeholders::_1) );
}

void IRAM_ATTR Modbus::handlePacket(uint8_t *data, uint16_t length) {
    lastPacket.error = HsUartErrorType::HS_UART_NO_ERROR;
    lastPacket.packetLen = length;
    memcpy(lastPacket.packet, data, length);

    if( !isScanning ){
        eventHandler.postEvent(
            MBUS_EVENTS,
            MBUS_EVENT_TYPE::GOT_PACKET,
            (void*)&lastPacket,
            sizeof(lastPacket)
        );
    }else{
        parseScanPacket(lastPacket.packet, lastPacket.packetLen);
    }
}


void IRAM_ATTR Modbus::handleError(HsUartErrorType error) {
    //ESP_LOGE(MBUS_DEBUG_TAG, "UART error: %d", error);
    lastPacket.error = error;
    lastPacket.packetLen = 0;
    eventHandler.postEvent(
        MBUS_EVENTS,
        MBUS_EVENT_TYPE::PACKET_ERROR,
        (void*)&lastPacket,
        sizeof(lastPacket)
    );
}

I have confirmed with an oscilloscope that the data is OK. There is no inter-byte delay. The line is properly terminated with resistors. We are using an RS485 chip. If I set SYMBOL_TIMEOUT to 1, the UART even splits the packets at every 0xFF byte. If I set SYMBOL_TIMEOUT higher than 1, it does not split on 0xFF bytes, but we get truncated packets at random intervals and at random bytes.

What we have tried without resolution

  • Set the symbol timeout to values between 2 and 127
  • Put the UART and every related function and buffer in IRAM (even set the UART driver in menuconfig to use IRAM)
  • Check uart_event_t's timeout_flag and assemble the packet ourselves (timeout_flag is always true when we get a data event)
  • Flush input on every RX packet
  • Flush input before every write
  • Set heap_caps_malloc_extmem_enable to values between 128 bytes and 4 KB
  • Disable the UART interrupt in our code while flash access is happening (uart_disable_rx_intr(UART_NUM_2); before and uart_enable_rx_intr(UART_NUM_2); after the flash access)
@zekageri zekageri added the Type: Bug bugs in IDF label Mar 26, 2025
@github-actions github-actions bot changed the title UART truncated packets UART truncated packets (IDFGH-14948) Mar 26, 2025
@espressif-bot espressif-bot added the Status: Opened Issue is new label Mar 26, 2025
@songruo
Collaborator

songruo commented Mar 27, 2025

It is unpredictable how much data is available each time a UART_DATA event occurs; it will not be a fixed amount every time. From your code, it looks like you know how much data to expect for one transaction, so you should read that specific amount each time rather than rely on the UART_DATA event. You can take uart_async_rxtxtasks as an example.
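
The idea of reading by expected length instead of by event boundary can be sketched on the host, without the driver. The snippet below is an illustration only: FrameAccumulator, LengthFn and FrameFn are hypothetical names, and on the device the chunks passed to feed() would be whatever uart_read_bytes returned for one event.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical accumulator: collects chunks of any size and emits complete frames.
class FrameAccumulator {
public:
    // Returns the total frame length once enough header bytes are buffered,
    // or 0 while the length cannot be determined yet.
    using LengthFn = std::function<size_t(const std::vector<uint8_t> &)>;
    using FrameFn = std::function<void(const std::vector<uint8_t> &)>;

    FrameAccumulator(LengthFn lengthOf, FrameFn onFrame)
        : lengthOf_(std::move(lengthOf)), onFrame_(std::move(onFrame)) {}

    // Feed whatever one read returned; chunk boundaries carry no meaning here.
    void feed(const uint8_t *data, size_t len) {
        buffer_.insert(buffer_.end(), data, data + len);
        for (;;) {
            const size_t need = lengthOf_(buffer_);
            if (need == 0 || buffer_.size() < need) break;  // frame still incomplete
            std::vector<uint8_t> frame(buffer_.begin(), buffer_.begin() + need);
            buffer_.erase(buffer_.begin(), buffer_.begin() + need);
            onFrame_(frame);
        }
    }

    void reset() { buffer_.clear(); }              // e.g. before sending a new request
    size_t pending() const { return buffer_.size(); }

private:
    LengthFn lengthOf_;
    FrameFn onFrame_;
    std::vector<uint8_t> buffer_;
};
```

With this shape, a frame that arrives split over two or three data events is still delivered once, as a whole, and validation (CRC, length) runs only on complete frames.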

@zekageri
Author

I have modified my approach and now check the tx packets as they come in. The responses contain the register counts, so I can determine how many bytes will come from the first 3 bytes. This works much better, but I notice that I sometimes lose the first two or even three bytes. This is because the slave starts to answer as soon as it processes the packet, but the ESP32 has not yet toggled the RTS pin, so the RS485 chip is still in transmit mode. uart_write_bytes does not toggle the RTS pin soon enough. Can I speed up the RTS pin toggling after all bytes have been sent out? The write method now looks like this:

int HsUart::write(uint8_t *data, uint16_t length) {
    int crc16 = getCRC(data, length - 2);
    data[length - 2] = lowByte(crc16);
    data[length - 1] = highByte(crc16);
    internalRxBufLen = 0;
    outgoingFunctionCode = data[1];
    return uart_write_bytes(_uartNum, data, length);
}

I will measure the RTS pin with an oscilloscope soon, but I have read that it can take up to 30 µs for the driver to toggle the RTS pin. This is unacceptable.

https://esp32.com/viewtopic.php?t=22980
https://www.esp32.com/viewtopic.php?t=23860

@songruo
Collaborator

songruo commented Mar 28, 2025

Your slave device is supposed to wait for the RTS line to be high before it answers. This is how RS485_HALF_DUPLEX mode should work. Please check why your slave device answers before the RTS line is set high. Is it really in RS485 half-duplex mode?

BTW, can you close your other issue (#15668)? We can discuss it in this post.

@zekageri
Author

We try to squeeze every bit of speed out of the devices, so they answer as soon as they get the message.
The question is: why does the ESP need so much time to set the RTS pin after the UART data has been sent out?

@songruo
Collaborator

songruo commented Mar 31, 2025

The RTS line is set in the ISR when the TX_DONE interrupt is raised. 30 µs doesn't really seem to be an unreasonable response time for an ESP-IDF interrupt. Maybe you can try enabling the UART_ISR_IN_IRAM Kconfig option in menuconfig. If quite a few other peripherals are running at the same time in your program, you can also try raising the interrupt priority to ESP_INTR_FLAG_LEVEL3 in the call to uart_driver_install.

@zekageri
Author

This is already set in menuconfig

#
# ESP-Driver:UART Configurations
#
CONFIG_UART_ISR_IN_IRAM=y
# end of ESP-Driver:UART Configurations

Now I'm trying

uart_driver_install(
    _uartNum,
    HS_UART_RX_BUF_SIZE * 2,
    0,HS_UART_QUEUE_SIZE,
    &uart_queue,
    ESP_INTR_FLAG_LEVEL3
);

@zekageri
Author

Still the first few bytes are missing but not every time, just sometimes. Frustrating...

@zekageri
Author

void HsUart::uart_event_task() {
    uart_event_t event;
    while (true) {
        if (xQueueReceive(uart_queue, (void *)&event, portMAX_DELAY)) {
            if (event.type == UART_DATA) {
                handleDataEvent(&event);
            } else if (event.type == UART_BUFFER_FULL) {
                emitErr(HS_UART_BUFFER_FULL_ERROR);
            } else if (event.type == UART_FIFO_OVF) {
                emitErr(HS_UART_FIFO_OVF_ERROR);
            } else if (event.type == UART_FRAME_ERR) {
                emitErr(HS_UART_FRAME_ERROR);
            } else if (event.type == UART_PARITY_ERR) {
                emitErr(HS_UART_PARITY_ERROR);
            } else if (event.type == UART_DATA_BREAK) {
                emitErr(HS_UART_DATA_BREAK_ERROR);
            } else if (event.type == UART_PATTERN_DET) {
                emitErr(HS_UART_PATTERN_ERROR);
            } else if (event.type == UART_BREAK) {
                uart_flush_input(_uartNum);
            }
        }
    }
}


void HsUart::handleDataEvent(uart_event_t *event) {
    uint8_t tempBuf[event->size];
    int len = uart_read_bytes(_uartNum, tempBuf, event->size, portMAX_DELAY);
    if (len <= 0) return;

    // Add to internal buffer
    if (internalRxBufLen + len > sizeof(internalRxBuf)) {
        emitErr(HS_UART_BUFFER_FULL_ERROR);
        internalRxBufLen = 0;
        return;
    }

    memcpy(internalRxBuf + internalRxBufLen, tempBuf, len);
    internalRxBufLen += len;

    int expectedLen = expectedPacketLength(internalRxBuf, internalRxBufLen);
    if (expectedLen > 0 && internalRxBufLen >= expectedLen) {
        // validate the packet if we got the expected bytes
        handlePacket(internalRxBuf, expectedLen);
        internalRxBufLen = 0;
    }else if(event->timeout_flag){
        // validate the packet also on timeout.
        handlePacket(internalRxBuf, internalRxBufLen);
        internalRxBufLen = 0;
    }
}
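
The expectedPacketLength function called above is not shown in the thread. Here is a minimal sketch of how the length might be derived from the first bytes, assuming standard Modbus RTU response layouts; expectedPacketLengthSketch is a hypothetical name, and the set of function codes handled is an assumption, not the author's implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of expectedPacketLength(), assuming standard Modbus RTU
// response layouts. Returns the total frame length in bytes (CRC included),
// or 0 when the length cannot be determined yet or the function code is not handled.
int expectedPacketLengthSketch(const uint8_t *buf, size_t len) {
    if (len < 2) return 0;  // need at least address + function code
    const uint8_t function = buf[1];

    // Exception response: address, function | 0x80, exception code, CRC lo, CRC hi.
    if (function & 0x80) return 5;

    switch (function) {
        case 0x01:  // read coils
        case 0x02:  // read discrete inputs
        case 0x03:  // read holding registers
        case 0x04:  // read input registers
            if (len < 3) return 0;   // byte count not received yet
            return 3 + buf[2] + 2;   // header + data bytes + CRC
        case 0x05:  // write single coil
        case 0x06:  // write single register
        case 0x0F:  // write multiple coils
        case 0x10:  // write multiple registers
            return 8;                // fixed-size response
        default:
            return 0;                // unknown function: fall back to the timeout path
    }
}
```

Returning 0 for anything unrecognised keeps the timeout_flag branch in handleDataEvent as the fallback, so an unexpected function code does not leave bytes stuck in the internal buffer.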

@songruo
Collaborator

songruo commented Mar 31, 2025

Well, I guess the major problem is that you are not following RS485 half duplex mode, since your other device sends before RTS line gets set. In our TX_DONE ISR handling, we do reset the RX FIFO and then set the RTS high if the mode is RS485 half duplex.

Maybe you can try our other RS485 modes, UART_MODE_RS485_COLLISION_DETECT or UART_MODE_RS485_APP_CTRL. Some documentation for your reference: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/uart.html#overview-of-rs485-specific-communication-0ptions.

@zekageri
Author

Okay, I have tried everything, but then I noticed something. I have a continuous websocket sending task which consumes its own queue to send frames to a server. When I comment out esp_websocket_client_send_text, the UART errors are gone. Completely.
How is this possible?

highPriorityQueue = xQueueCreateWithCaps(20, sizeof(WsMessage), MALLOC_CAP_SPIRAM);
lowPriorityQueue = xQueueCreateWithCaps(128, sizeof(WsMessage), MALLOC_CAP_SPIRAM);

void ServerBridge::bridgeSender_task() {
    WsMessage wsMessage;
    while (1) {
        if (xQueueReceive(serverBridge.highPriorityQueue, &wsMessage, pdMS_TO_TICKS(10)) == pdTRUE) {
            serverBridge.sendWsMessage(wsMessage);
        }
        if (xQueueReceive(serverBridge.lowPriorityQueue, &wsMessage, pdMS_TO_TICKS(10)) == pdTRUE) {
            serverBridge.sendWsMessage(wsMessage);
        }
        vTaskDelay(pdMS_TO_TICKS(2));
    }
}


void ServerBridge::sendWsMessage(WsMessage& wsMessage) {
    // When I remove the comments from this if statement, the uart errors are reappearing.
    /*if (isConnected()) {
        // This causes the uart truncation.
        int err = esp_websocket_client_send_text(
            serverBridge.client,
            wsMessage.message,
            wsMessage.size,
            pdMS_TO_TICKS(1000)
        );

        if (err == -1) {
            ESP_LOGE(SERVER_BRIDGE_DEBUG_TAG, "Failed to send message");
        }
    }*/

    // Free memory allocated for the message
    if (wsMessage.message != nullptr) {
        free(wsMessage.message);
        wsMessage.message = nullptr;
    }
}

It also seems to me that any kind of websocket code causes an altered UART frame (server reconnection, etc.).

@songruo
Collaborator

songruo commented Mar 31, 2025

Yeah, it makes sense. Any use of websocket could lead to the delay in handling the UART ISR. Since you are on ESP32, which has two cores, you can install the UART driver on a different core than your websocket service.

@zekageri
Author

O.o Does it make sense? How? Can you explain, please?

@songruo
Collaborator

songruo commented Mar 31, 2025

I think your UART and Ethernet interrupts were both registered on core 0. Say both peripheral interrupts are triggered at almost the same time and core 0 processes the Ethernet interrupt first: the UART interrupt then has to wait until the Ethernet ISR exits before it is handled. This is why "Any use of websocket could lead to the delay in handling the UART ISR". If you register the UART interrupt on core 1, then core 0 can handle the Ethernet ISR while core 1 handles the UART ISR.

You can use xTaskCreatePinnedToCore to pin the UART task (including uart_driver_install, which registers UART interrupt) to core 1.

@zekageri
Author

So it's not just websockets but any Ethernet interrupt? Even Wi-Fi? So I need to pin all network-related code to core 0 and the UART to core 1?

@songruo
Collaborator

songruo commented Mar 31, 2025

Exactly! 💡

@zekageri
Author

Thank you very much for the help. I really appreciate it!
I will look into this!

@zekageri
Author

zekageri commented Apr 1, 2025

Okay, confirmed. It seems that any kind of flash or SPI access will disturb the UART timing if it happens on the same core the UART driver is installed on. I had a couple of unexplained things going on before, and this cleared up a bunch of them.

  • I can't put a task's stack in PSRAM if that task is going to access flash (both use the same SPI bus). I don't quite understand that one, but I accept it.
  • I can't do any flash operation (not even a read) on the same core as the UART driver, because the flash access will disable interrupts for that core.
  • I can't run any network-related event on the same core as the UART driver because of the interrupt priorities (for example, websocket code).

Thank you very much for the help.

@zekageri zekageri closed this as completed Apr 1, 2025