Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Random LoadProhibited when publishing command callback #50

Closed
law-ko opened this issue Sep 25, 2023 · 7 comments
Closed

[BUG] Random LoadProhibited when publishing command callback #50

law-ko opened this issue Sep 25, 2023 · 7 comments

Comments

@law-ko
Copy link

law-ko commented Sep 25, 2023

The crash will occur after around 20000000 ticks, and happens when trying to send a new publish to AWS IoT. The program is set to publish to AWS IoT every 30 seconds.

The error occurs when performing xTaskNotify, which is coming from the example function provided in this repo.

System information

  • Hardware board: ESP32-PICO-MINI
  • Operating System: Linux

Expected behavior
A clear description of the expected behavior.

Screenshots or console output

I (21102850) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (21102850) coreMQTT: State record updated. New state=MQTTPublishDone.
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x400964de  PS      : 0x00060f33  A0      : 0x800e1b51  A1      : 0x3ffe7170  
0x400964de: xTaskGenericNotify at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5718

A2      : 0x0000cdcd  A3      : 0x3ffbdc38  A4      : 0x80096c4c  A5      : 0x00000003  
A6      : 0x00000000  A7      : 0x00000031  A8      : 0x800964ce  A9      : 0x0000cecd  
A10     : 0x00000001  A11     : 0x3ffc3638  A12     : 0x00060f20  A13     : 0x00060f23  
A14     : 0x0000000f  A15     : 0x0000cdcd  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x0000cf21  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffc  


Backtrace: 0x400964db:0x3ffe7170 0x400e1b4e:0x3ffe7190 0x400e7919:0x3ffe71b0 0x400e7b50:0x3ffe71f0 0x400e9a11:0x3ffe7210 0x400e9b51:0x3ffe7290 0x400e7e69:0x3ffe72b0 0x400dbea8:0x3ffe7300 0x40096c49:0x3ffe7320
0x400964db: xTaskGenericNotify at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5718

0x400e1b4e: prvPublishCommandCallback at /Users/test/esp-git/esp-matter/examples/im/outlet/main/aws_iot/fleet_prov_by_claim_demo.c:380

0x400e7919: concludeCommand at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:808

0x400e7b50: handleAcks at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:638
 (inlined by) mqttEventCallback at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:700

0x400e9a11: handlePublishAcks at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1555
 (inlined by) handleIncomingAck at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1601
 (inlined by) receiveSingleIteration at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:1756

0x400e9b51: MQTT_ProcessLoop at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT/coreMQTT/source/core_mqtt.c:3065

0x400e7e69: processCommand at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:611
 (inlined by) MQTTAgent_CommandLoop at /Users/test/esp-git/esp-matter/libraries/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059

0x400dbea8: prvMQTTAgentTask at /Users/test/esp-git/esp-matter/examples/im/outlet/main/aws_iot/networking/mqtt/core_mqtt_agent_manager.c:521

0x40096c49: vPortTaskWrapper at /Users/test/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:154
static void prvPublishCommandCallback( MQTTAgentCommandContext_t * pxCommandContext,
                                       MQTTAgentReturnInfo_t * pxReturnInfo )
{
    /* Store the result in the application defined context so the task that
     * initiated the publish can check the operation's status. */
    pxCommandContext->xReturnStatus = pxReturnInfo->returnCode;

    if( pxCommandContext->xTaskToNotify != NULL )
    {
        /* Send the context's ulNotificationValue as the notification value so
         * the receiving task can check the value it set in the context matches
         * the value it receives in the notification. */
        xTaskNotify( pxCommandContext->xTaskToNotify,
                     pxCommandContext->ulNotificationValue,
                     eSetValueWithOverwrite ); # LINE 380
    }
}

static void prvPublishToTopic( MQTTQoS_t xQoS,
                               char * pcTopicName,
                               char * pcPayload )
{
    uint32_t ulPublishMessageId, ulNotifiedValue = 0;

    MQTTStatus_t xCommandAdded;
    BaseType_t xCommandAcknowledged = pdFALSE;

    MQTTPublishInfo_t xPublishInfo = { 0 };

    MQTTAgentCommandContext_t xCommandContext = { 0 };
    MQTTAgentCommandInfo_t xCommandParams = { 0 };

    xTaskNotifyStateClear( NULL );

    /* Create a unique number for the publish that is about to be sent.
     * This number is used in the command context and is sent back to this task
     * as a notification in the callback that's executed upon receipt of the
     * publish from coreMQTT-Agent.
     * That way this task can match an acknowledgment to the message being sent.
     */
    xSemaphoreTake( xMessageIdSemaphore, portMAX_DELAY );
    {
        ++ulMessageId;
        ulPublishMessageId = ulMessageId;
    }
    xSemaphoreGive( xMessageIdSemaphore );

    /* Configure the publish operation. The topic name string must persist for
     * duration of publish! */
    xPublishInfo.qos = xQoS;
    xPublishInfo.pTopicName = pcTopicName;
    xPublishInfo.topicNameLength = ( uint16_t ) strlen( pcTopicName );
    xPublishInfo.pPayload = pcPayload;
    xPublishInfo.payloadLength = ( uint16_t ) strlen( pcPayload );

    /* Complete an application defined context associated with this publish
     * message.
     * This gets updated in the callback function so the variable must persist
     * until the callback executes. */
    xCommandContext.ulNotificationValue = ulPublishMessageId;
    xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();

    xCommandParams.blockTimeMs = fleetprovbyclaimconfigMAX_COMMAND_SEND_BLOCK_TIME_MS;
    xCommandParams.cmdCompleteCallback = prvPublishCommandCallback;
    xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;

    do
    {
        /* Wait for coreMQTT-Agent task to have working network connection and
         * not be performing an OTA update. */
        xEventGroupWaitBits( xNetworkEventGroup,
                             CORE_MQTT_AGENT_CONNECTED_BIT | CORE_MQTT_AGENT_OTA_NOT_IN_PROGRESS_BIT,
                             pdFALSE,
                             pdTRUE,
                             portMAX_DELAY );

        ESP_LOGI( TAG,
                  "Sending publish request to coreMQTT-Agent with message \"%s\" on topic \"%s\" with ID %ld.",
                  pcPayload,
                  pcTopicName,
                  ulPublishMessageId );

        /* To ensure ulNotification doesn't accidentally hold the expected value
         * as it is to be checked against the value sent from the callback.. */
        ulNotifiedValue = ~ulPublishMessageId;

        xCommandAcknowledged = pdFALSE;

        xCommandAdded = MQTTAgent_Publish( &xGlobalMqttAgentContext,
                                           &xPublishInfo,
                                           &xCommandParams );

        if( xCommandAdded == MQTTSuccess )
        {
            /* For QoS 1 and 2, wait for the publish acknowledgment.  For QoS0,
             * wait for the publish to be sent. */
            ESP_LOGI( TAG,
                      "Waiting for publish %ld to complete.",
                      ulPublishMessageId );

            xCommandAcknowledged = prvWaitForNotification( &ulNotifiedValue );
        }
        else
        {
            ESP_LOGE( TAG,
                      "Failed to enqueue publish command. Error code=%s",
                      MQTT_Status_strerror( xCommandAdded ) );
        }

        /* Check all ways the status was passed back just for demonstration
         * purposes. */
        if( ( xCommandAcknowledged != pdTRUE ) ||
            ( xCommandContext.xReturnStatus != MQTTSuccess ) ||
            ( ulNotifiedValue != ulPublishMessageId ) )
        {
            ESP_LOGW( TAG,
                      "Error or timed out waiting for ack for publish message %ld. Re-attempting publish.",
                      ulPublishMessageId );
        }
        else
        {
            ESP_LOGI( TAG,
                      "Publish %ld succeeded.",
                      ulPublishMessageId );
        }
		// @note newly added
		/* Add a little randomness into the delay so the tasks don't remain
         * in lockstep. */
        vTaskDelay( pdMS_TO_TICKS( 125 ) +
                        ( rand() % 0xff ) );

    } while( ( xCommandAcknowledged != pdTRUE ) ||
             ( xCommandContext.xReturnStatus != MQTTSuccess ) ||
             ( ulNotifiedValue != ulPublishMessageId ) );
}

Thank you.

@chinglee-iot
Copy link
Member

Thank you for reporting this problem. We will look into this problem and discuss with you here.

@chinglee-iot
Copy link
Member

@law-ko
Can you help to provide the following information?

  • Your esp-idf version
  • Your iot-reference-esp32c3 tag or commit
  • The application code which calls prvPublishToTopic

@law-ko
Copy link
Author

law-ko commented Sep 27, 2023

@chinglee-iot

Your esp-idf version: v5.0.1
Your iot-reference-esp32c3 tag or commit: latest
The application code which calls prvPublishToTopic:

char *jsonString = cJSON_PrintUnformatted(rootJSON);
prvPublishToTopic( xQoS, pcPubTopicBuffer, (char *) jsonString );

We have adjusted the code into our app that work with AWS IoT with desired and reported style. This is uncommon error to see and looking deeper into the code we don't know what was causing this.

@chinglee-iot
Copy link
Member

chinglee-iot commented Oct 2, 2023

The exception you shared indicates a LoadProhibited exception in the tasks.c. From the tasks.c in v5.0.1, the xTaskToNotify is passed by parameters in prvPublishCommandCallback. It looks like xTaskToNotify points to an invalid address.

    BaseType_t xTaskGenericNotify( TaskHandle_t xTaskToNotify,
                                   UBaseType_t uxIndexToNotify,
                                   uint32_t ulValue,
                                   eNotifyAction eAction,
                                   uint32_t * pulPreviousNotificationValue )
    {
            ...
            ucOriginalNotifyState = pxTCB->ucNotifyState[ uxIndexToNotify ]; /* This line has LoadProhibited exception. */

xTaskNotify(xTaskGenericNotify) is called in the prvPublishCommandCallback and is a member variable of input parameter pxCommandContext.
pxCommandContext points to a local variable xCommandContext in prvPublishToTopic. Therefore, it is on the stack of the task which calls prvPublishToTopic.

static void prvPublishToTopic( MQTTQoS_t xQoS,
                               char * pcTopicName,
                               char * pcPayload )
{
...
    MQTTAgentCommandInfo_t xCommandParams = { 0 };
...
    xCommandContext.ulNotificationValue = ulPublishMessageId;
    xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();

    xCommandParams.blockTimeMs = subpubunsubconfigMAX_COMMAND_SEND_BLOCK_TIME_MS;
    xCommandParams.cmdCompleteCallback = prvPublishCommandCallback;
    xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;

Base on these information, we can troubleshoot the following possible causes:

  1. Stack overflow
    Local variable xCommandContext in prvPublishToTopic may be corrupted if there is stack overflow. FreeRTOS supports stack overflow detection mechanism. You can reference this page to enable stack overflow checking.

  2. Use after free
    The prvPublishCommandCallback is called from another task. If the task calls prvPublishToTopic exited, the stack memory will be used by other task. The pxCommandContext may point to memory modified by other task. We would like to suggest to check the status of the task which calls prvPublishToTopic.

We also suggest you to run the sub_pub_unsub_demo only on your platform to see if this problem can be reproduced and feedback the result. This can help us to narrow down the problem.

@law-ko
Copy link
Author

law-ko commented Nov 18, 2023

@chinglee-iot Would it be possible that this crash is related to the MQTT_ProcessLoop being blocked like xEventGroupWaitBit?

@chinglee-iot
Copy link
Member

@law-ko
MQTT_ProcessLoop is blocked by transport interface receive function in receiveSingleIteration and is only called in the prvMQTTAgentTask.

static MQTTStatus_t receiveSingleIteration( MQTTContext_t * pContext,
                                            bool manageKeepAlive )
{
    ....
    recvBytes = pContext->transportInterface.recv( pContext->transportInterface.pNetworkContext,
                                                   &( pContext->networkBuffer.pBuffer[ pContext->index ] ),
                                                   pContext->networkBuffer.size - pContext->index );

Could you help to provide the xEventGroupWaitBits call in question so we can discuss it with you here?

@paulbartell
Copy link
Contributor

Closing this issue due to inactivity. Feel free to re-open the issue if it persists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants