Multi SPI configuration causes occasional corrupted data transfer (IDFGH-11187) #12354
Comments
@lilalaunestift thanks for nice detailed report. Edit: I see, it's actually 5 MHz... What is SPI CLK of slave? |
Hi, the clock of the slave SPI is running with 4.096MHz. |
One more question...
Do you mean just solely Ethernet or Ethernet and the SPI slave? |
I moved only the Ethernet to the dm9051 example. |
@lilalaunestift thank you for the report, we try to reproduce and let you know. |
Hey @kostaond, |
Hi @lilalaunestift, my colleague is trying to reproduce it. However, he hasn't been able to reproduce it yet... |
@lilalaunestift how long are expected transactions at SPI slave? If they are less than or equal to 32B, could you please try to disable DMA at the slave interface? |
We haven't been able to reproduce. We've based our setup on SPI Slave example. We have two ESP32's - one as master and one slave with DM9051. Could you please provide more information about your SPI slave code? |
Care to elaborate on that? |
The transactions are longer than 32B. Disabling the DMA is unfortunately not an option here.
Just for clarification: are you also using a setup where the ESP32 is master and slave at the same time (see picture)? |
Yes, we added the DM9051 to the SPI Slave example. Could you please provide more details about your setup? For example, slave code, what is the traffic (size, period), etc. |
Hey, sorry for the delay. Regarding our slave code:
Sending and receiving are done sequentially. So while the ESP slave is sending, the master is only receiving and vice versa.
#include "../inc/Clock.h"
#include "../inc/Crc16.h"
#include "driver/gpio.h"
#include "driver/spi_slave.h"
#include "esp_intr_alloc.h"
static void Msp_postSetupCb(spi_slave_transaction_t* pTrans);
static void Msp_postTransCb(spi_slave_transaction_t* pTrans);
uint8_t acSnd[268];
typedef struct SMsp
{
spi_slave_transaction_t t0;
uint32_t nCpuTick;
}
Msp_t;
Msp_t oMsp;
esp_err_t Msp_init(uint32_t nMaxLen)
{
esp_err_t esp_err;
memset(&acSnd, 0xff, sizeof(acSnd));
memset(&oMsp, 0x00, sizeof(Msp_t));
// Configuration for the SPI bus
spi_bus_config_t buscfg =
{
.mosi_io_num = SPI_MSP_MOSI_PIN,
.miso_io_num = SPI_MSP_MISO_PIN,
.sclk_io_num = SPI_MSP_CLK_PIN,
.quadwp_io_num = -1,
.quadhd_io_num = -1,
.max_transfer_sz = nMaxLen,
.flags = SPICOMMON_BUSFLAG_SLAVE,
.intr_flags = ESP_INTR_FLAG_LOWMED
};
// Configuration for the SPI slave interface
spi_slave_interface_config_t slvcfg =
{
.spics_io_num = SPI_MSP_nCS_PIN,
.flags = 0,
.queue_size = 2, // at least 2
.mode = 1,
.post_setup_cb = &Msp_postSetupCb,
.post_trans_cb = &Msp_postTransCb
};
//Initialize SPI slave interface
esp_err = spi_slave_initialize(MSP_HOST, &buscfg, &slvcfg, MSP_DMA_CHAN);
assert(esp_err == ESP_OK);
return esp_err;
}
void Msp_transReady(void)
{
// ready to transmit
gpio_set_level(OUT_MSP_nINT_PIN, 0);
}
void Msp_writeBlock(uint8_t* acSndData, uint8_t* acRcvData, uint32_t nMaxLen)
{
esp_err_t esp_err;
oMsp.t0.length = nMaxLen << 3;
oMsp.t0.rx_buffer = acRcvData;
oMsp.t0.trans_len = 0;
oMsp.t0.tx_buffer = acSndData;
oMsp.t0.user = (void*)1;
esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
assert(esp_err == ESP_OK);
}
void Msp_readBlock(uint8_t* acRcvData, uint32_t nMaxLen)
{
esp_err_t esp_err;
oMsp.t0.length = nMaxLen << 3;
oMsp.t0.rx_buffer = acRcvData;
oMsp.t0.trans_len = 0;
oMsp.t0.tx_buffer = acSnd;
oMsp.t0.user = (void*)0;
esp_err = spi_slave_queue_trans(MSP_HOST, &oMsp.t0, 0);
assert(esp_err == ESP_OK);
}
void Msp_getTransResult(void)
{
esp_err_t esp_err;
spi_slave_transaction_t * pTrans = NULL;
esp_err = spi_slave_get_trans_result(MSP_HOST, &pTrans, 0);
esp_err = ESP_OK;
if (esp_err != ESP_OK)
return;
if ((uint32_t)pTrans->user != 0)
{
// not ready to transmit
gpio_set_level(OUT_MSP_nINT_PIN, 1);
}
if (pTrans->tx_buffer == acSnd)
pTrans->tx_buffer = NULL;
SioMsp_onEvTransComplete( pTrans->tx_buffer,
pTrans->rx_buffer,
pTrans->trans_len );
}
// called after a transaction is queued and ready for pickup by master.
static void IRAM_ATTR Msp_postSetupCb(spi_slave_transaction_t* pTrans)
{
// wait 1 µs if necessary! MSP must see this negative edge!
while (Clock_getCpuTicks() - oMsp.nCpuTick < eTick_tm1us);
gpio_set_level(OUT_MSP_nRDY_PIN, 0);
}
// called after transaction is sent/received.
static void IRAM_ATTR Msp_postTransCb(spi_slave_transaction_t* pTrans)
{
BaseType_t xHigherPriorityTaskWoken;
// not ready to receive or transmit
gpio_set_level(OUT_MSP_nRDY_PIN, 1);
oMsp.nCpuTick = Clock_getCpuTicks();
// call spi_slave_get_trans_result ...
xHigherPriorityTaskWoken = pdFALSE;
SioMsp_onPostTransFromISR(&xHigherPriorityTaskWoken);
if (xHigherPriorityTaskWoken == pdTRUE)
portYIELD_FROM_ISR();
}
Thanks and Greetings |
Hey @kostaond, assuming the problem is related to the DMA, is there anything I can do to track the issue down or provide additional debug information? The documentation does not give much information about the topic. So I don't really know where to start. |
@lilalaunestift sorry for not replying, I was busy with other tasks. However, the provided code still leaves room for uncertainty. We invested quite some time in the previous attempt using the modified SPI Slave example. Therefore I would much appreciate it if you could provide a fully functioning minimum project with which you are able to demonstrate the issue. We need to reproduce it on our side to move forward. I tried to discuss it with the team responsible for SPI and they indicated that the issue could be on the HW design side (PCB)... |
Ok, I will try to create a minimal project. I guess this will take some days till I find the time. I will let you know. |
Hey,
The mentioned additional pins for the SPI bus are not used in this minimum project (they are set to a fixed state and do not participate in the communication). If I now use 'ping' to send ICMP packages to the esp32, roughly 7-10% of the messages are lost or damaged. I'm still using the same setup regarding IDF and HW as mentioned in the beginning. Greetings. |
Hey, |
Hi @lilalaunestift, yes, we've given it a try but we are having some trouble. I'll get back to you once there is something to share. Please be patient. |
Ok, great. Thank you very much for the update. |
Hi @kostaond, |
Hi @lilalaunestift, we had issues with SPI master... In the end, I needed to implement it on a bare-metal SAM3S MCU to achieve a 4 ms period. Therefore it took some time to find appropriate hardware and prepare all the infrastructure and the test setup. Anyway, I was able to reproduce the issue with the minimum code example you provided. The good thing is that I probably found the root cause of the issue. Your Rx buffer is not 32-bit aligned:
typedef struct SData
{
uint8_t acData[258]; // !!!
}
Data_t;
The memory alignment is required by the DMA engine; otherwise the DMA may write incorrectly or not in a boundary-aligned manner. When I changed the Rx buffer size to 256B and transmitted the SPI message with the same size, there were no lost ping packets (I tried with
The problem is that the driver didn't report an error, as it should have when incorrect alignment was used. I've already reported this issue to SPI colleagues. |
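The alignment rule described in this comment can be checked before a buffer is handed to the driver. The following is a minimal sketch in plain C, assuming a 4-byte DMA word as stated above; `DMA_WORD_SIZE`, `dma_buffer_ok` and `PaddedData_t` are hypothetical names for illustration and are not part of ESP-IDF or of the reporter's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the DMA engine works on 32-bit words, as stated in the thread. */
#define DMA_WORD_SIZE 4u

/* True only if both the start address and the length are multiples of 4. */
static inline bool dma_buffer_ok(const void *buf, size_t len)
{
    return ((uintptr_t)buf % DMA_WORD_SIZE == 0u) &&
           (len % DMA_WORD_SIZE == 0u);
}

/* Hypothetical variant of Data_t: 258 payload bytes padded to 260 so that
 * the length is a word multiple, with the array forced onto a 4-byte
 * boundary. */
typedef struct
{
    _Alignas(4) uint8_t acData[260];
}
PaddedData_t;

/* Compile-time guard for the length part of the rule. */
static_assert(sizeof(PaddedData_t) % DMA_WORD_SIZE == 0,
              "buffer length must be a multiple of the DMA word size");
```

A runtime call such as `dma_buffer_ok(rx.acData, sizeof rx.acData)` placed before queuing a transaction would catch a 258-byte buffer like the one in the minimal example. In an ESP-IDF project, the `WORD_ALIGNED_ATTR` and `DMA_ATTR` attributes from `esp_attr.h` are the usual way to request this placement for static buffers.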
@kostaond Do the restrictions described on the linked page also apply to spi_master? |
Very good question, they apply. I'm not sure if check is correctly implemented in code though. I asked SPI team to double check. |
Hi @kostaond, |
Hi @kostaond, the non-word-aligned buffer you mentioned is something I introduced while creating the minimal example. Sorry for that. In our actual code the Data struct is only part of the bigger struct Frame_t, which acts as the receive buffer. But for simplification I removed the other part and only Data_t was left. Actually there are asserts that make sure the buffer is word aligned and has the correct length:
#pragma pack(push, 1)
typedef struct SFrame
{
union
{
Data_t Data;
Packet_t Packet;
};
uint16_t nLen;
struct
{
ESioAddr_t eSioDstPortAddr;
ESioAddr_t eSioSrcPortAddr;
};
}
Frame_t;
#pragma pack(pop)
// make sure that some properties hold:
_Static_assert(sizeof(Frame_t) == 268, "wrong Frame_t Size");
_Static_assert(sizeof(Frame_t) % 4 == 0, "Frame_t Array must be word aligned");
_Static_assert(sizeof(Data_t) == 258, "wrong Data_t Size");
_Static_assert(sizeof(Packet_t) < 258, "wrong Packet_t Size");
_Static_assert(OFFSET(Frame_t, Data) % 4 == 0, "wrong Data Offset");
_Static_assert(OFFSET(Frame_t, Packet) % 4 == 0, "wrong Packet Offset");
The actual call to Msp_readBlock looks like this:
...
static void SioMsp_rcvBuffer(Frame_t* pRcvFrame)
{
Msp_readBlock(&pRcvFrame->Data.acData[0], sizeof(Frame_t));
}
...
where sizeof(Frame_t) is applied as length to the spi_slave_transaction_t struct. Anyways, I tested your suggested changes with the provided minimal example and I got the following results:
The data transmitted by our SPI master is (in 98% of the cases) 204B in length. Can you confirm this behavior with your setup? By the way, I did the tests with: ping <ip> -c 100 -i 0.5 Greetings |
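A side note on the asserts quoted above, offered as a sketch and not as a diagnosis of this issue: under `#pragma pack(push, 1)` GCC and Clang give the struct type an alignment requirement of 1. The `sizeof(...) % 4` and offset checks therefore constrain only the layout inside the struct, not the address at which an instance ends up. The example below assumes a simplified stand-in for Frame_t; `PackedFrame_t`, `g_rxFrame` and `frame_is_word_aligned` are hypothetical names, and the two `uint32_t` fields merely stand in for ESioAddr_t, whose definition is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct
{
    uint8_t  acData[258];
    uint16_t nLen;
    uint32_t eDst;   /* hypothetical stand-in for eSioDstPortAddr */
    uint32_t eSrc;   /* hypothetical stand-in for eSioSrcPortAddr */
}
PackedFrame_t;
#pragma pack(pop)

/* Layout checks in the style of the thread: size is a word multiple. */
static_assert(sizeof(PackedFrame_t) == 268, "wrong frame size");
static_assert(sizeof(PackedFrame_t) % 4 == 0, "size must be a word multiple");

/* With pack(1) the type itself only asks for 1-byte alignment
 * (GCC/Clang behaviour), so the checks above say nothing about where
 * an object of this type is placed. */
static_assert(_Alignof(PackedFrame_t) == 1, "packed type has alignment 1");

/* Requesting the alignment on the object makes the address explicit. */
_Alignas(4) PackedFrame_t g_rxFrame;

/* Runtime check of the address that would be handed to the driver. */
static inline int frame_is_word_aligned(const PackedFrame_t *pFrame)
{
    return ((uintptr_t)pFrame->acData % 4u) == 0u;
}
```

Whether the real receive frames are already placed on word boundaries cannot be told from the thread; a check like `frame_is_word_aligned` at the call site would confirm it either way.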
@lilalaunestift if I set
This could be your workaround. However, something is probably wrong somewhere. I'll pass it to SPI team. My work is done here since it is beyond my specialization... I'm responsible for Ethernet... |
Hey @kostaond, I assume that the cause for the 1-2% losses I still observe in the case where the buffer is configured to 204B in length is, that some of our messages transmitted by the SPI master are shorter than 204B. So there are still some occasions where the size of the buffer and the message don't match. Anyways, thank you very much for your effort so far. I will then wait for some information from the SPI team. |
IDF SPI team hide in corner and scare a lot 🤣 |
@lilalaunestift I remember that, due to DMA HW architecture, for That means that if you use the ESP32 as a slave with DMA, you need to configure the slave rx buffer address and length to be word aligned; meanwhile, the master side should also pad the length it actually writes to a whole number of words.
However other chips after ⭐ |
Hey @wanckl, Changing to another type of the esp32 is not an option since the product is already in the market for two years with the esp32. |
@lilalaunestift yes, but you mentioned By the way, do you mean that even 5.0 works fine? |
Ok, sorry. I didn't get that you are talking about the length of the transmitted data.
oSioSpi.nSndCnt = (pPacket->Header.cTotLen + 3) & 0xfffc;
IDF5.0 is not tested. The issue occurred while updating the code from IDFv4.3 to IDFv5.1 (the latest version at that time). |
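The length rounding quoted in this comment can be written as a small helper that also fills the padding bytes. This is only a sketch of the idea; `pad_len_to_word` and `pad_message` are hypothetical names, and the 0xFF fill value is an assumption borrowed from the `memset(&acSnd, 0xff, ...)` in the slave code earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4. For 16-bit lengths this
 * is the same operation as (len + 3) & 0xfffc. */
static inline size_t pad_len_to_word(size_t len)
{
    return (len + 3u) & ~(size_t)3u;
}

/* Copy the payload into out and fill the padding with 0xFF.
 * Returns the padded length, or 0 if out is too small. */
static inline size_t pad_message(uint8_t *out, size_t outCap,
                                 const uint8_t *payload, size_t len)
{
    size_t padded = pad_len_to_word(len);

    if (padded > outCap)
        return 0;

    memcpy(out, payload, len);
    memset(out + len, 0xFF, padded - len);
    return padded;
}
```

Padding only rounds the transfer up to a whole number of words. It does not make every message as long as the slave's receive buffer, which is the mismatch discussed in the following comments.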
So now the issue is on the slave side, i.e. the slave sometimes can't receive correct data, right? The slave send direction and the master side are OK. Besides, I think the IDF SPI team is also going on Spring Festival holiday, so there may be no update for several days. |
It's even worse. It seems the slave transactions are OK but they somehow affect transmit side of SPI master (SPI Ethernet DM9051) which is connected to the other SPI interface. |
Details of broken transaction
Regarding the data corruption (see also one of the first comments):
In this example data should be transmitted to the ethernet controller. The same can be observed when data is received from the ethernet controller:
When is the transaction broken
As Kostaond says, the SPI slave communication is fine. It seems that the SPI Slave causes issues on the SPI master. In the attached pictures here you can see that the issue occurs when SPI master transmits/receives data and SPI slave receives data at the same time. The first part of the received data is correct. But when the transaction on the SPI slave is finished, the data corruption on the SPI master starts and I can observe the repeating pattern as shown above. And it seems that the issue on the SPI master only occurs when the received data on the SPI slave is shorter than the specified receive buffer size. |
Hi again, |
😢 A bit busy recently |
Hey @wanckl, is there any news on this topic? Updating the IDF is now becoming important for us since we need some of the new features. I updated to IDFv5.4 and I can still observe the described issue.
To summarize the issue: We are using two SPI buses: The HSPI is configured as master and communicates with a DM9051 board, while the VSPI is configured as slave and communicates with an MSP430i controller. With IDF4.3 everything was working fine. Since the update to IDFv5.0 (and higher) I can observe corrupted data on the HSPI master bus. The corruption happens when data transmission on both SPI buses occurs simultaneously. Following is an example of the data corruption (with IDF v5.4):
I updated the minimum example given earlier in this thread to the latest IDF Ethernet example: Tools.iConTrace.Firmware.Esp.zip
I try to receive and answer ICMP packages. Reading the dm9051 input buffer, I can verify that the package is received correctly:
In the transmit buffer of the dm9051 driver I can observe that the answer is passed to the driver correctly:
But when observing the data transmitted on the SPI bus (with a logic analyzer), I can see that the data is corrupted:
The first part of the message is correct, but the second part is corrupted: The corrupted answer reads as follows (captured with WireShark):
Parts of the payload are suddenly repeated. This always starts, shortly after a message on the VSPI bus is received completely (See picture). My assumption is, that the DMA starts copying the received VSPI message at that point and interferes with the data for the HSPI bus. Receive and send buffer both are word aligned. See also this post and the following for that topic. Regards, |
One more question: Is this issue related? And is there a similar option for use with an external mac? |
Answers checklist.
IDF version.
v5.1.1-1-gd3c99ed3b8
Espressif SoC revision.
ESP32-D0WD-V3 (revision v3.0)
Operating System used.
Linux
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
None
Development Kit.
Custom Board
Power Supply used.
External 3.3V
What is the expected behavior?
Two SPI buses are used:
Reliable data transfer on both configured SPI busses is expected.
What is the actual behavior?
The communication of HSPI master is unstable. Approx. 10% of the messages are corrupted somehow.
This can be observed for both incoming and outgoing data:
For incoming data over the MISO line, it can be observed that data on the SPI bus sent by dm9051 is correct (via Logic Analyzer), but partly faulty data can be found in the receive buffer.
For outgoing data (MOSI), there is correct data in the send buffer, but partly faulty data can be observed on the SPI bus.
Steps to reproduce.
sdkconfig file:
sdkconfig.txt
Debug Logs.
More Information.
Is it possible that there is an issue on balancing the DMA usage? It seems that somehow the data is corrupted between the SPI bus and dm9051 send/receive buffer. At the same time I would not suspect a SPI issue, since the data is correct in most parts and the faulty parts are not arbitrary data (see log above).