-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RMT Buffer Allocation Fix for Issue #375 #392
base: master
Are you sure you want to change the base?
Conversation
…eing of RMT resources
Lgtm
That free/malloc could be a realloc, but overall I like it.
The whitespace is inconsistent. It was before, too, but it still is.
Nicely done. Thanks.
…On Sun, Jun 9, 2024, 3:23 PM teknynja ***@***.***> wrote:
This pull request addresses issue #375
<#375> [pixel.show()
crash with more than 73 pixel on ESP32s3]
After reviewing the code and also getting some helpful feedback from the
Espressif Forums (https://www.esp32.com/viewtopic.php?f=13&t=40270) and
@robertlipe <https://github.com/robertlipe> it was determined that the
code in esp.c for handling the RMT item buffers when using the IDF v5
framework was allocating too much space on the stack when using more than
70ish pixels.
This pull request addresses that issue by allocating the RMT buffers from
the heap instead. It will attempt to allocate a single block of memory to
accommodate the largest configured instance (sharing the buffer between
instances is fine, as the buffer is completely populated each each time the
espShow() method is called).
I also took the time to improve the channel allocation management,
previously the RMT channels were initialized on each call to espShow(),
now the RMT channels are only de-initialized and re-initialized whenever
the output pin is changed.
Finally, I was concerned about problems that may be caused by allocating
large buffers on the heap without giving the user any way to free that
memory, so the code allows a user to free that memory (and also release the
RMT channels) by setting the number of pixels to zero using
.updateLength(0) and then calling .show(). This will de-allocate the heap
memory used for the RMT buffers and release the RMT channels held by
driver. They will automatically be re-allocated if needed when setting the
number of pixels back to a non-zero value.
It seems one of the primary goals on this project is to minimize the
distribution changes across files, to that end you'll find that all the
changes here are constrained to a single method in the esp.c file. A
different approach would be required to address thread-safety and instance
sharing issues, but it seems like there is existing code that is
problematic in those areas as well so those goals were not prioritized. I
think for a majority of the users of this library, those are not going to
be issues anyways.
------------------------------
You can view, comment on, or merge this pull request online at:
#392
Commit Summary
- 3ceb712
<3ceb712>
Allocate RMT buffer from heap, better RMT channel handling, allow freeing
of RMT resources
File Changes
(1 file <https://github.com/adafruit/Adafruit_NeoPixel/pull/392/files>)
- *M* esp.c
<https://github.com/adafruit/Adafruit_NeoPixel/pull/392/files#diff-7f4fa73e74bcd3b92630bbfea26f271bb201a49731eed3e8629b47ad20782c1a>
(83)
Patch Links:
- https://github.com/adafruit/Adafruit_NeoPixel/pull/392.patch
- https://github.com/adafruit/Adafruit_NeoPixel/pull/392.diff
—
Reply to this email directly, view it on GitHub
<#392>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCSD37OL72A6ZYG3VW2FWTZGS2ULAVCNFSM6AAAAABJBFFFZGVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2DENBXGI3DAMQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Cool, thanks! As for the free/malloc vs realloc, I just copied how the |
This may or may not be the root of #380 and #332, too. Looking more at overall context, there's hopefully some kind of lock held way above this to make it thread safe anyway. This code is about to honk on the RMT registers which are clearly globals. If you have two shows()s running on two different cores and there's not something above this to stop the instance from each reprogramming the RMT device, this code is already going to have a bad day. Hopefully some code that's calling this is already responsible for grabbing an mtx_lock()/pthread_mutex_lock before multiple shows()s can be called. My concerns about multithreading are either unfounded (it's handled elsewhere) or a canary in a coal mine (it's not handled and it just a latent ticking bomb). Perhaps there is locking at the rmt driver level and we really DO need to implement something similar up here. Arduino doesn't work like other systems I'm familiar with, so I don't know the expected locking heirarchy or appropriate tools here (C++ threading? FreeRTOS? C11?) or the synchronization model for rmtWrite() (is it guaranteed that it's done with the buffers when it returns or might it still be DMA'ing out of that block when it returns?) or such. I'll defer to an actual Arduino dev on that. So this definitely leaves it better than we found it, but this code still looks 'sus. |
I completely agree with you on these safety issues. There are a few different areas of consideration that come to mind regarding this:
All this being said, I think the currently proposed code does the best it can given then current environment, and at least allows users the ability to drive larger numbers of pixels when building against Core v3 (of course, with caveats). Properly addressing the above issues will likely require a fair bit of restructuring of the library code, and feels out-of-scope for fixing the actual problem users are currently experiencing. When the above issues are addressed, all of the ESP32 code (including the old Core v2 code) will need to be re-worked to bring it up to the level of safety we are considering here. |
Just for giggles, I created two instances of the Adafruit_NeoPixel class each with 500 pixels on the ESP32-C3 (along with my custom IR driver) and am calling (BTW, I do not have 1000 pixels laying around to play with, I have 150 on one pin and 50 on the other, but neither of them are showing artifacts or glitches) |
We agree pretty much across the board. The cranky view is that this is
pretty typical Arduino code - oblivious to such issues and is fine with
everything being essentially a global.
I have on my TODO list the replacement of an Arduino IR driver that's about
140KLOC that I'm pretty sure I can replace with < 20 lines of ESP-IDF code.
See above.
Better API design would expose that unfortunate 3x the buffer thing to the
caller. Depending on the architecture (and how desperate for memory you
are) you might either want that buffer shared, with multiple DMAs blasting
pixels out different fractions of it or separately using it round-robin or,
with different threads filling different channels. That buffer is likely to
be the elephant in the room for any non-trivial number of LEDs, so
pretending it doesn't exist just seems unwise.
With the prowess of the surrounding code, stopping just short of a
replacement (and I think Adafruit calls that replacement pxl8, though I've
not scrutinized it though there are still other other replacements around -
without Arduino heritage) that may have designs more likely to remain
assembled under stress.
Remember, too, that C3 is still a single-core product. All of the current
Xtensa family (ok, except S2, which was pretty much the beta version of S3
from what I can tell) has dual cores and are generally more likely to run
into this kind of thing. $35 can get any of several 8-core boards. It's not
like actual multi-threading is something you can pretend isn't a concern on
contemporary code.
Where's our dual-core Espressif part with an WS2812 shifter on it so we
don't NEED to abuse RMT like this with such a buffer blow up, anway? Take
my money!
You left it better than you found it. :-) In the context of the surrounding
code, take the fix and move on to replacing or creating a version that's a
better fit for the hardware.
Thanx and good luck!
…On Mon, Jun 10, 2024 at 4:16 PM teknynja ***@***.***> wrote:
Just for giggles, I created two instances of the Adafruit_NeoPixel class
each with 500 pixels on the ESP32-C3 (along with my custom IR driver) and
am calling .show() on every pass through loop (with no delay(n) calls)
and everything seems to be working - no weird timing/dropped pixels on the
strips, and the IR timing seems to be fine as well - so it looks like the
RMT functions appear to be doing their job. The Adafruit driver being
called across instances seems like it isn't causing any problems either.
I'm sure it would all blow up if i tried calling them from different
threads though...
(*BTW, I do not have 1000 pixels laying around to play with, I have 150
on one pin and 50 on the other, but neither of them are showing artifacts
or glitches*)
—
Reply to this email directly, view it on GitHub
<#392 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCSD3ZDOVWW4X4RVOYVWRLZGYJTDAVCNFSM6AAAAABJBFFFZGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJZGI4TQMJWG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I created a torture-test program that drives 8 instances of Adafruit_NeoPixel driver with 150 pixels each, all running on their own thread: #include <Adafruit_NeoPixel.h>
#define PIXELS_PER_PIN 150
#if defined(ARDUINO_XIAO_ESP32C3)
const uint PIXEL_PINS[] { D0, D1, D2, D3, D4, D5, D6, D7 };
#elif defined(ARDUINO_ESP32_DEV)
const uint PIXEL_PINS[] { 22, 21, 19, 18, 5, 17, 16, 4 };
#else
#error Please define pins for your board here!
#endif
const uint PIN_COUNT = sizeof(PIXEL_PINS) / sizeof(uint);
uint task_ids[PIN_COUNT];
void PixelTask(void *pvParameters);
void setup() {
Serial.begin(115200);
delay(2000);
Serial.printf("NeoPixel RMT Threaded Torture Test (%u Pins/Threads)\r\n", PIN_COUNT);
for (uint i = 0; i < PIN_COUNT; i++) {
task_ids[i] = i;
xTaskCreate(PixelTask, "PixelTask", 2048, (void *)&task_ids[i], 1, NULL);
Serial.printf("Created PixelTask %u for pin %u\r\n", i, PIXEL_PINS[i]);
}
}
void loop() {
Serial.println("Hello from the main thread...");
delay(10000);
}
void PixelTask(void *pvParameters) {
uint task_id = *((uint *)pvParameters);
unsigned long period = 500 + random(1000);
Serial.printf("Started PixelTask %u with period %ums\r\n", task_id, period);
static const uint MAX_COLORS = 3;
uint color = 0;
Adafruit_NeoPixel *pixel_driver = new Adafruit_NeoPixel(PIXELS_PER_PIN, PIXEL_PINS[task_id], NEO_GRB + NEO_KHZ800);
pixel_driver->begin();
unsigned long update_timer = 0;
while (true) {
unsigned long now = millis();
if (now - update_timer > period) {
update_timer = now;
Serial.printf("PixelTask %u updating pixels for pin %u\r\n", task_id, PIXEL_PINS[task_id]);
for (uint i = 0; i < PIXELS_PER_PIN; i++) {
switch (color) {
case 0:
pixel_driver->setPixelColor(i, 0x40, 0x00, 0x00);
break;
case 1:
pixel_driver->setPixelColor(i, 0x00, 0x40, 0x00);
break;
case 2:
pixel_driver->setPixelColor(i, 0x00, 0x00, 0x40);
break;
default:
pixel_driver->setPixelColor(i, 0x00, 0x00, 0x00);
break;
}
}
color = (color + 1) % MAX_COLORS;
}
pixel_driver->show(); // Note we are calling .show() every scan for extra stress!!!
}
Serial.printf("Ending PixelTask %u\r\n", task_id);
vTaskDelete(NULL);
} As expected, this caused the devices to crash almost immediately. I then added mutex protection inside the Let me know if I should push the mutex changes here for review... |
Nicely played!
As a former OS designer for large systems, if I can have only one, I'll
take "correct" over "pretty" every single time. It doesn't matter how
pretty the code is if it's wrong, IMO.
TBC, I have no review power here. I saw your (?) struggles on esp32.com,
had some hopefully relevant experience, and came here to trick you into
doing the real work. :-) But my approval is not what you need.
Unfortunately the actual code owner seems to be pretty quiet so far.
I'd encourage you to publish your change in another fork/PR. If it's
accepted by the maintainer, jolly good! If it's not and it helps someone
that's struggling with this, that's just wonderful. The maintainer is free
to integrate either the low-risk, intellectually simple fix for the crash
that's absolutely being observed in the wild and affecting multiple people
today OR the version that may add some overhead/beauty-impairment in
non-threaded environments, but is rock solid.
Thanx for sticking with this!
RJL
BTW. It's absolutely not he point, but stylistically, I'd prefer
const uint PIN_COUNT = sizeof(PIXEL_PINS) / PIXEL_PINS[0];
over
const uint PIN_COUNT = sizeof(PIXEL_PINS) / sizeof(uint);
If you later change the type of PIXEL_PINS to something that's another
size, admittedly unlikely in this case, that's one less place you have to
keep it in sync. (Maybe they're evantually pins on a GPIO MUX or you need
to encode bank numbers as a tuple or something.) Of course, C++ provides
much better safety for this kind of thing in general. There, pixel_pins
could be a std::array and then you can just use a .size() method which
decays into a constant expression anyway for this case..
…On Tue, Jun 11, 2024 at 10:22 AM teknynja ***@***.***> wrote:
I created a torture-test program that drives 8 instances of
Adafruit_NeoPixel driver with 150 pixels each, all running on their own
thread:
#include <Adafruit_NeoPixel.h>
#define PIXELS_PER_PIN 150
#if defined(ARDUINO_XIAO_ESP32C3)
const uint PIXEL_PINS[] { D0, D1, D2, D3, D4, D5, D6, D7 };
#elif defined(ARDUINO_ESP32_DEV)
const uint PIXEL_PINS[] { 22, 21, 19, 18, 5, 17, 16, 4 };
#else
#error Please define pins for your board here!
#endif
const uint PIN_COUNT = sizeof(PIXEL_PINS) / sizeof(uint);
uint task_ids[PIN_COUNT];
void PixelTask(void *pvParameters);
void setup() {
Serial.begin(115200);
delay(2000);
Serial.printf("NeoPixel RMT Threaded Torture Test (%u Pins/Threads)\r\n", PIN_COUNT);
for (uint i = 0; i < PIN_COUNT; i++) {
task_ids[i] = i;
xTaskCreate(PixelTask, "PixelTask", 2048, (void *)&task_ids[i], 1, NULL);
Serial.printf("Created PixelTask %u for pin %u\r\n", i, PIXEL_PINS[i]);
}
}
void loop() {
Serial.println("Hello from the main thread...");
delay(10000);
}
void PixelTask(void *pvParameters) {
uint task_id = *((uint *)pvParameters);
unsigned long period = 500 + random(1000);
Serial.printf("Started PixelTask %u with period %ums\r\n", task_id, period);
static const uint MAX_COLORS = 3;
uint color = 0;
Adafruit_NeoPixel *pixel_driver = new Adafruit_NeoPixel(PIXELS_PER_PIN, PIXEL_PINS[task_id], NEO_GRB + NEO_KHZ800);
pixel_driver->begin();
unsigned long update_timer = 0;
while (true) {
unsigned long now = millis();
if (now - update_timer > period) {
update_timer = now;
Serial.printf("PixelTask %u updating pixels for pin %u\r\n", task_id, PIXEL_PINS[task_id]);
for (uint i = 0; i < PIXELS_PER_PIN; i++) {
switch (color) {
case 0:
pixel_driver->setPixelColor(i, 0x40, 0x00, 0x00);
break;
case 1:
pixel_driver->setPixelColor(i, 0x00, 0x40, 0x00);
break;
case 2:
pixel_driver->setPixelColor(i, 0x00, 0x00, 0x40);
break;
default:
pixel_driver->setPixelColor(i, 0x00, 0x00, 0x00);
break;
}
}
color = (color + 1) % MAX_COLORS;
}
pixel_driver->show(); // Note we are calling .show() every scan for extra stress!!!
}
Serial.printf("Ending PixelTask %u\r\n", task_id);
vTaskDelete(NULL);
}
As expected, this caused the devices to crash almost immediately. I then
added mutex protection inside the espShow() method to protect the led_data
buffer and the rmtXxxx() calls. Running the above code now works reliably
with no glitches/artifacts on both my C3 and S3 chips. I think this
solves the remaining safety questions as discussed. (*It's not pretty,
but it seems to work!*)
Let me know if I should push the mutex changes here for review...
—
Reply to this email directly, view it on GitHub
<#392 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCSD3YX4OVPZ27JDWUHNBDZG4I25AVCNFSM6AAAAABJBFFFZGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGAZTMOJVGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
OIC - I wasn't sure about your relationship with this project, but I guess if you hadn't come along to encourage me (and the maintainers stayed quiet) then I probably wouldn't have tried to tackle this. This is essentially my first pull request for an open-source project, so I know nothing about the protocol. I kind of thought that simply pushing additional changes to my fork would end up in this pull request. Would it be confusing for the maintainer to have two separate forks/pull requests from the same person for the same issue? TBH the changes to add the mutex weren't all that invasive (but it did change the code flow a bit). I see your point about the array size, I don't do a lot of c/c++ coding (Internet to the rescue!), but I will keep that in mind for the future! |
Since we've heard only silence from the maintainers (grrr.) it's hard to
tell how they'd react, I'm afraid. Since you already have "paid for" that
better fix, I'd probably go ahead and make a PR for it, too. and just be
sure to reference each in the other. ("This is the low invasion fix" and
"This is the better fix.") I'd think the common knee jerk reaction ("but
what if the person isn't using threads!") doesn't apply to ESP32 because
the official SDK (ESP-IDF) and thus all the layers of goo smeared upon it
("Arduino") are going to have lock primitives naturally. Some environments
like, oh, Nuttx may use different ones, but I think all of those added
together are such a minority (and besides, this project is very heavily
Arduino accented anyway) that it might not be a concern. If it IS a
concern, we could probably #ifdef away the primitives to make them easy to
replace.
For your first project, you nailed the social and coding aspects.
Congrats! Now you just have to wait to get a maintainer's attention. :-|
As for me, I've been involved in open source since the 80's and have been
involved in lots and lots of projects through the years. I joined the
disabled in 2017, so I now have time on my hands - much of this coaching
has been from bed - to read and cheerlead those that have taken the time to
get their heads around a problem (the hard part) and just need a little
help to land it. My interest in blinkies comes from nightdriverled.com as
there I'm heavily involved in the lowest-level stuff. So while I wasn't
familiar with this code in particular, I knew how it had to work
internally. Fortunately, it's not a case of needing to decode signal timing
issues or anything gnarly, it was the most common of mistakes in C/C++ -
walking off the end of an array.
As trivia, if this had been a c++ object (made by operator new[] an not
malloc) and you had accessed the members via at(i) instead of [i], it would
have generated an exception. There's a runtime cost to that, of course, and
in this case it's probably not worth it. I just mention it as part of my
tutoring service. :-) I'd probably then rearrange those nested, tight loops
to do the deref only once, ,something more like:
light_data = new rmt_data_t[required_size);
and then...
auto ld = light_data.at(i);
loopity loopity...
ld.level0 = 1;
ld.duration0 = 8;
Or you could put it into a vector, let the vector handle the resize and the
bound checking...
static std::vector<rmt_data_t> light_data(0);
...figure out how big it needs to be
if (required_size > light_data.size()) {
light_data.resize(required_data
}
then you could use light_data.at(i) and it'll throw an exception if you
walk over the bounds.
HOWEVER, Arduino tends to be cranky about C++ code actually looking like
C++ code, even when it give readible and other tangible benefits. This
might look too different that the other blocks and, besides, now that it's
fixed, it should never overrun again, right? (Of course, if it had been
written in this style to start with, we wouldn't be here.)
So I probably wouldn't bother with version 3 of this patch doing this. I'm
just saying that not every access past the ends of an array-like thing (a
vector or anything from operator new that thus actually knows the allocated
size, as opposed to a dumb pointers) doesn't HAVE to share the 1960's-era
traits inherited by C. If you're on one of those 8-bit parts that's the
namesake, those extra dozen bytes and corresponding update in skills might
cause stress. On a ral computer, it's usually worth it.
Anyway, I'm off in the weeds. I'd propose submitting both versions to our
reviewer and let them pick the best for the project. Either one will save
the crash, which is where we are and one will save the crash in threading,
which is where people will soon be once it quits crashing. :-)
Thanx again!
…On Tue, Jun 11, 2024 at 11:18 AM teknynja ***@***.***> wrote:
OIC - I wasn't sure about your relationship with this project, but I guess
if you hadn't come along to encourage me (and the maintainers stayed quiet)
then I probably wouldn't have tried to tackle this.
This is essentially my first pull request for an open-source project, so I
know nothing about the protocol. I kind of thought that simply pushing
additional changes to my fork would end up in this pull request. Would it
be confusing for the maintainer to have two separate forks/pull requests
from the same person for the same issue? TBH the changes to add the mutex
weren't all that invasive (but it did change the code flow a bit).
I see your point about the array size, I don't do a lot of c/c++ coding
(Internet to the rescue!), but I will keep that in mind for the future!
—
Reply to this email directly, view it on GitHub
<#392 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCSD3ZVZYJW3Z6JMFAMKXLZG4PNHAVCNFSM6AAAAABJBFFFZGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGE2TCMZVGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
NOTE: This pull requests precedes pull request #394 - I've since added a mutex for thread safety, I'm leaving this in place so both versions are available for evaluation.
This pull request addresses issue #375 [pixel.show() crash with more than 73 pixel on ESP32s3]
After reviewing the code and also getting some helpful feedback from the Espressif Forums (https://www.esp32.com/viewtopic.php?f=13&t=40270) and @robertlipe it was determined that the code in esp.c for handling the RMT item buffers when using the IDF v5 framework was allocating too much space on the stack when using more than 70ish pixels.
This pull request addresses that issue by allocating the RMT buffers from the heap instead. It will attempt to allocate a single block of memory to accommodate the largest configured instance (sharing the buffer between instances is fine, as the buffer is completely populated each each time the
espShow()
method is called).I also took the time to improve the channel allocation management, previously the RMT channels were initialized on each call to
espShow()
, now the RMT channels are only de-initialized and re-initialized whenever the output pin is changed.Finally, I was concerned about problems that may be caused by allocating large buffers on the heap without giving the user any way to free that memory, so the code allows a user to free that memory (and also release the RMT channels) by setting the number of pixels to zero using
.updateLength(0)
and then calling.show()
. This will de-allocate the heap memory used for the RMT buffers and release the RMT channels held by driver. They will automatically be re-allocated if needed when setting the number of pixels back to a non-zero value.It seems one of the primary goals on this project is to minimize the distribution changes across files, to that end you'll find that all the changes here are constrained to a single method in the esp.c file. A different approach would be required to address thread-safety and instance sharing issues, but it seems like there is existing code that is problematic in those areas as well so those goals were not prioritized. I think for a majority of the users of this library, those are not going to be issues anyways.