I’ve been tasked with developing a deeper knowledge of Espressif’s ESP32 family in a hurry, and encouraged to blog any interesting findings along the way.
Up to this point, I’ve largely left ESP32 to others. Since I lack a Johnny Mnemonic memory doubler for my brain, it felt like one thing too many with my head already full of SAMD and AVR detritus. But the global chip shortage has tipped the scales, and with ESP32 devices available while others are constrained, adding that knowledge is suddenly an urgent necessity, not a luxury!
Pressing issue of the moment has been the Adafruit_Protomatter library for HUB75 RGB LED matrices. Successive generations of ESP32 have each presented some interesting GPIO hoops.
Adafruit_Protomatter is written strangely in that it tends to avoid specialized peripherals (such as PIO on the Raspberry Pi RP2040), instead favoring old school AVR-like GPIO twiddling. This might offend one’s programming sensibilities in that it’s not optimal…but it’s not pessimal either, and up to this point was helpful in that the same inner loop works across half a dozen different devices, by relying on only the simplest and most plentiful peripherals (GPIO and a timer interrupt). Most important, not being tied to sequential GPIO pins or similar peripheral constraints, the exact same FeatherWing-to-HUB75 adapter works across several different Feather boards with different pin arrangements (only the nRF52840 is an exception).
A detailed discussion of HUB75 matrix driving is impractical here, but plenty has been written about it elsewhere if you Google around. The basic idea is that these matrices lack a full framebuffer. Instead, the microcontroller must continually refresh the LEDs, row by row and bitplane by bitplane. Six bits of color are issued in parallel (red, green and blue for each column of two rows of the matrix), clocked by a write-strobe bit. To get decent refresh rates and not flicker too badly, a 16 to 20 MHz clock is ideal.
Thus far, every 32-bit MCU we’ve adapted this to has provided sufficiently fast GPIO through some PORT registers…not pin-by-pin one at a time, but where up to 32 pins can be accessed concurrently. Usually there’s one register that accepts a set-bits mask and another that accepts a clear-bits mask, or one can just write a whole new pin state to the PORT. Some of the fancier chips also have a toggle-bits mask. So Adafruit_Protomatter’s inner data-issuing loop has typically looked like:
- Set 6 bits as needed for RGBRGB
- Set clock bit
- Clear all 7 bits
- Repeat
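As a rough illustration of that loop (not the library’s actual code), here’s a minimal host-side sketch in C++. `FakePort`, `issueRow` and the bit positions are all made up for this example; on real hardware the set and clear operations would be writes to the chip’s memory-mapped PORT registers rather than calls on a struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a PORT with set-bits and clear-bits registers.
// It records every pin state so the sequence can be inspected afterward.
struct FakePort {
  uint32_t out = 0;               // Current pin states
  std::vector<uint32_t> history;  // Pin state after each register write
  void set(uint32_t mask)   { out |= mask;  history.push_back(out); }
  void clear(uint32_t mask) { out &= ~mask; history.push_back(out); }
};

constexpr uint32_t RGB_MASK  = 0x3F;     // Assumed: RGBRGB on bits 0-5
constexpr uint32_t CLOCK_BIT = 1u << 6;  // Assumed: clock on bit 6

// Issue one row of one bitplane. Each element of "data" holds the six
// RGBRGB bits for one column.
void issueRow(FakePort &port, const uint8_t *data, size_t columns) {
  assert(data != nullptr);
  for (size_t i = 0; i < columns; i++) {
    port.set(data[i] & RGB_MASK);      // Set 6 bits as needed for RGBRGB
    port.set(CLOCK_BIT);               // Set clock bit
    port.clear(RGB_MASK | CLOCK_BIT);  // Clear all 7 bits
  }                                    // Repeat
}
```

Feeding it a couple of columns, say `{0x15, 0x3F}`, walks the simulated pins through data, data-plus-clock, then all-clear for each column in turn.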
The first-generation ESP32 has an interesting quirk: when those first two bit-setting operations are applied in rapid succession, the second one doesn’t take effect! One workaround is to delay the second bit-set with some dodgy, empirically-derived NOPs. The other, found through experimentation, is to interleave accesses to the bit-set and bit-clear registers, which automatically provides wait states as needed…and it’s totally valid to pass a zero bitmask. So on that one chip specifically, the inner loop changes to:
- Set 6 bits as needed for RGBRGB
- Clear no bits (write “0” to clear-bits register)
- Set clock bit
- Clear all 7 bits
- Repeat
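The interleaving can be sketched the same way. This is a host-side simulation only, with made-up names (`LoggedPort`, `issueRowEsp32`) and assumed bit positions; it can’t reproduce the hardware quirk itself, but it does show that the zero-mask write leaves the pins alone while guaranteeing that no two consecutive accesses hit the same register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Reg { Set, Clear };

// Hypothetical stand-in for the set-bits and clear-bits registers that
// logs which register each write went to.
struct LoggedPort {
  uint32_t out = 0;           // Current pin states
  std::vector<Reg> accesses;  // Which register each write touched
  void write(Reg reg, uint32_t mask) {
    accesses.push_back(reg);
    if (reg == Reg::Set) out |= mask;
    else                 out &= ~mask;
  }
};

constexpr uint32_t RGB_MASK  = 0x3F;     // Assumed: RGBRGB on bits 0-5
constexpr uint32_t CLOCK_BIT = 1u << 6;  // Assumed: clock on bit 6

void issueRowEsp32(LoggedPort &port, const uint8_t *data, size_t columns) {
  assert(data != nullptr);
  for (size_t i = 0; i < columns; i++) {
    port.write(Reg::Set, data[i] & RGB_MASK);       // Set 6 bits for RGBRGB
    port.write(Reg::Clear, 0);                      // Clear no bits (spacer)
    port.write(Reg::Set, CLOCK_BIT);                // Set clock bit
    port.write(Reg::Clear, RGB_MASK | CLOCK_BIT);   // Clear all 7 bits
  }
}
```

Because each column ends on a clear and the next begins on a set, the alternation holds across loop iterations too, not just within one.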
This worked, and the ESP32 could reliably deliver decent refresh rates.
Then an odd thing occurred when adding support for the ESP32-S2 (and S3, more on that later): only minimal changes were required to the code, the same GPIO registers are there and it compiles and runs…but the LED matrix refresh rate was visibly slower!
After much experimenting, there seems to be an inherent GPIO speed limit on the new part, perhaps because the GPIO is running in a different clock domain. With the following Arduino code and an oscilloscope on the pin, you’d expect (from the prior ESP32) the pin state to toggle at 20 MHz or better…but in practice, it was a steady 8 MHz:
```cpp
volatile uint32_t *set = (volatile uint32_t *)&GPIO.out_w1ts;
volatile uint32_t *clear = (volatile uint32_t *)&GPIO.out_w1tc;
const uint32_t bit = 1 << 13;

void setup() {
  pinMode(13, OUTPUT);
  for(;;) {
    *set = bit;
    *clear = bit;
  }
}

void loop() {}
```
Such throughput wouldn’t suffice on anything but the smallest matrices. Other, faster LED matrix implementations on ESP32 tend to use the LCD controller peripheral in interesting ways. While not opposed to that, it would require a pretty radical departure in the library code for one specific chip.
A seeming workaround was found in an esoteric ESP32-S2-specific peripheral called Dedicated GPIO, which allows faster access to GPIO operations on up to 8 pins. This would still be a departure in the code, but much less drastic than the LCD controller approach.
Dedicated GPIO requires a bit of extra setup, so here’s what some similar line-toggling code might look like:
```cpp
#include <driver/dedic_gpio.h>
#include <soc/dedic_gpio_struct.h>

const int pins[] = { 13 };

dedic_gpio_bundle_config_t config_in = {
  .gpio_array = pins, // Array of GPIO numbers (up to 8)
  .array_size = 1,    // Number of elements in pin list
  .flags = {
    .in_en = 0,       // Disable input
    .out_en = 1,      // Enable output
    .out_invert = 0,  // Non-inverted
  }
};

dedic_gpio_bundle_handle_t bundle;

void IRAM_ATTR toggle_forever() {
  DEDIC_GPIO.gpio_out_cpu.val = 0; // Use GPIO registers, not CPU instructions
  for(;;) {
    DEDIC_GPIO.gpio_out_drt.gpio_out_drt_vlaue = 0x01; // Set channel 0 high
    DEDIC_GPIO.gpio_out_drt.gpio_out_drt_vlaue = 0x00; // Set channel 0 low
  }
}

void setup() {
  dedic_gpio_new_bundle(&config_in, &bundle);
  toggle_forever();
}

void loop() {}
```
(“vlaue” was obviously a typo when they were naming registers…but once named, they have to stick with it for existing code’s sake. The above is correct.)
This can now toggle the line at about 10 MHz instead of 8. Still too slow. But there’s more than one way to interact with Dedicated GPIO. A particular register, gpio_out_idv, provides operations similar to the set-bits and clear-bits registers of regular GPIO. The “gotcha” is that each of the 8 outputs is controlled by 2 adjacent bits, so if trying to issue graphics or specific bit patterns, you’ll need some bit math or a pass through a lookup table to do a 6- to 12-bit expansion.
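Here’s a minimal sketch of what that 6- to 12-bit expansion might look like. It assumes channel n occupies bits 2n and 2n+1 of the register, and uses the per-channel values 1 (set) and 2 (clear) from the register examples in this post. The function and table names are made up for illustration and this isn’t necessarily how the library does it.

```cpp
#include <array>
#include <cstdint>

constexpr uint16_t IDV_SET   = 1;  // Per-channel value: drive channel high
constexpr uint16_t IDV_CLEAR = 2;  // Per-channel value: drive channel low

// Expand six data bits (channels 0-5) into twelve bits, two per channel.
// Assumption: channel n sits in bits 2n and 2n+1.
constexpr uint16_t expandSetClear(uint8_t bits6) {
  uint16_t idv = 0;
  for (int ch = 0; ch < 6; ch++) {
    const uint16_t op = ((bits6 >> ch) & 1) ? IDV_SET : IDV_CLEAR;
    idv |= static_cast<uint16_t>(op << (2 * ch));
  }
  return idv;
}

// Precompute all 64 expansions so the inner loop only needs one table
// lookup per column instead of the bit math above.
std::array<uint16_t, 64> buildExpandTable() {
  std::array<uint16_t, 64> table{};
  for (int i = 0; i < 64; i++) {
    table[i] = expandSetClear(static_cast<uint8_t>(i));
  }
  return table;
}
```

With a table like this, each column’s six RGBRGB bits index straight to the value written to the register, trading 128 bytes of RAM for the per-pixel bit shuffling.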
Changing the two lines inside the channel-toggling loop to access this other register:
```cpp
DEDIC_GPIO.gpio_out_idv.val = 1; // Set CH0
DEDIC_GPIO.gpio_out_idv.val = 2; // Clear CH0
```
…yields about 18.5 MHz. Fast enough on its own…but adding the necessary overhead to read through an image in memory, it was still a bit too slow.
For whatever reason though, using the toggle-bits operation is just a smidge faster, yielding a steady 20 MHz:
```cpp
DEDIC_GPIO.gpio_out_idv.val = 3; // Toggle CH0
```
Adafruit_Protomatter was already written to use bit-toggling operations if it can benefit, and so even with the image lookup it manages about 17 MHz on the clock strobe line, just enough for matrix driving. And with the excellent pin-MUX capabilities of the ESP32, this can work with the standard RGB Matrix FeatherWing, no need to cut or jumper anything. ESP32-S2 was saved!
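To show the toggle idea in isolation, here’s a small host-side simulation (not the library’s code). If only toggle operations are used, the bits to flip are just the XOR of the current and desired pin states, and each flipped channel becomes a “3” in its two-bit slot. The function names are made up, and two things are assumed here: channel n sits in bits 2n and 2n+1, and a per-channel value of 0 leaves that channel unchanged.

```cpp
#include <cstdint>

// Which channels must flip to get from "current" to "desired".
constexpr uint8_t toggleMask(uint8_t current, uint8_t desired) {
  return static_cast<uint8_t>(current ^ desired);
}

// Turn a per-channel toggle mask into a two-bits-per-channel value,
// writing 3 (toggle) into each channel that needs to flip.
constexpr uint16_t expandToggle(uint8_t toggleBits) {
  uint16_t idv = 0;
  for (int ch = 0; ch < 8; ch++) {
    if ((toggleBits >> ch) & 1) {
      idv |= static_cast<uint16_t>(3u << (2 * ch));
    }
  }
  return idv;
}

// Simulate what a write of "idv" does to the eight channel states:
// 1 = set, 2 = clear, 3 = toggle, 0 = assumed to leave the channel as-is.
constexpr uint8_t applyIdv(uint8_t state, uint16_t idv) {
  for (int ch = 0; ch < 8; ch++) {
    const uint8_t bit = static_cast<uint8_t>(1u << ch);
    switch ((idv >> (2 * ch)) & 3) {
      case 1: state |= bit; break;
      case 2: state &= static_cast<uint8_t>(~bit); break;
      case 3: state ^= bit; break;
      default: break;
    }
  }
  return state;
}
```

The catch with toggling is that the code has to know what state the pins are already in, so whatever prepares the data needs to track that from one write to the next.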
All seemed good until attempting that same code on the ESP32-S3…it wouldn’t even compile. One might assume from the name that the S3 is just an S2 with some added flair, but it turns out to be an altogether different creature where some of the peripherals are concerned. The register-based Dedicated GPIO access used above is no longer offered on the S3…only the conventional 8-MHz-toggleable GPIO registers are present.
So this will, finally, require that radical departure in the library, probably using the LCD driver peripheral…which, on the S3, has taken a different turn from how this worked on the prior chips (where it was a sub-feature of the I2S peripheral). This also requires learning about ESP32 DMA, so there are entire layers of Learning Experience still to unfold…