What a journey. As discussed in several prior posts (1) (2) (3), getting the Adafruit_Protomatter RGB LED matrix library running smoothly on the ESP32-S2 and -S3 chips hit a snag: conventional GPIO via PORT registers turned out to be a bottleneck.
The ESP32-S2 case was resolved by using the Dedicated GPIO peripheral unique to that chip. For the ESP32-S3, the idea was to repurpose the LCD controller peripheral to provide general-purpose parallel output…a not-uncommon trick on the original ESP32 and -S2. Previously the LCD controller was a sub-module of the I2S peripheral, but on the -S3 it's a fully separate peripheral with a whole new set of registers. This newness means there aren't yet any good tutorials or community examples to learn from. Espressif does have a solid library for using it as an LCD controller as intended, but hacky general-purpose parallel output was unexplored territory.
Early attempts were tantalizing but messy…
“Adafruit” should start at the leftmost pixel, and the shapes should have clean, single-pixel outlines. Something was causing a weird stuttering, and no amount of register-fiddling would set it right. Out came the logic analyzer to see what was really going on…
The top six signals are the RGB data (two rows each of red, green and blue bits), and a couple of lines below those is the pixel clock…there should be one rising edge per pixel, but for some reason an extra clock pulse would randomly appear before the first pixel, causing this shimmery jumble. If it were a consistent extra clock tick we could work around it, but it's random.
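“One rising edge per pixel” can be turned into a check you run over many captured rows rather than eyeballing each one. The sketch below assumes the clock channel has already been exported from the logic analyzer as a plain list of 0/1 samples (a hypothetical format; real export formats vary), and assumes a 64-pixel row. `rising_edges` and `extra_pulses` are illustrative helpers, not part of any analyzer software.

```python
def rising_edges(samples):
    """Count 0 -> 1 transitions in a list of sampled logic levels (0 or 1)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)


def extra_pulses(burst, width=64):
    """How many more clock edges a row's burst has than it has pixels (0 = clean)."""
    return rising_edges(burst) - width


# A clean 64-pixel burst: idle low, then 64 low/high clock periods.
clean = [0] + [0, 1] * 64
# The glitch described above: one stray pulse ahead of the first pixel.
glitched = [0] + [0, 1] + [0, 1] * 64

print(extra_pulses(clean))     # 0
print(extra_pulses(glitched))  # 1
```

Because the stray pulse was random, the telling number is how many bursts in a long capture come back nonzero, not the result for any single burst.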
This was vexing enough that we ran it by Espressif engineers. Rather than inflict all of Protomatter on them, which would also require having an LED matrix wired up, I made a simplified test program that just counts 0-63 on the six signal pins, then repeats. They were able to produce a version that didn't have the random clock pulse and looked correct on a logic analyzer…but when it was rolled into the Protomatter library, a new problem appeared:
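The test program itself isn't reproduced here, but the data it describes is easy to sketch. Counting 0-63 steps the six data lines through all 2**6 = 64 of their possible combinations, which is exactly why the buffer came out 64 bytes long. The bit-to-pin mapping in `PINS` is an assumption for illustration; the post doesn't specify it.

```python
# Hypothetical mapping of the six data lines to bits 0..5 of each output byte.
PINS = ("R1", "G1", "B1", "R2", "G2", "B2")


def counting_pattern():
    """One pass of the test: byte values 0..63, one byte per pixel clock."""
    return bytes(range(2 ** len(PINS)))


def pin_levels(value):
    """Logic level (0 or 1) driven on each data line for one output byte."""
    return {name: (value >> bit) & 1 for bit, name in enumerate(PINS)}


pattern = counting_pattern()
print(len(pattern))            # 64
print(pin_levels(pattern[5]))  # {'R1': 1, 'G1': 0, 'B1': 1, 'R2': 0, 'G2': 0, 'B2': 0}
```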
What’s occurring is that the matrix data is one-bitplane-and-scanline out of phase, hence the “shadow” under the Adafruit text and shapes, and that partial white row in the middle.
I puzzled over this for days before figuring out that the problem was really in the test example I sent them. You wouldn't know it from looking, but repeating the same data, and its being exactly 64 bytes long, masked some underlying issues.
The LCD peripheral has a FIFO (first-in, first-out) output buffer. Coincidentally, it turns out to be exactly the same size (64 bytes) as one bitplane of one row of this particular matrix…and the same size as the 64 bytes issued by the test program. Since the test program was issuing the same data over and over, sized exactly to that FIFO, it wasn't obvious that each transfer was really showing the prior 64 bytes…or, in Protomatter with a 64-pixel-wide matrix, that the matrix was receiving buffered data from one bitplane ago. I don't recall the FIFO size being documented, and these numbers were all coincidence, but I know better than to test things like this with obvious powers of two! Had I picked a prime number like 53 for the test buffer size, and alternated between two different buffers, the issue would have jumped out!
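A toy model makes the masking easy to see. This sketch assumes the FIFO behaves as a pure 64-byte delay that starts out holding filler bytes; the post establishes the size and the symptom, not the internal mechanism, so treat `LaggyFifo` as a hypothetical simulation for reasoning about test data, not a driver.

```python
from collections import deque


class LaggyFifo:
    """Toy model: the FIFO stays `depth` bytes deep, so every byte
    written comes out `depth` bytes later."""

    def __init__(self, depth=64, fill=0xFF):
        # Assume the FIFO starts out full of stale filler from earlier activity.
        self.q = deque([fill] * depth)

    def transfer(self, data):
        """Push `data` in; return the bytes actually clocked out meanwhile."""
        out = []
        for b in data:
            self.q.append(b)              # new byte joins the tail...
            out.append(self.q.popleft())  # ...while the oldest one goes out
        return out


# The original test: the same 64 bytes, over and over.
fifo = LaggyFifo()
pattern = list(range(64))
fifo.transfer(pattern)                    # first pass: only filler comes out
print(fifo.transfer(pattern) == pattern)  # True, but it's the *previous* pass

# The test that would have caught it: prime length, two alternating buffers.
fifo = LaggyFifo()
a = list(range(53))
b = [x + 100 for x in range(53)]
fifo.transfer(a)
out = fifo.transfer(b)
print(out == b)    # False
print(out[10:13])  # [255, 0, 1]: filler, then the start of the previous buffer
```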
Moving forward with this FIFO knowledge, one solution would be to have Protomatter always issue data 64 bytes ahead. Though possible, this would require a big rewrite specifically for the ESP32-S3, which I'd like to avoid…and the logic gets more complex than expected because not all matrices are 64 pixels wide; they come in a variety of sizes. Messy.
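For a sense of why the “issue ahead” route gets messy, here's a sketch of the bookkeeping it would need, again assuming the FIFO acts as a pure 64-byte delay. Nothing here comes from Protomatter; `ahead_chunks` and its arguments are hypothetical.

```python
def ahead_chunks(rows, lag=64):
    """Hypothetical 'issue data ahead' schedule for a FIFO that delays
    output by `lag` bytes.

    rows: equal-width lists of values, in the order they must reach the
    matrix during one refresh cycle (which then repeats).
    Returns (prime, chunks): `prime` must already sit in the FIFO before
    the first transfer, and chunks[k] is what to send during transfer k
    so that the values clocked out during transfer k are rows[k].
    """
    width = len(rows[0])
    frame = [b for row in rows for b in row]  # one full refresh cycle
    n = len(frame)
    # Indices wrap around because the refresh cycle repeats indefinitely.
    prime = [frame[i % n] for i in range(lag)]
    chunks = [[frame[(k * width + lag + i) % n] for i in range(width)]
              for k in range(len(rows))]
    return prime, chunks


# 64-pixel rows: each transfer simply sends the *next* row.
rows64 = [[r] * 64 for r in range(4)]
_, chunks = ahead_chunks(rows64)
print(chunks[0] == rows64[1])  # True

# 80-pixel rows: every transfer straddles two rows (16 old values + 64 new).
rows80 = [[r * 100 + c for c in range(80)] for r in range(3)]
_, chunks = ahead_chunks(rows80)
print(chunks[0][:16] == rows80[0][64:], chunks[0][16:] == rows80[1][:64])  # True True
```

Only the 64-pixel case reduces to “send the next row”; narrower matrices run several rows ahead and wider ones split every transfer across a row boundary, which is the special-casing the post wanted to avoid.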
A different solution, and the one that ultimately worked in this case, is to reset the FIFO at the start of each scanline/bitplane so it's getting the latest, freshest data. But doing this is exactly what brings out the random initial clock pulse: having no data in the FIFO at the start of a transfer causes this unstable clock/data synchronization.
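In toy-model terms, with stale bytes simply queued ahead of the new data (an assumption, not documented behavior), the reset fix looks like the sketch below. `send` is a hypothetical helper, not a Protomatter or ESP-IDF function, and it deliberately leaves out the random extra clock pulse that the real hardware then produces.

```python
def send(stale, data, reset_first):
    """Toy FIFO transfer; returns (bytes clocked out, bytes left queued).

    `stale` is whatever is still queued from earlier transfers. With
    reset_first=True it is discarded before the new data goes in, like
    resetting the FIFO at the start of each scanline/bitplane.
    """
    queued = ([] if reset_first else list(stale)) + list(data)
    return queued[:len(data)], queued[len(data):]


a = list(range(53))
b = [x + 100 for x in range(53)]
stale = [0xFF] * 64  # assume 64 stale bytes are queued to start with

out, _ = send(stale, a, reset_first=False)
print(out == a)  # False: the stale bytes come out first

for buf in (a, b, a, b):
    out, stale = send(stale, buf, reset_first=True)
    print(out == buf)  # True each time
```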
The fix is to enable a “dummy pulse” (their term) on the clock line at the start of the transfer…putting in that extra clock tick on purpose. Because the LED matrix is just a big shift register, the extra clock (and whatever is on the data lines at that moment) shifts harmlessly off the end of the matrix. There are configuration bits on the LCD peripheral to enable this extra pulse. Additionally, it was necessary to lengthen the “data out” period by one clock cycle to compensate, else the last byte wouldn't transfer (only 63 bytes got out, so the image was stable but short by one column). Shouldn't be needed, but is.
That extra clock pulse may be problematic in other situations, but with the matrix shift register it works fine. And if we later use the LCD peripheral for other parallel-out tasks, such as driving concurrent NeoPixel strands, some of those don't even need the clock line, just the data lines…so all this “dummy pulse” stuff likely isn't needed in many situations.
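A toy shift register shows both why the dummy pulse is harmless and why the data-out period needed one extra cycle. The sketch assumes that, without lengthening, the dummy clock uses up one of the 64 data-out cycles, which matches the short-by-one-column symptom; `shift_in` and `DUMMY` are illustrative names only.

```python
from collections import deque


def shift_in(width, clocked):
    """Contents of a `width`-pixel shift register after clocking in `clocked`.

    Index 0 is the far end of the chain, index -1 the input end; each clock
    edge pushes one value in and the far-end value falls off.
    """
    reg = deque([None] * width, maxlen=width)  # None = whatever was there before
    for value in clocked:
        reg.append(value)  # maxlen makes the deque drop the far-end value
    return list(reg)


pixels = list(range(64))
DUMMY = "dummy"

# Dummy pulse plus a data-out period one cycle longer: 65 clocks in total.
print(shift_in(64, [DUMMY] + pixels) == pixels)  # True: the dummy fell off the end

# Dummy pulse without lengthening: it takes one of the 64 data-out cycles.
short = shift_in(64, [DUMMY] + pixels[:63])
print(short[0], short[-1])  # dummy 62: the last column never arrives
```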
MORAL OF THE STORY: if you really want to test something, throw prime numbers at it. As computer people, we have this tendency to use computer-y powers of two, and that was my undoing, not meddling kids and their dog.
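The moral holds up arithmetically. Repeating one buffer of N distinct values makes the output periodic with period N, so a 64-byte lag is invisible exactly when N divides 64, and the only such lengths are the powers of two up to 64. A prime like 53 doesn't divide 64, so the lag shows immediately. `lag_is_masked` is just an illustrative helper:

```python
def lag_is_masked(test_len, lag=64):
    """True if a test that repeats one buffer of `test_len` distinct values
    looks correct even though everything arrives `lag` bytes late."""
    buf = list(range(test_len))  # distinct values, so any misalignment shows
    stream = [buf[i % test_len] for i in range(3 * lag)]
    # Byte i on the output is really byte i - lag of the input.
    return all(stream[i - lag] == stream[i] for i in range(lag, len(stream)))


print([n for n in range(1, 129) if lag_is_masked(n)])  # [1, 2, 4, 8, 16, 32, 64]
print(lag_is_masked(53))  # False
```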
OTHER MORAL OF THE STORY: if you'll be using the ESP32-S3's LCD peripheral for hacks like this, you might struggle if you take the Technical Reference Manual at face value. At a minimum, a 2-channel oscilloscope can help to observe the LCD clock and one data line…or better yet, an 8- or 16-channel logic analyzer will capture everything in parallel. Over time, more community examples of hacking this peripheral are likely to appear, and such tools may become unnecessary. But for now, definitely use them.