Recently we observed some glitches when running the Adafruit_Protomatter RGB LED matrix library on a new board. There can be various causes for this, but a common one is issuing data too fast for the matrix to keep up, and a common workaround for that is to forcibly slow down the code just a bit in key areas by adding NOP instructions — do-nothing opcodes that simply waste one CPU cycle.
That turned out not to be the problem this time. The culprit was a slightly-too-high voltage powering the panel (5.3V), high enough that 3.3V signals aren’t reliably interpreted. Possible fixes include level-shifting all the signals or just powering the matrix at a slightly lower voltage.
It had me wondering though: I’ve been adding NOPs for more and more boards in that library lately, and we’re only running it at 6 bits depth (to match Adafruit_GFX’s “565” color model). Meanwhile, bulletproof code like Henner Zeller’s rpi-rgb-led-matrix library for Raspberry Pi can run these things at 11 bits depth, allowing for gamma-correction of 8-bit values and other luxuries. Surely he must have figured out a super-fast way to clock data to the matrix, right? I’d just have to hook up a logic analyzer and see how he’s doing this, then try to model Adafruit_Protomatter’s behavior after that.
At a very gross level, one can see the two libraries are already doing the same approximate thing:
(There’s actually 6 color bits out: R1, G1, B1, R2, G2, B2, but it’s abbreviated to just B2 above to focus on the timing of other signals.)
All these RGB matrices use a technique called bit angle modulation. If you’ve ever done pulse-width modulation on LEDs, you’ve seen how duty cycle affects perceived brightness…the more time the pulse spends “high” relative to “low,” the brighter the LED. Rather than linearly splitting the pulse time, bit angle modulation breaks it down into smaller intervals, each successive interval twice as long as the one preceding it: N microseconds, N*2 microseconds, N*4, N*8 and so forth. An LED can be on or off for any combination of those intervals, and adding up the “on” times determines its perceived brightness. You’d think it would be all weird and flickery, but our eyes perfectly integrate these pieces into a cohesive image…every LED billboard you’ve seen works this way.
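To make the doubling concrete, here’s a minimal sketch in C. It is not code from either library, and the function names are made up for illustration. Time is counted in units of the shortest interval, so bit 0 lasts 1 unit, bit 1 lasts 2, bit 2 lasts 4 and so on.

```c
#include <assert.h>
#include <stdint.h>

/* Length of the interval for one bitplane, in units of the shortest
   interval: 1, 2, 4, 8, ... (hypothetical helper, illustration only). */
uint32_t bam_interval(unsigned bit) {
    assert(bit < 32);
    return (uint32_t)1 << bit;
}

/* Total time one LED is lit during a full pass through all bitplanes.
   The LED is on during interval `bit` only if that bit of `value` is set. */
uint32_t bam_on_time(uint32_t value, unsigned depth) {
    assert(depth >= 1 && depth <= 16);
    uint32_t on = 0;
    for (unsigned bit = 0; bit < depth; bit++) {
        if (value & ((uint32_t)1 << bit)) {
            on += bam_interval(bit);
        }
    }
    return on;
}

/* Length of a full pass: 1 + 2 + 4 + ... = 2^depth - 1 units. */
uint32_t bam_frame_time(unsigned depth) {
    assert(depth >= 1 && depth <= 16);
    return ((uint32_t)1 << depth) - 1;
}
```

At 6 bits, a value of 45 (binary 101101) lights the LED for 1 + 4 + 8 + 32 = 45 units out of a 63-unit pass. The total “on” time always works out to the value itself, so it scales linearly with the number you asked for.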
You can see this doubling in the !OE (output-enable when low) signal here:
The 6 up-facing arrows are pointing out Protomatter’s !OE time doublings. 6 bits, 6 intervals, each 2X the one before.
The single down-facing arrow shows Protomatter clocking out data (6 RGB bits, but abbreviated to 3 in the image above). While !OE is low, it’s clocking out the data for the next time interval, so it just sort of follows (I thought) that the shortest time interval was a function of how fast those bits are sent. Gotta go fast.
Looking at just the CLK signal, on a Metro M4 with just enough NOPs to not outpace the matrix, shows a bit-to-bit period of 58 nanoseconds (about 17 MHz):
At 11 bitplanes, my assumption was that the Pi code must be clocking data incredibly fast, and that maybe my off-kilter high-to-low clock signal shape was the culprit. Let’s take a look at rpi-rgb-led-matrix on the scope:
100 nanoseconds. 10 MHz. What the…?
Zoom out. Facepalm.
Look at the 5 least-significant bits here. The time-to-issue is equal. They’re not doubling. And yet it manages that flawless 11-bit output.
Zoom in. Enhance.
Looking at the CLK signal, the 5 least-significant bits do indeed take about equal time to issue…and at 10 MHz, not exceptionally fast, but nice and stable. But look at the !OE time…it’s doing the correct bit-angle doubling thing, just using a small “on” time while the next data is issuing.
In other words: I was simply stuck on a wrong idea and was asking the wrong questions. Although Protomatter is currently written so that the least-bit !OE time is a function of data-writing speed, that’s not a law of the universe. !OE could just as well run for some fraction of that period. I was so focused on maximizing every bit of “on” time for MAXIMUM BRIGHTNESS!!!1! that I simply didn’t think to give up a couple of imperceptible percent in exchange for better flexibility. So…that’ll likely be fixed in some future release. Maybe not right away because those extra bitplanes will need a lot of RAM…but as microcontroller capacities expand, it’ll become less of a burden.
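Here’s a rough way to put numbers on that trade, as a simplified model in C. The struct, the function name and the timings are all hypothetical, not taken from either library. Each bitplane gets a time slot that is at least as long as it takes to clock out the next plane’s data (`data_ns`), while the !OE “on” time doubles from `lsb_on_ns` and is allowed to be shorter than its slot.

```c
#include <assert.h>
#include <stdint.h>

/* Result of walking through every bitplane once (illustrative model). */
typedef struct {
    uint64_t on_ns;    /* total time !OE is low, i.e. LEDs can be lit */
    uint64_t frame_ns; /* total time for one pass through all bitplanes */
} bam_budget;

/* depth:     number of bitplanes
   data_ns:   time to clock out one bitplane of data (assumed constant)
   lsb_on_ns: !OE "on" time for the least-significant bitplane
   Setting lsb_on_ns == data_ns models the coupled case, where the
   shortest "on" time is dictated by how fast data can be written. */
bam_budget bam_decoupled(unsigned depth, uint32_t data_ns, uint32_t lsb_on_ns) {
    assert(depth >= 1 && depth <= 16);
    bam_budget b = {0, 0};
    for (unsigned bit = 0; bit < depth; bit++) {
        uint64_t on = (uint64_t)lsb_on_ns << bit;     /* doubles each plane */
        uint64_t slot = on > data_ns ? on : data_ns;  /* can't beat data time */
        b.on_ns += on;
        b.frame_ns += slot;
    }
    return b;
}
```

With made-up numbers of 11 bitplanes and 1600 ns to issue each plane, the coupled case `bam_decoupled(11, 1600, 1600)` is lit 100% of the time but needs 3,275,200 ns per pass. Dropping the shortest “on” time to 100 ns with `bam_decoupled(11, 1600, 100)` gives five equal-length slots for the low bits, a pass of 209,600 ns and 204,700 ns of “on” time, which is about 97.7% of full brightness. In this model the couple of percent given up buys a pass that’s roughly 16 times shorter.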
Moral of the story: question your assumptions, ask better questions.
Additionally though, while I was in there, take a look at Protomatter’s clock signal up close. What’s with those longer blips?
Those are an artifact of the Protomatter code partially “unrolling” its loops. For each group of six bits out (R1, G1, B1, R2, G2, B2), there are 8 identical copies of that code, and then it repeats this four times in a loop to achieve 32 columns out. Those longer CLK bits come from branching and testing the loop, which takes a few extra cycles.
The unrolling of that loop was a carryover from the old AVR RGBmatrixPanel library, where it really helps on a 16 MHz 8-bit chip. And it does still help a little on the 48 MHz Feather M0 as well. But once we get to the M4, nRF52 or anything else with a bit more speed at the core, where NOPs had to be added…this is lunacy. Unrolling the loops for speed, only to add NOPs to throttle it? That was a cheap, band-aid fix, right up there with David Marcus using protomatter in the Genesis matrix. So this too will get some work in the future: un-unroll those loops and simply make use of the branch-and-test time as part of the speed throttle, probably freeing up a couple hundred bytes of flash in the process.
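For anyone who hasn’t seen loop unrolling up close, here’s a sketch of the two shapes in C. This is not Protomatter’s actual code: `fake_port` stands in for a memory-mapped GPIO register, and the bit layout, counter and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RGB_MASK 0x3Fu /* hypothetical layout: bits 0-5 = R1 G1 B1 R2 G2 B2 */
#define CLK_BIT  0x40u /* hypothetical layout: bit 6 = CLK */

volatile uint8_t fake_port; /* stand-in for a GPIO output register */
unsigned long clk_edges;    /* simulation bookkeeping: rising CLK edges */

/* Put one column's six color bits on the port with CLK low, then raise
   CLK so the panel's shift register takes them in. */
static inline void shift_one(uint8_t rgb6) {
    fake_port = (uint8_t)(rgb6 & RGB_MASK);
    fake_port = (uint8_t)(fake_port | CLK_BIT);
    clk_edges++;
}

/* Partially unrolled: 8 copies per pass, so the loop test and branch are
   paid once every 8 columns. Returns the number of CLK edges produced. */
unsigned long shift_row_unrolled(const uint8_t *px, size_t columns) {
    assert(columns % 8 == 0);
    unsigned long start = clk_edges;
    for (size_t i = 0; i < columns; i += 8) {
        shift_one(px[i + 0]);
        shift_one(px[i + 1]);
        shift_one(px[i + 2]);
        shift_one(px[i + 3]);
        shift_one(px[i + 4]);
        shift_one(px[i + 5]);
        shift_one(px[i + 6]);
        shift_one(px[i + 7]);
    }
    return clk_edges - start;
}

/* Rolled: the compare-and-branch happens on every column, which spaces
   the clocks evenly and can stand in for some of the throttling. */
unsigned long shift_row_rolled(const uint8_t *px, size_t columns) {
    unsigned long start = clk_edges;
    for (size_t i = 0; i < columns; i++) {
        shift_one(px[i]);
    }
    return clk_edges - start;
}
```

Both versions put the same 32 columns on the wire. The difference is only in where the loop overhead lands: every eighth clock in the unrolled version (the longer blips), or spread across every clock in the rolled one. How many cycles that overhead costs depends on the core and the compiler, so any remaining NOPs would still need tuning per board.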
Other moral of the story: logic analyzers are really handy. Before I do any more Protomatter code I’m gonna design a little passthrough/breakout board in EAGLE so I can instantly plug in and watch these signals, because there’ll be a lot more of this going forward.