The web is abuzz about this video, which shows a sneak preview of a (possibly) upcoming Photoshop plugin/filter. It demonstrates a neat concept: deblurring an image using algorithmic processing. How does it work? Well, I’m not exactly sure, but here’s my hypothesis:
First, it analyzes the “shape” of the camera shake, probably by isolating a point-source highlight (using either high-pass filtering or image matching, or both). Then it uses this shape to generate a path along which it maps point spread functions. A point spread function is sort of like an impulse response function. The difference is that while an impulse response tracks a one-dimensional value with respect to time, a point spread function gives you the response of a (2-D) imaging system to a point source. They’re both basically the same idea, though, and you can apply the same techniques to both. Further, by generating this path, you can map the point spread function in terms of both space (because it’s two-dimensional) and time. And this is where it gets really cool:
Just like with an LTI impulse response, you can deconvolve the output (the blurry image) with your new mapped-in-time point spread function and get something much closer to the original scene (a sharper image). Because a photosensor (or film) is basically a 2-dimensional integrator*, the whole thing is linear, so this method works. The only added step I can see is that every lens/sensor system has a different point spread function, which varies further with the lens focusing distance and depth of field, so you’ll need this data too. But (most importantly) you can get this data at your leisure, either empirically or through modelling. Incidentally, this custom point spread function can also be used to de-blur images with bad focus but no shaking blur.
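To make the hypothesis concrete, here is a minimal sketch in Python with NumPy. It is not what the plugin does (I don't know that); it just turns a shake path into a point spread function, blurs a test image with it, and deconvolves the result. A few assumptions to flag: the shake path is made up rather than estimated from a highlight, the blur is circular (it wraps at the image edges), and I've swapped plain deconvolution for Wiener deconvolution, which adds a small `noise_power` term so the division doesn't blow up at frequencies the blur nearly wipes out. All the function names are mine, invented for illustration.

```python
import numpy as np

def psf_from_path(path, size):
    """Accumulate a shake path into a normalized size x size blur kernel.

    `path` holds (row, col) offsets from the kernel center, one per equal
    time step of the exposure. It is a hypothetical input here; a real tool
    would have to estimate it from the image.
    """
    psf = np.zeros((size, size))
    center = size // 2
    for dr, dc in path:
        r = int(round(center + dr))
        c = int(round(center + dc))
        if 0 <= r < size and 0 <= c < size:
            psf[r, c] += 1.0  # time spent at a spot = light deposited there
    total = psf.sum()
    if total == 0:
        raise ValueError("path lies entirely outside the kernel")
    return psf / total

def pad_psf(psf, shape):
    """Zero-pad the PSF to the image shape and move its center to (0, 0)."""
    padded = np.zeros(shape)
    rows, cols = psf.shape
    padded[:rows, :cols] = psf
    return np.roll(padded, (-(rows // 2), -(cols // 2)), axis=(0, 1))

def blur(image, psf):
    """Circular convolution of image with psf, done in the frequency domain."""
    H = np.fft.fft2(pad_psf(psf, image.shape))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

def wiener_deconvolve(blurred, psf, noise_power=1e-3):
    """Wiener-style inverse filter: conj(H) / (|H|^2 + noise_power)."""
    H = np.fft.fft2(pad_psf(psf, blurred.shape))
    G = np.fft.fft2(blurred)
    F_hat = G * np.conj(H) / (np.abs(H) ** 2 + noise_power)
    return np.real(np.fft.ifft2(F_hat))

def rms(a, b):
    """Root-mean-square difference between two images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Made-up test data: a random "scene" and a diagonal shake path.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
shake_path = [(0.5 * t, 1.0 * t) for t in range(-4, 5)]

psf = psf_from_path(shake_path, size=11)
blurred = blur(scene, psf)
restored = wiener_deconvolve(blurred, psf)

print("blurred error: ", rms(scene, blurred))
print("restored error:", rms(scene, restored))
```

On a real photo the `noise_power` value would need tuning: too small and sensor noise gets amplified, too large and the result stays soft.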
So that’s my hypothesis anyway. Back in college I did something very similar in a MATLAB class, but my results weren’t so great (because my point spread model turned out to be lousy for the test images I was using). The biggest difference between then and now, though, is number-crunching power. I was working with (iirc) a 256×256 pixel image, and it would have to run all night on a Pentium II to generate a result. Convolution and de-convolution are numerically intensive processes, even when you’re only dealing with one-dimensional arrays (for example, an audio stream). While the math to do this has been around for some time, the processing power has not. Convolution was long the realm of dedicated processing hardware (DSPs), which are packed full of parallel multiply-accumulate units. In the last few years, though, desktop computer power has increased to the point where something like this is feasible on such a system (the convolution operation also lends itself to multiple cores, which is nice). Eventually, we’ll probably be seeing it within cameras themselves.
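For a feel of where the number crunching goes, here is a small one-dimensional sketch (again Python with NumPy, with made-up data and function names of my own). The direct method does one multiply-add for every pairing of a signal sample with a kernel sample, which is exactly the work DSP hardware is built for. The FFT route gets the same answer by multiplying spectra instead.

```python
import numpy as np

def direct_convolve(signal, kernel):
    """Textbook convolution: one multiply-add per (signal, kernel) pair."""
    out = np.zeros(len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def fft_convolve(signal, kernel):
    """Same result via the convolution theorem: multiply the spectra."""
    n = len(signal) + len(kernel) - 1  # full linear-convolution length
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(kernel, n)
    return np.fft.irfft(spectrum, n)

# Made-up data standing in for, say, a short audio snippet and a filter.
rng = np.random.default_rng(0)
signal = rng.random(256)
kernel = rng.random(31)

# Multiply-adds needed by the direct method for this one small case.
direct_macs = len(signal) * len(kernel)

same = np.allclose(direct_convolve(signal, kernel), fft_convolve(signal, kernel))
print("direct multiply-adds:", direct_macs)
print("results match:", same)
```

The direct cost grows with the product of the two lengths, and in two dimensions with the product of the image and kernel pixel counts, which is a big part of why large images were so slow to process.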
*k, so actually, because we’re dealing with (quantum) photons, the image on a sensor or film is actually a superposition of probabilities in time. But then again, aren’t we all?