The web is abuzz about this video, which shows a sneak preview of a (possibly) upcoming Photoshop plugin/filter. It demonstrates a neat concept: deblurring an image using algorithmic processing. How does it work? Well, I’m not exactly sure, but here’s my hypothesis:
First, it analyzes the “shape” of the camera shake, probably by isolating a point-source highlight (using high-pass filtering, image matching, or both). Then it uses this shape to generate a path along which it maps point spread functions. A point spread function is sort of like an impulse response: an impulse response tracks a one-dimensional signal with respect to time, while a point spread function gives you the response of a (two-dimensional) imaging system to a point source. They’re both basically the same idea, though, and you can apply the same techniques to both. Further, by generating this path, you can map the point spread function in terms of both space (because it’s two-dimensional) and time. And this is where it gets really cool:
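To make that concrete, here’s a rough sketch (my guess at the idea, not Adobe’s actual code) of how an extracted shake path might be rasterized into a blur kernel. The path data here is made up purely for illustration:

```python
# Sketch: turn an estimated shake path into a blur kernel (PSF).
# Assumes the path has already been extracted as (x, y) offsets in pixels;
# the path below is made-up illustration data.
import numpy as np

def psf_from_path(path, size=25):
    """Rasterize a camera-shake path into a normalized 2-D blur kernel."""
    psf = np.zeros((size, size))
    cx = cy = size // 2
    for x, y in path:
        px, py = int(round(cx + x)), int(round(cy + y))
        if 0 <= px < size and 0 <= py < size:
            psf[py, px] += 1.0      # each path sample adds "exposure time" here
    return psf / psf.sum()          # normalize so total image energy is preserved

# Hypothetical shake path: a short diagonal wiggle.
shake_path = [(t * 0.4, np.sin(t * 0.6) * 2.0) for t in range(20)]
psf = psf_from_path(shake_path)
```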
Just as with an LTI impulse response, you can deconvolve the output (the blurry image) with your new mapped-in-time point spread function and get something much closer to the original scene (a sharper image). Because a photosensor (or film) is basically a two-dimensional integrator*, the whole thing is linear, so this method works. The only added step I see is that every lens/sensor system has a different point spread function, which varies further with the lens focusing distance and depth of field, so you’ll need this data too; but (most importantly) you can get this data at your leisure, either empirically or through modelling. Incidentally, this custom point spread function can also be used to de-blur images with bad focus but no shaking blur.
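Here’s a minimal sketch of the deconvolution step, using a plain frequency-domain Wiener filter on a grayscale image (the real plugin almost certainly does something more sophisticated, but this is the basic linear-systems idea):

```python
# Sketch: Wiener-style deconvolution in the frequency domain.
# Assumes `blurred` is a 2-D grayscale array at least as large as the PSF.
import numpy as np

def wiener_deconvolve(blurred, psf, k=0.01):
    """Deconvolve `blurred` with `psf`; k damps noise amplification."""
    # Pad the PSF out to the image size, then shift its center to the origin
    # so the deconvolved result isn't translated.
    psf_padded = np.zeros_like(blurred, dtype=float)
    ph, pw = psf.shape
    psf_padded[:ph, :pw] = psf
    psf_padded = np.roll(psf_padded, (-(ph // 2), -(pw // 2)), axis=(0, 1))

    H = np.fft.fft2(psf_padded)     # frequency response of the blur
    G = np.fft.fft2(blurred)        # spectrum of the blurry image
    # Divide by H, but regularize where |H| is tiny so noise doesn't blow up.
    F_hat = G * np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F_hat))
```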
So that’s my hypothesis, anyway. Back in college I did something very similar in a MATLAB class, but my results weren’t so great (because my point spread model turned out to be lousy for the test images I was using). The biggest difference between then and now, though, is number-crunching power. I was working with (iirc) a 256×256 pixel image, and it would have to run all night on a Pentium II to generate a result. Convolution and de-convolution are numerically intensive processes, even when you’re only dealing with one-dimensional arrays (for example, an audio stream). While the math to do this has been around for some time, the processing power has not. Convolution was long the realm of dedicated processing hardware (DSPs), which are packed full of parallel multiply-accumulate units. In the last few years, though, desktop computing power has increased to the point where something like this is feasible on such a system (the convolution operation also lends itself to multiple cores, which is nice). Eventually, we’ll probably be seeing it within cameras themselves.
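For a sense of why the FFT matters here: direct 2-D convolution of an N×N image with an M×M kernel costs on the order of N²·M² multiplies, while FFT-based convolution costs roughly N²·log N. A quick comparison (image and kernel sizes are arbitrary):

```python
# Direct vs. FFT-based 2-D convolution; both give the same result,
# but the FFT route is dramatically faster for non-trivial kernel sizes.
import numpy as np
from scipy.signal import convolve2d, fftconvolve

image = np.random.rand(256, 256)    # same size as my old MATLAB experiment
kernel = np.random.rand(25, 25)     # arbitrary blur kernel

direct = convolve2d(image, kernel, mode='same')   # brute-force spatial convolution
fast = fftconvolve(image, kernel, mode='same')    # multiply in the frequency domain
assert np.allclose(direct, fast)
```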
*k, so actually, because we’re dealing with (quantum) photons, the image on a sensor or film is actually a superposition of probabilities in time. But then again, aren’t we all?
Hey! You don’t need a supercomputer facility to de-blur images; just use your brain! The military has used computers to do this for decades. First, be clear that a blurred image is fundamentally different from a fuzzy one. If an image is fuzzy (out of focus), it is difficult to impossible to recover the missing information. If it is blurred, it can be considered as an image moving in time.
The military makes its Keyhole spy satellites as stable as possible, but at the hyper magnifications they use, images are still moving during exposure (blurred). They use the (Fast) Fourier Transform to move the image data into the frequency domain, stack the data there, and transform it back to the spatial domain. Unfortunately, the fact that the data are finite in size means that they have a beginning and an end. These ends are, in themselves, data which create image artifacts. Thus you have the Hanning window, the Hamming window, and many others, which attempt to "fade in" and "fade out" the data sample in a way that minimizes these artifacts (see the sketch below). These artifacts appear as parentheses around bright highlights in Keyhole images.
Well folks, you can do all that with the visual neural networks in your head! I have 1.5-2.0 diopters of astigmatism in my corneas. They generate an image on my retinas very similar to a blurred image. However, if I concentrate hard on making my vision sharp, I can do it. I can still see well enough to pass the driving vision test if I concentrate very, very hard. However, the stars I see at night have parentheses around them! Hey, I’m not the first person to evolve this technique. Just look at Vincent van Gogh’s "Starry Night" and you will see those parentheses in his painting. The dude had astigmatism just like mine (and a few zillion other folks).
Ain’t the human brain one amazing machine!
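A quick illustration of the windowing idea from that comment: taper the ends of a finite sample before the FFT so the abrupt start and stop don’t create spectral artifacts. (The test tone here is made up, and it’s 1-D for simplicity; the same principle applies to 2-D image data.)

```python
# Windowing a finite sample before the FFT to reduce edge artifacts.
import numpy as np

signal = np.sin(2 * np.pi * 5.3 * np.arange(256) / 256)    # made-up test tone

raw_spectrum = np.abs(np.fft.rfft(signal))                  # leaks energy across bins
window = np.hanning(len(signal))                            # "fade in" / "fade out"
windowed_spectrum = np.abs(np.fft.rfft(signal * window))    # much tighter peak
```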
The work is very likely based on this paper:
http://www.cse.cuhk.edu.hk/~leojia/projects/motion_deblurring/index.html
🙂 “Enhance!”
that video was a little blurry 😉
I believe you’re right, John. The key thing that the internet, all abuzz, doesn’t realize is that you’ll never extract information that is lost to the diffraction limit and the finite sampling of the scene by the sensor. This will not let you take a low-res photo and "enhance" it a la CSI or 24.
I would assume that they’re doing some sort of blind deconvolution, because the technique has to be agnostic with respect to the PSF aberrations. But this field is a whole area of research unto itself.
The great thing about modern processors and graphics cards is that a lot of these FFT-based techniques are becoming computationally feasible on a modest desktop system.
@ultrafastx: Agreed. This method will never amount to CSI-style “image enhancement”, and I don’t think they (Adobe) are claiming it will, but that’s the buzz. To be fair, I don’t see them out there debunking it either. So it goes with every “miracle” technology, I suppose.
I’m not sure whether they’re doing blind deconvolution or using modeled PSFs. They could just use a generic Gaussian PSF or something. The PSF is less critical for camera-shake blurring anyway; the critical thing is the shake-path extraction, which has limited resolution.
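For what it’s worth, this is the kind of generic fallback PSF I mean: a plain isotropic Gaussian, normalized to unit sum (just a sketch of my guess, not anything Adobe has confirmed):

```python
# A generic Gaussian PSF, usable when no shake path is available.
import numpy as np

def gaussian_psf(size=15, sigma=2.0):
    """Isotropic Gaussian blur kernel, normalized so it sums to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()
```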
Eventually this will all be taken care of with multi-axis accelerometers in the camera body, which will record the shake pattern and store it as metadata along with the image file. This has the added advantage of not just recording the visible path, but also any “dwell”, which would be very difficult to detect from the output image.
Either that, or we’ll all have gyro-stabilized hover-cameras which take perfect pictures every time. 🙂
Plus, I guess they’re using the parallel processing power of GPUs (OpenCL/CUDA); the problems Photoshop faces benefit greatly from that.