Coded Aperture Projection

Max Grosse and Oliver Bimber

Bauhaus-University Weimar

1 Motivation and Related Work

Modern video projectors are remarkable devices that can display large imagery with a high resolution, brightness, and contrast. The latest high-end models even incorporate light sensors for controlling auto-focus and auto-iris objective lenses. Auto-iris lenses can greatly enhance the temporal contrast of projected images by adjusting the aperture opening to the average brightness of the displayed content. Their flexibility and low cost make projectors irreplaceable for many applications including professional presentations, home entertainment, scientific visualization, as well as museum and art installations. We envision future generations of these displays as fully integrated systems with cameras, dynamically adjustable apertures and intelligent control mechanisms.



Figure 1: Left: A static coded aperture integrated into the objective lens of a projector improves projector defocus compensation through inverse filtering. Right: Placing a transparent liquid crystal array at the aperture plane of a projector lens allows encoding the aperture’s mask pattern dynamically – depending on the perceivable frequencies of the displayed images.

With coded aperture projection, we present solutions for taking projectors to the next level. By placing static as well as dynamically coded masks at a projector’s aperture plane, we show how the depth-of-field of a projection can be greatly enhanced. This allows focussed imagery to be shown on complex screens with varying distances to the projector’s focal plane, such as projection domes as in planetariums or cylindrical canvases as in IMAX theaters. We demonstrate that static as well as adaptive dynamic apertures outperform previous methods of defocus compensation for objective lenses with static circular apertures. In addition, our dynamic apertures can perform the type of contrast enhancement employed by common auto-iris projection lenses, and also produce high-quality de-pixelated images. The latter is beneficial for rear-projection TV sets and other close-view displays.

Several approaches have been proposed to increase the depth-of-field of conventional projectors. Multiple projector units with differently adjusted focal planes but overlapping image areas [1] can be applied to increase the depth-of-field of a projection, at the cost of an uneconomically complex system. Compensation of defocus using a single device is also possible by computing and projecting a compensation image that neutralizes the optical blur. As images can be digitally sharpened by convolving them with the inverse of a known blur function (called deconvolution), optical defocus of a video projector can be compensated in the same way [2, 5]. The blur function is defined by the aperture and referred to as point spread function (PSF). The PSF produced by a projector with a regular circular aperture is Gaussian. Because a Gaussian PSF acts as a low-pass filter, deconvolving with it sets clear limits on the fine image details that can be recovered. This problem has been addressed [8] by re-formulating the computation as an optimization problem that constrains the solution to the actual dynamic range of the projector while minimizing local optical defocus.

All of these approaches share two limitations. Firstly, they are far from reaching real-time performance, even if the time necessary for measuring the local blur functions is not considered. This prevents them from displaying dynamic content. Secondly, the amount of defocus that can be compensated through deconvolution is clearly limited when the PSF is Gaussian. Ringing artifacts will dominate if the blur becomes too large. In fact, only little defocus can be compensated efficiently with such techniques.

Coded aperture imaging has been presented recently in the context of computational photography [4, 7], and has been applied previously in astronomy and medical imaging. In contrast to conventional apertures, coded apertures (i.e., apertures that encode a more complex binary or intensity pattern, rather than a simple round opening) in cameras enable post-exposure refocusing, reconstructing scene depth, or recording light fields. We introduce coded aperture projection, and show that if coded apertures are applied instead of simple circular ones, ringing artifacts resulting from deconvolution can be reduced and more image details can be recovered from optical defocus. Furthermore, our implementation uses the graphics hardware for computation and thus achieves interactive frame-rates of currently 8-16 fps at XGA resolution.



Figure 2: Image projected in focus (left), and with the same optical defocus (screen located at 2m distance to focal plane) in several different ways: with circular aperture untreated and deconvolved with Gaussian PSF, with static broadband coded aperture and with dynamic coded aperture. All images are deconvolved with the corresponding aperture kernels (shown in the sub-figures). The four images on the left have been increased in brightness by factor 1.7 to better visualize the differences in focus. The two images on the right have not been altered. The image at the right displays the opening that is needed with a circular aperture to achieve the same depth-of-field as produced with the adaptive coded aperture on the left resulting in a great loss of brightness.

2 Coded Aperture Projection Principle

As explained above, optical defocus of a projected image can be mathematically described as a convolution of the original image with a filter kernel that corresponds to the PSF of the aperture. The scale of the kernel is directly proportional to the degree of defocus: $i_p = k_s \otimes i_d$, where $i_d$ is the displayed image, $k_s$ the aperture kernel at scale $s$, and $i_p$ the optically blurred projection. Deconvolution will digitally sharpen an image and consequently compensate optical defocus: $i_d = k_s^{-1} \otimes i_p$. Here $k_s^{-1}$ is the inverse aperture kernel. Convolution and deconvolution are easier to model in the frequency domain than in the spatial domain: a convolution corresponds to a multiplication, $I_p = K_s \cdot I_d$, and deconvolution to a division, $I_d = I_p / K_s$. $I_p$, $I_d$, and $K_s$ are the Fourier transforms of $i_p$, $i_d$, and $k_s$, respectively. In general, this principle is also applied by related approaches, such as [5] and [2]. Once the deconvolution has been computed in frequency domain, the result $I_d$ is inverse Fourier transformed to spatial domain and projected as compensation image.

Low magnitudes in the Fourier transform of the aperture kernels lead to divisions by small values in frequency domain, and consequently to intensities in spatial domain that exceed the displayable range of the projector. These intensities are clipped and therefore the corresponding frequencies are not considered, which finally results in visible ringing artifacts. As already mentioned, this is the main limitation of previous projector defocus compensation approaches, since in frequency domain the Gaussian PSF of circular apertures is a low pass and contains a large fraction of low Fourier magnitudes. Applying only small kernel scales in combination with wide aperture openings, on the one hand, will reduce the number of low Fourier magnitudes (and consequently the ringing artifacts), but will also lead to only minor focus improvements. Using narrow aperture openings (up to pinhole size), on the other hand, will naturally increase the focal depth, but will decrease the light throughput significantly.

To overcome this problem, we integrate a static coded aperture inside a projector’s objective lens (cf. figure 1-left). The aperture is more broadband in frequency domain and its Fourier transform has initially fewer low magnitudes than that of a circular aperture. Consequently, more frequencies are retained and more image details are reconstructed with fewer ringing artifacts. A comparative example of defocus compensation with and without coded aperture is shown in figure 2. Increasing the depth-of-field with such a static broadband aperture, however, comes at the cost of decreased light transmission, which is one of the most crucial aspects of all projector-based display systems.
Therefore, we also present an approach for computing and displaying a dynamic aperture pattern, based on the analysis of the projected image content and on limitations of human visual perception. This analysis employs an intuitive model of the human visual system (HVS) and allows us to determine and filter out spatial frequencies of the input image that cannot be perceived by a human observer. An adaptive aperture can then be computed by maximizing its light transmission while preserving the perceivable frequencies, rather than being restricted to support a constant and broad frequency band. We will show that our adaptive dynamic apertures produce better results than previous methods with the same or even an increased amount of light transmission.

The sensitivity variations of the HVS according to spatial frequencies $f_x, f_y$ are well studied and mathematically defined by the contrast sensitivity function (CSF) $S_{csf}(f_x,f_y)$. Various definitions of this function appear in the literature; we use the one described in [3]. The CSF depends on the viewing conditions only, not on the actual content. The sensitivity is defined as the inverse of the contrast required to produce a threshold response, $S_{csf}(f_x,f_y) = 1/C_{thresh}(f_x,f_y)$, with $C_{thresh}$ being the threshold contrast. Using the definition of Michelson contrast, this is given as $C_{thresh}(f_x,f_y) = \Delta L(f_x,f_y)/L_{mean}$, where $\Delta L$ is the necessary luminance difference given in $cd/m^2$ and $L_{mean}$ is the mean image luminance. An absolute luminance threshold map can be computed as:

$$\Delta L(f_x,f_y) = \frac{L_{mean}}{S_{csf}(f_x,f_y)} \qquad (1)$$

The threshold map is shown in figure 3. For computing our dynamic apertures, we wish to eliminate all frequencies that do not contribute to perceivable image fidelity. The Fourier transform magnitudes of an image converted to absolute luminance values $L(f_x,f_y)$ correspond to the amount of spatial frequencies in the image. With this information, we can calculate a binary importance mask for the image frequencies as:

$$M(f_x,f_y) = \begin{cases} 1, & |L(f_x,f_y)| \geq s\,\Delta L(f_x,f_y) \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
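Equations (1) and (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the normalization of the Fourier magnitudes by the pixel count, and the constant placeholder sensitivity are assumptions made for the example; the CSF of [3] would have to be supplied in place of `s_csf`.

```python
import numpy as np

def luminance_threshold_map(l_mean, s_csf):
    """Equation (1): absolute luminance threshold per spatial frequency."""
    return l_mean / s_csf

def importance_mask(luminance, s_csf, s=1.0):
    """Equation (2): 1 where a frequency's magnitude reaches the threshold."""
    # Assumption: divide by the pixel count so the DC term equals the mean luminance.
    spectrum = np.fft.fft2(luminance) / luminance.size
    delta_l = luminance_threshold_map(luminance.mean(), s_csf)
    return (np.abs(spectrum) >= s * delta_l).astype(np.uint8)

# Toy input: mean luminance of 50 cd/m^2 plus one horizontal cosine (4 cycles).
x = np.arange(64)
luminance = np.tile(50.0 + 20.0 * np.cos(2 * np.pi * 4 * x / 64), (64, 1))

# Placeholder sensitivity (hypothetical constant), NOT the CSF described in [3].
s_csf = np.full((64, 64), 100.0)

mask = importance_mask(luminance, s_csf)
```

For this toy input only the DC term and the two mirrored cosine frequencies exceed the threshold, so the mask keeps three frequency bins and discards the rest.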



Figure 3: Adaptive thresholding: The original image is converted to absolute luminance values. A binary frequency importance mask can be computed by thresholding the image frequencies according to a model of the HVS. The difference between original and filtered image is not perceivable under specific viewing conditions. Scanlines of the image’s Fourier transform and the threshold map are shown in the plot.

As illustrated in figure 3, filtering the Fourier transform of an image with the binary importance mask $M$ allows us to remove spatial frequencies that do not modify the perceived image content for specific viewing conditions that include a fixed adaptation luminance, viewer position, and screen size.

We now turn to computing the dynamic aperture itself. We define the aperture as the sum of its individual pixels, $a(x,y) = \sum_{i=1}^{N} a_i p_i$, where $p_i$ is the pixel $p$ at $x_i$ and $y_i$ (with a total of $N$ pixels) and $a_i \in [0,1]$ is its transmissivity. The Fourier transform of the aperture is $F\{a(x,y)\} = A(f_x,f_y) = \sum_{i=1}^{N} a_i P_i$. Our dynamic apertures should support all important frequencies in the input image with a minimal variance of their Fourier transform. In addition, they should maximize light throughput. The variance of the aperture’s modulation transfer function (MTF) is a measure for how different frequencies are attenuated. Minimizing it for all important frequencies ensures that they are all supported. A similar criterion was employed in [6] for a one-dimensional binary temporal mask. The minimization can be mathematically expressed as an optimization problem:

$$\underset{a}{\text{minimize}} \; \|MBa - b\|_2^2, \qquad (3)$$

where $b$ is a vector containing only 1s and $a_i > 0$ are the aperture pixel intensities. We do not enforce the pixel intensities to be below 1 in this formulation, but simply scale the resulting values so that the maximum is 1. This is equivalent to a scaling of the MTF and does not affect the variance criterion. $M$ is a diagonal matrix containing the binary frequency importance mask values described above. $B$ is a matrix with orthogonal basis functions in its columns which represent the optical transfer function (OTF) of the $N$ individual aperture pixels $P_i$. This results in a linear system of the form $MBa = b$. Solving this heavily over-determined system in a least-squares sense with the additional constraint to minimize $\|a\|_2^2$ will minimize the variance of the Fourier transform of the aperture for important frequencies. This formulation also intrinsically maximizes the light transmittance of the resulting aperture, because a small squared 2-norm of $a$ ($a_i \geq 0$) also minimizes the variance of the normalized pixel intensities in the spatial domain.

The linear system can certainly be solved with standard approaches, such as the conjugate gradient method for the normal equations or non-negative least squares solutions. However, this would not allow sufficiently high frame rates on commonly available computer hardware for standard image resolutions of $1024 \times 768$ and higher. Thus, we propose to solve the system using the pseudo-inverse matrix. Computing solutions of linear problems using the pseudo-inverse minimizes the least-squares error and the 2-norm of the resulting vector, thus solving the variance and the light transmittance problem at the same time. Reformulating our problem results in $a = B^{+} M^{+} b$, where $^{+}$ denotes the pseudo-inverse matrix. Since $M$ is a binary diagonal matrix, $M^{+} = M$. $B$ comprises the set of orthogonal Fourier basis functions as its columns, thus $B^{+} = B^{*}$. We need to employ the conjugate transpose $B^{*}$, because $B$ is complex, hence:

$$a = B^{*} M b \qquad (4)$$

In this formulation B* can be easily pre-computed. During run-time we solve the system with a matrix-vector multiplication. Since the solution a  can contain negative values we clip these values and scale the result so that the maximum value is 1.
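Equation (4) together with the clipping and scaling step can be sketched as follows. The function names, the small grid sizes, the placement of the aperture pixels at the corner of the index grid, and taking the real part of the product are all assumptions made for this illustration (the real part is exact when the mask is conjugate-symmetric, as it is for the spectrum of a real image).

```python
import numpy as np

def fourier_basis(freq_size, ap_size):
    """Matrix B: one column of Fourier basis values per aperture pixel."""
    f = np.arange(freq_size)
    fy, fx = np.meshgrid(f, f, indexing="ij")
    p = np.arange(ap_size)
    py, px = np.meshgrid(p, p, indexing="ij")
    phase = np.outer(fy.ravel(), py.ravel()) + np.outer(fx.ravel(), px.ravel())
    return np.exp(-2j * np.pi * phase / freq_size)  # shape (F*F, N)

def adaptive_aperture(mask, b_conj_t, ap_size):
    """a = B* M b, then clip negative values and scale the maximum to 1."""
    b = np.ones(mask.size)
    # M is diagonal, so M b is an element-wise product with the mask values.
    a = (b_conj_t @ (mask.ravel() * b)).real
    a = np.clip(a, 0.0, None)
    peak = a.max()
    if peak > 0:
        a = a / peak
    return a.reshape(ap_size, ap_size)

F, n = 16, 5                             # hypothetical frequency grid and aperture size
B_conj_t = fourier_basis(F, n).conj().T  # pre-computed once, reused for every frame

all_important = adaptive_aperture(np.ones((F, F)), B_conj_t, n)
dc_mask = np.zeros((F, F))
dc_mask[0, 0] = 1.0
only_dc = adaptive_aperture(dc_mask, B_conj_t, n)
```

The two extreme masks illustrate the trade-off: if every frequency is marked important, the result collapses to a single open pixel (a pinhole), whereas a mask that keeps only the DC term yields a fully open aperture.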

3 Implementation

For the static coded aperture, the selected near-optimal aperture code was printed on transparencies (Kodak film didn’t resist the heat), and was inserted into the objective lens at the projector’s aperture plane (cf. figure 1-left). We applied the near-optimal 7x7 broadband pattern that has been found in [7] using an optimization approach. Static coded apertures are inexpensive and easy to manufacture, but limited as previously explained. Adaptive coded apertures lead to a higher image quality, but are slightly more complex. For implementing them, we integrate a programmable liquid crystal array (LCA) into the projector’s aperture plane, as illustrated in figure 1-right.

For achieving interactive compensation rates, we have developed an optimized software algorithm. In principle, each pixel (or a very small image region) of the original image would have to be deconvolved individually depending on its defocus. Our method partitions the image into a non-uniform grid, based on the actual distribution of the kernel scales and on the capabilities of the graphics hardware being used. The entire partitioning operation can be carried out off-line, since it is independent of the image content. The Fourier transformation and its inverse require the largest amount of computation time for deconvolution. We apply CUDA’s GPU implementation of the Fast Fourier Transformation (FFT).

First, we measure the efficiency of the entire deconvolution (including FFT, division and IFFT) for each possible patch size directly on the GPU, and compute the average time that is required to process one pixel in each case. In theory, this should be in the order of $\log(N)$ for patches with $N$ pixels. In practice, however, the overhead of hardware and software specific implementations of the FFT/IFFT (e.g. through caching, etc.) can be significant. We start with a simple asymmetric quad-tree subdivision and continue until we reach a predefined lowest level that corresponds to the highest partitioning resolution. For each atom patch, we look up the measured efficiency that corresponds to its size. When traversing the quad-tree bottom-up, we successively merge patches in each level if this leads to more efficient results. We look up the efficiency of each merge possibility based on their patch sizes and compare them with the total efficiency of the subdivision achieved for the same area in the previous level. If one merge possibility becomes more efficient than the previous subdivision, it will be used, and its total efficiency is computed and passed to the next quad-tree level for supporting upcoming merge decisions.

In contrast to the above image partitioning, which depends on the defocus values rather than on the image content and is therefore carried out offline, the following deconvolution steps are processed entirely on the GPU for each frame. Since the perception of focussed details is obtained mainly from the image luminance, we apply deconvolution to the luminance channel rather than to the RGB channels. In the next step, the partitioning result that has been pre-computed is used for dividing the luminance channel of the image into the desired patch structure. As mentioned earlier, CUDA’s FFT implementation is applied to each patch in this array. We then divide all Fourier transformed patches by Fourier transformed aperture kernels of different scales to perform the deconvolution. Based on the partitioning results, each patch can have a different number of scale levels while the individual scale values can locally vary. The final steps reverse the initial steps. Each pixel’s final luminance value is selected only from the patch that was deconvolved with the necessary aperture scale (i.e., the scale that corresponds to the pixel’s amount of defocus). The patches are then blended in spatial domain (i.e., in image space) and the new luminance values are recombined with the original chrominance values. The resultant compensation image is finally projected.
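The per-patch deconvolution step (FFT, division by the transformed aperture kernel, inverse FFT, clipping to the displayable range) can be sketched on the CPU with NumPy for a single patch and a single kernel scale. The function names, the small `eps` guard against division by near-zero magnitudes, and the toy 3x3 kernel are assumptions introduced for this illustration; they are not taken from the GPU implementation described above.

```python
import numpy as np

def pad_kernel(kernel, shape):
    """Normalize the kernel, embed it in an image-sized array, centre it at the origin."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel / kernel.sum()
    return np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def deconvolve_luminance(lum, kernel, eps=1e-3):
    """Compensation image: divide in frequency domain, then clip to [0, 1]."""
    K = np.fft.fft2(pad_kernel(kernel, lum.shape))
    # Assumption: replace near-zero magnitudes to avoid dividing by zero.
    K_safe = np.where(np.abs(K) < eps, eps, K)
    comp = np.fft.ifft2(np.fft.fft2(lum) / K_safe).real
    return np.clip(comp, 0.0, 1.0)  # displayable range of the projector

def simulate_defocus(img, kernel):
    """Optical blur modelled as a circular convolution with the aperture kernel."""
    K = np.fft.fft2(pad_kernel(kernel, img.shape))
    return np.fft.ifft2(np.fft.fft2(img) * K).real

# Toy luminance patch (low-frequency cosine) and a hypothetical 3x3 kernel.
x = np.arange(32)
lum = np.tile(0.5 + 0.2 * np.cos(2 * np.pi * 2 * x / 32), (32, 1))
kernel = np.array([[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]])

compensation = deconvolve_luminance(lum, kernel)
projected = simulate_defocus(compensation, kernel)
```

In this toy case the compensation image stays inside the displayable range, so blurring it reproduces the original patch. Stronger blur or higher image contrast would push values outside [0, 1], and the clipping would then produce the ringing discussed in section 2.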
Additional steps are required in case a dynamic coded aperture is used. First of all, the LCA that is currently applied in the prototype is limited to a bit depth of one, thus a simple binarization has to be carried out. As previously described, the aperture can be computed with a simple matrix multiplication. To reach interactive frame-rates, the $B^{*}$ matrix is precomputed and uploaded to the graphics hardware memory. Using NVIDIA’s BLAS implementation for CUDA, the matrix multiplication can be carried out directly on the graphics hardware, and we benefit from the parallel SIMD processing of the GPU. For this, the current input image is uploaded to the graphics hardware memory, then the Fourier transform is calculated and the binary importance mask for the image is determined. The resulting importance mask is multiplied with $B^{*}$, resulting in the aperture mask. Finally, the mask is binarized and rendered to the LCA. The projected image is deconvolved as explained earlier, before being displayed.

4 Results and Conclusion

With known parameters of the projector’s objective lens and adjusted aperture pattern, we measure the PSF’s scale in a deconvolved image that is projected. This can be done by finding the best match between the camera-captured projection and different simulated versions of the original image that is convolved with multiple scales of the PSF, as explained in [5]. This allows us to automatically measure the pixel-individual defocus on the screen and derive the corresponding kernel scales.
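The best-match search over PSF scales can be sketched as below. This is a simplified illustration under several assumptions: a Gaussian kernel family stands in for the scaled aperture kernel, the match is computed globally rather than per pixel, the captured projection is simulated instead of coming from a camera, and all function names are hypothetical.

```python
import numpy as np

def gaussian_otf(shape, sigma):
    """Fourier transform of a spatial Gaussian with standard deviation sigma (pixels)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))

def blur(img, sigma):
    """Simulate defocus at a given scale by multiplication in frequency domain."""
    return np.fft.ifft2(np.fft.fft2(img) * gaussian_otf(img.shape, sigma)).real

def estimate_scale(captured, original, candidate_scales):
    """Return the candidate scale whose simulated blur best matches the capture."""
    errors = [np.sum((blur(original, s) - captured) ** 2) for s in candidate_scales]
    return candidate_scales[int(np.argmin(errors))]

rng = np.random.default_rng(0)
original = rng.random((32, 32))
captured = blur(original, 2.0)  # stand-in for the camera-captured projection
best = estimate_scale(captured, original, [1.0, 2.0, 3.0])
```

A per-pixel variant would evaluate the same error in a local window around each pixel and keep the best scale per window.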



Figure 4: $f$-number of equivalent circular apertures that produce the same depth-of-field ($\hat{f}/\#$) and light throughput ($\tilde{f}/\#$) as the corresponding adaptive coded aperture.

With the determined scales, we compute the $f$-numbers of an objective lens with a circular aperture (and constant focal length) that would lead to the same depth-of-field ($\hat{f}/\#$) or the same light throughput ($\tilde{f}/\#$) as the corresponding coded aperture. To achieve the depth-of-field of the adaptive coded aperture that is used for displaying the “lenna” image in figure 2, for instance, an $\hat{f}/7.7$ stop is needed. For achieving the same light throughput, however, an $\tilde{f}/3.7$ stop would be required. In terms of light throughput, the gain is approximately $\frac{2}{\log(\sqrt{2})} \log\left(\frac{\hat{f}/\#}{\tilde{f}/\#}\right) = \times 4.3$. The table in figure 4 shows that the depth-of-field versus light throughput property of unscaled adaptive coded apertures is in almost all cases significantly better than the application of broadband masks or a purely digital defocus compensation. Therefore, coded apertures (and in particular adaptive coded apertures) outperform static circular apertures. The corresponding input images are shown in figure 5.
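As a plausibility check on the stated gain, and assuming only that the light throughput of a circular aperture scales with the inverse square of its $f$-number, the two stops quoted for the “lenna” example differ in transmitted light by

$$\left(\frac{7.7}{3.7}\right)^{2} \approx 2.08^{2} \approx 4.3,$$

which agrees with the factor of roughly 4.3 given above.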



Figure 5: Different input images result in the calculation of different optimal aperture patterns. Each of the three image triples is perception-optimized and has been computed for viewing at a minimal distance of 50 cm when being displayed at a maximum diagonal of 21 cm on the screen. Possible artifacts could only be perceived when observing at closer distances or larger sizes.

Our technique is also useful for planar screens that do not require a large depth-of-field: Defocussing the projector optically to make the pixel structure vanish, and applying deconvolution to recover the image details, leads to better image quality. This is known as projector de-pixelation, and can be applied for close-view displays with limited resolution, such as rear-projected TV sets. Our technique enhances projector de-pixelation significantly, as shown in figure 6 (left half). For video frames with significantly different brightness, our dynamic aperture can be scaled with respect to the mean image brightness to increase the temporal contrast, as done by conventional auto-iris projection lenses (cf. figure 6-right half).



Figure 6: Left: Using adaptive coded aperture projection for high quality projector de-pixelation. Right: Reducing the light transmission through the LCA by scaling the adapted aperture mask with respect to the average image luminance leads to an improved temporal contrast. This is similar to auto-iris objective lenses. The frames are all from the same video sequence. They are displayed through a static circular aperture (right-top) and a luminance scaled adaptive aperture (right-bottom).

The frame-rate that we can currently achieve with our approach depends on the number of required defocus scales. Using the static coded aperture, frame-rates of 12-16 fps are possible. Due to a higher computational demand in case of the dynamic adaptive coded aperture, only a frame-rate of 8 fps is possible so far. These frame-rates are all measured using an NVIDIA GeForce 8800 Ultra graphics board. This is a clear limitation, but will improve with next generation graphics hardware, or with customized integrated image processors. The main limitations of our approach are currently imposed by the employed LCAs. The low transmittance (only 30% when completely transparent) of current LCAs, for instance, results in a tremendous loss of light. Therefore, we trade light throughput for depth-of-field. As spatial light modulators (SLMs), such as a high contrast continuously valued LCA with higher transmittance, or a reflective SLM, such as a DMD, become more widely available, we expect better results with these displays. Being able to use intensity masks will not only improve defocus compensation and de-pixelation, but will also allow the control of temporal contrast by scaling the transmittance intensity rather than the size of the aperture. This, however, requires higher contrast LCAs and film material. We also believe that high brightness at low power consumption and heat development will become feasible with light engines that apply upcoming LED technology.

References

[1]   Oliver Bimber and Andreas Emmerling. Multifocal Projection: A Multiprojector Technique for Increasing Focal Depth. IEEE TVCG, 12(4):658–667, 2006.

[2]   Michael S. Brown, Peng Song, and Tat-Jen Cham. Image Pre-Conditioning for Out-of-Focus Projector Blur. In Proc. IEEE CVPR, volume II, pages 1956–1963, 2006.

[3]   S. Daly. The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity. In A.B. Watson, editor, Digital Image and Human Vision, pages 179–206. Cambridge, MA: MIT Press, 1993.

[4]   Anat Levin, Rob Fergus, Frédo Durand, and William T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM Trans. Graph. (Siggraph), 26(3):70, 2007.

[5]   Yuji Oyamada and Hideo Saito. Focal Pre-Correction of Projected Image for Deblurring Screen Image. In Proc. IEEE ProCams, 2007.

[6]   Ramesh Raskar, Amit Agrawal, and Jack Tumblin. Coded exposure photography: motion deblurring using fluttered shutter. ACM Trans. Graph., 25(3):795–804, 2006.

[7]   Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan, and Jack Tumblin. Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM Trans. Graph. (Siggraph), 26(3):69, 2007.

[8]   L. Zhang and S. K. Nayar. Projection Defocus Analysis for Scene Capture and Image Display. ACM Trans. Graph. (Siggraph), 25(3):907–915, 2006.