The Multi-Focus Plenoptic Camera

Todor Georgiev (a) and Andrew Lumsdaine (b)
(a) Adobe Systems, San Jose, CA, USA; (b) Indiana University, Bloomington, IN, USA

Abstract: The focused plenoptic camera is based on the Lippmann sensor: an array of microlenses focused on the pixels of a conventional image sensor. This device samples the radiance, or plenoptic function, as an array of cameras with large depth of field, focused at a certain plane in front of the microlenses. For the purpose of digital refocusing (one of the important applications), the depth of field needs to be large, but there are fundamental optical limitations to this. Our solution to this problem is to use an array of interleaved microlenses of different focal lengths, focused at two or more different planes. In this way a focused image can be constructed at any depth of focus, and a very wide range of digital refocusing can be achieved. This paper presents our theory and the results of implementing such a camera. Real-world images demonstrate the extended capabilities, and limitations are discussed.

Keywords: plenoptic camera, rendering, multiple focus

1. INTRODUCTION

Photography is constantly innovating, expanding the range of its applicability in terms of both image capture and creative postprocessing. This is especially true in the age of computational photography, where the computer provides new possibilities. High dynamic range (HDR) [1], panoramas [2], stereo 3D [3, 4], and lightfield imaging [5, 6] are some examples of innovations that extend photography beyond its traditional boundaries. Many more can be found in the recent literature (and we remark that each of these fields has its own vast literature; our citations here are merely representative, not definitive). Integral/lightfield photography in particular provides the creative professional with powerful capabilities.
Normally, when a photographer takes a picture, he or she must make a number of decisions about various camera parameters such as aperture, focus, and point of view. Then, once the picture is taken, those parameter choices are fixed for that photograph. If the photographer wishes to have a picture with a different set of parameter choices, a separate picture must be taken. Instead of capturing one view or 2D image of the scene with a fixed set of parameters, integral photography captures the 4D radiance or plenoptic function associated with the scene. Pictures are rendered from the plenoptic function computationally. Most importantly for the creative professional, the decisions about camera settings are made when an image is rendered computationally from the captured radiance, not at capture time, allowing the creative professional to render an infinite number of different images from the same captured data.

The plenoptic function, as originally defined in [7], is a record of the geometric structure of the lightfield as well as its dependence on parameters such as wavelength, polarization, etc. Most work to this point has explored purely geometric aspects of the plenoptic function, enabling photographers to computationally manipulate the focus, depth of field, and parallax of a scene. Less attention has been paid to capturing and using other aspects of the plenoptic function (with some exceptions, e.g., HDR capture in [8]). In this paper, we analyze in detail the specific case of multifocus radiance capture. This approach allows us to extend the plenoptic depth of field, i.e., the range in which we can create perfectly focused images with a single capture. Our approach to plenoptic sampling is based on the focused plenoptic camera [9], which, as the name implies, requires that the captured microimages be well focused. Because of the wave nature of light, diffraction can be minimized only with large microlens apertures, which in turn correspond to a shallow depth of field.
In order to extend the depth of field of the focused plenoptic capture, our multifocus sensor has microlenses with different focal lengths, capturing in-focus images at different depths in front of the lenslet array. In the case of multifocus capture, our approach makes it possible to computationally focus at any depth, something that was not possible with the previous plenoptic camera based on a single focal length.

2. BACKGROUND

In this section, we briefly review the terminology and notation used in the remainder of this paper.
Figure 1: Rays are represented with coordinates q and p. The space of all rays comprises the phase space.

Figure 2: Translation and refraction by a lens act as shearing transformations in phase space. (a) Shearing due to translation. (b) Shearing due to refraction by a lens.

2.1 The Plenoptic Function

The plenoptic function [7] (also called the lightfield [10] or radiance [11]) is a density function describing the light rays in a scene. Since the radiance is a density function over the ray space, we describe radiance transformations via transformations applied to elements of the underlying ray space. Rays in three-dimensional space are represented as four-dimensional vectors: two coordinates are required to describe position and two are required to describe direction. Following the physics-based convention of [12], we denote the radiance at a given plane perpendicular to the optical axis as r(q, p), where q describes the location of a ray in the plane and p describes its direction (see Figure 1). (These coordinates are also used in optics texts such as [13, 14].) For illustrative purposes, and without loss of generality, we adopt the convention of a two-dimensional (q, p) plane in this paper.

Translation and refraction by a lens are two fundamental transformations that can be applied to rays. Rays are transformed due to translation a distance t in the direction of the optical axis according to (q', p') = (q + tp, p), corresponding to a linear transformation x' = T_t x, where

T_t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}. \quad (1)

Similarly, rays are transformed due to optical refraction by a lens with focal length f according to (q', p') = (q, p - q/f), the linear transformation for which is x' = L_f x, where

L_f = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}. \quad (2)

In phase space, translation and refraction by a lens are both shearing transformations. As shown in Figure 2, translation adds horizontal shear, while refraction by a lens adds vertical shear. The shearing property of translation in particular will be an important part of the analysis in the body of this paper.
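The transforms in Eqs. (1) and (2) can be sketched in code. The function names and sample values below are illustrative, not from the paper; the matrices themselves follow the equations above.

```python
# Phase-space transforms of Eqs. (1) and (2): a ray is x = (q, p);
# translation by t and refraction by a lens of focal length f act as
# 2x2 shear matrices on (q, p).

def translate(t):
    """Ray-transfer matrix T_t for free-space translation by t."""
    return [[1.0, t], [0.0, 1.0]]

def lens(f):
    """Ray-transfer matrix L_f for a thin lens of focal length f."""
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

def apply(M, ray):
    """Apply a 2x2 ray-transfer matrix to a ray (q, p)."""
    q, p = ray
    return (M[0][0] * q + M[0][1] * p, M[1][0] * q + M[1][1] * p)

# Translation shears horizontally: position changes, direction does not.
q1, p1 = apply(translate(2.0), (1.0, 0.5))   # -> (2.0, 0.5)

# A lens shears vertically: direction changes, position does not.
q2, p2 = apply(lens(4.0), (1.0, 0.5))        # -> (1.0, 0.25)
```

Composing `apply` calls chains optical elements, which is how the relay system of the focused plenoptic camera can be modeled.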
Given a radiance r(q, p) at the image plane of a camera, an image I(q) is rendered for a given range of the available p values according to

I(q) = \int r(q, p) \, dp. \quad (3)
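A minimal numerical sketch of Eq. (3): rendering integrates the radiance over the available directions p at each position q. The discrete radiance and the Riemann-sum integration below are illustrative assumptions, not the paper's rendering pipeline.

```python
# Render an image from a radiance by integrating over direction p (Eq. 3).

def render(radiance, q_samples, p_samples, dp):
    """I(q) = sum over p of r(q, p) * dp, for each sampled q."""
    return [sum(radiance(q, p) for p in p_samples) * dp
            for q in q_samples]

# Example: a radiance with no angular variation renders to the underlying
# 2D image scaled by the total angular extent integrated over.
image = lambda q: 1.0 + q * q
r = lambda q, p: image(q)                        # uniform in p
p_samples = [i * 0.1 - 0.5 for i in range(10)]   # 10 directions, step 0.1
I = render(r, [0.0, 1.0], p_samples, dp=0.1)
# I is approximately [image(0.0), image(1.0)], i.e. [1.0, 2.0],
# since the angular extent integrated over is 10 * 0.1 = 1.0.
```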
Figure 3: Each sensor pixel in a traditional camera captures its part of an image by physically integrating the intensities of all of the rays impinging on it.

Figure 4: An array of pinholes can be used to sample the plenoptic function by multiplexing angular information. (a) The rays that converge at a pinhole will separate from each other as they travel behind it and can therefore be captured individually by a sensor behind the pinhole. (b) Individual pinholes sample one position in q, while individual pixels sample different positions in p. A single image captured behind the pinhole thus samples a vertical stripe in the (q, p) plane. (c) An array of pinholes samples a grid.

2.2 Plenoptic Cameras

As shown in Figure 3, a traditional camera captures an image by physically integrating the intensities of all of the rays impinging on each sensor pixel. A plenoptic camera, on the other hand, captures each thin bundle of rays separately. One approach to separating, and then individually capturing, the rays in a scene is to put a pinhole where the sensor pixel would be, while placing the sensor itself some distance b behind the pinhole. In this case, the rays that converge at the pinhole will diverge as they propagate behind the pinhole, as shown in Figure 4a. The separate pixels in the sensor now capture separate rays. That is, the intensity as a function of position at the sensor represents the radiance as a function of direction at the position of the pinhole. To see how the plenoptic function can be sampled, note that every ray can be uniquely described by its q and p coordinates, i.e., by a point in the (q, p) plane. Each pixel in the sensor behind a pinhole captures a distinct ray, i.e., it samples a distinct point in the (q, p) plane. Building up from the single pixel, a single image captured behind the pinhole samples a vertical stripe in the (q, p) plane, while an array of pinholes samples the plane. In practical cameras, pixels are discrete, so a small area in the (q, p) plane is sampled rather than a single point.
2.3 Lippmann Sensors

From a more abstract perspective, our approach is a general method of sampling the plenoptic function with respect to arbitrary parameters (or modes), including, but not restricted to, the usual four dimensions of 2D position and 2D angle. The sensor we use to carry out this sampling consists of an array of lenslets that form microcameras focused at a given plane. We refer to this sensor as the Lippmann sensor, since it was first proposed by Lippmann in his 1908 paper [15]. The original Lippmann sensor is shown in Figure 5. Our generalization is based on introducing different types of filters (or other modifiers) into the plenoptic function sampling process. In this regard, the filters serve a function similar to that of a Bayer array filter in a normal sensor, the difference being that the Lippmann sensor samples the full 4D radiance in optical phase space, as opposed to conventional sensors sampling the 2D irradiance.

Figure 5: The Lippmann sensor capturing the image of a point A in the world as an array of dots. The figure is taken from the original paper [15].

Although the ideal pinhole makes an ideal ray separator, microlenses are used in practice instead to gather sufficient light and to avoid diffraction effects. Figure 6 shows a diagram of such a Lippmann sensor [16]. In the diagram, b is the distance from the sensor to the microlens plane, and a is the distance from the microlens plane to the main lens image plane. The microlens focal length is f; a, b, and f are assumed to satisfy the lens equation 1/a + 1/b = 1/f. Sensor pixels have size δ, and, without loss of generality, we take d to be both the microlens aperture and the spacing between microlenses. The form of the Lippmann sensor that we use in this paper was proposed by Lumsdaine and Georgiev [9], in which

Figure 6: Geometry of the Lippmann sensor for a plenoptic camera. The CCD (or CMOS) sensor is placed at a distance b behind an array of microlenses. In one notable limit, b → f and a → ∞.
Figure 7: Radiance sampling by the focused Lippmann sensor at the imaging plane at distance a in front of the microlenses. Here, a, b, and f satisfy the lens equation. The geometry of a single pixel is shown in the upper right.

the distance b was chosen not to be equal to f in order to form a relay system with the camera's main lens. Since the microimages are focused in this case, we refer to the sensor as the focused Lippmann sensor, and to a camera using it as the focused plenoptic camera. In this case, as derived in [9], we have the following expression for how the image captured on the sensor (I_b) samples the radiance at the microlens front focal plane (r_a):

I_b(q) = \frac{d}{b} \, r_a\!\left(-\frac{a}{b} q, \frac{1}{b} q\right). \quad (4)

This sampling is shown graphically in Figure 7. As shown in the figure, and as reported in [9], each focused Lippmann microimage captures a slanted stripe of the radiance at distance a in front of the microlens array.

3. MULTIFOCUS

As discussed in previous sections, the focused plenoptic camera uses an array of microlenses to re-image parts of the image plane of the main camera lens onto the CCD. This makes up the Lippmann sensor that samples the 4D radiance at a certain plane in front of the microlenses. For the purpose of computational focusing, the depth of field of those microcameras needs to be as large as possible: the range of computational focusing is defined by that depth of field. At the same time, there are fundamental optical limitations to the amount of depth of field that can be reached. Large depth of field is achieved in ray optics with small apertures, which in turn leads to high F-numbers and low light efficiency. This approach also has a fundamental limiting constraint: at high F-numbers, diffraction plays an increasing role, blurring the images.
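The relay geometry above can be sketched numerically: b follows from the lens equation 1/a + 1/b = 1/f, and Eq. (4) maps a sensor coordinate q to the position and direction it samples at the plane a. The helper names and values are illustrative, and the minified, inverted mapping is an assumption consistent with Eq. (4).

```python
# Focused Lippmann sensor geometry: solve the lens equation for b, then
# map a sensor coordinate q to its sample of r_a per Eq. (4).

def image_distance(a, f):
    """Solve 1/a + 1/b = 1/f for b (requires a > f)."""
    return 1.0 / (1.0 / f - 1.0 / a)

def sample_at_a(q, a, b):
    """Eq. (4): sensor position q samples r_a at position -(a/b) q and
    direction q/b -- a minified, inverted image of the plane at a."""
    return (-(a / b) * q, q / b)

b = image_distance(a=2.0, f=0.5)    # -> b = 2/3
q_a, p_a = sample_at_a(0.1, 2.0, b)
# With a/b = 3, the microimage is a 3x-minified, inverted view of the
# plane at distance a = 2.0 in front of the microlens.
```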
In order to provide a sensor that has both large apertures and a large depth of field, we propose a sensor design based on an array of interleaved microlenses having different focal lengths, so that they are focused at two or more different planes. If properly designed, the depths of field corresponding to differently focused microlenses will cover the entire space in front of the sensor (i.e., the complete interior of the camera body). Given an arbitrary point in the world, at least one of the sets of microlenses will be sharply focused on it. These microlenses can then work at fully open apertures (low F-numbers), implementing a camera that has the highest resolution possible while at the same time using all the light available.
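The interleaving idea can be sketched as follows: with a fixed sensor distance b, two focal lengths put the two sets of microlenses in focus at two different planes, whose depths of field together cover the working range. The fractional depth-of-field band used here is an illustrative assumption, not the paper's optical model.

```python
# Two interleaved microlens types, common sensor distance b: each focal
# length f yields a different focus plane a, and their depth-of-field
# bands jointly cover the space in front of the sensor.

def focus_plane(b, f):
    """Solve 1/a + 1/b = 1/f for a (requires b > f)."""
    return 1.0 / (1.0 / f - 1.0 / b)

def covered(depth, planes, rel_dof=0.5):
    """True if some focus plane's band [a*(1-rel_dof), a*(1+rel_dof)]
    contains the given depth (toy depth-of-field model)."""
    return any(a * (1 - rel_dof) <= depth <= a * (1 + rel_dof)
               for a in planes)

b = 0.6
planes = [focus_plane(b, f) for f in (0.375, 0.5)]  # -> planes 1.0 and 3.0
# The two bands, [0.5, 1.5] and [1.5, 4.5], meet at 1.5, so every depth
# in [0.5, 4.5] lies inside at least one microlens depth of field.
```

The 1.0 and 3.0 focus planes mirror the a and a/3 arrangement the paper uses for its two lens types.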
Figure 8: The multifocus Lippmann sensor capturing two types of images focused at two different depths.

4. RADIANCE CAPTURE WITH A MULTIFOCUS LIPPMANN SENSOR

In this section we analyze the radiance capture model in the case of an array of microlenses with a common CCD. This will motivate the need for a multifocus sensor. Capturing the full 4D radiance in object space is easily achieved with an array of identical microcameras. Each microcamera is focused on object space and records a different, overlapping view of a piece of the scene. Note that at this overlap, two different cameras actually represent stereo views of the 3D object. Such a device for capturing the light intensity as a function in ray space (the plenoptic function) was first proposed in [15]. An optical phase-space diagram for the radiance as a function of ray position and angle in object space, recorded by the Lippmann sensor, was shown in Figure 7. The size of the image created in a microcamera is d. This size is constrained by the main lens aperture, which needs to be chosen appropriately by matching the F-number of the main lens system to that of the microlenses [17]. The distances a and b define the minification of the microcamera. Considering this fact, the object-space size of the image for our microcamera is d a/b. The viewing angle of the microcamera is that size of the object divided by the distance a to the object, which can be shown to equal d/b. Each point is viewed by the microcamera from a range of angles depending on the microlens aperture. That range of angles is d/a. Note that different points in object space are seen by the camera at different angles. This is represented by the tilt of the line of pixels depicting the microimage in Figure 7, and measured by the slope 1/a.

Figure 9 (right) shows how the Lippmann sensor samples the radiance at a plane different from the focal plane of the microlenses. Such a plane is a translation from the focal plane, and so the sampled radiance is transformed by shearing.
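The microcamera quantities above, and the shearing of the sampling pattern at a displaced plane, can be checked numerically. The parameter values are illustrative.

```python
# Microcamera phase-space quantities from the text: object-space image
# size d*a/b, viewing angle d/b, angular range d/a, and the shear a
# phase-space sample undergoes when the radiance is read at a plane
# displaced by t from the microlens focal plane.

d, a, b = 0.1, 3.0, 0.6

image_size = d * a / b      # object-space patch seen by one microlens
viewing_angle = d / b       # image_size / a
angular_range = d / a       # directions from which each point is seen

def shear(sample, t):
    """Translation by t maps a phase-space sample (q, p) to (q + t*p, p);
    a vertical column of samples becomes a slanted one."""
    q, p = sample
    return (q + t * p, p)

# A sample with direction p = 0.02, displaced by t = 1.0, shifts in q by
# 0.02: this is why rendering at the wrong plane mixes neighboring pixels.
shifted = shear((0.0, 0.02), 1.0)   # -> (0.02, 0.02)
```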
In other words, Figure 9 (right) is related to Figure 9 (left) by a shearing transformation. Note that since the pixels are now tilted in phase space, rendering from them (which is an integration in the vertical direction) will mix content from neighboring pixels, giving a blurred result. To enable the Lippmann sensor to sample any location with vertical pixels, we use a multifocal version that interleaves microlenses with different focal lengths. The result is that one set of microlenses will be in focus for regions of the scene for which other sets are not in focus. The phase-space interpretation of this interleaving is shown in Figure 10. Focal lengths are chosen such that the distances from the microlens array to the focal planes are a and a/3.

4.1 Considering wave optics

There is an important constraint on spatial and angular resolution, the diffraction limit, that goes beyond ray optics and is fundamentally dictated by wave optics. Because of diffraction effects, arbitrarily thin pixels in spatial coordinates are unrealistic. Rather (as is well known), the smallest diffraction-limited spot resolved by a camera is 1.22 λ a/d, where a/d represents the object-side F-number of the microlens in our notation. In the case of a 1D image (2D ray space), the diffraction-limited spot size is λ a/d. Considering that the vertical (angular) size of our ray-space pixels is d/a, and the horizontal size cannot be less than λ a/d, we come to a fundamental fact: the volume of a pixel in ray space (i.e., the product of the spatial and angular extent of
Figure 9: Left: The Lippmann sensor sampling pattern at the microlens front focal plane (same as Figure 7). Right: The same Lippmann sensor, now sampling the radiance at a position different from the microlens focal plane. The translation transformation shears the sampling pattern shown on the left to produce the sampling on the right.

Figure 10: By interleaving microlenses of different focal lengths, one set of microlenses will be in focus when the other is not. Left: Sampling pattern at distance a in front of the microlenses. Right: Sampling pattern at distance a/3.
Figure 11: The array of microlenses used to capture the image in Figure 12. Notice the different focal lengths, which are coupled with the microlens diameter in our microlens array.

Figure 12: The array of microimages captured with the microlenses in Figure 11. Notice the differently focused microimages, interleaved.

a pixel) cannot be less than the wavelength λ. This fundamental fact has not previously been discussed in the context of plenoptic cameras. The above diffraction constraint limits the depth of field of the microcameras in a Lippmann sensor. Only a certain range around a given plane of focus is imaged sharply. Points that remain outside that depth of field would be out of focus and would appear blurry. What we achieve with the multifocus Lippmann sensor is that for any point in the world at least one microimage is exactly in focus. If that is the case, any object can be rendered in focus (using the appropriate stitching and blending techniques). This motivates our choice of different focal lengths for different microlenses.

5. EXPERIMENTAL RESULTS

In this section, we demonstrate interesting results that can be achieved in integral photography with a multifocus microlens array. Our point here is that all microimages need to be exactly in focus so that any object can be rendered in focus using the appropriate stitching and blending techniques. The problem is that microcameras have a large but finite depth of field, so all-in-focus imaging is not really possible with a single focal length. That is why we have chosen the approach of a microlens array with interleaved lenses of different focal lengths. A picture of our microlenses is shown in Figure 11. Notice the two different types of lenses having different focal lengths. An image captured with these microlenses is shown in Figure 12. It is important to notice how the microimages in the array alternate between in focus and out of focus. We have applied plenoptic rendering to the microimages in Figure 12 to produce the left and right images in Figure 13.
Rendering is best focused on the car (at optical infinity). Notice that the left image appears out of focus. That is because we
Figure 13: Images rendered in focus from the microimages in Figure 12. The image on the left is blurry.

Figure 14: Images rendered in focus from the microimages captured with the lenses in Figure 11. Notice that the flowers on the left are sharp. They are rendered from the same type of microlenses as in Figure 13 (left). The flowers are close to the microlenses, so in the right image they are blurry. Also, the apertures are smaller, so a wider range of depths is in focus.

have rendered it from microimages that are themselves blurry. Their microlenses are focused at a plane closer than infinity. Exactly the opposite phenomenon happens in Figure 14, which is rendered from the same input image. The left image comes from the type of microlenses used to create the left image in Figure 13. They are focused close in, and the flowers are sharp. The right image is blurry because its microlenses are focused farther away. Next, we show the two complete images, achieving best focus on both close and far objects. Such a wide range of refocusing is possible only because of the different focal lengths of our microlenses. The full captured multifocus radiance is shown in Figure 17.

6. CONCLUSION

The seemingly ever-increasing resolution of image sensors opens up fascinating possibilities for the kinds of rich image data that might be captured by a camera. In particular, additional image dimensions, such as multiple views or multiple modes, can be captured and then used computationally to create an infinite variety of rendered images. In this paper, we explored rich radiance capture for multiple depths of microlens focusing. With the multifocus Lippmann sensor we were able to extend the plenoptic depth of field to the whole scene, potentially covering every depth in focus. This enables the full power of refocusing and stereo that can be reached within ray and wave optics. This is just one of the less trivial examples that demonstrate different aspects of what is possible with a device as versatile as the Lippmann sensor.
Much more can be expected, including HDR, polarization, multispectral color, IR, UV, X-rays, and others. As sensor resolutions continue to grow, we look forward to capturing yet richer plenoptic data in new and as yet unexpected ways, giving us the chance to render even richer images.

REFERENCES

1. E. Reinhard, G. Ward, S. Pattanaik, P. Debevec, High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting, Morgan Kaufmann, San Francisco, CA, USA, 2005.
Figure 15: Image rendered with the car in focus.

Figure 16: Image rendered with the flowers in focus.

2. R. Szeliski, H.-Y. Shum, Creating full view panoramic image mosaics and environment maps, in: SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 1997, pp. 251–258.
3. S. T. Barnard, M. A. Fischler, Computational stereo, ACM Comput. Surv. 14 (4) (1982) 553–572.
4. M. Z. Brown, D. Burschka, G. D. Hager, Advances in computational stereo, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 993–1008. doi:10.1109/TPAMI.2003.1217603.
5. M. Levoy, P. Hanrahan, Light field rendering, ACM Trans. Graph. (1996) 31–42.
6. S. J. Gortler, R. Grzeszczuk, R. Szeliski, M. F. Cohen, The lumigraph, ACM Trans. Graph. (1996) 43–54.
7. E. Adelson, J. Bergen, The plenoptic function and the elements of early vision, in: Computational Models of Visual Processing, MIT Press, Cambridge, MA, 1991.
Figure 17: The full captured multifocus radiance.

8. T. Georgiev, A. Lumsdaine, S. Goma, High dynamic range image capture with plenoptic 2.0 camera, in: Signal Recovery and Synthesis, Optical Society of America, San Jose, CA, 2009.
9. A. Lumsdaine, T. Georgiev, The focused plenoptic camera, in: IEEE International Conference on Computational Photography (ICCP), 2009.
10. M. Levoy, P. Hanrahan, Light field rendering, in: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. URL http://portal.acm.org/citation.cfm?id=237170.237199
11. F. E. Nicodemus (Ed.), Self-Study Manual on Optical Radiation Measurements, National Bureau of Standards, 1978.
12. V. Guillemin, S. Sternberg, Symplectic Techniques in Physics, Cambridge University Press, New York, 1985.
13. A. Gerrard, J. M. Burch, Introduction to Matrix Methods in Optics, Dover Publications, Mineola, NY, 1994.
14. K. B. Wolf, Geometric Optics on Phase Space, Springer, New York, 2004.
15. G. Lippmann, Épreuves réversibles donnant la sensation du relief, J. Phys. 7 (4) (1908) 821–825.
16. G. Lippmann, Épreuves réversibles. Photographies intégrales., Académie des Sciences (1908) 446–451.
17. R. Ng, Digital light field photography, Ph.D. thesis, Stanford University, Stanford, CA, USA, 2006. Adviser: Patrick Hanrahan.