Neurophysiologically-motivated sensor fusion for visualization and characterization of medical imagery

Mario Aguilar
Knowledge Systems Laboratory, MCIS Department
Jacksonville State University, Jacksonville, AL 36265
marioa@ksl.jsu.edu

Aaron L. Garrett
Knowledge Systems Laboratory, MCIS Department
Jacksonville State University, Jacksonville, AL 36265
aarong@ksl.jsu.edu

Abstract - We describe an architecture for the fusion of multiple medical image modalities based on the organization of the color vision system in humans and primates. Specifically, the preprocessing of individual images and the fusion across modalities are based on the neural connectivity of the retina and visual cortex. The resulting system enhances the original imagery, improves information contrast, and combines the complementary information of the various modalities. The system is able both to enhance and to preserve important information. In addition, the fused imagery preserves the high spatial resolution of modalities such as MRI even when they are combined with low-resolution images such as SPECT scans. Results of fusing various modalities are presented, including: a) fusion of functional MRI images, b) fusion of SPECT and MRI, and c) fusion of visible and infrared endoscopic images. We conclude by discussing our recent results on utilizing multi-modality fused signatures for segmentation and pattern recognition.

Keywords: Image fusion, medical image processing, medical diagnosis aids.

1 Introduction

According to a recent article in Advanced Imaging [3], 80 billion electronic images are produced each year. Beyond digital photography, most of these come from the entertainment, industrial, and medical sectors; two billion of them are generated for medical diagnosis alone. These figures point to an unavoidable consequence of the information age: information overload. This consequence is of particular concern in the medical field, where the cost of diagnosis and limits on resources strongly affect the quality of patient care. A strategy is needed to facilitate diagnosis and expedite analysis by specialists. To this end, we explore the application of image fusion techniques to combine multi-modality medical imagery. The first goal of our pilot study is to produce a single color image that combines the information from all the relevant modalities and thereby reduces workload. The fusion methods presented here were first introduced in the context of dual-band fusion for night vision applications [1, 2]. In that system, imagery from a night-capable visible camera and a thermal infrared camera was combined in real time into a single color image that preserved the information from the two separate cameras. One of the main properties of the night vision fusion system is that it enables users to discover critical relationships between the bands that were previously unexploited. The second goal of our study is to ascertain whether medical imaging of various modalities may benefit from this property, mainly by facilitating or improving the diagnostic capabilities of specialists.

2 Methods

The greatest benefits of fusion are obtained when the combined modalities are complementary to each other. As described next, the modalities we selected record either structural detail or metabolic/functional information about the region of interest. By their nature, these two types of information are important for diagnosis and are often used together to improve accuracy.
The most prevalent form of brain imaging is MRI, or magnetic resonance imaging. In this modality, the patient is exposed to a controlled magnetic field that leads to energy being emitted by protons in the brain. The amount of energy emitted, mainly a function of proton density, is measured and imaged. These images, known as MRI-PD (proton density), convey structural information. In addition, MRI scanners are capable of selectively emitting energy at a given orientation with respect to the axis of the original magnetic field. In this case, the relaxation times of the protons lead to separately weighted images, T1 and T2, which measure relaxation along the parallel and perpendicular axes respectively. T1- and T2-weighted modalities are capable of measuring such characteristics as fat, melanin content, blood flow, calcification, etc. Hence, these two modalities are associated with functional information. The MRI imagery utilized in this study was obtained from the Whole Brain Atlas [4]. The imagery includes the three MRI modalities registered to each other, as obtained from a healthy (i.e. normal anatomy) patient.

We also analyzed the efficacy of the fusion method in combining MRI and SPECT imagery. SPECT, or single photon emission computed tomography, is a technique in which a radiolabeled compound is injected into the patient so that its emissions can later be measured. These emissions are indicative of functional changes such as metabolism, blood flow, etc. Such images have been used in the past for detecting the presence or progress of brain defects such as tumors. The registered SPECT images utilized in this study were also obtained from the Whole Brain Atlas [4].

Finally, we investigated the fusion of visible and near-infrared (NIR) endoscopic images. These images were obtained using a CCD video camera attached to an endoscope. The visible image was obtained via the built-in capture function of the camera; the NIR image was obtained by placing an appropriate filter on the camera. The images we used are of the internal cavity of a pig's stomach.

In the following subsections, we describe the fusion architecture and its components. First, we introduce the biological principles that inspired and guided the development of the fusion architecture. Then, we present the neural network-based fusion architecture for the cases of two-, three-, and four-band combinations. Finally, we detail the design of the non-linear image operator used in preprocessing and combining the images in each of the stages of the fusion architecture.

2.1 Biological fusion systems

The visual system of primates and humans contains three types of light sensors, known as cones, which have overlapping sensitivities (short, medium, and long wavelengths). It is through the combination of these three sensor sets that we obtain our perception of color. Circuitry in the retina is functionally divided into two stages. The first utilizes non-linear neural activations and lateral inhibition within bands to enhance and normalize the inputs. The second utilizes similar neural components in an arrangement of connections that leads to between-band competition, which in turn produces a number of combinations of the three original bands [5]. This last stage of processing enhances the complementary information that exists in each of the bands (i.e. a spectral decorrelation operation). The fusion architecture presented here is motivated by this basic connectivity. Each of the processing stages is implemented via a non-linear neural network known as the shunt (described in section 2.3).

Further definition of specific band combinations in our system found inspiration in the connectivity of neurons in the fusion system of some species of rattlesnakes and boas. These snakes possess a series of sensory pits capable of detecting thermal signatures of their surroundings (i.e. thermal infrared sensors). These sensors are used in conjunction with visual input to allow the snake to detect, locate, and capture its prey. Newman and Hartline [6] discovered that neurons in the optic tectum (an area of the brain associated with visual processing) of these snakes are modulated by inputs from both types of sensors, visual and thermal. Their studies went on to demonstrate the highly non-linear relationship between the activation of neurons by each of these modalities. It is in this stage of processing that a dual-band fusion process begins to combine the signals to produce the perceptual experience of the snake.
These non-linear combinations lead to information decorrelation not unlike that usually targeted by principal component analysis techniques. However, in the biological systems and in our architecture, the non-linear operator has a very narrow spatial window, providing a better-tuned decorrelation. In addition, the operator is modulated by more globally defined statistical characteristics of the input, which produce normalization, smoothing, and between-band calibration.

2.2 A neural network-based fusion architecture

The basic fusion architecture consists of two distinct processing stages. In the first, as in the retina, we utilize a non-linear neural network (i.e. the shunt; see section 2.3) to obtain within-band image enhancement and normalization. This produces contrast enhancement, dynamic range calibration, and normalization of the input images. The second stage uses the same non-linear neural network operator to produce between-band decorrelation, information enhancement, and fusion. These stages for the two-band fusion case are illustrated in Figure 1, where concentric circles indicate a shunting neural network operator as described in section 2.3.

The shunt combinations of the second stage, as shown in Figure 1, provide three unique sets of information-rich images. The first combination performs an operation that decorrelates band 2 (MRI-T2) from band 1 (MRI-T1); in other words, it enhances information that is present in band 1 but not in band 2. The resulting image is mapped to the red channel of the color display. The second combination performs the reverse operation, namely enhancement of the information unique to band 2. This combination is mapped to the blue channel. The final shunt contrast-enhances the linear combination of the two bands, producing an image in which areas with information common to both bands are enhanced. This image is mapped to the green channel. An additional processing stage may be introduced prior to producing the final color image that remaps the color assignments derived by the fusion process. Here, a mapping from RGB to HSV space allows the operator to manipulate the appearance of the image (e.g. a hue remap) to obtain a more natural coloring scheme. The modified HSV values are mapped back to RGB to generate the final color fused image.

A second form of the two-band fusion architecture was implemented for fusing visible and NIR endoscopic imagery (Figure 2). Here, the second stage produces the two biased decorrelations of the bands, which are mapped to the red and blue channels. Finally, to preserve the high contrast and natural appearance of the visible band, its shunted image is mapped to the green channel. The resulting fused color image conveys between-band information in terms of color contrast (blue vs. red), while its brightness profile and resolution are mainly defined by the visible-band imagery.

The architecture for three-band MRI fusion is illustrated in Figure 3. Here, the first stage of processing is as before: each of the input bands is separately contrast-enhanced and normalized. Then, two between-band shunting operations produce distinct fusion products. The first decorrelates the information between bands 1 (MRI-T1) and 2 (MRI-PD); the second does the same for bands 3 (MRI-T2) and 2. In this case, the information derived is that which is unique to bands 1 and 3. The resulting fused images are then mapped to the I and Q (also known as red-green and blue-yellow) components of the YIQ color space of the image. The Y, or achromatic, component is derived from the enhanced band 2 image, which provides the most faithful structural details. The YIQ components are then mapped to RGB space.

The architecture for four-band fusion is shown in Figure 4. Here, the second stage of processing produces the decorrelation between T1-weighted and SPECT, as well as between T2-weighted and SPECT. Notice that the decorrelation is done in both directions for each of the pairs. The most noticeable difference is the addition of a third processing stage. This additional between-band competition leads to further color contrast enhancement and decorrelation, as suggested by the connectivity of primary visual cortex in primates. The two resulting decorrelated images are mapped to the chromatic I and Q channels. Once again, to preserve the high resolution of the MRI imagery, the structural-modality MRI-PD image is mapped to the Y, or achromatic, channel.

Figure 1. Two-band fusion architecture used for processing functional MRI imagery. Concentric circles represent a shunting neural network operator. See text for details.

Figure 2. Two-band fusion architecture for the combination of visible and NIR endoscopic images. See text for details.

Figure 3. Three-band MRI fusion architecture. See text for details.

Figure 4. Four-band MRI/SPECT fusion architecture. See text for details.
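To make the two-band pipeline of Figure 1 concrete, the sketch below assembles a color fused image from shunt products. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the simplified inline shunting operator, and all parameter values are illustrative (a fuller treatment of the operator follows equation (2) in section 2.3).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rescale(x):
    # Map an array linearly onto [0, 1] for display.
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

def shunt(center, surround, A=0.5, sigma=3.0):
    # Simplified steady-state shunting operator; see equation (2) in
    # section 2.3. The center drives excitation, a Gaussian-blurred
    # surround drives inhibition, and local activity normalizes both.
    c = center.astype(float)
    s = gaussian_filter(surround.astype(float), sigma)
    return (c - s) / (A + c + s)

def fuse_two_band(band1, band2):
    # Stage 1: within-band contrast enhancement and normalization.
    e1 = rescale(shunt(band1, band1))
    e2 = rescale(shunt(band2, band2))
    # Stage 2: between-band shunt combinations of Figure 1.
    red = shunt(e1, e2)              # information unique to band 1
    blue = shunt(e2, e1)             # information unique to band 2
    green = shunt(e1 + e2, e1 + e2)  # information common to both bands
    return np.dstack([rescale(red), rescale(green), rescale(blue)])
```

The optional hue remap described above would convert the result to HSV (e.g. with matplotlib.colors.rgb_to_hsv), shift the hue channel, and convert back to RGB before display.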

2.3 The shunting image operator

The basic building block of the architecture, represented as concentric circles in Figures 1-4, is a non-linear neural network known as a shunting neural network [7]. This neural network, which acts like a filter, models the dynamics of neuron activation due to three contributions: an excitatory one-to-one input, an inhibitory input from surrounding neurons, and passive activation decay (i.e. a leaky integrator). The expression that captures these interactions in a dynamical system is defined by the following differential equation:

    \dot{x}_{ij} = -A x_{ij} + (B - x_{ij}) [C I_C]_{ij} - (D + x_{ij}) [G_S * I_S]_{ij}    (1)

Here, x_{ij} is the activation of cell ij, receiving input from the corresponding pixel of the input image. A is a decay rate; B and D are the maximum and minimum activation levels respectively (both set to 1 in the simulations); and C and G_S (a Gaussian) serve to weigh the excitatory input I_C against the lateral inhibitory input I_S, both derived from the image I. The neural network consists of a two-dimensional array of these shunting neurons, with dimensions corresponding to the width and height of the input image. When the input is applied, the network rapidly reaches equilibrium, producing the resulting output image. This equilibrium can be understood in terms of the value of x_{ij} after the neuron has reached steady state:

    x_{ij} = \frac{[C I_C - G_S * I_S]_{ij}}{A + [C I_C + G_S * I_S]_{ij}}    (2)

The numerator is straightforward to understand as a contrast enhancement operation, since it represents a difference of Gaussians. The denominator serves to normalize the activation of x_{ij} with respect to the activity of its neighborhood. In effect, the combination of these two operations leads to dynamic range compression of the input image in conjunction with contrast enhancement. The parameter A controls the character of the operator, from ratio processing (when A is small with respect to the local statistics) to linear filtering (when A is comparatively large).

When the operator is used to combine two bands, the inputs mapped to the center and surround are derived from different input images. In the case where band 1 is mapped to the center, each of the pixels from band 1 drives the excitatory input of its corresponding shunting operator, while the corresponding area of the band 2 image is fed to the same operator as the surround input. The result is the contrast enhancement of information in band 1 as matched against band 2. The relationship between this operation and decorrelation has been previously documented [8].
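The steady state of equation (2) can be computed directly in a single pass. The sketch below is a minimal illustration, not the authors' code: the parameter values and the Gaussian width are illustrative assumptions, with B = D = 1 as stated above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shunt_operator(I_C, I_S, A=0.5, C=1.0, sigma=3.0):
    """Steady-state shunting operator of equation (2).

    I_C -- image driving the excitatory (center) input
    I_S -- image driving the inhibitory (surround) input; equal to I_C
           for within-band enhancement, or another band for fusion
    """
    excit = C * I_C.astype(float)
    # [G_S * I_S]: Gaussian-weighted average of the surround input.
    inhib = gaussian_filter(I_S.astype(float), sigma)
    # The difference of Gaussians in the numerator gives contrast
    # enhancement; the denominator normalizes by local activity,
    # compressing dynamic range. For non-negative inputs and B = D = 1
    # the output lies in (-1, 1).
    return (excit - inhib) / (A + excit + inhib)
```

Consistent with the role of A described above, a small A relative to the local input statistics yields ratio-like processing, while a large A approaches a linear difference-of-Gaussians filter. For display, the output is typically rescaled linearly to the range of the display device.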
3 Results

We present two series of results that demonstrate the image enhancement and fusion characteristics of the shunting operator. The first subsection presents results of processing by the first stage of the fusion architecture. The second demonstrates the final color fused results obtained with each of the architectures presented in section 2.2.

3.1 Image enhancement results

As previously described, the first stage of the fusion architecture applies the shunting operator to each of the input modalities in order to contrast-enhance and normalize them. Figure 5 presents a comparison of the original three-modality MRI imagery (left column) and the shunting-based pre-processed results (right column).

These results can be compared with those obtained using strategies such as histogram equalization, which remap the dynamic range of the image. In such approaches, global statistics drive the remapping of gray-scale values to obtain a more uniform distribution. Unfortunately, such remapping can lead to information loss, because no consideration is given to local contrast information. In contrast, the shunting operator first enhances this important local information and then applies the normalization that leads to the dynamic range remapping.

3.2 Image fusion results

The first investigation focused on the fusion of MRI imagery of brain data. First, applying dual-band fusion, we used the architecture of Figure 1 with the T1- and T2-weighted axial images shown at the top of Figure 6. The resulting color fused image (bottom of Figure 6) demonstrates the combination of both bands. Notice the use of brightness and color contrast to convey information from the two original images. Figure 6 also presents anatomical labelings for distinct areas identified in each of the original images, as well as in the color fused image. It is obvious, for example, that blood vessels are readily identified in the T2-weighted image but not in the T1-weighted image. As shown in the color fused result, all anatomical identifications made for the T1 and T2 images are clearly captured and enhanced by the fusion architecture.

Figure 5. Image comparison between original coronal MRI imagery (PD, proton density; T1-weighted; and T2-weighted) from a normal patient (left column) and the corresponding shunting-processed imagery (right column).

Figure 6. Two-band fusion of MRI T1-weighted (top) and T2-weighted (middle) imagery. Images have been labeled to indicate significant structural landmarks identified in each of the images. The bottom image shows the color-fused result, which demonstrates the preservation of information from the original imagery. In addition, obvious correlations are highlighted by the color differences across the image.

Next, we applied the fusion architecture of Figure 3 to combine all three MRI modalities. As previously explained, PD represents a measure of structural information, while T1 and T2 capture both structural and complementary functional information. For this reason, we paired each of the functional/structural modalities with the PD image, so that decorrelation would lead to a purer measure of functional information. As shown in Figure 7, the fused image has captured much of the same complementary information as presented in Figure 6. In this case, however, the more targeted decorrelation has produced coloring patterns that highlight unique functional information. For instance, we see areas with a gradual change from red, indicating a strong contribution from T2, to pink, which suggests the presence of information in T1 as well. The fact that we only see a gradient from green to red suggests that all information presented through the color contrast of the image arises from the functional information present in T1 and T2. On the other hand, the brightness contrast, derived mainly from the PD image, aids in preserving the structural information.

By introducing registered SPECT imagery into our studies, we were able to explore the use of a four-band fusion architecture to combine it with the three MRI modalities. As shown in Figure 4, we paired the two MRI modalities associated with functional information (T1 and T2) against the SPECT image. Here, their decorrelation is further enhanced through a second application of the shunt operator (the double-opponent stage). The resulting decorrelated information carries a strong signal associated with information that is present in the SPECT image but not in the T1 or T2 images, and vice versa. This information is mapped to the chromatic channels of the final color image. As in the previous experiment, the PD modality of the MRI imagery is mapped to the achromatic channel to preserve the structural information and its higher resolution. The resulting color fused image is presented in Figure 8. Here, we again see the preservation of the combined MRI information. In addition, the SPECT information derived from this combination is preserved as the area with green hues in the middle of the image. The presence of the SPECT information in the final image could be made more or less evident, depending on the task, through the modulation of a fusion weighting factor controlled by the user through an interactive interface.

Figure 7. Three-band fusion of MRI imagery. The proton density MRI (left) was fused with the corresponding T1- and T2-weighted MRI images as presented in Figure 3. The image on the right presents the color fused result.

Figure 8. Four-band fusion of MRI and SPECT imagery (fusion architecture as presented in Figure 4). The image on the left corresponds to a registered SPECT scan, and the image on the right presents the color-fused result.
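The chromatic mapping used by the three-band architecture above can be sketched as follows. This is an illustrative reading of the YIQ assignment described in section 2.2, not the authors' code: the function names, the chroma gain, and the reuse of a shunt(center, surround) operator like the one sketched in section 2.3 are assumptions; the YIQ-to-RGB matrix is the standard NTSC transform.

```python
import numpy as np

# Standard NTSC YIQ -> RGB transform.
YIQ_TO_RGB = np.array([[1.000,  0.956,  0.621],
                       [1.000, -0.272, -0.647],
                       [1.000, -1.106,  1.703]])

def fuse_three_band(pd, t1, t2, shunt, chroma_gain=0.3):
    """Sketch of the three-band architecture of Figure 3.

    pd, t1, t2 -- registered MRI-PD, T1- and T2-weighted images
    shunt      -- shunting operator, shunt(center, surround)
    """
    rescale = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-9)
    # Stage 1: within-band enhancement of each modality.
    e_pd = rescale(shunt(pd, pd))
    e_t1 = rescale(shunt(t1, t1))
    e_t2 = rescale(shunt(t2, t2))
    # Between-band decorrelations drive the chromatic channels.
    i = chroma_gain * shunt(e_t1, e_pd)  # unique to T1 -> I (red-green)
    q = chroma_gain * shunt(e_t2, e_pd)  # unique to T2 -> Q (blue-yellow)
    # The enhanced PD image carries the structural detail -> Y.
    yiq = np.dstack([e_pd, i, q])
    return np.clip(yiq @ YIQ_TO_RGB.T, 0.0, 1.0)
```

The four-band architecture of Figure 4 would follow the same pattern, with the T1/SPECT and T2/SPECT decorrelations passed through a second (double-opponent) shunt before being assigned to I and Q.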
The final investigation involved the fusion of endoscopic imagery (described in section 2). As explained, two modalities were studied: visible and near-infrared. The visible modality is typically used to aid surgeons during their procedures. Near-infrared signals, on the other hand, are being investigated as a means of helping surgeons identify blood vessels hidden behind fatty surfaces. This is evident in the distinct brightness profile that the NIR image presents for the blood vessel running horizontally across the middle of the image in Figure 9 (highlighted area). This difference arises from the unique NIR signature of fatty tissue as compared to that of blood vessels. The result of the two-band fusion is presented at the bottom of Figure 9. The image, which uses the red vs. blue hue space to code information, possesses a very natural appearance that may aid in understanding the information content. In addition, the fused image clearly shows the effectiveness of this architecture in combining the information (e.g. easily identifiable blood vessels) while preserving information and the high level of detail from the original visible band.

Figure 9. Fusion of registered visible-band and near-infrared imagery obtained from an endoscopic camera. Notice that the area circled in the NIR image clearly shows the separation of blood vessel and fatty tissue, while the visible band lacks sufficient contrast in the same area. The resulting fused image preserves this information.

4 Discussion

We presented four fusion architectures derived from neurophysiological principles of sensor fusion. These architectures provide a method for combining and preserving information from the original input imagery. We demonstrated them by fusing various modalities of medical imagery, including modalities that target the measurement of structural information and modalities designed to measure the metabolic processes associated with functional information. We have also developed a visualization interface that facilitates navigation and understanding of the resulting data (Figure 10). While these results emphasize the use of fusion techniques for visualization purposes, we are currently investigating their use in the context of image segmentation and pattern recognition.

As previously described, the nature of the fusion combinations produces enhanced information content. Mainly, the fusion process produces a number of between-band combinations with unique decorrelation characteristics. These combinations, together with the single-band shunted imagery, define a richer set of features with which to represent each pixel in the image. An unsupervised clustering algorithm can then be applied to obtain unattended segmentation of the data. This technique has been applied to segmenting the skull from the rest of the data in MRI imagery to create a 3D model (see the top-left inset in Figure 10) that is more accurate than those obtained with alternative methods (Garrett and Aguilar, in preparation).

In addition, in work in progress, we have extended the supervised ARTMAP algorithm [9] to provide order-independent learning for classification and recognition. Here, an interactive user interface allows medical experts to define areas of interest (AOIs) representing prototypes for various brain defects. With this information, and samples of healthy areas, the augmented ARTMAP system learns to discriminate between the various types of AOIs. Such user-leveraged learning techniques have been successfully applied in the context of multi-sensor band fusion for remote sensing [10]. We are currently studying the use of these techniques in characterizing metastatic carcinoma and applying the recognition system to analyze the progression of the disease. Similarly, we are investigating the use of the system in identifying reduced perfusion in brains affected by Alzheimer's disease in order to identify critical complications of the disease. Future efforts will include assessing the ability of the image fusion techniques to reduce workload and facilitate diagnosis by medical experts. In addition, we are currently seeking collaboration with medical investigators in order to validate the pattern recognition system.
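The per-pixel feature construction described above lends itself to a straightforward sketch. The paper does not name the clustering algorithm used, so k-means serves here as an illustrative stand-in; the function name and cluster count are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_fused(feature_images, n_clusters=4):
    """Unattended segmentation from fused multi-modality features.

    feature_images -- registered 2-D arrays: the single-band shunted
                      images plus the between-band fusion combinations
    Returns a label image assigning one cluster index per pixel.
    """
    h, w = feature_images[0].shape
    # Stack the fusion products so each pixel becomes a feature vector.
    X = np.stack([f.ravel() for f in feature_images], axis=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return labels.reshape(h, w)
```

Slice-by-slice labels from such a segmentation could then be stacked and surface-rendered to produce a 3D model like the skull segmentation mentioned above.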

Acknowledgements

The work on fusion of endoscopic imagery was initiated while the first author was with the Sensor Exploitation Group at MIT Lincoln Laboratory. The support and assistance of the staff, in particular Allen Waxman and David Fay, are gratefully acknowledged. All other work was supported by a Faculty Research Grant awarded to the first author by the faculty research committee and Jacksonville State University. Opinions, interpretations, and conclusions are those of the authors and are not necessarily endorsed by the committee or Jacksonville State University.

References

[1] M. Aguilar, D.A. Fay, W.D. Ross, A.M. Waxman, D.B. Ireland, and J.P. Racamato, Real-time fusion of low-light CCD and uncooled IR imagery for color night vision, Proc. of SPIE Conf. on Enhanced and Synthetic Vision, 3364, 1998.

[2] A.M. Waxman, M. Aguilar, R.A. Baxter, D.A. Fay, D.B. Ireland, J.P. Racamato, and W.D. Ross, Opponent color fusion of multi-sensor imagery: visible, IR and SAR, Proc. of the Meeting of the IRIS Specialty Group on Passive Sensors, I, pp. 43-61, 1998.

[3] P. Eggleston, Asset management and image overload: handling 80 billion images a year, Advanced Imaging, pp. 12-16, 2000.

[4] K.A. Johnson and J.A. Becker, Whole Brain Atlas, http://www.med.harvard.edu/aanlib/home.html, 1999.

[5] P. Schiller and N.K. Logothetis, The color-opponent and broad-band channels of the primate visual system, Trends in Neurosciences, 13, pp. 392-398, 1990.

[6] E.A. Newman and P.H. Hartline, Integration of visual and infrared information in bimodal neurons of the rattlesnake optic tectum, Science, 213, pp. 789-791, 1981.

[7] S. Grossberg, Neural Networks and Natural Intelligence, Cambridge, MA: MIT Press, 1988.

[8] M. Aguilar and A.M. Waxman, Comparison of opponent-color neural processing and principal components analysis in the fusion of visible and thermal IR imagery, Proc. of the Vision, Recognition, and Action: Neural Models of Mind and Machine Conference, Boston, MA, 1997.

[9] G.A. Carpenter, S. Grossberg, and J.H. Reynolds, ARTMAP: Supervised real-time learning and classification of non-stationary data by a self-organizing neural network, Neural Networks, 4, pp. 565-588, 1991.

[10] W.D. Ross, A.M. Waxman, W.W. Streilein, M. Aguilar, J. Verly, F. Liu, M.I. Braun, P. Harmon, and S. Rak, Multi-sensor 3D image fusion and interactive search, Proc. 3rd International Conference on Information Fusion, Paris, France, 2000.

Figure 10. Screen capture of the fusion visualization tool. In this example, the user has selected to view the fusion of the three fMRI images.