A Color-Based Approach for Automated Segmentation in Tumor Tissue Classification
by Wang et al.

Adam Budde, Dhaval Desai, Kieran Sweeney
BME/MP 574

Outline
- Motivation and background
- Color-based feature extraction
- Color normalization
- Automatic feature extraction
- PCA learning algorithm and classification
- Compare semi-automatic vs. automatic segmentation results
- Questions

2008 Estimated US Cancer Deaths*

Men (294,120 deaths)
  Site                              Share
  Lung & bronchus                   31%
  Prostate                          10%
  Colon & rectum                     8%
  Pancreas                           6%
  Liver & intrahepatic bile duct     4%
  Leukemia                           4%
  Esophagus                          4%
  Urinary bladder                    3%
  Non-Hodgkin lymphoma               3%
  Kidney & renal pelvis              3%
  All other sites                   24%

Women (271,530 deaths)
  Site                              Share
  Lung & bronchus                   26%
  Breast                            15%
  Colon & rectum                     9%
  Pancreas                           6%
  Ovary                              6%
  Non-Hodgkin lymphoma               3%
  Leukemia                           3%
  Uterine corpus                     3%
  Liver & intrahepatic bile duct     2%
  Brain/ONS                          2%
  All other sites                   25%

ONS = Other nervous system.
Source: American Cancer Society, 2008.

US Mortality, 2005

  Rank  Cause of death                        No. of deaths   % of all deaths
  1     Heart diseases                        652,091         26.6
  2     Cancer                                559,312         22.8
  3     Cerebrovascular diseases              143,579          5.9
  4     Chronic lower respiratory diseases    130,933          5.3
  5     Accidents (unintentional injuries)    117,809          4.8
  6     Diabetes mellitus                      75,119          3.1
  7     Alzheimer disease                      71,599          2.9
  8     Influenza & pneumonia                  63,001          2.6
  9     Nephritis*                             43,901          1.8
  10    Septicemia                             34,136          1.4

Source: American Cancer Society, 2008.
Tumor Grading
- Histochemical staining of tumor samples
- Used to classify how abnormal cancer cells look (as opposed to cancer staging)
- Examines structure and growth pattern of cells
- Varies with cancer type
- Helps doctors plan treatment and estimate the patient's prognosis

  Grade  Description
  GX     Undetermined grade
  G1     Well-differentiated (low grade)
  G2     Moderately differentiated (intermediate grade)
  G3     Poorly differentiated (high grade)
  G4     Undifferentiated (high grade)

  [1] National Cancer Institute

Histochemical staining and Tumor Grading
- Workflow: Tissue Extraction (Biopsy or Surgery) -> Fix and Process Tissue -> Section Tissue -> Stain Tissue
- Stained components: Proteins, Nucleus, Lipids, Carbohydrates, Other cellular components

Automated Segmentation in Tumor Tissue Classification
- [Figure: Breast Cancer, Grade Differentiation]
- [Figure: Prostate Cancer, Grade Differentiation]
- Image sources:
  http://www.psc.edu/science/2000/wetzel/
  http://www.nature.com/bjc/journal/v93/n11/fig_tab/6602829f3.html
- Processing pipeline:
  - Microscopic images
  - Color-Based Feature Extraction (CBFE) system
    - Color normalization
    - Automatic feature extraction
  - PCA learning algorithm and classification
Microscopic Images
- Microscopy images obtained from oral cancer specimens (79 slides in total)
- Tri-color staining
  - Vessel: red (i)
  - Nucleus: blue or purple-blue (ii)
  - Background: light or dark grey

Color Normalization
Why color normalize?
- Microscopy images have different color distributions, depending on
  - Staining
  - Microscope used
- Color normalize -> similar color distribution -> automatic feature extraction
- Protocol used to color normalize: Reinhard et al., "Color Transfer between Images," Applied Perception, Sept. 2001

Color space: RGB space
- RGB image: Red, Green, and Blue additive color model
- Used for sensing, representation, and display of images in electronic systems
- Device-dependent
- http://en.wikipedia.org/wiki/rgb
Color space: LMS space
- Named for the Large, Medium, and Short cones in the retina
- RGB space doesn't capture the naturally occurring wavelength mixtures
- LMS space captures which combinations of photoreceptors are actually being stimulated
- LMS space can be derived from RGB space

Color space: lαβ space
- Developed by Ruderman et al. in 1998
- Discovered in the context of understanding the human visual system
- lαβ space is derived from LMS space
- Minimizes correlation between the channels
- Helps determine the activation of the human visual system more accurately than LMS space

RGB space vs. lαβ space
Why convert to lαβ space?
- R, G, and B values are correlated with one another
- l, α, and β values show little correlation
- Allows different operations to be applied in different channels
- Prevents cross-channel artifacts from occurring
- lαβ space is ideal for carrying out normalization
Converting RGB to lαβ space
- Step 1: RGB to LMS
- Step 2: Take log of LMS
- Step 3: LMS to lαβ
- Step 4: Color normalize algorithm

Color normalization
- Source image: image that needs to be normalized
- Target image: image whose color information will be used to correct the source image
- Both source and target images need to be converted from RGB to lαβ color space

Color normalization (per lαβ channel)
- Step 1: Compute means and standard deviations of the source and target images
- Step 2: Subtract the source mean from the source data points
- Step 3: Scale the source image data by σt / σs
  - σt = standard deviation of the target image
  - σs = standard deviation of the source image
- Step 4: Add the mean computed for the target image
- Step 5: Convert back to RGB

Converting lαβ to RGB space
- Step 1: lαβ to LMS
- Step 2: Take antilog of LMS
- Step 3: LMS to RGB
- Step 4: Use normalized image for automatic feature extraction
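The conversion and normalization steps above can be sketched in Matlab as follows. This is a minimal illustration, not the presenters' code: the matrix coefficients are the ones commonly quoted from Reinhard et al. and should be checked against the paper, the pixel values in `src` and `tgt` are made up, and the small clamp before the logarithm is an assumption added to avoid log(0).

```matlab
% Sketch of Reinhard-style color normalization on N-by-3 lists of RGB pixels.
% Matrix coefficients as commonly quoted from Reinhard et al. (verify before use).
rgb2lms = [0.3811 0.5783 0.0402;
           0.1967 0.7244 0.0782;
           0.0241 0.1288 0.8444];
lms2lab = diag([1/sqrt(3) 1/sqrt(6) 1/sqrt(2)]) * [1 1 1; 1 1 -2; 1 -1 0];

% Forward: RGB -> LMS -> log10 -> l-alpha-beta (rows are pixels).
toLab   = @(rgb) (lms2lab * log10(rgb2lms * max(rgb, 1e-6)'))';
% Inverse: l-alpha-beta -> log LMS -> antilog -> RGB.
fromLab = @(lab) (rgb2lms \ (10 .^ (lms2lab \ lab')))';

% Hypothetical source and target pixels (values in [0, 1]).
src = [0.8 0.2 0.2; 0.6 0.3 0.4; 0.9 0.5 0.5; 0.7 0.1 0.3];
tgt = [0.3 0.3 0.7; 0.4 0.5 0.8; 0.2 0.2 0.5; 0.5 0.4 0.9];

srcLab = toLab(src);
tgtLab = toLab(tgt);
muS = mean(srcLab, 1);    muT = mean(tgtLab, 1);
sdS = std(srcLab, 0, 1);  sdT = std(tgtLab, 0, 1);

% Subtract source mean, scale by sigma_t / sigma_s, add target mean.
n = size(srcLab, 1);
normLab = (srcLab - repmat(muS, n, 1)) .* repmat(sdT ./ sdS, n, 1) + repmat(muT, n, 1);

% Back to RGB; results may fall outside [0, 1] and need clipping for display.
normRgb = fromLab(normLab);
```

After this step each lαβ channel of the normalized pixels has the target's mean and standard deviation, which is what makes images from different stains or microscopes comparable.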
Gamma correction
- Controls overall brightness
- Often used to correct bleaching seen in images
- Varying gamma changes the ratios of R:G:B
- Gamma is invariant in log lαβ space
- RGB -> LMS: gamma changes by a small amount

Examples of color normalization
[Figures: Source image, Target image, Normalized image]

Swatches
- Select different sets of color information in the source and target images
- Normalize the sets separately
- E.g. match the blues of the sky and the yellows of the café and castle
[Figures: Source image, Target image, Normalized image]
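One way to see why gamma drops out in log space: raising a value to the power gamma multiplies its logarithm by gamma, and a constant factor cancels when the data are standardized by mean and standard deviation. The sketch below applies gamma directly to hypothetical LMS values (an assumption made for simplicity; as noted above, gamma applied in RGB carries over to LMS only approximately).

```matlab
% Hypothetical LMS pixel values (rows are pixels) and a hypothetical gamma.
lms = [0.20 0.35 0.10; 0.55 0.40 0.25; 0.80 0.75 0.60; 0.30 0.15 0.45];
gammaVal = 2.2;

logA = log10(lms);              % log of original values
logB = log10(lms .^ gammaVal);  % log of gamma-adjusted values = gammaVal * logA

% Standardize each channel: subtract mean, divide by standard deviation.
n = size(lms, 1);
standardize = @(x) (x - repmat(mean(x, 1), n, 1)) ./ repmat(std(x, 0, 1), n, 1);

zA = standardize(logA);
zB = standardize(logB);
maxDiff = max(abs(zA(:) - zB(:)));  % effectively zero: gamma has cancelled
```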
COLOR BASED IMAGE TECHNIQUES

RGB Color Space
- Color coordinate system similar to a Cartesian system
- 0 ≤ R ≤ 1, 0 ≤ G ≤ 1, 0 ≤ B ≤ 1
- Pixel (i, j) is gray if R_ij = G_ij = B_ij

RGB Images
[Figure slides]
RGB Images
- Advantages
  - Simple system for computers
  - RGB output for LCD/CRT screens
- Disadvantages
  - Unintuitive for humans
  - Color and intensity info mixed

HSI Color Space
- H: Hue (0° ≤ H < 360°)
- S: Saturation (0 ≤ S ≤ 1)
- I: Intensity (0 ≤ I ≤ 1)
- More intuitive for humans to interpret
- Color info decoupled from intensity info
- Image source: www.wikipedia.com

HSI Images
[Figure: component images labeled "Color info" and "Intensity info"]
HSI Images
- Hue
  - Describes the primary color of the image
  - Radial measure: red = 0°
- Saturation
  - Describes the amount by which the color is diluted by white
- Intensity
  - Brightness value

RGB to HSI
[Conversion equations shown on the original slides]
Source: web2.clarkson.edu/class/image_process/rgb_to_hsi.pdf
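Since the conversion equations did not survive on the RGB to HSI slides, here is a sketch using the widely taught textbook formulation (hue from an arccosine of the color differences, saturation from the minimum channel, intensity as the channel average). It may differ in detail from what the slides showed; the handling of gray and black pixels is an assumption.

```matlab
% RGB -> HSI for a list of pixels (rows), channel values in [0, 1].
rgb = [1 0 0; 0 1 0; 0 0 1; 0.5 0.5 0.5];   % red, green, blue, mid-gray
R = rgb(:,1);  G = rgb(:,2);  B = rgb(:,3);

num   = 0.5 * ((R - G) + (R - B));
den   = sqrt((R - G).^2 + (R - B) .* (G - B));
theta = acosd(min(max(num ./ max(den, eps), -1), 1));   % angle in degrees

H = theta;
H(B > G) = 360 - theta(B > G);   % lower half of the hue circle
H(den == 0) = 0;                 % hue is undefined for gray; set to 0 by convention

I = (R + G + B) / 3;                        % intensity
S = 1 - min(rgb, [], 2) ./ max(I, eps);     % saturation
S(I == 0) = 0;                              % black: define saturation as 0
```

For the four test pixels this gives hues of about 0°, 120°, and 240° for red, green, and blue, and zero saturation for the gray pixel.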
HSI Component Manipulation

    % Shift the hue of an image by 180 degrees.
    % Note: rgb2hsv returns HSV, used here as a stand-in for HSI.
    im = imread('istanbuleminonu.jpg');
    imhsv = rgb2hsv(im);               % H, S, V channels, each in [0, 1]
    H = imhsv(:,:,1) * 360;            % hue in degrees
    H = mod(H + 180, 360);             % shift by 180 degrees, wrapping at 360
    imhsv(:,:,1) = H / 360;            % back to [0, 1]
    imrgb = hsv2rgb(imhsv);
    imrgb = imrgb ./ max(imrgb(:));    % rescale so the maximum is 1
    imshow(imrgb);

HSI Component Manipulation
[Figures: Original Image, Hue Shift Image]

Intensity Slicing
- Picker thyroid phantom
- Low intensity variance in the current image
- Generate a pseudocolor image using HSI
  - Convert to HSI
  - Assign color information based on intensity thresholds
[Figures from Gonzalez & Woods]
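Intensity slicing can be sketched as a lookup: each pixel falls into a band defined by intensity thresholds, and each band gets one color. The thresholds, colors, and toy image below are hypothetical, and for brevity the sketch assigns RGB colors directly rather than going through HSI as the slide describes.

```matlab
% Toy grayscale image with intensities in [0, 1].
gray = [0.05 0.20 0.40;
        0.55 0.70 0.95];

thresholds = [0.25 0.50 0.75];               % hypothetical band edges
colors = [0 0 1; 0 1 0; 1 1 0; 1 0 0];       % blue, green, yellow, red

% Band index per pixel: 1 + number of thresholds at or below the intensity.
idx = ones(size(gray));
for t = thresholds
    idx = idx + (gray >= t);
end

% Look up one RGB color per pixel and reshape to rows-by-cols-by-3.
pseudo = reshape(colors(idx(:), :), [size(gray) 3]);
```

Small intensity differences that are hard to see in grayscale end up in visibly different colors once they straddle a threshold.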
Intensity Slicing
- Picker thyroid phantom
  - Low intensity variance in the current image
- Pseudocolor image
  - Intensity information is much more apparent in the pseudocolor image
[Figures from Gonzalez & Woods]

Intensity Transforms
- Intensity info is fed into transforms to generate RGB channels for each pixel
- RGB channels are used to produce the pseudocolor image
[Figures from Gonzalez & Woods]

HSI in Tumor Tissue Classification
[Figures: Source Image, Normalized Image (Yi-Yang Wang et al.)]
HSI in Tumor Tissue Classification
[Figures: Normalized Image and its Hue, Saturation, and Intensity components (Yi-Yang Wang et al.)]

HSI in Tumor Tissue Classification
- Normalized Red Component Image (NRCI): enhances vessels
- Normalized Blue Component Image (NBCI): enhances nuclei
[Figures: Normalized Image, NRCI, NBCI (Yi-Yang Wang et al.)]

AUTOMATIC SELECTION OF TRAINING SAMPLES

Automatic Selection of Training Samples
- Previous steps have standardized the images
- Now an algorithm can be used to automatically select training samples
Selection of vessel samples
- Utilizes NRCI images
- C = 1, therefore the following algorithm is used
  [Algorithm shown on the original slide]
- Outputs (m x m) images as training samples
- Stops when N1 training samples are extracted from the image

Selection of nuclei samples
- Utilizes NBCI images
- C = 2, therefore the following algorithm is used
  [Algorithm shown on the original slide]
- Outputs (m x m) images as training samples
- Stops when N2 training samples are extracted from the image

Selection of background samples
- C = 3, therefore the following algorithm is used
  [Algorithm shown on the original slide]
- Stores individual pixels that pass the algorithm
- N3/9 of these pixels are randomly selected
- The selected pixels, and their neighbors, are stored to form (m x m) training samples

PRINCIPAL COMPONENT ANALYSIS
And other observations
Covariance and Covariance Matrices
- Covariance
  - A measure of how two signals vary together
- Covariance matrix
  - Higher-order generalization of variance
  - Diagonal entries contain the variances

Covariance Matrix
[Equation shown on the original slide]

Matrix Multiplication
- A linear operator

Eigenvectors
- A matrix multiplied by one of its eigenvectors gives a vector that points in the same (or opposite) direction
- Characteristic equation: Ax = λx
- Only square matrices have eigenvectors
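As a concrete instance of the covariance matrix, the sketch below computes it by hand from mean-adjusted data, using the nine-point example data that appears in the deck's Matlab listing. The variable names are illustrative.

```matlab
% Example data from the deck's PCA walkthrough (two variables, nine samples).
p1 = [0.5 2.6 1.2 3.6 1.5 4.7 2.2 4.0 3.2];
p2 = [0.7 2.8 0.9 3.8 1.6 4.6 2.5 3.6 3.0];

X  = [p1' p2'];                          % rows are samples, columns are variables
n  = size(X, 1);
Xc = X - repmat(mean(X, 1), n, 1);       % subtract each column's mean

% Sample covariance: C(i,j) = sum over samples of Xc(:,i).*Xc(:,j) / (n - 1).
C = (Xc' * Xc) / (n - 1);

% Diagonal entries are the variances; off-diagonals are the covariances.
variances = diag(C)';
```

The result agrees with the covariance matrix reported in the walkthrough (1.9336, 1.8186; 1.8186, 1.7686) and with Matlab's built-in `cov(X)`.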
Eigenvectors - Orthogonality
- For a symmetric matrix, such as a covariance matrix, all eigenvectors are orthogonal to each other
- Important to PCA
  - Data are expressed in terms of these eigenvectors instead of the original axes
  - Because they are orthogonal, they are independent of one another (no info about one in the other)

Eigenvalues
- Paired with each eigenvector
- A scalar
- If the matrix is multiplied by an eigenvector, the result is the eigenvector scaled in magnitude by the eigenvalue

Principal Component Analysis
- Invented by Karl Pearson in 1901
- Used to reduce high-dimensional data to lower-dimensional data
- I will show a 2-D example; real data usually have many more dimensions

Step 1: Get Data
- In the example:
  - p1vec is the intensity of red in pixel 1
  - p2vec is the intensity of red in pixel 2
[Figure: Data; axes p1vec (horizontal) and p2vec (vertical), both from 0 to 5]
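The eigenvector and eigenvalue properties listed above can be checked numerically. This sketch uses the 2 x 2 covariance matrix reported in the deck's worked example; `eig` is the standard Matlab routine, and the remaining names are illustrative.

```matlab
% Covariance matrix from the deck's worked example (symmetric).
C = [1.9336 1.8186;
     1.8186 1.7686];

[V, D] = eig(C);            % columns of V are unit eigenvectors, D is diagonal

% Property 1: C*v = lambda*v for every eigenvector/eigenvalue pair.
residual = C * V - V * D;

% Property 2: eigenvectors of a symmetric matrix are orthonormal, so V'*V = I.
gram = V' * V;

lambdas = sort(diag(D));    % approximately 0.0306 and 3.6716
```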
Step 2: Subtract Mean
- So that the covariance matrix is equal to the correlation matrix
- Not a problem for the final result
- Mean can be added back later
[Figure: Mean Adjusted Data; axes p1vec (horizontal) and p2vec (vertical), both from -2 to 2]

Step 3: Calculate Covariance Matrix
- Calculated as before
- Done by Matlab

  covariance matrix =
      1.9336    1.8186
      1.8186    1.7686

Step 4: Calculate Eigenvectors and Eigenvalues
- Done by Matlab
- Need to be unit eigenvectors
  - Automatic in Matlab
- Results show patterns in the data

  eigenvectors =
      0.6909   -0.7230
     -0.7230   -0.6909

  eigenvalues =
      0.0306    0.0000
      0.0000    3.6716

Step 4: Calculate Eigenvectors and Eigenvalues Cont'd
- Major eigenvector
  - Goes through the center
  - Follows the major pattern, like a line of best fit
  - The two data sets are related along this line
  - Large eigenvalue: 3.6716 here
[Figure: Data with Eigenvectors; both axes from -2 to 2]
5/2/2008

Step 4: Calculate Eigenvectors and Eigenvalues Cont'd
- Minor eigenvector
  - Shows variation from the major vector
  - Smaller eigenvalue: 0.0306 here
[Figure: Data with Eigenvectors; both axes from -2 to 2]

Step 5: Choose Major Eigenvectors
- Normally there are many dimensions
- Goal is to account for a certain amount of variability
  - About 90% is common
- We will take both, so that we still have two dimensions

Matlab code

    % Step 1: Collect data
    p1vec = [0.5 2.6 1.2 3.6 1.5 4.7 2.2 4.0 3.2];
    p2vec = [0.7 2.8 0.9 3.8 1.6 4.6 2.5 3.6 3.0];
    plot(p1vec, p2vec, 'r+')
    axis([0 5 0 5])

    % Step 2: Subtract mean
    p1mean = mean(p1vec);
    p2mean = mean(p2vec);
    p1vec = p1vec - p1mean;
    p2vec = p2vec - p2mean;

    % Step 3: Calculate covariance matrix (columns are variables)
    rbcov = cov([p1vec' p2vec'])

    % Step 4: Calculate eigenvalues and eigenvectors
    [eigenvectors, eigenvalues] = eig(rbcov)

    % Draw each eigenvector as a line through the origin.
    % Slope = p2 component / p1 component of the eigenvector.
    p = [-2 2];
    slope1 = eigenvectors(2,1) / eigenvectors(1,1);
    slope2 = eigenvectors(2,2) / eigenvectors(1,2);
    figure
    plot(p1vec, p2vec, 'r+', p, slope1*p, p, slope2*p)
    axis([-2 2 -2 2])

Multivariate Normal Distribution
- Generalization of the 1D Gaussian
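Step 5 can be made concrete by ranking the eigenvalues and keeping the smallest number of eigenvectors whose eigenvalues cover the desired share of total variance. The sketch below uses the eigenvalues and eigenvectors reported in the worked example and the 90% figure from the slide; the sample point `x` is hypothetical.

```matlab
% Eigen-decomposition reported in the worked example.
eigenvalues  = [0.0306 3.6716];
eigenvectors = [ 0.6909 -0.7230;
                -0.7230 -0.6909];       % column k pairs with eigenvalues(k)

% Rank components by variance and accumulate the explained fraction.
[sortedVals, order] = sort(eigenvalues, 'descend');
explained = cumsum(sortedVals) / sum(sortedVals);

% Smallest number of components reaching the 90% target.
k = find(explained >= 0.90, 1);

% Project a mean-adjusted sample onto the kept component(s).
W = eigenvectors(:, order(1:k));
x = [1 1];                              % hypothetical mean-adjusted sample
score = x * W;
```

In this 2-D example the major eigenvector alone explains about 99% of the variance, so a 90% target would keep one component; the presentation keeps both only to stay in two dimensions.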
Multivariate Normal Distribution Cont'd
- Can have as many dimensions as you want
- We will use the 2D case

Color Classification
- Create a multivariate Gaussian pdf
- Use eigenvectors and eigenvalues to determine the axes
- Iso-probability contour: [equation shown on the original slide]

Color Classification
- Calculate the pdf for each new sample in each category
- The result is the probability that it matches the category
- Classified as whichever category has the maximum probability

Post Processing
- Method does not consider spatial relationships
- Nonlinear merging strategy
  - Merge areas of fewer than 5 pixels into the background
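The classification rule above (evaluate a multivariate Gaussian pdf per category, pick the maximum) can be sketched as follows. The class means, covariances, and the test sample are hypothetical placeholders, not values from the paper; in practice they would be estimated from the training samples.

```matlab
% Multivariate Gaussian pdf for a row-vector sample x, mean mu, covariance Sigma:
% p(x) = exp(-0.5*(x-mu)*inv(Sigma)*(x-mu)') / sqrt((2*pi)^d * det(Sigma))
gaussPdf = @(x, mu, Sigma) ...
    exp(-0.5 * (x - mu) * (Sigma \ (x - mu)')) / sqrt((2*pi)^numel(mu) * det(Sigma));

% Hypothetical 2-D class models (mean and covariance per category).
names  = {'vessel', 'nucleus', 'background'};
mus    = {[0.8 0.2], [0.2 0.8], [0.5 0.5]};
Sigmas = {0.01 * eye(2), 0.01 * eye(2), 0.02 * eye(2)};

x = [0.75 0.25];                 % hypothetical new sample

% Evaluate the pdf under each category and keep the most likely one.
p = zeros(1, numel(names));
for c = 1:numel(names)
    p(c) = gaussPdf(x, mus{c}, Sigmas{c});
end
[~, best] = max(p);
label = names{best};
```

Because each pixel is classified on its own, small isolated regions can appear, which is what the post-processing merge step is meant to clean up.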
Experiment - Setup
- 79 images (40 training, 39 testing)
- Two approaches
  - Semiautomatic approach: training samples selected manually
  - Fully automatic approach: training samples selected by algorithm
- Goal is to determine whether their results are similar

Experiment Pictures 1
[Figures: Original, Semi-auto, Fully auto]

Experiment Pictures 2
[Figures: Original, Semi-auto, Fully auto]

Experiment - Results
- Greater than 90% accuracy
- Advantages
  - Removes the tedious manual training step
  - Faster, less costly
  - Sufficient for most clinical applications
  - Tool for oral cancer detection
- Disadvantages
  - Needs specific staining