Microarray Data Pre-processing. Ana H. Barragan Lid


Hybridized microarray
- Imaged in a microarray scanner
- The scanner produces fluorescence intensity measurements
- Intensities correspond to levels of hybridization
- Fluorescence intensity values are stored as an image file = raw data

What is pre-processing?
Converting raw data into useful biological data:
- Image data to intensity values
- Quality control
- Bias removal (filtering, normalization, transformation)

Why pre-process?
- To avoid using bad data
- To distinguish noise from the actual biological data
- To be able to compare data from multiple arrays

Pre-processing
- Image analysis
- Background adjustment
- Filtering
- Normalization
- Quality control

Image Analysis

Image Analysis
- Commercial microarrays: specifically designed software packages automatically visualize the data and produce a quality report
- But commercial arrays are not offered for everything, e.g.:
  - Protein arrays
  - Custom arrays

Image Analysis
Visual inspection in the scanner or platform software. Look for:
- Scratches and shadows
- Washing artifacts
- Manufacturing errors
- Odd spots (donut, star shape, etc.)
- Missing spots

Image Analysis
Usually automatic in commercial software:
- Gridding
- Gene annotation
- Spot segmentation
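Spot segmentation separates the pixels that belong to a spot (foreground) from the surrounding pixels (background). A minimal sketch of one simple approach, fixed-circle segmentation, assuming gridding has already centred each spot in a small pixel patch and that all spots share one known radius (`radius` is a hypothetical input, not a value from any particular software). More flexible methods (adaptive circle, adaptive shape) exist.

```python
import numpy as np


def fixed_circle_mask(patch: np.ndarray, radius: float) -> np.ndarray:
    """Return a boolean mask that is True for pixels inside a circle of the
    given radius centred on the patch (fixed-circle segmentation)."""
    rows, cols = patch.shape
    # Assumption: gridding has placed the spot at the centre of the patch.
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    yy, xx = np.mgrid[0:rows, 0:cols]
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


# Synthetic 7x7 patch: flat background of 100 with a bright 3x3 spot of 1000.
patch = np.full((7, 7), 100.0)
patch[2:5, 2:5] = 1000.0

mask = fixed_circle_mask(patch, radius=1.5)
foreground = np.median(patch[mask])   # spot (foreground) intensity
background = np.median(patch[~mask])  # surrounding (background) intensity
print(foreground, background)         # 1000.0 100.0
```

Medians are used rather than means so that a few bright or dark pixels at the spot edge have little influence.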

Image Analysis
Addressing or gridding
- Assign coordinates (physical positions) to each spot
- Takes into account small variations introduced during array production, such as displacement of spots
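A minimal sketch of the "small displacement" part of gridding, assuming the nominal spot positions are already known from the array layout and that each spot lies within a few pixels of its nominal position. Each nominal position is moved to the intensity-weighted centroid of a small search window. The function name and the `search` window size are hypothetical; real gridding software uses its own, usually more robust, algorithms.

```python
import numpy as np


def refine_grid(image, nominal_centres, search=2):
    """Move each nominal (row, col) spot position to the intensity-weighted
    centroid of a (2*search+1)-pixel window around it."""
    refined = []
    for r, c in nominal_centres:
        r0, r1 = max(r - search, 0), min(r + search + 1, image.shape[0])
        c0, c1 = max(c - search, 0), min(c + search + 1, image.shape[1])
        window = image[r0:r1, c0:c1]
        weights = window - window.min()  # local floor gets zero weight
        if weights.sum() == 0:           # flat window: keep the nominal position
            refined.append((float(r), float(c)))
            continue
        yy, xx = np.mgrid[r0:r1, c0:c1]  # absolute pixel coordinates of the window
        refined.append((float((yy * weights).sum() / weights.sum()),
                        float((xx * weights).sum() / weights.sum())))
    return refined


image = np.zeros((10, 10))
image[3, 4] = 50.0   # spot printed one pixel to the right of its nominal position (3, 3)
image[6, 6] = 80.0   # spot exactly at its nominal position
refined = refine_grid(image, [(3, 3), (6, 6)])
print(refined)       # [(3.0, 4.0), (6.0, 6.0)]
```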

Image Analysis
Flag bad spots based on:
- Spot size
- Circularity measure
- Uniformity
- Signal strength: spot intensity relative to background
Software is then used to extract the information (intensities).
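The flagging criteria above can be sketched as simple rules on per-spot measurements. This assumes the image analysis step already reports area, perimeter, foreground mean/SD and local background for each spot. Circularity is computed as 4πA/P², which equals 1 for a perfect circle; uniformity is taken as the coefficient of variation of the foreground pixels. All thresholds are hypothetical defaults, not values from any specific software.

```python
import math
from dataclasses import dataclass


@dataclass
class Spot:
    area: float        # pixels in the segmented spot
    perimeter: float   # boundary length in pixels
    fg_mean: float     # mean foreground intensity
    fg_sd: float       # SD of the foreground pixels
    bg_median: float   # local background intensity


def flag_spot(s, min_area=20, min_circularity=0.7, max_cv=0.5, min_ratio=2.0):
    """Return the reasons a spot is flagged (empty list = good spot)."""
    reasons = []
    if s.area < min_area:
        reasons.append("too small")
    if s.perimeter > 0 and 4 * math.pi * s.area / s.perimeter ** 2 < min_circularity:
        reasons.append("not circular")
    if s.fg_mean > 0 and s.fg_sd / s.fg_mean > max_cv:
        reasons.append("non-uniform")
    if s.bg_median > 0 and s.fg_mean / s.bg_median < min_ratio:
        reasons.append("weak signal")  # spot intensity relative to background
    return reasons


good = Spot(area=78.5, perimeter=31.4, fg_mean=1000, fg_sd=100, bg_median=100)
bad = Spot(area=30, perimeter=40, fg_mean=150, fg_sd=120, bg_median=100)
print(flag_spot(good))  # []
print(flag_spot(bad))   # ['not circular', 'non-uniform', 'weak signal']
```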

Pre-processing
- Image analysis
- Background adjustment
- Filtering
- Normalization
- Quality control

Background Adjustment
- Spot intensity = background + foreground
- The surrounding background can include:
  - No hybridization
  - Non-specific hybridization
  - Other fluorescent artifacts

Background Adjustment
Why?
- More accurate measure of spot intensity
- Reduces bias
How?
- Make the background more homogeneous
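One way to make the background more homogeneous, sketched here as an assumption rather than a specific package's method: replace each spot's local background estimate by the median of the background estimates of its neighbouring spots on the grid, then subtract. A single fluorescent artifact then no longer drives one spot's corrected value negative. Corrected values that are still ≤ 0 are set to NaN because they cannot be log-transformed later (one simple choice among several).

```python
import numpy as np


def smooth_background(bg_grid: np.ndarray, k: int = 1) -> np.ndarray:
    """Median of each spot's (2k+1) x (2k+1) neighbourhood of local
    background estimates (edges are padded by repeating border spots)."""
    padded = np.pad(bg_grid, k, mode="edge")
    out = np.empty(bg_grid.shape, dtype=float)
    rows, cols = bg_grid.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.median(padded[i:i + 2 * k + 1, j:j + 2 * k + 1])
    return out


def background_adjust(fg_grid, bg_grid):
    """Foreground minus smoothed background; non-positive results become NaN."""
    corrected = np.asarray(fg_grid, float) - smooth_background(np.asarray(bg_grid, float))
    corrected[corrected <= 0] = np.nan
    return corrected


fg = np.full((3, 3), 500.0)
bg = np.full((3, 3), 100.0)
bg[1, 1] = 900.0                      # fluorescent artifact near the centre spot
corrected = background_adjust(fg, bg)
print(corrected)                      # 400 everywhere; plain fg - bg gives -400 at the centre
```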

Pre-processing
- Image analysis
- Background adjustment
- Filtering
- Normalization
- Quality control

Filtering
Remove data that will contribute to noise or bias:
- Low-intensity spots
- Bad-quality spots
- Empty spots
- Outliers
- Control probes

Filtering
Filtering criteria:
- Spot size/shape
- Foreground/background intensities
- Type of spot
- Number of replicates
- Variation in replicate signal intensities

Filtering
Categories of spots to filter:
- Controls
- Saturated
- Poor quality
- Too weak
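The filtering categories and criteria above can be sketched as boolean masks. Assumptions: saturation is taken as the ceiling of a 16-bit scanner image (65535), "too weak" means foreground below twice the background, poor-quality spots are those already flagged during image analysis, and replicate variation is judged by the coefficient of variation. The function names and thresholds are hypothetical.

```python
import numpy as np

SATURATION = 65535  # assumed ceiling of a 16-bit scanner image


def keep_spots(fg, bg, is_control, flagged, min_ratio=2.0):
    """True for spots that survive filtering."""
    fg = np.asarray(fg, dtype=float)
    bg = np.asarray(bg, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)  # poor quality (from image analysis)
    saturated = fg >= SATURATION
    too_weak = fg < min_ratio * bg             # hypothetical weak-signal rule
    return ~(is_control | saturated | flagged | too_weak)


def replicates_consistent(replicates, max_cv=0.3):
    """Per gene (row): True if the coefficient of variation across replicate
    spots is at most max_cv (hypothetical threshold)."""
    r = np.asarray(replicates, dtype=float)
    cv = r.std(axis=1, ddof=1) / r.mean(axis=1)
    return cv <= max_cv


keep = keep_spots(
    fg=[5000, 65535, 300, 4000, 8000],
    bg=[200, 200, 200, 200, 200],
    is_control=[False, False, False, False, True],
    flagged=[False, False, False, True, False],
)
print(keep.tolist())  # [True, False, False, False, False]
print(replicates_consistent([[100, 110, 90], [100, 300, 50]]).tolist())  # [True, False]
```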

Filtering
Missing values
- Removing bad-quality spots may introduce missing values for some genes
- Some analysis programs do not tolerate missing values
- Missing values may then have to be imputed. How?

Filtering
Imputing missing values: the K-nearest neighbor (KNN) algorithm
- Identifies other genes whose expression is most similar to that of the gene of interest (Euclidean distance)
- A weighted average of the values for those genes is used to estimate the missing values
KNN method: Troyanskaya O, et al. Bioinformatics. 2001;17:520-525.
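A minimal sketch of the KNN idea described above, not the original authors' code. For each missing value of a gene, candidate neighbours are genes that do have a value on that array; distance is Euclidean over the arrays both genes have observed, and the k nearest values are combined with inverse-distance weights (the weighting scheme is an assumption).

```python
import numpy as np


def knn_impute(X, k=2):
    """Fill NaNs in a genes x arrays matrix using the k most similar genes."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_genes = X.shape[0]
    for g in range(n_genes):
        observed = ~np.isnan(X[g])
        for j in np.flatnonzero(~observed):       # each missing array for gene g
            candidates = []
            for h in range(n_genes):
                if h == g or np.isnan(X[h, j]):
                    continue                      # neighbour needs a value at array j
                shared = observed & ~np.isnan(X[h])
                if not shared.any():
                    continue
                # Euclidean distance over arrays observed in both genes
                d = np.sqrt(np.sum((X[g, shared] - X[h, shared]) ** 2))
                candidates.append((d, X[h, j]))
            if not candidates:
                continue                          # nothing to impute from: leave NaN
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            weights = np.array([1.0 / max(d, 1e-12) for d, _ in nearest])
            values = np.array([v for _, v in nearest])
            filled[g, j] = np.sum(weights * values) / np.sum(weights)
    return filled


nan = np.nan
X = [[1.0, 2.0, nan],
     [1.1, 2.1, 3.0],
     [0.9, 1.9, 2.8],
     [5.0, 6.0, 9.0]]
filled = knn_impute(X, k=2)
print(filled[0, 2])  # about 2.9: genes 1 and 2 are nearest, gene 3 is ignored
```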

Pre-processing
- Image analysis
- Background adjustment
- Filtering
- Normalization
- Quality control

Normalization
- Corrects for differences between samples that do not represent true biological variation
- Removes systematic (technical) variation in the relative intensities of each channel
- Aims to correct for differences in intensities between samples (on the same or different slides)
Bowtell & Sambrook, DNA Microarrays: A Molecular Cloning Manual. 2003

Normalization: assumptions and approaches
- Some genes exhibit constant mRNA levels: housekeeping genes
- The levels of some mRNAs are known: spike-in controls
- The total of all mRNA remains constant: global median and mean; lowess
- The distribution of expression levels is constant: quantile
From: WIBR Microarray Course, Whitehead Institute, November 2004

Normalization by global mean (total intensity)
- Assumes that some genes are differentially expressed but most are equally expressed
- That is, up- and down-regulated genes balance each other out
- The summed intensity values should therefore be equal; where they differ, a constant factor can be calculated to rescale all intensity values
Bowtell & Sambrook, DNA Microarrays: A Molecular Cloning Manual. 2003

Multiply or divide all expression values for one color (or one array, for one-color platforms) by the constant factor calculated to produce the same mean (or total intensity) for every color/array
Example with two one-color arrays
From: WIBR Microarray Course, Whitehead Institute, November 2004
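A minimal sketch of global mean scaling for one-color arrays, assuming a genes x arrays matrix of background-corrected intensities. Each array gets one constant factor so that all arrays end up with the same mean. Using the average of the array means as the common target is an assumption; any reference array or fixed value would work the same way.

```python
import numpy as np


def global_mean_normalize(X, target=None):
    """Rescale each array (column) by a constant factor so that every array
    has the same mean intensity. Returns the scaled matrix and the factors."""
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    if target is None:
        target = means.mean()  # assumption: aim for the average of the array means
    factors = target / means
    return X * factors, factors


# Two one-color arrays; array 2 is uniformly twice as bright as array 1.
X = [[100, 200],
     [200, 400],
     [300, 600]]
normalized, factors = global_mean_normalize(X)
print(factors)     # factors 1.5 and 0.75
print(normalized)  # both columns become 150, 300, 450
```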

Global median normalization
- Transform all expression values to produce a constant median (instead of a constant mean)
- Linear regression: ratio vs. intensity
http://transcriptome.ens.fr/goulphar/documentation.php#method
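A minimal sketch of median normalization on the log scale. For two-color data the log ratio log2(R/G) is shifted so its median is 0, which is the same as dividing one channel by a single constant. For one-color arrays each array's log intensities are shifted to a common median; using the mean of the array medians as that common value is an assumption.

```python
import numpy as np


def median_center_log_ratios(red, green):
    """Two-color: log2(R/G) shifted so that the median log ratio is 0."""
    m = np.log2(np.asarray(red, dtype=float) / np.asarray(green, dtype=float))
    return m - np.median(m)


def median_normalize_arrays(X):
    """One-color: log2 intensities of a genes x arrays matrix, shifted so
    every array (column) has the same median."""
    logs = np.log2(np.asarray(X, dtype=float))
    medians = np.median(logs, axis=0)
    return logs - medians + medians.mean()


print(median_center_log_ratios([200, 400, 800], [100, 100, 100]))  # -1, 0, 1
out = median_normalize_arrays([[100, 200], [200, 400], [400, 800]])
print(out)  # both columns now hold the same values
```

Unlike the mean, the median is not pulled by a few very bright or saturated spots, which is the usual reason to prefer it.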

Lowess
- Non-linear regression: ratio vs. intensity
- Used for intensity-dependent bias, so the normalization factor must vary with mean spot intensity
- MA-plot:
  - M = log ratio of the red vs. green channel (log2 R/G), or between two different arrays
  - A = average log signal intensity, e.g. ½ log2(R·G)
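A minimal sketch of intensity-dependent (lowess-style) normalization on the MA scale, with M = log2(R/G) and A = ½ log2(R·G). To stay self-contained it uses a simplified lowess: at each point a straight line is fitted to the nearest `frac` of the points with tricube weights, and the robustness iterations of the full LOWESS algorithm are omitted. A full implementation is available, for example, as `lowess` in statsmodels. The function names and `frac` value are assumptions.

```python
import numpy as np


def local_linear_fit(x, y, frac=0.5):
    """Simplified lowess: weighted straight-line fit around each point using
    the nearest frac*n points and tricube weights (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbourhood size
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()                     # bandwidth = distance to k-th neighbour
        if h == 0:
            h = 1.0                             # all neighbours coincide
        w = (1 - (dist[idx] / h) ** 3) ** 3     # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(k), x[idx]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted


def lowess_normalize(red, green, frac=0.5):
    """Return log ratios with the intensity-dependent trend removed."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    M = np.log2(red / green)        # log ratio
    A = 0.5 * np.log2(red * green)  # average log intensity
    return M - local_linear_fit(A, M, frac)


# Synthetic dye bias that grows with intensity (made-up data).
A_true = np.linspace(7, 13, 40)
M_true = 0.2 * (A_true - 10)
red = 2.0 ** (A_true + M_true / 2)
green = 2.0 ** (A_true - M_true / 2)
M_norm = lowess_normalize(red, green)
print(np.abs(M_norm).max())  # essentially 0: the trend has been removed
```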

Quantile
- Different chips may have the same median or mean but still very different distributions
- Assuming the chips share a common distribution of intensities, they can be transformed to have similar distributions
From: WIBR Microarray Course, Whitehead Institute, November 2004
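A minimal sketch of quantile normalization for a genes x arrays matrix: sort each array, average the sorted arrays quantile by quantile, and give each value the reference value at its rank. Ties are broken by sort order here, which is a simplification.

```python
import numpy as np


def quantile_normalize(X):
    """Give every array (column) the same distribution: the mean of the
    sorted columns, assigned back by rank."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)                # position of each rank per array
    reference = np.sort(X, axis=0).mean(axis=1)  # mean value at each quantile
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        out[order[:, j], j] = reference          # k-th smallest gets k-th reference value
    return out


X = [[2, 4],
     [1, 8],
     [3, 6]]
out = quantile_normalize(X)
print(out)
# column 1 -> 4.0, 2.5, 5.5 ; column 2 -> 2.5, 5.5, 4.0
```

After normalization both arrays contain exactly the values {2.5, 4.0, 5.5}, while each gene keeps its rank within its own array.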

Normalization between arrays
- The intensity distributions across arrays are assumed to be the same
- This is not always (arguably never exactly) true
- Intensity distributions need to be similar for the arrays to be comparable

Normalization
- Different probes/spots can be used in the normalization process:
  - All the genes on the array
  - Controls
- Which algorithm to use depends on:
  - The technology
  - The shape of the data distribution
- Always look at the data before and after normalization
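Basing normalization on controls instead of all genes can be sketched as follows, assuming a boolean mask marks the control spots (housekeeping genes or spike-ins) and that those spots should have the same mean intensity on every array. The factors are computed from the controls only and then applied to all spots. The mask and the common target (mean of the control means) are assumptions.

```python
import numpy as np


def control_based_normalize(X, control_mask):
    """Scale each array so the control spots have equal mean intensity on
    every array; the same factor is then applied to all spots."""
    X = np.asarray(X, dtype=float)
    control_mask = np.asarray(control_mask, dtype=bool)
    control_means = X[control_mask].mean(axis=0)    # per-array mean of controls
    factors = control_means.mean() / control_means  # assumed common target
    return X * factors, factors


X = [[100, 200],  # control spot
     [50, 100],   # control spot
     [10, 35]]    # gene of interest
normalized, factors = control_based_normalize(X, [True, True, False])
print(factors)           # factors 1.5 and 0.75
print(normalized[2])     # gene of interest: 15.0 and 26.25
```

Comparing `X[2]` (10 vs 35) with `normalized[2]` (15 vs 26.25) is a small example of looking at the data before and after normalization.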

Quality control
Many steps influence the data:
- Sampling
- Extraction
- Labeling (sample-dependent controls)
- Hybridization (sample-independent controls)
- Scanning (sample-independent controls)
- Extraction of data

Different levels of quality control
- Array level: assess each spot and its surroundings
  - Foreground and background
  - Control spots
  - Flags
  - Plots
- Experiment level: compare all arrays to identify outliers and batch effects
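At the experiment level, one simple way to spot outlier arrays (an assumption, not a prescribed method) is to correlate every array with every other array on the log scale and flag arrays whose median correlation with the rest falls below a cutoff. The cutoff of 0.9 and the synthetic data are hypothetical.

```python
import numpy as np


def flag_outlier_arrays(log_X, min_median_corr=0.9):
    """For a genes x arrays matrix of log intensities, return each array's
    median Pearson correlation with the other arrays and an outlier flag."""
    C = np.corrcoef(np.asarray(log_X, dtype=float), rowvar=False)  # arrays x arrays
    n = C.shape[0]
    median_corr = np.array([np.median(np.delete(C[i], i)) for i in range(n)])
    return median_corr, median_corr < min_median_corr


rng = np.random.default_rng(0)
signal = np.linspace(6, 14, 200)                  # shared expression profile
good = [signal + rng.normal(0, 0.1, 200) for _ in range(3)]
bad = rng.normal(10, 2, 200)                      # array unrelated to the others
median_corr, is_outlier = flag_outlier_arrays(np.column_stack(good + [bad]))
print(is_outlier.tolist())  # [False, False, False, True]
```

The median (rather than the mean) correlation keeps one bad array from dragging down the scores of the good ones.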

Illumina, GenomeStudio
- Sample-independent controls
- Sample-dependent controls

Illumina, GenomeStudio

Histogram/density plot
- Distribution of the intensities for each array
Figure: density plot

Density plot

Box plot
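The numbers behind a box plot can also be computed directly, which makes before/after comparisons easy to check. This sketch computes a five-number summary (min, Q1, median, Q3, max) per array; note that the whiskers here span min to max, whereas many plotting tools use a 1.5 × IQR rule. The simple median scaling used for the "after" case is only an illustration.

```python
import numpy as np


def box_stats(X):
    """Five-number summary (min, Q1, median, Q3, max) for each array
    (column) of a genes x arrays matrix; one row per array."""
    q = np.percentile(np.asarray(X, dtype=float), [0, 25, 50, 75, 100], axis=0)
    return q.T


X = np.array([[1, 10],
              [2, 20],
              [3, 30],
              [4, 40],
              [5, 50]], dtype=float)
before = box_stats(X)
after = box_stats(X / np.median(X, axis=0))  # simple median scaling
print(before)  # rows: 1..5 and 10..50, so the boxes do not line up
print(after)   # the two rows are now identical
```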

Scatter plot

Clustering

Pre-processing
- Can be done in different ways and in different orders
- Differs between technologies
- Be modest

Summary
- A certain amount of pre-processing is needed
- But do not over-process the data
- Different technologies, different people, different implementations, different ways
- Read and understand what you are doing

Thank you!