Statistics, Probability and Noise

Claudia Feregrino-Uribe & Alicia Morales-Reyes
Original material: Rene Cumplido
Autumn 2015, CCC-INAOE

Contents
- Signal and graph terminology
- Mean and standard deviation
- Signal versus underlying process
- The histogram, pmf and pdf
- The normal distribution
- Digital noise generation
- Precision and accuracy

Introduction
Statistics and probability are used in DSP to characterize signals and the processes that generate them. For example, a primary use of DSP is to reduce interference, noise, and other undesirable components in acquired data. These components may be:
- an inherent part of the signal being measured,
- the result of imperfections in the data acquisition system, or
- an unavoidable byproduct of some DSP operation.
Statistics and probability allow these disruptive features to be measured and classified, which aids in developing strategies to remove the offending components.

Signal and Graph Terminology
The vertical axis may represent voltage, light intensity, sound pressure, etc. Since we don't know what it represents in this particular case, we label it generically: amplitude. This parameter also goes by several other names: the y-axis, the dependent variable, the range, and the ordinate.

The horizontal axis represents the other parameter of the signal, going by such names as the x-axis, the independent variable, the domain, and the abscissa. Time is the most common parameter on the horizontal axis of acquired signals, but other parameters are used in specific applications, e.g., rock density measured at equally spaced distances along the surface of the earth. For a discrete signal, label the horizontal axis generically: sample number. If the signal were continuous, the label would be, e.g., time, distance, or x.

The two parameters that form a signal are generally not interchangeable. The parameter on the y-axis (the dependent variable) is said to be a function of the parameter on the x-axis (the independent variable). The independent variable describes how or when each sample is taken, while the dependent variable is the actual measurement. Given a specific value on the x-axis, we can find the corresponding value on the y-axis, but usually not the other way around.

Domain is a very widely used term in DSP. A signal that uses time as the independent variable is said to be in the time domain. Another common signal in DSP uses frequency as the independent variable, resulting in the term frequency domain. Signals that use distance as the independent variable are said to be in the spatial domain. What if the x-axis is labeled with something generic, such as sample number? Such signals are commonly referred to as being in the time domain.

Although the signals in previous figures are discrete, they are displayed in this figure as continuous lines, because there are too many samples to distinguish if they were displayed as individual markers. In graphs that portray shorter signals (fewer than 100 samples), the individual markers are usually shown. Continuous lines may or may not be drawn to connect the markers. A continuous line could imply what is happening between samples, or simply be an aid to help the reader's eye follow a trend in noisy data. Examine the labeling of the horizontal axis to find out whether you are working with a discrete or a continuous signal.

Sampling notation: the variable N is widely used in DSP to represent the total number of samples in a signal. Each sample is assigned a sample number or index. These are the numbers that appear along the horizontal axis. Two notations for assigning sample numbers are commonly used:
- Sample indexes run from 1 to N (e.g., 1 to 512): the convention in mathematics.
- Sample indexes run from 0 to N−1 (e.g., 0 to 511): the convention in DSP.

Mean
The mean, indicated by µ, is the statistician's jargon for the average value of a signal: add all of the samples together and divide by N. Mathematically:

$$\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i$$

Standard Deviation
The standard deviation, σ, is a measure of how far the signal fluctuates from the mean. It is obtained by averaging the squares of the differences between each sample and the mean; the square root is then taken to compensate for the initial squaring. In equation form:

$$\sigma = \sqrt{\frac{1}{N-1} \sum_{i=0}^{N-1} (x_i - \mu)^2}$$
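As a minimal sketch of the two definitions, the following Python computes the mean and the standard deviation with the N−1 divisor. The function name `mean_and_std` and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def mean_and_std(x):
    """Return (mean, standard deviation) of a list of samples.

    The variance is averaged over N-1, as in the definition above,
    so the signal needs at least two samples.
    """
    n = len(x)
    mu = sum(x) / n
    variance = sum((xi - mu) ** 2 for xi in x) / (n - 1)
    return mu, math.sqrt(variance)

# Hypothetical 8-sample signal, indexed 0 to N-1 (DSP convention).
signal = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma = mean_and_std(signal)
print(mu, sigma)
```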

Common waveforms
[Figure: ratio of the peak-to-peak amplitude to the standard deviation for several common waveforms]

Running statistics
It is often desirable to recalculate the mean and standard deviation as new samples are acquired and added to the signal. This type of calculation is called running statistics. Only three parameters need to be tracked:
- N, the total number of samples
- sum, the sum of these samples
- sum of squares, the sum of the squares of the samples
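One way the three running parameters could be used is sketched below. The class name `RunningStats` is hypothetical; the standard deviation is obtained by rearranging the N−1 definition into (sum of squares − sum²/N)/(N−1), which is algebraically equivalent but can lose precision when the mean is much larger than the spread.

```python
import math

class RunningStats:
    """Keeps N, the sum, and the sum of squares (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        # Update the three running parameters with one new sample.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def std(self):
        # Needs at least two samples; clamp tiny negative round-off to zero.
        var = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))

stats = RunningStats()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(sample)
print(stats.mean(), stats.std())
```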

Signal-to-Noise Ratio and Coefficient of Variation
In some situations, the mean describes what is being measured, while the standard deviation represents noise and other interference. In these cases the standard deviation is not important in itself, but only in comparison to the mean. This gives rise to two terms:
- signal-to-noise ratio (SNR): the mean divided by the standard deviation;
- coefficient of variation (CV): the standard deviation divided by the mean, multiplied by 100 percent.
For example, a signal with a CV of 2% has an SNR of 50. Better data means a higher SNR and a lower CV.
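The two ratios are reciprocals of each other apart from the factor of 100. A short sketch, with hypothetical function names and made-up values for the mean and standard deviation:

```python
def snr(mean, std):
    """Signal-to-noise ratio: mean divided by standard deviation."""
    return mean / std

def cv_percent(mean, std):
    """Coefficient of variation in percent: 100 * std / mean."""
    return 100.0 * std / mean

# Made-up measurement: mean of 10.0 with a standard deviation of 0.2.
print(snr(10.0, 0.2), cv_percent(10.0, 0.2))
```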

Signal versus Underlying Process
Statistics is the science of interpreting numerical data, such as acquired signals. Probability is used in DSP to understand the processes that generate signals. In DSP it is important to distinguish the acquired signal from the underlying process.

The probabilities of the underlying process are constant, but the statistics of the acquired signal change each time the experiment is repeated. For example, consider a signal created by flipping a coin 1000 times, with heads recorded as one and tails as zero. The process that created this signal has a mean of exactly 0.5 (50% heads, 50% tails). The actual 1000-point signal will not necessarily have a mean of exactly 0.5: random chance makes the number of ones and zeros slightly different each time the signal is generated. This random irregularity found in actual data goes by such names as statistical variation, statistical fluctuation, and statistical noise.
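The coin-flip experiment is easy to simulate. In this sketch the helper `coin_signal` and the seed are assumptions made for illustration; each repetition yields a mean close to, but usually not exactly, the process mean of 0.5.

```python
import random

def coin_signal(n, rng):
    """Return n samples: 1 for heads, 0 for tails, each with probability 0.5."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]

rng = random.Random(2015)  # arbitrary seed, only for repeatability
# Repeat the 1000-flip experiment five times and record each signal's mean.
means = [sum(coin_signal(1000, rng)) / 1000 for _ in range(5)]
print(means)
```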

The Histogram, Pmf and Pdf
The histogram displays the number of samples in a signal that have each of the possible values. It is represented by H_i, where i is an index that runs from 0 to M−1 and M is the number of possible values each sample can take. For example, H_50 is the number of samples that have a value of 50. The sum of all values in the histogram equals the number of points in the signal:

$$N = \sum_{i=0}^{M-1} H_i$$

The mean and standard deviation can be calculated from the histogram:

$$\mu = \frac{1}{N} \sum_{i=0}^{M-1} i\,H_i$$

$$\sigma = \sqrt{\frac{1}{N-1} \sum_{i=0}^{M-1} (i - \mu)^2 H_i}$$

This is an efficient way to calculate the mean and standard deviation of very large data sets, such as images, because the statistics are calculated over groups of samples that share a value rather than over individual samples.
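A minimal sketch of histogram-based statistics, assuming integer samples in the range 0 to M−1. The function names and the short test signal are illustrative only. Each histogram bin contributes its value i weighted by its count H_i, so the loops run over M bins rather than N samples.

```python
import math

def histogram(signal, m):
    """Count how many samples take each value 0..m-1."""
    h = [0] * m
    for s in signal:
        h[s] += 1
    return h

def stats_from_histogram(h):
    """Mean and standard deviation (N-1 divisor) computed from the bins."""
    n = sum(h)  # total number of samples
    mu = sum(i * hi for i, hi in enumerate(h)) / n
    var = sum((i - mu) ** 2 * hi for i, hi in enumerate(h)) / (n - 1)
    return mu, math.sqrt(var)

signal = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, M = 10 possible values
h = histogram(signal, 10)
mu, sigma = stats_from_histogram(h)
print(h, mu, sigma)
```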

Probability mass function (pmf)
Important: the acquired signal is a noisy version of the underlying process. The histogram is formed from an acquired signal and is calculated using a finite number of samples. Its counterpart for the underlying process is the probability mass function (pmf), which is what would be obtained with an infinite number of samples. The pmf can be estimated (inferred) from the histogram, or it may be deduced by some mathematical technique. To estimate it, normalize the histogram by dividing each value by the total number of samples. The pmf is important because it describes the probability that a certain value will be generated. It applies to discrete data only.
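The normalization step can be sketched as follows. The histogram values are made up, and `estimate_pmf` is a hypothetical name; the result is only an estimate of the pmf, since it comes from a finite number of samples.

```python
def estimate_pmf(h):
    """Estimate the pmf by dividing each histogram count by the sample total."""
    n = sum(h)
    return [hi / n for hi in h]

h = [0, 0, 1, 0, 3, 2, 0, 1, 0, 1]  # made-up histogram of an 8-sample signal
pmf = estimate_pmf(h)
print(pmf[4])    # estimated probability that a sample has the value 4
print(sum(pmf))  # the pmf values sum to one (up to round-off)
```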

Probability density function (pdf)
The pdf is to continuous signals what the pmf is to discrete signals. It is also called the probability distribution function. How can we calculate a probability from it? The vertical axis of the pdf is in units of probability density, rather than just probability. For example, a pdf value of 0.03 at 120.5 does not mean that a voltage of 120.5 millivolts will occur 3% of the time. The probability of exactly 120.5 millivolts occurring is vanishingly small, because the signal can take an infinite number of values near it: 120.49997, 120.49998, 120.49999, etc. A probability is obtained by multiplying the density by a range of values.
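A worked version of this calculation, under the assumption (made here only for illustration) that the density stays close to 0.03 per millivolt over the whole interval from 120 to 121 millivolts:

```latex
P(a \le x \le b) = \int_a^b p(x)\,dx \approx p(x_0)\,(b - a)
\qquad\Rightarrow\qquad
P(120 \le x \le 121) \approx 0.03 \times (121 - 120) = 0.03
```

So the signal is expected to lie between 120 and 121 millivolts about 3% of the time. If the density varies noticeably across the interval, the integral has to be evaluated instead of the simple product.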

To remember
The histogram, pmf, and pdf are very similar concepts. Try not to confuse them.
- The total area under the pdf curve, the integral from −∞ to +∞, is always equal to one.
- The sum of all of the pmf values is equal to one.
- The sum of all of the histogram values is equal to N.

[Figure: examples of continuous waveforms and their pdfs]

The Normal Distribution
Signals formed from random processes usually have a bell-shaped pdf. This is called a normal distribution, a Gauss distribution, or a Gaussian, after the German mathematician Karl F. Gauss (1777-1855). The basic shape of the curve is generated by:

$$y(x) = e^{-x^2}$$

This raw curve can be converted into the complete Gaussian by adding an adjustable mean, µ, and standard deviation, σ. The equation must also be normalized so that the total area under the curve is equal to one. This gives the general form of the normal distribution:

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$$
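The general form of the normal distribution translates directly into code. This is a sketch; the function name `gaussian_pdf` and the example parameters are assumptions. The last lines check the normalization numerically by summing fine samples of the curve.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal pdf with mean mu and standard deviation sigma."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

peak = gaussian_pdf(0.0)  # height of the standard normal curve at its mean

# Rough area check for mu = 3, sigma = 2 over mu - 10*sigma .. mu + 10*sigma.
step = 0.01
area = sum(gaussian_pdf(-17.0 + k * step, 3.0, 2.0) for k in range(4001)) * step
print(peak, area)
```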

Cumulative distribution function (cdf)
Integrating the pdf gives the probability that a signal will be within a certain range of values. The integral of the pdf is called the cumulative distribution function (cdf), Φ(x). The integral of the Gaussian is calculated by numerical integration: the continuous Gaussian curve is sampled very finely, from −10σ to +10σ, and the samples of the resulting discrete signal are added to simulate integration.
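The numerical integration can be sketched as below for the standard normal case (mean 0, standard deviation 1). The step size of 0.001 and the use of midpoint samples are choices made for this example, not something the slides specify.

```python
import math

def gaussian_pdf(x):
    """Standard normal pdf (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def phi(x, step=0.001, lower=-10.0):
    """Approximate the cdf at x by adding fine samples of the pdf from lower to x."""
    n = int(round((x - lower) / step))
    # Each midpoint sample represents a strip of width `step` under the curve.
    return sum(gaussian_pdf(lower + (k + 0.5) * step) for k in range(n)) * step

# Probability that a normally distributed value lies within one sigma of the mean.
p_within_one_sigma = phi(1.0) - phi(-1.0)
print(phi(0.0), p_within_one_sigma)
```

The probability of falling in a range is the difference of two cdf values, which is why a table of Φ(x) is enough to answer any range question.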

[Figure: Φ(x), the cumulative distribution function of the normal distribution (mean = 0, standard deviation = 1)]

Digital noise generation
Random noise is an important topic in both electronics and DSP. For example, it limits how small a signal an instrument can measure, the distance a radio system can communicate, and how much radiation is required to produce an x-ray image. A common need in DSP is to generate signals that resemble various types of random noise. This is required to test the performance of algorithms that must work in the presence of noise.

Random number generator
The heart of digital noise generation is the random number generator. Most programming languages have this as a standard function. Each random number has a value between zero and one, with an equal probability of being anywhere between these two extremes. For the underlying process that generates this signal, the mean is 0.5, the standard deviation is 1/√12 ≈ 0.29, and the distribution is uniform between zero and one.
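These process values can be compared with the statistics of an acquired signal. The sketch below uses Python's standard `random` module with an arbitrary seed; the sample statistics come out close to 0.5 and 1/√12, but not exactly equal to them.

```python
import math
import random

rng = random.Random(1)  # arbitrary seed, only for repeatability
samples = [rng.random() for _ in range(100_000)]  # uniform in [0, 1)

n = len(samples)
mean = sum(samples) / n
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
print(mean, std, 1.0 / math.sqrt(12.0))
```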

Digital noise generation (Gaussian)
There are two methods for generating normally distributed signals using a random number generator. The first starts from a signal obtained by adding two random numbers to form each sample, i.e., X = RND + RND. Since each of the random numbers can run from zero to one, the sum can run from zero to two. The mean is now one, and the standard deviation is 1/√6. The pdf has changed from a uniform distribution to a triangular distribution: the signal spends more of its time around a value of one, with less time spent near zero or two.
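A quick simulation of X = RND + RND is sketched here. The helper names, the seed, and the two comparison intervals are assumptions for illustration. Counting samples in an interval around one and in an equally wide interval near zero shows the triangular shape.

```python
import math
import random

def sum_of_uniforms(k, n, rng):
    """Return n samples, each the sum of k uniform random numbers."""
    return [sum(rng.random() for _ in range(k)) for _ in range(n)]

def sample_std(x):
    mu = sum(x) / len(x)
    return math.sqrt(sum((xi - mu) ** 2 for xi in x) / (len(x) - 1))

rng = random.Random(3)  # arbitrary seed
x = sum_of_uniforms(2, 50_000, rng)
mean = sum(x) / len(x)
std = sample_std(x)

# Two intervals of equal width 0.5: one centered on 1, one at the low end.
near_one = sum(1 for v in x if 0.75 <= v < 1.25)
near_zero = sum(1 for v in x if v < 0.5)
print(mean, std, near_one, near_zero)
```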

Digital noise generation (Gaussian), continued
Taking this idea a step further, add twelve random numbers to produce each sample. The mean is now six and the standard deviation is one. The pdf has virtually become a Gaussian. This procedure can be used to create a normally distributed noise signal with an arbitrary mean and standard deviation. For each sample in the signal:
1) add twelve random numbers
2) subtract six to make the mean equal to zero
3) multiply by the desired standard deviation
4) add the desired mean
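The four steps can be sketched as follows. The function name `gaussian_noise`, the seed, and the target mean and standard deviation are illustrative assumptions. Note that the result is only approximately Gaussian: no sample can fall more than six standard deviations from the mean.

```python
import math
import random

def gaussian_noise(n, mean, std, rng):
    """Generate n approximately normal samples by summing twelve uniforms."""
    out = []
    for _ in range(n):
        s = sum(rng.random() for _ in range(12))  # 1) add twelve random numbers
        s -= 6.0                                  # 2) subtract six: mean becomes zero
        s *= std                                  # 3) scale to the desired std. dev.
        out.append(s + mean)                      # 4) add the desired mean
    return out

rng = random.Random(12)  # arbitrary seed
noise = gaussian_noise(50_000, mean=3.0, std=2.0, rng=rng)
m = sum(noise) / len(noise)
s = math.sqrt(sum((v - m) ** 2 for v in noise) / (len(noise) - 1))
print(m, s)
```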

Central Limit Theorem
The mathematical basis for this algorithm is the Central Limit Theorem, one of the most important concepts in probability. In its simplest form, the Central Limit Theorem states that a sum of random numbers becomes normally distributed as more and more of the random numbers are added together. The theorem does not require the individual random numbers to be from any particular distribution, or even that they be from the same distribution. The Central Limit Theorem is the reason why normally distributed signals are seen so widely in nature: whenever many different random forces are interacting, the resulting pdf becomes a Gaussian.

Digital noise generation, 2nd method
A random number generator is invoked twice, to obtain R1 and R2. A normally distributed random number, X, can then be found from:

$$X = (-2 \ln R_1)^{1/2} \cos(2\pi R_2)$$

Just as before, this approach can generate normally distributed random signals with an arbitrary mean and standard deviation: take each number generated by this equation, multiply it by the desired standard deviation, and add the desired mean.
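This second method is commonly known as the Box-Muller transform. The sketch below assumes that form, X = sqrt(−2 ln R1) · cos(2π R2); the function name and seed are illustrative. R1 is shifted into the interval (0, 1] so that the logarithm is never taken of zero.

```python
import math
import random

def normal_sample(rng, mean=0.0, std=1.0):
    """One normally distributed number from two uniform random numbers."""
    r1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    r2 = rng.random()
    x = math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)
    return x * std + mean    # scale and shift to the desired std. dev. and mean

rng = random.Random(35)  # arbitrary seed
signal = [normal_sample(rng) for _ in range(50_000)]
m = sum(signal) / len(signal)
s = math.sqrt(sum((v - m) ** 2 for v in signal) / (len(signal) - 1))
print(m, s)
```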

Random number generators
Random number generators operate by starting with a seed, a number between zero and one. When the random number generator is invoked, the seed is passed through a fixed algorithm, resulting in a new number between zero and one. This new number is reported as the random number and is then stored internally to be used as the seed the next time the random number generator is called. The algorithm that transforms the seed into the new random number is often of the form:

$$R = (aS + b) \bmod c$$

where S is the seed, R is the new random number, and a, b, and c are appropriately chosen constants.
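A common algorithm of this kind is the linear congruential generator. The sketch below is an assumption about the form meant here, not the generator of any particular language. It keeps an integer state and divides by the modulus to report a number in [0, 1); the constants are the widely published "Numerical Recipes" values, used purely as an example.

```python
class LinearCongruential:
    """Toy generator: new_state = (a * state + b) mod c (illustrative only)."""

    def __init__(self, seed, a=1664525, b=1013904223, c=2 ** 32):
        self.a, self.b, self.c = a, b, c
        self.state = seed % c

    def next(self):
        # The new state doubles as the seed for the following call.
        self.state = (self.a * self.state + self.b) % self.c
        return self.state / self.c  # scaled to the range [0, 1)

gen = LinearCongruential(seed=12345)
values = [gen.next() for _ in range(5)]
print(values)
```

Because the algorithm is fixed, the same seed always reproduces the same sequence, which is why such numbers are called pseudo-random.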

Precision and Accuracy
Precision and accuracy are terms used to describe systems and methods that measure, estimate, or predict. In all these cases, we wish to know the value of some parameter. This is called the true value, or simply, truth. The method provides a measured value that we want to be as close to the true value as possible. Precision and accuracy are ways of describing the error that can exist between these two values.

Consider an oceanographer measuring water depth using a sonar system and taking many successive readings. The mean occurs at the center of the distribution and is the best estimate of the depth based on all of the measured data. The standard deviation defines the width of the distribution, i.e., how much variation occurs between successive measurements. Good accuracy with poor precision means poor repeatability: the readings are centered on the true value but scatter widely.

Precision is a measure of random noise. Accuracy is a measure of calibration: poor accuracy results from systematic errors such as bad calibration (e.g., in the sonar system, how is time converted to distance?). When deciding which name to call the problem, ask yourself two questions.
First: will averaging successive readings provide a better measurement? If yes, call the error precision; if no, call it accuracy.
Second: will calibration correct the error? If yes, call it accuracy; if no, call it precision.
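The two questions can be illustrated with a small simulation. This is only a sketch: the true depth, the systematic offset, and the noise level are made-up values, and the random error is assumed to be Gaussian. Averaging shrinks the random error (precision) but leaves the offset untouched; subtracting a known offset (calibration) fixes the accuracy.

```python
import random

def sonar_readings(true_depth, offset, noise_std, n, rng):
    """Simulated readings: truth + systematic offset + random noise."""
    return [true_depth + offset + rng.gauss(0.0, noise_std) for _ in range(n)]

rng = random.Random(39)  # arbitrary seed
true_depth = 1000.0      # hypothetical true value, in meters
offset = 5.0             # hypothetical calibration error
readings = sonar_readings(true_depth, offset, noise_std=10.0, n=10_000, rng=rng)

average = sum(readings) / len(readings)
calibrated = average - offset  # calibration removes the systematic error

print(average - true_depth)     # stays near the offset despite averaging
print(calibrated - true_depth)  # near zero after calibration
```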