Analysis of Data

Chemistry 838
Thomas V. Atkinson, Ph.D.
Senior Academic Specialist
Department of Chemistry
Michigan State University
East Lansing, MI 48824
Saturday, January 29, 2005

TABLE OF CONTENTS
1. Improving the Quality of the Data
1.1. Improve the Experiment
1.2. During Acquisition
1.2.1. Averaging the Data as Acquired
1.2.2. Flying Average Example
1.2.3. Averaging Scans
1.3. After Acquisition
1.3.1. Smoothing
1.3.2. Curve Fitting
2. Acquisition
3. Aliasing
4. Fast Fourier Transforms
4.1. FFT - General Examples
4.2. Filtering with the FFT
4.3. Filtering with the FFT - Example

TABLE OF TABLES
Table 1 - Lorentzian Parameters

TABLE OF FIGURES
Figure 1 - Flying Average Stepped x
Figure 2 - Flying Average Stepped x - Time Course of the Acquisition
Figure 3 - Pure Sine Wave
Figure 4 - Sine plus Noise, m=1
Figure 5 - m=2
Figure 6 - m=4
Figure 7 - m=8
Figure 8 - Flying Average Scanned x
Figure 9 - Averaging Successive Scans
Figure 10 - Smoothing - By Averaging
Figure 11 - Smoothing Savitzky-Golay - Block Diagram
Figure 12 - LabVIEW Help - Savitzky-Golay Filter
Figure 13 - LabVIEW Help - Index Array
Figure 14 - LabVIEW Help - Array Size
Figure 15 - Smoothing Savitzky-Golay - Order = 4, Number Points = 3
Figure 16 - Smoothing Savitzky-Golay - Order = 4, Number Points = 8
Figure 17 - Smoothing Savitzky-Golay - Order = 4, Number Points = 4
Figure 18 - NLLS - Grid Search of Parameter Space
Figure 19 - NLLS - Binary Search of Parameter Space
Figure 20 - NLLS - Adaptive Search of Parameter Space
Figure 21 - General FFT - Block Diagram
Figure 22 - General FFT - Sine Wave
Figure 23 - General FFT - Saw Tooth
Figure 24 - General FFT - Triangle Wave
Figure 25 - General FFT - Square Wave
Figure 26 - SineWaveSubVI Block Diagram
Figure 27 - Filter with FFT Block Diagram Panel 1
Figure 28 - Filter with FFT Block Diagram Panel 2
Figure 29 - Filter with FFT Block Diagram Panel 3
Figure 30 - Filter with FFT Front Panel 1
Figure 31 - Filter with FFT Front Panel 2

1. Improving the Quality of the Data

Experimentation is the observation of a real system with the intent of understanding the nature of that system. The observation yields a set of data that, hopefully, embodies the nature of the system being studied. Analysis of these data sets has one major goal: extracting the values of the parameters of interest. This is true whether the system is a human being whose health is assessed by measuring the concentration of a given species in a blood sample, or the sun, whose nature is studied by making many observations of whatever parameters are accessible across the millions of miles separating us from that star.
Very often, the understanding of a system is couched in a mathematical model of the system, so analysis of the data often consists of determining the values of the model's parameters from the data sets. For example, the sine wave, often used in communication systems, is completely described by its frequency, phase, and amplitude. A spectrum is often a collection of lines, each of which can be modeled by a function; typically, a line is completely characterized by its position, its amplitude, and its width or, more exactly, its line shape. The central task is therefore to extract the values of these parameters from the data set, with the implicit subgoal of making the measurements as accurate and precise as possible. One common issue in experimentation is improving the quality of the data, typically by improving the signal-to-noise ratio. This document gives a brief, general introduction to many of the techniques for improving the quality of the measurements and for analyzing the data.

1.1. Improve the Experiment

One obvious technique is to improve the experiment itself. Another is to improve the instrumentation, e.g. the transducers and domain converters, used to observe the system and acquire the data.

1.2. During Acquisition

1.2.1. Averaging the Data as Acquired

A common technique is to average values as they are being acquired. For each point to be stored, a group of m points is acquired at an interval Δt, averaged, and stored. For random noise, averaging a group of values tends to average out the noise; for uncorrelated noise, the standard deviation of the average falls as 1/√m. In many cases the independent variable can be stepped. In such cases, a number of values are acquired at a constant value of the independent variable, x_i, then averaged, then stored:

$$y_{\text{stored}}(x_i) = \frac{1}{m}\sum_{n=0}^{m-1} y(x_i,\, t_i + n\,\Delta t)$$

Figure 1 and Figure 2 illustrate this strategy. The prerequisite of this technique is that the dependent variable is not a function of time, i.e. you can stop at a given value of x for as long as you wish. In this example, three samples are made at each value of x, and their average is stored as the y value for that x. Then the value of x is incremented and the process repeated. Hopefully, the resulting set of points will appropriately represent the curve y(x).

Figure 1 - Flying Average Stepped x

Figure 2 - Flying Average Stepped x - Time Course of the Acquisition

1.2.2. Flying Average Example

The following figures illustrate the use of averaging. The signal is synthesized as follows: each point y(x_i) is the average of m values of the sine wave at the point x_i, each with a random number, rand(), added:

$$y(x_i) = \frac{1}{m}\sum_{n=0}^{m-1}\left(A\sin(x_i) + \mathrm{rand}()_n\right)$$

Each point shown is the average of m measurements. In these cases, the signal being observed is constant over the time required to make the m measurements of that value.

Figure 3 - Pure Sine Wave
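The flying-average example can be reproduced numerically. The sketch below is a minimal Python simulation, not the author's program: the instrument is replaced by a hypothetical `make_noisy_sine` function that returns A sin(x) plus uniform random noise, and the number of points (350), the amplitude (5), and the noise level are illustrative assumptions. For uncorrelated noise, the RMS error of the stored points should fall roughly as 1/√m.

```python
import math
import random

def flying_average_stepped(xs, measure, m):
    """Hold x fixed, take m readings, and store their mean (one stored value per x)."""
    return [sum(measure(x) for _ in range(m)) / m for x in xs]

def make_noisy_sine(amplitude, noise, seed):
    """Hypothetical instrument: A*sin(x) plus uniform noise drawn from [-noise, noise]."""
    rng = random.Random(seed)
    return lambda x: amplitude * math.sin(x) + rng.uniform(-noise, noise)

def rms_error(xs, ys, amplitude):
    """Root-mean-square deviation of the stored points from the noise-free sine."""
    return math.sqrt(sum((y - amplitude * math.sin(x)) ** 2 for x, y in zip(xs, ys)) / len(xs))

A = 5.0
xs = [2 * math.pi * i / 350 for i in range(350)]  # one period sampled at 350 stepped x values
errors = {}
for m in (1, 2, 4, 8):
    ys = flying_average_stepped(xs, make_noisy_sine(A, noise=3.0, seed=m), m)
    errors[m] = rms_error(xs, ys, A)
    print(f"m = {m}: RMS error = {errors[m]:.3f}")
```

Doubling m should shrink the printed error by roughly a factor of √2, which is the trend Figures 4 through 7 show graphically.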

Figure 4 - Sine plus Noise, m=1

Figure 5 - m=2 (Average 2 Samples per Point)

Figure 6 - m=4 (Average 4 Samples per Point)

Figure 7 - m=8 (Average 8 Samples per Point)

A variation of the above technique is to acquire and average the values of the dependent variable while the independent variable is being scanned over a range of values, e.g. a range of wavelengths. For each point to be stored, a group of m points is acquired at an acquisition interval Δx, averaged, and stored.

Figure 8 illustrates this strategy.

$$y_{\text{stored}}(x_i) = \frac{1}{m}\sum_{n=0}^{m-1} y(x_i + n\,\Delta x)$$

Figure 8 - Flying Average Scanned x (legend: Sample y = f(x); Data Points; Average Value for This Set)

1.2.3. Averaging Scans

Another technique is to average sets (scans) of values. The corresponding points of each scan are averaged and stored in the final set, where y(x_i, n) is the value at x_i in scan n:

$$y_{\text{stored}}(x_i) = \frac{1}{m}\sum_{n=0}^{m-1} y(x_i, n)$$
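Both averaging schemes of this subsection are simple to express in code. The following Python sketch (function names are hypothetical, and the scanned-x version assumes non-overlapping blocks of m consecutive raw readings) computes a flying average over a scanned x and the point-by-point average of several scans.

```python
def flying_average_scanned(raw, m):
    """Scanned x: average each consecutive, non-overlapping block of m raw readings.

    Raw readings left over at the end that do not fill a whole block are discarded.
    """
    return [sum(raw[i:i + m]) / m for i in range(0, len(raw) - m + 1, m)]

def average_scans(scans):
    """Average successive scans point by point; every scan must share the same x axis."""
    if len({len(scan) for scan in scans}) != 1:
        raise ValueError("all scans must have the same number of points")
    return [sum(column) / len(scans) for column in zip(*scans)]

print(flying_average_scanned([1, 3, 2, 4, 10, 12, 7], 2))  # [2.0, 3.0, 11.0]
print(average_scans([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]))   # [2.0, 2.0, 2.0]
```

The equal-length check is only a weak proxy for correct alignment: the code cannot tell whether point k of one scan was really taken at the same x as point k of another, which is why precise time bases and triggering matter.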

Figure 9 - Averaging Successive Scans

These two averaging techniques are equivalent as long as the signal being acquired is effectively constant over the interval needed to acquire the m samples for each point of the flying average. Averaging successive scans emphasizes the need for precise time bases and triggering. Otherwise, corresponding data points in the various scans will not have the same horizontal coordinate, and averaging them will introduce horizontal noise into the data set.

1.3. After Acquisition

1.3.1. Smoothing

Another technique is to smooth the values after acquisition. A new data set is created in which each point is some function of the 2m+1 points surrounding the corresponding point in the original set. Many different smoothing functions have been used.

$$y_{\text{smoothed}}(i) = f\big(y(i-m), \ldots, y(i-1), y(i), y(i+1), \ldots, y(i+m)\big)$$

1.3.1.1. Data Set for Smoothing Examples

A few examples of smoothing a set of data will be presented. These examples use a synthetic data set consisting of the sum of two Lorentzian lines plus random noise. The Lorentzian line shape is given in Equation 1. The two Lorentzian curves at the bottom of Figure 10 were generated using the two sets of parameters in Table 1. These two Lorentzians were then summed, yielding the third curve from the bottom of Figure 10. Random noise was then added to the composite curve, yielding the fourth curve from the bottom of Figure 10.

$$y(x) = \frac{a_1}{1 + \left(\dfrac{x - a_2}{a_3}\right)^2} \qquad \text{(Equation 1)}$$

Table 1 - Lorentzian Parameters

Symbol   Definition           Lorentzian 1   Lorentzian 2
a1       Amplitude            …              …
a2       Position of center   …              …
a3       Width                …              …

Figure 10 - Smoothing - By Averaging (traces, bottom to top: Lorentzian 1; Lorentzian 2; Sum of Lorentzians; Sum plus Noise; then moving-average smooths of increasing width, the narrowest being the 5 Point Smooth)
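A synthetic data set like the one in Figure 10 can be generated as below. This is a sketch under stated assumptions: it takes Equation 1 to have the common form a1 / (1 + ((x - a2)/a3)^2), in which a3 is the half-width at half-maximum, and the parameter values (`LINE_1`, `LINE_2`) and the Gaussian noise level are hypothetical placeholders, not the values of Table 1.

```python
import random

def lorentzian(x, a1, a2, a3):
    """Lorentzian line: amplitude a1, center a2, half-width at half-maximum a3."""
    return a1 / (1.0 + ((x - a2) / a3) ** 2)

def two_lorentzians_plus_noise(xs, line1, line2, noise_sd, seed=0):
    """Return (sum of the two lines, the same sum with Gaussian noise added)."""
    rng = random.Random(seed)
    clean = [lorentzian(x, *line1) + lorentzian(x, *line2) for x in xs]
    noisy = [y + rng.gauss(0.0, noise_sd) for y in clean]
    return clean, noisy

xs = list(range(500))
LINE_1 = (1.0, 200.0, 15.0)   # hypothetical (a1, a2, a3)
LINE_2 = (0.7, 260.0, 25.0)   # hypothetical (a1, a2, a3)
clean, noisy = two_lorentzians_plus_noise(xs, LINE_1, LINE_2, noise_sd=0.05)
```

Keeping the noise-free `clean` list alongside `noisy` makes it possible to measure how far any smoother moves the data away from the true curve.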

1.3.1.2. Moving Average

The first example uses a simple average of adjacent points:

$$y_{\text{smoothed}}(i) = \frac{y(i-m) + \cdots + y(i-1) + y(i) + y(i+1) + \cdots + y(i+m)}{2m+1} \qquad \text{(Equation 2)}$$

Applying Equation 2 with a 5-point window and two wider windows to the noisy composite of two Lorentzians in Figure 10 yields the upper three traces of Figure 10. Notice that the smoothed data become more distorted as the number of points in the moving average increases: the apparent positions and widths of the lines depart from the actual values.

1.3.1.3. Savitzky-Golay Polynomial Smooth

The Savitzky-Golay smooth uses a polynomial as the smoothing function. For each neighborhood (window) of points, a polynomial is fitted by least squares, and its value at the center of the window becomes the smoothed point.

Figure 11 - Smoothing Savitzky-Golay - Block Diagram

Figure 12 - LabVIEW Help - Savitzky-Golay Filter

Figure 13 - LabVIEW Help - Index Array

Figure 14 - LabVIEW Help - Array Size

Figure 15 - Smoothing Savitzky-Golay - Order = 4, Number Points = 3

Figure 16 - Smoothing Savitzky-Golay - Order = 4, Number Points = 8

Figure 17 - Smoothing Savitzky-Golay - Order = 4, Number Points = 4
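The contrast between the two smoothers can be checked numerically. The sketch below is a from-scratch Python illustration, not LabVIEW's Savitzky-Golay Filter VI: it derives the Savitzky-Golay weights by solving the least-squares normal equations for a polynomial of the given order over a window of 2·half_width + 1 points, applies each smoother as a weighted sum over the window, and copies points too close to the ends unchanged (an assumption; real implementations handle the edges in various ways). The test peak and window sizes are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def savgol_weights(half_width, order):
    """Weights that return the value, at the window center, of a least-squares polynomial fit."""
    if 2 * half_width + 1 <= order:
        raise ValueError("window must contain more points than the polynomial order")
    offsets = range(-half_width, half_width + 1)
    # Normal-equation matrix A^T A, where A[j][k] = j**k for window offset j and power k.
    ata = [[sum(j ** (r + c) for j in offsets) for c in range(order + 1)] for r in range(order + 1)]
    z = solve(ata, [1.0] + [0.0] * order)
    return [sum(z[k] * j ** k for k in range(order + 1)) for j in offsets]

def moving_average_weights(half_width):
    """Equal weights over 2*half_width + 1 points, as in Equation 2."""
    n = 2 * half_width + 1
    return [1.0 / n] * n

def smooth(y, weights):
    """Weighted sum over each full window; points near the ends are copied unchanged."""
    half = len(weights) // 2
    out = list(y)
    for i in range(half, len(y) - half):
        out[i] = sum(w * y[i - half + j] for j, w in enumerate(weights))
    return out

# A single noise-free Lorentzian (hypothetical: height 1 at x = 100, half-width 5).
peak = [1.0 / (1.0 + ((x - 100) / 5.0) ** 2) for x in range(201)]
ma_peak = smooth(peak, moving_average_weights(5))[100]
sg_peak = smooth(peak, savgol_weights(5, 4))[100]
print(f"11-point moving average peak height: {ma_peak:.3f}")
print(f"11-point Savitzky-Golay (order 4) peak height: {sg_peak:.3f}")
```

With equal window widths the moving average flattens the peak noticeably, while the order-4 polynomial follows the peak shape much more closely. Savitzky-Golay still distorts lines when the window is wide compared with the line width.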

Notice the distortion, e.g. the shift of peak positions, introduced in Figure 17. This is a danger of smoothing.

1.3.2. Curve Fitting

In this case, a model of the form below is believed to describe the real behavior of the system being studied. In the model, y is the dependent variable, x_i is a set of m independent variables, and a_k is a set of n constant parameters:

$$y = f(x_i, a_k), \qquad i = 1, \ldots, m, \quad k = 1, \ldots, n$$

A set of experimental measurements is made, yielding a set of data points (y_j, x_{ij}), where j indexes the measurements. Once the data are acquired, the objective is to find the best set of a_k, i.e. the values that make the model fit the experimental data most closely. This is usually done by minimizing the sum of the squares of the residuals:

$$S = \sum_j \left(y_{\text{model},j} - y_{\text{measured},j}\right)^2$$

1.3.2.1. Linear Least Squares

If the model is a linear function of the a_k, then the solution is straightforward: setting the derivative of S with respect to each a_k to zero gives a set of linear equations (the normal equations) that can be solved directly. The following is one example of such a model that yields to the linear least squares method:

$$y = a_0 + a_1 x + a_2 x^2$$

1.3.2.2. Non-Linear Least Squares

If the model is a nonlinear function of the a_k, then in general there is no analytical solution. In this case, a search of the space of the parameters a_k is required. A set of values is picked for the a_k and the sum of the squares calculated. A second set of values for the a_k is picked and the sum of the squares calculated. A third set is chosen, ideally in a direction that decreases the sum of the squares. This iterative process is continued until the results converge on a minimum value of the sum of the squares, and the final set of a_k is taken as the answer. Unfortunately, the process sometimes diverges and no result is reached. Many strategies exist for picking the next set of a_k to try; the Gauss-Newton method is one of the oldest. In essence, you are searching the a_k space for the best set of values for the parameters. The following is one example of a model that requires the nonlinear least squares method:
$$y = a_0 e^{-a_1 x} + a_2 e^{-a_3 x}$$

X-ray crystallography and kinetics are but two areas where such nonlinear models are often encountered.

1.3.2.2.1. Grid Search

A few simple strategies for searching these nonlinear parameter spaces will be presented. An arbitrary example is chosen that has only one parameter to fit, so the parameter space is one-dimensional, but the sum of the squares is a complicated function of that parameter:

$$y = f(x, a)$$

Figure 18 shows the relationship between the sum of the squares and a, the parameter to be determined. Figure 18 also illustrates the first strategy, a simple grid search. The sum of the squares is calculated at each of a set of equally spaced values of a, and the value yielding the smallest sum of the squares is taken as the answer. In this case, a_4 would be the answer. Notice that a_4 might not be at the true bottom of that well, and that the narrow feature ξ was missed entirely because it falls between grid points.

Figure 18 - NLLS - Grid Search of Parameter Space

1.3.2.2.2. Binary Search

Figure 19 illustrates a more efficient mode of searching. In this case, the parameter space is repeatedly divided in half, and the search continues in the half that appears to contain the minimum.

Figure 19 - NLLS - Binary Search of Parameter Space
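The grid search and an interval-halving version of the binary search can be sketched for a one-parameter problem. Everything here is a hypothetical stand-in for the figures: the model f(x, a) = exp(-a x), noise-free data generated with a = 0.7, and the search range [0, 2]. The halving routine is a dichotomous search, one plausible reading of "divide the parameter space in half"; it assumes S(a) has a single well, and neither method can find a narrow feature such as ξ that it never samples.

```python
import math

XS = [0.5 * k for k in range(11)]
YS = [math.exp(-0.7 * x) for x in XS]   # hypothetical noise-free data, true a = 0.7

def sum_of_squares(a):
    """S(a) for the hypothetical one-parameter model f(x, a) = exp(-a*x)."""
    return sum((y - math.exp(-a * x)) ** 2 for x, y in zip(XS, YS))

def grid_search(func, lo, hi, steps):
    """Evaluate func at steps+1 equally spaced points and return the best one."""
    candidates = [lo + k * (hi - lo) / steps for k in range(steps + 1)]
    return min(candidates, key=func)

def halving_search(func, lo, hi, tol=1e-6):
    """Dichotomous search: compare func just either side of the midpoint, keep the lower half."""
    eps = tol / 4
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid - eps) < func(mid + eps):
            hi = mid + eps   # for a single-well func, the minimum is not right of mid + eps
        else:
            lo = mid - eps   # for a single-well func, the minimum is not left of mid - eps
    return (lo + hi) / 2

print(grid_search(sum_of_squares, 0.0, 2.0, 20))   # best of 21 grid points
print(halving_search(sum_of_squares, 0.0, 2.0))    # converges to within tol of the minimum
```

The grid search can only be as precise as its spacing, while each halving step shrinks the interval by about half at the cost of two function evaluations.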

1.3.2.2.3. Adaptive Search

Figure 20 shows a better approach, one in which the step size is modified as you near the minimum. Writing S(a) for the sum of the squares, the next value of the parameter is given by the following, where K is a positive constant:

$$a_{i+1} = a_i - K\,\frac{S(a_i) - S(a_{i-1})}{a_i - a_{i-1}}, \qquad S(a) = \sum_j \left[y_j - f(x_j, a)\right]^2$$

Notice that the next step is a function of the slope between the last two points and will change sign, i.e. move in the opposite direction, if you start going uphill. In addition, as the slope decreases, the size of the step decreases as well.

Figure 20 - NLLS - Adaptive Search of Parameter Space (two searches from different starting points; sum of the squares versus a)

A problem is that you can find false minima, i.e. you zero in on a local minimum rather than the global one. In the last example, two different minima are found depending on where you start the process, and both miss the true minimum, which lies in the feature ξ. Don't forget the old curve-fitting adage: given enough parameters, you can fit an elephant. Always make sure that your fit makes sense, even if the fit seems to be very good. These concepts can be extended to parameter spaces of any number of dimensions, and a great deal of work has been done on these techniques.
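The adaptive step rule can be tried on a hypothetical one-parameter problem (model exp(-a x), noise-free data generated with a = 0.7). The step constant K = 0.2, the two starting values, the tolerance, and the iteration cap are assumptions chosen for this example; too large a K makes the iteration overshoot and oscillate. Because this S(a) has a single well, the sketch cannot show the false-minimum problem; with a multi-well S, different starting pairs can converge to different minima.

```python
import math

XS = [0.5 * k for k in range(11)]
YS = [math.exp(-0.7 * x) for x in XS]   # hypothetical noise-free data, true a = 0.7

def sum_of_squares(a):
    """S(a) for the hypothetical model f(x, a) = exp(-a*x)."""
    return sum((y - math.exp(-a * x)) ** 2 for x, y in zip(XS, YS))

def adaptive_search(S, a_prev, a_curr, K=0.2, tol=1e-8, max_iter=500):
    """Step against the secant slope of S; the step shrinks as the slope flattens."""
    for _ in range(max_iter):
        if a_curr == a_prev:           # no separation left to estimate a slope from
            break
        slope = (S(a_curr) - S(a_prev)) / (a_curr - a_prev)
        a_prev, a_curr = a_curr, a_curr - K * slope
        if abs(a_curr - a_prev) < tol:  # the last step was negligibly small
            break
    return a_curr

print(adaptive_search(sum_of_squares, 1.2, 1.1))
```

Starting from a = 1.2 and 1.1, the slope is positive, so the steps carry a downward toward 0.7 and shrink as S flattens near the bottom of the well.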

2. Acquisition

[Plot: Sine Wave and Samples - Amplitude vs. Time; legend: Sine Wave, Samples]

[Plot: Samples - Amplitude vs. Time]

3. Aliasing

For a given sampling frequency of f_s samples per unit time and a sine wave with a frequency of f, there is a family of sine waves with frequencies f' given below that yield the same set of acquired points. That is, you cannot distinguish one from another without information beyond the data set.

$$f' = f + k f_s, \qquad k = \text{any integer}$$

Following are four examples, each showing a different alias of the same sine wave; all five sine waves yield the same samples at the same sampling frequency.

[Plot: Sine Wave and Samples - Amplitude vs. Time; legend: Sine Wave, Samples, Alias Sine Wave]

Parameter                                 Value
Period of Sine Wave (Time)                0.00014286
Frequency of Sine Wave (Cycles/Time)      7000
Frequency of Sine Wave (Radians/Time)     43982.29715
Amplitude of Sine Wave                    5
Sampling Frequency (Samples/Time)         6000
Sampling Interval (Time)                  0.00016667
k                                         -1
Frequency of alias (Cycles/Time)          1000
Frequency of alias (Radians/Time)         6283.18531

[Plot: Sine Wave and Samples - Amplitude vs. Time; legend: Sine Wave, Samples, Alias Sine Wave]

Parameter                                 Value
Period of Sine Wave (Time)                0.00014286
Frequency of Sine Wave (Cycles/Time)      7000
Frequency of Sine Wave (Radians/Time)     43982.29715
Amplitude of Sine Wave                    5
Sampling Frequency (Samples/Time)         6000
Sampling Interval (Time)                  0.00016667
k                                         1
Frequency of alias (Cycles/Time)          13000
Frequency of alias (Radians/Time)         81681.40899

[Plot: Sine Wave and Samples - Amplitude vs. Time; legend: Sine Wave, Samples, Alias Sine Wave]

Parameter                                 Value
Period of Sine Wave (Time)                0.00014286
Frequency of Sine Wave (Cycles/Time)      7000
Frequency of Sine Wave (Radians/Time)     43982.29715
Amplitude of Sine Wave                    5
Sampling Frequency (Samples/Time)         6000
Sampling Interval (Time)                  0.00016667
k                                         -2
Frequency of alias (Cycles/Time)          -5000
Frequency of alias (Radians/Time)         -31415.92654

[Plot: Sine Wave and Samples - Amplitude vs. Time; legend: Sine Wave, Samples, Alias Sine Wave]

Parameter                                 Value
Period of Sine Wave (Time)                0.00014286
Frequency of Sine Wave (Cycles/Time)      7000
Frequency of Sine Wave (Radians/Time)     43982.29715
Amplitude of Sine Wave                    5
Sampling Frequency (Samples/Time)         6000
Sampling Interval (Time)                  0.00016667
k                                         2
Frequency of alias (Cycles/Time)          19000
Frequency of alias (Radians/Time)         119380.52084
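The aliasing relation f' = f + k·f_s can be checked directly by sampling. This Python sketch assumes f = 7000 cycles/time, amplitude 5, and f_s = 6000 samples/time, the values implied by the period and sampling interval listed in the examples; the 7500 cycles/time comparison wave is an added, hypothetical non-alias.

```python
import math

def sample_sine(freq, amplitude, fs, n_samples):
    """Sample amplitude*sin(2*pi*freq*t) at the instants t = n / fs."""
    return [amplitude * math.sin(2 * math.pi * freq * n / fs) for n in range(n_samples)]

def max_difference(a, b):
    """Largest absolute difference between corresponding samples."""
    return max(abs(x - y) for x, y in zip(a, b))

f, fs, A, N = 7000.0, 6000.0, 5.0, 60
original = sample_sine(f, A, fs, N)
max_diffs = {}
for k in (-1, 1, -2, 2):
    alias = f + k * fs
    max_diffs[k] = max_difference(original, sample_sine(alias, A, fs, N))
    print(f"k = {k:+d}: alias at {alias:g} cycles/time, {2 * math.pi * alias:.5f} rad/time, "
          f"largest sample difference {max_diffs[k]:.1e}")

# A wave that is not of the form f + k*fs gives visibly different samples.
non_alias_diff = max_difference(original, sample_sine(7500.0, A, fs, N))
print(f"7500 cycles/time: largest sample difference {non_alias_diff:.2f}")
```

The alias differences are at the level of floating-point round-off, so the samples alone cannot say which of these sine waves was present.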

4. Fast Fourier Transforms

Continuous Fourier Transform:

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$

Discrete Fourier Transform for N points (the radix-2 FFT requires N to be a power of 2):

$$X(m) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi m n / N}$$

Inverse Discrete Fourier Transform for N points:

$$x(n) = \frac{1}{N} \sum_{m=0}^{N-1} X(m)\, e^{j 2\pi m n / N}$$

4.1. FFT - General Examples



Figure 21 - General FFT - Block Diagram

Figure 22 - General FFT - Sine Wave

Figure 23 - General FFT - Saw Tooth

Figure 24 - General FFT - Triangle Wave

Figure 25 - General FFT - Square Wave
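The transform pairs above can be turned into code directly. The following Python sketch (standard library only; function names are hypothetical) evaluates the DFT sum as written, implements a recursive radix-2 Cooley-Tukey FFT, which is why N is restricted to a power of 2, and obtains the inverse from the forward transform by complex conjugation. As a check in the spirit of the square-wave example, a 50% duty-cycle square wave should contain only odd harmonics.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) evaluation of X(m) = sum_n x(n) * exp(-j*2*pi*m*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N)) for m in range(N)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("radix-2 FFT needs a power-of-2 length")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for m in range(N // 2):
        t = cmath.exp(-2j * math.pi * m / N) * odd[m]   # twiddle factor times odd half
        out[m] = even[m] + t
        out[m + N // 2] = even[m] - t
    return out

def ifft(X):
    """Inverse DFT via conjugation: x(n) = conj(FFT(conj(X)))(n) / N."""
    N = len(X)
    return [v.conjugate() / N for v in fft([c.conjugate() for c in X])]

# Square wave: 64 samples, period 16 samples (4 cycles), so harmonic h falls in bin 4*h.
square = [1.0 if n % 16 < 8 else -1.0 for n in range(64)]
spectrum = fft(square)
for h in range(1, 6):
    print(f"harmonic {h}: |X({4 * h})| = {abs(spectrum[4 * h]):.3f}")
```

The even harmonics print as essentially zero, and the FFT agrees with the direct DFT sum while needing on the order of N log N rather than N² operations.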

4.2. Filtering with the FFT

[Plot: Two Lorentzians plus Random Noise - Amplitude vs. Time]

[Plot: 60 Hz Sine Wave - Amplitude vs. Time]

[Plot: Signal + 60 Hz and Random Noise - Amplitude vs. Time]

[Plot: FFT - Amplitude vs. Frequency]

[Plot: Low Pass Filter Function - Amplitude vs. Frequency]

[Plot: FFT (Low Pass Filter Applied) - Amplitude vs. Frequency]

[Plot: Inverse FFT - Amplitude vs. Time]

4.3. Filtering with the FFT - Example

Figure 26 - SineWaveSubVI Block Diagram: Sine(Frequency, Time, Phase, Amplitude, Offset), built from the inputs Sine Frequency (converted to Omega), Time, Sine Phase, Sine Amplitude, and Sine Offset

Figure 27 - Filter with FFT Block Diagram Panel 1

Figure 28 - Filter with FFT Block Diagram Panel 2

Figure 29 - Filter with FFT Block Diagram Panel 3

Figure 30 - Filter with FFT Front Panel 1

Figure 31 - Filter with FFT Front Panel 2
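The filtering procedure of Sections 4.2 and 4.3 (transform, multiply by a low-pass filter function, inverse transform) can be sketched in Python as follows. It is not a transcription of the LabVIEW VI. `sine_wave` is a hypothetical analogue of the SineWaveSubVI inputs. The filter is assumed to be an ideal brick-wall low-pass that zeroes every bin above a 30 Hz cutoff (and the mirror-image negative-frequency bins). The line parameters, noise level, amplitude of the 60 Hz interference, and 512 Hz sampling rate are illustrative. A compact recursive radix-2 FFT is included so the block stands alone.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for m in range(N // 2):
        t = cmath.exp(-2j * math.pi * m / N) * odd[m]
        out[m], out[m + N // 2] = even[m] + t, even[m] - t
    return out

def ifft(X):
    """Inverse transform via conjugation: x(n) = conj(FFT(conj(X)))(n) / N."""
    return [v.conjugate() / len(X) for v in fft([c.conjugate() for c in X])]

def sine_wave(frequency, time, phase=0.0, amplitude=1.0, offset=0.0):
    """Hypothetical analogue of SineWaveSubVI: amplitude*sin(2*pi*f*t + phase) + offset."""
    return amplitude * math.sin(2 * math.pi * frequency * time + phase) + offset

def lorentzian(t, a1, a2, a3):
    """Lorentzian line: amplitude a1, center a2, half-width at half-maximum a3."""
    return a1 / (1.0 + ((t - a2) / a3) ** 2)

def low_pass(spectrum, cutoff_bin):
    """Ideal low-pass: keep bins 0..cutoff and their negative-frequency mirrors, zero the rest."""
    N = len(spectrum)
    return [X if m <= cutoff_bin or m >= N - cutoff_bin else 0j for m, X in enumerate(spectrum)]

def rms(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

N, fs = 512, 512.0                      # 1 s of data, so FFT bin m is m Hz
times = [n / fs for n in range(N)]
rng = random.Random(1)
clean = [lorentzian(t, 1.0, 0.3, 0.02) + lorentzian(t, 0.8, 0.6, 0.03) for t in times]
contaminated = [c + sine_wave(60.0, t, amplitude=0.5) + rng.gauss(0.0, 0.1)
                for c, t in zip(clean, times)]
filtered = [v.real for v in ifft(low_pass(fft(contaminated), cutoff_bin=30))]

print(f"RMS error before filtering: {rms(contaminated, clean):.3f}")
print(f"RMS error after filtering:  {rms(filtered, clean):.3f}")
```

Because the Lorentzian lines are broad in time, nearly all of their spectral content lies below the cutoff, while the 60 Hz component and most of the random noise lie above it. A brick-wall filter is the simplest choice; a cutoff too close to the signal's own bandwidth would distort the lines, much as heavy smoothing does.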
