Energy Measurement in EXO-200 using Boosted Regression Trees

Mike Jewell, Alex Rider
June 6, 2016

1 Introduction

The EXO-200 experiment uses a liquid xenon (LXe) time projection chamber (TPC) to search for neutrinoless double beta decay (0νββ), an extremely rare hypothetical decay that would indicate the Majorana nature of neutrinos [1, 2, 3]. EXO-200 has been taking data for over 2 years and has published one of the most sensitive limits on the 0νββ half-life.

Events deposit energy in the LXe through both scintillation light (175 nm) and free ionization charge. The scintillation light is detected at either end of the EXO-200 detector by large-area avalanche photodiodes (APDs). The ionization charge is drifted along the z-axis of the detector, where it first passes a shielding/induction wire grid (V-wires) and is then collected by a second grid of collection wires (U-wires). The wires have a 3 mm pitch and are ganged into groups of three before being read out and saved. The total charge energy of an event is then calculated as the summed amplitude of all channels that collected charge.

To reconstruct the event energy accurately, each waveform must therefore be identified as either a collection or an induction signal. This is currently done by performing a χ² fit with both a collection and an induction template and then classifying the waveform based on the ratio of the two χ² values. Waveforms classified as induction are flagged and excluded from the sum that determines the event energy. This technique identifies collection and induction signals with reasonable efficiency, but energy deposits spanning multiple channels present a slight challenge because their waveforms contain both collection and induction components. In addition, the waveforms are shaped before being saved to disk, which makes identification and energy estimation more complicated.
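The baseline χ²-ratio classification described above can be sketched as follows. This is a minimal illustration, not the EXO-200 reconstruction code: it assumes white noise with a known per-sample σ, toy analytic pulse shapes in place of the real shaped templates, and a hypothetical ratio cut `RATIO_CUT`; the helper names `fit` and `event_charge` are likewise hypothetical, and the result is in template-amplitude units rather than calibrated energy.

```python
import numpy as np

N, T0 = 2048, 1024          # samples per waveform, trigger sample (toy setup)
t = np.arange(N)

# Toy templates with peak amplitude 1 (hypothetical shapes, not EXO-200's):
x = np.clip((t - T0) / 20.0, 0.0, None)
COLL = x * np.exp(1.0 - x)                 # unipolar "collection" pulse
y = (t - T0) / 10.0
IND = y * np.exp(0.5 - y ** 2 / 2.0)       # bipolar "induction" pulse

RATIO_CUT = 1.0   # hypothetical: collection if chi2_coll < RATIO_CUT * chi2_ind


def fit(v, template, sigma):
    """Least-squares amplitude of `template` in `v` and the resulting chi^2."""
    a = float(v @ template / (template @ template))
    r = v - a * template
    return a, float(r @ r) / sigma ** 2


def event_charge(channels, sigma):
    """Sum fitted collection amplitudes over channels classified as collection."""
    total = 0.0
    for v in channels:
        a_coll, chi2_coll = fit(v, COLL, sigma)
        _, chi2_ind = fit(v, IND, sigma)
        if chi2_coll < RATIO_CUT * chi2_ind:   # ratio test, written without division
            total += a_coll
    return total


rng = np.random.default_rng(1)
sigma = 0.2
event = [20 * IND + rng.normal(0, sigma, N),   # neighbouring channel (induction)
         50 * COLL + rng.normal(0, sigma, N),  # collection channel
         20 * IND + rng.normal(0, sigma, N)]   # neighbouring channel (induction)
print(f"summed collection amplitude: {event_charge(event, sigma):.2f}")  # close to 50
```

A deposit shared between two channels would put collection and induction components in the same waveform, and neither single-template fit then describes it well; this is the weakness the introduction points to.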
In this study, an alternative technique for reconstructing event energy in EXO-200 using boosted regression trees from scikit-learn (sklearn) is explored [4].

2 Simulation and Data Compression

2.1 Monte Carlo

The EXO-200 Monte Carlo has been described in detail elsewhere [1]. For waveform generation, energy deposits in the detector are used to simulate waveforms on each of the U-wire channels, with the Shockley-Ramo theorem used to calculate the induced signal. To simplify this analysis, the Monte Carlo energy deposits were not generated with GEANT4 as in the standard analysis. Instead, events with uniformly distributed energy and position were simulated in the detector. For each deposit, the waveform from the collection channel is saved and tagged as collection, while the waveforms from the two neighboring channels are saved and tagged as induction. For each event, the true energy of the deposit was also saved, to be used as the target for training and testing. Each waveform consists of 2048 samples taken at a sampling rate of 1 MHz, with a fixed trigger time at sample 1024.

The standard Monte Carlo also adds noise, sampled from real data, into the waveforms. To simplify the first stage of the analysis, no noise was added. The second stage of the analysis included noise: waveforms from a database of noise recorded in real data were added to the generated waveforms at random. An example is shown in Figure 5b.

2.2 Template Generation

Initially, a template waveform for both induction and collection was created by simulating an event in the exact center of the detector. The resulting collection waveform and the average of the induction waveforms from this event were then used as templates representing the typical collection and induction waveforms. These templates are shown in the time domain in Figure 1. Although most induction and collection signals have roughly the same shape, there are differences due to the exact position and energy of the initial deposit. Projecting onto these two templates gave a reasonable representation in collection/induction space, but did not fully capture the detailed shape information.

For the final analysis, a more sophisticated template-generation algorithm was implemented. It uses K-means clustering to find a larger set of template waveforms that better samples the waveform space. Results of template generation with the sklearn K-means clustering algorithm and 15 clusters are shown in Figure 3. The input waveforms are created with the same Monte Carlo technique: events with varying energy are generated uniformly within the detector volume, giving a set of 2k waveforms (WFs). Each waveform is then normalized by its maximum amplitude and time-shifted so that all peaks occur at the same time. This was necessary so that the clusters represent different shapes rather than different time offsets of the same shapes; the time offset is handled in the optimal filter stage. In addition, a window from sample 9 to 125 was applied.
This was required because the clustering algorithm appeared to fail when the long pre- and post-trigger segments without signal were included, since these segments are identical for all pulses. With the window applied, the algorithm finds both collection-like and induction-like clusters; typical templates are shown in Figure 3.

2.3 Optimal Filter

The goal of this analysis is to learn a mapping from the space of digitized waveforms to the amount of charge deposited on a particular channel. Early work showed that learning in the full space of the 2048-length vectors that make up the raw waveforms was impractical. We therefore devised a method, based on optimal filters, for compressing the information stored in each pulse into far fewer than 2048 real parameters.

An optimal filter takes as inputs a noisy time series v(t) and a template s(t) and returns the amplitude of s in v. Under certain assumptions (stationary Gaussian noise with a known PSD and a known pulse shape) it can be shown that the optimal filter is the best estimator of the amplitude of s in v. The optimal filtering we employed is equivalent to a least-squares fit of the template to the signal in the frequency domain. The χ² (cost function) for a particular amplitude A is defined as

$$\chi^2(A) = \int \frac{\left|\tilde{v}(f) - A\,\tilde{s}(f)\right|^2}{J(f)}\,df \qquad (1)$$

where ṽ(f) and s̃(f) are the Fourier transforms of the signal and the template, respectively, and J(f) is the noise power spectral density (PSD), which weights each frequency in the cost function (each term is divided by J(f)). In practice the waveforms are sampled discretely, so the Fourier transforms are computed with FFTs and the integral in Equation 1 becomes a sum. For waveforms generated without added noise a uniform PSD was used; for waveforms with added noise, the average PSD was estimated from real data.
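A minimal sketch of the compression step of Sections 2.2 and 2.3: build templates with sklearn's `KMeans` from peak-aligned, max-normalized, windowed waveforms, then represent each waveform by its optimal-filter amplitude for every template. Setting dχ²/dA = 0 in the discrete form of Equation 1 gives A = Σ_k Re[ṽ_k s̃_k*]/J_k ÷ Σ_k |s̃_k|²/J_k, which is what `of_amplitude` computes. The toy pulse shapes, the number of clusters, `PEAK_AT`, `WINDOW`, and all helper names are hypothetical; the time-offset search of a full optimal filter is replaced here by simple peak alignment, and `estimate_psd` assumes noise-only traces of the same length as the windowed templates.

```python
import numpy as np
from sklearn.cluster import KMeans

N, T0 = 2048, 1024              # samples per waveform, trigger sample
PEAK_AT = 1040                  # hypothetical sample to which peaks are aligned
WINDOW = (960, 1216)            # hypothetical analysis window (256 samples)


def estimate_psd(noise_wfs):
    """Average |FFT|^2 over noise-only traces: an estimate of J(f)."""
    return np.mean(np.abs(np.fft.fft(noise_wfs, axis=1)) ** 2, axis=0)


def of_amplitude(v, s, psd):
    """Amplitude A minimising sum_k |V_k - A S_k|^2 / J_k (discrete Eq. 1)."""
    V, S = np.fft.fft(v), np.fft.fft(s)
    return np.sum((V * np.conj(S)).real / psd) / np.sum(np.abs(S) ** 2 / psd)


def align(w):
    """Circularly shift `w` so its largest |sample| sits at PEAK_AT."""
    return np.roll(w, PEAK_AT - int(np.argmax(np.abs(w))))


def kmeans_templates(wfs, n_templates, seed=0):
    """Cluster normalised, aligned, windowed waveforms; centroids are templates."""
    lo, hi = WINDOW
    X = np.array([align(w / np.max(np.abs(w)))[lo:hi] for w in wfs])
    km = KMeans(n_clusters=n_templates, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_


def compress(wfs, templates, psd):
    """Map each waveform to its vector of optimal-filter amplitudes."""
    lo, hi = WINDOW
    return np.array([[of_amplitude(align(w)[lo:hi], s, psd) for s in templates]
                     for w in wfs])


# Toy waveforms: unipolar "collection" and bipolar "induction" pulses with
# random widths and amplitudes (hypothetical shapes, no noise).
rng = np.random.default_rng(0)
t = np.arange(N)
wfs = []
for i in range(200):
    amp, width = rng.uniform(10, 100), rng.uniform(10, 30)
    u = (t - T0) / width
    if i % 2 == 0:
        x = np.clip(u, 0.0, None)
        wfs.append(amp * x * np.exp(1.0 - x))
    else:
        wfs.append(amp * u * np.exp(0.5 - u ** 2 / 2.0))
wfs = np.array(wfs)

TEMPLATES = kmeans_templates(wfs, n_templates=4)
PSD = np.ones(WINDOW[1] - WINDOW[0])        # uniform PSD, as for noiseless MC
FEATURES = compress(wfs, TEMPLATES, PSD)    # one row of amplitudes per waveform
```

With a uniform PSD, Parseval's theorem makes `of_amplitude` identical to the time-domain least-squares amplitude (v·s)/(s·s). A non-uniform J(f), for example from `estimate_psd` on noise traces, de-weights the frequencies where the noise is large.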

As previously stated, we initially compressed waveforms into 2-component vectors, where the first component is the optimal-filter amplitude using a typical collection pulse shape as the template and the second is the optimal-filter amplitude using a typical induction pulse as the template. The result of doing this for a sample of 1 Monte Carlo pulses is shown in Figure 2. Later analysis increased the parameter space to a 15-component vector, where each component is the optimal-filter amplitude for one of the 15 K-means templates.

3 Energy Estimation

The compressed 2-component pulse data shown in Figure 2 were used to train a boosted regression tree, implemented in Python's scikit-learn library, to predict the energy of a sample of test pulses held out from the training set. The boosted regression tree learned the mapping from the representation of the pulses in R² to the pulse energy by boosting 5 trees of depth 3 together. It was trained on 1k noiseless MC WFs including both collection and induction signals; an additional 2781 WFs were used as a test set to determine the test error. The predicted energy versus the true energy for the pulses in the test set is shown in Figure 4a. The average error in predicted energy for collection pulses is 13%. Along the y-axis one can see the 7 out of 1238 induction pulses to which the model erroneously assigned energy. In addition, 3 out of 1543 collection pulses were assigned erroneous energies. The scatter in predicted energy for collection pulses is likely due to template mismatch and should be reduced by expanding the space of templates. The same procedure was therefore repeated on the same training and test sets using the 15 template WFs generated by clustering instead of the 2 typical templates chosen by hand.
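A minimal sketch of the regression step with scikit-learn's `GradientBoostingRegressor`. Instead of real optimal-filter outputs it uses a hypothetical toy feature set: two template amplitudes that respond linearly to the deposit energy (with different slopes for collection and induction waveforms) plus Gaussian noise. As in the study, the target is the deposit energy for collection waveforms and zero for induction waveforms. `max_depth=3` follows the text; `n_estimators=200` and `learning_rate=0.1` are illustrative choices, not the values used in the analysis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
energy = rng.uniform(0.1, 3.0, n)          # deposit energy [MeV]
is_coll = rng.random(n) < 0.5              # True: collection, False: induction

# Hypothetical 2-component features: amplitudes for a "collection" and an
# "induction" template, linear in energy, plus unit Gaussian noise.
coll_amp = np.where(is_coll, 100.0, 5.0) * energy + rng.normal(0.0, 1.0, n)
ind_amp = np.where(is_coll, 10.0, 60.0) * energy + rng.normal(0.0, 1.0, n)
X = np.column_stack([coll_amp, ind_amp])
y = np.where(is_coll, energy, 0.0)         # induction waveforms carry no energy

X_tr, X_te, y_tr, y_te, c_tr, c_te = train_test_split(
    X, y, is_coll, test_size=0.3, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# One possible set of figures of merit for the two failure modes in the text.
coll_rel_err = np.mean(np.abs(pred[c_te] - y_te[c_te]) / y_te[c_te])
n_ind_with_energy = int(np.sum(pred[~c_te] > 1e-3))   # induction above 1 keV
r2 = model.score(X_te, y_te)
print(f"collection mean relative error: {coll_rel_err:.3f}, "
      f"induction pulses > 1 keV: {n_ind_with_energy}, R^2: {r2:.3f}")
```

Moving to the 15-template representation only changes the number of columns in `X`; the regressor and the train/test split are unchanged.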
The results of this method are shown in Figure 4b: no induction signals were assigned non-zero energy (>1 keV), and the energy reconstruction error was 0.1%. This is a large improvement over the original 2-template technique.

Finally, to test how this method generalizes to MC with added noise, we repeated the same procedure using the 15 template WFs and 9k training WFs generated with noise, and report the error on a test set of 1k WFs with added noise. The results are shown in Figure 5. This resulted in a 17% error in the energy estimate for collection signals, and many induction signals (8%) were assigned energy > 1 keV. Given the amplitude of the noise, the intrinsic energy resolution should be 5%. The drastic decrease in performance in the presence of noise indicates that we are likely overfitting. Future work will focus on determining the optimal depth of the regression trees and the optimal number of trees to boost together. Simply using a larger training set may also improve performance. We will also experiment with weak learners other than threshold functions.

4 Conclusions

Using the boosted regression tree algorithm from sklearn, an initial energy reconstruction algorithm has been implemented to predict the energy associated with U-wire collection signals in the EXO-200 detector. The current algorithm uses a set of n = 15 templates to represent pulses as vectors in Rⁿ via an optimal filter, and a boosted regression tree learns the mapping between these vectors and the pulse energies. This resulted in a 0.1% error in the energy estimate for WFs with no added noise, but a 17% error for WFs with realistic noise. Future work by the EXO Collaboration to improve this analysis will focus on understanding the large error of this estimate.
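The hyperparameter study proposed as future work in Section 3, choosing the tree depth and the number of boosted trees, could be run as a cross-validated grid search. A minimal sketch with scikit-learn's `GridSearchCV`, on a hypothetical random 15-column feature matrix standing in for the optimal-filter amplitudes; the grid values and the toy target are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for 15-component optimal-filter features and energies.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=600)

# Scan tree depth and number of boosted trees with 3-fold cross-validation.
search = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.1, random_state=0),
    param_grid={"max_depth": [2, 3, 4], "n_estimators": [50, 100, 200]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated MAE:", -search.best_score_)
```

Comparing cross-validated and training error across the grid is one way to check the overfitting suspected in Section 3; `sklearn.model_selection.learning_curve` would address whether a larger training set helps.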

Figure 1: Templates of induction and collection signals used in the optimal filter, shown in the time domain (panels titled "EXO-200 Waveform Template"; ADC counts versus time [µs]). (a) Collection signal. (b) Induction signal. The signals represent pure collection and induction pulses.

Figure 2: Projection of waveforms into collection and induction space (induction amplitude versus collection amplitude, for induction and collection pulses) using the 2 templates generated with a charge deposit in the detector center.

Figure 3: Templates generated using K-means clustering. The algorithm appears to find both collection-like and induction-like signals.

Figure 4: Energy predicted by the regression tree versus true energy for a sample of test pulses (predicted energy [MeV] versus true energy [MeV], 0 to 3 MeV, for induction and charge-collection pulses). (a) 2-template method (13% error). (b) 15 templates generated with K-means clustering (0.1% error).

Figure 5: WFs with added noise. (a) Energy prediction for these WFs: a 17% error is observed on the 1k WF test set in the energy measurement of collection signals, and many of the induction signals were assigned large energies. (b) Example WFs with noise.

References

[1] J. B. Albert et al. Improved measurement of the 2νββ half-life of 136Xe with the EXO-200 detector. Phys. Rev. C, 89(1):015502, 2014.
[2] J. B. Albert et al. Search for Majorana neutrinos with the first two years of EXO-200 data. Nature, 510:229, 2014.
[3] M. Auger et al. The EXO-200 detector, part I: Detector design and construction. JINST, 7:P05010, 2012.
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.