PERMUTATION TESTS FOR COMPLEX DATA

PERMUTATION TESTS FOR COMPLEX DATA
Theory, Applications and Software
Fortunato Pesarin, Luigi Salmaso
University of Padua, Italy
WILEY
A John Wiley and Sons, Ltd., Publication

Preface
Notation and Abbreviations

1 Introduction
  1.1 On Permutation Analysis
  1.2 The Permutation Testing Principle
    1.2.1 Nonparametric Family of Distributions
    1.2.2 The Permutation Testing Principle
  1.3 Permutation Approaches
  1.4 When and Why Conditioning is Appropriate
  1.5 Randomization and Permutation
  1.6 Computational Aspects
  1.7 Basic Notation
  1.8 A Problem with Paired Observations
    1.8.1 Modelling Responses
    1.8.2 Symmetry Induced by Exchangeability
    1.8.3 Further Aspects
    1.8.4 The Student's t-paired Solution
    1.8.5 The Signed Rank Test Solution
    1.8.6 The McNemar Solution
  1.9 The Permutation Solution
    1.9.1 General Aspects
    1.9.2 The Permutation Sample Space
    1.9.3 The Conditional Monte Carlo Method
    1.9.4 Approximating the Permutation Distribution
    1.9.5 Problems and Exercises
  1.10 A Two-Sample Problem
    1.10.1 Modelling Responses
    1.10.2 The Student t Solution
    1.10.3 The Permutation Solution
    1.10.4 Rank Solutions
    1.10.5 Problems and Exercises
  1.11 One-Way ANOVA
    1.11.1 Modelling Responses
    1.11.2 Permutation Solutions
    1.11.3 Problems and Exercises

2 Theory of One-Dimensional Permutation Tests
  2.1 Introduction
    2.1.1 Notation and Basic Assumptions
    2.1.2 The Conditional Reference Space
    2.1.3 Conditioning on a Set of Sufficient Statistics
  2.2 Definition of Permutation Tests
    2.2.1 General Aspects
    2.2.2 Randomized Permutation Tests
    2.2.3 Non-randomized Permutation Tests
    2.2.4 The p-value
    2.2.5 A CMC Algorithm for Estimating the p-value
  2.3 Some Useful Test Statistics
  2.4 Equivalence of Permutation Statistics
    2.4.1 Some Examples
    2.4.2 Problems and Exercises
  2.5 Arguments for Selecting Permutation Tests
  2.6 Examples of One-Sample Problems
    2.6.1 A Problem with Repeated Observations
    2.6.2 Problems and Exercises
  2.7 Examples of Multi-Sample Problems
  2.8 Analysis of Ordered Categorical Variables
    2.8.1 General Aspects
    2.8.2 A Solution Based on Score Transformations
    2.8.3 Typical Goodness-of-Fit Solutions
    2.8.4 Extension to Non-Dominance Alternatives and C Groups
  2.9 Problems and Exercises

3 Further Properties of Permutation Tests
  3.1 Unbiasedness of Two-sample Tests
    3.1.1 One-Sided Alternatives
    3.1.2 Two-Sided Alternatives
  3.2 Power Functions of Permutation Tests
    3.2.1 Definition and Algorithm for the Conditional Power
    3.2.2 The Empirical Conditional ROC Curve
    3.2.3 Definition and Algorithm for the Unconditional Power: Fixed Effects
    3.2.4 Unconditional Power: Random Effects
    3.2.5 Comments on Power Functions
  3.3 Consistency of Permutation Tests
  3.4 Permutation Confidence Interval for δ
    3.4.1 Problems and Exercises
  3.5 Extending Inference from Conditional to Unconditional
  3.6 Optimal Properties
    3.6.1 Problems and Exercises
  3.7 Some Asymptotic Properties
    3.7.1 Introduction
    3.7.2 Two Basic Theorems
  3.8 Permutation Central Limit Theorems
    3.8.1 Basic Notions
    3.8.2 Permutation Central Limit Theorems
  3.9 Problems and Exercises

4 The Nonparametric Combination Methodology
  4.1 Introduction
    4.1.1 General Aspects
    4.1.2 Bibliographic Notes
    4.1.3 Main Assumptions and Notation
    4.1.4 Some Comments
  4.2 The Nonparametric Combination Methodology
    4.2.1 Assumptions on Partial Tests
    4.2.2 Desirable Properties of Combining Functions
    4.2.3 A Two-Phase Algorithm for Nonparametric Combination
    4.2.4 Some Useful Combining Functions
    4.2.5 Why Combination is Nonparametric
    4.2.6 On Admissible Combining Functions
    4.2.7 Problems and Exercises
  4.3 Consistency, Unbiasedness and Power of Combined Tests
    4.3.1 Consistency
    4.3.2 Unbiasedness
    4.3.3 A Non-consistent Combining Function
    4.3.4 Power of Combined Tests
    4.3.5 Conditional Multivariate Confidence Region for δ
    4.3.6 Problems and Exercises
  4.4 Some Further Asymptotic Properties
    4.4.1 General Conditions
    4.4.2 Asymptotic Properties
  4.5 Finite-Sample Consistency
    4.5.1 Introduction
    4.5.2 Finite-Sample Consistency
    4.5.3 Some Applications of Finite-Sample Consistency
  4.6 Some Examples of Nonparametric Combination
    4.6.1 Problems and Exercises
  4.7 Comments on the Nonparametric Combination
    4.7.1 General Comments
    4.7.2 Final Remarks

5 Multiplicity Control and Closed Testing
  5.1 Defining Raw and Adjusted p-values
  5.2 Controlling for Multiplicity
    5.2.1 Multiple Comparison and Multiple Testing
    5.2.2 Some Definitions of the Global Type I Error
  5.3 Multiple Testing
  5.4 The Closed Testing Approach
    5.4.1 Closed Testing for Multiple Testing
    5.4.2 Closed Testing Using the MinP Bonferroni-Holm Procedure
  5.5 Mult Data Example
    5.5.1 Analysis Using MATLAB
    5.5.2 Analysis Using R
  5.6 Washing Test Data
    5.6.1 Analysis Using MATLAB
    5.6.2 Analysis Using R
  5.7 Weighted Methods for Controlling FWE and FDR
  5.8 Adjusting Stepwise p-values
    5.8.1 Showing Biasedness of Standard p-values for Stepwise Regression
    5.8.2 Algorithm Description
    5.8.3 Optimal Subset Procedures

6 Analysis of Multivariate Categorical Variables
  6.1 Introduction
  6.2 The Multivariate McNemar Test
    6.2.1 An Extension of the Multivariate McNemar Test
  6.3 Multivariate Goodness-of-Fit Testing for Ordered Variables
    6.3.1 Multivariate Extension of Fisher's Exact Probability Test
  6.4 MANOVA with Nominal Categorical Data
  6.5 Stochastic Ordering
    6.5.1 Formal Description
    6.5.2 Further Breaking Down the Hypotheses
    6.5.3 Permutation Test
  6.6 Multifocus Analysis
    6.6.1 General Aspects
    6.6.2 The Multifocus Solution
    6.6.3 An Application
  6.7 Isotonic Inference
    6.7.1 Introduction
    6.7.2 Allelic Association Analysis in Genetics
    6.7.3 Parametric Solutions
    6.7.4 Permutation Approach
  6.8 Test on Moments for Ordered Variables
    6.8.1 General Aspects
    6.8.2 Score Transformations and Univariate Tests
    6.8.3 Multivariate Extension
  6.9 Heterogeneity Comparisons
    6.9.1 Introduction
    6.9.2 Tests for Comparing Heterogeneities
    6.9.3 A Case Study in Population Genetics
  6.10 Application to PhD Programme Evaluation Using SAS
    6.10.1 Description of the Problem
    6.10.2 Global Satisfaction Index
    6.10.3 Multivariate Performance Comparisons

7 Permutation Testing for Repeated Measurements
  7.1 Introduction
  7.2 Carry-Over Effects in Repeated Measures Designs
  7.3 Modelling Repeated Measurements
    7.3.1 A General Additive Model
    7.3.2 Hypotheses of Interest
  7.4 Testing Solutions
    7.4.1 Solutions Using the NPC Approach
    7.4.2 Analysis of Two-Sample Dominance Problems
    7.4.3 Analysis of the Cross-Over (AB-BA) Design
    7.4.4 Analysis of a Cross-Over Design with Paired Data
  7.5 Testing for Repeated Measurements with Missing Data
  7.6 General Aspects of Permutation Testing with Missing Data
    7.6.1 Bibliographic Notes
  7.7 On Missing Data Processes
    7.7.1 Data Missing Completely at Random
    7.7.2 Data Missing Not at Random
  7.8 The Permutation Approach
    7.8.1 Deletion, Imputation and Intention to Treat Strategies
    7.8.2 Breaking Down the Hypotheses
  7.9 The Structure of Testing Problems
    7.9.1 Hypotheses for MNAR Models
    7.9.2 Hypotheses for MCAR Models
    7.9.3 Permutation Structure with Missing Values
  7.10 Permutation Analysis of Missing Values
    7.10.1 Partitioning the Permutation Sample Space
    7.10.2 Solution for Two-Sample MCAR Problems
    7.10.3 Extensions to Multivariate C-Sample Problems
    7.10.4 Extension to MNAR Models
  7.11 Germina Data: An Example of an MNAR Model
    7.11.1 Problem Description
    7.11.2 The Permutation Solution
    7.11.3 Analysis Using MATLAB
    7.11.4 Analysis Using R
  7.12 Multivariate Paired Observations
  7.13 Repeated Measures and Missing Data
    7.13.1 An Example
  7.14 Botulinum Data
    7.14.1 Analysis Using MATLAB
    7.14.2 Analysis Using R
  7.15 Waterfalls Data
    7.15.1 Analysis Using MATLAB
    7.15.2 Analysis Using R

8 Some Stochastic Ordering Problems
  8.1 Multivariate Ordered Alternatives
  8.2 Testing for Umbrella Alternatives
    8.2.1 Hypotheses and Tests in Simple Stochastic Ordering
    8.2.2 Permutation Tests for Umbrella Alternatives
  8.3 Analysis of Experimental Tumour Growth Curves
  8.4 Analysis of PERC Data
    8.4.1 Introduction
    8.4.2 A Permutation Solution
    8.4.3 Analysis Using MATLAB
    8.4.4 Analysis Using R

9 NPC Tests for Survival Analysis
  9.1 Introduction and Main Notation
    9.1.1 Failure Time Distributions
    9.1.2 Data Structure
  9.2 Comparison of Survival Curves
  9.3 An Overview of the Literature
    9.3.1 Permutation Tests in Survival Analysis
  9.4 Two NPC Tests
    9.4.1 Breaking Down the Hypotheses
    9.4.2 The Test Structure
    9.4.3 NPC Test for Treatment-Independent Censoring
    9.4.4 NPC Test for Treatment-Dependent Censoring
  9.5 An Application to a Biomedical Study

10 NPC Tests in Shape Analysis
  10.1 Introduction
  10.2 A Brief Overview of Statistical Shape Analysis
    10.2.1 How to Describe Shapes
    10.2.2 Multivariate Morphometrics
  10.3 Inference with Shape Data
  10.4 NPC Approach to Shape Analysis
    10.4.1 Notation
    10.4.2 Comparative Simulation Study
  10.5 NPC Analysis with Correlated Landmarks
  10.6 An Application to Mediterranean Monk Seal Skulls
    10.6.1 The Case Study
    10.6.2 Some Remarks
    10.6.3 Shape Analysis Using MATLAB
    10.6.4 Shape Analysis Using R

11 Multivariate Correlation Analysis and Two-Way ANOVA
  11.1 Autofluorescence Case Study
    11.1.1 A Permutation Solution
    11.1.2 Analysis Using MATLAB
    11.1.3 Analysis Using R
  11.2 Confocal Case Study
    11.2.1 A Permutation Solution
    11.2.2 MATLAB and R Codes
  11.3 Two-Way (M)ANOVA
    11.3.1 Brief Overview of Permutation Tests in Two-Way ANOVA
    11.3.2 MANOVA Using MATLAB and R Codes

12 Some Case Studies Using NPC Test R10 and SAS Macros
  12.1 An Integrated Approach to Survival Analysis in Observational Studies
    12.1.1 A Case Study on Oesophageal Cancer
    12.1.2 A Permutation Solution
    12.1.3 Survival Analysis with Stratification by Propensity Score
  12.2 Integrating Propensity Score and NPC Testing
    12.2.1 Analysis Using MATLAB
  12.3 Further Applications with NPC Test R10 and SAS Macros
    12.3.1 A Two-Sample Epidemiological Survey: Problem Description
    12.3.2 Analysing SETIG Data Using MATLAB
    12.3.3 Analysing the SETIG Data Using R
    12.3.4 Analysing the SETIG Data Using NPC Test
    12.3.5 Analysis of the SETIG Data Using SAS
  12.4 A Comparison of Three Survival Curves
    12.4.1 Unstratified Survival Analysis
    12.4.2 Survival Analysis with Stratification by Propensity Score
  12.5 Survival Analysis Using NPC Test and SAS
    12.5.1 Survival Analysis Using NPC Test
    12.5.2 Survival Analysis Using SAS
    12.5.3 Survival Analysis Using MATLAB
  12.6 Logistic Regression and NPC Test for Multivariate Analysis
    12.6.1 Application to Lymph Node Metastases
    12.6.2 Application to Bladder Cancer
    12.6.3 NPC Results
    12.6.4 Analysis by Logistic Regression
    12.6.5 Some Comments

References
Index