Dawn E. Holmes and Lakhmi C. Jain (Eds.) Data Mining: Foundations and Intelligent Paradigms



Intelligent Systems Reference Library, Volume 25

Editors-in-Chief

Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: kacprzyk@ibspan.waw.pl

Prof. Lakhmi C. Jain
University of South Australia
Adelaide
Mawson Lakes Campus
South Australia 5095
Australia
E-mail: Lakhmi.jain@unisa.edu.au

Further volumes of this series can be found on our homepage: springer.com

Vol. 1. Christine L. Mumford and Lakhmi C. Jain (Eds.) Computational Intelligence: Collaboration, Fusion and Emergence, 2009 ISBN 978-3-642-01798-8
Vol. 2. Yuehui Chen and Ajith Abraham Tree-Structure Based Hybrid Computational Intelligence, 2009 ISBN 978-3-642-04738-1
Vol. 3. Anthony Finn and Steve Scheding Developments and Challenges for Autonomous Unmanned Vehicles, 2010 ISBN 978-3-642-10703-0
Vol. 4. Lakhmi C. Jain and Chee Peng Lim (Eds.) Handbook on Decision Making: Techniques and Applications, 2010 ISBN 978-3-642-13638-2
Vol. 5. George A. Anastassiou Intelligent Mathematics: Computational Analysis, 2010 ISBN 978-3-642-17097-3
Vol. 6. Ludmila Dymowa Soft Computing in Economics and Finance, 2011 ISBN 978-3-642-17718-7
Vol. 7. Gerasimos G. Rigatos Modelling and Control for Intelligent Industrial Systems, 2011 ISBN 978-3-642-17874-0
Vol. 8. Edward H.Y. Lim, James N.K. Liu, and Raymond S.T. Lee Knowledge Seeker Ontology Modelling for Information Search and Management, 2011 ISBN 978-3-642-17915-0
Vol. 9. Menahem Friedman and Abraham Kandel Calculus Light, 2011 ISBN 978-3-642-17847-4
Vol. 10. Andreas Tolk and Lakhmi C. Jain Intelligence-Based Systems Engineering, 2011 ISBN 978-3-642-17930-3
Vol. 11. Samuli Niiranen and Andre Ribeiro (Eds.) Information Processing and Biological Systems, 2011 ISBN 978-3-642-19620-1
Vol. 12. Florin Gorunescu Data Mining, 2011 ISBN 978-3-642-19720-8
Vol. 13. Witold Pedrycz and Shyi-Ming Chen (Eds.) Granular Computing and Intelligent Systems, 2011 ISBN 978-3-642-19819-9
Vol. 14. George A. Anastassiou and Oktay Duman Towards Intelligent Modeling: Statistical Approximation Theory, 2011 ISBN 978-3-642-19825-0
Vol. 15. Antonino Freno and Edmondo Trentin Hybrid Random Fields, 2011 ISBN 978-3-642-20307-7
Vol. 16. Alexiei Dingli Knowledge Annotation: Making Implicit Knowledge Explicit, 2011 ISBN 978-3-642-20322-0
Vol. 17. Crina Grosan and Ajith Abraham Intelligent Systems, 2011 ISBN 978-3-642-21003-7
Vol. 18. Achim Zielesny From Curve Fitting to Machine Learning, 2011 ISBN 978-3-642-21279-6
Vol. 19. George A. Anastassiou Intelligent Systems: Approximation by Artificial Neural Networks, 2011 ISBN 978-3-642-21430-1
Vol. 20. Lech Polkowski Approximate Reasoning by Parts, 2011 ISBN 978-3-642-22278-8
Vol. 21. Igor Chikalov Average Time Complexity of Decision Trees, 2011 ISBN 978-3-642-22660-1
Vol. 22. Przemysław Różewski, Emma Kusztina, Ryszard Tadeusiewicz, and Oleg Zaikin Intelligent Open Learning Systems, 2011 ISBN 978-3-642-22666-3
Vol. 23. Dawn E. Holmes and Lakhmi C. Jain (Eds.) Data Mining: Foundations and Intelligent Paradigms, 2012 ISBN 978-3-642-23165-0
Vol. 24. Dawn E. Holmes and Lakhmi C. Jain (Eds.) Data Mining: Foundations and Intelligent Paradigms, 2012 ISBN 978-3-642-23240-4
Vol. 25. Dawn E. Holmes and Lakhmi C. Jain (Eds.) Data Mining: Foundations and Intelligent Paradigms, 2012 ISBN 978-3-642-23150-6

Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms
Volume 3: Medical, Health, Social, Biological and other Applications

Prof. Dawn E. Holmes
Department of Statistics and Applied Probability
University of California, Santa Barbara, CA 93106
USA
E-mail: holmes@pstat.ucsb.edu

Prof. Lakhmi C. Jain
Professor of Knowledge-Based Engineering
University of South Australia
Adelaide
Mawson Lakes, SA 5095
Australia
E-mail: Lakhmi.jain@unisa.edu.au

ISBN 978-3-642-23150-6
e-ISBN 978-3-642-23151-3
DOI 10.1007/978-3-642-23151-3
Intelligent Systems Reference Library ISSN 1868-4394
Library of Congress Control Number: 2011936705

© 2012 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper
springer.com

Preface

There are many invaluable books available on data mining theory and applications. However, in compiling a volume titled DATA MINING: Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications, we wish to introduce some of the latest developments to a broad audience of both specialists and non-specialists in this field.

The term data mining was introduced in the 1990s to describe an emerging field based on classical statistics, artificial intelligence and machine learning. By combining techniques from these areas and developing new ones, researchers are able to analyze large datasets innovatively and productively. Patterns found in these datasets are subsequently analyzed with a view to acquiring new knowledge. These techniques have been applied in a broad range of medical, health, social and biological areas.

In compiling this volume we have sought to present innovative research from prestigious contributors in the field of data mining. Each chapter is self-contained and is described briefly in Chapter 1.

This book will prove valuable to theoreticians as well as application scientists and engineers in the area of data mining. Postgraduate students will also find this a useful sourcebook since it shows the direction of current research.

We have been fortunate in attracting top-class researchers as contributors and wish to offer our thanks for their support in this project. We also acknowledge the expertise and time of the reviewers. Finally, we wish to thank Springer for their support.

Dr. Dawn E. Holmes
University of California
Santa Barbara, USA

Dr. Lakhmi C. Jain
University of South Australia
Adelaide, Australia

Contents

Chapter 1
Advances in Intelligent Data Mining
Dawn E. Holmes, Jeffrey W. Tweedale, Lakhmi C. Jain
  1 Introduction
  2 Medical Influences
  3 Health Influences
  4 Social Influences
    4.1 Information Discovery
    4.2 On-Line Communities
  5 Biological Influences
    5.1 Biological Networks
    5.2 Estimations in Gene Expression
  6 Chapters Included in the Book
  7 Conclusion
  References

Chapter 2
Temporal Pattern Mining for Medical Applications
Giulia Bruno, Paolo Garza
  1 Introduction
  2 Types of Temporal Data in Medical Domain
  3 Definitions
  4 Temporal Pattern Mining Algorithms
    4.1 Temporal Pattern Mining from a Set of Sequences
    4.2 Temporal Pattern Mining from a Single Sequence
  5 Medical Applications
  6 Conclusions
  References

Chapter 3
BioKeySpotter: An Unsupervised Keyphrase Extraction Technique in the Biomedical Full-Text Collection
Min Song, Prat Tanapaisankit
  1 Introduction
  2 Backgrounds and Related Work
  3 The Proposed Approach
  4 Evaluation
    4.1 Dataset
    4.2 Comparison Algorithms
    4.3 Experimental Results
  5 Conclusion
  References

Chapter 4
Mining Health Claims Data for Assessing Patient Risk
Ian Duncan
  1 What Is Health Risk?
  2 Traditional Models for Assessing Health Risk
  3 Risk Factor-Based Risk Models
  4 Data Sources
    4.1 Enrollment Data
    4.2 Claims and Coding Systems
    4.3 Interpretation of Claims Codes
  5 Clinical Identification Algorithms
  6 Sensitivity-Specificity Trade-Off
    6.1 Constructing an Identification Algorithm
    6.2 Sources of Algorithms
  7 Construction and Use of Grouper Models
    7.1 Drug Grouper Models
    7.2 Drug-Based Risk Adjustment Models
  8 Summary and Conclusions
  References

Chapter 5
Mining Biological Networks for Similar Patterns
Ferhat Ay, Günhan Gülsoy, Tamer Kahveci
  1 Introduction
  2 Metabolic Network Alignment with One-to-One Mappings
    2.1 Model
    2.2 Problem Formulation
    2.3 Pairwise Similarity of Entities
    2.4 Similarity of Topologies
    2.5 Combining Homology and Topology
    2.6 Extracting the Mapping of Entities
    2.7 Similarity Score of Networks
    2.8 Complexity Analysis
  3 Metabolic Network Alignment with One-to-Many Mappings
    3.1 Homological Similarity of Subnetworks
    3.2 Topological Similarity of Subnetworks
    3.3 Combining Homology and Topology
    3.4 Extracting Subnetwork Mappings
  4 Significance of Network Alignment
    4.1 Identification of Alternative Entities
    4.2 Identification of Alternative Subnetworks
    4.3 One-to-Many Mappings within and across Major Clades
  5 Summary
  6 Further Reading
  References

Chapter 6
Estimation of Distribution Algorithms in Gene Expression Data Analysis
Elham Salehi, Robin Gras
  1 Introduction
  2 Estimation of Distribution of Algorithms
    2.1 Model Building in EDA
    2.2 Notation
    2.3 Models with Independent Variables
    2.4 Models with Pair Wise Dependencies
    2.5 Models with Multiple Dependencies
  3 Application of EDA in Gene Expression Data Analysis
    3.1 State-of-Art of the Application of EDAs in Gene Expression Data Analysis
  4 Conclusion
  References

Chapter 7
Gene Function Prediction and Functional Network: The Role of Gene Ontology
Erliang Zeng, Chris Ding, Kalai Mathee, Lisa Schneper, Giri Narasimhan
  1 Introduction
    1.1 Gene Function Prediction
    1.2 Functional Gene Network Generation
    1.3 Related Work and Limitations
  2 GO-Based Gene Similarity Measures
  3 Estimating Support for PPI Data with Applications to Function Prediction
    3.1 Mixture Model of PPI Data
    3.2 Data Sets
    3.3 Function Prediction
    3.4 Evaluating the Function Prediction
    3.5 Experimental Results
    3.6 Discussion
  4 A Functional Network of Yeast Genes Using Gene Ontology Information
    4.1 Data Sets
    4.2 Constructing a Functional Gene Network
    4.3 Using Semantic Similarity (SS)
    4.4 Evaluating the Functional Gene Network
    4.5 Experimental Results
    4.6 Discussion
  5 Conclusions
  References

Chapter 8
Mining Multiple Biological Data for Reconstructing Signal Transduction Networks
Thanh-Phuong Nguyen, Tu-Bao Ho
  1 Introduction
  2 Background
    2.1 Signal Transduction Network
    2.2 Protein-Protein Interaction
  3 Constructing Signal Transduction Networks Using Multiple Data
    3.1 Related Work
    3.2 Materials and Methods
    3.3 Clustering and Protein-Protein Interaction Networks
    3.4 Evaluation
  4 Some Results of Yeast STN Reconstruction
  5 Outlook
  6 Summary
  References

Chapter 9
Mining Epistatic Interactions from High-Dimensional Data Sets
Xia Jiang, Shyam Visweswaran, Richard E. Neapolitan
  1 Introduction
  2 Background
    2.1 Epistasis
    2.2 Detecting Epistasis
    2.3 High-Dimensional Data Sets
    2.4 Barriers to Learning Epistasis
    2.5 MDR
    2.6 Bayesian Networks
  3 Discovering Epistasis Using Bayesian Networks
    3.1 A Bayesian Network Model for Epistatic Interactions
    3.2 The BNMBL Score
    3.3 Experiments
  4 Efficient Search
    4.1 Experiments
  5 Discussion, Limitations, and Future Research
  References

Chapter 10
Knowledge Discovery in Adversarial Settings
D.B. Skillicorn
  1 Introduction
  2 Characteristics of Adversarial Modelling
  3 Technical Implications
  4 Conclusion
  References

Chapter 11
Analysis and Mining of Online Communities of Internet Forum Users
Mikołaj Morzy
  1 Introduction
    1.1 What Is Web 2.0?
    1.2 New Forms of Participation: Push or Pull?
    1.3 Internet Forums as New Forms of Conversation
  2 Social-Driven Data
    2.1 What Are Social-Driven Data?
    2.2 Data from Internet Forums
  3 Internet Forums
    3.1 Crawling Internet Forums
    3.2 Statistical Analysis
    3.3 Index Analysis
    3.4 Network Analysis
  4 Related Work
  5 Conclusions
  References

Chapter 12
Data Mining for Information Literacy
Bettina Berendt
  1 Introduction
  2 Background
    2.1 Information Literacy
    2.2 Critical Literacy
    2.3 Educational Data Mining
  3 Towards Critical Data Literacy: A Frame for Analysis and Design
    3.1 A Frame of Analysis: Technique and Object
    3.2 On the Chances of Achieving Critical Data Literacy: Principles of Successful Learning as Description Criteria
  4 Examples: Tools and Other Approaches Supporting Data Mining for Information Literacy
    4.1 Analysing Data: Do-It-Yourself Statistics Visualization
    4.2 Analysing Language: Viewpoints and Bias in Media Reporting
    4.3 Analysing Data Mining: Building, Comparing and Re-using Own and Others' Conceptualizations of a Domain
    4.4 Analysing Actions: Feedback and Awareness Tools
    4.5 Analysing Actions: Role Reversals in Data Collection and Analysis
  5 Summary and Conclusions
  References

Chapter 13
Rule Extraction from Neural Networks and Support Vector Machines for Credit Scoring
Rudy Setiono, Bart Baesens, David Martens
  1 Introduction
  2 Re-RX: Recursive Rule Extraction from Neural Networks
    2.1 Multilayer Perceptron
    2.2 Finding Optimal Network Structure by Pruning
    2.3 Recursive Rule Extraction
    2.4 Applying Re-RX for Credit Scoring
  3 ALBA: Rule Extraction from Support Vector Machines
    3.1 Support Vector Machine
    3.2 ALBA: Active Learning Based Approach to SVM Rule Extraction
    3.3 Applying ALBA for Credit Scoring
  4 Conclusion
  References

Chapter 14
Using Self-Organizing Map for Data Mining: A Synthesis with Accounting Applications
Andriy Andreev, Argyris Argyrou
  1 Introduction
  2 Data Pre-processing
    2.1 Types of Variables
    2.2 Distance Metrics
    2.3 Rescaling Input Variables
  3 Self-Organizing Map
    3.1 Introduction to SOM
    3.2 Formation of SOM
  4 Performance Metrics and Cluster Validity
  5 Extensions of SOM
    5.1 Non-metric Spaces
    5.2 SOM for Temporal Sequence Processing
    5.3 SOM for Cluster Analysis
    5.4 SOM for Visualizing High-Dimensional Data
  6 Financial Applications of SOM
  7 Case Study: Clustering Accounting Databases
    7.1 Data Description
    7.2 Data Pre-processing
    7.3 Experiments
    7.4 Results Presentation and Discussion
  References

Chapter 15
Applying Data Mining Techniques to Assess Steel Plant Operation Conditions
Khan Muhammad Badruddin, Isao Yagi, Takao Terano
  1 Introduction
  2 Brief Description of EAF
    2.1 Performance Evaluation Criteria
    2.2 Innovations in Electric Arc Furnaces
    2.3 Details of the Operation
    2.4 Understanding SCIPs and Stages of a Heat
  3 Problem Description
  4 Data Mining Process
    4.1 Data
    4.2 Data Preprocessing
    4.3 Attribute Pruning
    4.4 The Experiments
    4.5 Data Mining Techniques
  5 Results
    5.1 Discussion
  6 Concluding Remarks
  References

Author Index

Editors

Dr. Dawn E. Holmes serves as Senior Lecturer in the Department of Statistics and Applied Probability and Senior Associate Dean in the Division of Undergraduate Education at UCSB. Her main research area, Bayesian Networks with Maximum Entropy, has resulted in numerous journal articles and conference presentations. Her other research interests include Machine Learning, Data Mining, Foundations of Bayesianism and Intuitionistic Mathematics. Dr. Holmes has co-edited, with Professor Lakhmi C. Jain, the volumes Innovations in Bayesian Networks and Innovations in Machine Learning. Dr. Holmes teaches a broad range of courses, including SAS programming, Bayesian Networks and Data Mining. She was awarded the Distinguished Teaching Award by the Academic Senate, UCSB, in 2008. As well as being Associate Editor of the International Journal of Knowledge-Based and Intelligent Information Systems, Dr. Holmes reviews extensively and is on the editorial board of several journals, including the Journal of Neurocomputing. She serves as Program Scientific Committee Member for numerous conferences, including the International Conference on Artificial Intelligence and the International Conference on Machine Learning. In 2009 Dr. Holmes accepted an invitation to join the Center for Research in Financial Mathematics and Statistics (CRFMS), UCSB. She was made a Senior Member of the IEEE in 2011.

Professor Lakhmi C. Jain is a Director/Founder of the Knowledge-Based Intelligent Engineering Systems (KES) Centre, located in the University of South Australia. He is a fellow of the Institution of Engineers Australia. His interests focus on artificial intelligence paradigms and their applications in complex systems, art-science fusion, e-education, e-healthcare, unmanned air vehicles and intelligent agents.