Scientific Data Mining and Knowledge Discovery

Mohamed Medhat Gaber (Editor)
Scientific Data Mining and Knowledge Discovery: Principles and Foundations

Editor
Mohamed Medhat Gaber
Caulfield School of Information Technology
Monash University
900 Dandenong Rd.
Caulfield East, VIC 3145
Australia
mohamed.m.gaber@gmail.com

Color images from this book can be found at www.springer.com/978-3-642-02787-1

ISBN 978-3-642-02787-1
e-ISBN 978-3-642-02788-8
DOI 10.1007/978-3-642-02788-8

Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2009931328
ACM Computing Classification (1998): I.5, I.2, G.3, H.3

© Springer-Verlag Berlin Heidelberg 2010

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: KuenkelLopka GmbH

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

This book is dedicated to:
My parents: Dr. Medhat Gaber and Mrs. Mervat Hassan
My wife: Dr. Nesreen Hassaan
My children: Abdul-Rahman and Mariam

Contents

Introduction
Mohamed Medhat Gaber

Part I Background

Machine Learning
Achim Hoffmann and Ashesh Mahidadia

Statistical Inference
Shahjahan Khan

The Philosophy of Science and its relation to Machine Learning
Jon Williamson

Concept Formation in Scientific Knowledge Discovery from a Constructivist View
Wei Peng and John S. Gero

Knowledge Representation and Ontologies
Stephan Grimm

Part II Computational Science

Spatial Techniques
Nafaa Jabeur and Nabil Sahli

Computational Chemistry
Hassan Safouhi and Ahmed Bouferguene

String Mining in Bioinformatics
Mohamed Abouelhoda and Moustafa Ghanem

Part III Data Mining and Knowledge Discovery

Knowledge Discovery and Reasoning in Geospatial Applications
Nabil Sahli and Nafaa Jabeur

Data Mining and Discovery of Chemical Knowledge
Lu Wencong

Data Mining and Discovery of Astronomical Knowledge
Ghazi Al-Naymat

Part IV Future Trends

On-board Data Mining
Steve Tanner, Cara Stein, and Sara J. Graves

Data Streams: An Overview and Scientific Applications
Charu C. Aggarwal

Index

Contributors

Mohamed Abouelhoda
Cairo University, Orman, Gamaa Street, 12613 Al Jizah, Giza, Egypt; Nile University, Cairo-Alex Desert Rd, Cairo 12677, Egypt

Charu C. Aggarwal
IBM T. J. Watson Research Center, NY, USA, AL 35805, USA, charu@us.ibm.com

Ghazi Al-Naymat
School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia, ghazi@it.usyd.edu.au

Ahmed Bouferguene
Campus Saint-Jean, University of Alberta, 8406, 91 Street, Edmonton, AB, Canada T6C 4G9

Mohamed Medhat Gaber
Centre for Distributed Systems and Software Engineering, Monash University, 900 Dandenong Rd, Caulfield East, VIC 3145, Australia, Mohamed.Gaber@infotech.monash.edu.au

John S. Gero
Krasnow Institute for Advanced Study and Volgenau School of Information Technology and Engineering, George Mason University, USA, john@johngero.com

Moustafa Ghanem
Imperial College, South Kensington Campus, London SW7 2AZ, UK

Sara J. Graves
University of Alabama in Huntsville, AL 35899, USA, sgraves@itsc.uah.edu

Stephan Grimm
FZI Research Center for Information Technologies, University of Karlsruhe, Baden-Württemberg, Germany, grimm@fzi.de

Achim Hoffmann
University of New South Wales, Sydney 2052, NSW, Australia

Nafaa Jabeur
Department of Computer Science, Dhofar University, Salalah, Sultanate of Oman, nafaa_jabeur@du.edu.om

Shahjahan Khan
Department of Mathematics and Computing, Australian Centre for Sustainable Catchments, University of Southern Queensland, Toowoomba, QLD, Australia, khans@usq.edu.au

Ashesh Mahidadia
University of New South Wales, Sydney 2052, NSW, Australia

Wei Peng
Platform Technologies Research Institute, School of Electrical and Computer Engineering, RMIT University, Melbourne VIC 3001, Australia, w.peng@rmit.edu.au

Cara Stein
University of Alabama in Huntsville, AL 35899, USA, cgall@itsc.uah.edu

Hassan Safouhi
Campus Saint-Jean, University of Alberta, 8406, 91 Street, Edmonton, AB, Canada T6C 4G9

Nabil Sahli
Department of Computer Science, Dhofar University, Salalah, Sultanate of Oman, nabil_sahli@du.edu.om

Steve Tanner
University of Alabama in Huntsville, AL 35899, USA, stanner@itsc.uah.edu

Lu Wencong
Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, People's Republic of China, wclu@shu.edu.cn

Jon Williamson
King's College London, Strand, London WC2R 2LS, England, UK, j.williamson@kent.ac.uk