Scientific Data Mining and Knowledge Discovery
Mohamed Medhat Gaber, Editor
Scientific Data Mining and Knowledge Discovery: Principles and Foundations
Editor
Mohamed Medhat Gaber
Caulfield School of Information Technology
Monash University
900 Dandenong Rd.
Caulfield East, VIC 3145
Australia
mohamed.m.gaber@gmail.com

Color images from this book can be found at www.springer.com/978-3-642-02787-1

ISBN 978-3-642-02787-1
e-ISBN 978-3-642-02788-8
DOI 10.1007/978-3-642-02788-8
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2009931328
ACM Computing Classification (1998): I.5, I.2, G.3, H.3

© Springer-Verlag Berlin Heidelberg 2010

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: KuenkelLopka GmbH
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
This book is dedicated to:
My parents: Dr. Medhat Gaber and Mrs. Mervat Hassan
My wife: Dr. Nesreen Hassaan
My children: Abdul-Rahman and Mariam
Contents

Introduction
Mohamed Medhat Gaber

Part I Background

Machine Learning
Achim Hoffmann and Ashesh Mahidadia

Statistical Inference
Shahjahan Khan

The Philosophy of Science and its relation to Machine Learning
Jon Williamson

Concept Formation in Scientific Knowledge Discovery from a Constructivist View
Wei Peng and John S. Gero

Knowledge Representation and Ontologies
Stephan Grimm

Part II Computational Science

Spatial Techniques
Nafaa Jabeur and Nabil Sahli

Computational Chemistry
Hassan Safouhi and Ahmed Bouferguene

String Mining in Bioinformatics
Mohamed Abouelhoda and Moustafa Ghanem

Part III Data Mining and Knowledge Discovery

Knowledge Discovery and Reasoning in Geospatial Applications
Nabil Sahli and Nafaa Jabeur

Data Mining and Discovery of Chemical Knowledge
Lu Wencong

Data Mining and Discovery of Astronomical Knowledge
Ghazi Al-Naymat

Part IV Future Trends

On-board Data Mining
Steve Tanner, Cara Stein, and Sara J. Graves

Data Streams: An Overview and Scientific Applications
Charu C. Aggarwal

Index
Contributors

Mohamed Abouelhoda
Cairo University, Orman, Gamaa Street, 12613 Al Jizah, Giza, Egypt; Nile University, Cairo-Alex Desert Rd, Cairo 12677, Egypt

Charu C. Aggarwal
IBM T. J. Watson Research Center, NY, USA, AL 35805, USA, charu@us.ibm.com

Ghazi Al-Naymat
School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia, ghazi@it.usyd.edu.au

Ahmed Bouferguene
Campus Saint-Jean, University of Alberta, 8406, 91 Street, Edmonton, AB, Canada T6C 4G9

Mohamed Medhat Gaber
Centre for Distributed Systems and Software Engineering, Monash University, 900 Dandenong Rd, Caulfield East, VIC 3145, Australia, Mohamed.Gaber@infotech.monash.edu.au

John S. Gero
Krasnow Institute for Advanced Study and Volgenau School of Information Technology and Engineering, George Mason University, USA, john@johngero.com

Moustafa Ghanem
Imperial College, South Kensington Campus, London SW7 2AZ, UK

Sara J. Graves
University of Alabama in Huntsville, AL 35899, USA, sgraves@itsc.uah.edu

Stephan Grimm
FZI Research Center for Information Technologies, University of Karlsruhe, Baden-Württemberg, Germany, grimm@fzi.de

Achim Hoffmann
University of New South Wales, Sydney 2052, NSW, Australia

Nafaa Jabeur
Department of Computer Science, Dhofar University, Salalah, Sultanate of Oman, nafaa_jabeur@du.edu.om

Shahjahan Khan
Department of Mathematics and Computing, Australian Centre for Sustainable Catchments, University of Southern Queensland, Toowoomba, QLD, Australia, khans@usq.edu.au

Ashesh Mahidadia
University of New South Wales, Sydney 2052, NSW, Australia

Wei Peng
Platform Technologies Research Institute, School of Electrical and Computer Engineering, RMIT University, Melbourne VIC 3001, Australia, w.peng@rmit.edu.au

Cara Stein
University of Alabama in Huntsville, AL 35899, USA, cgall@itsc.uah.edu

Hassan Safouhi
Campus Saint-Jean, University of Alberta, 8406, 91 Street, Edmonton, AB, Canada T6C 4G9

Nabil Sahli
Department of Computer Science, Dhofar University, Salalah, Sultanate of Oman, nabil_sahli@du.edu.om

Steve Tanner
University of Alabama in Huntsville, AL 35899, USA, stanner@itsc.uah.edu

Lu Wencong
Shanghai University, 99 Shangda Road, BaoShan District, Shanghai, People's Republic of China, wclu@shu.edu.cn

Jon Williamson
King's College London, Strand, London WC2R 2LS, England, UK, j.williamson@kent.ac.uk