Foundations of Genetic Programming


Springer-Verlag Berlin Heidelberg GmbH

William B. Langdon
Riccardo Poli

Foundations of Genetic Programming

With 117 Figures and 12 Tables

Springer

William B. Langdon
Computer Science, University College, London
Gower Street, London, WC1E 6BT, UK
W.Langdon@cs.ucl.ac.uk

Riccardo Poli
Department of Computer Science, The University of Essex
Wivenhoe Park, Colchester, CO4 3SQ, UK
rpoli@essex.ac.uk

Library of Congress Cataloging-in-Publication Data
Langdon, W.B. (William B.)
Foundations of genetic programming / William B. Langdon, Riccardo Poli.
p. cm.
Includes bibliographical references and index.
ISBN 978-3-642-07632-9
ISBN 978-3-662-04726-2 (eBook)
DOI 10.1007/978-3-662-04726-2
1. Genetic programming (Computer science) I. Poli, Riccardo, 1961- II. Title
QA76.623.L35 2001 006.3'1-dc21 2001049394

ACM Subject Classification (1998): F.1.1, D.1.2-3, G.2.1, G.1.6, G.1.2, E.1, G.3, I.2.6, I.2.8, I.1.1-3

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

http://www.springer.de

Springer-Verlag Berlin Heidelberg 2002
Originally published by Springer-Verlag Berlin Heidelberg New York in 2002
Softcover reprint of the hardcover 1st edition 2002

The use of general descriptive names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by the authors
Cover Design: design & production, Heidelberg
Printed on acid-free paper
SPIN 11395881-45/3111SR - 5 4 3 2 1

To Caterina and Ludovico R.P.

Preface

Genetic programming (GP) has been highly successful as a technique for getting computers to automatically solve problems without having to tell them explicitly how to do it. Since its inception more than ten years ago genetic programming has been used to solve practical problems but along with this engineering approach there has been interest in how and why it works. This book consolidates this theoretical work.

One of the goals of any theoretical work is to better understand the subject. This is useful in its own right and as an aid to designing improvements. We will describe several new genetic operators that arose naturally from theoretical work and suggest modest changes to the way existing GP systems could be used on specific problems to yield improved performance. No doubt these operators and suggestions will be of direct practical interest, even to those who are not interested in "theory" for its own sake.

Genetic programming is one of a wide range of evolutionary computation techniques, such as evolutionary strategies and evolutionary programming, being itself a descendant of one of the oldest, Genetic Algorithms (GAs). It is nice to be able to report in this book that theoretical results from the "new boy", GP, can be directly applied to GAs. Since GP is more expressive than GAs, it can be viewed as a generalisation of GAs. In the same way, GP theory is a generalisation of GA theory, although, in fact, some recent advances in GP theory came first and the corresponding GA theory was derived by specialising the more general GP theory. In effect we are getting GA theory for free, from the GP theory. In this way the various strands of evolutionary computation theory are themselves coming together (although convergence is some way off).

The title of our book has, itself, a genetic pedigree. Its direct ancestor is a workshop of the same name held at the first Genetic and Evolutionary Computation Conference [Banzhaf et al., 1999], which we organised (together with Una-May O'Reilly, Justinian Rosca and Thomas Haynes) in July 1999, in Orlando. Prior to this (starting in 1990) there has been a long-running series of workshops called Foundations of Genetic Algorithms (FOGA).

More generally, the inspiration for "Foundations of Genetic Programming" came from a panel called "The next frontiers of AI: the role of foundations", held at EPIA 1995 [Pinto-Ferreira and Mamede, 1995]. On that occasion Riccardo put forward the view that the foundations of Artificial Intelligence (AI) are fundamental principles which are common to all disciplines within AI, be they artificial neural networks, evolutionary computation, theorem proving, etc. (see figures on the next page). The common feature of these techniques is search (although the representation being used to express solutions and the search used may be radically different). In our opinion search (be it deterministic or stochastic, complete or incomplete, blind, partially sighted, heuristic, etc.), the related representation, operators and objective functions are the foundations of AI.

So Foundations of Genetic Programming should not be viewed only as a collection of techniques that one needs to know in order to be able to do GP well but also as a first attempt to chart and explore the mechanisms and fundamental principles behind genetic programming as a search algorithm. In writing this book we hoped to cast a tiny bit of light onto the theoretical foundations of Artificial Intelligence as a whole.

Acknowledgements

We would like to thank Andy Singleton, Trevor Fenner, Tom Westerdale, Paul Vitanyi, Peter Nordin, Wolfgang Banzhaf, Nic McPhee, David Fogel, Tom Haynes, Sidney R. Maxwell III, Peter Angeline, Astro Teller, Rafael Bordini, Lee Spector, Lee Altenberg, Jon Rowe, Julian Miller, Xin Yao, Kevin P. Lucas, Martijn Bot, Robert Burbidge, Michael O'Neill, the people of the Chair of Systems Analysis (University of Dortmund), the Centrum voor Wiskunde en Informatica, Amsterdam, University College, London, and the members of the EEBIC group at the University of Birmingham.

We would also like to thank Axel Grossmann, Aaron Sloman, Stefano Cagnoni, Jun He, John Woodward, Vj Varma, Tim Kovacs, Marcos Quintana Hernandez and Peter Coxhead for their useful comments on drafts of the book.

Finally, we would like to thank numerous anonymous referees of our work over several years for particularly helpful comments and suggestions.

October 2001
W.B. Langdon
Riccardo Poli

[Figure: Artificial Intelligence can be seen as a cluster of islands in the sea. Labels recoverable from the figure: Genetic Algorithms, Neural Networks, Classical Artificial Intelligence.]

[Figure: Artificial Intelligence can be seen as a cluster of islands in the sea sharing a set of common foundations (cross-sectional view).]

Contents

1. Introduction
  1.1 Problem Solving as Search
    1.1.1 Microscopic Dynamical System Models
    1.1.2 Fitness Landscapes
    1.1.3 Component Analysis
    1.1.4 Schema Theories
    1.1.5 No Free Lunch Theorems
  1.2 What is Genetic Programming?
    1.2.1 Tree-based Genetic Programming
    1.2.2 Modular and Multiple Tree Genetic Programming
    1.2.3 Linear Genetic Programming
    1.2.4 Graphical Genetic Programming
  1.3 Outline of the Book

2. Fitness Landscapes
  2.1 Exhaustive Search
  2.2 Hill Climbing
  2.3 Fitness Landscapes as Models of Problem Difficulty
  2.4 An Example GP Fitness Landscape
  2.5 Other Search Strategies
  2.6 Difficulties with the Fitness Landscape Metaphor
  2.7 Effect of Representation Changes
  2.8 Summary

3. Program Component Schema Theories
  3.1 Price's Selection and Covariance Theorem
    3.1.1 Proof of Price's Theorem
    3.1.2 Price's Theorem for Genetic Algorithms
    3.1.3 Price's Theorem with Tournament Selection
    3.1.4 Applicability of Price's Theorem to GAs and GPs
  3.2 Genetic Algorithm Schemata
  3.3 From GA Schemata to GP Schemata
  3.4 Koza's Genetic Programming Schemata
  3.5 Altenberg's GP Schema Theory
  3.6 O'Reilly's Genetic Programming Schemata
  3.7 Whigham's Genetic Programming Schemata
  3.8 Summary

4. Pessimistic GP Schema Theories
  4.1 Rosca's Rooted Tree Schemata
  4.2 Fixed-Size-and-Shape Schemata in GP
  4.3 Point Mutation and One-Point Crossover in GP
  4.4 Disruption-Survival GP Schema Theorem
    4.4.1 Effect of Fitness Proportionate Selection
    4.4.2 Effect of One-Point Crossover
    4.4.3 Effect of Point Mutation
    4.4.4 GP Fixed-size-and-shape Schema Theorem
    4.4.5 Discussion
    4.4.6 Early Stages of a GP Run
    4.4.7 Late Stages of a GP Run
    4.4.8 Interpretation
  4.5 Summary

5. Exact GP Schema Theorems
  5.1 Criticisms of Schema Theorems
  5.2 The Role of Schema Creation
  5.3 Stephens and Waelbroeck's GA Schema Theory
  5.4 GP Hyperschema Theory
    5.4.1 Theory for Programs of Fixed Size and Shape
    5.4.2 Hyperschemata
    5.4.3 Microscopic Exact GP Schema Theorem
    5.4.4 Macroscopic Schema Theorem with Schema Creation
    5.4.5 Macroscopic Exact GP Schema Theorem
  5.5 Examples
    5.5.1 Linear Trees
    5.5.2 Comparison of Bounds by Different Schema Theorems
    5.5.3 Example of Schema Equation for Binary Trees
  5.6 Exact Macroscopic Schema Theorem for GP with Standard Crossover
    5.6.1 Cartesian Node Reference Systems
    5.6.2 Variable Arity Hyperschema
    5.6.3 Macroscopic Exact Schema Theorem for GP with Standard Crossover
  5.7 Summary

6. Lessons from the GP Schema Theory
  6.1 Effective Fitness
    6.1.1 Goldberg's Operator-Adjusted Fitness in GAs
    6.1.2 Nordin and Banzhaf's Effective Fitness in GP
    6.1.3 Stephens and Waelbroeck's Effective Fitness in GAs
    6.1.4 Exact Effective Fitness for GP
    6.1.5 Understanding GP Phenomena with Effective Fitness
  6.2 Operator Biases and Linkage Disequilibrium for Shapes
  6.3 Building Blocks in GAs and GP
  6.4 Practical Ideas Inspired by Schema Theories
  6.5 Convergence, Population Sizing, GP Hardness and Deception
  6.6 Summary

7. The Genetic Programming Search Space
  7.1 Experimental Exploration of GP Search Spaces
  7.2 Boolean Program Spaces
    7.2.1 NAND Program Spaces
    7.2.2 Three-Input Boolean Program Spaces
    7.2.3 Six-Input Boolean Program Spaces
    7.2.4 Full Trees
  7.3 Symbolic Regression
    7.3.1 Sextic Polynomial Fitness Function
    7.3.2 Sextic Polynomial Fitness Distribution
  7.4 Side Effects, Iteration, Mixed Arity: Artificial Ant
  7.5 Less Formal Extensions
    7.5.1 Automatically Defined Function
    7.5.2 Memory
    7.5.3 Turing-Complete Programs
  7.6 Tree Depth
  7.7 Discussion
    7.7.1 Random Trees
    7.7.2 Genetic Programming and Random Search
    7.7.3 Searching Large Programs
    7.7.4 Implications for GP
  7.8 Conclusions

8. The GP Search Space: Theoretical Analysis
  8.1 Long Random Linear Programs
    8.1.1 An Illustrative Example
    8.1.2 Rate of Convergence and the Threshold
    8.1.3 Random Functions
    8.1.4 The Chance of Finding a Solution
  8.2 Big Random Tree Programs
    8.2.1 Setting up the Proof for Trees
    8.2.2 Large Binary Trees
    8.2.3 An Illustrative Example
    8.2.4 The Chance of Finding a Solution
    8.2.5 A Second Illustrative Example
  8.3 XOR Program Spaces
    8.3.1 Parity Program Spaces
    8.3.2 The Number of Parity Solutions
    8.3.3 Parity Problems Landscapes and Building Blocks
  8.4 Conclusions

9. Example I: The Artificial Ant
  9.1 The Artificial Ant Problem
  9.2 Size of Program and Solution Space
  9.3 Solution of the Ant Problem
    9.3.1 Uniform Random Search
    9.3.2 Ramped Half-and-Half Random Search
    9.3.3 Comparison with Other Methods
  9.4 Fitness Landscape
  9.5 Fixed Schema Analysis
    9.5.1 Competition Between Programs of Different Sizes
    9.5.2 Competition Between Programs of Size 11
    9.5.3 Competition Between Programs of Size 12
    9.5.4 Competition Between Programs of Size 13
  9.6 The Solutions
  9.7 Discussion
  9.8 Reducing Deception
  9.9 Conclusions

10. Example II: The Max Problem
  10.1 The MAX Problem
  10.2 GP Parameters
  10.3 Results
    10.3.1 Impact of Depth Restriction on Crossover
    10.3.2 Trapping by Suboptimal Solutions
    10.3.3 Modelling the Rate of Improvement
    10.3.4 Number of Steps to Climb the Hill
  10.4 Variety
    10.4.1 Variety in the Initial Population
    10.4.2 Evolution of Variety
    10.4.3 Modelling Variety
  10.5 Selection Pressure
  10.6 Applying Price's Covariance and Selection Theorem
  10.7 Conclusions

11. GP Convergence and Bloat
  11.1 Convergence
  11.2 Bloat
    11.2.1 Examples of Bloat
    11.2.2 Convergence of Phenotype
    11.2.3 Theories of Bloat
    11.2.4 Fitness Variation is Needed for Bloat
  11.3 Subquadratic Bloat
    11.3.1 Evolution of Program Shapes
    11.3.2 Experiments
    11.3.3 Results
    11.3.4 Convergence
  11.4 Depth and Size Limits
  11.5 Discussion
  11.6 AntiBloat Techniques
  11.7 Conclusions

12. Conclusions

A. Genetic Programming Resources

Bibliography

List of Special Symbols

Glossary

Index