Evolutionary Artificial Neural Networks For Medical Data Classification


Evolutionary Artificial Neural Networks For Medical Data Classification

GRADUATE PROJECT

Submitted to the Faculty of the Department of Computing Sciences
Texas A&M University-Corpus Christi
Corpus Christi, Texas

In Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Science

By Narendra Kumar Reddy Obbineni
Spring 2017

Committee Members:
Dr. Ajay Katangur, Committee Chairperson
Dr. David Thomas, Committee Member

ABSTRACT

Artificial Neural Networks have been increasingly used for classifying medical data, as they consistently perform better on plain data than other techniques such as K-Nearest Neighbor, Decision Trees and Support Vector Machines. This project uses the Biogeography Based Optimizer and Genetic Algorithm techniques to train a Multilayer Perceptron on the Pima Indians Diabetes dataset, using factors such as Number of Pregnancies, Glucose, Blood Pressure, Triceps Skin Thickness, Insulin, Body Mass Index, Diabetes Pedigree Function and Age to predict whether a female patient has diabetes. The user can interact with the application either through the GUI or through the Matlab IDE [1]. The user can also run the application with the K-Fold cross validation technique to get a more precise estimate of the classification algorithm's predictive performance.

TABLE OF CONTENTS

1 INTRODUCTION
2 BACKGROUND & RATIONALE
3 MULTILAYER PERCEPTRON
  3.1 Weights and Biases
  3.2 Training
  3.3 Deep Learning
4 BIOGEOGRAPHY BASED OPTIMIZER
  4.1 Overview of the BBO Algorithm
5 GENETIC ALGORITHMS
  5.1 Selection in Genetic Algorithm
  5.2 Crossover in Genetic Algorithm
  5.3 Mutation in Genetic Algorithm
6 NARRATIVE
  6.1 Problem Statement
  6.2 Project Objective
  6.3 Scope of the Project
  6.4 Project Functionality
7 PROPOSED SYSTEM DESIGN
  7.1 Use Case Diagram
  7.2 Class Diagram
  7.3 Sequence Diagram
  7.4 User Interface
  7.5 BBO Options
  7.6 GA Options
  7.7 General Options
  7.8 Output Window
  7.9 Output of Convergence Curves and Classification Accuracies
8 IMPLEMENTATION OF THE APPLICATION MODULES
  8.1 Normalizing the Dataset provided
  8.2 Using the GUI to input the Algorithm & General Options
  8.3 Using the Matlab IDE to edit the Algorithm & Data Options
  8.4 Running the application with a K-Fold Cross Validation Technique
  8.5 Initiating the training of the MLP with BBO & GA
  8.6 Initializing the MLP
  8.7 Calculating the output using the Multilayer Perceptron
  8.8 Calculating the cost of a Generation
  8.9 Training the MLP using Biogeography Based Optimizer
  8.10 Training the MLP using Genetic Algorithms
  8.11 Testing the MLP on the testing dataset
  8.12 Displaying the convergence curves & classification rates in a new GUI
9 TESTING AND EVALUATION
  9.1 Normalizing data before the application is run
  9.2 Running the application from a GUI
  9.3 Running the application from the Matlab IDE
  9.4 Running the application by changing the values of BBO, GA and General parameters
  9.5 Testing the Application with a new set of parameters
  9.6 Running the application using the K-Fold Cross Validation (K-Fold Value as 10)
  9.7 Running the application using the K-Fold Cross Validation (K-Fold Value as 5)
  9.8 The Comparison of Convergence Curves for the K-Fold Run
  9.9 The Comparison of Convergence Curves for the Multiple Populations Run
10 CONCLUSION
11 FUTURE WORK
12 Bibliography

LIST OF FIGURES

Figure 3.3.1 How the species move between habitats
Figure 3.3.2 The probability that a habitat contains s species
Figure 5.2.1 Single Point Crossover
Figure 5.2.2 Two Point Crossover
Figure 5.2.3 Uniform Crossover
Figure 5.3.1 Mutation example
Figure 7.1.1 Use Case diagram for Training of Multilayer Perceptron with BBO & GA
Figure 7.2.1 Class Diagram for training the Multilayer Perceptron using BBO and GA
Figure 7.3.1 Sequence diagram for training the Multilayer Perceptron using BBO and GA
Figure 7.4.1 The user interface for entering the Algorithm and Data Options
Figure 7.5.1 BBO Options
Figure 7.6.1 Genetic Algorithm Options
Figure 7.7.1 General Options
Figure 7.8.1 Output Window
Figure 7.9.1 Cost Convergence curve and Classification Accuracies bar chart
Figure 8.1.1 Function to normalize diabetes dataset
Figure 8.2.1 Code Snippet for the GUI
Figure 8.2.2 Code Snippet of the Main Class
Figure 8.3.1 Code Snippet for setting the General Options
Figure 8.3.2 Code Snippet for saving the training and testing data options into a Struct
Figure 8.3.3 Code Snippet for saving the BBO options into a Struct
Figure 8.3.4 Code Snippet for saving the GA options into a Struct
Figure 8.4.1 Visual representation of how the K-Fold Cross Validation happens
Figure 8.4.2 Code snippet that will display all the points where the data array will be split
Figure 8.4.3 Code snippet that performs the K-Fold Cross validation splitting and starts the application
Figure 8.5.1 Code snippet where the training of the MLP will be initiated
Figure 8.6.1 Code snippet where the weights and hidden node biases are initialized
Figure 8.7.1 Code snippet where the output of the multilayer perceptron is calculated
Figure 8.8.1 Code snippet where the cost for every member of the population is calculated
Figure 8.9.1 Code snippet for calculating the species count of all the islands
Figure 8.9.2 Code snippet for calculating the immigration and emigration rates
Figure 8.9.3 Code snippet for exchanging species between islands
Figure 8.9.4 Code snippet to mutate worst half of the habitats
Figure 8.9.5 Code snippet to replace worst performers of current generation with best performers of the previous generation
Figure 8.10.1 Code snippet for Selection
Figure 8.10.2 Code snippet for Crossover
Figure 8.10.3 Code snippet for Mutation
Figure 8.11.1 Code snippet for determining the classification rates
Figure 8.12.1 Code snippet for displaying the cost convergence curves and classification rates
Figure 9.2.1 Loading the GUI
Figure 9.2.2 Matlab Canvas Editor
Figure 9.2.3 The application GUI
Figure 9.2.4 Cost Convergence Curve and Classification Accuracies
Figure 9.3.1 Classification Accuracies displayed in Matlab Console
Figure 9.3.2 Cost Convergence Curve and Classification Accuracies
Figure 9.4.1 Options GUI
Figure 9.4.2 Cost Convergence Curves and Classification Accuracies
Figure 9.5.1 Options GUI
Figure 9.5.2 Cost Convergence Curve and Classification Accuracies
Figure 9.6.1 Classification Accuracies displayed in the console
Figure 9.7.1 Classification accuracies displayed at the console
Figure 9.8.1 BBO Cost Convergence Curves
Figure 9.8.2 GA Cost Convergence Curves
Figure 9.9.1 Cost Convergence Curves for Multiple Population Runs

LIST OF TABLES

Table 9.2.1 Default Parameters for the application
Table 9.4.1 Parameters that have been changed for the run
Table 9.5.1 Parameters that have been changed for the run
Table 9.6.1 Parameters that have been changed for the run
Table 9.7.1 Parameters changed for the run
Table 9.9.1 K-Fold Cross Validation Results

1 INTRODUCTION

In this project, a Matlab application is developed for training a multilayer perceptron [2] using the Biogeography Based Optimizer (BBO) and Genetic Algorithms (GA). The application has a GUI developed on the Matlab platform. The GUI allows the user to input the BBO algorithm options (Habitat Modification Probability, Initial Mutation Probability, Maximum Immigration Rate for each island, Maximum Emigration Rate for each island, Lower Bound for the immigration probability per gene, Upper Bound for the immigration probability per gene, and Number of best habitats to persist between generations) and the GA options (Crossover Type: Single Point, Two Point or Uniform; Crossover Probability; Mutation Probability; and Number of best individuals to keep from one generation to the next).

The GUI also allows the user to select the number of records to be used for training and testing, the number of input parameters in the dataset, the number of hidden nodes of the multilayer perceptron, the population size, and the number of generations for which the BBO and GA algorithms are run to train the multilayer perceptron. The GUI also displays the output of the classification program.

The user also has the option to run the program directly from the Matlab IDE. In this case, the program reads the BBO and GA algorithm options as well as the general options for training the multilayer perceptron from a text file. The user can also choose to run the program with the K-Fold Cross Validation technique [3].

The program has been trained and tested with a total of 770 data records. If the program is run just once, then by default 693 data records are used for training and 77 data records are used for testing. If the K-Fold cross validation option is chosen, the dataset is divided into 10 equal parts by default and the program is run 10 times; during each run, one of the blocks is used for testing, cycling through all the blocks over the 10 runs, while the other 9 blocks are used for training.

2 BACKGROUND & RATIONALE

Artificial neural networks (ANNs) [4] are mathematical models that use artificial neurons inspired by the connectivity of neurons and axons in the human brain. The reasoning behind ANNs is that if computers were more like the brain, they could be good at some of the things humans are good at, such as pattern recognition and classification. The basic building block of an ANN is the artificial neuron. Each artificial neuron can have one or more weighted inputs feeding into it, and it generates its output by applying an activation function. ANNs are increasingly being used for machine learning because they can be trained from an existing dataset rather than through detailed hand-coded logic. ANNs are good universal approximators, as they can learn over multiple generations how to map the data they train on to the output the data expects.

A basic ANN may consist of multiple layers of artificial neurons (a layer being a set of neurons arranged in a row), where each neuron in the current layer connects to each neuron in the next layer, thereby forming a network. One type of network relevant to the current topic is the feedforward neural network, in which the connections between the individual neurons and layers do not form a cycle. A feedforward neural network typically consists of an input layer followed by one or more hidden layers and a final output layer. A set of inputs is passed to the first hidden layer, the activations from that layer are passed to the next layer, and so on until the output layer is reached, where the results of the classification are determined by the scores at each node. This happens for each set of inputs. This series of events, starting from the input, where each activation is sent to the next layer, and then the next, all the way to the output, is known as forward propagation.

3 MULTILAYER PERCEPTRON

The first neural nets were born out of the need to address the inaccuracy of an early classifier, the perceptron. It was shown that by using a layered web of perceptrons [4], the accuracy of predictions could be improved. As a result, this new breed of neural net is called the Multi-Layer Perceptron, or MLP.

3.1 Weights and Biases

Each node in an MLP applies the same kind of classifier, and none of them fire randomly; if we repeat an input, we get the same output. Each edge connecting a neuron in one layer to a neuron in the next layer has a unique weight, and each neuron that receives its input from neurons in an earlier layer has a unique bias. This means that the combination used for each activation is also unique, which is why the neurons fire differently.

3.2 Training

The process of improving a neural net's accuracy is called training. The prediction accuracy of a neural net depends on its weights and biases. For the neural network to predict a value as close to the actual output as possible, the weights and biases need to be changed slightly until the desired output is achieved. In training the neural net, the output from forward propagation is compared to the output that is known to be correct, and the cost is the difference between the two. The point of training is to make that cost as small as possible across hundreds of training examples. To do this, the neural network tweaks the weights and biases step by step until the prediction closely matches the correct output. Once trained well, a neural network has the potential to make accurate predictions each time.

3.3 Deep Learning

To analyze simple patterns, a basic classification tool like an SVM or logistic regression is typically good enough. But when the data has tens of different inputs or more, neural nets start to win out over other methods. Still, as the patterns get even more complex, shallow neural networks with a small number of layers can become unusable; the only practical choice is a deep net [5]. One of the key reasons a deep net can recognize these complex patterns is that it is able to break them down into a series of simpler patterns. For example, suppose a neural network had to decide whether an image contained a human face. The deep net would first use edges to detect the different parts of the face (the lips, nose, eyes, ears and so on) and would then combine the results to form the whole face. This ability to use simpler patterns as building blocks to detect complex patterns is what gives deep nets their strength, and the accuracy of deep nets has been very impressive.

4 BIOGEOGRAPHY BASED OPTIMIZER

The Biogeography-based optimizer (BBO) is an optimization algorithm [6] inspired by biological evolution, specifically by how organisms migrate between islands (also known as habitats) and evolve. It arrives at the global optimum of a function by randomly generating candidate solutions and then iteratively improving them (by creating new solutions or moving solution features between habitats) based on how close the candidate solutions are to the expected solution.

Figure 3.3.1 How the species move between habitats

The Habitat Suitability Index (HSI) denotes how supportive of life an island is, based on suitability index variables (SIVs) such as rainfall, vegetation, topography and temperature. Islands with a high HSI have larger populations because of the factors favoring survival and reproduction. Species move from high-HSI islands to low-HSI islands, since islands with a high HSI host too many species. In the BBO algorithm, when species migrate away from an island, it is assumed that the species become extinct on that island.

The probability P_s that a habitat contains exactly s species is calculated as shown in Figure 3.3.2.

Figure 3.3.2 The probability that a habitat contains s species

4.1 Overview of the BBO Algorithm

1. Compute an initial set of parameters (weights and biases).
2. Calculate the Habitat Suitability Index (or fitness) of each habitat.
3. Compute the immigration rate, emigration rate and number of species for every solution.
4. Move species between habitats based on the immigration and emigration rates.
5. Mutate the worst half of the population based on the fitness function.
6. Replace the worst habitats of the current generation with the best habitats of the previous generation.
7. Repeat from step 2 until the desired number of generations is reached.

A small worked sketch of this loop is given after the list.
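To make the seven steps concrete, the following is a self-contained MATLAB toy example (not the project's code, which appears only in the figures of Chapter 8): it runs BBO with a linear migration model on a stand-in fitness function, minimizing a sum of squares. All names, and the mutation rate of 0.05, are illustrative assumptions.

    % Toy BBO run: minimize sum(x.^2) with n islands and d variables per island.
    n = 20; d = 5; maxGen = 100;
    pop = 2*rand(n, d) - 1;                           % step 1: random solutions
    for gen = 1:maxGen
        cost = sum(pop.^2, 2);                        % step 2: fitness (lower is better)
        [cost, idx] = sort(cost);  pop = pop(idx, :); % best island first
        S = (n - (1:n))';                             % species counts, descending
        lambda = 1 - S/n;                             % step 3: immigration rates
        mu = S/n;                                     %         emigration rates
        newPop = pop;
        for k = 1:n                                   % step 4: migration
            for v = 1:d
                if rand < lambda(k)                   % island k accepts a feature
                    src = find(rand*sum(mu) <= cumsum(mu), 1);  % roulette on mu
                    newPop(k, v) = pop(src, v);
                end
            end
        end
        half = ceil(n/2);                             % step 5: mutate worst half
        mask = rand(n-half, d) < 0.05;
        noise = 2*rand(n-half, d) - 1;
        tmp = newPop(half+1:end, :);  tmp(mask) = noise(mask);
        newPop(half+1:end, :) = tmp;
        newPop(end-1:end, :) = pop(1:2, :);           % step 6: elitism
        pop = newPop;                                 % step 7: next generation
    end
    disp(min(sum(pop.^2, 2)))                         % best cost found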

5 GENETIC ALGORITHMS

The Genetic Algorithm [7] is an evolutionary algorithm inspired by the process of natural selection. A genetic algorithm approaches an optimization problem by initially creating a random set of solutions and evolving it toward better solutions using techniques such as selection, crossover (how the parents are combined to form children) and mutation (random changes to parts of individuals).

5.1 Selection in Genetic Algorithm

In this stage, individuals of the current generation are selected and passed on to the crossover stage to produce the individuals of the next generation. One way to do this is to sort the individuals by their fitness values and select two sets of individuals, starting from the first individual and continuing to the end.

5.2 Crossover in Genetic Algorithm

In this stage, the individuals passed on from the selection process (the parents) are combined to form new individuals (the children). Each child shares at least some of the characteristics of both parents. Techniques that can be used to perform the crossover include Single-Point, Two-Point and Uniform crossover.

Single Point: A single point is selected on each of the parents, and the parts on one side of the point are swapped between the parents, producing the children. An example of single-point crossover is shown in Figure 5.2.1.

Figure 5.2.1 Single Point Crossover

Two Point: Two points are selected on each of the parents, and the part between them is swapped, producing the children. An example of two-point crossover is shown in Figure 5.2.2.

Figure 5.2.2 Two Point Crossover

Uniform: In this technique, a random value is generated for each variable and compared to an agreed-upon crossover probability. If the random value is greater than the probability, no swap takes place for that variable; if it is less than the probability, the parent variables are swapped and assigned to the children. An example of uniform crossover is shown in Figure 5.2.3.

Figure 5.2.3 Uniform Crossover

5.3 Mutation in Genetic Algorithm

In this stage, a set of the individual's variables is replaced by new variables. The number of variables that are replaced is governed by the mutation probability. An example of mutation is shown in Figure 5.3.1, and a sketch of all three operators follows below.

Figure 5.3.1 Mutation example
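To illustrate the three operators on bit strings, here is a minimal MATLAB sketch (illustrative only; the project's own implementation is shown in Figures 8.10.1 to 8.10.3, and all names here are hypothetical):

    % Two parent chromosomes, written as bit strings for clarity.
    p1 = [1 0 1 1 0 1 0 0];
    p2 = [0 1 0 0 1 0 1 1];
    L  = numel(p1);

    % Single-point crossover: swap everything after a random cut point.
    cut = randi(L - 1);
    c1 = [p1(1:cut), p2(cut+1:end)];
    c2 = [p2(1:cut), p1(cut+1:end)];

    % Uniform crossover: swap each gene independently with probability pc.
    pc = 0.5;
    mask = rand(1, L) < pc;
    u1 = p1;  u1(mask) = p2(mask);
    u2 = p2;  u2(mask) = p1(mask);

    % Mutation: flip each gene of a child with probability pm.
    pm = 0.01;
    flips = rand(1, L) < pm;
    c1(flips) = 1 - c1(flips);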

6 NARRATIVE

6.1 Problem Statement

The Arizona Pima Indians have the highest rate of diabetes in the United States, compared to the genetically similar Pima Indians in Mexico. In 1890, when the water supply of the Arizona Pima Indians changed and their diet shifted from traditional foods to one laden with fat and sugar, the percentage of the population with obesity and diabetes rose. A subset of Pima Indians who migrated to Mexico and maintained a traditional diet has remained healthy. A diabetes dataset of Arizona Pima Indian women aged 21 years and older has been provided, based on 8 properties: number of times pregnant, plasma glucose concentration, diastolic blood pressure, triceps skinfold thickness, 2-hour serum insulin, body mass index, diabetes pedigree function and age. The outcome has also been provided: the value is 1 if the woman has diabetes and 0 otherwise.

6.2 Project Objective

The main objective of the proposed system is to train a multilayer perceptron [8] using techniques such as BBO [9] and GA on a subset of the data provided, and to test it on the remaining data to see how viable the techniques are. At the end of the run, the proposed system reports how well each of BBO and GA performed on the test data.

6.3 Scope of the Project

With the explosion of medical data being generated, tools are in dire need to make sense of the data. Neural networks have been increasingly used on medical data because they can be trained from data alone and their classification is reliable. The network developed in this project can be used to train on other types of medical data in the future.

6.4 Project Functionality

The project has been developed in Matlab. The application provides the following functionality:

1. Ability to run the application from a GUI.
2. Ability to input the BBO and GA algorithm options as well as the general training and testing data options on the GUI before the run.
3. Ability to run the application from the Matlab IDE.
4. Ability to read/edit the BBO and GA algorithm variables as well as the general training and testing options from a text file when the application is run from the Matlab IDE.
5. Ability to run the application using K-Fold cross validation from the Matlab IDE.
6. Ability to set the number of K-Fold cross validation runs.

7 PROPOSED SYSTEM DESIGN

7.1 Use Case Diagram

The use case diagram for training the Multi-Layer Perceptron using the BBO and GA algorithms is shown in Figure 7.1.1. The ovals represent the functionality available to the user. The user can run the application through the GUI or from the Matlab IDE. When the user runs the application through the GUI, all the general data options, BBO algorithm options and GA options are sent to the Matlab program, the program starts training the multilayer perceptron, and the program output is sent back to the GUI; this output is displayed in a new tab of the Matlab GUI called Output. If the user instead edits the data options in a text file and runs the program from the Matlab IDE, the output of the training is displayed in the Matlab IDE console. In both cases, the convergence curves and the classification rates for BBO and GA are displayed in a new GUI at the end of the run.

Figure 7.1.1 Use Case diagram for Training of Multilayer Perceptron with BBO & GA

7.2 Class Diagram

The class diagram shown in Figure 7.2.1 displays the most important parts of the application, such as the Main class, which initiates the training with the BBO and GA algorithms. BBO and GA in turn call the MLP Trainer class [10] to train the multilayer perceptron over multiple generations. After the training is done, the Main class calls the MLP Trainer class to determine how well the trained multilayer perceptron performs on the testing data.

Figure 7.2.1 Class Diagram for training the Multilayer Perceptron using BBO and GA

7.3 Sequence Diagram

The sequence diagram displays how communication happens between the user and the application. Shown in Figure 7.3.1 is the sequence diagram for the case where the user inputs the general data options as well as the BBO and GA options in the Matlab GUI and runs the application. As shown in the figure, over multiple generations BBO and GA train the Multilayer Perceptron, and the output is sent back at the end

of each generation to the GUI window. After the MLP has been trained and tested, the convergence and classification data are displayed in a GUI window.

Figure 7.3.1 Sequence diagram for training the Multilayer Perceptron using BBO and GA

7.4 User Interface

The user interface shown in Figure 7.4.1 has been developed using the Matlab GUI editor. It initially loads with a default set of BBO, GA and General options.

Figure 7.4.1 The user interface for entering the Algorithm and Data Options

7.5 BBO Options

The BBO options that can be changed are: Habitat Modification Probability, Initial Mutation Probability, Maximum Immigration Rate for each island, Maximum Emigration Rate for each island, Lower Bound for the immigration probability per gene, Upper Bound for the immigration probability per gene, and Number of best habitats to persist from one generation to the next. The GUI for the BBO options is shown in Figure 7.5.1.

Figure 7.5.1 BBO Options

7.6 GA Options

The GA options that can be changed on the GUI are: Crossover Type (single point, two point or uniform), Crossover Probability, Initial Mutation Probability, and Number of best individuals to persist from one generation to the next. The GUI for the GA options is shown in Figure 7.6.1.

Figure 7.6.1 Genetic Algorithm Options

7.7 General Options

The General Options that can be changed on the GUI are: Number of Data Records, Number of Records for Training, Number of Input Parameters, Number of Hidden Nodes, Population Size and Number of Generations. The General Options GUI is shown in Figure 7.7.1.

Figure 7.7.1 General Options

7.8 Output Window

After the user clicks Run, all the options on the GUI are sent to the application for the run. Once the application starts running, it sends its output back to the GUI. A sample of the output displayed in the output window is shown in Figure 7.8.1.

Figure 7.8.1 Output Window

7.9 Output of Convergence Curves and Classification Accuracies

After the application finishes training and testing, the resulting convergence curves and classification accuracies are displayed in a new window. In the convergence curve, the cost of each generation is plotted against the generation number. The bar chart shows the classification accuracy, in percent, of each algorithm used to train the MLP. A sample GUI with convergence curves and classification accuracies is shown in Figure 7.9.1.

Figure 7.9.1 Cost Convergence curve and Classification Accuracies bar chart

8 IMPLEMENTATION OF THE APPLICATION MODULES

8.1 Normalizing the Dataset provided

The Arizona Pima Indian diabetes dataset provided is normalized in this step. The original diabetes dataset is saved in the file diabetes.csv. All the columns in the dataset except the outcome are normalized using the Min-Max normalization technique. Some of the input parameters are recorded as 0 in the dataset; those values are set to NaN before normalization. The min-max normalized value of an entry is obtained by subtracting the minimum value of its column and dividing by the difference between the maximum and minimum values of the column, i.e. x' = (x - min) / (max - min). The code snippet for the normalization function is shown in Figure 8.1.1, and a sketch of the same idea follows.

Figure 8.1.1 Function to normalize diabetes dataset
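Since the original snippet survives here only as a figure caption, a minimal MATLAB sketch of such a min-max normalization is given below; it assumes a purely numeric CSV with the outcome in the last column, and treats zeros in every input column as missing, which is a simplification:

    % Min-max normalize the diabetes data; the outcome column is left alone.
    data = csvread('diabetes.csv');             % assumes no header row
    X = data(:, 1:end-1);                       % the 8 input columns
    X(X == 0) = NaN;                            % zeros denote missing values
    mn = min(X, [], 1, 'omitnan');              % per-column minimum
    mx = max(X, [], 1, 'omitnan');              % per-column maximum
    Xn = (X - mn) ./ (mx - mn);                 % x' = (x - min) / (max - min)
    csvwrite('diabetesprocessed.csv', [Xn, data(:, end)]);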

8.2 Using the GUI to input the Algorithm & General Options

The GUI has been developed using the Matlab drag-and-drop editor [11]. All the data options entered on the interface become part of the app object. When the user clicks Run, the app

object is passed to the Main class, where the training process of the MLP starts. In the Main class, the app variables are read and assigned to local variables of the class, which are then used by the Main class and the functions it calls. The code snippet for the GUI is shown in Figure 8.2.1.

Figure 8.2.1 Code Snippet for the GUI

The code snippet for the Main Class, where the app object is read for the algorithm and data options, is shown in Figure 8.2.2.

Figure 8.2.2 Code Snippet of the Main Class

8.3 Using the Matlab IDE to edit the Algorithm & Data Options

The second way to start the application is from the Matlab IDE. The user can edit the BBO, GA and General Options and call the Main class, where the training of the MLP begins. Global variables in Matlab can be used to edit and fetch the value of a variable across the entire project. A global struct called OPTIONS is declared here and filled with the Algorithm & Data options; the struct OPTIONS is then read across the project to fetch those options. The code snippet showing how the General Options can be set when the application is run from the Matlab IDE is shown in Figure 8.3.1, and a short sketch of the pattern follows.

Figure 8.3.1 Code Snippet for setting the General Options
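A minimal sketch of the global-struct pattern described above (the field names are hypothetical; the project's actual fields appear in Figures 8.3.1 to 8.3.4, and the values follow the defaults of Table 9.2.1):

    % Declare the shared options struct once, then fill it in.
    global OPTIONS
    OPTIONS.numRecords  = 770;   % total data records
    OPTIONS.numTraining = 693;   % records used for training
    OPTIONS.numInputs   = 8;     % input parameters per record
    OPTIONS.numHidden   = 19;    % hidden nodes in the MLP
    OPTIONS.popSize     = 50;    % population size
    OPTIONS.maxGen      = 249;   % number of generations

    % Any function in the project can then read the same struct:
    % function startTraining()
    %     global OPTIONS
    %     pop = rand(OPTIONS.popSize, numVars);
    % end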

The code snippet showing how the general data options are saved into the global struct OPTIONS is shown in Figure 8.3.2.

Figure 8.3.2 Code Snippet for saving the training and testing data options into a Struct

The code snippet showing how the BBO algorithm options are saved into the

global struct OPTIONS is shown in Figure 8.3.3.

Figure 8.3.3 Code Snippet for saving the BBO options into a Struct

The code snippet showing how the GA algorithm options are saved into the global struct OPTIONS is shown in Figure 8.3.4.

Figure 8.3.4 Code Snippet for saving the GA options into a Struct

8.4 Running the application with a K-Fold Cross Validation Technique

With the K-Fold Cross Validation technique, the dataset of 770 records is split

into 10 groups [3]. The application is run 10 times. During each run, one of the groups is selected as the testing group, starting from the first group, and the other 9 groups are used to train the MLP. A visual representation of the K-Fold Cross Validation is shown in Figure 8.4.1.

Figure 8.4.1 Visual representation of how the K-Fold Cross Validation happens

The snippet of code showing all the positions where the data array will be split is shown in Figure 8.4.2, and a sketch of the computation follows.

Figure 8.4.2 Code snippet that will display all the points where the data array will be split
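A sketch of the fold bookkeeping (hypothetical names; the project's code is in Figures 8.4.2 and 8.4.3):

    % Boundaries that split 770 records into 10 equal folds of 77 records.
    numRecords = 770;  k = 10;
    foldSize = numRecords / k;
    splits = (0:k) * foldSize;                       % 0, 77, 154, ..., 770

    for fold = 1:k
        testIdx  = splits(fold)+1 : splits(fold+1);  % one fold for testing
        trainIdx = setdiff(1:numRecords, testIdx);   % the other 9 folds train
        % train the MLP on trainIdx and evaluate it on testIdx here
    end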

The snippet of code that performs the K-Fold cross validation splitting and starts the application is shown in Figure 8.4.3.

Figure 8.4.3 Code snippet that performs the K-Fold Cross validation splitting and starts the application

8.5 Initiating the training of the MLP with BBO & GA

After the algorithm and general options have been loaded into the Matlab workspace, the training of the MLP using BBO and GA is initiated. The snippet of code where the training is initiated is shown in Figure 8.5.1.

Figure 8.5.1 Code snippet where the training of the MLP will be initiated

8.6 Initializing the MLP

The MLP has three layers of nodes and is initialized according to the values given in the General data options. The first layer consists of the input nodes [ref: No. of Input Params in the GUI]. The second layer consists of the hidden nodes [ref: No. of Hidden Nodes in the GUI], and the third layer consists of a single output node [matching the diabetes dataset]. Every input node connects to each hidden node, and every hidden node connects to the output node. Each hidden node also has a bias term. Connections are represented as weights in the application. All the weights and bias terms are initialized during the creation of the MLP for each member of the population [ref: Population Size in the GUI]. The number of variables that must be initialized for each member of the population is

(No. of input nodes * No. of hidden nodes) + No. of hidden nodes + No. of hidden nodes

that is, the input-to-hidden weights, the hidden node biases, and the hidden-to-output weights. The code snippet where the weights and hidden node biases are initialized is shown in Figure 8.6.1, and a sketch follows.

Figure 8.6.1 Code snippet where the weights and hidden node biases are initialized
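A minimal sketch of this initialization (hypothetical names; the value range is an assumption):

    % One candidate solution holds all MLP weights and biases as a flat vector.
    numIn = 8;  numHid = 19;  popSize = 50;
    numVars = numIn*numHid + numHid + numHid;   % in-to-hid weights, hid biases,
                                                % hid-to-out weights
    population = 2*rand(popSize, numVars) - 1;  % uniform in [-1, 1] (assumption)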

8.7 Calculating the output using the Multilayer Perceptron

The output of the MLP is calculated as follows. A single row from the dataset forms the set of inputs. The linear combination of the inputs and the weights from the input layer to each hidden node is calculated; the input to the hidden node is this linear combination plus the bias term. The output of the hidden node is the sigmoid function of its input, 1 / (1 + e^(-x)). The linear combination of all the hidden node outputs and the weights from the hidden layer to the output node is then calculated, and it forms the input to the output node. The output of the output node is again the sigmoid of its input, 1 / (1 + e^(-x)). The code snippet where the output of the multilayer perceptron is calculated from the input values, weights and bias terms is shown in Figure 8.7.1, and a sketch follows.

Figure 8.7.1 Code snippet where the output of the multilayer perceptron is calculated
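A minimal sketch of this forward pass for one data row (hypothetical names; note there is no output bias, matching the variable count of Section 8.6):

    % Forward pass of one row x through an 8-19-1 MLP encoded in vars.
    numIn = 8;  numHid = 19;
    vars = 2*rand(1, numIn*numHid + 2*numHid) - 1;  % one candidate solution
    x = rand(1, numIn);                             % one normalized data row
    sigmoid = @(z) 1 ./ (1 + exp(-z));

    W1 = reshape(vars(1:numIn*numHid), numHid, numIn);  % input-to-hidden weights
    b1 = vars(numIn*numHid + (1:numHid))';              % hidden node biases
    W2 = vars(numIn*numHid + numHid + (1:numHid));      % hidden-to-output weights

    h = sigmoid(W1 * x(:) + b1);    % hidden layer activations
    y = sigmoid(W2 * h);            % single output in (0, 1)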

8.8 Calculating the cost of a Generation

The goal of training the MLP is to get its output as close to the expected output as possible. The cost variable tells us how close the output of the MLP is to the expected output. In a generation, the cost is calculated across all the training records for every member of the population. The error is the difference between the expected output and the output of the MLP, and the mean squared error is calculated over all the training records for each member of the population. The code snippet for the cost calculation for every member of the population is shown in Figure 8.8.1, and a sketch follows the figure caption below.

the training datasets and for all members of the population. The code snippet for the cost calculation for every member of the population is shown in the Figure 8.8.1. Figure 8.8.1 Code snippet where the cost for every member of the population is calculated 42

8.9 Training the MLP using Biogeography Based Optimizer

In the context of BBO, the Population Size given in the GUI maps to the number of islands. In the first step, a set of variables (the weights and biases of the MLP) is initialized for every island. The cost of each island is then calculated using the fitness function. An island with a lower cost has a higher Habitat Suitability Index (HSI), and hence more species live on that island. If the islands are sorted by cost in ascending order, their respective species counts are in descending order. The code snippet for calculating the species count of all the islands is shown in Figure 8.9.1.

Figure 8.9.1 Code snippet for calculating the species count of all the islands

After the species counts have been initialized, the immigration rate and emigration rate of every island are calculated (a toy sketch of this rate model appears in Section 4.1). The code snippet for calculating the immigration and emigration rates is shown in Figure 8.9.2.

Figure 8.9.2 Code snippet for calculating the immigration and emigration rates

The immigration and emigration rates are used to move species between islands. The code snippet for exchanging species between islands is shown in Figure 8.9.3.

Figure 8.9.3 Code snippet for exchanging species between islands

Based on the fitness function, the worst half of the habitats is mutated. The code snippet to mutate the worst half of the habitats is shown in Figure 8.9.4.

Figure 8.9.4 Code snippet to mutate worst half of the habitats

The variables of the two worst performing islands of the current generation are replaced by the variables of the two best performing islands from the previous generation. The code snippet for this replacement is shown in Figure 8.9.5.

Figure 8.9.5 Code snippet to replace worst performers of current generation with best performers of the previous generation

All the above steps are repeated until the desired number of generations is reached; a sketch of the mutation and elitism steps follows.
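A minimal sketch of these two steps (hypothetical names; the project's code is in Figures 8.9.4 and 8.9.5):

    % Mutate the worst half of the habitats, then restore the previous
    % generation's two best habitats over the two worst.
    popSize = 50;  numVars = 190;  pMutate = 0.005;
    pop = rand(popSize, numVars);  cost = rand(popSize, 1);   % stand-ins
    prevBest = rand(2, numVars);   % two best habitats of the last generation

    [~, order] = sort(cost);                 % ascending cost: best first
    worst = order(ceil(popSize/2)+1 : end);  % indices of the worst half
    mask = rand(numel(worst), numVars) < pMutate;
    repl = 2*rand(numel(worst), numVars) - 1;
    tmp = pop(worst, :);  tmp(mask) = repl(mask);  pop(worst, :) = tmp;

    pop(order(end-1:end), :) = prevBest;     % elitism: overwrite the two worst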

8.10 Training the MLP using Genetic Algorithms

During the first stage, the MLP is initialized for all members of the population with random weights and biases. After the initialization, the cost is calculated for each member of the population; the fitness score of each member is the inverse of its cost. The training of the MLP using GA is done over multiple generations. In every generation, the current population goes through selection, crossover and mutation to generate the new population. In the selection process, two members of the population (the parents) are selected using a roulette wheel over the inverse costs. The code snippet for selecting two members from the current population based on their inverse costs is shown in Figure 8.10.1, and a sketch of this roulette wheel follows.

Figure 8.10.1 Code snippet for Selection
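A minimal sketch of roulette-wheel selection on inverse costs (hypothetical names):

    % Lower-cost members get proportionally larger slices of the wheel.
    popSize = 50;
    cost = rand(popSize, 1) + 0.01;          % stand-in costs (strictly positive)
    fitness = 1 ./ cost;                     % fitness is the inverse of cost
    wheel = cumsum(fitness) / sum(fitness);  % cumulative selection probabilities
    p1 = find(rand <= wheel, 1);             % index of the first parent
    p2 = find(rand <= wheel, 1);             % index of the second parent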

The selected members of the population are passed on to the crossover stage. Crossover can be done with one of several techniques, such as single-point, two-point and uniform crossover. The code snippet for single-point crossover is shown in Figure 8.10.2.

Figure 8.10.2 Code snippet for Crossover

The parent population, except for the top performing members (the best performing members that persist from one generation to the next), is replaced with the new members obtained by the crossover. After the crossover, the members of the population, again except for the persisted top performers, are mutated based on the mutation probability. The code snippet for mutating the members of the population is shown in Figure 8.10.3.

Figure 8.10.3 Code snippet for Mutation

Mutation finishes the process of generating the new population. All the above steps are performed until the desired number of generations is reached.

8.11 Testing the MLP on the testing dataset

After the training of the MLP has finished, the MLP is tested to determine the classification accuracies of BBO and GA. The classification accuracies are determined on the testing dataset. The input parameters of each row of the testing dataset are fed to the MLP; if the output of the MLP is close to the desired output (the absolute difference between the expected output and the generated output is less than 0.1), the classification is a success, otherwise it is a failure. The code snippet for determining the classification accuracies is shown in Figure 8.11.1, and a sketch follows.

Figure 8.11.1 Code snippet for determining the classification rates
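A minimal sketch of the accuracy count, reusing the mlpOut handle and population from the Section 8.8 sketch (stand-in data; names are hypothetical):

    % A prediction counts as correct when |expected - output| < 0.1.
    numTest = 77;
    testX = rand(numTest, 8);                % stand-in testing inputs
    testY = randi([0 1], numTest, 1);        % stand-in 0/1 outcomes
    bestVars = population(1, :);             % assume member 1 is the trained best
    correct = 0;
    for r = 1:numTest
        if abs(testY(r) - mlpOut(bestVars, testX(r,:))) < 0.1
            correct = correct + 1;
        end
    end
    fprintf('Classification rate: %.4f%%\n', 100*correct/numTest);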

8.12 Displaying the convergence curves & classification rates in a new GUI

After the MLP has been tested and the classification rates have been determined, the cost of the MLP under BBO and GA across the generations is displayed as convergence curves, and the classification rates of BBO and GA are displayed in a bar chart. The code snippet that displays the cost convergence curves and classification rates in the new GUI is shown in Figure 8.12.1.

Figure 8.12.1 Code snippet for displaying the cost convergence curves and classification rates

9 TESTING AND EVALUATION

This chapter discusses the functional evaluation of the application, which has been tested using Matlab version R2017a (9.2.0.538062).

9.1 Normalizing data before the application is run

The Arizona Pima Indian diabetes dataset must be normalized before it can be used to train the MLP. The original dataset is saved as diabetes.csv, and the normalized dataset is saved as diabetesprocessed.csv. All the columns of the dataset are normalized using the Min-Max normalization function. Running the NormalizeData class from the Matlab IDE normalizes the data.

9.2 Running the application from a GUI

The GUI canvas designer is opened by selecting Open in the Matlab window and choosing the file MLP_BBO_GA.mlapp, as shown in Figure 9.2.1.

Figure 9.2.1 Loading the GUI

The file MLP_BBO_GA.mlapp is displayed in the Matlab GUI canvas designer

as shown in Figure 9.2.2.

Figure 9.2.2 Matlab Canvas Editor

Clicking Run displays the window where the user inputs the Algorithm and Data Options, shown in Figure 9.2.3.

Figure 9.2.3 The application GUI

The default parameters of the application are shown in Table 9.2.1.

Table 9.2.1 Default Parameters for the application

    Parameter                                                    Value
    Habitat Modification Probability (BBO)                       1
    Initial Mutation Probability (BBO)                           0.005
    Max. Immigration rate for island (BBO)                       1
    Max. Emigration rate for island (BBO)                        1
    Lower Bound for immigration probability per gene (BBO)       0
    Upper Bound for immigration probability per gene (BBO)       1
    No. of top habitats to persist between generations (BBO)     2
    Crossover Type (GA)                                          Single Point
    Initial Mutation Probability (GA)                            0.01
    No. of top individuals to persist between generations (GA)   2
    No. of Data Records (General)                                770
    No. of Records for Training (General)                        693
    No. of Input Params (General)                                8
    No. of Hidden Nodes (General)                                19
    Population Size (General)                                    50
    Generations (General)                                        249

Upon clicking Run, the application starts training the MLP using BBO and GA. After the MLP has been trained with BBO and GA on the training dataset, the remaining dataset is used to test the MLP. After the testing is done, the classification accuracies are calculated, and the cost convergence curves along with the classification accuracies are displayed in a new window, as shown in Figure 9.2.4.

Figure 9.2.4 Cost Convergence Curve and Classification Accuracies

9.3 Running the application from the Matlab IDE

The user can run the application from the Matlab IDE. The starting point of the application is the RunFromTextFile class, in which the user can edit the algorithm and data options directly. After editing the options and saving the file, the user starts the application by clicking Run in the toolbar of the Matlab IDE. The application starts, and the MLP is trained using the BBO and GA algorithms on the training dataset. After the MLP has been trained, it is tested using the testing dataset to calculate the classification accuracy. After the testing has finished, the classification accuracies are displayed in the console, as shown in Figure 9.3.1.

Figure 9.3.1 Classification Accuracies displayed in Matlab Console

The cost convergence curves and the classification rates for BBO and GA are displayed in a new GUI, as shown in Figure 9.3.2.

Figure 9.3.2 Cost Convergence Curve and Classification Accuracies

9.4 Running the application by changing the values of BBO, GA and General parameters

The parameters that have been changed on the GUI are shown in Table 9.4.1.

Table 9.4.1 Parameters that have been changed for the run

    Parameter                                Value
    Habitat Modification Probability (BBO)   0.5 (Default: 1)
    Crossover Type (GA)                      Two Point (Default: Single Point)
    No. of Hidden Nodes (General Options)    29 (Default: 19)

The options GUI with the changed parameter values is shown in Figure 9.4.1.

Figure 9.4.1 Options GUI

After the application has finished running with these parameters, the cost convergence curves and classification rates are as shown in Figure 9.4.2.

Figure 9.4.2 Cost Convergence Curves and Classification Accuracies

9.5 Testing the Application with a new set of parameters

The parameters that have been changed on the GUI are displayed in Table 9.5.1.

Table 9.5.1 Parameters that have been changed for the run

    Parameter                                Value
    Max. Immigration rate for island (BBO)   0.5 (Default: 1)
    Max. Emigration rate for island (BBO)    0.5 (Default: 1)
    Initial Mutation Probability (GA)        0.02 (Default: 0.01)
    No. of Generations (General Options)     349 (Default: 249)

The options GUI with the changed parameter values is shown in Figure 9.5.1.

Figure 9.5.1 Options GUI

After the application has finished running with these parameters, the cost convergence curve and classification rates are as shown in Figure 9.5.2.

Figure 9.5.2 Cost Convergence Curve and Classification Accuracies

9.6 Running the application using the K-Fold Cross Validation (K-Fold Value as 10)

To run the application using K-Fold cross validation, the value of the variable iskfoldon in the class RunFromTextFile needs to be set to 1. The parameters that need to be set before the run are shown in Table 9.6.1.

Table 9.6.1 Parameters that have been changed for the run

    Parameter       Value
    iskfoldon       1 (Default: 0)
    numberofkfolds  10 (Default: 10)

The application is started by clicking Run in the toolbar of the Matlab IDE. The

application starts training the MLP using BBO and GA on the training dataset. After the training has finished, the MLP is tested using the testing dataset to get the classification accuracies. At the end of the K-Fold run, the classification accuracies for the 10 distinct testing datasets, along with the average classification rate, are displayed in the console. The classification accuracies of BBO are displayed in the first column and those of GA in the second column, as shown in Figure 9.6.1.

Figure 9.6.1 Classification Accuracies displayed in the console

9.7 Running the application using the K-Fold Cross Validation (K-Fold Value as 5)

To run the application using K-Fold cross validation with the dataset split into 5 groups, the parameters that need to be changed in the RunFromTextFile file are shown in Table 9.7.1.

Table 9.7.1 Parameters changed for the run

    Parameter       Value
    iskfoldon       1 (Default: 0)

    numberofkfolds  5 (Default: 10)

At the end of the K-Fold run, the classification accuracies for the 5 distinct testing datasets, along with the average classification rate, are displayed in the console, as shown in Figure 9.7.1.

Figure 9.7.1 Classification accuracies displayed at the console

9.8 The Comparison of Convergence Curves for the K-Fold Run

The convergence curves for the K-Fold run have been compared for the value of K being 10. The convergence curves for BBO are shown in Figure 9.8.1.

Figure 9.8.1 BBO Cost Convergence Curves

The convergence curves for GA are shown in Figure 9.8.2.

Figure 9.8.2 GA Cost Convergence Curves

In both figures, the top left plot displays the convergence curves for the first 5 runs and the top right plot displays the convergence curves for the next 5 runs. The bottom left plot displays the average of the convergence curves, and the bottom right plot displays the best of the convergence curves.

9.9 The Comparison of Convergence Curves for the Multiple Populations Run

The comparison of convergence curves for runs with population sizes of 50, 100, 200 and 400 is shown in Figure 9.9.1.

Figure 9.9.1 Cost Convergence Curves for Multiple Population Runs

In the figure, the top left plot is for the run with population size 50, the top right plot is for the run with population size 100, the bottom left plot is for the run with population size 200, and the bottom right plot is for the run with population size 400.

10 CONCLUSION

We obtained classification results averaging about 71.5% over multiple runs. The K-Fold cross validation results are displayed in Table 9.9.1.

Table 9.9.1 K-Fold Cross Validation Results

    K-Fold No   BBO       GA
    1           64.9351   63.6364
    2           83.1169   74.0260
    3           68.8312   64.9351
    4           58.4416   66.2338
    5           75.3247   75.3247
    6           71.4286   71.4286
    7           80.5195   79.2208
    8           79.2208   79.2208
    9           71.4286   71.4286
    10          72.7273   68.8312
    Average %   72.5974   71.4286

The best classification accuracies we received are 80.5175% for BBO and 79.2208% for GA. For the K-Fold cross validation run, the average classification accuracies are 72.5974% for BBO and 71.4286% for GA. The results show that ANNs are indeed a viable method for classification.

11 FUTURE WORK

The optimal values of the BBO options, the GA options and the number of hidden nodes can be found by varying these values across multiple runs and observing which values yield the best classification rates for the dataset. The architecture of the ANN is also one of the crucial factors affecting the classification accuracy, so multiple architectures could be developed and tested to see whether they improve the classification rate.

12 Bibliography

[1] Mathworks, "Matlab," [Online]. Available: https://www.mathworks.com/products/matlab.html
[2] Alex, "Neural Networks (Part 2) - Training," [Online]. Available: https://www.youtube.com/watch?v=uhpkdzlutu0
[3] "K-fold cross-validation," [Online]. Available: https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation
[4] Alex, "Neural Networks (Part 1)," [Online]. Available: https://www.youtube.com/watch?v=p02xwy63q6u
[5] M. Nielsen, "Neural Networks and Deep Learning," [Online]. Available: http://neuralnetworksanddeeplearning.com/index.html
[6] D. Simon, "Biogeography-Based Optimization," [Online]. Available: http://ieeexplore.ieee.org/document/4475427/references
[7] Wikipedia, "Genetic Algorithms," [Online]. Available: https://en.wikipedia.org/wiki/Genetic_algorithm
[8] H. Temurtas, "A comparative study on diabetes disease diagnosis using neural networks," [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417408007306
[9] S. Mirjalili, "Let a biogeography-based optimizer train your Multi-Layer Perceptron," [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020025514000747
[10] P. S. Sengupta, "Multi-Class Classification Using Multi-layered Perceptrons," [Online]. Available: https://www.youtube.com/watch?v=pn1-u6zpo0e
[11] Mathworks, "MATLAB App Designer," [Online]. Available: https://www.mathworks.com/products/matlab/app-designer.html
[12] S. Mirjalili, "Neural Networks Projects," [Online]. Available: www.alimirjalili.com/projects.html