Artificial Neural Networks
Artificial Intelligence
Santa Clara, 2016
Simulate the functioning of the brain:
- Can simulate actual neurons: computational neuroscience
- Can introduce simplified neurons: neural networks
Neural nodes:
- Have input links
- Input function
- Activation function
- Output
- Connection to other neurons
[Figure: structure of a node: a fixed bias input $a_0 = 1$ with bias weight $w_{0,j}$; input links carrying activations $a_i$ with weights $w_{i,j}$; an input function $\Sigma$ computing $in_j$; an activation function $g$; and the output $a_j = g(in_j)$ sent along the output links]
Example: Perceptron network (two layers)
[Figure: (a) a perceptron network: inputs 1, 2 connected directly to outputs 3, 4 with weights $w_{1,3}$, $w_{1,4}$, $w_{2,3}$, $w_{2,4}$; (b) a network with a hidden layer: inputs 1, 2, hidden nodes 3, 4, and outputs 5, 6, with weights $w_{3,5}$, $w_{3,6}$, $w_{4,5}$, $w_{4,6}$ on the second layer]
Neural networks:
- Composed of nodes
- Nodes emit an activation value
- Connected by links with various weights
[Figure: an example network of nodes numbered 1 to 9]
Behavior of a node:
- Calculate the input as a weighted sum: $in_j = \sum_{i=0}^{n} w_{i,j}\, a_i$
- Apply the activation function $g$ to yield the activation: $a_j = g(in_j) = g\!\left(\sum_{i=0}^{n} w_{i,j}\, a_i\right)$
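A minimal sketch of this computation in Python; the sigmoid chosen for $g$ and the example weights are made up, not from the slides:

```python
import math

def g(x):
    """A sigmoid chosen here as the activation function g."""
    return 1.0 / (1.0 + math.exp(-x))

def node_activation(weights, activations):
    """a_j = g(in_j), where in_j = sum_i w_ij * a_i and
    activations[0] is the fixed bias input a_0 = 1."""
    in_j = sum(w_i * a_i for w_i, a_i in zip(weights, activations))
    return g(in_j)

# Made-up weights implementing a soft AND gate over two inputs
a = [1.0, 1.0, 1.0]    # a_0 (bias), a_1, a_2
w = [-1.5, 1.0, 1.0]   # w_0j, w_1j, w_2j
print(node_activation(w, a))   # ~0.62 when both inputs are 1
```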
The activation function is either a threshold function, yielding a perceptron, or a sigmoid function, yielding a sigmoid perceptron; more unusual choices are also possible.
[Figure: a threshold function, stepping from 0 to 1 at the origin]
[Figure: the sigmoid function $y = \frac{1}{1 + e^{-x}}$]
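Both activation functions as a small sketch; the sample inputs are arbitrary:

```python
import math

def threshold(x):
    """Hard threshold: 1 if the input is non-negative, else 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Soft threshold y = 1 / (1 + e^(-x)); differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(x, threshold(x), round(sigmoid(x), 3))
# sigmoid: 0.119, 0.378, 0.5, 0.622, 0.881, a smoothed step
```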
Neural networks can be organized as:
- Recurrent networks: the output is fed back to the input nodes; they implement some type of short-term memory; the outputs could stabilize, oscillate, or remain chaotic
- Feed-forward networks: connections only move forward; the most popular kind
Feed-forward networks are usually arranged in layers. Networks with only two layers are severely limited because of linearity: they can represent only linearly separable functions, as the perceptron sketch below illustrates.
[Figure: decision regions over inputs $x_1$, $x_2$: (a) $x_1$ AND $x_2$ and (b) $x_1$ OR $x_2$ are linearly separable; (c) $x_1$ XOR $x_2$ is not]
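A sketch of this limit using the classical perceptron learning rule (not shown on the slides); the learning rate and epoch count are arbitrary choices:

```python
def train_perceptron(samples, epochs=25, lr=0.1):
    """Threshold perceptron; w = [bias weight, w1, w2]."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else 0
            err = target - out
            w[0] += lr * err        # the bias input is fixed at 1
            w[1] += lr * err * x1
            w[2] += lr * err * x2
    return w

def accuracy(w, samples):
    return sum(
        (1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else 0) == t
        for (x1, x2), t in samples
    ) / len(samples)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, int(x[0] and x[1])) for x in inputs]
XOR = [(x, x[0] ^ x[1]) for x in inputs]

print(accuracy(train_perceptron(AND), AND))  # 1.0: AND is linearly separable
print(accuracy(train_perceptron(XOR), XOR))  # < 1.0: XOR is not
```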
Multilayer feed-forward neural networks:
- The standard architecture until recently
- The network is a function parametrized by the vector of weights
- The network learns by adjusting the weights
[Figure: a small multilayer network: Input 1 and Input 2 feed hidden nodes Middle 1 and Middle 2, which feed a single Output]
Use two soft threshold functions to produce a ridge.
[Figure: surface plot of $h_W(x_1, x_2)$ showing a ridge]
Combine two ridges to form a bump, as in the sketch below.
[Figure: surface plot of $h_W(x_1, x_2)$ showing a bump]
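A sketch of the ridge-and-bump construction; all the weight constants here are made up so that the shapes are visible:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ridge(t):
    """Two opposing soft thresholds produce a ridge around t = 0."""
    return sigmoid(5 * (t + 1)) - sigmoid(5 * (t - 1))

def bump(x1, x2):
    """A ridge along x1 combined with a ridge along x2, passed
    through another soft threshold, yields a bump at the origin."""
    return sigmoid(10 * (ridge(x1) + ridge(x2) - 1.5))

print(round(bump(0.0, 0.0), 3))   # near the center: ~0.99
print(round(bump(4.0, 4.0), 3))   # away from the center: ~0.0
```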
A single hidden layer already gives lots of possibilities for generating functions: with enough hidden nodes, such a network can represent any continuous function of its inputs with arbitrary accuracy.
Learning: need to correct errors in the output by adjusting the weights.
Back-propagation algorithm: the output error can be corrected by adjusting the weights between the hidden layer and the output layer.
[Figure: feed-forward network with an Input Layer (Input 1 to Input 3), a Hidden Layer (Middle 1 to Middle 4), and an Output Layer (Output)]
Back-propagation, continued: also make each hidden-layer node responsible for part of the output error; this is the back-propagation of the error. Then adjust the weights between the input layer and the hidden layer using the same update rule, as in the sketch below.
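A minimal sketch of back-propagation with one hidden layer, trained on XOR; the layer sizes, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR training set: inputs and expected outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def with_bias(a):
    """Prepend the fixed bias activation a_0 = 1 to each example."""
    return np.hstack([np.ones((a.shape[0], 1)), a])

# Weight matrices (bias row included) for a 2-4-1 network
W1 = rng.normal(size=(3, 4))   # input layer  -> hidden layer
W2 = rng.normal(size=(5, 1))   # hidden layer -> output layer

lr = 1.0
for _ in range(5000):
    # Forward pass
    h = sigmoid(with_bias(X) @ W1)        # hidden activations
    out = sigmoid(with_bias(h) @ W2)      # network output

    # Error at the output, scaled by the sigmoid derivative g' = g(1 - g)
    delta_out = (y - out) * out * (1 - out)

    # Each hidden node is responsible for a share of the error,
    # proportional to the weight connecting it to the output
    delta_h = (delta_out @ W2[1:].T) * h * (1 - h)

    # Same update rule for both layers: w_ij += lr * a_i * delta_j
    W2 += lr * with_bias(h).T @ delta_out
    W1 += lr * with_bias(X).T @ delta_h

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```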
There is no good theory to decide:
- The number of hidden layers
- The number of nodes in each hidden layer: too many nodes tend to overtrain, too few tend not to train well
- The number of connections: connections can be removed experimentally, as in the optimal brain damage algorithm (see the sketch below)
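A sketch of removing connections experimentally. Optimal brain damage proper ranks weights by a saliency estimate based on second derivatives; the magnitude-based ranking below is a simpler stand-in, not the actual algorithm:

```python
import numpy as np

def prune_smallest(W, fraction=0.2):
    """Zero out the given fraction of weights, chosen by magnitude
    (a stand-in for optimal brain damage's saliency ranking).
    After pruning, the remaining weights would be retrained."""
    k = int(W.size * fraction)
    if k == 0:
        return W.copy()
    cutoff = np.sort(np.abs(W), axis=None)[k - 1]
    pruned = W.copy()
    pruned[np.abs(pruned) <= cutoff] = 0.0
    return pruned

W = np.array([[0.8, -0.05], [0.01, -1.2]])
print(prune_smallest(W, fraction=0.5))
# [[ 0.8  0. ]
#  [ 0.  -1.2]]
```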
Training a neural network:
- Give a set of examples, inputs paired with the expected output: the training set
- Then evaluate how well the network does on an additional set of examples: the validation set
- Need to keep both sets strictly separate (see the sketch below)
- Need to try out various architectures and select the one that works best
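A sketch of keeping the two sets strictly separate; the split fraction and the toy examples are made up:

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle once, then cut, so no example lands in both sets."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - validation_fraction))
    return examples[:cut], examples[cut:]

# Made-up examples: (input vector, expected output)
data = [((x, 2 * x), x % 2) for x in range(10)]
training_set, validation_set = train_validation_split(data)
print(len(training_set), len(validation_set))   # 8 2
```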
Deep Neural Networks: neural networks with many more hidden layers.
- Require additional learning methods, such as learning for partial networks
- Neural networks are good at pattern recognition; adding layers at the input might facilitate learning
- Such a sub-network is trained for the identity function, in the hope that it smoothes the input and makes learning easier for the rest of the network (see the sketch below)
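A sketch of this idea: a sub-network (in effect an autoencoder) is trained to reproduce its input through a smaller hidden layer, whose activations could then feed the rest of the network. The sizes, data, and learning rate are made up, and bias weights are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.random((50, 6))     # made-up input patterns in [0, 1]

# A 6-3-6 sub-network trained to map its input back to itself
W_enc = rng.normal(scale=0.5, size=(6, 3))
W_dec = rng.normal(scale=0.5, size=(3, 6))

lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W_enc)                    # compressed representation
    out = sigmoid(h @ W_dec)                  # attempted reconstruction
    delta_out = (X - out) * out * (1 - out)   # push output toward the input
    delta_h = (delta_out @ W_dec.T) * h * (1 - h)
    W_dec += lr * h.T @ delta_out / len(X)
    W_enc += lr * X.T @ delta_h / len(X)

# The hidden activations sigmoid(X @ W_enc) would now feed the
# remaining layers in place of the raw input
print(np.abs(X - out).mean())   # mean reconstruction error after training
```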
Deep Neural Networks:
- Started to win competitions
- Are now used in many applications