ANN Development with EC Tools: An Overview

INTRODUCTION

Among all Artificial Intelligence techniques, Artificial Neural Networks (ANNs) have proven to be a very powerful tool (McCulloch & Pitts, 1943) (Haykin, 1999). This technique is very versatile and has therefore been successfully applied to many different tasks (classification, clustering, regression, modeling, etc.).

However, one of the greatest problems when using ANNs is the large manual effort required for their development. A common myth is that ANNs are easy to work with and that their development is almost automatic. In reality, the development process can be divided into two parts: architecture design, and training and validation. As the network architecture is problem-dependent, its design has traditionally been performed by hand: the expert tests different architectures and trains each of them until finding the one that achieves the best results after training. The manual nature of this process makes it slow, even though the training part is fully automated thanks to the existence of several algorithms that perform it.

With the creation of Evolutionary Computation (EC) tools, researchers have applied these techniques to the development of algorithms for automatically creating and training ANNs, so that the whole process (or, at least, a great part of it) can be performed by computers and little human effort is needed.


BACKGROUND

EC is a set of tools that imitate the behaviour of living beings in nature in order to solve optimization problems. One of the most typical subsets of tools inside EC is Evolutionary Algorithms (EAs), which implement natural evolution on computers. All of these tools share the same basis: a population of solutions to the particular problem is randomly created and an evolutionary process is applied to it. From this initial random population, evolution proceeds by selecting and combining the best individuals (although the worst ones also have a small probability of being chosen) to create new solutions. This process is carried out by selection, crossover, and mutation operators, which mimic the mechanisms biological evolution uses for adaptation and survival. After several generations, it is hoped that the population contains a good solution to the problem.
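The loop described above can be sketched in a few lines of Python. This is a minimal illustration of a generational EA with tournament selection, one-point crossover and bit-flip mutation; the function names and parameter values are our own choices, not taken from the literature cited here.

```python
import random

def evolve(fitness, genome_length, pop_size=30, generations=50,
           mutation_rate=0.05, tournament=3):
    """Minimal generational EA over bit-string genomes (a sketch)."""
    pop = [[random.randint(0, 1) for _ in range(genome_length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: tournaments favour the fittest, but weaker
        # individuals still have some chance of being chosen.
        def select():
            return max(random.sample(pop, tournament), key=fitness)
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = select(), select()
            # One-point crossover combines two parents.
            point = random.randint(1, genome_length - 1)
            child = a[:point] + b[point:]
            # Mutation flips each bit with a small probability.
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Usage: the toy "OneMax" problem, i.e. maximise the number of 1-bits.
best = evolve(fitness=sum, genome_length=20)
```

On OneMax the population typically converges to (or very near) the all-ones string within a few dozen generations.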

The first EA to appear was Genetic Algorithms (GAs), in 1975 (Holland, 1975). Following the working scheme explained above, GAs use a binary codification (i.e., each solution is codified into a string of bits). Later, in the early 90s, a new technique appeared, called Genetic Programming (GP). GP is based on the evolution of trees, i.e., each individual is codified as a tree instead of a binary string, which allows its application to a wider set of environments.

Although GAs and GP are the two most used techniques in EAs, more tools can be classified as part of this field, such as Evolutionary Programming or Evolution Strategies, all of them sharing the same basis: the evolution of a population following the rules of natural evolution.

DEVELOPMENT OF ANNS WITH EC TOOLS

The development of ANNs is a topic that has been extensively addressed with very diverse techniques. The world of evolutionary algorithms is no exception, as proven by the great number of works published about different techniques in this area (Cantu-Paz & Kamath, 2005). These techniques follow the general strategy of an evolutionary algorithm: an initial population is randomly created, consisting of different genotypes, each of them codifying different parameters (typically, the connection weights and/or the architecture of the network and/or the learning rules). This population is evaluated in order to determine the fitness of each individual. Afterwards, the population is repeatedly made to evolve by means of different genetic operators (replication, crossover, mutation, etc.) until a given termination criterion is fulfilled (for example, a sufficiently good individual is obtained, or a predetermined maximum number of generations is reached).

Essentially, the ANN generation process by means of evolutionary algorithms is divided into three main groups: evolution of the weights, architectures, and learning rules.

Evolution of Weights

The evolution of the weights begins with a network with a predetermined topology. In this case, the problem is to establish, by means of training, the values of the network connection weights. This is generally conceived as a problem of minimizing the network error, taken, for example, as the Mean Square Error between the desired outputs and the ones produced by the network. Most of the training algorithms, such as the backpropagation algorithm (BP) (Rumelhart, Hinton & Williams, 1986), are based on gradient minimization. This approach has several drawbacks (Whitley, Starkweather & Bogart, 1990), the most important being that the algorithm quite frequently gets stuck in a local minimum of the error function and is unable to find the global minimum, especially if the error function is multimodal and/or non-differentiable. One way of overcoming these problems is to carry out the training by means of an Evolutionary Algorithm (Whitley, Starkweather & Bogart, 1990); i.e., to formulate the training process as the evolution of the weights in an environment defined by the network architecture and the task to be done (the problem to be solved). In these cases, the weights can be represented in the individuals' genetic material as a string of binary values (Whitley, Starkweather & Bogart, 1990) or a string of real numbers (Greenwood, 1997). Traditional genetic algorithms (Holland, 1975) use a genotypic codification method in the shape of binary strings. Accordingly, much work has emerged that codifies the values of the weights by means of a concatenation of the binary values which represent them (Whitley, Starkweather & Bogart, 1990). The big advantage of these approaches is their generality and simplicity of application, i.e., it is very easy and quick to apply the operators of uniform crossover and mutation on a binary string. The disadvantage of this type of codification is the permutation problem.
This problem arises because the order in which the weights are taken in the string may cause equivalent networks to correspond with totally different individuals, which makes the crossover operator very inefficient. Naturally, the codification of weight values has also emerged in the form of real number concatenation, each value associated with a particular weight (Greenwood, 1997). Using genetic operators designed to work with this type of codification (given that the existing ones for bit strings cannot be used here), several studies (Montana & Davis, 1989) showed that this codification produces better results, with more efficiency and scalability, than the BP algorithm.
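As a small illustration of real-coded weight evolution, the sketch below evolves the nine weights of a fixed 2-2-1 network to learn the XOR function, using the Mean Square Error as fitness and a simple elitist scheme with Gaussian mutation. All names and parameter settings here are our own assumptions, not those of the works cited above.

```python
import math
import random

def forward(weights, x):
    """Fixed 2-2-1 topology; the genotype is a flat list of 9 real weights
    (2 inputs + bias per hidden node, 2 hidden + bias for the output)."""
    w = weights
    h1 = math.tanh(w[0]*x[0] + w[1]*x[1] + w[2])
    h2 = math.tanh(w[3]*x[0] + w[4]*x[1] + w[5])
    return math.tanh(w[6]*h1 + w[7]*h2 + w[8])

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def mse(weights):
    """Fitness: Mean Square Error over the four XOR patterns (lower is better)."""
    return sum((forward(weights, x) - y) ** 2 for x, y in XOR) / len(XOR)

def evolve_weights(pop_size=50, generations=400, sigma=0.3):
    pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=mse)                  # rank by network error
        parents = pop[:pop_size // 2]      # elitist truncation selection
        # Offspring: Gaussian mutation of a randomly chosen parent.
        pop = parents + [[g + random.gauss(0, sigma) for g in random.choice(parents)]
                         for _ in range(pop_size - len(parents))]
    return min(pop, key=mse)

best = evolve_weights()
```

Note that no gradient information is used at any point, so the same loop would work with a non-differentiable transfer function or error measure.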

Evolution of the Architectures

The evolution of the architectures includes the generation of the topological structure, i.e., the topology and connectivity of the neurons, and the transfer function of each neuron of the network. The architecture of a network is of great importance for the successful application of ANNs, as it has a very significant impact on the processing capacity of the network. On one hand, a network with few connections and a linear transfer function may not be able to solve a problem that another network with different characteristics (a distinct number of neurons, connections or types of functions) would be able to solve. On the other hand, a network with a high number of non-linear connections and nodes could overfit and learn the noise present in the training data as an inherent part of it, without being able to discriminate between signal and noise, ending up with poor generalization capacity. Therefore, the design of a network is crucial, and this task has classically been carried out by human experts using their own experience, based on "trial and error", experimenting with different sets of architectures. The evolution of architectures became possible thanks to the appearance of constructive and destructive algorithms (Sietsma & Dow, 1991). In general terms, a constructive algorithm begins with a minimal network (with a small number of layers, neurons and connections) and successively adds new layers, nodes and connections during training, if they are necessary. A destructive algorithm carries out the opposite operation, i.e., it begins with a maximal network and eliminates unnecessary nodes and connections during training. However, methods based on Hill Climbing algorithms are quite prone to falling into a local minimum (Angeline, Saunders & Pollack, 1994).

In order to develop ANN architectures by means of an evolutionary algorithm, it is necessary to decide how to codify a network inside the genotype so it can be used by the genetic operators. For this, different types of network codifications have emerged.

In the first codification method, direct codification, there is a one-to-one correspondence between the genes and the phenotypic representation (Miller, Todd & Hedge, 1989). The most typical codification method consists of a matrix C = (c_ij) of size N x N which represents an architecture of N nodes, where c_ij indicates the presence or absence of a connection between nodes i and j. It is possible to use c_ij = 1 to indicate a connection and c_ij = 0 to indicate its absence. In fact, c_ij could take real values instead of Boolean ones to represent the value of the connection weight between neurons i and j, so that architecture and connections can be developed simultaneously (Alba, Aldana & Troya, 1993). The restrictions required in the architectures can easily be incorporated into this representational scheme. For example, a feedforward network would have non-zero coefficients only in the upper right triangle of the matrix. These types of codification are generally very simple and easy to implement. However, they have several disadvantages, such as scalability, the impossibility of codifying repeated structures, or permutation (i.e., different networks which are functionally equivalent can correspond to different genotypes) (Yao & Liu, 1998).
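A direct codification of this kind is straightforward to implement. The hypothetical sketch below decodes a bit-string genotype into the upper-right triangle of an N x N connectivity matrix, which enforces the feedforward restriction mentioned above by construction.

```python
def decode_feedforward(bits, n):
    """Decode a bit-string genotype into an N x N connectivity matrix.

    Only the upper-right triangle may hold connections, so every decoded
    network is feedforward; genotype length is therefore N*(N-1)/2.
    """
    assert len(bits) == n * (n - 1) // 2
    matrix = [[0] * n for _ in range(n)]
    it = iter(bits)
    for i in range(n):
        for j in range(i + 1, n):       # j > i: upper-right triangle only
            matrix[i][j] = next(it)
    return matrix

# A 4-node genotype: 6 possible feedforward connections.
m = decode_feedforward([1, 0, 1, 1, 0, 1], 4)
```

Replacing the 0/1 genes with real numbers would turn the same scheme into a joint codification of architecture and weights, as noted above. The permutation problem is also visible here: relabelling hidden nodes yields a functionally identical network with a different bit string.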

As a counterproposal to this type of direct codification method, indirect codification types also exist. With the objective of reducing the length of the genotypes, only some of the characteristics of the architecture are codified into the chromosome. Within this type of codification, there are various types of representation.

First, the parametric representations have to be mentioned. The network can be represented by a set of parameters such as the number of hidden layers, the number of connections between two layers, etc. There are several ways of codifying these parameters inside the chromosome (Harp, Samad & Guha, 1989). Although parametric representations can reduce the length of the chromosome, the evolutionary algorithm then searches only a limited subspace of the space of all possible architectures. Another type of indirect codification is based on a representational system in the shape of grammatical rules (Yao & Shi, 1995). In this system, the network is represented by a set of production rules which build a matrix that represents the network.
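A parametric representation can be illustrated as follows. In this hypothetical sketch the chromosome stores the number of hidden layers and the units of each potential layer; the names, bounds and layout are our own assumptions. Any scheme of this kind can only describe layered architectures, which is exactly the search-space limitation noted above.

```python
import random

# Hypothetical parametric genotype: [number of hidden layers in use,
# units in layer 1, units in layer 2, units in layer 3].
MAX_LAYERS, MAX_UNITS = 3, 16

def random_genotype():
    return [random.randint(1, MAX_LAYERS)] + \
           [random.randint(1, MAX_UNITS) for _ in range(MAX_LAYERS)]

def decode(genotype, n_inputs=4, n_outputs=1):
    """Expand the parameter list into a layer-size description of the net."""
    n_layers = genotype[0]
    hidden = genotype[1:1 + n_layers]    # unused layer slots are ignored
    return [n_inputs] + hidden + [n_outputs]

# Only 2 hidden layers are active, so the third size gene (5) is ignored.
layers = decode([2, 8, 3, 5])
```

The fixed-length chromosome keeps crossover and mutation trivial, but note that no genotype in this scheme can describe, for example, a network with skip connections.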

Other types of codification, more inspired by biology, are the ones known as "growing methods". With them, the genotype no longer codifies the network directly; instead, it contains a set of instructions. The decodification of the genotype consists of the execution of these instructions, which leads to the construction of the phenotype (Husbands, Harvey, Cliff & Miller, 1994). These instructions usually include neural migrations, neuronal duplication or transformation, and neuronal differentiation.

Finally, and within the indirect codification methods, there are other methods which are very different from the ones already described. Andersen describes a technique in which each individual of a population represents a hidden node instead of the architecture (Andersen & Tsoi, 1993). Each hidden layer is constructed automatically by means of an evolutionary process which uses a genetic algorithm. This method has the limitation that only feed-forward networks can be constructed and there is also a tendency for various nodes with a similar functionality to emerge, which inserts some redundancy inside the network that must be eliminated.

One important characteristic is that, in general, these methods evolve only architectures (the most common case) or architectures and weights together. The transfer function of each node is assumed to have been previously determined by a human expert and to be the same for all the nodes of the network (or, at least, for all the nodes of the same layer), even though the transfer function has been shown to have a great impact on the behaviour of the network (Lovell & Tsoi, 1992). Few methods have been developed that evolve the transfer function, and they have therefore had little repercussion in the world of ANNs with EC.

Evolution of the Learning Rule

Another interesting approach to the development of ANNs by means of EC is the evolution of the learning rule. This idea emerges because a training algorithm behaves differently when applied to networks with different architectures. In fact, given that a priori the expert usually has very little knowledge about the network, it is preferable to develop an automatic system that adapts the learning rule to the architecture and to the problem to be solved.

There are several approaches to the evolution of the learning rule (Crosher, 1993) (Turney, Whitley & Anderson, 1996), although most of them focus only on how learning can modify or guide evolution, and on the relation between the architecture and the connection weights. Actually, there are few works that focus on the evolution of the learning rule in itself (Bengio, Bengio, Cloutier & Gecsei, 1992) (Ribert, Stocker, Lecourtier & Ennaji, 1994).

One of the most common approaches is based on setting the parameters of the BP algorithm: learning rate and momentum. Some authors propose methods in which an evolutionary process is used to find these parameters while leaving the architecture constant (Kim, Jung, Kim & Park, 1996). Other authors, on the other hand, propose codifying these BP algorithm parameters together with the network architecture inside of the individuals of the population (Harp, Samad & Guha, 1989).
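The first approach can be sketched as follows: an elitist evolutionary loop searches for a (learning rate, momentum) pair while everything else is held constant. To keep the example short, a gradient descent run on a toy quadratic function stands in for a full BP training run; all names and settings here are our own assumptions, not those of the cited works.

```python
import random

def trial(lr, momentum, steps=50):
    """Run gradient descent with momentum on the toy objective f(x) = x^2,
    starting from x = 5, and return the final loss. This is a cheap
    stand-in for one full BP training run of a network."""
    x, v = 5.0, 0.0
    for _ in range(steps):
        grad = 2 * x
        v = momentum * v - lr * grad
        x += v
        if abs(x) > 1e6:            # diverged: this parameter pair is bad
            return float('inf')
    return x * x

def evolve_params(pop_size=20, generations=30):
    """Elitist EA over (learning rate, momentum) pairs; fitness is the
    training loss achieved with those parameters."""
    pop = [(random.uniform(0, 1), random.uniform(0, 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: trial(*p))
        parents = pop[:pop_size // 2]
        # Offspring: Gaussian perturbation, clamped to sensible ranges.
        children = [(max(0.0, lr + random.gauss(0, 0.05)),
                     min(0.99, max(0.0, m + random.gauss(0, 0.05))))
                    for lr, m in random.choices(parents,
                                                k=pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda p: trial(*p))

lr, momentum = evolve_params()
```

In a real system, `trial` would train a fixed-architecture ANN with BP for a few epochs and return its validation error, which is far more expensive but follows exactly the same loop.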

FUTURE TRENDS

The evolution of ANNs has been a research topic for several decades. The creation of new EC and, in general, new AI techniques, together with the improvement of existing ones, allows the development of new methods for automatically building ANNs. Although there are methods that (more or less) automatically develop ANNs, they are usually not very efficient, since evolving architectures, weights and learning rules at once leads to a very large search space, so this aspect definitely has to be improved.

CONCLUSION

The world of EC has provided a set of tools that can be applied to optimization problems. In this case, the problem is to find an optimal architecture and/or weight value set and/or learning rule. Therefore, the development of ANNs is converted into an optimization problem. As the described techniques show, the use of EC has made possible the development of ANNs without human intervention, or, at least, with minimal participation of the expert in this task.

As has been explained, these techniques have some problems. One of them is the permutation problem already described. Another is a loss of efficiency: the more complicated the structure to evolve (weights, learning rule, architecture), the less efficient the system will be, because the search space becomes much bigger. If the system has to evolve several things at once (for example, architecture and weights, so that the ANN development is completely automated), this loss of efficiency increases. However, these systems are still faster than the whole manual process of designing and training an ANN several times.

KEY TERMS

Artificial Neural Networks: Interconnected set of many simple processing units, commonly called neurons, that use a mathematical model representing an input/output relation.

Back-Propagation Algorithm: Supervised learning technique used by ANNs that iteratively modifies the weights of the connections of the network so that the error, obtained by comparing the network outputs with the desired ones, decreases.

Evolutionary Computation: Set of Artificial Intelligence techniques used in optimization problems, which are inspired by biological mechanisms such as natural evolution.

Genetic Programming: Machine learning technique that uses an evolutionary algorithm to optimise a population of computer programs according to a fitness function which determines the capability of a program to perform a given task.

Genotype: The representation of an individual as a collection of genes, to which the crossover and mutation operators are applied.

Phenotype: Expression of the properties coded by the individual’s genotype.

Population: Pool of individuals exhibiting equal or similar genome structures, which allows the application of genetic operators.

Search Space: Set of all possible states in which the problem to be solved could ever be.
