Counter Propagation Networks Notes
of a bi-directional mapping between the input and output layers. In essence, while data presented to the input layer generates a classification pattern on the output layer, the output layer in turn accepts an additional input vector and generates an output classification on the network's input layer. The network got its name from this counter-posing flow of information through its structure.
Most developers use a uni-flow variant of this formal representation of counter-propagation. In other words, there is only one feedforward path, from the input layer to the output layer. An example network is shown in Figure 5.2.2. The uni-directional counter-propagation network has three layers. If the inputs are not already normalized before they enter the network, a fourth layer is sometimes added.
The main layers include an input buffer layer, a self-organizing Kohonen layer, and an output layer that uses the Delta Rule to modify its incoming connection weights. This output layer is sometimes called a Grossberg Outstar layer.
The size of the input layer depends upon how many separable parameters define the problem. With too few, the network may not generalize sufficiently. With too many, processing takes too long. For the network to operate properly, the input vector must be normalized. This means that for every combination of input values, the input vector must have a length of one: the square root of the sum of the squares of its components must equal one.
This can be done with a preprocessor before the data enters the counter-propagation network. Alternatively, a normalization layer can be added between the input and Kohonen layers. The normalization layer requires one processing element for each input, plus one more for a balancing element. This layer modifies each input set before it reaches the Kohonen layer, guaranteeing that all input sets combine to the same total length.
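Both normalization options can be sketched in Python. This is a minimal illustration, not the report's implementation: `normalize` plays the role of the preprocessor, and `add_balancing_element` is one common way to realize a balancing element, assuming a known upper bound `max_norm` on the raw input length (both function names and `max_norm` are hypothetical).

```python
import math


def normalize(x):
    """Preprocessor option: scale x to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


def add_balancing_element(x, max_norm):
    """Normalization-layer option (hypothetical): one element per input plus a balancing element.

    Assumes every raw input vector has Euclidean length <= max_norm. Inputs are
    scaled by max_norm, and the balancing element absorbs the remaining length,
    so every augmented vector has length exactly one.
    """
    scaled = [v / max_norm for v in x]
    squared = sum(v * v for v in scaled)
    if squared > 1.0 + 1e-12:
        raise ValueError("input is longer than the assumed max_norm")
    return scaled + [math.sqrt(max(0.0, 1.0 - squared))]


print(normalize([3.0, 4.0]))                    # [0.6, 0.8]
print(add_balancing_element([3.0, 4.0], 10.0))  # third element is about 0.866
```

One design consequence: plain normalization maps [3, 4] and [6, 8] to the same unit vector, while the balancing-element version keeps them distinct ([0.3, 0.4, 0.866...] versus [0.6, 0.8, 0.0]), so some magnitude information survives.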
Normalization of the inputs is necessary to ensure that the Kohonen layer finds the correct class for the problem. Without normalization, larger input vectors bias many of the Kohonen processing elements so that input sets with weaker values cannot be properly classified. Because of the competitive nature of the Kohonen layer, the larger input vectors overpower the smaller ones.
Counter-propagation uses a standard Kohonen paradigm, which self-organizes the input sets into classification zones. It follows the classical Kohonen learning law described in section 4.2 of this report.
This layer acts as a nearest neighbor classifier in that the processing elements in the competitive layer autonomously adjust their connection weights to divide up the input vector space in approximate correspondence to the frequency with which the inputs occur.
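The competitive layer and the classical Kohonen learning law can be sketched as below. This is an illustrative sketch, not the report's code: the class name `KohonenLayer`, the learning rate `alpha`, the random initialization, and renormalizing the winner's weights after each update are assumptions. With unit-length inputs and weights, the largest dot product picks the nearest weight vector, and only the winner moves toward the input, w = w + alpha * (x - w).

```python
import math
import random


def normalize(x):
    """Scale x to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in x))
    return list(x) if norm == 0.0 else [v / norm for v in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


class KohonenLayer:
    """Winner-take-all competitive layer (sketch); inputs must already be unit length."""

    def __init__(self, n_inputs, n_elements, seed=0):
        rng = random.Random(seed)
        # Hypothetical initialization: random directions on the unit sphere.
        self.weights = [
            normalize([rng.uniform(-1.0, 1.0) for _ in range(n_inputs)])
            for _ in range(n_elements)
        ]

    def winner(self, x):
        """Index of the element whose weight vector has the largest dot product with x."""
        return max(range(len(self.weights)), key=lambda j: dot(self.weights[j], x))

    def train_step(self, x, alpha):
        """Kohonen learning law: move only the winner toward x, then renormalize it."""
        j = self.winner(x)
        w = self.weights[j]
        self.weights[j] = normalize([wi + alpha * (xi - wi) for wi, xi in zip(w, x)])
        return j


layer = KohonenLayer(n_inputs=2, n_elements=3, seed=1)
x = normalize([1.0, 1.0])
for _ in range(50):
    j = layer.train_step(x, alpha=0.5)
# The winner's weight vector now points almost exactly along x.
print(j, dot(layer.weights[j], x))
```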
There need to be at least as many processing elements in the Kohonen layer as output classes. The Kohonen layer usually has many more elements than classes, because additional processing elements provide finer resolution between similar objects. The output layer for counter-propagation is basically made up of processing elements that learn to produce an output when a particular input is applied.
Since the Kohonen layer includes competition, only a single Kohonen element is active for a given input vector. The output layer provides a way of decoding that activity into a meaningful output class. It uses the Delta Rule to correct its weights from the error between the desired output class and the actual output generated with the training set.
The errors adjust only the connection weights coming into the output layer; the Kohonen layer is not affected. Since only one output from the competitive Kohonen layer is active at a time and all other elements are zero, the only weights adjusted for the output processing elements are those connected to the winning element in the competitive layer. In this way the output layer learns to reproduce a certain pattern for each active processing element in the competitive layer. If several competitive elements belong to the same class, that output processing element will evolve weights in response to those competitive processing elements and zero for all others.
There is a problem that could arise with this architecture. The competitive Kohonen layer learns without any supervision; it does not know what class it is responding to. This means that a processing element in the Kohonen layer can learn to take responsibility for two or more training inputs that belong to different classes. When this happens, the output of the network will be ambiguous for any inputs that activate this processing element. To alleviate this problem, the processing elements in the Kohonen layer can be pre-conditioned to learn only about a particular class.
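The output-layer update described above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `train_output_step` and `recall`, the learning rate `beta`, and the layout `out_weights[k][j]` (weight from Kohonen element `j` to output element `k`) are hypothetical. Because the Kohonen output is 1 for the winner and 0 elsewhere, the Delta Rule changes only the column of weights attached to the winner.

```python
def recall(out_weights, winner):
    """Output vector for a one-hot Kohonen output: just the winner's column of weights."""
    return [row[winner] for row in out_weights]


def train_output_step(out_weights, winner, target, beta):
    """Delta Rule on the output layer; weights from losing Kohonen elements are untouched.

    For output k: w[k][winner] += beta * (target[k] - y[k]) * 1, where 1 is the
    winner's activation. The Kohonen layer's own weights are not changed here.
    """
    actual = recall(out_weights, winner)
    for k, (t, y) in enumerate(zip(target, actual)):
        out_weights[k][winner] += beta * (t - y)


# Two output elements, three Kohonen elements, all weights starting at zero.
out_weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(20):
    train_output_step(out_weights, winner=1, target=[1.0, 0.0], beta=0.5)
print(recall(out_weights, 1))  # close to [1.0, 0.0]
```

If one Kohonen element wins for training inputs from two different classes, this update keeps pulling the same column toward two different targets, so its recalled output settles somewhere between them. That is the ambiguity described above.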
Probabilistic Neural Network
The probabilistic neural network was developed by Donald Specht. His network architecture was first presented in two papers, "Probabilistic Neural Networks for Classification, Mapping or Associative Memory" (1988) and "Probabilistic Neural Networks" (1990).
This network provides a general solution to pattern classification problems by following an approach developed in statistics, called Bayesian classifiers. Bayesian decision theory, developed for classification problems in the 1950s, takes into account the relative likelihood of events and uses a priori information to improve prediction.
The network paradigm also uses Parzen estimators, which were developed to construct the probability density functions required by Bayes theory. The probabilistic neural network uses a supervised training set to develop distribution functions within a pattern layer. These functions, in recall mode, are used to estimate the likelihood of an input feature vector being part of a learned category, or class.
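For reference, the standard Gaussian-kernel Parzen estimate and the resulting Bayes decision rule can be written as follows. The notation is assumed here rather than taken from the report: class k has m_k training vectors x_{k,i} of dimension d, sigma is a smoothing parameter shared by all classes, h_k is the a priori probability of class k, and misclassification costs are assumed equal.

$$
\hat{f}_k(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}}\,\frac{1}{m_k}\sum_{i=1}^{m_k}
  \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_{k,i}\rVert^{2}}{2\sigma^{2}}\right),
\qquad
\hat{k}(\mathbf{x}) = \arg\max_{k}\; h_k\,\hat{f}_k(\mathbf{x})
$$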
The learned patterns can also be combined, or weighted, with the a priori probability, also called the relative frequency, of each category to determine the most likely class for a given input vector. If the relative frequency of the categories is unknown, then all categories can be assumed to be equally likely and the determination of category is solely based on the closeness of the input feature vector to the distribution function of a class.
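A direct, non-layered version of this decision can be sketched in Python. It is an illustration under assumptions, not Specht's code: the function names, the Gaussian kernel, the shared smoothing parameter `sigma`, and the equal-cost decision rule are choices made here. When `priors` is omitted, every class is treated as equally likely, as described above.

```python
import math


def parzen_density(x, samples, sigma):
    """Gaussian Parzen-window estimate of one class's density at x.

    The (2*pi*sigma^2)^(d/2) normalizer is omitted: it is identical for every
    class when d and sigma are shared, so it cannot change which class wins.
    """
    total = 0.0
    for s in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / len(samples)


def classify(x, training, sigma, priors=None):
    """Bayes decision: pick the class maximizing prior * estimated density.

    training maps class label -> list of training vectors (hypothetical layout).
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        prior = 1.0 if priors is None else priors[label]
        score = prior * parzen_density(x, samples, sigma)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = {
    "a": [[0.0, 0.0], [0.1, 0.0]],
    "b": [[1.0, 1.0], [0.9, 1.0]],
}
print(classify([0.05, 0.0], training, sigma=0.3))  # a
# [0.5, 0.5] is equally close to both classes, so the priors decide.
print(classify([0.5, 0.5], training, sigma=0.3, priors={"a": 0.9, "b": 0.1}))  # a
```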
An example of a probabilistic neural network is shown in Figure 5.2.3. This network has three layers. The network contains an input layer, which has as many elements as there are separable parameters needed to describe the objects to be classified. It has a pattern layer, which organizes the training set such that each input vector is represented by an individual processing element.
Finally, the network contains an output layer, called the summation layer, which has as many processing elements as there are classes to be recognized. Each element in this layer combines the outputs of the pattern-layer processing elements that relate to the same class and prepares that category for output.
Sometimes a fourth layer is added to normalize the input vector, if the inputs are not already normalized before they enter the network. As with the counter-propagation network, the input vector must be normalized to provide proper object separation in the pattern layer.
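The three-layer structure with normalized inputs can be sketched as a small class. This is a hypothetical sketch (the `PNN` class and its method names are illustrative): each training vector becomes one pattern-layer element, each summation element adds the pattern outputs for its class, and the largest sum decides the class. Because inputs and stored patterns have unit length, the Gaussian kernel can be computed from the dot product z = x . w: since ||x - w||^2 = 2 - 2z, exp(-||x - w||^2 / (2 sigma^2)) equals exp((z - 1) / sigma^2).

```python
import math


def normalize(x):
    """Scale x to unit Euclidean length (the optional normalization layer)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


class PNN:
    """Input, pattern, and summation layers of a probabilistic neural network (sketch)."""

    def __init__(self, sigma):
        self.sigma = sigma
        self.patterns = []  # pattern layer: one (unit weight vector, class) per training vector

    def fit(self, vectors, labels):
        for v, label in zip(vectors, labels):
            self.patterns.append((normalize(v), label))
        return self

    def summation(self, x):
        """Summation layer: per-class sum of pattern outputs exp((z - 1) / sigma^2)."""
        x = normalize(x)
        sums = {}
        for w, label in self.patterns:
            z = sum(a * b for a, b in zip(x, w))
            sums[label] = sums.get(label, 0.0) + math.exp((z - 1.0) / self.sigma ** 2)
        return sums

    def predict(self, x):
        sums = self.summation(x)
        return max(sums, key=sums.get)


net = PNN(sigma=0.3).fit(
    [[1.0, 0.1], [1.0, 0.2], [0.1, 1.0], [0.2, 1.0]],
    ["x", "x", "y", "y"],
)
print(net.predict([1.0, 0.0]))  # x
print(net.predict([0.0, 2.0]))  # y: normalization makes the magnitude irrelevant
```

Summing rather than averaging the pattern outputs means a class with more training vectors gets proportionally more weight, so the training-set frequencies act as the a priori probabilities; dividing each sum by its class count would instead treat all classes as equally likely.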