What is the concept of potential neural network
The probabilistic neural network was developed by Donald Specht. His network architecture was first presented in two papers, Probabilistic Neural Networks for Classification, Mapping or Associative Memory and Probabilistic Neural Networks, released in 1988 and 1990, respectively.
This network provides a general solution to pattern classification problems by following an approach developed in statistics, called Bayesian classifiers. Bayes theory, developed in the 1950's, takes into account the relative likelihood of events and uses a priori information to improve prediction.
1- The network paradigm also uses Parzen
Estimators which were developed to construct the probability density functions required by Bayes theory. The probabilistic neural network uses a supervised training set to develop distribution functions within a pattern layer. These functions, in the recall mode, are used to estimate the likelihood of an input feature vector being part of a learned category, or class.
2- The learned patterns can also be combined,
Or weighted, with the a priori probability, also called the relative frequency, of each category to determine the most likely class for a given input vector. If the relative frequency of the categories is unknown, then all categories can be assumed to be equally likely and the determination of category is solely based on the closeness of the input feature vector to the distribution function of a class.
An example of a probabilistic neural network is shown in Figure 5.2.3. This network has three layers. The network contains an input layer which has as many elements as there are separable parameters needed to describe the objects to be classified. It has a pattern layer, which organizes the training set such that each input vector is represented by an individual processing element.
3- And finally, the network contains an output layer,
Called the summation layer, which has as many processing elements as there are classes to be recognized. Each element in this layer combines via processing elements within the pattern layer which relate to the same class and prepares
that category for output. Sometimes a fourth layer is added to normalize the input vector, if the inputs are not already normalized before they enter the network. As with the counter-propagation network, the input vector must be normalized to provided proper object separation in the pattern layer.
4- A Probabilistic Neural Network Example.
As mentioned earlier, the pattern layer represents a neural implementation of a version of a Bayes classifier, where the class dependent probability density functions are approximated using a Parzen estimator. This approach provides an optimum pattern classifier in terms of minimizing the expected risk of wrongly classifying an object.
With the estimator, the approach gets closer to the true underlying class density functions as the number of training samples increases, so long as the training set is an adequate representation of the class distinctions. In the pattern layer, there is a processing element for each input vector in the training set. Normally, there are equal amounts of processing elements for each output class.
5- Otherwise, one or more classes may be
Skewed incorrectly and the network will generate poor results. Each processing element in the pattern layer is trained once. An element is trained to generate a high output value when an input vector matches the training vector. The training function may include a global smoothing factor to better generalize classification results. In any case, the training vectors do not have to be in any special order in the training set, since the category of a particular vector is specified by the desired output of the input. The learning function
simply selects the first untrained processing element in the correct output class and modifies its weights to match the training vector. The pattern layer operates competitively, where only the highest match to an input vector wins and generates an output. In this way, only one classification category is generated for any given input vector.
6- If the input does not relate well to any patterns
Programmed into the pattern layer, no output is generated. The Parzen estimation can be added to the pattern layer to fine tune the classification of objects, This is done by adding the frequency of occurrence for each training pattern built into a processing element.
Basically, the probability distribution of occurrence for each example in a class is multiplied into its respective training node. In this way, a more accurate expectation of an object is added to the features which make it recognizable as a class member. Training of the probabilistic neural network is much simpler than with back-propagation.
However, the pattern layer can be quite huge if the distinction between categories is varied and at the same time quite similar is special areas. There are many proponents for this type of network, since the groundwork for optimization is founded in well known, classical mathematics.
7- Networks for Data Association
The previous class of networks, classification, is related to networks for data association. In data association, classification is still done. For example, a character reader will classify each of its scanned inputs. However, an additional element exists for most applications. That element is the fact that some data is simply erroneous.
Credit card applications might have been rendered unreadable by water stains. The scanner might have lost its light source. The card itself might have been filled out by a five year old. Networks for data association recognize these occurrances as simply bad data and they recognize that this bad data can span all classifications.
8- Hopfield Network.
John Hopfield first presented his cross-bar associative network in 1982 at the National Academy of Sciences. In honor of Hopfield's success and his championing of neural networks in general, this network paradigm is usually referred to as a Hopfield Network. The network can be conceptualized i n terms of its energy and the physics of dynamic systems.
A processing element in the Hopfield layer, will change state only if the overall "energy" of the state space is reduced. In other words, the state of a processing element will vary depending whether the change will reduce the overall "frustration level" of the network. Primary applications for this sort of network have included
associative, or content-addressable, memories and a whole set of optimization problems, such as the combinatoric best route for a traveling salesman. The Figure 5.3.1 outlines a basic Hopfield network. The original network had each processing element operate in a binary format.
This is where the elements compute the weighted sum of the inputs and quantize the output to a zero or one. These restrictions were later relaxed, in that the paradigm can use a sigmoid based transfer function for finer class distinction. Hopfield himself showed that the resulting network is equivalent to the original network designed in 1982.