Shallow Neural Networks Glossary

ADALINE: Acronym for a linear neuron: ADAptive LINear Element.
adaption: Training method that proceeds through the specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the sequence as the inputs are presented.
adaptive filter: Network that contains delays and whose weights are adjusted after each new input vector is presented. The network adapts to changes in the input signal properties if such occur. This kind of filter is used in long distance telephone lines to cancel echoes.
adaptive learning rate: Learning rate that is adjusted according to an algorithm during training to minimize training time.
architecture: Description of the number of layers in a neural network, each layer's transfer function, the number of neurons per layer, and the connections between layers.
backpropagation learning rule: Learning rule in which weights and biases are adjusted by error-derivative (delta) vectors backpropagated through the network. Backpropagation is commonly applied to feedforward multilayer networks. Sometimes this rule is called the generalized delta rule.
backtracking search: Line search routine that begins with a step multiplier of 1 and then backtracks until an acceptable reduction in performance is obtained.
batch: Matrix of input (or target) vectors applied to the network simultaneously. Changes to the network weights and biases are made just once for the entire set of vectors in the input matrix. (The term batch is being replaced by the more descriptive expression “concurrent vectors.”)
batching: Process of presenting a set of input vectors for simultaneous calculation of a matrix of output vectors and/or new weights and biases.
Bayesian framework: Assumes that the weights and biases of the network are random variables with specified distributions.
BFGS quasi-Newton algorithm: Variation of Newton's optimization algorithm, in which an approximation of the Hessian matrix is obtained from gradients computed at each iteration of the algorithm.
bias: Neuron parameter that is summed with the neuron's weighted inputs and passed through the neuron's transfer function to generate the neuron's output.
bias vector: Column vector of bias values for a layer of neurons.
Brent's search: Line search that is a hybrid of the golden section search and a quadratic interpolation.
cascade-forward network: Layered network in which each layer receives inputs from the network input and from all previous layers.
Charalambous' search: Hybrid line search that uses a cubic interpolation together with a type of sectioning.
classification: Association of an input vector with a particular target vector.
competitive layer: Layer of neurons in which only the neuron with maximum net input has an output of 1 and all other neurons have an output of 0. Neurons compete with each other for the right to respond to a given input vector.
competitive learning: Unsupervised training of a competitive layer with the instar rule or Kohonen rule. Individual neurons learn to become feature detectors. After training, the layer categorizes input vectors among its neurons.
competitive transfer function: Accepts a net input vector for a layer and returns neuron outputs of 0 for all neurons except for the winner, the neuron associated with the most positive element of the net input n.
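A minimal Python sketch of the competitive transfer function described above. The name `compet_sketch` is hypothetical and is not the toolbox API. Breaking ties in favor of the first maximum is an assumption of this sketch.

```python
def compet_sketch(net_input):
    """Return 1 for the neuron with the most positive net input, 0 elsewhere."""
    # max() with a key returns the first index when several elements tie.
    winner = max(range(len(net_input)), key=lambda i: net_input[i])
    return [1 if i == winner else 0 for i in range(len(net_input))]


# Four neurons; the third has the largest net input and wins.
outputs = compet_sketch([0.2, -1.0, 0.7, 0.1])
```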
concurrent input vectors: Name given to a matrix of input vectors that are to be presented to a network simultaneously. All the vectors in the matrix are used in making just one set of changes in the weights and biases.
conjugate gradient algorithm: In the conjugate gradient algorithms, a search is performed along conjugate directions, which produces generally faster convergence than a search along the steepest descent directions.
connection: One-way link between neurons in a network.
connection strength: Strength of a link between two neurons in a network. The strength, often called weight, determines the effect that one neuron has on another.
cycle: Single presentation of an input vector, calculation of output, and new weights and biases.
dead neuron: Competitive layer neuron that never won any competition during training and so has not become a useful feature detector. Dead neurons do not respond to any of the training vectors.
decision boundary: Line, determined by the weight and bias vectors, for which the net input n is zero.
delta rule: See Widrow-Hoff learning rule.
delta vector: The delta vector for a layer is the derivative of a network's output error with respect to that layer's net input vector.
distance: Distance between neurons, calculated from their positions with a distance function.
distance function: Particular way of calculating distance, such as the Euclidean distance between two vectors.
early stopping: Technique based on dividing the data into three subsets. The first subset is the training set, used for computing the gradient and updating the network weights and biases. The second subset is the validation set. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. The third subset is the test set. It is used to verify the network design.
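The stopping test in early stopping can be sketched as follows. This is an illustrative Python sketch, not the toolbox implementation: the function name and the `max_fail` parameter are hypothetical, and it assumes a "failure" is any iteration whose validation error does not improve on the best value seen so far.

```python
def early_stopping_epoch(validation_errors, max_fail):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors."""
    best_error = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New minimum: remember it and reset the failure counter.
            best_error, best_epoch, fails = error, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                # Stop here; the weights saved at best_epoch would be returned.
                return epoch, best_epoch
    return len(validation_errors) - 1, best_epoch


errors = [0.9, 0.5, 0.4, 0.45, 0.42, 0.5, 0.3]
stop_epoch, best_epoch = early_stopping_epoch(errors, max_fail=3)
```

In this example training stops at epoch 5, after three iterations without improvement, and the weights from epoch 2 are the ones kept.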
epoch: Presentation of the set of training (input and/or target) vectors to a network and the calculation of new weights and biases. Note that training vectors can be presented one at a time or all together in a batch.
error jumping: Sudden increase in a network's sum-squared error during training. This is often due to too large a learning rate.
error ratio: Training parameter used with adaptive learning rate and momentum training of backpropagation networks.
error vector: Difference between a network's output vector in response to an input vector and an associated target output vector.
feedback network: Network with connections from a layer's output to that layer's input. The feedback connection can be direct or pass through several layers.
feedforward network: Layered network in which each layer only receives inputs from previous layers.
Fletcher-Reeves update: Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure.
function approximation: Task performed by a network trained to respond to inputs with an approximation of a desired function.
generalization: Attribute of a network whose output for a new input vector tends to be close to outputs for similar input vectors in its training set.
generalized regression network: Approximates a continuous function to an arbitrary accuracy, given a sufficient number of hidden neurons.
global minimum: Lowest value of a function over the entire range of its input parameters. Gradient descent methods adjust weights and biases in order to find the global minimum of error for a network.
golden section search: Line search that does not require the calculation of the slope. The interval containing the minimum of the performance is subdivided at each iteration of the search, and one subdivision is eliminated at each iteration.
gradient descent: Process of making changes to weights and biases, where the changes are proportional to the derivatives of network error with respect to those weights and biases. This is done to minimize network error.
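A minimal sketch of gradient descent on a single weight, written in Python for illustration. The error function error(w) = (w - 3)^2, the learning rate, and the step count are all assumptions chosen for the example.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly move w against the gradient of the error."""
    w = start
    for _ in range(steps):
        # The change is proportional to the derivative of the error.
        w = w - learning_rate * gradient(w)
    return w


# Assumed error(w) = (w - 3)**2, whose derivative is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           start=0.0, learning_rate=0.1, steps=100)
```

Each step shrinks the distance to the minimum at w = 3 by a factor of 0.8, so `w_final` ends up very close to 3.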
hard-limit transfer function: Transfer function that maps inputs greater than or equal to 0 to 1, and all other values to 0.
Hebb learning rule: Historically the first proposed learning rule for neurons. Weights are adjusted proportional to the product of the outputs of pre- and postweight neurons.
hidden layer: Layer of a network that is not connected to the network output (for instance, the first layer of a two-layer feedforward network).
home neuron: Neuron at the center of a neighborhood.
hybrid bisection-cubic search: Line search that combines bisection and cubic interpolation.
initialization: Process of setting the network weights and biases to their original values.
input layer: Layer of neurons receiving inputs directly from outside the network.
input space: Range of all possible input vectors.
input vector: Vector presented to the network.
input weight vector: Row vector of weights going to a neuron.
input weights: Weights connecting network inputs to layers.
Jacobian matrix: Contains the first derivatives of the network errors with respect to the weights and biases.
Kohonen learning rule: Learning rule that trains a selected neuron's weight vectors to take on the values of the current input vector.
layer: Group of neurons having connections to the same inputs and sending outputs to the same destinations.
layer diagram: Network architecture figure showing the layers and the weight matrices connecting them. Each layer's transfer function is indicated with a symbol. Sizes of input, output, bias, and weight matrices are shown. Individual neurons and connections are not shown.
layer weights: Weights connecting layers to other layers. Such weights need to have nonzero delays if they form a recurrent connection (i.e., a loop).
learning: Process by which weights and biases are adjusted to achieve some desired network behavior.
learning rate: Training parameter that controls the size of weight and bias changes during learning.
learning rule: Method of deriving the next changes that might be made in a network or a procedure for modifying the weights and biases of a network.
Levenberg-Marquardt: Algorithm that trains a neural network 10 to 100 times faster than the usual gradient descent backpropagation method. It always computes the approximate Hessian matrix, which has dimensions n-by-n, where n is the number of weights and biases.
line search function: Procedure for searching along a given search direction (line) to locate the minimum of the network performance.
linear transfer function: Transfer function that produces its input as its output.
link distance: Number of links, or steps, that must be taken to get to the neuron under consideration.
local minimum: Minimum of a function over a limited range of input values. A local minimum might not be the global minimum.
log-sigmoid transfer function: Squashing function of the form shown below that maps the input to the interval (0,1). (The toolbox function is logsig.)
$f (n) = \frac{1}{1 + e^{- n}}$
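The log-sigmoid formula can be evaluated directly. The sketch below is plain Python, and the function name `logsig_sketch` is hypothetical (it is not the toolbox function logsig).

```python
import math


def logsig_sketch(n):
    """Log-sigmoid: maps any real n into the open interval (0, 1)."""
    # Note: math.exp overflows for very large negative n (around n < -709).
    return 1.0 / (1.0 + math.exp(-n))


midpoint = logsig_sketch(0.0)  # the curve crosses 0.5 at n = 0
```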
Manhattan distance: The Manhattan distance between two vectors x and y is calculated as
D = sum(abs(x-y))
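The same calculation as a short Python sketch (the function name is chosen for illustration):

```python
def manhattan_distance(x, y):
    """Sum of absolute element-wise differences between two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


d = manhattan_distance([1, 2, 3], [4, 0, 3])  # |1-4| + |2-0| + |3-3|
```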
maximum performance increase: Maximum amount by which the performance is allowed to increase in one iteration of the variable learning rate training algorithm.
maximum step size: Maximum step size allowed during a line search. The magnitude of the weight vector is not allowed to increase by more than this maximum step size in one iteration of a training algorithm.
mean square error function: Performance function that calculates the average squared error between the network outputs a and the target outputs t.
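As an illustration, the mean squared error between outputs a and targets t can be sketched in Python as below. The function name is hypothetical, and the sketch assumes flat lists of scalar outputs.

```python
def mean_squared_error(outputs, targets):
    """Average of the squared errors e = t - a."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    errors = [t - a for a, t in zip(outputs, targets)]
    return sum(e * e for e in errors) / len(errors)


mse = mean_squared_error([1.0, 2.0], [3.0, 2.0])  # errors are 2 and 0
```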
momentum: Technique often used to make it less likely for a backpropagation network to get caught in a shallow minimum.
momentum constant: Training parameter that controls how much momentum is used.
mu parameter: Initial value for the scalar µ.
neighborhood: Group of neurons within a specified distance of a particular neuron. The neighborhood is specified by the indices for all the neurons that lie within a radius d of the winning neuron i*:
$N_{i}(d) = \{ j, d_{ij} \le d \}$
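A small Python sketch of the neighborhood set: it collects the indices j of all neurons whose distance from neuron i is at most d. The neuron positions on a 3-by-2 grid and the use of Euclidean distance are assumptions made for the example.

```python
import math


def neighborhood(positions, i, d, distance=math.dist):
    """Indices j of all neurons with distance(positions[i], positions[j]) <= d."""
    return [j for j, pos in enumerate(positions)
            if distance(positions[i], pos) <= d]


# Six neurons laid out on a hypothetical 3-by-2 grid.
positions = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
members = neighborhood(positions, i=4, d=1)
```

Note that the neuron at the center (here index 4) belongs to its own neighborhood, because its distance to itself is 0.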
net input vector: Combination, in a layer, of all the layer's weighted input vectors with its bias.
neuron: Basic processing element of a neural network. Includes weights and bias, a summing junction, and an output transfer function. Artificial neurons, such as those simulated and trained with this toolbox, are abstractions of biological neurons.
neuron diagram: Network architecture figure showing the neurons and the weights connecting them. Each neuron's transfer function is indicated with a symbol.
ordering phase: Period of training during which neuron weights are expected to order themselves in the input space consistent with the associated neuron positions.
output layer: Layer whose output is passed to the world outside the network.
output vector: Output of a neural network. Each element of the output vector is the output of a neuron.
output weight vector: Column vector of weights coming from a neuron or input. (See also outstar learning rule.)
outstar learning rule: Learning rule that trains a neuron's (or input's) output weight vector to take on the values of the current output vector of the postweight layer. Changes in the weights are proportional to the neuron's output.
overfitting: Case in which the error on the training set is driven to a very small value, but when new data is presented to the network, the error is large.
pass: Each traverse through all the training input and target vectors.
pattern: A vector.
pattern association: Task performed by a network trained to respond with the correct output vector for each input vector presented.
pattern recognition: Task performed by a network trained to respond when an input vector close to a learned vector is presented. The network “recognizes” the input as one of the original target vectors.
perceptron: Single-layer network with a hard-limit transfer function. This network is often trained with the perceptron learning rule.
perceptron learning rule: Learning rule for training single-layer hard-limit networks. It is guaranteed to result in a perfectly functioning network in finite time, given that the network is capable of doing so.
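A minimal sketch of the perceptron learning rule in Python, assuming the common form of the update in which the error e = t - a is added to the bias and e times the input is added to the weights. The function names, the zero initialization, and the AND training set are illustrative choices.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 for n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


def train_perceptron(samples, max_epochs=100):
    """Train a single hard-limit neuron; returns (weights, bias)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for p, t in samples:
            a = hardlim(sum(w * x for w, x in zip(weights, p)) + bias)
            e = t - a
            if e != 0:
                # Move the decision boundary toward classifying p correctly.
                weights = [w + e * x for w, x in zip(weights, p)]
                bias += e
                mistakes += 1
        if mistakes == 0:
            break  # every training vector is classified correctly
    return weights, bias


# Logical AND is linearly separable, so the rule converges.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [hardlim(sum(w * x for w, x in zip(weights, p)) + bias)
               for p, _ in and_samples]
```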
performance: Behavior of a network.
performance function: Commonly the mean squared error of the network outputs. However, the toolbox also considers other performance functions. Type help nnperformance for a list of performance functions.
Polak-Ribière update: Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure.
positive linear transfer function: Transfer function that produces an output of zero for negative inputs and an output equal to the input for positive inputs.
postprocessing: Converts normalized outputs back into the same units that were used for the original targets.
Powell-Beale restarts: Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure. This procedure also periodically resets the search direction to the negative of the gradient.
preprocessing: Transformation of the input or target data before it is presented to the neural network.
principal component analysis: Orthogonalize the components of network input vectors. This procedure can also reduce the dimension of the input vectors by eliminating redundant components.
quasi-Newton algorithm: Class of optimization algorithm based on Newton's method. An approximate Hessian matrix is computed at each iteration of the algorithm based on the gradients.
radial basis networks: Neural network that can be designed directly by fitting special response elements where they will do the most good.
radial basis transfer function: The transfer function for a radial basis neuron is
$\mathrm{radbas}(n) = e^{-n^{2}}$
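The radial basis formula is easy to check numerically. In the Python sketch below the function name `radbas_sketch` is hypothetical. The output peaks at 1 when n = 0 and falls to 0.5 when n is the square root of ln 2, about 0.8326.

```python
import math


def radbas_sketch(n):
    """Radial basis transfer function exp(-n^2)."""
    return math.exp(-n * n)


peak = radbas_sketch(0.0)
half_point = math.sqrt(math.log(2.0))  # about 0.8326
half_value = radbas_sketch(half_point)
```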
regularization: Modification of the performance function, which is normally chosen to be the sum of squares of the network errors on the training set, by adding some fraction of the squares of the network weights.
resilient backpropagation: Training algorithm that eliminates the harmful effect of having a small slope at the extreme ends of the sigmoid squashing transfer functions.
saturating linear transfer function: Function that is linear in the interval (0,1) and saturates outside this interval to 0 or 1. (The toolbox function is satlin.)
scaled conjugate gradient algorithm: Avoids the time-consuming line search of the standard conjugate gradient algorithm.
sequential input vectors: Set of vectors that are to be presented to a network one after the other. The network weights and biases are adjusted on the presentation of each input vector.
sigma parameter: Determines the change in weight for the calculation of the approximate Hessian matrix in the scaled conjugate gradient algorithm.
sigmoid: Monotonic S-shaped function that maps numbers in the interval (-∞,∞) to a finite interval such as (-1,+1) or (0,1).
simulation: Takes the network input p, and the network object net, and returns the network outputs a.
spread constant: Distance an input vector must be from a neuron's weight vector to produce an output of 0.5.
squashing function: Monotonically increasing function that takes input values between -∞ and +∞ and returns values in a finite interval.
star learning rule: Learning rule, also called the instar rule, that trains a neuron's weight vector to take on the values of the current input vector. Changes in the weights are proportional to the neuron's output.
sum-squared error: Sum of squared differences between the network targets and actual outputs for a given input vector or set of vectors.
supervised learning: Learning process in which changes in a network's weights and biases are due to the intervention of an external teacher. The teacher typically provides output targets.
symmetric hard-limit transfer function: Transfer function that maps inputs greater than or equal to 0 to +1, and all other values to -1.
symmetric saturating linear transfer function: Produces the input as its output as long as the input is in the range -1 to 1. For inputs below -1 the output is -1, and for inputs above +1 the output is +1.
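The symmetric saturating linear function amounts to clamping the input to the range -1 to 1. A one-line Python sketch (the function name is hypothetical, not the toolbox API):

```python
def satlins_sketch(n):
    """Pass n through unchanged inside [-1, 1]; clamp to -1 or +1 outside."""
    return max(-1.0, min(1.0, n))


values = [satlins_sketch(n) for n in (-2.0, 0.3, 5.0)]
```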
tan-sigmoid transfer function: Squashing function of the form shown below that maps the input to the interval (-1,1). (The toolbox function is tansig.)
$f(n) = \frac{2}{1 + e^{-2n}} - 1$
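The tan-sigmoid is mathematically the hyperbolic tangent, which can be written as 2/(1 + e^(-2n)) - 1. The Python sketch below evaluates that form and compares it with `math.tanh`. The function name `tansig_sketch` is hypothetical.

```python
import math


def tansig_sketch(n):
    """Tan-sigmoid: maps any real n into the open interval (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


center = tansig_sketch(0.0)  # the curve passes through the origin
difference = abs(tansig_sketch(1.5) - math.tanh(1.5))
```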
tapped delay line: Sequential set of delays with outputs available at each delay output.
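A tapped delay line can be sketched in Python with a fixed-length `collections.deque`. The class name is hypothetical, and two details are assumptions of this sketch: the delays start filled with zeros, and the current (undelayed) input is included as the first output.

```python
from collections import deque


class TappedDelayLine:
    """Holds the last num_delays inputs and exposes them as taps."""

    def __init__(self, num_delays, initial=0.0):
        self.taps = deque([initial] * num_delays, maxlen=num_delays)

    def step(self, value):
        """Return [current input, 1-step delay, 2-step delay, ...]."""
        outputs = [value] + list(self.taps)
        # appendleft on a full deque discards the oldest value on the right.
        self.taps.appendleft(value)
        return outputs


tdl = TappedDelayLine(num_delays=2)
history = [tdl.step(v) for v in (1.0, 2.0, 3.0, 4.0)]
```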
target vector: Desired output vector for a given input vector.
test vectors: Set of input vectors (not used directly in training) that is used to test the trained network.
topology functions: Ways to arrange the neurons in a grid, box, hexagonal, or random topology.
training: Procedure whereby a network is adjusted to do a particular job. Commonly viewed as an offline job, as opposed to an adjustment made during each time interval, as is done in adaptive training.
training vector: Input and/or target vector used to train a network.
transfer function: Function that maps a neuron's (or layer's) net input n to its actual output.
tuning phase: Period of self-organizing feature map (SOFM) training during which weights are expected to spread out relatively evenly over the input space while retaining their topological order found during the ordering phase.
underdetermined system: System that has more variables than constraints.
unsupervised learning: Learning process in which changes in a network's weights and biases are not due to the intervention of any external teacher. Commonly changes are a function of the current network input vectors, output vectors, and previous weights and biases.
update: Make a change in weights and biases. The update can occur after presentation of a single input vector or after accumulating changes over several input vectors.
validation vectors: Set of input vectors (not used directly in training) that is used to monitor training progress so as to keep the network from overfitting.
weight function: Weight functions apply weights to an input to get weighted inputs, as specified by a particular function.
weight matrix: Matrix containing connection strengths from a layer's inputs to its neurons. The element $w_{i,j}$ of a weight matrix W refers to the connection strength from input j to neuron i.
weighted input vector: Result of applying a weight to a layer's input, whether it is a network input or the output of another layer.
Widrow-Hoff learning rule: Learning rule used to train single-layer linear networks. This rule is the predecessor of the backpropagation rule and is sometimes referred to as the delta rule.
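A minimal Python sketch of the Widrow-Hoff rule for a single linear neuron with one input, assuming the usual least-mean-squares form in which the weight changes by the learning rate times the error times the input. The function name, learning rate, epoch count, and training data (samples of t = 2p + 1) are illustrative assumptions.

```python
def widrow_hoff_train(samples, learning_rate, epochs):
    """Train a one-input linear neuron a = w*p + b; returns (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for p, t in samples:
            a = w * p + b          # linear transfer function: output = net input
            e = t - a              # error for this input
            w += learning_rate * e * p
            b += learning_rate * e
    return w, b


# Targets follow t = 2*p + 1, so the neuron should learn w = 2 and b = 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = widrow_hoff_train(samples, learning_rate=0.1, epochs=500)
```

The learning rate is kept small here because too large a value makes the updates diverge, as described under error jumping.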