Thread Subject: NEURAL NETWORK DESIGN: NONLINEAR REGRESSION EXAMPLE

From: Greg Heath

Date: 27 Aug, 2013 00:18:05

Message: 1 of 1

% This is a demo of the FITNET function, which should be
% used for nonlinear regression and curve fitting. It calls
% the generic function FEEDFORWARDNET, which never
% has to be called explicitly. FITNET replaces the obsolete
% NEWFIT, which calls the obsolete NEWFF.
%
% The demo illustrates a simplistic, but useful, approach
% to dealing with the age-old questions
%
% 1. How many hidden layers?
% 2. How many hidden nodes per layer?
% 3. How much training data?
%
% A recommended approach:
%
% 1. Always begin with 1 hidden layer. A Multilayer
% Perceptron (MLP) with a single hidden layer of
% sufficiently many (H) sigmoidal transfer functions
% is a universal approximator. On rare occasions it is
% useful to add a second hidden layer to reduce the
% necessary total number of hidden nodes (H1+H2 < H).
% 2. Estimate, by trial and error, the minimum number
% of hidden nodes necessary for successfully
% approximating the underlying input-output
% transformation. For a smooth function with Nlocmax
% local maxima (endpoint maxima only count 1/2), a
% reasonable lower bound is H >= 2*Nlocmax. The
% addition of real-world noise and measurement error
% will not change that minimum number. However, the
% contamination may make it difficult to identify the
% significant error-free maxima.
% 3. The minimum number of training input/target pairs
% needed to adequately estimate the resulting number
% of weights, Nw, tends to vary linearly with H. If the
% output target vectors are O-dimensional, the Ntrn
% training pairs yield Ntrneq = Ntrn*O training
% equations for estimating Nw unknown weights. If the
% input vectors are I-dimensional, the number of
% weights for a static MLP is given by
%
% Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H
%
% 4. The number of estimation degrees of freedom (see
% Wikipedia) is Ndof = Ntrneq - Nw. When there are
% more unknown weights than training equations (i.e.,
% Nw > Ntrneq and Ndof < 0) the net is said to be
% OVERFIT with too many weights, because an exact
% training data solution can be obtained even with
% ~abs(Ndof) weights fixed at arbitrary finite values.
% This tends to prevent the net from performing
% adequately on nontraining data. (A standalone numeric
% sketch of this arithmetic follows this comment block.)
%
% 5. There are several methods used to train overfit nets
% (see the comp.ai.neural-nets FAQ). VALIDATION
% SET STOPPING and REGULARIZATION are two
% methods that are readily available in the MATLAB
% NNTBX. However, they will not be addressed here.
%
% 6. The training technique used below is to merely avoid
% overfitting by limiting the number of hidden nodes
% so that the number of unknown weights is smaller
% than the number of training equations and the
% resulting number of estimation degrees of freedom
% is positive.
%
% 7. The success of the error minimization algorithm
% depends on a fortuitous choice of initial weight values.
% Therefore, if the specified training goal is not achieved
% initially, multiple random weight initialization trials
% should be implemented. Given H, Ntrials = 10 is
% usually sufficient. (A sketch of such a loop follows
% the demo below.)
%
% 8. If the training data is resubstituted into the net to get
% an estimate of the generalization performance (i.e.,
% the performance on nondesign data), the estimate
% will obviously be biased. However, the bias can be
% somewhat mitigated by dividing the sum of absolute
% or squared errors by the estimation degrees of
% freedom, Ndof, instead of the number of training
% equations, Ntrneq. If there is a significant difference
% between the biased (e.g., MSE, NMSE or R^2) and
% adjusted (MSEa, NMSEa and Ra^2) performance
% estimates, another method of estimation should be
% used. The obvious choice is to use a sufficiently
% large holdout set of nondesign test data. If that is
% not possible, averaging over multiple random
% design/test data divisions and random weight
% initialization trials are two of many alternatives.
% Although the better known stratified cross-validation
% option is available via the CROSSVAL function in
% the STATS TBX, it is more difficult to implement.
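%
% Before the demo proper, a quick standalone sketch of the
% arithmetic in points 3, 4 and 6 above, using the simplefit
% dimensions from the demo below (Nw, Ndof and Hub mirror the
% demo's variable names; the demo's CLEAR ALL wipes these
% variables before the demo itself runs):

I = 1; O = 1; Ntrn = 94;                 % simplefit_dataset sizes
Ntrneq = Ntrn*O                          % 94 training equations
Nw   = @(H) (I+1)*H + (H+1)*O;           % weights in an I-H-O MLP
Ndof = @(H) Ntrneq - Nw(H);              % estimation degrees of freedom
Hub  = -1 + ceil( (Ntrneq-O) / (I+O+1) ) % 30 = largest H with Ndof > 0
Nw(Hub)                                  % 91 unknown weights
Ndof(Hub)                                % 3 degrees of freedom to spare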

close all, clear all, clc, plt = 0;
tic
[ x, t ] = simplefit_dataset;
[ I N ] = size(x) % [ 1 94 ]
[ O N ] = size(t) % [ 1 94 ]
Neq = prod(size(t)) % 94
% MSE normalization references
MSE00 = mean(var(t',1)) % 8.3378
MSE00a = mean(var(t')) % 8.4274

plt=plt+1, figure(plt) % figure 1
plot( x, t, 'LineWidth', 2)
title( ' SIMPLEFIT DATASET ')
Nlocmax = 2.5 % local maxima (endpoint maxima count 1/2)

xt = [ x; t ];
rangext = minmax(xt)
% rangext = 0    9.9763
%           0   10
% No need to standardize or normalize
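% ( Aside: if scaling were needed here, MAPMINMAX is the usual
%   NNTBX choice, e.g. [ xn, xsettings ] = mapminmax(x);
%   in any case FITNET applies MAPMINMAX to inputs and targets
%   by default via its processFcns, so explicit scaling is
%   rarely necessary. )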

% H >= 2*Nlocmax = 5
% Nw = (I+1)*H+(H+1)*O;
% Neq > Nw ==> H <= Hub
Hub = -1+ceil( (Neq-O) / (I+O+1)) % 30
Hmax = 2*Nlocmax+1 % 6
dH = 1
Hmin = 0

j=0
rng(0)
for h = Hmin:dH:Hmax
    j=j+1;
    h = h % echo the current candidate for H
    if h==0
        net = fitnet([]); % no hidden layer: a linear model
        Nw = (I+1)*O
    else
        net = fitnet(h);
        Nw = (I+1)*h+(h+1)*O
    end
    Ndof = Neq-Nw
    net.divideFcn = ''; % No nontraining data
    [ net tr y ] = train(net,x,t);
    
    plt = plt+1,figure(plt)
    hold on
    plot( x, t, '.', 'LineWidth', 2 )
    plot( x, y, 'ro', 'LineWidth', 2 )
    legend( 'TARGET', 'OUTPUT' )
    title( [' No. HIDDEN NODES = ', ...
                             num2str(h)], 'LineWidth', 2 )
    
    stopcrit{j,1} = tr.stop;
    numepochs(j,1) = tr.num_epochs;
    bestepoch(j,1) = tr.best_epoch;
    MSE(j,1) = tr.perf(tr.best_epoch+1);
    MSEa(j,1) = Neq*MSE(j)/Ndof;
    
end

stopcrit = stopcrit % display the stop criterion for H = 0:6

% stopcrit =
%     'Minimum gradient reached.'
%     'Minimum gradient reached.'
%     'Maximum epoch reached.'
%     'Minimum gradient reached.'
%     'Minimum gradient reached.'
%     'Minimum gradient reached.'
%     'Minimum gradient reached.'

H= (Hmin:dH:Hmax)';
R2 = 1 - MSE/MSE00;
R2a = 1 - MSEa/MSE00a;
format short g
summary = [ H bestepoch R2 R2a ]
toc % Elapsed time ~20 sec

% summary =
% H bestepoch R2 R2a
% 0 2 0.54902 0.54412
% 1 25 0.83429 0.82876
% 2 1000 0.87641 0.86789
% 3 83 0.87641 0.86317
% 4 27 0.99430 0.99345
% 5 81 0.99999 0.99998
% 6 425 0.99999 0.99998
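
The demo above trains only one net per candidate H (rng(0) fixes the
single weight initialization). For completeness, here is a minimal
sketch of the multiple random initialization loop recommended in
point 7, keeping the best of Ntrials nets. It assumes x, t and MSE00
exist as in the demo and that h is set to the candidate number of
hidden nodes; Ntrials, msetrn, bestmse and bestnet are names chosen
here for illustration:

h = 5;                                 % candidate from the summary
Ntrials = 10;                          % usually sufficient (point 7)
bestmse = Inf;
for trial = 1:Ntrials
    net = fitnet(h);                   % new random weights each trial
    net.divideFcn = '';                % no nontraining data
    [ net tr y ] = train(net,x,t);
    msetrn = tr.perf(tr.best_epoch+1); % training MSE at best epoch
    if msetrn < bestmse
        bestmse = msetrn;
        bestnet = net;                 % keep the best of Ntrials nets
    end
end
NMSE = bestmse/MSE00                   % normalized MSE
R2 = 1 - NMSE                          % biased R^2 (see point 8)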

Hope this helps.

Greg
