How can I improve the performance of a closed-loop NARX neural network?

I've created an open-loop NARX network for system identification with the ntstool toolbox. My open-loop network's performance (MSE) is 1e-07, but when I close the loop the error increases sharply, to around 1.15. Is there anything I can do to improve the closed-loop performance?

Accepted Answer

Greg Heath on 13 Feb 2013
Try narxnet. Use the autocorrelation of the target and the cross-correlation of the input and target to find the lags that are statistically significant, and use those for the input and feedback delays.
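A minimal sketch of this lag-selection step (assuming the Signal Processing Toolbox's xcorr is available; x and t are illustrative standardized input and target row vectors, and maxlag, the threshold, and H are placeholders, not tuned values):

```matlab
% Find candidate narxnet delays from statistically significant lags (illustrative).
N = numel(t);
maxlag = 20;
[acf, lagA] = xcorr(t, maxlag, 'coeff');       % target autocorrelation
[ccf, lagC] = xcorr(t, x, maxlag, 'coeff');    % target/input crosscorrelation
sig95 = 1.96/sqrt(N);                          % approximate 95% significance level
FD = lagA(lagA > 0  & abs(acf) > sig95);       % candidate feedback delays
ID = lagC(lagC >= 0 & abs(ccf) > sig95);       % candidate input delays
net = narxnet(ID, FD, 10);                     % H = 10 is just a placeholder
```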
Hope this helps.
Thank you for formally accepting my answer
Greg
  4 Comments
Lucas Ferreira-Correia on 18 Jul 2019
I realise this thread is quite old, but by "original data" do you mean the same data used to train the open loop?
Juan Hynek on 26 Aug 2019
Edited: Juan Hynek on 17 Oct 2019
Hi Lucas,
I have received good results from using the same data, but using the weights determined by open-loop training as the starting point for closed-loop training. Also make sure to initialise open-loop training with predetermined weights when working with large datasets. This will help avoid local minima.
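A minimal sketch of that workflow, using the toolbox's standard open-loop/closed-loop pattern; the dataset, delays, and H here are illustrative, not tuned values:

```matlab
[X, T] = simpleseries_dataset;
net = narxnet(1:2, 1:2, 10);                 % ID, FD, hidden units (illustrative)
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net = train(net, Xs, Ts, Xi, Ai);            % open-loop training
netc = closeloop(net);                       % close the loop; trained weights are kept
[Xc, Xic, Aic, Tc] = preparets(netc, X, {}, T);
netc = train(netc, Xc, Tc, Xic, Aic);        % continue training from open-loop weights
mseClosed = perform(netc, Tc, netc(Xc, Xic, Aic))
```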


More Answers (6)

Greg Heath on 21 Feb 2013
Edited: Greg Heath on 21 Feb 2013
I have run ramin's 11 Feb 2013 timedelaynet code, with modifications, on the simpleseries_dataset using the same input parameters (except for the series length and 'divideblock'):
size(input) = [ 1 100]
size(target) = [ 1 100]
ID = 0:9
H = 65
trn/val/tst ratios = 0.65/0.20/0.15
divideFcn = 'dividerand' % default, but I used 'divideblock'
net.trainParam.goal = 0 % default, but I used 0.01*MSEtrn00
I also used Ntrials = 20 % Multiple random weight initializations
To begin with there are two obvious suspects.
1. 'dividerand' destroys all of the correlations on which good timeseries performance is based.
2. H is HUGE. Therefore there is a good chance the open-loop design is overfit, with too many weights. If the overfit net is also overtrained, it will fit the training data almost perfectly but may fit nontraining data very, very badly.
Recall that if there are only Ntrneq = Ntrn*O = 60 training equations to estimate Nw = (10+1)*65+(65+1)*1 = 781 unknown weights, there are an infinite number of solutions that will yield approximately zero training error. However, most of these solutions will not yield acceptable nontraining error.
Unfortunately, the documentation examples and the GUI-based code only report the combined performance on trn/val/tst data without looking at each subset separately. The total performance measure looked great, with R^2 close to 1. However, when I separated the performances I found that the training-data (65%) R^2 values were very close to unity, but the validation (20%) and test (15%) performances were so bad they were NEGATIVE! That means that just using the mean target value as the output yields better nontraining performance (R^2 = 0).
So, remember, when evaluating a net, look at the nontraining data performances!
Bottom line: Your open loop performance was really terrible and your closed loop performance just followed suit.
result =
Trial Epochs R2trn R2val R2tst
1 2 0.99908 -1.6637 -0.76142
2 2 0.99934 -0.55658 -2.136
3 2 0.99985 -1.5766 -1.24
4 2 0.99956 -0.93949 -0.60495
5 2 0.99396 -1.3556 -1.038
6 2 0.99935 -3.1616 -2.636
7 2 0.9994 -0.080548 -1.7186
8 2 0.99993 -0.18275 -0.27919
9 2 0.99858 0.043727 -0.83779
10 2 0.99989 -3.2046 -2.1909
11 2 0.99934 -1.2554 -1.434
12 2 0.99894 -0.56353 -0.40946
13 2 0.99986 -1.3566 -2.1323
14 2 0.99971 -1.1912 -0.012865
15 2 0.99992 -0.69673 -1.9198
16 2 0.99984 -1.7842 -0.58905
17 2 0.99969 -0.71866 -1.226
18 2 0.99945 -1.0188 -5.066
19 2 0.99993 -3.1073 -1.3749
20 2 0.99994 -3.5529 -2.9
Hope this helps,
Greg
P.S. The largest value for H that will keep Nw < Ntrneq is Hub = 8!

Greg Heath on 21 Feb 2013
FRANCISCO on 17 Feb 2013 at 17:09
% I've been testing my code on the pollution_dataset with the indications you gave me, and I have several questions that I would like to ask:
% 1 - VARIABLES ARE ON DIFFERENT SCALES; SHOULD NORMALIZE. Did you normalize the pollution_dataset variables?
No. I standardized the variables (help/doc zscore) to have zero mean and unit variance.
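That standardization can be sketched like this, assuming the Statistics Toolbox's zscore is available (mapstd is a toolbox-free alternative):

```matlab
[X, T] = pollution_dataset;
xm = cell2mat(X);                % variables-by-timesteps matrix
xs = zscore(xm, 0, 2);           % zero mean, unit variance per variable (row)
Xn = con2seq(xs);                % back to a cell sequence for narxnet
```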
% 3 - NARXNET INPUTS ARE NOT OPTIMAL. FIND SIGNIFICANT AUTO- AND CROSSCORRELATIONS; VARY H IF NEEDED. How do you do this step? Train the network, look at the autocorrelation and crosscorrelation charts, and check that the errors are within the confidence limits (dotted red line)?
No. Calculate the auto- and cross-correlation functions on the data. Use xcorr or crosscorr if you have the appropriate toolboxes. Otherwise use nncorr, which has bugs that are corrected in some of my recent code (search the NEWSGROUP and ANSWERS using greg nncorr). You can also use ifft(conj(fft(a)).*fft(b))/N.
If you can't find the significance levels from the program or the manuals, then calculate the crosscorrelation of two randn(1,N) series. Sort the 2*N-1 values; the one that is 95% of the way from the beginning approximates the 95% significance threshold. I typically run 100 trials and average the results.
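A sketch of that Monte-Carlo recipe; N and the number of trials are illustrative, and sorting the absolute values is one reading of the thresholding step:

```matlab
N = 100; Ntrials = 100;
thr = zeros(1, Ntrials);
for k = 1:Ntrials
    c = xcorr(randn(1, N), randn(1, N), 'coeff');  % 2*N-1 crosscorrelation values
    c = sort(abs(c));
    thr(k) = c(round(0.95*numel(c)));              % value 95% from the beginning
end
sig95 = mean(thr)                                  % averaged threshold estimate
```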
% 4 - DO NOT USE THE DEFAULT DIVIDERAND. I've also tried dividetrain, but then the data cannot be divided into train/validation/test because, as in the help, it trains with all targets. Still, I think the error is large. I'd like you to tell me how you use dividetrain.
There is no division in dividetrain. Use divideblock.
% 5 - INITIALIZE THE RNG SO RUNS CAN BE DUPLICATED. How and when do I do it? After open-loop training and simulation? Do I use rng(sprev)?
Before the double loop over H and the random initial weights. I have posted many, many examples.
% 6 - NORMALIZE THE MSE WITH THE VARIANCE OF THE TARGET TO OBTAIN THE COEFFICIENT OF DETERMINATION R^2. I have read the Wikipedia article and do not know how to do this part. Is there some code where I could see it? I find it hard to understand.
Search greg R2 R2a
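The normalization in question 6 can be sketched as follows, where t and y are illustrative row vectors of targets and network outputs:

```matlab
% Coefficient of determination: normalize the MSE by the target variance.
e = t - y;                        % errors
MSE = mean(e.^2);
MSE00 = var(t, 1);                % reference MSE of the naive constant-mean model
R2 = 1 - MSE/MSE00;               % R^2 = 0 means no better than the mean
```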
% 7 - H ~ Hub/10. I do not understand this indication.
Ntrneq > Nw when H <= Hub.
Search greg Nw Hub
% If you know of a place where I could see the complete code addressing these questions, applied to pollution_dataset or other data, I'd appreciate it, so I can understand it better.
I had to revise the one I was working on when I found ramin's overfitting problem. It is a hobby, not a priority, so I cannot promise when I will finish.
  2 Comments
FRANCISCO on 21 Feb 2013
I think I understand the idea:
1 - In the autocorrelation and crosscorrelation, I find the highest peaks, and those lags indicate the delays I should use.
2 - The balance between equations and weights gives a rough estimate of H.
3 - When evaluating the error, it is recommended to study the errors separately (train, validation, test); how would you do this?
4 - I believe that implementing cross-validation would improve the accuracy of the prediction; of course, I would have to try it.
Thank you very much for your time Greg
Greg Heath on 30 Jul 2014
How would 10-fold cross-validation be implemented on a timeseries where the order of the surrounding data needs to be maintained?
I don't see it.
Greg



Shashank Prasanna on 8 Feb 2013
Honestly, I wish there were one quick answer to this question, but in reality there isn't. Make sure your training set includes all the dynamics you wish to see in the model. In addition, how does your data look? Is there a trend? If the data is not stationary, NARX may not do a great job.
  1 Comment
ramin on 11 Feb 2013
Edited: ramin on 11 Feb 2013
Yes, my data includes all dynamic modes. The input is an oscillatory function with changing frequencies (a sweep input), and the output is similar (it is my Simulink simulation's output). If NARX cannot do that, what is your suggestion for selecting the best architecture for my network?



Greg Heath on 9 Feb 2013
See the answer I just posted to FRANCISCO re his implementation. Like him, don't expect to get a good answer to your question without posting your code.
Running your code on one of the MATLAB example datasets
help nndata
will probably help us help you.
Greg
  3 Comments
ramin on 11 Feb 2013
My problem is not a dimension problem like FRANCISCO's. I want to improve my closed-loop network's performance and decrease the error.



Greg Heath on 16 Feb 2013
Edited: Greg Heath on 16 Feb 2013
NARXNET is the most general timeseries design function. TIMEDELAYNET and NARNET are special cases.
narxnet(ID,[],H) should yield the same results as timedelaynet.
narxnet([],FD,H) should yield the same results as narnet.
DO NOT USE the default 'dividerand'. Random sampling to create the trn/val/tst data division destroys the correlations between the current output and the delayed inputs and outputs.
'divideblock' is probably the smartest choice. Although 'divideint' also yields uniform timesteps, it increases the timestep by a factor of 3.
Choose ID from the significant lags of the crosscorrelation function.
Choose FD from the significant lags of the autocorrelation function.
Do not use a very large value of H. That is analogous to fitting a noisy straight line with a high order polynomial. It will have erroneous wiggles between training points as well as before and after the domain of the training points. It neither interpolates well nor extrapolates well. So, even though the openloop results are acceptable, or even great, closed loop performance may be disastrously poor!
I use the estimated number of degrees of freedom as a guide:
Ndof = Ntrneq - Nw
where
Ntrneq = No. of training equations = prod(size(ttrn)) = Ntrn*O
and
Nw = No. of weights to be estimated = net.numWeightElements.
If MXID = max(ID) and MXFD = max(FD), for I-dimensional inputs and O-dimensional outputs,
Nw = (MXID*I + MXFD*O +1 )*H + (H+1)*O
and the requirement Ntrneq > Nw yields the upper bound
Hub = -1 + ceil( (Ntrneq-O) / ( MXID*I + MXFD*O + O +1) )
This can be exceeded using validation stopping and/or regularization. However, without them H << Hub is the best bet to mitigate noise and measurement error. The optimal value of the ratio r = Ntrneq/Nw depends on the data. However, I feel relatively safe with
Hmax = floor(Hub/10)
but will try values up to floor(Hub/2) if necessary. Typically, I choose a reasonable range and spacing for candidate values of H (Hmin: dH : Hmax), and design numH*Ntrials candidate nets where Ntrials is the number of random initial weight configurations for each design. If the training error is etrn = ttrn-ytrn, the lowest training set error is estimated using
MSEtrna = sse(etrn)/Ndof.
The DOF "a"djustment is used to decrease the bias in MSEtrn = sse(etrn)/Ntrneq caused by using the same data to estimate weights and evaluate the resulting performance.
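The bookkeeping above can be sketched as follows; the dimensions, training length, and delays are illustrative:

```matlab
I = 1; O = 1;                    % input and output dimensions
Ntrn = 65; MXID = 2; MXFD = 2;   % training length and max delays (illustrative)
Ntrneq = Ntrn*O;                                          % training equations
Hub  = -1 + ceil((Ntrneq - O)/(MXID*I + MXFD*O + O + 1)); % upper bound on H
Hmax = max(1, floor(Hub/10));                             % conservative choice
H = Hmax;
Nw   = (MXID*I + MXFD*O + 1)*H + (H + 1)*O;               % weights to estimate
Ndof = Ntrneq - Nw                                        % degrees of freedom
% MSEtrna = sse(etrn)/Ndof       % DOF-adjusted training MSE
```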
Your poor closed-loop performance could have been caused by the combination of using 'dividerand' with a very large H.
Hope this helps.
Thank you for formally accepting my answer.
Greg
  3 Comments
Greg Heath on 16 Feb 2013
You should be able to get better closed-loop results without resorting to crossval, for which the NNTBX has no function. I fixed your code and ran it on the pollution_dataset with interesting results: R2a = 0.92 for open loop and 0.88 for closed loop, with ID = 1:2, FD = 1:2, H = 16, and divideFcn = 'dividetrain'. ID and FD are the defaults, and H ~ Hub/10.
FRANCISCO on 17 Feb 2013
I've been testing my code on the pollution_dataset with the indications you gave me, and I have several questions that I would like to ask:
1 - VARIABLES ARE ON DIFFERENT SCALES; SHOULD NORMALIZE. Did you normalize the pollution_dataset variables?
3 - NARXNET INPUTS ARE NOT OPTIMAL. FIND SIGNIFICANT AUTO- AND CROSSCORRELATIONS; VARY H IF NEEDED. How do you do this step? Train the network, look at the autocorrelation and crosscorrelation charts, and check that the errors are within the confidence limits (dotted red line)?
4 - DO NOT USE THE DEFAULT DIVIDERAND. I've also tried dividetrain, but then the data cannot be divided into train/validation/test because, as in the help, it trains with all targets. Still, I think the error is large. I'd like you to tell me how you use dividetrain.
5 - INITIALIZE THE RNG SO RUNS CAN BE DUPLICATED. How and when do I do it? After open-loop training and simulation? Do I use rng(sprev)?
6 - NORMALIZE THE MSE WITH THE VARIANCE OF THE TARGET TO OBTAIN THE COEFFICIENT OF DETERMINATION R^2. I have read the Wikipedia article and do not know how to do this part. Is there some code where I could see it? I find it hard to understand.
7 - H ~ Hub/10. I do not understand this indication.
If you know of a place where I could see the complete code addressing these questions, applied to pollution_dataset or other data, I'd appreciate it, so I can understand it better.
Thank you very much Greg.



Yushan Jiang on 28 Jun 2021
I have the same problem in my NARX model!
