Logsig activation function in irradiance post-processing

Hello,
I have irradiance and temperature forecasts, and I'm trying to improve the irradiance forecasts using a neural network for my Master's thesis. For that, I'm using the fitnet function (is this an MLP?). I'm currently testing one- and two-hidden-layer networks with different sizes.
My question is mainly about the activation functions in the hidden layers and in the output layer. I have normalized the irradiance (both the forecasts and the targets) and the temperature so that they range from 0 to 1 (for irradiance, [0, 1] is the natural normalization range - a [-1, 1] range doesn't make sense). As such, I have removed mapminmax from the preprocessing:
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'}; % output sits at the last layer
It makes sense, right?
Additionally, having read the NN User's Guide, I saw that the default transfer function is tansig, which outputs in the range [-1, 1], and changed it to logsig ([0, 1]). I read in the guide that "...if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig).". My problem is that I don't see different results when using tansig and logsig. I actually suspect (not yet thoroughly tested) that logsig in the output layer gives slightly worse results. Do these results make sense? And does it make sense to use logsig?
net.layers{1}.transferFcn = 'logsig';
net.layers{2}.transferFcn = 'logsig';
net.layers{3}.transferFcn = 'logsig'; % If using 2 hidden layers
Also, is it important (and even possible) to "tell" the network that input 1 is irradiance, the same quantity as the only output? (I mean, should the network know that I have G and T as inputs and G as output, or is that completely irrelevant - does it just treat them as X and Y in, Z out?)
One last question: I have used
net.divideFcn = 'divideint'; %Interleaved division
Does this guarantee that in all trainings the test set is composed of the same elements (for example, entries 7, 14 and 21 are always used in the test set)?
I'm sorry for the long post, I really hope someone can enlighten me! If it matters, I'm attaching my data and code.
Thank you,
Bernardo Fonseca
  1 Comment
Greg Heath
Greg Heath on 2 May 2016
fitnet is an MLP.
One hidden layer is sufficient for a universal approximator.
Scaled inputs should be relatively symmetric about 0, and hidden-node transfer functions should be tansig (tanh).
Outliers should be removed or modified. This is easier when inputs and targets are first standardized to zero mean and unit variance.
Output transfer functions are usually linear unless there are mathematical or physical reasons why the outputs should be bounded. Then logsig or tansig may be appropriate.
You have the choice of removing the default net normalizations or leaving them in and using the appropriate normalizations before and/or after calling the net.
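The points above can be sketched in toolbox terms. This is a hypothetical configuration, not Greg's literal code, assuming a single hidden layer with H = 10; mapstd is the toolbox's zero-mean/unit-variance processing function.

```matlab
% Sketch of the advice above (assumptions: one hidden layer, H = 10).
net = fitnet(10);
% Standardize inputs and targets to zero mean, unit variance:
net.inputs{1}.processFcns  = {'removeconstantrows', 'mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows', 'mapstd'};
net.layers{1}.transferFcn = 'tansig';  % hidden layer (toolbox default)
net.layers{2}.transferFcn = 'purelin'; % linear output (toolbox default)
% Only if outputs must be bounded, e.g. to [0,1]:
% net.layers{2}.transferFcn = 'logsig';
```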
For a classifier, outputs should be estimates of the class probabilities conditional on the input. Softmax is appropriate for exclusive classes with {0,1} unit-vector targets. Logsig is appropriate for nonexclusive classes with nonnegative unit-sum targets.
Divideint yields 1,4,7,... for training data, 2,5,8,... for validation data and 3,6,9,... for test data.
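One quick way to check the determinism asked about in the question (a sketch using the toolbox's simplefit demo dataset; divideint is a deterministic interleaved split, unlike dividerand):

```matlab
[x, t] = simplefit_dataset;
net = fitnet(10);
net.divideFcn = 'divideint';      % interleaved, deterministic split
[~, tr1] = train(net, x, t);
[~, tr2] = train(net, x, t);
isequal(tr1.testInd, tr2.testInd) % 1 (true): same test entries every run
```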
Hope this helps.
Greg


Accepted Answer

Greg Heath
Greg Heath on 27 Apr 2016
You have overthought the design. You probably only have to change the random initial weights and/or the number of hidden nodes in a single hidden layer.
1. See the documentation example.
help fitnet
doc fitnet
2. Make the following modifications (note the omitted semicolons):
[x, t] = simplefit_dataset;
[I, N] = size(x)
[O, N] = size(t)
vart1 = mean(var(t', 1)) % Reference MSE
% vart1 = var(t, 1) when O = 1
net = fitnet; % H = 10 is the default
rng('default') % Initialize the RNG for reproducibility
[net, tr, y, e] = train(net, x, t);
% y = net(x); e = t - y;
view(net)
NMSE = mse(e)/vart1 % Normalized MSE
Rsq = 1 - NMSE % Fraction of target variance modeled by the net
% See https://en.wikipedia.org/wiki/R2
3. Use the code, as is, with your example. If Rsq is not close to
unity, train Ntrials = 10 designs in a loop to obtain different
random initial weights and random trn/val/tst data divisions
...
for i = 1:Ntrials
    net = configure(net, x, t); % New initial weights
    [net, tr, y, e] = train(net, x, t);
    % y = net(x); e = t - y;
    view(net)
    NMSE(i) = mse(e)/vart1 % Normalized MSE
end
Rsq = 1 - NMSE
4. If this is not successful, vary the number of hidden nodes in an outer loop:
rng('default')
j = 0
for h = Hmin:dH:Hmax
    j = j+1
    if h == 0
        net = fitnet([]); % Linear model: no hidden layer
    else
        net = fitnet(h);
    end
    for i = 1:Ntrials
        net = configure(net, x, t);
        [net, tr, y, e] = train(net, x, t);
        % y = net(x); e = t - y;
        % view(net) % Probably don't need this now
        Rsq(i,j) = 1 - mse(e)/vart1;
    end
end
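One way to read off the best design from the Rsq(i,j) matrix produced by the double loop (a sketch; Hmin, dH, Hmax and Ntrials are assumed defined as above):

```matlab
[bestRsq, k]   = max(Rsq(:));               % best trial over all (i,j)
[ibest, jbest] = ind2sub(size(Rsq), k);     % recover trial and column
Hbest = Hmin + (jbest - 1)*dH;              % hidden-node count of best column
fprintf('Best Rsq = %.4f at H = %d (trial %d)\n', bestRsq, Hbest, ibest)
```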
I have posted zillions of examples. Search both the NEWSGROUP and ANSWERS using
fitnet tutorial
fitnet greg
Hope this helps.
Thank you for formally accepting my answer
Greg
  3 Comments
Bernardo Fonseca
Bernardo Fonseca on 29 Apr 2016
Greg or anyone, could you help me?
I am still debating why one should or shouldn't use logsig in the output layer...
I would really appreciate some input.
Thank you!
Greg Heath
Greg Heath on 2 May 2016
A feedforward net does not contain connections for backward-moving signals.
rng('default') initializes the Random Number Generator to a specified state. This allows reproducibility of the net.
Always initialize the RNG to a specified state of your choice before assigning initial weights. Then you can reproduce the design later.
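A minimal sketch of that reproducibility, assuming the simplefit demo data (getwb extracts all weights and biases as one vector; configure initializes the weights):

```matlab
[x, t] = simplefit_dataset;
rng(0)                                   % any fixed seed of your choice
w1 = getwb(configure(fitnet(10), x, t)); % initial weights, run 1
rng(0)
w2 = getwb(configure(fitnet(10), x, t)); % initial weights, run 2
isequal(w1, w2)                          % 1 (true): identical initial weights
```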
Since e = t - y does not contain trn, val or tst subscripts ...
Hope this helps.
Greg


