Logsig activation function in irradiance post-processing

Hello,
I have irradiance and temperature forecasts, and I'm trying to improve the irradiance forecasts using a neural network for my Master's thesis. For that, I'm using the fitnet function (is this an MLP?). I'm currently testing one- and two-hidden-layer networks with different sizes.
My question is mainly about the activation functions in the hidden layers and in the output layer. I have normalized the irradiance (both the forecasts and the targets) and the temperature so that they range from 0 to 1 (for irradiance, [0, 1] is the natural normalization range - a [-1, 1] range doesn't make sense). As such, I have removed mapminmax from the preprocessing:
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'}; % output sits at the last layer
It makes sense, right?
Additionally, having read the NN User's Guide, I saw that the default transfer function is tansig, which outputs in the range [-1, 1], and changed it to logsig ([0, 1]). I read in the guide that "...if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig).". My problem is that I don't see different results when using tansig and logsig. I actually suspect (not yet thoroughly tested) that logsig in the output layer gives slightly worse results. Do these results make sense? And does it make sense to use logsig?
net.layers{1}.transferFcn = 'logsig';
net.layers{2}.transferFcn = 'logsig';
net.layers{3}.transferFcn = 'logsig'; % If using 2 hidden layers
Also, is it important (and even possible) to "tell" the network that input 1 is irradiance, the same quantity as the only output? (I mean, should the network know that I have G and T as inputs and G as output, or is that completely irrelevant - does it just treat them as X and Y in, Z out?)
One last question: I have used
net.divideFcn = 'divideint'; %Interleaved division
Does this guarantee that in all trainings the test set is composed of the same elements (for example, entries 7, 14 and 21 are always used in the test set)?
I'm sorry for the long post, I really hope someone can enlighten me! If it matters, I'm attaching my data and code.
Thank you,
Bernardo Fonseca
  1 Comment
Greg Heath
Greg Heath on 2 May 2016
fitnet is an MLP.
One hidden layer is sufficient for a universal approximator.
Scaled inputs should be relatively symmetric about 0, and hidden-node transfer functions should be tansig (tanh).
Outliers should be removed or modified. This is easier when inputs and targets are first standardized to zero mean and unit variance.
Output transfer functions are usually linear unless there are mathematical or physical reasons why the outputs should be bounded. Then logsig or tansig may be appropriate.
You have the choice of removing the default net normalizations or leaving them in and using the appropriate normalizations before and/or after calling the net.
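The points above can be sketched in toolbox terms. This is a hypothetical configuration, not Greg's literal code, assuming a single hidden layer with H = 10; mapstd is the toolbox's zero-mean/unit-variance processing function.

```matlab
% Sketch of the advice above (assumptions: one hidden layer, H = 10).
net = fitnet(10);
% Standardize inputs and targets to zero mean, unit variance:
net.inputs{1}.processFcns  = {'removeconstantrows', 'mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows', 'mapstd'};
net.layers{1}.transferFcn = 'tansig';  % hidden layer (toolbox default)
net.layers{2}.transferFcn = 'purelin'; % linear output (toolbox default)
% Only if outputs must be bounded, e.g. to [0,1]:
% net.layers{2}.transferFcn = 'logsig';
```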
For a classifier, outputs should be estimates of the class probabilities conditional on the input. Softmax is appropriate for exclusive classes with {0,1} unit-vector targets. Logsig is appropriate for nonexclusive classes with nonnegative unit-sum targets.
Divideint yields 1,4,7,... for training data, 2,5,8,... for validation data and 3,6,9,... for test data.
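One quick way to check the determinism asked about in the question (a sketch using the toolbox's simplefit demo dataset; divideint is a deterministic interleaved split, unlike dividerand):

```matlab
[x, t] = simplefit_dataset;
net = fitnet(10);
net.divideFcn = 'divideint';      % interleaved, deterministic split
[~, tr1] = train(net, x, t);
[~, tr2] = train(net, x, t);
isequal(tr1.testInd, tr2.testInd) % 1 (true): same test entries every run
```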
Hope this helps.
Greg


Accepted Answer

Greg Heath
Greg Heath on 27 Apr 2016
You have overthought the design. You probably only have to change the random initial weights and/or the number of hidden nodes in a single hidden layer.
1. See the documentation example.
help fitnet
doc fitnet
2. Make the following modifications (note the omitted semicolons):
[x, t] = simplefit_dataset;
[I, N] = size(x)
[O, N] = size(t)
vart1 = mean(var(t', 1)) % Reference MSE
% vart1 = var(t, 1) when O = 1
net = fitnet; % H = 10 is the default
rng('default') % Initialize the RNG for reproducibility
[net, tr, y, e] = train(net, x, t);
% y = net(x); e = t - y;
view(net)
NMSE = mse(e)/vart1 % Normalized MSE
Rsq = 1 - NMSE % Fraction of target variance modeled by the net
% See https://en.wikipedia.org/wiki/R2
3. Use the code, as is, with your example. If Rsq is not close to
unity, train Ntrials = 10 designs in a loop to obtain different
random initial weights and random trn/val/tst data divisions
...
for i = 1:Ntrials
    net = configure(net, x, t); % New initial weights
    [net, tr, y, e] = train(net, x, t);
    % y = net(x); e = t - y;
    view(net)
    NMSE(i) = mse(e)/vart1 % Normalized MSE
end
Rsq = 1 - NMSE
4. If this is not successful, vary the number of hidden nodes in an outer loop:
rng('default')
j = 0
for h = Hmin:dH:Hmax
    j = j+1
    if h == 0
        net = fitnet([]); % Linear model: no hidden layer
    else
        net = fitnet(h);
    end
    for i = 1:Ntrials
        net = configure(net, x, t);
        [net, tr, y, e] = train(net, x, t);
        % y = net(x); e = t - y;
        % view(net) % Probably don't need this now
        Rsq(i,j) = 1 - mse(e)/vart1;
    end
end
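One way to read off the best design from the Rsq(i,j) matrix produced by the double loop (a sketch; Hmin, dH, Hmax and Ntrials are assumed defined as above):

```matlab
[bestRsq, k]   = max(Rsq(:));               % best trial over all (i,j)
[ibest, jbest] = ind2sub(size(Rsq), k);     % recover trial and column
Hbest = Hmin + (jbest - 1)*dH;              % hidden-node count of best column
fprintf('Best Rsq = %.4f at H = %d (trial %d)\n', bestRsq, Hbest, ibest)
```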
I have posted zillions of examples. Search both the NEWSGROUP and ANSWERS using
fitnet tutorial
fitnet greg
Hope this helps.
Thank you for formally accepting my answer
Greg
  3 Comments
Bernardo Fonseca
Bernardo Fonseca on 29 Apr 2016
Greg or anyone, could you help me?
I am still debating why one should or shouldn't use logsig in the output layer...
I would really appreciate some input.
Thank you!
Greg Heath
Greg Heath on 2 May 2016
A feedforward net does not contain connections for backward-moving signals.
rng('default') initializes the Random Number Generator to a specified state. This allows reproducibility of the net.
Always initialize the RNG to a specified state of your choice before assigning initial weights. Then you can reproduce the design later.
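A minimal sketch of that reproducibility, assuming the simplefit demo data (getwb extracts all weights and biases as one vector; configure initializes the weights):

```matlab
[x, t] = simplefit_dataset;
rng(0)                                   % any fixed seed of your choice
w1 = getwb(configure(fitnet(10), x, t)); % initial weights, run 1
rng(0)
w2 = getwb(configure(fitnet(10), x, t)); % initial weights, run 2
isequal(w1, w2)                          % 1 (true): identical initial weights
```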
Since e = t - y does not contain trn, val or tst subscripts ...
Hope this helps.
Greg


