How to further debug this code?

pliz on 24 Mar 2013
This is probably easy for most people, but not for me. I ran each of these "P" sections (P = 1 to P = 4; offline learning for a neural network) separately in its own loop, and each quickly reached an error (E) below 0.001.
But the combined error of P = 1, 2, 3, and 4 together won't drop below 0.004 as I would like; instead it gets stuck at a summed error of around 0.64. Even the error from combining only P = 1 and P = 2 (as in the copy of my code below, which is why the P = 3 and P = 4 blocks are commented out) gets stuck at around 0.32. That's the extent of my ability to debug this code.
Can anyone see the obvious mistake I'm making (or otherwise tidy up my crude coding)? Because I can't.
Thanks in advance!
clc;
clear all;
close all;
eta = -1;   % step size; the negative sign makes the (y - target) updates descend
x1 = 0.1;
x2 = 0.1;
x3 = 1;     % bias input, always 1
w1(1)=1*(rand(1)-0.5);
w2(1)=-1*(rand(1)-0.5);
w3(1)=2.3*(rand(1)-0.5);
w4(1)=2.1*(rand(1) - 0.5);
w5(1)=-2*(rand(1) - 0.5);
w6(1)=-2*(rand(1)-0.5);
w7(1)=1*(rand(1) - 0.5);
w8(1)=2*(rand(1) - 0.5);
w9(1)=2*(rand(1) - 0.5);
i=1;
for icount=1:10000
%P = 1
x1 = 0.1;
x2 = 0.1;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y1 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew11 = eta*x1*z1*(1-z1)*w7(i)*y1*(1-y1)*(y1-0.1);
changew31 = eta*x2*z1*(1-z1)*w8(i)*y1*(1-y1)*(y1-0.1);
changew51 = eta*x3*z1*(1-z1)*w9(i)*y1*(1-y1)*(y1-0.1);
%Hidden layer gate z2
changew21 = eta*x1*z2*(1-z2)*w7(i)*y1*(1-y1)*(y1-0.1);
changew41 = eta*x2*z2*(1-z2)*w8(i)*y1*(1-y1)*(y1-0.1);
changew61 = eta*x3*z2*(1-z2)*w9(i)*y1*(1-y1)*(y1-0.1);
%Output layer
changew71 = eta*z1*y1*(1-y1)*(y1-0.1);
changew81 = eta*z2*y1*(1-y1)*(y1-0.1);
changew91 = eta*x3*y1*(1-y1)*(y1-0.1);
E1(i) = (y1-0.1)^2;
%P = 2
x1 = 0.1; x2 = 0.9;
alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
z1 = 1./(1 + exp(-alpha1));
alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
z2 = 1./(1 + exp(-alpha2));
alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
y2 = 1./(1 + exp(-alpha3));
%Hidden layer gate z1
changew12 = eta*x1*z1*(1-z1)*w7(i)*y2*(1-y2)*(y2-0.9);
changew32 = eta*x2*z1*(1-z1)*w8(i)*y2*(1-y2)*(y2-0.9);
changew52 = eta*x3*z1*(1-z1)*w9(i)*y2*(1-y2)*(y2-0.9);
%Hidden layer gate z2
changew22 = eta*x1*z2*(1-z2)*w7(i)*y2*(1-y2)*(y2-0.9);
changew42 = eta*x2*z2*(1-z2)*w8(i)*y2*(1-y2)*(y2-0.9);
changew62 = eta*x3*z2*(1-z2)*w9(i)*y2*(1-y2)*(y2-0.9);
%Output layer
changew72 = eta*z1*y2*(1-y2)*(y2-0.9);
changew82 = eta*z2*y2*(1-y2)*(y2-0.9);
changew92 = eta*x3*y2*(1-y2)*(y2-0.9);
E2(i) = (y2-0.9)^2;
% %P = 3
% x1 = 0.9;
% x2 = 0.1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y3 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew13 = eta*x1*z1*(1-z1)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew33 = eta*x2*z1*(1-z1)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew53 = eta*x3*z1*(1-z1)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Hidden layer gate z2
% changew23 = eta*x1*z2*(1-z2)*w7(i)*y3*(1-y3)*(y3-0.9);
% changew43 = eta*x2*z2*(1-z2)*w8(i)*y3*(1-y3)*(y3-0.9);
% changew63 = eta*x3*z2*(1-z2)*w9(i)*y3*(1-y3)*(y3-0.9);
% %Output layer
% changew73 = eta*z1*y3*(1-y3)*(y3-0.9);
% changew83 = eta*z2*y3*(1-y3)*(y3-0.9);
% changew93 = eta*x3*y3*(1-y3)*(y3-0.9);
% E3(i) = (y3-0.9)^2;
% %P = 4
% x1 = 0.9;
% x2 = 0.9;
% x3 = 1;
% alpha1 = w1(i)*x1 + w3(i)*x2 + w5(i)*x3;
% z1 = 1./(1 + exp(-alpha1));
% alpha2 = w2(i)*x1 + w4(i)*x2 + w6(i)*x3;
% z2 = 1./(1 + exp(-alpha2));
% alpha3 = w7(i)*z1 + w8(i)*z2 + w9(i)*x3;
% y4 = 1./(1 + exp(-alpha3));
% %Hidden layer gate z1
% changew14 = eta*x1*z1*(1-z1)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew34 = eta*x2*z1*(1-z1)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew54 = eta*x3*z1*(1-z1)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Hidden layer gate z2
% changew24 = eta*x1*z2*(1-z2)*w7(i)*y4*(1-y4)*(y4-0.1);
% changew44 = eta*x2*z2*(1-z2)*w8(i)*y4*(1-y4)*(y4-0.1);
% changew64 = eta*x3*z2*(1-z2)*w9(i)*y4*(1-y4)*(y4-0.1);
% %Output layer
% changew74 = eta*z1*y4*(1-y4)*(y4-0.1);
% changew84 = eta*z2*y4*(1-y4)*(y4-0.1);
% changew94 = eta*x3*y4*(1-y4)*(y4-0.1);
% E4(i) = (y4-0.1)^2;
sumE(i) = E1(i) + E2(i); %+ E3(i) + E4(i);
if sumE(i)<=0.004
break
end
i=i+1;
w1(i) = w1(i-1) + changew11+changew12;%+changew13+changew14;
w2(i) = w2(i-1) + changew21+changew22;%+changew23+changew24;
w3(i) = w3(i-1) + changew31+changew32;%+changew33+changew34;
w4(i) = w4(i-1) + changew41+changew42;%+changew43+changew44;
w5(i) = w5(i-1) + changew51+changew52;%+changew53+changew54;
w6(i) = w6(i-1) + changew61+changew62;%+changew63+changew64;
w7(i) = w7(i-1) + changew71+changew72;%+changew73+changew74;
w8(i) = w8(i-1) + changew81+changew82;%+changew83+changew84;
w9(i) = w9(i-1) + changew91+changew92;%+changew93+changew94;
end
figure(1);
grid on;
title('W values 1-9 Vs Iteration Number');
hold on;
plot(w1,'red');
plot(w2,'green');
plot(w3,'blue');
plot(w4,'cyan');
plot(w5,'magenta');
plot(w6,'yellow');
plot(w7,'black');
plot(w8,':red');
plot(w9,'-.green');
legend('w1','w2','w3','w4','w5','w6','w7','w8','w9','Location','Best');
figure(2);
grid on;
title('Error Vs Iteration Number');
hold on;
plot(sumE);
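For comparison, a minimal sketch of how the four per-pattern blocks could be collapsed into one loop over a pattern table, using the same 2-2-1 sigmoid network (the array names X, t, W1, W2 are illustrative, not taken from the code above):
% Same network, patterns stored in arrays, batch weight update per epoch
X = [0.1 0.1; 0.1 0.9; 0.9 0.1; 0.9 0.9];  % inputs x1, x2 for P = 1..4
t = [0.1; 0.9; 0.9; 0.1];                  % targets for P = 1..4
x3 = 1;                                    % bias input
eta = 1;                                   % positive step size
W1 = rand(2,3) - 0.5;                      % hidden weights, row 1 -> z1, row 2 -> z2
W2 = rand(1,3) - 0.5;                      % output weights [w7 w8 w9]
for icount = 1:10000
    dW1 = zeros(size(W1)); dW2 = zeros(size(W2)); sumE = 0;
    for p = 1:4
        xin = [X(p,:)'; x3];                 % [x1; x2; bias]
        z = 1./(1 + exp(-W1*xin));           % hidden outputs z1, z2
        y = 1./(1 + exp(-W2*[z; x3]));       % network output
        dout = (y - t(p))*y*(1 - y);         % output delta
        dhid = (W2(1:2)'*dout).*z.*(1 - z);  % each hidden delta uses the one output weight that unit feeds
        dW2 = dW2 - eta*dout*[z; x3]';       % accumulate batch updates
        dW1 = dW1 - eta*dhid*xin';
        sumE = sumE + (y - t(p))^2;
    end
    W1 = W1 + dW1; W2 = W2 + dW2;
    if sumE <= 0.004, break, end
end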
  4 Comments
Walter Roberson on 24 Mar 2013
Do you really have large blocks of commented-out code just before the sumE(i) calculation?
pliz on 29 Mar 2013 (edited)
I tried to respond to the comments. Per and Image Analyst: I followed the tutorial on formatting code, and I can copy and paste the revision above into my editor and get the output I intended (meaning it shows the problems I'm having).
Walter - yes, I commented out those blocks to show the troubleshooting I've done. Again, each individual "P" reaches the required error on its own. And if I comment out all but the "P"s that share the same target in the error calculation ((y-0.1) or (y-0.9)), the required error is also reached. It's only when an epoch mixes different targets (which includes running all four "P" values together) that no solution is reached.
Greg - thank you for the background. Unfortunately, I'm not familiar with a lot of the subject matter. I'll try to upload some of the notes we were given as motivation for this problem.
Again, thank you everyone for your help. I'll respond more quickly this time, and I appreciate any further assistance.


Accepted Answer

Greg Heath on 25 Mar 2013
1. If the number of unknowns is greater than the number of equations, then a solution is not unique (How many solutions to x1 + x2 = 1?).
2. A net with more unknown weights than the number of training equations is said to be OVERFIT. The nonuniqueness of exact solutions is typically mitigated by various techniques mentioned below.
3. If an overfit net is trained with data consisting of signal + random contamination (noise, measurement error, roundoff and/or truncation error), a LMSE (least-mean-square-error) solution obtained from the signal with one particular set of contamination may yield a large MSE for the same signal with a different set of contamination.
4. A net that performs well on nontraining data that can be assumed to be drawn from the same source as the training data is said to have good generalization, i.e., it generalizes well to nontraining data.
5. If a net is overfit but the signal-to-contamination power ratio is sufficiently high, iterative solutions tend to pass through regions of good generalization on the way to minimizing the training MSE; a net trained past those regions is said to be OVERTRAINED.
6. There are several methods to mitigate overtraining an overfit net. See the comp.ai.neural-nets FAQ and search for overfit, overfitting and/or generalization.
7. For a single hidden layer MLP with H hidden nodes and an I-H-O node topology trained by Ntrn pairs of I-dimensional inputs and O-dimensional outputs:
Ntrneq = Ntrn*O % No. of training equations
Nw = (I+1)*H+(H+1)*O % No. of unknown weights.
Typically, Ntrn, I and O are given and a choice of H has to be made. To avoid overfitting, choose H to be less than the upper bound
Hub = -1 + ceil( (Ntrneq-O)/(I+O+1) )
(worked through for this net after the Bottom Line below). Sometimes this can be achieved by reducing I, O, and/or H by pruning connections.
8. If avoiding overfitting does not yield an acceptable solution, then there are other mitigation techniques for not overtraining an overfit net (see the comp.ai.neural-nets FAQ; a minimal weight-decay sketch follows this list):
a. Validation set stopping
b. Regularization of the minimization objective
1. Weight decay
2. Weight elimination
3. Bayesian regularization
c. Jittering (training with added noise)
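As a minimal illustration of technique b.1 (weight decay), each update takes the usual gradient step and also shrinks the weight toward zero; the numbers below are illustrative only, not from the question:
eta = 0.5; lambda = 1e-3;         % assumed step size and decay strength
w = 2; grad = 0.1;                % an example weight and its error gradient
w = w - eta*grad - eta*lambda*w   % the -eta*lambda*w term pulls w toward zero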
Bottom Line:
If you have 9 unknown weights you might want at least 45 or 90 training equations, or else use a mitigation technique.
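Plugging the net from the question into the formulas of item 7 (assuming I = 2 inputs, H = 2 hidden nodes, O = 1 output, Ntrn = 4 training pairs, with the bias covered by the +1 terms):
I = 2; H = 2; O = 1; Ntrn = 4;        % assumed from the question's code
Ntrneq = Ntrn*O                       % = 4 training equations
Nw = (I+1)*H + (H+1)*O                % = 9 unknown weights
Hub = -1 + ceil((Ntrneq-O)/(I+O+1))   % = 0, so even one hidden node is overfit here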
Hope this helps.
Thank you for formally accepting my answer.
Greg

