How to improve accuracy on unseen data

This is a neural network pattern recognition problem. The entire dataset consists of 80 images to classify (10 users, 8 samples per user). I have divided it into two parts: 50 images for training (10 users x 5 samples per user) and 30 images kept aside as unseen images (10 users x 3 samples per user). A patternnet network is designed using nprtool; the code is as follows:
inputs = mapstd(train_data); %%train_data [I N ] = [ 60 50 ]
targets = mapstd(Targets); %%[ O N ] = [ 10 50 ]
% Create a Pattern Recognition Network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainscg'; % Scaled conjugate gradient backpropagation
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
%net.trainParam.max_fail= 6; %Maximum validation failures
%net.trainParam.lr = 0.25;
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotroc', 'plotconfusion'};
%net = configure(net,inputs,targets);
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
% view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotconfusion(targets,outputs)
%figure, ploterrhist(errors)
%figure, plotroc(targets,outputs)
save net net
disp('training completed')
For testing network performance on unseen data:
test_data = mapstd(test_data);
load net;
netoutput = sim(net,test_data) %%simulation with test data
% locate index of maximum value of output node
[y, ind] = max(netoutput);
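If the intent is to apply the training-set standardization to the unseen data, mapstd can return a settings structure that is reused on new columns; a minimal sketch (the settings variable PS is introduced here only for illustration):
% reuse the training standardization on the unseen data (sketch)
[inputs, PS] = mapstd(train_data);            % standardize training data, keep settings
test_inputs  = mapstd('apply', test_data, PS);% apply the SAME settings to unseen data
netoutput    = sim(net, test_inputs);         % simulate with consistently scaled inputs
[~, ind]     = max(netoutput);                % predicted class index per column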
Training stops on validation failure and the confusion matrix shows very poor accuracy. What should I do to get good classification accuracy? I referred to Greg Heath's posts on MATLAB Central; thanks to Greg Heath for providing insight into neural networks. As per his replies I have calculated H = 4 for my problem, but I am still not getting good accuracy.
[ I N ] = size(train_data) % [ 60 50 ]; each column contains real-valued (3-digit) features
[ O N ] = size(Targets)    % [ 10 50 ]
Ntrn   = N-2*round(0.15*N) % 34 training examples
Ntrneq = Ntrn*O            % 340 training equations
% For a robust design, desire Ntrneq >> Nw, i.e. H << Hub
Hub = -1+ceil( (Ntrneq-O) / (I+O+1)) % 4
H   = Hub;                 % candidate hidden layer size
Nw  = (I+1)*H+(H+1)*O      % number of unknown weights
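As a quick worked check of this rule, using the two hidden layer sizes that appear above (H = 10 in the code and H = Hub = 4):
% worked check of the sizing rule with I = 60, O = 10, Ntrneq = 340
Nw10 = (60+1)*10 + (10+1)*10   % 720 weights, more than the 340 training equations
Nw4  = (60+1)*4  + (4+1)*10    % 294 weights, still close to 340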
Please guide me on how to proceed further. Thank you in advance.
  2 Comments
Greg Heath on 21 May 2014
I wrote out a detailed response. However, my computer doesn't allow me to post or email. I will try to get it fixed tomorrow and post the response.
Chetana on 22 May 2014
Thank you, Greg sir. I am eager to see your reply. I tried to improve the accuracy by changing:
1) the number of hidden nodes (now 40)
2) Using,
MSEgoal = 0.1*mean(var(targets',1));
MinGrad = MSEgoal/100;
net.trainParam.goal = MSEgoal;      % MSE training goal
net.trainParam.min_grad = MinGrad;  % minimum gradient
3) Using,
net.performFcn = 'mse'; % Mean squared error
net.trainParam.max_fail= 100; %Maximum validation failures
net.trainParam.epochs = 1000;
net.trainParam.lr = 0.25;
Now the confusion matrix shows 90% classification accuracy.
But the network does not classify unseen images correctly: only 4 of the 30 unseen images (10 users x 3 samples per user) get classified correctly. I think my network learned the training data but fails to generalize. Please guide me. Thank you.


Accepted Answer

Greg Heath on 26 May 2014
0. please excuse no caps. originally had problems and now too lazy to change...
1. do the 60-dim inputs represent extracted features (e.g., plsregress (not pca!)) or a columnized image? if the latter, what size?
2. have you tried to reduce the input vector size via plsregress?
3. are the target columns from eye(10) so that
target = ind2vec(trueclassindices)
trueclassindices = vec2ind(target)
estimatedclassindices = vec2ind(output)
4. no need to explicitly separate 'unseen data': test data is in no way used for training or validation. therefore, it can be used to obtain "unbiased" estimates of unseen data performance. this holds true for averages of multiple designs obtained from
a. random or stratified (e.g., k-fold xval) data divisions
b. and/or random weight initializations
5. wise to use minmax to check for outliers after using mapstd
6. no need to explicitly include assignments of defaults.
7. syntax error: save net net
8. need numh*numw designs: numw sets of random initial weights and/or data divisions for each of numh different values for hidden layer size, h (a sketch of points 8-11 follows this answer).
9. explicitly calculate trn/val/tst classification error rates for
each design (unless you can figure out how to get them from the confusion
matrix).
10. for each value of h, rank designs w.r.t. performance on validation data.
11. select best val performers at minimum successful value of h. combine
corresponding test performances to obtain unbiased estimates of unseen
data error rates and confidence intervals.
12. search for my examples in the newsgroup and answers
greg patternnet ntrials
13. test on a matlab classification dataset so we can compare results
help nndatasets
doc nndatasets
hth
greg
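A minimal sketch of points 8-11, assuming the inputs/targets variables from the question (with targets as 0/1 class-indicator columns); the candidate H values, Ntrials, and the other new names are illustrative assumptions:
% multiple designs per candidate H, ranked by validation error (sketch)
Hcandidates = [2 4 6 8];          % candidate hidden layer sizes (assumption)
Ntrials     = 10;                 % random initializations/divisions per H
rng(0)                            % reproducible results
valErr = zeros(Ntrials, numel(Hcandidates));
tstErr = zeros(Ntrials, numel(Hcandidates));
trueclass = vec2ind(targets);
for j = 1:numel(Hcandidates)
    for i = 1:Ntrials
        net = patternnet(Hcandidates(j));       % defaults: dividerand, trainscg
        [net, tr] = train(net, inputs, targets);
        assigned  = vec2ind(net(inputs));
        err       = assigned ~= trueclass;
        valErr(i,j) = 100*mean(err(tr.valInd));  % validation error rate (%)
        tstErr(i,j) = 100*mean(err(tr.testInd)); % test error rate (%)
    end
end
% rank designs by validation error; the corresponding test errors of the best
% designs serve as unbiased estimates of unseen-data performance
[minValErr, best] = min(valErr)
bestTstErr = tstErr(sub2ind(size(tstErr), best, 1:numel(Hcandidates)))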

More Answers (5)

Greg Heath on 22 May 2014
The dramatic difference between your training and nontraining performance is a classic example of OVERTRAINING an OVERFIT net when H/Hub and max_fail are too large.
1. Why would you use H = 40 when Hub = 4???
2. Why would you use max_fail = 100 when H >> Hub AND the max_fail default is only 6???
Training performance tends to be irrelevant. Rank multiple designs using the validation performance. Obtain unbiased estimates of performance on unseen data from the test subset performance of the best ranked designs. You do not need a separate "unseen" test data set. Performance uncertainties are easy to estimate by training multiple designs with random data divisions and random initial weights.
3. Why do you have a 60-dim input when you only have 50-2*round(0.15*50) = 34 training examples? At most, they span a 33-D input space.
===========================================================================
You need to get a better feel for your problem. Start simple:
1. How many inputs do you really need? Considering linear models can be very helpful
a. Stepwisefit results tend to be useful for selecting original variables.
b. Plsregress results tend to be useful for selecting linear combinations (see the plsregress sketch after this list).
c. PCA is not guaranteed to be useful for classification.
2. Back to NNs
a. Use all data for training: divideFcn = 'dividetrain'
b. Otherwise USE ALL DEFAULTS except for a chosen RNG seed so that results can be duplicated.
c. Obtain 10 designs, explicitly calculate class error rates and compare with confusion matrix results
3. Repeat 2 but try to minimize H as much as possible.
4. Using the minimum acceptable value for H, try to reduce the number of inputs.
5. Finally, you can return to the original goal of estimating performance on nondesign test data. Confidence limits can be deduced from multiple designs with random data divisions and random initial weights.
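A rough sketch of step 1b, assuming inputs ([60 x N]) and targets ([10 x N]) as in the question; ncomp and the other new variable names are illustrative assumptions:
% reduce the 60-dim input with plsregress (sketch)
ncomp = 10;                                   % number of PLS components kept (assumption)
[XL, YL, XS, YS, BETA, PCTVAR, MSEP, stats] = plsregress(inputs', targets', ncomp); % rows = observations
plsinputs = XS';                              % [ncomp x N] reduced inputs for patternnet
% new columns (e.g. unseen data) can be projected the same way:
% XSnew = (Xnew' - mean(inputs',1)) * stats.W;
net = patternnet(4);                          % small H, per the sizing rule above
[net, tr] = train(net, plsinputs, targets);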
Hope this helps.
Greg
PS Sorry my original response is still unavailable. Maybe tomorrow.
  4 Comments
Greg Heath on 26 May 2014
I think there is a misunderstanding. With 'dividetrain' all of the data is used for training and max_fail is irrelevant. The purpose of this exercise is to
1. Make sure that the data is consistent
2. Find the minimum number of hidden nodes that are necessary.
For unbiased estimates of performance on nontraining data, design multiple nets with different random initial weights. The multiplicity allows the estimation of confidence intervals.
P.S. Am trying to deal with the computer virus myself. I will soon post my previous comments.
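A compact sketch of this exercise (all data used for training, several random weight initializations, and a simple spread summary), assuming inputs/targets from the question; Ntrials and h are illustrative assumptions:
% 'dividetrain' designs with random initial weights (sketch)
Ntrials = 10;
h = 4;
rng(0)
PctErr = zeros(Ntrials,1);
trueclass = vec2ind(targets);
for i = 1:Ntrials
    net = patternnet(h);
    net.divideFcn = 'dividetrain';          % all data used for training
    net = configure(net, inputs, targets);  % fresh random initial weights
    net = train(net, inputs, targets);
    PctErr(i) = 100*mean(vec2ind(net(inputs)) ~= trueclass);
end
summary = [mean(PctErr) std(PctErr)]        % rough spread over the designs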
Greg Heath on 31 May 2014
Data successfully received. However, currently, I don't have much time. Will be traveling until June 4 with malfunctioning laptop.
I STRONGLY suggest you look into reducing the input dimensions.
It would also be helpful to use more examples per class.
Currently, you are "defining" a class in 60 dimensions with, at most, 6 training examples that span, at most, 5 dimensions.
That would not be bad if the different 5-dimensional class subspaces did not intersect. However, that is like whistling in the dark to ward off demons. I tend to feel comfortable with classes defined with at least two independent examples per dimension.
Again: try to reduce the dimensionality. I made some recommendations in an earlier post. Also search the NEWSGROUP and ANSWERS with combinations of keywords like
image feature extraction
Hope this helps.



Greg Heath on 1 Jun 2014
Revelations from New Data:
clear all, close all, clc
tic
load P1.txt
whos
% P1 61x350 170800 double
inputs = P1(1:end-1,:);
trueclasses = P1(end,:);
minmaxindices = minmax(trueclasses) % [ 1 50 ]
Nclasses = numel(unique(trueclasses)) % 50
targets = ind2vec(trueclasses);
[ I N ] = size(inputs) % [ 60 350 ]
[ O N ] = size(targets) % [ 50 350 ]
%No val or test data
Ntrneq = N*O % 17500 training equations
% NAIVE CONSTANT MODEL
ynaive = mean(targets,2); % 0.02*ones(O,1)
Nw00 = numel(ynaive) % O = 50
Ndof00 = Ntrneq-Nw00 % 17450 DegsOfFreedom
y00 = repmat(ynaive,1,N); % [50 350]
SSE00 = sse(targets-y00) % 343
MSE00 = SSE00/Ntrneq % 0.0196 biased
MSE00a = SSE00/Ndof00 % 0.0197 DOF adjusted
% MSE00 = mean(var(targets',1)) % 0.0196
% MSE00a = mean(var(targets',0)) % 0.0197
% LINEAR MODEL y0 = W0*[ones(1,N); inputs];
W0 = targets/[ones(1,N); inputs];
Nw0 = numel(W0) % 3050
Ndof0 = Ntrneq-Nw0 % 14450 DegsOfFreedom
y0 = W0*[ones(1,N); inputs];
SSE0 = sse(targets-y0) % 278.3153
MSE0 = SSE0/Ntrneq % 0.0159 biased
MSE0a = SSE0/Ndof0 % 0.0193 DOFa
R20 = 1-MSE0/MSE00 % 0.1886 ~ 19%
R20a = 1-MSE0a/MSE00a % 0.0201 ~ 2%
Elapsedtime = toc % 180 sec
% When the degree of freedom adjustment is made to compensate for estimating performance with training data, a conjecture that ~19% of the target variance is "explained" (R20) must be modified to only ~2% (R20a). Therefore, the Linear Model doesn't appear to be significantly better than the Naive Constant Model that is based on apriori probabilities. Complicating the analysis via trn/val/tst data division will not improve results.
% I will let you calculate the resulting 50 class error rates
% Escalating to quadratic classifiers, one for each of the 50 classes might be feasible. The simplest model would use 50 hidden node radial basis functions centered at the class means. This can be constructed using NEWRB.
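As a rough sketch of the class error rate calculation left to the reader, using the variables defined in the code above (the commented newrb call at the end is illustrative; its goal, spread, and neuron cap are assumptions):
% per-class error rates for the linear model (sketch)
[~, assigned0] = max(y0);                     % predicted class per column
err0    = assigned0 ~= trueclasses;
PctErr0 = 100*sum(err0)/N                     % overall percent error, linear model
classerr0 = zeros(1, Nclasses);
for c = 1:Nclasses
    idx = (trueclasses == c);
    classerr0(c) = 100*sum(err0(idx))/sum(idx);  % percent error for class c
end
% radial basis alternative mentioned above (illustrative parameters):
% netrb = newrb(inputs, full(targets), 0.0, 1.0, 50, 5);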
Hope this helps.
Greg

Greg Heath on 2 Jun 2014
%A quick way to see if any of the variables or classes appear to be different from the others is to standardize the inputs to zero-mean/unit-variance and compare the rows and columns of W0
%STANDARDIZATION (To compare Linear Model coefficients)
zinputs = zscore(inputs')';
W0z = targets/[ones(1,N); zinputs];
minmaxW0z = minmax(W0z); % [ 50 2 ]
minmaxW0zp = minmax(W0z'); % [ 61 2 ]
whos
figure
subplot(2,1,1)
hold on
plot(1:50,minmaxW0z(:,1),'bo')
plot(1:50,minmaxW0z(:,2),'ro')
subplot(2,1,2)
hold on
plot(1:61,minmaxW0zp(:,1),'bo')
plot(1:61,minmaxW0zp(:,2),'ro')
%When the inputs are standardized, I see no significant differences between the weights associated with different classes or different variables

Greg Heath on 2 Jun 2014
Although previous results are not encouraging, I'm curious what a biased MLP design would yield.
BIAS: All of the data is used for training and regularization (e.g., TRAINBR ) is NOT used
% NEURAL NETWORK MODEL
Hub = -1+ceil( (Ntrneq-O) / (I+O+1)) % 157
Ntrials = 10
rng(0)
j=0
for h = round([Hub/10, Hub/2, Hub])
    j = j+1
    h = h
    Nw = (I+1)*h+(h+1)*O
    Ndof = Ntrneq-Nw
    net = patternnet(h);
    net.divideFcn = ''; % 'dividetrain'
    for i = 1:Ntrials
        net = configure(net,inputs,targets);
        [ net, tr, outputs, regerrors ] = train(net,inputs,targets);
        assignedclasses = vec2ind(outputs);
        classerr = assignedclasses~=trueclasses;
        Nerr(i,j) = sum(classerr);
        % FrErr = Fraction of Errors (Nerr/N)
        [FrErr(i,j),CM,IND,ROC] = confusion(targets,outputs);
        FN(i,j) = mean(ROC(:,1)); % Fraction of False Negatives
        TN(i,j) = mean(ROC(:,2)); % Fraction of True Negatives
        TP(i,j) = mean(ROC(:,3)); % Fraction of True Positives
    end
end
PctErr=100*Nerr/N
elapsedtime = toc %~412 sec
%%%%%% Percent Error
% ____________________________
% H = 16 79 157
% ____________________________
% 80.9 96.9 63.4
% 57.4 89.7 82.3
% 75.1 >19.1 19.4<
% 84.0 56.9 19.1<
% 60.3 75.4 96.3
% >41.7 83.7 77.1
% 94.9 84.3 21.1<
% 46.0 61.7 94.9
% 57.1 90.6 83.7
% 48.0 79.7 >17.8<
  1 Comment
Greg Heath on 4 Jun 2014
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Repeat with Ntrials = 20
%
%%%%%%Percent Error
% ________________________
% H = 16 79 157
% _________________________
% 82.3 74.0 >19.7
% 56.0 >24.3 64.0
% 72.6 94.9 >21.7
% 83.1 65.4 86.3
% 62.3 >26.9 >20.0
% >40.9 30.6 30.0
% 96.0 86.3 88.3
% >46.6 32.6 72.6
% 55.4 62.3 >19.7
% >46.3 >20.6 74.6
% 77.7 95.4 86.3
% 70.0 63.1 >18.9
% 97.7 79.4 89.4
% 67.4 84.6 83.4
% 56.9 82.9 84.6
% 70.6 67.4 74.0
% 55.1 55.7 77.1
% 57.4 78.3 88.0
% 75.1 34.9 72.9
% 74.9 43.1 82.9



farzad on 21 Feb 2015
Hi all,
I tried to use this code, though not the updated version containing Prof. Heath's points, because I could not figure out to which part of the code I should add them. It is also important for me to use the network after training by giving it a new input and getting the desired answer, but I could not figure out how to use sim. You have used test_data, which MATLAB does not recognize and gives an error for. What should I use instead?
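A minimal sketch of using a trained, saved net on one new sample, assuming newsample is a 60x1 feature vector built the same way as the training columns and that the mapstd settings PS were also saved alongside the net (these names are illustrative):
% classify one new sample with the saved network (sketch)
load net                                        % loads the trained network 'net'
newsample_n = mapstd('apply', newsample, PS);   % reuse the training standardization
scores      = sim(net, newsample_n);            % equivalently: scores = net(newsample_n)
[~, userid] = max(scores)                       % predicted user index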
