How to do classification using Matlab?

Hi Smart Guys,
I have got the data (which can be downloaded here: [enter link description here][1]) and have tried to run a simple LDA-based classification using the 11 features stored in the dataset, i.e., F1, F2, ..., F11.
Below is some code I wrote in Matlab using only 2 features. May I ask some questions based on this code, please?
clc; clear; close all;
%% Load the extracted features
features = xlsread('ExtractedFeatures.xls');
numSamples = 23;   % 23 observations in total (15 'Good', 8 'Bad')
%% Define ground truth
groundTruthGroup = cell(numSamples, 1);
groundTruthGroup(1:15) = cellstr('Good');
groundTruthGroup(16:end) = cellstr('Bad');
%% Select features 3 and 9
featureSelected = [features(:,3), features(:,9)];
%% Run LDA (resubstitution: train and test on the same data)
[ldaClass, ldaResubErr] = classify(featureSelected, featureSelected, groundTruthGroup, 'linear');
bad = ~strcmp(ldaClass, groundTruthGroup);
ldaResubErr2 = sum(bad)/numSamples;
[ldaResubCM,grpOrder] = confusionmat(groundTruthGroup,ldaClass);
%% Scatter plot
gscatter(featureSelected(:,1), featureSelected(:,2), groundTruthGroup, 'rgb', 'osd');
xlabel('Feature 3');
ylabel('Feature 9');
hold on;
plot(featureSelected(bad,1), featureSelected(bad,2), 'kx');
hold off;
%% Leave-one-out cross-validation
leaveOneOutPartition = cvpartition(numSamples, 'leaveout');
ldaClassFun = @(xtrain, ytrain, xtest) classify(xtest, xtrain, ytrain, 'linear');
ldaCVErr = crossval('mcr', featureSelected, ...
    groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);
%% Display the results
clc;
disp('______________________________________ Results ______________________________________________________');
disp(' ');
fprintf('Resubstitution error of LDA (training error, Matlab built-in): %f\n', ldaResubErr);
fprintf('Resubstitution error of LDA (training error, computed manually): %f\n', ldaResubErr2);
disp(' ');
disp('Confusion matrix:');
disp(ldaResubCM)
fprintf('Cross-validation error of LDA (leave-one-out): %f\n', ldaCVErr);
disp(' ');
disp('______________________________________________________________________________________________________');
I. My first question is how to do feature selection, for example, using forward or backward feature selection, or t-test based methods?
I have checked that Matlab has the `sequentialfs` method, but I am not sure how to incorporate it into my code.
II. How do I use the Matlab `classify` method to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 PCs and then run the classification? (I am expecting to write a loop that adds each feature one by one to do a forward feature selection, not just run PCA to do a dimension reduction.)
III. I have also tried to run an ROC analysis. I referred to the webpage [enter link description here][2], which has an implementation of a simple LDA method that produces the linear scores of the LDA. We can then use `perfcurve` to get the ROC curve.
IIIa. However, I am not sure how to use the `classify` method with `perfcurve` to get the ROC (see the sketch after the code below).
IIIb. Also, how do I do an ROC analysis with cross-validation?
IIIc. After we have got `OPTROCPT`, which is the best cut-off point, how can we use this cut-off point to produce a better classification?
%% ROC analysis
featureSelected = [features(:,3), features(:,9)];
groundTruthNumericalLabel = [zeros(15,1); ones(8,1)];
% Calculate linear discriminant coefficients (LDA is the function from the webpage above)
ldaCoefficients = LDA(featureSelected, groundTruthNumericalLabel);
% Calculate linear scores for the training data
ldaLinearScores = [ones(numSamples,1) featureSelected] * ldaCoefficients';
% Convert the scores to class probabilities (softmax)
classProbabilities = exp(ldaLinearScores) ./ repmat(sum(exp(ldaLinearScores),2), [1 2]);
% Plot the ROC curve, treating class 0 as the positive class
figure;
[FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLabel, classProbabilities(:,1), 0);
plot(FPR, TPR, 'or-')
xlabel('False positive rate (FPR, 1-Specificity)'); ylabel('True positive rate (TPR, Sensitivity)')
title('ROC for classification by LDA')
grid on;
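For reference (this part is not from the linked page): the third output of `classify` is the matrix of posterior class probabilities, which can be passed straight to `perfcurve` as scores, so the hand-rolled score computation above may not be necessary. A sketch, assuming the variables defined earlier; the posterior column order is assumed to follow grp2idx's group order, which is worth double-checking:
% IIIa: classify's third output is the matrix of posterior probabilities
[ldaClass2, ldaErr2, posterior] = classify(featureSelected, ...
    featureSelected, groundTruthGroup, 'linear');
% Work out which posterior column belongs to 'Good' (columns are assumed
% to follow grp2idx's group order)
[~, groupNames] = grp2idx(groundTruthGroup);
goodColumn = find(strcmp(groupNames, 'Good'));
[FPR2, TPR2, Thr2, AUC2] = perfcurve(groundTruthGroup, posterior(:, goodColumn), 'Good');
% IIIb: one possible cross-validated ROC -- collect the out-of-fold
% posteriors and run perfcurve on those
cvPosterior = zeros(numSamples, 1);
for i = 1:leaveOneOutPartition.NumTestSets
    trainIdx = training(leaveOneOutPartition, i);
    testIdx = test(leaveOneOutPartition, i);
    [~, ~, p] = classify(featureSelected(testIdx,:), ...
        featureSelected(trainIdx,:), groundTruthGroup(trainIdx), 'linear');
    cvPosterior(testIdx) = p(:, goodColumn);
end
[cvFPR, cvTPR, cvThr, cvAUC] = perfcurve(groundTruthGroup, cvPosterior, 'Good');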
IV. Currently, I calculate the training and cross-validation errors with the `classify` and `crossval` functions. May I ask how to get those values in a summary by using `classperf`?
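For reference, a minimal sketch of `classperf` (from the Bioinformatics Toolbox), assuming the `groundTruthGroup` and `ldaClass` variables above:
cp = classperf(groundTruthGroup);   % initialise the performance object
classperf(cp, ldaClass);            % update it with the predicted labels
cp.CorrectRate                      % fraction classified correctly
cp.ErrorRate                        % fraction misclassified
cp.CountingMatrix                   % confusion counts per class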
V. If anyone knows a good tutorial on using the Matlab Statistics Toolbox for machine learning tasks, with a full example, please tell me.
Some Matlab Help examples are really confusing to me because they are presented in pieces, and I am really a novice at machine learning. Sorry if some of my questions are not proper. Thanks very much for your help.
A.

Accepted Answer

Ilya on 15 Mar 2013
If you are using a relatively recent release, I suggest switching to ClassificationDiscriminant. It has more functionality and provides a better interface. For example, cross-validating is easy.
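For example, a minimal sketch along those lines, assuming the featureSelected and groundTruthGroup variables from your code (in newer releases fitcdiscr is the equivalent entry point):
lda = ClassificationDiscriminant.fit(featureSelected, groundTruthGroup);
resubErr = resubLoss(lda);                % resubstitution (training) error
cvlda = crossval(lda, 'leaveout', 'on');  % leave-one-out cross-validation
cvErr = kfoldLoss(cvlda);                 % cross-validation error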
I. You can mean different things by "t-test based methods". One popular approach in classification is: Start with a null set and add features one by one until classification error stops decreasing. There is no formal test. This may or may not be what you want.
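If you want to try sequentialfs for that greedy search, here is a minimal sketch, assuming your features matrix and groundTruthGroup (the 5-fold partition is an arbitrary choice):
% Criterion: number of misclassified test observations for linear LDA
classifyError = @(xtrain, ytrain, xtest, ytest) ...
    sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));
fiveFoldPartition = cvpartition(groundTruthGroup, 'kfold', 5);
% Greedy forward selection over the feature columns
[featureIsSelected, history] = sequentialfs(classifyError, features, ...
    groundTruthGroup, 'cv', fiveFoldPartition, 'direction', 'forward');
selectedColumns = find(featureIsSelected)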
II. You don't need to do anything special to use discriminant with more than 2 features. Pass all features to classify, and it will figure out what to do. A linear transformation (orthogonal rotation in case of PCA) cannot help find the optimal linear class boundary.
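For instance, a sketch assuming your features matrix holds exactly the 11 columns F1, ..., F11:
% Resubstitution with all 11 features at once
[ldaClassAll, ldaResubErrAll] = classify(features, features, ...
    groundTruthGroup, 'linear');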
IIIc. The 3rd output from perfcurve is an array of thresholds on the classification scores. (In case of discriminant, these scores are posterior class probabilities.) Once you've found the optimal FPR and TPR, you can find the respective threshold. Classify an observation as positive if the posterior probability for the positive class is above this threshold and classify as negative otherwise.
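A sketch of that last step, in terms of the FPR, TPR, Thr and OPTROCPT outputs of perfcurve in your ROC code (where class 0 is the positive class):
% Locate the threshold belonging to the optimal operating point
optimalIdx = find(FPR == OPTROCPT(1) & TPR == OPTROCPT(2), 1);
optimalThreshold = Thr(optimalIdx);
% Classify as the positive class when its posterior exceeds the threshold
predictedPositive = classProbabilities(:,1) >= optimalThreshold;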
For other questions, please refer to the documentation. There are examples for all functions you mention. If something remains unclear, show what you tried and ask a more specific question.

More Answers (0)
