What is a good Dimensionality Reduction technique I can use?

I'm currently analyzing human gait and designing a system for automatic recognition based on those unique traits. My features are extracted by accumulating the difference between sequential frames taken from video sequences of walking subjects, and then applying the DCT to obtain one feature vector per sample, of dimension 100. I'm using a linear classifier. Upon testing the classifier, I get a recognition rate of around 75%. One of the techniques I'm trying in order to improve that rate is dimensionality reduction. I've tried a pattern-recognition toolbox I found on the web, including techniques like PCA, LPP, etc. I've also tried the MATLAB function stepwisefit. However, none of these has worked for me.
I'd appreciate it if anyone could advise me on a good technique to try, whether a built-in MATLAB function or other code. What would be the best way to test these techniques?

Accepted Answer

Ilya
Ilya on 26 Sep 2012
Your best chance would be to set up variable selection based on the linear classifier you are using (you don't say what it is). For 100 features, sequentialfs from the Statistics Toolbox could produce results within a reasonable time; it depends, of course, on how many observations you have. If your data has two classes, I am surprised stepwisefit did not help, since linear regression often gives a decent approximation to linear binary classification. If you have R2012a or later, you could try ClassificationDiscriminant with thresholding and possibly regularization. It is also possible that 75% is as good as you can ever do with a linear technique.
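As a sketch of the sequentialfs idea (the variable names X and y, the 'linear' discriminant type, and the 10-fold cross-validation setting are assumptions, not from the original post):

% Sequential forward selection wrapped around a linear classifier.
% X is an n-by-100 feature matrix, y is an n-by-1 vector of class labels.
crit = @(Xtrain, ytrain, Xtest, ytest) ...
    sum(ytest ~= classify(Xtest, Xtrain, ytrain, 'linear'));

opts = statset('Display', 'iter');
% Each candidate feature subset is scored by cross-validated misclassifications.
selected = sequentialfs(crit, X, y, 'cv', 10, 'options', opts);

Xreduced = X(:, selected);   % keep only the selected columns

The criterion function must return a scalar "badness" value; sequentialfs sums it over the cross-validation folds and greedily adds the feature that lowers it most.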
  4 Comments
Amer
Amer on 16 Nov 2012
So far, I've tried a few techniques in my research, including those you recommended. I'm now revisiting linear regression and dimensionality reduction, although my initial setup didn't work. I'm hoping you can offer input on this. My database actually consists of 3 sets of different walking conditions: normal, carrying bags, and wearing coats. Training features are taken from the first set. Using test samples under the same condition yields an excellent rate of 98-99% with LDA alone (applying the PINV or CLASSIFY functions). However, this rate drops when using probe samples from the other two sets, and using sequentialfs was not very helpful. Below is part of the code I used applying the sequentialfs function:
% Misclassification count for a candidate feature subset.
fun = @(train_features, train_labels, test_features, test_labels) ...
    sum(test_labels ~= classify(test_features, train_features, train_labels));
fs = sequentialfs(fun, features, labels);
ind = find(fs);   % column indices of the selected features
Here 'features' is a 744x100 matrix, from which the 372x100 'train_features' and 372x100 'test_features' matrices were manually selected. After this I used 'ind' to select features, compared the predictions with the targets, and calculated the recognition rate, again based on linear functions. With this method I've attained only a 73% recognition rate. Did I do it right, or does something need to be altered?
Regarding the stepwisefit function, I used the basic syntax:
b = stepwisefit(train_features, train_labels);
b is a 100x1 coefficient vector. I'm not sure how to use this vector for classification.
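One common way to turn regression coefficients into a binary classifier is to score samples with the linear model and threshold. A minimal sketch, assuming two classes coded as 0 and 1 (with more classes this does not apply directly):

% stepwisefit also reports which features entered the model (inmodel)
% and an intercept term in the stats output.
[b, se, pval, inmodel, stats] = stepwisefit(train_features, train_labels);

% Zero out coefficients of features that never entered the model, since
% stepwisefit returns estimates for all columns.
b(~inmodel) = 0;

% Score the test samples and threshold at the midpoint of the label codes.
scores = test_features * b + stats.intercept;
predicted = scores > 0.5;
rate = mean(predicted == test_labels);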
I must apologize for the lengthy text. I hope to get some clarification on this. Thanks.
Ilya
Ilya on 18 Nov 2012
It appears that you obtain low classification accuracy because the training set is sampled from one distribution and the test set is sampled from another. This issue may not be fixable by any feature selection procedure. In particular, stepwisefit only looks at the training data and therefore does not address this issue at all. You don't say what data are passed to sequentialfs, but I suspect you pass only the "normal" set as well.
Here is an obvious question: Why not include your sets 2 and 3 in the training set?
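A minimal sketch of that suggestion (the per-condition variable names are assumptions): draw training samples from all three conditions so the classifier sees every distribution it will be tested on.

% Pool examples from all three walking conditions into one data set,
% then hold out part of it for testing with a stratified split.
X = [normal_features; bag_features; coat_features];
y = [normal_labels;   bag_labels;   coat_labels];

cv = cvpartition(y, 'HoldOut', 0.5);   % stratified by class label
Xtrain = X(cv.training, :);  ytrain = y(cv.training);
Xtest  = X(cv.test, :);      ytest  = y(cv.test);

predicted = classify(Xtest, Xtrain, ytrain, 'linear');
rate = mean(predicted == ytest);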


