Thread Subject:
Determine relative importance of original variables after performing PCA

Subject: Determine relative importance of original variables after performing PCA

From: Maureen

Date: 24 Jan, 2013 16:04:08

Message: 1 of 8

I have 350 observations and 27 variables. I want to use PCA for dimension reduction so that I can plot the 350 observations on a 2D plot, which effectively means that I will only be using PC1 and PC2. My purpose is just to see their relationships on a 2D plot.

But how do I determine which of my original variables contribute most to the first two principal components, and which variables are less important and can be discarded? I have seen many similar posts online but have not come up with a solution. Where should I go from here?

I have read through the documentation on feature selection, and some people suggested using stepwisefit and other regression methods. I do not have much background in regression, so do correct me if I am wrong. Based on my reading, I believe I would need a set of criteria to select the features, and I have no idea what those criteria should be. Also, stepwisefit needs a set of outputs, Y. But in my case, all 27 variables are features, i.e. inputs so to speak, and I do not have a set of outputs.

So if not regression, where do I go from here to determine the importance of my original variables? In other words, I need to find the contribution of the original variables to PC1 and PC2.

I would appreciate any help or suggestions. Thanks in advance!

Subject: Determine relative importance of original variables after performing PCA

From: Ilya Narsky

Date: 24 Jan, 2013 23:08:18

Message: 2 of 8

"Maureen " <maureen_510@hotmail.com> wrote in message
news:kdrm1o$e57$1@newscl01ah.mathworks.com...

PCA is not really suited for variable selection. The typical workflow is to
discard principal components, not the original variables. The simplest
approach is to keep enough components that the cumulative percentage of the
total variance they explain (the 5th output of the pca function gives the
percentage explained by each component) is above a certain threshold,
something like 70-90%. Better approaches can be found in popular textbooks
and review articles.
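A minimal sketch of this threshold rule, written in Python/NumPy so it runs outside MATLAB (the names `explained_variance_pct` and `n_components_for` are hypothetical helpers, not MATLAB functions). Like the `explained` output of MATLAB's `pca`, it centers the data and reports percentages; the threshold passed in is just one value from the range suggested above, and the data are illustrative only.

```python
import numpy as np

def explained_variance_pct(X):
    """Percentage of total variance explained by each principal component.

    Plays the role of the 5th output ('explained') of MATLAB's pca:
    center the columns, then normalize the squared singular values.
    """
    Xc = X - X.mean(axis=0)                    # center each variable
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    return 100.0 * s**2 / np.sum(s**2)

def n_components_for(X, threshold_pct=80.0):
    """Smallest number of PCs whose cumulative explained variance reaches threshold_pct."""
    cum = np.cumsum(explained_variance_pct(X))
    k = int(np.searchsorted(cum, threshold_pct - 1e-9)) + 1   # small tolerance for rounding
    return min(k, len(cum))

# Illustrative data only: 350 observations of 27 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(350, 2)) @ rng.normal(size=(2, 27)) + 0.1 * rng.normal(size=(350, 27))
print(np.round(np.cumsum(explained_variance_pct(X))[:3], 1))
print(n_components_for(X, 90.0))
```

Note that this chooses how many PCs to keep; it says nothing about which original variables to drop.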

Here is a related thread that might help you:
http://www.mathworks.com/matlabcentral/answers/49134-determining-variables-that-contribute-to-principal-components
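One common way to read "contribution of the original variables to PC1 and PC2" is through the loadings (the first output, `coeff`, of MATLAB's `pca`). The Python/NumPy sketch below assumes that reading; `pca_loadings` and `contributions` are hypothetical helpers, and weighting PC1 and PC2 by explained variance is one heuristic among several. Loadings depend on the scale of each variable, so standardizing first is usually sensible when the units differ.

```python
import numpy as np

def pca_loadings(X):
    """Loadings and explained-variance percentages of column-centered X.

    coeff[:, k] is the unit-length loading vector of PC k+1 (the role played by
    the first output of MATLAB's pca); pct mirrors its 'explained' output.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T, 100.0 * s**2 / np.sum(s**2)

def contributions(X, n_pcs=2):
    """Share of each original variable in the first n_pcs PCs.

    Loadings are squared because the sign of a PC is arbitrary. Each loading
    vector has unit length, so every column of `sq` sums to 1. `combined`
    weights the PCs by explained variance (one heuristic among several) and
    also sums to 1.
    """
    coeff, pct = pca_loadings(X)
    sq = coeff[:, :n_pcs] ** 2
    w = pct[:n_pcs] / pct[:n_pcs].sum()
    return sq, sq @ w

# Illustrative data only; standardize first when the variables have different units.
rng = np.random.default_rng(1)
X = rng.normal(size=(350, 27))
X[:, 3] += 2.0 * X[:, 7]                       # give PC1 something to latch onto
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
sq, combined = contributions(Z, n_pcs=2)
ranking = np.argsort(combined)[::-1]           # variable indices, largest share first
```

A variable with a small share in PC1 and PC2 barely moves the 2D plot, but that does not by itself make it safe to drop for other purposes.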

There are research papers on variable selection for PCA, and there are many
more papers on unsupervised variable selection, which is what you want since
you do not have a response Y. Google them if you'd like. These methods are not
available in official MATLAB products. There may be something on the File
Exchange; I have not checked.

-Ilya
 

Subject: Determine relative importance of original variables after performing PCA

From: Greg Heath

Date: 25 Jan, 2013 20:05:08

Message: 3 of 8

"Maureen " <maureen_510@hotmail.com> wrote in message <kdrm1o$e57$1@newscl01ah.mathworks.com>...
> I have 350 observations and 27 variables. I want to use PCA for dimension reduction so that I can plot the 350 observations on a 2D plot, which effectively means that I will only be using PC1 and PC2. My purpose is just to see their relationships on a 2D plot.
>
> But how do I determine which of my original variables contribute most to the first two principal components, and which variables are less important and can be discarded? I have seen many similar posts online but have not come up with a solution. Where should I go from here?

You have not indicated

1. whether the task is classification or regression
2. whether any of the 27 are outputs
3. the number of output variables

The most important is 1 because PCA is inappropriate for classification. Therefore, I'll
assume the task is regression.

The next most important is 2 because PCA is only used to transform the input space.
Therefore, I'll assume 27 original input variables.

3 is still important because it affects which algorithms/techniques should be used.

> I have read through the documentation on feature selection, and some people suggested using stepwisefit and other regression methods.

Yes. The best criterion to use is one that optimizes a specific function of the output variables.

> I do not have much background in regression, so do correct me if I am wrong. Based on my reading, I believe I would need a set of criteria to select the features, and I have no idea what those criteria should be.
 
If it's regression, it is simple: just read the STEPWISEFIT documentation.

If it is classification, then you should not be using PCA because there is no reason why
PCA space should be preferred over the original.

> Also, stepwisefit needs a set of outputs, Y. But in my case, all 27 variables are features, i.e. inputs so to speak, and I do not have a set of outputs.
>
> So if not regression, where do I go from here to determine the importance of my original variables? In other words, I need to find the contribution of the original variables to PC1 and PC2.
>
> I would appreciate any help or suggestions. Thanks in advance!

If you don't know what you want to optimize, then there is no reason to use PCA
over the original variables.

What do you want to do with the data? What is your ultimate goal?

Greg

P.S. I want to do carpentry with two tools. Which two should I use?

Subject: Determine relative importance of original variables after performing PCA

From: Maureen

Date: 26 Jan, 2013 18:59:08

Message: 4 of 8

"Greg Heath" <heath@alumni.brown.edu> wrote in message <kduohk$nrj$1@newscl01ah.mathworks.com>...
> "Maureen " <maureen_510@hotmail.com> wrote in message <kdrm1o$e57$1@newscl01ah.mathworks.com>...
> > I have 350 observations and 27 variables. I want to use PCA for dimension reduction so that I can plot the 350 observations on a 2D plot, which effectively means that I will only be using PC1 and PC2. My purpose is just to see their relationships on a 2D plot.
> >
> > But how do I determine which of my original variables contribute most to the first two principal components, and which variables are less important and can be discarded? I have seen many similar posts online but have not come up with a solution. Where should I go from here?
>
> You have not indicated
>
> 1. whether the task is classification or regression
> 2. whether any of the 27 are outputs
> 3. the number of output variables
>
> The most important is 1 because PCA is inappropriate for classification. Therefore, I'll
> assume the task is regression.
>
My task is definitely not classification, but I am not sure whether it is regression (I am not too familiar with regression even after reading through some materials). My main objective is to plot the 350 observations on a 2D plot and examine their relationships, in the sense that observations plotted closer together should be more similar based on the 27 features.

> The next most important is 2 because PCA is only used to transform the input space.
> Therefore, I'll assume 27 original input variables.
>
> 3 is still important because it affects which algorithms/techniques should be used.
>
The 27 original variables are all inputs, and I do not have an output variable.

> > I have read through the documentation on feature selection, and some people suggested using stepwisefit and other regression methods.
>
> Yes. The best criterion to use is one that optimizes a specific function of the output variables.
>
I have no idea what the criteria should be, since I do not have a specific function of the output variables.
I just want to examine the relationship between the observations and the variables, to find out whether points (observations) that are closer together on the plot are more similar.

> > I do not have much background in regression, so do correct me if I am wrong. Based on my reading, I believe I would need a set of criteria to select the features, and I have no idea what those criteria should be.
>
> If it's regression, it is simple: just read the STEPWISEFIT documentation.
>
> If it is classification, then you should not be using PCA because there is no reason why
> PCA space should be preferred over the original.
>
This is not for classification purposes.

Am I right to say that if I use STEPWISEFIT, the Y variable would be PC1? And similarly I would have to do it for at least PC2, and possibly PC3 (depending on how many dimensions I go for later), since I am interested to know how much the original variables contribute to PC1 and PC2. After that, I would add up the absolute values of the two STEPWISEFIT results (for PC1 and PC2) to determine which original variables contribute most to the first two PCs.

> > Also, stepwisefit needs a set of outputs, Y. But in my case, all 27 variables are features, i.e. inputs so to speak, and I do not have a set of outputs.
> >
> > So if not regression, where do I go from here to determine the importance of my original variables? In other words, I need to find the contribution of the original variables to PC1 and PC2.
> >
> > I would appreciate any help or suggestions. Thanks in advance!
>
> If you don't know what you want to optimize, then there is no reason to use PCA
> over the original variables.
>
> What do you want to do with the data? What is your ultimate goal?
>
My main purpose is to optimise the plot so that observations that are generally similar are plotted closer together, and those that differ greatly are plotted further apart. But right now I seem to have a bit of an overfitting problem: amidst the similar observations, a few points are plotted in the wrong place.

> Greg
>
> P.S. I want to do carpentry with two tools. Which two should I use?

Subject: Determine relative importance of original variables after performing PCA

From: Greg Heath

Date: 27 Jan, 2013 11:50:08

Message: 5 of 8

"Maureen " <maureen_510@hotmail.com> wrote in message <ke191s$t5t$1@newscl01ah.mathworks.com>...
> My main purpose is to optimise the plot so that observations that are generally similar are plotted closer together, and those that differ greatly are plotted further apart. But right now I seem to have a bit of an overfitting problem: amidst the similar observations, a few points are plotted in the wrong place.

You seem rather confused.

Originally you proposed doing something based on PCA without fully understanding why you were using PCA.

PCA is predominantly used to discover and rank the orthogonal directions in which the input data has the most spread without considering the task for which the data will be used.

PCA is used in regression based on the idea that the orthogonally transformed input variables with the most spread are probably the variables that best explain the spread in the output data. This is not always true, but the transformation can still be useful if the subset of PCs is selected based on its ability to represent the spread in the output data.

PCA is used in classification based on the idea that the orthogonally transformed input variables with the most spread are probably the variables that best explain the separation between classes of data. This is not always true, but the transformation can still be useful if the subset of PCs is selected based on its ability to represent the class separation.

Whenever output data are available, PLS tends to be better than PCA because it ranks
transformed input variables by how much they contribute to understanding the I/O relationship, in both classification and regression.

Since you are just interested in visualizing general relationships between all variables without regard to spread or separation:

1. Standardize (help zscore) the variables to zero mean/unit variance
2. Project the results onto all xj vs xi (j > i) planes
3. Transform to an orthogonal basis, e.g., PCA
4. Repeat step 2.
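A sketch of those four steps in Python/NumPy, assuming plotting is done separately (each 350-by-2 array is one scatter plot). `zscore` here is a local helper mirroring MATLAB's `zscore` (N-1 normalization), not an import, and the data are illustrative only. With 27 variables there are 27*26/2 = 351 planes per pass, so this is a lot to inspect by eye.

```python
import itertools
import numpy as np

def zscore(X):
    """Step 1: zero mean / unit variance per column (N-1 normalization, like MATLAB's zscore)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pairwise_planes(Z):
    """Steps 2 and 4: all (i, j) planes with j > i, each as an n-by-2 array of coordinates."""
    p = Z.shape[1]
    return {(i, j): Z[:, [i, j]] for i, j in itertools.combinations(range(p), 2)}

def pc_scores(Z):
    """Step 3: coordinates of the standardized data in its orthogonal PC basis."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt.T

# Illustrative data only: 350 observations of 27 variables on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(350, 27)) * rng.uniform(1, 10, size=27)

Z = zscore(X)
original_planes = pairwise_planes(Z)          # 351 planes of the original variables
pc_planes = pairwise_planes(pc_scores(Z))     # step 4: the same views in PC space
```

The PC1 vs PC2 plane, `pc_planes[(0, 1)]`, is the 2D plot discussed earlier in the thread; the other planes show what it leaves out.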

You may also want to cluster the data and color code the projections based
on cluster membership.
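A minimal clustering sketch, assuming k-means is an acceptable choice of method and that the number of clusters k is picked by the user. MATLAB has its own `kmeans`; the version below is a plain NumPy reimplementation with a deterministic farthest-point start, and `farthest_point_init` and `kmeans` are hypothetical names. Clustering happens in the full standardized space; the labels then drive the color of each point in any 2D projection.

```python
import numpy as np

def farthest_point_init(P, k, seed=0):
    """Pick a random first center, then repeatedly the point farthest from all chosen centers."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(P)))]
    d2 = ((P - P[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx.append(int(d2.argmax()))
        d2 = np.minimum(d2, ((P - P[idx[-1]]) ** 2).sum(axis=1))
    return P[idx].astype(float)

def kmeans(P, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = farthest_point_init(P, k, seed)
    for _ in range(n_iter):
        # squared Euclidean distance of every point to every center
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each center to the mean of its members (keep it if the cluster is empty)
        new = np.array([P[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Illustrative data only: three well-separated groups of 100 points in 27 dimensions.
rng = np.random.default_rng(2)
group_means = rng.normal(scale=5.0, size=(3, 27))
Z = np.vstack([m + rng.normal(size=(100, 27)) for m in group_means])

labels, _ = kmeans(Z, k=3)
# labels[i] is the cluster of observation i: use it as the color of point i in each projection.
```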

Hope this helps.

Greg

Subject: Determine relative importance of original variables after performing PCA

From: Maureen

Date: 28 Jan, 2013 03:35:08

Message: 6 of 8

Initially, I was interested in dimension reduction because I wanted to reduce the plot from 27 dimensions to either 2 or 3, which is why I decided to use PCA.

I also understand that in PCA the orthogonally transformed inputs capture the most spread, and I thought this could help in visualizing my data with maximum spread across the input variables. Just to clarify, I am not doing any form of classification; I do not have classes into which I expect my data to fall.

But after plotting, I noticed an overfitting issue, and I thought maybe I had used too many input variables. So I decided to remove some of the variables, but I do not know which contribute more and which are less significant and can be removed. I tried removing the variables that produce smaller projections on the plot, and the result did not improve; instead it got worse. Hence, I thought maybe I should find out which variables contribute most to the first 2 PCs for a 2D plot.

Is my line of thought right? If so, will STEPWISEFIT, as mentioned earlier in the discussion, help in finding the variable importance? Or would some other method be more effective?

Please advise. Thanks.

"Greg Heath" <heath@alumni.brown.edu> wrote in message <ke349g$gpu$1@newscl01ah.mathworks.com>...

Subject: Determine relative importance of original variables after performing PCA

From: Greg Heath

Date: 28 Jan, 2013 11:48:07

Message: 7 of 8

PLEASE DO NOT TOP-POST: 'IT IS CONSIDERED A HEINOUS BREACH OF GOOGLE GROUP ETIQUETTE TO POST REPLIES ABOVE A PREVIOUS POST.'

"Maureen " <maureen_510@hotmail.com> wrote in message <ke4rlc$lvc$1@newscl01ah.mathworks.com>...
> Initially, I was interested in dimension reduction because I wanted to reduce the plot from 27 dimensions to either 2 or 3, which is why I decided to use PCA.

PCA ranks orthogonal combinations of the variables (the PCs) according to spread. However, if you are not using techniques that depend on spread ranking, you cannot expect the display to provide any more information than the practical dimensionality of the data and the corresponding linearly dependent combinations of variables (negligibly small singular values).

If you are more interested in correlations among variables, standardize the original variables to zero mean and unit variance. The resulting covariance matrix is then the correlation coefficient matrix. In addition to providing the correlation information, projections onto the new PC planes may yield useful info.
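A quick numerical check of the claim that the covariance of standardized data is the correlation coefficient matrix, in Python/NumPy on illustrative data (in MATLAB the corresponding calls would be `cov(zscore(X))` and `corrcoef(X)`). Both sides use the N-1 normalization; mixing N and N-1 would put n/(n-1) on the diagonal instead of 1.

```python
import numpy as np

# Illustrative data only: 27 variables on mixed scales, two of them correlated.
rng = np.random.default_rng(3)
X = rng.normal(size=(350, 27)) * rng.uniform(0.5, 20, size=27)
X[:, 1] += 3 * X[:, 0]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # zscore: zero mean, unit variance
C_z = np.cov(Z, rowvar=False)                       # covariance of standardized data
R = np.corrcoef(X, rowvar=False)                    # correlation coefficients of raw data

print(np.allclose(C_z, R))   # True: they are the same matrix
```

The eigenvectors of this matrix are the PC directions of the standardized data, so PCA after `zscore` is PCA of the correlation matrix.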

> I also understand that in PCA the orthogonally transformed inputs capture the most spread, and I thought this could help in visualizing my data with maximum spread across the input variables. Just to clarify, I am not doing any form of classification; I do not have classes into which I expect my data to fall.

Then use unsupervised clustering. It can tell you whether classes of data appear to be present.
 
> But after plotting, I noticed an overfitting issue, and I thought maybe I had used too many input variables.

You don't have to guess:

help cond
doc cond
help rank
doc rank
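What `rank` and `cond` reveal can be sketched in Python/NumPy with `np.linalg.matrix_rank` and `np.linalg.cond`. The data below are illustrative, with one column deliberately built as an exact linear combination of two others, which is the "negligibly small singular value" case described above.

```python
import numpy as np

# Illustrative data only: 27 variables, but the last one is an exact
# linear combination of two others (a hidden redundancy).
rng = np.random.default_rng(4)
X = rng.normal(size=(350, 27))
X[:, 26] = X[:, 0] - 2.0 * X[:, 5]

# Standardizing keeps the linear dependence; it only rescales the columns.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

s = np.linalg.svd(Z, compute_uv=False)          # singular values, largest first
print(np.linalg.matrix_rank(Z))                 # 26: only 26 independent directions
print(s[-1] / s[0])                             # negligibly small ratio
print(np.linalg.cond(Z))                        # huge: the reciprocal of that ratio
print(np.linalg.cond(Z[:, :26]))                # modest once the redundant column is dropped
```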
 
> So I decided to remove some of the variables, but I do not know which contribute more and which are less significant and can be removed. I tried removing the variables that produce smaller projections on the plot, and the result did not improve; instead it got worse.

What result? If you can quantify a goal or bound for the result, then maybe we can help.

> Hence, I thought maybe I should find out which variables contribute most to the first 2 PCs for a 2D plot.
>
> Is my line of thought right?

I don't know what you are looking for. If you want to rank variables according to spread,
just sort var(X), where 27 = size(X,2), or use the diagonal of the covariance matrix.
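The variance ranking as a small Python/NumPy sketch (`rank_by_variance` is a hypothetical helper). `ddof=1` matches the N-1 normalization of MATLAB's `var`, so the sorted values equal the sorted diagonal of the covariance matrix. This ranking only means something when the variables share units; after `zscore` every variance is 1.

```python
import numpy as np

def rank_by_variance(X):
    """Column indices sorted by sample variance (largest first) and the sorted variances."""
    v = X.var(axis=0, ddof=1)          # N-1 normalization, like MATLAB's var(X)
    order = np.argsort(v)[::-1]        # descending
    return order, v[order]

# Illustrative data only: column j has standard deviation about j+1.
rng = np.random.default_rng(5)
X = rng.normal(size=(350, 27)) * np.arange(1, 28)
order, v_sorted = rank_by_variance(X)

# Same numbers as the diagonal of the covariance matrix:
print(np.allclose(np.sort(np.diag(np.cov(X, rowvar=False)))[::-1], v_sorted))   # True
```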

> If so, will STEPWISEFIT, as mentioned earlier in the discussion, help in finding the variable importance? Or would some other method be more effective?

STEPWISEFIT is for linear regression. If you have no specified output variables or classes, it is of no use.
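Regarding the earlier idea of using PC1 as the Y in STEPWISEFIT: the PC1 scores are, by construction, an exact linear combination of the centered variables, so a regression on them just hands back the loadings. The Python/NumPy sketch below shows this with an ordinary least-squares fit (used here in place of stepwise selection, for brevity), assuming the data matrix has full column rank; the data are illustrative only.

```python
import numpy as np

# Illustrative data only, with some correlation between the first two variables.
rng = np.random.default_rng(6)
X = rng.normal(size=(350, 27))
X[:, 1] += X[:, 0]
Xc = X - X.mean(axis=0)                       # PCA works on centered data

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
coeff = Vt.T                                  # loadings, like coeff from MATLAB's pca
pc1 = Xc @ coeff[:, 0]                        # PC1 scores, like score(:,1)

# Ordinary least squares of PC1 on all 27 centered variables.
b, *_ = np.linalg.lstsq(Xc, pc1, rcond=None)

print(np.allclose(b, coeff[:, 0]))            # True: the fitted coefficients are the loadings
print(np.allclose(Xc @ b, pc1))               # True: zero residual, PC1 is reproduced exactly
```

In other words, the loadings already contain what such a regression would report, without needing an output variable.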

It might help if you

1. Sort the variables w.r.t. variance
2. Explain what each variable does in real life
3. Post the 27 sorted variances and the resulting correlation coefficient matrix in a form suitable for cutting and pasting.
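For item 3, one way to produce a cut-and-paste friendly block, sketched in Python/NumPy as tab-separated text (`pasteable_summary` and its `names` argument are hypothetical). The correlation matrix is reordered to match the sorted variances.

```python
import numpy as np

def pasteable_summary(X, names=None, digits=4):
    """Tab-separated text: variances sorted largest first, then the correlation
    matrix with rows and columns in that same order.

    `names` is a hypothetical list of variable labels; it defaults to x1, x2, ...
    """
    p = X.shape[1]
    names = list(names) if names is not None else [f"x{j + 1}" for j in range(p)]
    v = X.var(axis=0, ddof=1)
    order = np.argsort(v)[::-1]
    R = np.corrcoef(X, rowvar=False)[np.ix_(order, order)]

    lines = [f"{names[j]}\t{v[j]:.{digits}g}" for j in order]
    lines.append("")
    lines.append("\t" + "\t".join(names[j] for j in order))      # header row
    for row_name, row in zip((names[j] for j in order), R):
        lines.append(row_name + "\t" + "\t".join(f"{r:.{digits}f}" for r in row))
    return "\n".join(lines)

rng = np.random.default_rng(7)
X = rng.normal(size=(350, 27)) * np.arange(1, 28)   # illustrative data only
print(pasteable_summary(X))
```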

Greg

Subject: Determine relative importance of original variables after performing PCA

From: Maureen

Date: 31 Jan, 2013 14:59:08

Message: 8 of 8

"Greg Heath" <heath@alumni.brown.edu> wrote in message <ke5ohn$sfv$1@newscl01ah.mathworks.com>...
> PLEASE DO NOT TOP-POST: 'IT IS CONSIDERED A HEINOUS BREACH OF GOOGLE GROUP ETIQUETTE TO POST REPLIES ABOVE A PREVIOUS POST.'

First and foremost, I would like to apologize for posting above a previous post. It was an oversight on my part: I was typing and forgot to move my reply below the quoted content. Thanks for pointing it out. I will pay more attention to such details in future.

I am still working on some details and will follow up with a post soon. Thanks.
