Matrix element counting by rows, histograms, etc.

Hi -- this is kind of tricky. I am using a 2D array (matrix) to store some information. The first n-1 columns hold indexes of variables I'm dealing with -- in other words think of them as names not numbers. The last column contains a count for the number of times I've seen the variables in the preceding n-1 columns. There are many rows with such counts, all with the same number of variables. So if I'm counting single variables it could look like this:
1 3
2 2
3 1
Meaning I've seen variable 1 three times, variable 2 twice, and variable 3 once. If I'm counting two variable combos, it could look like this:
1 2 4
1 3 3
2 4 3
3 4 1
which means I've seen variables 1 and 2 together four times, I've seen variables 1 and 3 together three times, and I've also seen variables 2 and 4 together three times, and finally I've seen variables 3 and 4 together once.
Similar structures can exist for 3, 4, 5 variables, maybe more.
What I need help with is turning these structures into a single vector of variables repeated for every time they've been counted. So for that first example with single variables the vector would contain:
1 1 1 2 2 3
For that second example the vector would contain:
1 1 1 1 2 2 2 2 1 1 1 3 3 3 2 2 2 4 4 4 3 4
These vectors will allow me to do some histogram type analysis, but I'm not sure how to replicate these variables into the new vector based on the counts in that last column. Any help would be appreciated.
PS - for n variables followed by a count column, the ACTUAL data structure I'm using has n extra columns inserted between the variables and the counts (i.e. 2n+1 columns). The information in those columns isn't relevant to the question, but it implies the following. For n=1 variable, the structure has three columns: the first for the variables, the second for the extra information not relevant to this question, and the third for the count. For n=2 variables, the structure has five columns: the first two for the variables, the second two for the extra information not relevant to this question, and the fifth for the count. For n=3 variables it has seven columns -- 3, 3 and 1...
Thanks in advance!
Mike
  1 Comment
Mohsin Shah on 15 May 2019
Quite late, but I need to ask how you did it in the second example: how did you count the occurrences of rows? I need to apply this in my work.
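One common way to build such a count table from raw observations is `unique` with the `'rows'` option followed by `accumarray`. This is only a sketch and not necessarily how the original poster did it; the `obs` matrix is hypothetical sample data, chosen so that the result matches the two-variable example in the question.

```matlab
% Hypothetical raw observations: each row is one sighting of a variable pair.
obs = [1 2; 1 3; 1 2; 2 4; 1 2; 3 4; 1 3; 2 4; 1 2; 1 3; 2 4];

% uniqueRows holds the distinct rows in sorted order; idx maps each row of
% obs to its position in uniqueRows.
[uniqueRows, ~, idx] = unique(obs, 'rows');

% Count how many observations map to each distinct row.
counts = accumarray(idx, 1);

% Variables in the leading columns, count in the last column.
x = [uniqueRows, counts];
```

The resulting `x` has the same layout as the question's matrices: variable columns first, count last.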


Accepted Answer

Daniel Shub on 11 May 2012
It is not the fastest, and you could probably preallocate z if you wanted to. It also ignores your PS, but getting rid of the extra columns shouldn't be hard.
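z = [];
for ii = 1:size(x, 1)
    % Stack the variables of row ii once per count (count-by-n matrix)
    y = repmat(x(ii, 1:(end-1)), x(ii, end), 1);
    % Flatten column by column, so each variable is repeated consecutively
    y = reshape(y, 1, numel(y));
    z = [z, y];  % append to the result
end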
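The preallocation mentioned above could look roughly like the following sketch. It assumes x holds only the variable columns followed by the count column, and uses the two-variable example from the question as sample data; `nVars`, `counts`, `pos`, `block` and `len` are names introduced here for illustration.

```matlab
% Sample data from the question: two variable columns, then a count column.
x = [1 2 4
     1 3 3
     2 4 3
     3 4 1];

nVars  = size(x, 2) - 1;             % number of variable columns
counts = x(:, end);                  % last column holds the counts
z = zeros(1, nVars * sum(counts));   % preallocate the full output

pos = 0;                             % index of the last element written so far
for ii = 1:size(x, 1)
    % counts(ii)-by-nVars block; read column by column, it yields each
    % variable repeated counts(ii) times, one variable after another
    block = repmat(x(ii, 1:nVars), counts(ii), 1);
    len = numel(block);
    z(pos + (1:len)) = block(:).';
    pos = pos + len;
end
```

Each row contributes nVars * counts(ii) elements, so the total length is known before the loop starts and z never has to grow.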
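To deal with your PS, instead of
x(ii, 1:(end-1))
you want to stop where the variable columns stop. With 2n+1 columns in total, that is column n = (size(x, 2) - 1)/2, so index x(ii, 1:n).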
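For the layout described in the PS, the number of variable columns can be recovered from the matrix width. A minimal sketch, assuming the layout is exactly n variable columns, then n extra columns, then one count column; the zeros in the extra columns are made-up placeholder values:

```matlab
% n = 2 variables, 2 extra columns (placeholder values), 1 count column.
x = [1 2 0 0 4
     1 3 0 0 3
     2 4 0 0 3
     3 4 0 0 1];

n = (size(x, 2) - 1) / 2;   % 2n+1 columns in total, so n = (width - 1)/2

z = [];
for ii = 1:size(x, 1)
    % Take only the first n columns (the variables), not 1:(end-1)
    y = repmat(x(ii, 1:n), x(ii, end), 1);
    z = [z, reshape(y, 1, numel(y))];  %#ok<AGROW>
end
```

The count is still read from the last column, so only the variable index range changes.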

More Answers (1)

Sean de Wolski on 11 May 2012
Why would you want to do this? You have all of the information you need in a nice, condensed, easy-to-understand package...
  2 Comments
Daniel Shub on 11 May 2012
I am guessing it is for an anovan or some other statistical analysis.
Michael on 11 May 2012
I need to create some text-based reports on these nice, condensed packages :)
