Random Sorting within a set of columns

Hi,
I have a matrix (ParmXStrtMatrix) with 1 million rows and 4 columns (providing names for easy reference) :
Parm1,Parm2,XStrtPoint1, XStrtPoint2
Let's assume I have 10,000 different combinations of the parameters Parm1/Parm2. Within each parameter combination, let's suppose I have 100 start points XStrtPoint1/XStrtPoint2.
How can I randomly sort by parameter combination so that in every set of 10K rows starting from the top there is only one instance of a particular parameter combination and any one random start point combination attached to that particular parameter combination?
Background (for context): The parameter columns in the matrix serve as inputs to the definition of an anonymous function. I'm running fmincon in a loop across all the rows in order to minimize the function for different sets of parameters. For any particular combination of parameters, the remaining 2 columns are the start points supplied to fmincon.
Each individual run of fmincon takes ~2 seconds (I use a symbolic function for high precision), so the full run takes almost 3 weeks. So that I don't have to wait for complete execution, I can pause the algorithm after, say, 2 days, look at the results, and depending on their quality either quit or let it continue from the paused point.
Presently I use,
ParmXStrtMatrix = ParmXStrtMatrix(randperm(size(ParmXStrtMatrix,1)),:)
The only issue with the above approach is that I would like to get representative results from fmincon whenever I pause it. randperm has no way to shuffle by group, so the groups will not be equally represented. For example, if I have already processed 100K rows at the time of pausing, I would like to have covered 10% of the start points from each of the parameter combinations.
One potentially correct (but very inefficient and hardcoded) approach is to loop over each of the 10,000 parameter combinations, use randperm inside the loop to randomly sort the X start points, and create a new column numbering the randomly sorted start points from 1 to 100. Finally, outside the loop, sort the whole matrix by the newly created column.
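The rank-column idea described above can be sketched without an explicit loop over groups. This is only a minimal illustration under stated assumptions: the tiny matrix P (4 parameter pairs, 3 start points each) and all variable names are hypothetical stand-ins, not the real ParmXStrtMatrix.

```matlab
% Hypothetical stand-in data: 4 parameter pairs x 3 start points each.
rng(0);                                  % fixed seed, only for reproducibility
g = repmat((1:4).', 3, 1);               % group label for each of the 12 rows
P = [g, 10*g, rand(12,2)];               % Parm1, Parm2, XStrtPoint1, XStrtPoint2

% One group index per unique (Parm1, Parm2) pair.
[~, ~, grp] = unique(P(:,1:2), 'rows');

% Shuffle the rows, then sort by group. sort is stable, so the
% order inside each group remains the random one from randperm.
perm = randperm(size(P,1)).';
[grpSorted, order] = sort(grp(perm));
idx = perm(order);                       % row indices of P, grouped

% Rank 1..k inside each group: position minus the group's start position.
isStart = [true; diff(grpSorted) ~= 0];
startPos = find(isStart);
rankInGroup = (1:numel(idx)).' - startPos(cumsum(isStart)) + 1;

% Sorting by rank interleaves the groups: block 1 holds the rank-1 row
% of every group, block 2 the rank-2 rows, and so on.
[~, byRank] = sort(rankInGroup);
P2 = P(idx(byRank), :);
```

In this sketch the start point picked for each block is random, but the parameter pairs appear in the same order within every block, because the final sort is stable.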
Any ideas appreciated.
Thanks
  2 Comments
Yu Jiang on 13 Aug 2014
Hi Hari
It would be easier to understand what you would like to achieve if you could provide a simple example.
-Yu Jiang
Hari on 13 Aug 2014
Hi Yu,
Please find an example herewith. Let's assume that the matrix has the following 4 columns (sample data typed in Excel) and the 12 rows given below under "Initial".
In the above data, we have 4 unique combinations of Parm1 and Parm2 (indicated by different colors). Given a unique parameter combination there are 3 distinct X start point combinations.
I want to re-arrange the 12 rows in such a way that first 4 rows contain only one instance of each parameter combination (The set of X start points that get selected has to be random). Same for second set of 4 rows and so on.
One valid re-arrangement/sorting can be seen in "Final"
Hope this is helpful
Thanks, Hari


Accepted Answer

Roger Stafford on 14 Aug 2014
The following code is a bit awkward and could be slow for a 1000000 x 4 matrix, but if you are taking three weeks to use it, I doubt if rearranging it initially is an important consideration.
I'm going to call your matrix 'P' for brevity instead of 'ParmXStrtMatrix'.
The use of 'randperm' in the first line below is meant to make the ordering in your "start points" random if that is desired rather than being determined by the initial ordering in P.
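P = P(randperm(size(P,1)),:); % <-- Optional: shuffle rows first
[~,~,ib] = unique(P(:,1:2),'rows'); % group index for each (Parm1,Parm2) pair
[t,p] = sort(ib); % p lists the row indices of P, grouped by parameter pair
f = find([true;diff(t)~=0;true]); % positions in t where each group starts, plus end+1
n = length(f)-1; % number of distinct parameter pairs
f1 = f(1:n); % current read position within each group
f2 = f(2:n+1); % one past the last position of each group
P2 = zeros(size(P));
k = 0; % number of rows written to P2 so far
b = true; % true while the last pass still found rows to copy
while b
    b = false;
    for ix = 1:n
        if f1(ix) < f2(ix) % group ix still has unused rows
            k = k+1;
            P2(k,:) = P(p(f1(ix)),:);
            f1(ix) = f1(ix)+1;
            b = true;
        end
    end
end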
P2 contains all the rows that are in P, but in a different ordering. Each succeeding "block" in P2 will have one copy of each possible parameter pair from the first two columns, along with single pairs of "start points" in the second two columns. Different pairs of "start points" are taken in different blocks for like parameter pairs.
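The block property described above can be checked with a short script. This is a hedged sketch, not part of the original answer: the tiny P2 below is hypothetical hand-typed data, and the check assumes every parameter pair has the same number of start points.

```matlab
% Hypothetical tiny P2: 2 parameter pairs, 3 start points each, already interleaved.
P2 = [1 10 0.1 0.2;
      2 20 0.3 0.4;
      1 10 0.5 0.6;
      2 20 0.7 0.8;
      1 10 0.9 1.0;
      2 20 1.1 1.2];

[~, ~, grp] = unique(P2(:,1:2), 'rows');  % group index for each row
n = max(grp);                             % number of parameter pairs
blockId = ceil((1:size(P2,1)).' / n);     % block number of each row (equal group sizes assumed)

% counts(i,j) = how many times pair j appears in block i; each entry should be 1.
counts = accumarray([blockId, grp], 1);
isBalanced = all(counts(:) == 1);
```

If isBalanced is false, inspecting counts shows which block is missing or repeating a parameter pair.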
  1 Comment
Hari on 14 Aug 2014
Thanks for the code and explanation. As per the profiler it takes exactly 2 minutes, and based on some quick QC the results are as expected! I appreciate the timely help.
