Why is sort() taking longer to resolve on multiples of 512?

3 views (last 30 days)
I am using the sort() funtion to sort the columns of massive matrixes. By chance, I discovered that the sort algorithm took a lot longer than expected to run when I used a matrix with 256^3 columns than when using 255^3. After some experimentation I was doumbfunded to discover that using 257^3 columns was also much faster than 256^3. By now, I was really confused, so I i did a little experiment:
for i = 1:10000
test = rand(i,100);
tstart = tic;
sort(test,2);
out(i) = toc(tstart);
end
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
That gave me this little nice plot, which looks something like an emission spectrum:
The major peaks seem to be at multiples of 512, i.e. 512,1024,1536,2048 etc. Can anyone explain this behaviour? As the peaks are at familiar "digital"-numbers I am guessing this has something to do with the algorithm-implementation.

Answers (3)

Sean de Wolski
Sean de Wolski on 8 Sep 2014
Edited: per isakson on 8 Sep 2014
I don't think sort has anything to do with the behavior you are seeing. Look at what happens if you preallocate out
out = zeros(1000,1);
for i = 1:1000
test = rand(i,100);
tstart = tic;
sort(test,2);
out(i) = toc(tstart);
end
plot(1:1000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
Now the out array is not growing on each iteration, you do not see spikes (just expected fluctuations from sorting random numbers).
More information here:
  3 Comments
Magnus
Magnus on 8 Sep 2014
i7-3960X 3.30GHz CPU and 64 GB RAM running Win 7pro 64-bit

Sign in to comment.


per isakson
per isakson on 8 Sep 2014
Edited: per isakson on 8 Sep 2014
I have reproduced your result with R2013a,64bit,Win7,8GB on a five year old vanilla desktop (Processor Intel® Core™2 Quad CPU Q9400 @ 2.66GHz 2.66 GHz).
out(1,10000) = nan;
for ii = 1:10000
test = rand(ii,100);
tstart = tic;
sort(test,2);
out(ii) = toc(tstart);
end
figure
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
ff = filtfilt( fir1( 20, 0.03), 1, out );
find( (out-ff) > 0.5*ff )/256
ans =
Columns 1 through 9
1.0000 2.0000 3.0000 3.6680 4.0000 5.0000 5.7695 6.0000 6.5078
Columns 10 through 18
7.0000 7.5000 8.0000 8.0898 9.0000 9.6680 10.0000 11.0000 12.0000
Columns 19 through 27
13.0000 14.0000 15.0000 16.0000 17.0000 18.0000 19.0000 20.0000 21.0000
Columns 28 through 36
22.0000 23.0000 24.0000 25.0000 26.0000 27.0000 28.0000 29.0000 30.0000
Columns 37 through 45
31.0000 32.0000 33.0000 34.0000 35.0000 36.0000 37.0000 38.0000 39.0000

José-Luis
José-Luis on 8 Sep 2014
Edited: José-Luis on 8 Sep 2014
Maybe it has something to do with data padding. If you use single precision format, then the jumps come at multiples of 1024. (Shamelessly plagiarizing Per's code):
out(1,10000) = nan;
for ii = 1:10000
test = single(rand(ii,100)); %just changed this
tstart = tic;
sort(test,2);
out(ii) = toc(tstart);
end
figure
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
ff = filtfilt( fir1( 20, 0.03), 1, out );
find( (out-ff) > 0.5*ff )/256
EDIT
On second thought, it might have nothing to do with Matlab but with your processor's cache size. Maybe 512 bytes happens to be a multiple of some cache-level size and when you fill it completely, it provokes many cache misses and therefore a jump in decreasing performance.
To find the culprit would be difficult without knowing how sort() is implemented, and the results might be different from processor to processor.
  2 Comments
Adam
Adam on 8 Sep 2014
The same spike seems to occur with the median or even min or max functions too
Magnus
Magnus on 9 Sep 2014
Edited: Magnus on 9 Sep 2014
@José Yes, that is what I am thinking too, but it seems a bit strange. I mean, basically the sort algorithm as used in my example sorts a number of independent vectors of data, stored as columns in a matrix. Sorting 257 such columns takes much less time than sorting 256 columns. Well, I agree, probably this is some low-level issue which is difficult for us mortals to pin down.
After analyzing the data in more detail it seems that the jump in processing time is occuring (to a greater or lesser degree) at every multiple of 64, with multiples of 512 and especially 1024 being extra prominent.

Sign in to comment.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!