How does MATLAB REALLY pass arguments to functions?

I know that MATLAB may pass non-handle object arguments "by reference" as an optimization, for example when the argument isn't modified inside the function. But in practice it seems far more complicated than that.
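One way to probe this copy-on-write behavior is a timing experiment (a minimal sketch; the helper names `read_only` and `write_one` and the rough timing expectations are my own assumptions, not from MATLAB's documentation of internals):

```matlab
% Hypothetical probe for MATLAB's copy-on-write: passing a large array to a
% function that only reads it costs (almost) nothing, while the first write
% inside the function may trigger a full copy of the shared data.
x = rand(1, 1e7);

tic; read_only(x); toc   % expected: cheap, no copy of x is made
tic; write_one(x); toc   % expected: slower, the write un-shares the array

function s = read_only(a)
    s = a(1);            % reading does not un-share the data
end

function a = write_one(a)
    a(1) = 0;            % first write forces a copy (unless done in place)
end
```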
Consider this recursive quicksort function:
function input = quick_sort(input, s, e)
%--RECURSION BREAK---
if (e - s) < 1
    return;
end
%--PARTITION (This part isn't really related to the question)---
i = s;
j = e;
m = input(s + floor((e - s) / 2));
while (true)
    while (input(i) < m)
        i = i + 1;
    end
    while (input(j) > m)
        j = j - 1;
    end
    if i < j
        [input(i), input(j)] = deal(input(j), input(i));
        i = i + 1;
        j = j - 1;
    else
        break;
    end
end
%--RECURSIVE CALLS---
input = quick_sort(input, s, i - 1);
input = quick_sort(input, j + 1, e);
end
As I understand it, this "should" copy the whole vector on each recursive call, which obviously "should" take additional time.
The following code does the same, but uses a special wrapper handle class to pass the array "by reference":
function input = quick_sort_h(input, s, e)
%--RECURSION BREAK---
if (e - s) < 1
    return;
end
%--PARTITION---
% Same as in the previous snippet, but uses input.arr(...) instead of input(...)
%--RECURSIVE CALLS---
input = quick_sort_h(input, s, i - 1);
input = quick_sort_h(input, j + 1, e);
end
Wrapper class:
classdef largematrix < handle
    properties
        arr % the wrapped data array
    end
end
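For completeness, a sketch of how the wrapper would be driven (this construction code is my assumption; the question only shows the class itself):

```matlab
% Hypothetical driver for the handle-based version: the array is boxed once,
% and every recursive call receives a reference to the same largematrix
% object instead of a value copy of the data.
h = largematrix();                      % handle object: passed "by reference"
h.arr = rand(1, 1e5);
h = quick_sort_h(h, 1, numel(h.arr));
sorted = h.arr;                         % the wrapped array, sorted in place
```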
But when I measure the time taken by these two algorithms, they show comparable results (the handle version is actually even slower):
(There is also "subarray copying", where I copy only the part of the array that I need to pass recursively.)
So am I correct to assume that MATLAB somehow copies the array initially and then passes it by reference internally (or something like that)? Or is the handle class just that slow?
I also haven't found any complete explanation of how MATLAB behaves internally in such situations, so I would also appreciate it if someone explained it thoroughly.

 Accepted Answer

By the way, at least in the online version, changing the line
[input(i), input(j)] = deal(input(j), input(i));
to
[input(i), input(j)] = swap(input(i), input(j));
% Yes here the same order! i,j <-- i,j
...
function [a,b] = swap(b, a) % This function has no body!
end
reduces the runtime by a factor of 10 for sorting 1e5 random numbers.

13 Comments

Thank you for the links. The only conclusion I can draw is that I should write my code naturally and let MATLAB optimize it as it sees fit (which is a hard thing to do, since I come from C++).
And about your swap remark:
1) Is it necessary for swap function to be local?
2) Why such a difference?
A local function (in the same M-file) is processed slightly faster than a function in the same directory, which in turn has a tiny advantage over functions in other directories. This is the order in which Matlab searches for a matching function. Using a nested function was 20 times slower, but this might depend on the Online interpreter.
You could also try the old-style swap:
tmp = input(i);
input(i) = input(j);
input(j) = tmp; % Faster?!
I'm not sure why this is so much faster. deal has a certain overhead, because varargin and varargout are not really cheap either. (See: edit deal)
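For reference, `deal` is an ordinary M-function; a simplified sketch of its core (run `edit deal` for the shipped source; `deal_sketch` below is my own name) shows where the varargin/varargout overhead comes from:

```matlab
function varargout = deal_sketch(varargin)
% Simplified sketch of DEAL (not the shipped implementation): every call
% packs all inputs into the cell array varargin and unpacks them into
% varargout, and this cell packing is the overhead the profiler reports.
if nargin == 1
    varargout(1:nargout) = varargin;   % one input fans out to every output
else
    varargout = varargin;              % otherwise outputs mirror the inputs
end
end
```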
Relying on Matlab's magic JIT acceleration is an emotional challenge. We have to change programming paradigms frequently. It took years to trust the rumors "loops are slow in Matlab" and "vectorizing is pure gold". But modern CPUs got so fast that accessing memory beyond the level-1 or level-2 caches is the bottleneck. A lot of code profits from plain loops now. The JIT has learned to recognize in-place usage of variables, and hyperthreading can accelerate the processing if no functions are called which use multi-threading: with load on only one core, the other cores can spend their thermal credit.
Optimizing code was a craft; now it becomes an art, combined with a discussion about thermal limits and their effects on worldwide climate change. ;-)
Under Matlab R2018b:
% Elapsed time is 14.633630 seconds. Original with DEAL
% Elapsed time is 0.551816 seconds. tmp=a, a=b, b=tmp
% Elapsed time is 0.442130 seconds. function [b,a]=SWAP(a,b)
% Elapsed time is 0.048593 seconds. Builtin SORT
I cannot explain this exhaustively. Passing arguments was not the bottleneck of the original code; deal was, especially the varargout handling, according to the profiler.
Amazingly neat trick:
[input(i), input(j)] = swap(input(i), input(j));
% Yes here the same order! i,j <-- i,j
...
function [a,b] = swap(b, a) % This function has no body!
end
Could you benchmark with this swap method?
input([i j]) = input([j i]);
x = rand(1, 1e6);
tic
y = sort(x);
toc % Builtin SORT
Elapsed time is 0.056039 seconds.
tic
y = quick_sort1(x, 1, 1e6);
toc % Original with DEAL
Elapsed time is 4.484404 seconds.
tic
y = quick_sort2(x, 1, 1e6);
toc % tmp=a, a=b, b=tmp
Elapsed time is 0.263274 seconds.
tic
y = quick_sort3(x, 1, 1e6);
toc % SWAP
Elapsed time is 0.262050 seconds.
tic
y = quick_sort4(x, 1, 1e6);
toc % input([i, j]) = input([j, i])
Elapsed time is 4.858321 seconds.
It looks like indexing with a vector is a shot in the knee of the JIT acceleration. I'll run it in my local R2018b in the evening.
I've replaced while true by while 1, because true is a function with a certain overhead, while 1 is a constant. This does not affect the runtime measurably.
function input = quick_sort1(input, s, e)
if (e - s) < 1
    return;
end
i = s;
j = e;
m = input(s + floor((e - s) / 2));
while 1
    while (input(i) < m)
        i = i + 1;
    end
    while (input(j) > m)
        j = j - 1;
    end
    if i < j
        [input(i), input(j)] = deal(input(j), input(i));
        i = i + 1;
        j = j - 1;
    else
        break;
    end
end
input = quick_sort1(input, s, i - 1);
input = quick_sort1(input, j + 1, e);
end
function input = quick_sort2(input, s, e)
if (e - s) < 1
    return;
end
i = s;
j = e;
m = input(s + floor((e - s) / 2));
while 1
    while (input(i) < m)
        i = i + 1;
    end
    while (input(j) > m)
        j = j - 1;
    end
    if i < j
        tmp = input(i);
        input(i) = input(j);
        input(j) = tmp;
        i = i + 1;
        j = j - 1;
    else
        break;
    end
end
input = quick_sort2(input, s, i - 1);
input = quick_sort2(input, j + 1, e);
end
function input = quick_sort3(input, s, e)
if (e - s) < 1
    return;
end
i = s;
j = e;
m = input(s + floor((e - s) / 2));
while 1
    while (input(i) < m)
        i = i + 1;
    end
    while (input(j) > m)
        j = j - 1;
    end
    if i < j
        [input(i), input(j)] = swap(input(i), input(j));
        i = i + 1;
        j = j - 1;
    else
        break;
    end
end
input = quick_sort3(input, s, i - 1);
input = quick_sort3(input, j + 1, e);
end
function [b, a] = swap(a, b)
end
function input = quick_sort4(input, s, e)
if (e - s) < 1
    return;
end
i = s;
j = e;
m = input(s + floor((e - s) / 2));
while 1
    while (input(i) < m)
        i = i + 1;
    end
    while (input(j) > m)
        j = j - 1;
    end
    if i < j
        input([i, j]) = input([j, i]);
        i = i + 1;
        j = j - 1;
    else
        break;
    end
end
input = quick_sort4(input, s, i - 1);
input = quick_sort4(input, j + 1, e);
end
Thanks @Jan, I arrive at similar timings with R2022a on my PC.
It seems to me the JIT is able to work in place when a single element of the array is replaced (on the LHS), but as soon as more than one is, MATLAB makes a deep copy of the original array.
MATLAB's behavior is tricky to understand, at least for me.
@Bruno Luong: It is the other way around: your intuitive knowledge about the efficient solution of a problem is tricky to understand, at least for Matlab's JIT.
This is an inherent problem of interpreted languages, which try to be smart enough to guess what the users want. In C or Assembler, creating deep copies cannot happen unintentionally. But it would take a week to debug an Assembler tool for a quicksort that accepts uint8/16/32/64, int8/16/32/64, single, double and Unicode characters.
I vividly remember, a few years ago, TMW folks telling us: "Don't worry about benchmarking for speed, just code as you feel and let us optimize behind the scenes."
That doesn't work, as far as I can tell.
@Bruno Luong: perhaps you were thinking of these blogs about the JIT engine:
"We recommend not writing specifically for the JIT since the JIT is constantly evolving. You should though keep in mind the general good MATLAB-coding practices, such as those listed above"
"Keep your code natural. Don't write unnatural code just to take advantage of this."
I beg to disagree; I always want to know how things work under the hood, not only for MATLAB but for everything. I have never liked object-oriented programming for the same reason.
That is just me.
This is also the reason this question, Jan's answer, and James Tursa's various posts are fascinating to me. If you don't care, then I would say: too bad for you.
@Stephen: "We recommend not writing specifically for the JIT since the JIT is constantly evolving." I understand the idea, but it does not match my workflow. I maintain some programs used for clinical decision making. The reliability of the results is much more important than modern features of Matlab. When the Matlab version is changed, a lot of time and work is required to validate the results. The code has some 100'000 lines of Matlab and C code, and the results for some 1000 patients must be checked. Therefore the labs moved only from Matlab 6.5 to 2009a, and then to 2018b.
On the other hand the tools are used in the daily routine during an examination. It causes stress, if the patient and the measuring person wait more than 5 seconds for the display of a result.
In consequence, optimizing the code for a specific version of the JIT is useful and important in my case. I cannot wait for the JIT to evolve.
When I optimize code for the JIT, I keep the different versions in the unit-test functions. Then it is easier to check whether another version is much faster in another Matlab release. Without extensive unit-testing it would be impossible to keep an overview.
Many important patterns are not affected by the JIT version, like pre-allocation or avoiding repeated calculations inside loops. But e.g. for writing efficient C-Mex functions, a documentation of the copy-on-write process would be very useful.
I think we see here the difference between the rule and the exception. 'Normal people' should not write for the JIT, because they will probably make mistakes that will be hard to catch. Optimizing for the JIT (or EE, or whatever the exact term is) is a high-risk, high-reward undertaking. For most things Matlab is fast enough.
There are legitimate reasons for diving into details. That is one of the ways Yair makes a living, and apparently it is also important for Jan's job. I will still toe the party line that you should not be optimizing for the JIT. You should write logical, predictable code structures. That way MathWorks engineers can optimize the next JIT for that style of writing.
There are always exceptions. The difficulty is in figuring out whether your situation is one of them.
@Rik: I agree. Premature optimization is a common programming anti-pattern.
Matlab is very efficient for writing stable code and reducing the time spent debugging. The processing speed has a lower priority. If I had written the tool for clinical decision making completely in C, it would run at perhaps double the speed, but the debugging and a migration to another compiler or graphics library would take half a year. In my case I can let physiotherapists with limited Matlab skills write their own small add-ons without the danger of crashing the computer. This would be impossible in C.
So for my total purpose, Matlab is the best choice. And if a specific detail of the JIT really matters, it is useful to contact the corresponding programmer personally.



Asked: 27 Apr 2022
Commented: Jan on 2 May 2022