What extra data is stored by an anonymous function?

I recently learned that anonymous functions can carry around large amounts of extra workspace data that they don't use, even if that data is created after the anonymous function. The following example, together with the FUNCTIONS command, illustrates this:
function fun=test
a=1;
b=2;
c=3;
fun=@(x)x+b+a;
a=7;
b=rand(1000);
c=5;
q=3;
r=4;
end
Now, back in the base workspace, when I apply the functions() command to 'fun', I see
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 1
b: [1000x1000 double]
c: 3
I would like to understand (with official documentation if possible) what rules anonymous functions use to decide what data to carry around. The above seems to suggest that s.workspace{1} will always contain the external variables and their values that the anonymous function actually uses. Meanwhile s.workspace{2} seems to contain updates to variables that came into scope before fun was defined. Am I correct that these are the rules? But if so, then why, in the above, does s.workspace{2} contain an update to b, but not to a and c?

Accepted Answer

Philip Borghesani on 10 Feb 2014
Edited: Philip Borghesani on 10 Feb 2014
I will start my answer with a quote from the documentation of functions:
The functions function is used for internal purposes, and is provided
for querying and debugging purposes. Its behavior may change in
subsequent releases, so it should not be relied upon for programming
purposes.
The output from this function for this code WILL change in a future version of MATLAB and can change in your current version:
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 1
b: [1000x1000 double]
c: 3
>> feature accel off
>> fun=test; s=functions(fun); s.workspace{:}
ans =
b: 2
a: 1
ans =
fun: @(x)x+b+a
a: 7
b: [1000x1000 double]
c: 5
q: 3
r: 4
workspace{2} will contain the final state of the function workspace at exit, but it might not reflect changes to that workspace that are not externally visible and have been optimized away by the JIT.
The contents of workspace{2} should be considered completely version dependent and subject to removal or being inconsistent due to current and future optimizations.
  3 Comments
Philip Borghesani on 11 Feb 2014
Anything returned by functions, or even the existence of the function functions itself, is subject to change, but the contents or existence of workspace{2} is known to change. There is a slight difference there.
One note on the contents you are seeing in workspace{2}: this is not a copy of the variables in the function but a pointer to the actual workspace. Nested functions or multiple anonymous functions will see the same values in workspace{2}, even if they are changed by a nested function. As a result, the memory used usually goes unnoticed, and this data causes little performance overhead as long as parfor is not in the equation.
Matt J on 11 Feb 2014
Edited: Matt J on 11 Feb 2014
I agree that that's true as long as the anonymous function is used transiently, i.e., it goes out of scope in the same workspace where it was created. Admittedly, that's what you do most of the time.
However, I don't think parfor is the only exception. If you return an anonymous function handle from a function to a calling workspace, it will prevent the workspace{2} variables from going out of scope and their (potentially large) memory from being released. Similarly, when saving to a .mat file, deep copies will be made.
I think most users know how to navigate this when it comes to workspace{1} data. They know that the anonymous function uses that and so that it must be kept stored somewhere. However, workspace{2} data is data that anonymous functions never use, and the documentation doesn't warn that it is there. Thus, it seems very easy to lock large amounts of memory by accident.
I still do wonder why anonymous functions care about and keep track of workspace{2}...
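One way to sidestep the concern above, sketched here under assumptions rather than taken from the thread, is to create the handle inside a small helper function whose workspace holds only the variables the anonymous function needs. Whatever a given MATLAB release chooses to keep of the creating workspace, that workspace then has nothing large in it. The names test2 and makeAdder are hypothetical.

```matlab
% test2.m (hypothetical file name): a variant of the example in the question.
% The handle is built in a helper whose workspace holds only a and b.
function fun = test2
    a = 1;
    b = 2;
    fun = makeAdder(a, b);   % handle is created inside makeAdder, not here
    b = rand(1000);          % large array exists only in test2's workspace
end

function fun = makeAdder(a, b)
    % Only a, b (and fun) exist in this workspace when the handle is made.
    fun = @(x) x + b + a;
end
```

Calling fun = test2 and then fun(1) evaluates 1 + 2 + 1 using the values passed to makeAdder. What functions(fun) reports for this handle remains version dependent, as noted in the accepted answer.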


More Answers (2)

Matt J on 4 Mar 2014
Edited: Matt J on 4 Mar 2014
I still do wonder why anonymous functions care about and keep track of workspace{2}...
Assuming workspace{2} really has no purpose, I've posted this cleaning tool as a potential remedy
It strips away workspace{2} data leaving only workspace{1}, which presumably contains all/only the variables that the function needs.
  3 Comments
Matt J on 6 Mar 2014
Pretty tricky, Philip. Can I assume my function will act as intended if no nested functions (with externally scoped variables) are used by the anonymous function? If not, can you tell me more about when workspace{2} is used?
I can't imagine scenarios where someone would want to save an anonymous function to disk if it relied on externally scoped variables.
Philip Borghesani on 6 Mar 2014
I can't guarantee it but I believe you are correct, workspace{2} is only needed with nested functions.



James Tursa on 6 Mar 2014
Edited: James Tursa on 6 Mar 2014
This topic has already been addressed in this thread:
For convenience I will repeat my answer here:
When you create an anonymous function handle, all variables that are not part of the argument list are regarded as constants. Shared data copies of them are made at the time you create the function handle and actually stored inside the function handle itself. They retain their value and use up memory even if you change the source of the "constant" later on in your code. E.g., if you had done this:
A = v;
f = @(x) A*x; % could have done f = @(x) v*x; and got same result
A = 2*v;
the last line has no effect on the function handle f output (EDIT). Note that if A happens to be a very large variable, its memory effectively gets "locked up" inside f and can only be cleared by clearing (or re-defining) f. E.g., in the above code snippet, the 2nd line will put a shared data copy of A inside of f. The 3rd line will cause this shared data copy to essentially become a deep data copy (it gets unshared with A at that point).
Bottom line is, once the anonymous function gets created, the standard rules for shared data copies apply. At least that was the behavior I observed last year. I may need to re-examine this ...
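A runnable version of the snippet above, as a minimal sketch: the sample value for v is an assumption, since the post does not define it. It checks only the visible behavior (the handle keeps the value A had when the handle was created), not how the copy is shared in memory.

```matlab
v = [1 2 3];      % assumed sample data; v is not defined in the post
A = v;
f = @(x) A*x;     % the value of A at this point is stored in the handle
A = 2*v;          % reassigns the variable A; the copy held by f is unchanged
y = f(2);         % computed with the captured A = [1 2 3]
assert(isequal(y, [2 4 6]));   % would be [4 8 12] if f saw the new A
```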
  6 Comments
Matt J on 6 Mar 2014
Edited: Matt J on 6 Mar 2014
Also, to be clear, everything I have written above is still correct if my words "has no effect on the function handle f" are replaced by "has no effect on the function handle f output"
Well, then I agree with all of it. However, it seems to pertain to a slightly different question from the one I posted.
