What should go in a next-generation MATLAB X?
Latest activity: reply by David Young on 13 Mar 2024
Let's say MathWorks decides to create a MATLAB X release, which makes a big one-time breaking change that abandons back-compatibility and creates a more modern MATLAB language, ditching the unfortunate stuff that's around for historical reasons. What would you like to see in it?
I'm thinking stuff like syntax and semantics tweaks, changes to function behavior and interfaces in the standard library and Toolboxes, and so on.
(The "X" is for major version 10, like in "OS X". Matlab is still on version 9.x even though we use "R20xxa" release names now.)
What should you post where?
Next Gen threads (#1): features that would break compatibility with previous versions, but would be nice to have
@anyone posting a new thread when the last one gets too large (about 50 answers seems a reasonable limit per thread), please update this list in all previous threads. (If you don't have editing privileges, just post a comment asking someone to do the edit.)
The first change I would make would be to scrap the special treatment of Nx1 and 1xN matrices. These are given special status (as "column vectors" and "row vectors"), which must, I suppose, be helpful sometimes, but in practice it's confusing (that is, it confuses me) and makes general code much more complex than it should be.
For example, if you write c = a(b) where the values of all the variables are numeric arrays, the rule is that c will be the same shape as b except when a or b is a column vector and the other is a row vector. An exception to a general rule is, as a general rule, a bad thing. One that affects as fundamental an operation as indexing an array is a very bad thing.
Another exception: size truncates trailing 1s except in the case of column vectors, and ndims returns 2 for column vectors. General code therefore has to handle this case specially. For an example of code that could be simpler without these complexities see exindex.
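Both exceptions can be seen in a few lines. This is a minimal sketch using only built-in functions; the variable names are made up for illustration.

```matlab
% Indexing: the result normally takes the shape of the index...
A = magic(4);                 % a 4x4 matrix
rowIdx = [1 2 3];             % a 1x3 row vector of indices
size(A(rowIdx))               % 1x3, same shape as the index
% ...except when both operands are vectors: then it follows the indexed array
colData = (10:10:50)';        % a 5x1 column vector
size(colData(rowIdx))         % 3x1, NOT the 1x3 shape of rowIdx

% size/ndims never report fewer than two dimensions
size(zeros(3, 4, 1))          % [3 4]: the trailing singleton is dropped
size(zeros(3, 1, 1))          % [3 1]: the column's second 1 is kept
ndims(zeros(3, 1))            % 2, not 1
```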
It makes for messy code in other ways: my arguments blocks are peppered with (1,1) to indicate scalars, when (1) would be easier to read and should be sufficient.
It's not as if row vector and column vectors are always treated the same as each other. Matrix multiplication distinguishes between them of course, as does the loop construct for. Making them a special category, when they're actually just different shapes of arrays, simply adds complexity.
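To make the point about for concrete: for iterates over the columns of its array, so a column vector produces a single pass while the same values as a row produce three. A minimal sketch:

```matlab
nCol = 0;
for v = [1; 2; 3]     % the loop variable takes each COLUMN of the array
    nCol = nCol + 1;  % v is the whole 3x1 column here
end
nRow = 0;
for v = [1 2 3]
    nRow = nRow + 1;  % v is a scalar on each pass
end
% nCol is 1, nRow is 3
```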
Can anyone make a case for keeping this peculiarity?
Some simple things would be nice:
- counter += 1; salary *= 2 % compound assignment operators
- y = (x < 0) ? 3 : 2*x; % ternary operator
y = (x < 0) ? 3 : 2*x
Would that be:
y = zeros(size(x), 'like', x); %option 1
y(x<0) = 3;
y(~(x<0)) = 2*x(~(x<0));
or would it be
y = zeros(size(x), 'like', 3); %option 2
y(x<0) = 3;
y(~(x<0)) = 2*x(~(x<0));
or would it be
if any(x < 0, 'all') %option 3
y = 3;
else
y = 2*x;
end
Or would it be
if all(x < 0, 'all') %option 4
y = 3;
else
y = 2*x;
end
or should it be
if isempty(x)
y = x;
else
if x(1) < 0 %option 5
y = zeros(size(x), 'like', 3);
else
y = zeros(size(x), 'like', x);
end
y(x<0) = 3;
y(~(x<0)) = 2*x(~(x<0));
end
Option 1 has a result that is always class(x); option 2 has a result that is class double (because 3 is double). Option 3 and Option 4 force a scalar test and have a class that depends upon which way the test gets answered. Option 5 has a result that is the same class as the first result.
Suppose you had
y = (x < 0) ? ones(1,10,'uint8') : x
and suppose that x is non-scalar and does not happen to be 1 x 10. Does the statement make sense? It potentially does if, as is consistent with if and while, you interpret it as a test over "all" of x. But that would only be consistent if you treat ?: as syntax that can only occur as a shortcut for if/else with an assignment always needed. If you treat ?: as an operator, then for consistency you need the ?: test to be vectorized, more like
(x<0).*3 + (~(x<0)) .* x
(except that expression doesn't work when x contains -Inf, since 0 times -Inf is NaN rather than 0)
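One way around the 0*Inf problem is masked assignment instead of arithmetic blending. Below is a minimal sketch of an elementwise version of y = (x < 0) ? 3 : 2*x, assuming a double input; x is made-up test data.

```matlab
x = [-Inf -2 0 5 Inf];
mask = x < 0;

% Arithmetic blend: 0 * -Inf is NaN, so the first element is corrupted
yBlend = mask .* 3 + (~mask) .* (2 * x);   % [NaN 3 0 10 Inf]

% Masked assignment: evaluate the "else" branch, then overwrite where mask is true
y = 2 * x;
y(mask) = 3;                               % [3 3 0 10 Inf]
```

Like Option 1, the result class follows the "else" expression (3 is converted on assignment), and both branches are evaluated for every element, so this is not a short-circuiting ternary.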
Yes this is what I mean.
I would suggest expanding your answer with an explanation of what your syntax should do, e.g.:
% counter += 1; salary *= 2
counter = counter + 1;
salary = salary * 2;
and
% y = (x < 0) ? 3 : 2*x;
if x < 0
y = 3;
else
y = 2*x;
end
Is this indeed the correct interpretation?
Add a 'parfor' option to splitapply() and grouptransform(), or create separate parallel versions of those two functions.
Right now the group-based functions loop over the groups with a for-loop, which is very slow for data with a large number of groups. When the same data set was run through a parfor-loop, it was 5 to 10 times faster.
Functional programming hides the looping details, which brings the coding process closer to human cognition. And parfor is a really powerful beast. The combination of these two war-horses would give Matlab a decisive lead over the sluggish reptile.
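Until such an option exists, the per-group work can be spread over workers by hand. This is a minimal sketch, assuming the per-group computations are independent; without Parallel Computing Toolbox, parfor simply runs as an ordinary loop. The data and the use of sum as the per-group function are placeholders.

```matlab
vals = (1:10)';                           % placeholder data
grp  = [1 1 2 2 2 3 3 3 3 1]';            % placeholder group labels
[g, ids] = findgroups(grp);               % g: group number for each row
pieces = splitapply(@(v) {v}, vals, g);   % one cell per group (no real work yet)

out = zeros(numel(ids), 1);
parfor k = 1:numel(ids)
    out(k) = sum(pieces{k});              % stand-in for the expensive per-group function
end
% out matches splitapply(@sum, vals, g)
```

Splitting into cells first copies each group's data once; that is usually cheap next to a heavy per-group function, but it is worth measuring on real data.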
My wish list:
- A real, beautiful dark theme
- Improving the appearance of figures. Reduce padding around subplots, set default axis and tick mark color to black, adjust default linewidth and font sizes to be a bit larger. In general, try to make figures made quickly with default settings look better.
- Multi-start options for all solvers in the optimization/curve fit toolbox.
- Consistent arguments for plotting functions. I think some still use different capitalization schemes (like "LineWidth" vs "linewidth").
This is exactly what I was thinking of. I use both methods to change plot attributes. Maybe I should pick one method and stick with it...
When you use name/value pairs for the plotting functions, then the comparisons done are case-insensitive. The same is true when you use set() calls.
When you use dot-syntax like
h.LineWidth = 2.5;
then the comparisons done are case sensitive
At the moment I do not know whether the names are case-sensitive when you use the name=value calling syntax.
The current thread is fairly close to my arbitrary suggested limit of 50 answers. If you think it makes more sense to start a new thread, go ahead. I'm happy to start a new one, but you can also do it and add it to the list of threads (don't forget to edit the other threads as well).
In an attempt to discourage new answers (while waiting for the ability to soft-lock threads), I have started editing the older questions by putting '[DISCONTINUED]' at the start of the question.
This wonderful thread is becoming unwieldy and slow to respond to editing on both laptop and desktop for me. If others are having the same problem, perhaps this Question should be locked, at least for new Answers (is that possible?), and a new Question opened for new Answers?
My wish list, not about code improvement but about official tutorials.
- a tutorial of using splitapply to take advantage of parallel computation.
- a tutorial on assignment and indexing involving comma-separated lists and cell arrays. It should not only show what works, but also explain which syntax goes wrong, and why it goes wrong.
For example, x = ["a", "b"] is a 1x2 string array. But then x(:) becomes a column vector, x{:} is a comma-separated list, and [x{:}] is a character vector 'ab'. Such 'delicate' usage is the biggest bottleneck in my coding process. @Stephen23 has written a tutorial on comma-separated lists. I hope MathWorks staff can take it from there and expand it to cover the use cases of tables. For example, if T is a table, T(1,:) is a single-row table. But T{1,:} sometimes works, if the variables' data types can be lumped together, and sometimes fails, if the variables have mixed data types. And when it works, say when all table variables are 'string', why is T{1,:} a string array instead of a comma-separated list? Two similar syntaxes, x{:} and T{1,:}, have two different semantic meanings. That really jams the workflow in my coding.
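The distinctions above, as a minimal sketch. The table T and its variable names V1 and V2 are made up for illustration, and both variables are strings so that T{1,:} can concatenate them.

```matlab
x = ["a", "b"];          % 1x2 string array
xc = x(:);               % 2x1 string array: () reshapes, still strings
joined = [x{:}];         % x{:} is the list 'a','b' (char); [] concatenates to 'ab'

T = table(["p"; "q"], ["r"; "s"], 'VariableNames', {'V1', 'V2'});
rowT = T(1, :);               % 1x2 table (one row)
rowS = T{1, :};               % 1x2 string array ["p" "r"]: concatenated, not a list
rowC = table2cell(T(1, :));   % 1x2 cell {"p", "r"}; rowC{:} IS a list
```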
@Simon - I do cast some of my table variables to categorical, and have also noticed things go slower than I expected with them. (Kinda the whole point of categoricals is that they're small and fast compared to strings, right?) I have no idea what would cause categoricals in table variables to go slow.
@Andrew Janke Do you cast your table variables to categorical? In my case, if a task is to process strings, it will be many times slower if the variables are cast as categorical. I don't know why. What do you think might have caused that?
> Sounds like you're fairly satisfied with the Matlab table implementation, except for the {} indexing.
Oh yes. I think the Matlab table array implementation is quite good. I'm not even sure the {}-indexing behavior is a problem; I use the multi-variable form of it seldom enough that I don't really have an opinion on whether its concatenation-instead-of-comma-separated-list behavior is an issue. I was just saying it is inconsistent with the {}-indexing behavior of cells and strings and thus unintuitive for a Matlab programmer new to tables; I don't know if that makes it bad.
The main thing about table arrays I think is not great is speed in some cases: addressing variables inside a table, and doing chained/multi-level indexing in to them, is not as fast as with structs, which makes tables unsuitable for some performance-sensitive contexts. (As of R2019b; I haven't benchmarked newer versions.) And I believe that the in-place-modification optimization ("IPMO") of Matlab's execution engine does not work for variables/columns inside a table array, even if the table array is in a local variable and there are no shared references to its underlying variables' data. (I believe that structs, cells, and objects in general share this no-IPMO-on-field-contents limitation, so it's not a weakness unique to table arrays.) And concatenating several tables can be kind of slow.
Thanks for the detailed response. Sounds like you're fairly satisfied with the Matlab table implementation, except for the {} indexing.
Also, as of R2023a, a dictionary whose values are cells can be indexed with curly braces to access the contents of those cells:
vals = {1:3 , "Bicycle" , @sin};
keys = [1 2 3];
d = dictionary;
d(keys) = vals
d =
dictionary (double ⟼ cell) with 3 entries:
1 ⟼ {[1 2 3]}
2 ⟼ {["Bicycle"]}
3 ⟼ {@sin}
d{1}
ans = 1×3
1 2 3
d{2}
ans = "Bicycle"
d{3}
ans = function_handle with value:
@sin
Apparently curly brace indexing is returning a CSL of the elements stored in the dictionary value cells, rather than a CSL of the dictionary elements themselves, which would be a CSL of cells.
d{1:2}
ans = 1×3
1 2 3
ans = "Bicycle"
> When you wrote your own table implementation, if {} returned a table, what did () indexing return?
()-indexing also returned a table. This was one of the big differences between my table array design and Matlab's design: my table array was an array of tables, where each element was a whole table (aka relation), and an array of tables could be of arbitrary size, as opposed to a table array always being a 2-D array of the rows & columns/variables inside that single table, like Matlab's is. Like, if you call size(t) on a Matlab table array and it returns [2, 5], you're looking at a "single" table of 2 rows and 5 variables/columns. But a size of [2, 5] in my design is an array of 10 tables, each of which might have different numbers of rows and different numbers/names/types of columns. So you could do things like joins or projections with a single method call, and have them apply to many tables at once, with scalar expansion. My table array was more of a container with "stronger" encapsulation of its contents, and each element of a table array was a container that held a whole table/relation, kind of like how in a Matlab cell array, each element of the cell array contains a whole arbitrary-size-and-type array.
I don't think my approach of "array of tables with multi-table function application" ended up being very useful, and I'd probably just do the sizing and ()-indexing Matlab's way if I had to do it all over again. Doing operations over a plain list or array of tables, as opposed to a set of named tables, doesn't seem to happen much in practice, and you can always just slap them in a cell array if you need to do that.
I also had a different approach to dot-indexing. Instead of tbl.Foo being the column/variable Foo inside the table array, I had a special "cols" property that contained the columns for dot-indexing, so it would be tbl.cols.Foo. This meant that methods on table arrays could be called like tbl.meth(...) and you could use tab-expansion on them, and address table-level properties as tbl.Blah instead of tbl.Properties.Blah. I still don't know which of these ways I like better. Probably Matlab's, because column access is such a common operation, and Matlab's direct-column-addressing approach means you can use table arrays as drop-in replacements for structs in many places.
@Stephen23. "x is numeric. It has no comma-separated list syntax because it is not a container". I didn't really mean Matlab should do that kind of crazy thing. Just from a mathematical point of view, a scalar can be seen as a vector (so it could be implemented as a container). Even Lisp, which constructs almost everything as lists, doesn't represent a scalar as a list.
@Stephen23. "There are some subtleties/differences due to the need to refer to rows: unlike other container types, with tables it is useful to be able to refer to rows (which refer to the content not the container itself)."
"assigning a row of different data types to a table, a challenge I have seen several times on this forum."
Absolutely agreed. At first, I had not been paying enough attention to different levels of extraction from table rows, causing self-doubt :-).
@Walter Roberson. "A few days ago I was trying to write some splitapply code that would have gone notably easier if {} indexing of tables returned comma separated values (or if there had been other syntax that did that.)"
Sharing your frustration. Recently I was trying to shift toward functional programming, and a table row stands in the way like a Matryoshka doll: you can't access the innermost one without stripping away the outer ones first. A reminder to myself:
- T(row,:); a single-row table
- C = table2cell(T(row,:)); a cell array
- C{:}; a comma-separated list.
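The three levels above, followed by using the list as function arguments. A minimal sketch with a made-up two-variable table (S and N are illustrative names):

```matlab
T = table(["a"; "b"], [1; 2], 'VariableNames', {'S', 'N'});
rowT = T(2, :);               % level 1: a 1x2 table
C = table2cell(rowT);         % level 2: a 1x2 cell {"b", 2}
msg = sprintf('%s=%d', C{:}); % level 3: C{:} expands to the arguments "b", 2
% msg contains b=2
```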
When you wrote your own table implementation, if {} returned a table, what did () indexing return?
Are there other classes in base Matlab besides cell, table, and string that accept curly brace indexing?
I wonder if there are any toolbox classes that accept brace indexing and how that works, i.e., does {} return addressed elements or something else.
> A few days ago I was trying to write some splitapply code that would have gone notably easier
I hear that. Matlab's splitapply, join, and similar table functions have never really felt quite "right" to me, in terms of their interfaces. I usually end up writing my own wrapper functions on top of them that translate them to interfaces that feel more natural to me. But I'm an old SQL/table-head, maybe my tastes in code are just weird here.
A workaround to get a comma-separated list from table contents in a single command:
person=["maman"; "papa"; "moi"];
x=rand(3,1);
T=table(person,x)
T = 3×2 table
person x
_______ ________
"maman" 0.028974
"papa" 0.20146
"moi" 0.3114
struct('dummy',num2cell(T{:,1})).dummy
ans = "maman"
ans = "papa"
ans = "moi"
struct('dummy',num2cell(T{:,2})).dummy
ans = 0.0290
ans = 0.2015
ans = 0.3114
Unfortunately I don't know how to put this little command in a function, since it will return only the first element of the comma-separated list.
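Assigning a comma-separated list to a single output keeps only its first element, so one common pattern is to let the function return a cell array and expand it with {:} at the call site. A minimal sketch; columnAsCells is a hypothetical helper name and the x values are placeholders.

```matlab
person = ["maman"; "papa"; "moi"];
x = [0.1; 0.2; 0.3];                                 % placeholder values
T = table(person, x);

columnAsCells = @(tbl, var) num2cell(tbl{:, var});   % hypothetical helper: returns a cell
c = columnAsCells(T, 'person');                      % 3x1 cell of string scalars
names = [c{:}];                                      % c{:} is the list; [] gives a 1x3 string array
label = strjoin(names, ", ");                        % "maman, papa, moi"
```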
A few days ago I was trying to write some splitapply code that would have gone notably easier if {} indexing of tables returned comma separated values (or if there had been other syntax that did that.)
Anyway, FWIW, I am similarly bothered as @Simon about how "{...}" brace indexing in to a table array returns a single array (subject to concatenation of the addressed elements) in a single variable/argout instead of a comma-separated list. I think this is probably the more-commonly-used case, so on the one hand it makes sense there. ("{}"-indexing is just an operation you can override or define however you want, at least in user-defined classes.) But there's just no precedent for it in the Matlab base language or standard library. I can't think of any other datatype that accepts brace-indexing and doesn't return a comma-separated list containing "addressed elements" in return. And the inverse as an lvalue.
Back when I implemented table arrays in my own Matlab library, I overrode {}-indexing to subset tables, producing tables. (Because I thought, who would do brace-indexing across multiple columns in a table array, and want it back as a comma-separated list? Why would you even do that?) And my {}-indexing was even weirder: it accepted a string with a SQL-style "WHERE" clause predicate, so you could do like t2 = t{'Date > now-7 && NumErrors > 1'} if you want to see things that blew up recently. Which was clever and concise. But I came to regret it: in standard Matlab usage, brace-indexing is such a specific thing with certain semantic/low-level effects that I think it's best to just not deviate from that, even if it seems like a really useful thing to do. If I were doing it again today, I would have skipped the {}-indexing override and just used a really short method name, like "q()".
> It already does, that is exactly what happens
Yes. This is maybe a rhetorical issue. I was doing the British-style thing of "oh, perhaps there are reasons this thing works the way it does, and we should try to understand and think about those, instead of just getting grumpy and demanding it work in a different manner" thing. Like, I'm not actually wondering if there's perhaps a reason for that; I am somewhat familiar with those reasons and it's more of a Socratic dialog thing.
Sorry if that approach is condescending; I didn't mean it to be an insult.
Yet later you state that you want "Suppose T is a table. T{1,:} returns the first row's contents as a comma-separated list.". That would be a change of meaning of curly braces for tables.
"If you've got string arrays in table variables, then I think maybe T{x,y} should return a string array there, and then if you want to get at the "contents" of that string array in terms of char vectors, then you should hit that string array with an additional level of {...} indexing..."
It already does, that is exactly what happens:
one = ["hello";"world"];
two = [pi;NaN];
T = table(one,two)
T = 2×2 table
one two
_______ ______
"hello" 3.1416
"world" NaN
out = T{1:2,1}
out = 2×1 string array
"hello"
"world"
class(out)
ans = 'string'
out{:} % content of string container in a comma-separated list
ans = 'hello'
ans = 'world'
> In scalar case, x=x(:)=x{:}.
Are you sure this actually happens? (I assume by "=" you actually mean something like "isequal()" or "is the same as" in the broader sense; one-equal-sign "=" is the assignment operator, and two-equals "==" is the elementwise equality test.)
In the scalar case of an array x, then x and x(:) are the same thing. But the {...} dereferencing/pop-out operation produces something different. The x{...} operation "reaches in to" the contents of x subsetted by "...". I'm not aware of any case where x{...} is the same as x, unless you do some silly special-case subsref magic to make that happen. And I don't think any regular Matlab datatypes do that.
Also, note that "comma-separated lists" are, as far as I understand it, not a Matlab datatype, but a special value-passing form or whateveryoucallit that only happens in the context of M-code syntactical and control flow constructs. CSLs can be captured into cell arrays and vice versa, but they are not the same thing.
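The round trip between a comma-separated list and a cell array fits in two lines. A minimal sketch; x is a made-up string array.

```matlab
x = ["hello", "world"];
captured = {x{:}};                % list -> 1x2 cell {'hello', 'world'}
[first, second] = captured{:};    % cell -> list -> two separate outputs
```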
> {:} acts like CIA trying to turn Jason Bourne back to what he used to be [...]
Yeah, well, what if the CIA pays for your Matlab licenses? Because in my experience that's usually how it is: Matlab is commercial software, often paid for by the "business" instead of the "tech" department, and the biz folks like things to stay more or less the way they're used to.
> Suppose T is a table. [...] T{1,:} returns [...]
I think maybe there's another level of indirection going on here. If T is a table, then T{x,y} will "pop out" the contents of that table array's columns/variables, as opposed to T(x,y) subsetting the table and then returning another table. (And imho, for Matlab, "table array" means exactly the same thing as "table", because tables are arrays, just like everything else in Matlab.)
If you've got string arrays in table variables, then I think maybe T{x,y} should return a string array there, and then if you want to get at the "contents" of that string array in terms of char vectors, then you should hit that string array with an additional level of {...} indexing. Like, if T is a table array with a variable/column "mystr" that contains strings, maybe T{:,1} should pop out that one var and return a string array, and then T{:,1}{:} should then pop out the string array's elements into a list of charvecs, returned as a "comma-separated list" in this context.
My thesis here is that the string array type provides an additional layer of "indirection" or encapsulation that wraps charvecs in a higher-level type, and that table arrays are another level of composition on top of that, and you should expect one application of {...} indexing to only "pop out" one level's worth of encapsulation or composition.
IMHO, string arrays are kind of a special case here, because Matlab string arrays are kind of new thing, and the older ways of Matlab string handling were all kinda sloppy hacks layered on top of kinda-too-low-level representations. (http://matlab.wtf)
@Simon: it would be quite handy having {:} also define a comma-separated list for tables, which would make that syntax meaning consistent** for all data types. It would also make a few kinds of operations much easier for tables (e.g. assigning a row of different data types to a table, a challenge I have seen several times on this forum). As far as I can tell, the main difference would be in case of multiple columns/variables, which would need to be replaced with horizontal concatenation, i.e. t{:} -> [t{:}].
Note that for comma-separated lists x(:) != x{:}, so your "singular case" example is inconsistent with all other comma-separated lists and is also inconsistent in and of itself: why should a comma-separated list of one array have a completely different behavior from a comma-separated list with any other number of arrays? I would not expect or desire that; it would make comma-separated lists much harder to use (you would need to program special cases), with all of the resulting latent bugs etc.
" Even in scalar case, x = 2; x{:} returns 2."
No, it does not. Lets try that right now:
x = 2;
x{:}
Brace indexing is not supported for variables of this type.
x is numeric. It has no comma-separated list syntax because it is not a container. It makes no sense to attempt curly-brace indexing on something that is not a container.
**There are some subtleties/differences due to the need to refer to rows: unlike other container types, with tables it is useful to be able to refer to rows (which refer to the content not the container itself).
@Rik, thanks for the cellstr solution. I'll give it a shot.
@Paul, your feedback led me to think more deeply about {:}.
x = "abc string";
x{:}
ans = 'abc string'
In that example, {:} acts like the CIA trying to turn Jason Bourne back into what he used to be. Bad practice. I would like {:} to be like the toughest NKVD interrogator: whatever container it touches will spill its contents.
Suppose T is a table.
T{1,:} returns the first row's contents as a comma-separated list. Even in the scalar case, x = 2; x{:} returns 2. Under this semantic principle, {:} will behave as a nice symmetric complement to (:).
T(1,:) returns a single-row table, wrapping the content inside.
T{1,:} returns the naked contents.
In the scalar case, x = x(:) = x{:}: a singular case that doesn't break the general rule, which kind of fits Matlab's original purpose of serving mathematicians, doesn't it?
Paul, the examples you give make my code want to shout out "MeToo". When I looked into the built-in rowfun and splitapply, I found they had the same MeToo moment. Those built-ins must have a carefully crafted local function to handle the inconsistent semantic interpretation involving {:}, usually to 'flatten' or 'expand' table rows or other constructs into cell arrays. (But don't take my word 100%, because I lost my patience during the code tracing.)
If you have feedback about a specific documentation page (something you expected to find on the page that's not present, a bug on the page, or a suggestion for an alternate way to phrase something on the page that may be clearer or more general) you can select a rating for "How useful was this information?" at the end of the page. Once you select a number of stars that will be replaced with a box asking "Why did you choose this rating?" where you can enter free-form text. I know for a fact that the documentation staff reviews this feedback.
If you have feedback about something that's missing entirely from the documentation, for that I'd recommend you contact Technical Support directly using this link. They can enter your feedback into the enhancement database for the documentation staff to review.
If only we could cascade brace indexing
y = [1 2 3]
num2cell(y){:} % won't work currently
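Until chained indexing on a function result is allowed, the usual workaround is a temporary variable. A minimal sketch:

```matlab
y = [1 2 3];
c = num2cell(y);       % temporary: num2cell(y){:} is not valid syntax
total = plus(c{1:2});  % c{1:2} expands to the two arguments 1, 2
back = [c{:}];         % c{:} concatenated again gives [1 2 3]
```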
My understanding is that you have two basic concerns:
a) TMW should have a tutorial on Comma Separated Lists. See Comma Separated Lists. Unfortunately, that doc page is lacking, as it does NOT discuss how to generate a comma-separated list from a string array, when doing so is a feature, as you've pointed out. However, that doc page would not discuss comma-separated lists as they relate to tables, because there is no way to generate a comma-separated list from a table (at least not according to the doc page I linked previously).
b) T{1,:} returns an array, not a comma separated list, and is therefore inconsistent with use of {} on classes like cell and string where {} does return a comma separated list. Do you think a comma separated list in this case would be more useful?
As an aside, I've sometimes wanted to be able to generate a comma-separated list from a numeric array. Alas, that is not possible, and one has to resort to workarounds:
y = 1:3
y = 1×3
1 2 3
try
y{:};
catch ME
ME.message
end
ans = 'Brace indexing is not supported for variables of this type.'
struct('y',num2cell(y)).y
ans = 1
ans = 2
ans = 3
@Steven Lord "...backwards compability with functions that accept cell arrays containing char vectors"
I see that thanks.
Strictly speaking, I don't consider backward compatibility broken even without the {} behavior on string, since all code written for char still works.
What you call "compatibility" is more like wanting the same functionality to work for both string and char. However, there are a bunch of other things that cannot work for both classes, such as char arithmetic, extracting sub-chars, numeric conversion, etc.
I remember we were working with a robot over a TCP/IP protocol that sent char arrays/strings. One of our interns changed char to string, or used a function intended to work for one and not the other, and I can tell you it was a very frustrating experience for us when the bug occurred, because the two do not work exactly the same...
I really like how string vectors have behavior extremely similar to cellstr. You can pretty much rely on cellstr(data) to convert a string, and your code should not require any changes. That especially helped me when the Name=Value syntax was introduced:
MyFun(Name='Value')
ans = "Name"
ans = 'Value'
function MyFun(varargin)
for n=1:nargin
varargin{n}
end
end
With varargin{:} forwarding the lot to your parser function, this new syntax is automagically supported.
@Steven Lord, I think back compatibility is the reason. When I began using Matlab, there was no string. Those were the good old days. Cell arrays were a wonderful, powerful thing. Life was much simpler. But it was a little too simple without string. Then there was string. Like any useful new invention, it requires users to readjust. I have saved Loren Shure's wonderful blog post for a quiet rainy day's reading.
Frankly, I've never thought about it until I saw that use in this thread. Without thinking about it much more, I don't have any issue with curly brace indexing of a string array returning a comma separated list of char.
I did find this doc Access Characters Within Strings that at least shows how curly brace indexing works to convert one element of a string array to a char and relates that back to similarity of indexing into cell arrays of chars. In that sense, having x{:} return a CSL makes sense in that it mimics the behavior of the "old days" when x would be cell array.
x = {'a' , 'b'};
x{:}
ans = 'a'
ans = 'b'
Having Access Characters Within Strings as a subordinate topic on a doc page for Create String Arrays is quite illogical IMO.
If I recall correctly, one of the reasons (perhaps the main reason) for curly brace indexing on a string array returning a comma-separated list of char vectors is for backwards compatibility with functions that accept cell arrays containing char vectors. See the third bullet point in the Looking to the Future section on this post from Loren Shure's blog about working with text in MATLAB.
If we'd made that operation error, I suspect our users would be grumbling something along the lines of "MathWorks, you know what I meant, just go ahead and do it instead of making me change my code to distinguish cellstr and string!" [Actually, you probably wouldn't have because internally MathWorks developers who would have had to make those same changes in our code base would have grumbled before the feature got released!]
@Paul, from the perspective of a practical user of Matlab, I don't mind what x{:} returns. An end-user-oriented tutorial would be good enough. The comma-separated list is a wonderful construct, and I have gradually come to embrace it. Great help came from @Stephen23, who has written an excellent tutorial: comma-separated lists. I hope Matlab staff could expand on that.
I remember I read somewhere, Hacker News maybe, that some experienced programmer in other languages, who wanted to get into Matlab, complained about similar things. The tutorial I suggest would lower Matlab's entrance barrier for, say, C++ programmers.
I, not surprisingly, couldn't find a doc page for x{:} where x is string array
x = ["a" , "b"];
x{:}
ans = 'a'
ans = 'b'
Setting aside concerns about inconsistent semantics for the moment, would T{1,:} returning a comma-separated-list of char be more useful than returning a string array?
Agree.
It seems to me that table heavily overloads subsasgn and subsref, especially for {}, and it's done internally.
I tried long ago to overload {} in my own class but wasn't able to make it work.
It sounds like we agree. This thread is for changes that would break compatibility. What you're suggesting is keeping the technical behavior the same, but improving the documentation. The threads I linked are more suited for that. I would like to show support for your suggestion by giving you an up-vote, but I don't feel this thread is the most suitable location.
I know that Mathworks staff is monitoring these threads and do consider the comments and votes when deciding what to do with a suggestion. Posting in the correct thread helps giving your suggestion the correct exposure.
I don't mean that Matlab should change the meanings of x{:} and T{1,:}. Just a more comprehensive tutorial would save a great deal of headache. Better documentation oriented toward beginners goes quite well with 'What should be in next generation', I think. But it's all personal opinion.
How exactly would this break compatibility? These things sound like new features, but not really things that will prevent older versions from running the same code.
This sounds better suited to the missing feature threads (#1 #2): features that you wish Matlab had.
Feel free to move your answer (by posting it there and deleting it here; moving answers between threads is work in progress).
- Being dynamically typed makes large programs irritating to develop and makes the language slower (JIT needs some time to do its thing); I think a compiled statically typed MATLAB would be amazing (yes, I know the arguments block is a thing, but that's still checked at runtime)
- In-editor vim emulation (IdeaVim is the ideal case)
Coder - MATLAB Coder isn't what I mean. I would like MATLAB to be statically typed while retaining much of the neat math syntax that makes it pleasing to use.
Static typing - Nothing I've used in C requires that level of specificity for mathematical operations; casting is fine, and even implicitly widening ints to floats, or singles to doubles, is totally OK by me. I am thinking mostly about the "MATLAB Compiler" in the sense of checking, e.g., that a class I'm making implements an interface, and making sure all the code is type-valid without needing to run (for example) most of an expensive simulation.
Static typing gets in the way of rapid software development, which is the purpose of MATLAB.
Consider for example,
function C = myplus(A, B)
C = A + B;
end
In MATLAB, this one function is compatible with any data types for which the plus() method is defined. In a statically typed language, you need a separate function for each combination:
function single C = myplus_single_single(single A, single B)
C = A + B;
end
function double C = myplus_single_double(single A, double B)
C = double(A) + B;
end
function double C = myplus_double_single(double A, single B)
C = A + double(B);
end
function uint8 C = myplus_uint8_double(uint8 A, double B)
C = uint8( double(A) + B);
end
and so on -- every possible combination must be named and defined... unless you have a template system.
If you do have a template system, you need something like
function <$superiortype(<$type_A>,<$type_B>)> C = myplus(<?type_A> A, <?type_B> B)
C = cast(cast(A, `$superiortype(<$type_A>,<$type_B>`) + cast(B,`$superiortype(<$type_A>,<$type_B>`), `$inferiortype(<$type_A>,<$type_B>`);
end
with special syntax to grab the types of input arguments, functions to calculate which type the inputs should be promoted or demoted to, and special syntax to convert computed type names into parameters so that they can be passed to functions such as cast(). Unless, that is, you want to introduce a semantic type cast instead of a function type cast...
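To make the dynamic-typing point concrete, here is a small sketch that uses an anonymous function with the same body as myplus. The one definition accepts every type combination for which plus is defined, and MATLAB's built-in mixed-type rules choose the result class.

```matlab
myplus = @(A, B) A + B;        % same body as the myplus function above

r1 = myplus(1, 2);             % double + double -> double 3
r2 = myplus(single(1), 2);     % single + double -> single 3
r3 = myplus(int8(3), 2.5);     % int8 + double -> int8; 5.5 rounds to 6
r4 = myplus("abc", "def");     % string + string -> "abcdef"
```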
MATLAB Coder is compiled statically typed MATLAB.
I have a couple of wishlists:
# 1. Machine Learning applications should have a few features to extract/store the simulation results (numerical data) in the workspace: (1) Regression data (target vs. predicted values), R (correlation coefficient values), Mean Squared Error values (Training, Test, Validation and overall).
# 2. Chord diagram function (a 3rd party function posted on mathworks.com)
It's the Neural Network app I am talking about. Thank you, Steven.
Is there a specific Machine Learning app you have in mind? Looking at the list of Statistics and Machine Learning Toolbox apps, I haven't used them extensively, but based on the pictures at least Classification Learner and Regression Learner have Export sections on their toolstrips.
Use string arrays by default for anything that is currently cellstr by default. For example, text columns read with readtable, some_table.Properties.VariableNames, etc. Apart from saving me a lot of time adjusting things myself every time, it would help novice MATLAB users who may not know how bad long cellstrs are (huuuuge overhead) and who otherwise have to learn this the hard way (like I did). Maybe even show one of those "not recommended" warnings if somebody uses readtable with cellstr as the default for text columns. It would require adjusting older code accordingly, but it would save a lot more. Perhaps also warn in general whenever somebody defines or uses a 1-D cellstr above some length.
I had the same headache and am still having it. I recently modified my readtable options to set all variables to "string" in the data-extraction step. This seems to reduce error-proneness in this step and to speed up looping algorithms. However, when I want to store the extracted tables, I change grouping variables to 'categorical' to reduce storage size. String data would occupy much more space on the hard drive.
I would also very much like Matlab to default some_table.Properties.VariableNames to a string array. Somehow, Matlab is inconsistent in defaulting things to cell arrays or string arrays. That kind of inconsistency is the biggest slow-down for me.
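A minimal sketch of what can already be done today (the file name and contents are made up for the example). readtable's 'TextType' option returns text columns as string arrays. VariableNames still comes back as a cellstr, so it has to be converted explicitly. As described above, a grouping variable can be made categorical before storing.

```matlab
% Hypothetical scratch CSV file for the demo
csvFile = fullfile(tempdir, 'texttype_demo.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, 'Name,Group\nalpha,A\nbeta,B\n');
fclose(fid);

% Ask readtable for string columns instead of cellstr
T = readtable(csvFile, 'TextType', 'string');

% VariableNames is still a cellstr; convert it explicitly when needed
names = string(T.Properties.VariableNames);

% For storage, convert a grouping variable to categorical
T.Group = categorical(T.Group);
```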
Better folder path utilities. Python's pathlib is powerful and very intuitive to use. Matlab's dir is cumbersome. To me, the problem lies in dir's use of struct arrays and comma-separated lists, which I don't feel at home with.
The point of my comment was also that someone could write a function pathlib that would replicate the Python behavior, without impacting currently existing code.
There is a fundamental difference between the MATLAB dir() approach and the python pathlib approach.
The MATLAB dir() approach immediately returns information about all of the matching files. The python pathlib approach instead gives you an object that you have to iterate over, asking each time for the details you want.
As indexing is relatively efficient but making repeated system calls is less efficient, the MATLAB approach is faster, requiring fewer system resources.
Once it is understood that MATLAB is returning all of the information at the same time, then there are only a small number of representations available for the information:
- dir() can return a struct array, like it does now
- dir() could hypothetically return a 2D cell array in which you "just had to know" which offset corresponded to which information. This would use more memory than a struct array and would be notably less user-friendly
- dir() could hypothetically return a table() of information. From a user experience perspective this might not have been a bad choice. But in practice, table() operations are slower than struct array operations. And in practice tables() were not introduced until R2013b, long after dir() had been designed
- dir() could hypothetically return an object array with a bunch of properties and methods. Comma-separated lists would probably end up being used internally, but I suppose they could have been hidden from the user. In practice, though, object operations are slower than struct operations.
The python operations available for pathlib look over-engineered to me -- a lot of operations that should probably have been string operations.
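Here is a minimal sketch of the struct-array approach described above. The scratch folder and file names are made up. One dir call returns the metadata for every entry at once, and comma-separated lists pull a whole field out without a loop. struct2table gives a table view if that reads better.

```matlab
% Hypothetical scratch folder with two empty files
d = fullfile(tempdir, 'dir_demo');
if ~isfolder(d)
    mkdir(d);
end
fclose(fopen(fullfile(d, 'a.txt'), 'w'));
fclose(fopen(fullfile(d, 'b.txt'), 'w'));

% One call returns a struct array (name, folder, date, bytes, isdir, datenum)
files = dir(fullfile(d, '*.txt'));

% Comma-separated lists extract a field from every element at once
paths = fullfile(d, {files.name});   % cell array of full paths
sizes = [files.bytes];               % numeric vector of file sizes

% Optional table view of the same information
info = struct2table(files);
```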
How exactly would this break compatibility with the current code base? At first glance I don't see much difference beyond the syntactic differences between Python and Matlab.
I'm not deep into how code "should" be written; I'm more of a user who implements theoretical papers in MATLAB and Python, so I want to frame my comments in that light. I also want to give kudos to MathWorks, as I can't get away from MATLAB. The tool is extremely well done and well supported.
I would like to see the following:
- Lose the brackets when assigning output arguments, and make carriage returns end lines.
Example:
x, y = myFunc(temp, temp2)
- Add a way to do bulk comments
%* these are comments
still commenting *%
- add functionality to take highlighted code and immediately turn it into a function
- Make it simpler to write code with variable arguments. For example, right now we have several ways to do name-value pairs, one of which cannot do autofill and one that can. Having the default values within the function definition is nice in Python, and somehow that gives autofill without any of the extra code MATLAB requires. By autofill, I mean you hit Tab in the argument spot and it gives you a list of options.
-native arduino or generic microcontroller support
-native AI code writing support like co-pilot
-improve the symbolic toolbox. Mathematica kills you guys here.
Thank you for your attention to my comments and bringing some MATLAB features to my attention.
When I said "Native Arduino Support" my intent was to be able to code arduino in C in one file of the MATLAB IDE and be able to do the register read/writes in another file in the matlab IDE while executing some scripts on the data. After reading the Arduino documentation, it seems MATLAB already has or nearly has what I was asking for!
For the brackets, it was a comment on the features of Python that I preferred over MATLAB. I liked not needing to end every line with a semicolon, not needing to add brackets around the outputs, and not needing end statements for loops. For me, it would simply be a quality-of-life improvement.
In summary, only your suggestion of leaving out the brackets would actually break compatibility. Can you explain why you would like to see this change?
improve the symbolic toolbox
Mathworks has been working for several years to remove internal functions that it inherited with MuPAD, trying to get further and further away from user access to MuPAD, and meanwhile adding a few MATLAB-level functions that replace some of the key pieces.
Now, the more details of the symbolic engine that Mathworks hides, then the more potential there is for doing wholesale replacement or rewriting of the symbolic engine, as long as the replacement can satisfy what is left of the defined MATLAB-level functions. You should not expect a future of Mathworks providing MATLAB-level functions to do tasks such as setting the DOMAIN slots of overloaded symbolic functions --- but enhanced pattern matching in mapSymType is not out of the question.
Whether Mathworks will ever do the work to make the Symbolic Engine thread-safe (to improve performance by operating on multiple cores) is a big unknown to me. I'm not hearing anything about that possibility, but also not hearing a "No" on that possibility. My gut feeling is that at this point a more-or-less replacement of the internals of the symbolic engine is more likely than a tune-up, but I really don't know.
native arduino or generic microcontroller support
Could you expand on that?
arduino() is already supported by installing a (free) add-on package.
There is absolutely no chance that MATLAB or compiled MATLAB executables itself will ever execute on arduino: the memory limitations for arduino are so severe that it is Not Going To Happen. (Raspberry Pi has a non-zero chance at some future point.)
Simulink can already target Arduino if the appropriate (free) add-on package is installed.
What is currently "missing" is that MATLAB Coder does not have a native target of "Arduino" if I recall correctly -- just of whatever ARM (or as appropriate) chip. (MATLAB Coder does have native support for Raspberry Pi)
There are some minor exceptions to the block comments having to appear alone on a line; see https://www.mathworks.com/matlabcentral/answers/92498-can-i-comment-a-block-of-lines-in-an-matlab-file-using-as-i-can-in-c#comment_1451187
lose the brackets when assigning output arguments and make carriage returns end lines.
That would be ambiguous if a variable named x is present in the workspace or there's a function named x that can be called with no output arguments. Should that line call myFunc with two output arguments and assign those to the variables x and y? Or should it display the value of the variable x / the value returned by the x function when called with no outputs then call myFunc with one output and assign that output to the variable y?
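A small demonstration of that ambiguity, with max standing in for myFunc. Under today's parsing rules, the comma separates two statements.

```matlab
x = 5;
x, y = max([1 3 2])        % two statements: display x, then assign y = max(...) = 3

[m, idx] = max([1 3 2]);   % bracketed form: m = 3 (value), idx = 2 (index)
```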
- Add a way to do bulk comments
This is possible and has been for several years. Use the block comment operators %{ and %}. These do need to appear alone on a line, though. See this documentation page for more information.
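A minimal sketch: everything between the two markers is ignored, and each marker sits alone on its own line.

```matlab
a = 1;
%{
a = 2;      % this whole block is commented out
a = 3;
%}
b = a + 1;  % runs normally; a is still 1
```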
add functionality to take highlighted code and immediately turn it into a function
From the Editor (see the Refactor Code section) or from the Command History? Okay, the latter doesn't technically satisfy your request since it creates a script rather than a function. But all you have to add is the function declaration line.
improve the symbolic toolbox. Mathematica kills you guys here.
If you have specific suggestions for what Mathematica does better than the Symbolic Math Toolbox, please send them to Technical Support as enhancement requests.
Bulk comments: %{ at the beginning of a line, and %} at the beginning of a line to terminate
Cool, thanks!
I just added this thread to the list of 'where to post' discussion threads. At 41 answers as of today it is getting pretty large already, so I think a second thread will soon be a good idea.
My goodness, the IDE can be annoying sometimes. What's missing...
- I use the editor undocked. Please can we have the capability to display a watchlist of variables in a panel in the editor. Also, you should be able to right click on a watched variable and set a breakpoint to halt when the value changes or some user specified conditional relating to that variable is satisfied. Basically, please can we have the MS Visual Studio watchlist.
- The call stack display in the editor is absolutely useless if the call stack is deep, which it often is with OOP. Can't we have this as a proper list? Having to open and re-open a tiny dropdown menu is hopeless. The horizontal list that you get with the live editor is also useless if the stack is deep. It needs to be a list which you can pin open, and where you click on it to move the stack frame. I routinely resort to using dbstack at the command line to get round this, but then clicking the output from dbstack doesn't move the stack frame so it is only half useful. Also, because the output from dbstack moves off the screen when you enter other commands and has to be regenerated to stay up-to-date, it's hard to mentally "keep your finger in the pages of the book where you want to go back to" when you are concentrating hard.
- Finally, and this is a big ask I'm sure, can we have the capability to drag the instruction pointer during debugging and also modify code on the fly when debugging.
When using the debugger, I would love to have a button to Step (run the next line) and display output regardless of the ; being there or not.
What I do in cases like that is to highlight the line up to the semicolon and then hit F9 to execute it.
Allow (prefer?) use of square brackets for indexing into arrays:
A[1:10,1:10]
Second this. I have tried Julia, which uses brackets, and it makes soooo much more sense and the code is soooooo much clearer, just because of this. Taking the mental step (which may not actually be trivial) of parsing whether a certain line is a function call or an array index is unnecessary, and makes things more obscure.
I think saying that "indexing an array is a function call" is a bit of an overstretch in this situation. Even if it makes sense to see it like this, then it would make sense to write array.index(n) rather than array(n) , because the latter would imply that array is a function itself, not a method over array .
This would also make stuff like function_that_outputs_an_array(10)[4] possible, while now we often have to take detours for that.
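For reference, here are two common detours today, sketched with magic(4) standing in for any function that returns an array. MATLAB does not accept chained indexing such as magic(4)(2,3).

```matlab
% Detour 1: store the result in a temporary variable
M = magic(4);
v1 = M(2, 3);

% Detour 2: index the result directly with subsref and substruct
v2 = subsref(magic(4), substruct('()', {2, 3}));
```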
> I can't visually tell if someObjectName(1,3) is a function call or index into an array.
This is an intentional part of the design of the Matlab language. Not only can't you tell visually, you can't tell by parsing or analyzing the code either: it's actually indeterminate until run time, and depends on whether someObjectName refers to a regular value or is the name of a function or is a callable object like a function handle or an object that implements subsref in your execution environment. It's called the "uniform access principle", though I don't see that term used much. The idea is that you can write some client code that indexes into someObjectName, and then the code defining someObjectName can supply it as either a function or a value array, without close coupling to which it is. It's a sort of duck typing. Mathematically speaking, an array basically is a function which maps integer-valued inputs to the values in those elements of the array.
I almost never see this flexibility used in practice, though, so I don't know how useful it is.
Similarly, and this one I actually see used, you might have a class Foo that defines a property blah, which exposes an array that callers access. At some point you might want to generate that value dynamically or through a more complex process. You can change blah from a property to a method, or vice versa, and existing client code will continue to work without change (if you do it right). If array indexing and function calls had different syntax, this would be a breaking change and all your clients would have to change their code (or you would have to implement it as a method wrapping the array in the first place, for future-proofing).
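Here is a tiny sketch of the uniform access idea (all names are made up). The same indexing-style client code works whether the mapping is stored as an array or computed by a function handle.

```matlab
squares    = (1:10).^2;     % mapping stored as an array
squaresFcn = @(n) n.^2;     % same mapping computed on demand

% Client code does not need to know which one it was given
client = @(src, k) src(k);
c1 = client(squares, 4);
c2 = client(squaresFcn, 4);
```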
If you switch, there is a minor issue with array concatenation syntax: given [x [1]], does that concatenate x with a numeric array, or index in to x? Seems like a degenerate case that could be resolved by making the space between the identifier and brackets significant, or requiring you to use "," or ";" in the concatenation to resolve the ambiguity.
I have no opinion on which way is actually better. I might prefer the "[ ]" version, because you don't have to hold down shift to produce "[ ]" but you do for "( )", and I am a lazy typist: that's mostly why I use single-quoted char literals instead of double-quoted string literals in most places.
I note that R and Python are newer languages than MATLAB, so it could be argued that they should change.
() indexing is used by Ada, ALGOL W, BASIC, COBOL, Fortran, RPG, GNU Octave, MATLAB, PL/I, Scala, Visual Basic, Visual Basic .NET, Xojo
() and [] and . (dot) all invoke methods of their class. Indexing of an array is a function call.
As an implementation optimization, when the class is one of the built-in numeric-like classes, the execution engine can take shortcuts instead of a full method call. But that is an optimization.
Yes, in fact R does this now.
My primary interest is to reduce the challenge of switching between languages. I regularly use Python, R and Matlab and the thing that bites me most often is that Matlab uses () for both indexing and function calls, for two reasons:
- Confusion: I can't visually tell if someObjectName(1,3) is a function call or index into an array.
- Muscle memory: In the other languages, indexing always uses [ ], so I frequently end up typing the square brackets and creating syntax errors.
If this were done, then hypothetically Mathworks could declare that in the case where a value were assigned to a name that is a function, then if [] were used that the reference would be to the variable, but if () were used then the function would be invoked.
This would reduce the problem of people assigning a value to variable named "sum"
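A small sketch of today's shadowing behavior. exist reports what the name sum currently refers to: 1 for a variable, 5 for a built-in function.

```matlab
sum = 10;                % assigning to "sum" shadows the built-in function
k1 = exist('sum');       % 1: "sum" now refers to a workspace variable
% While the variable exists, sum([1 2 3]) indexes into it and errors.
clear sum                % remove the variable
k2 = exist('sum');       % 5: "sum" refers to the built-in function again
```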
Why exactly? I'm just curious. This seems simply a matter of taste. What would be the benefit?
Documentation on how to change the default size of figures in Live Scripts.
Extend Find/Replace regular expression support to include substitution of matched elements from 'find' into 'replace', so that one can do things like:
Find: (call\(\w+ *, \w+, *)(\w+ *)\)
Replace: \1uint16(\2))
and accomplish the transformation
call(a, b, c)
call(d, e, f)
to
call(a, b, uint16(c))
call(d, e, uint16(f))
@Gregory Warnes I am not clear as to what your desired output is?
S = string('%#codegen') + newline + ...
'coder.extrinsic("disp");' + newline + ...
'coder.extrinsic("warning");'
S =
"%#codegen
coder.extrinsic("disp");
coder.extrinsic("warning");"
regexprep(S, 'extrinsic\("(\w+)"\)', 'extrinsic($1)')
ans =
"%#codegen
coder.extrinsic(disp);
coder.extrinsic(warning);"
S = 'disp("hello"); call(a, b, c); disp("bye");'
S = 'disp("hello"); call(a, b, c); disp("bye");'
regexprep(S, '(call\(\w+ *, \w+, *)(\w+ *)', '$1uint16($2)')
ans = 'disp("hello"); call(a, b, uint16(c)); disp("bye");'
Not in Matlab Online 2021a.
For example, with the original code:
function obj = bladeRF_new()
%#codegen
coder.extrinsic("disp");
coder.extrinsic("warning");
obj = bladeRF();
end % function
and using
Find: extrinsic\(("\w+")\)
Replace: extrinsict\( $1 \)
yields:
function obj = foo_new()
%#codegen
coder.extrinsict\( $1 \);
coder.extrinsict\( $1 \);
obj = foo();
end % function
(I also tried \1 and \$1).
This is already supported using $N instead of \N for numeric N
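Until the Editor's Find/Replace accepts tokens, one workaround is to apply regexprep, which uses $N, to the file text. Here is a minimal sketch with a made-up scratch file containing the two example calls. Back up real files before rewriting them this way.

```matlab
% Hypothetical scratch file containing the two example calls
src = fullfile(tempdir, 'call_demo.m');
fid = fopen(src, 'w');
fprintf(fid, 'call(a, b, c)\ncall(d, e, f)\n');
fclose(fid);

% $1 and $2 refer to the first and second capture groups
txt = fileread(src);
txt = regexprep(txt, '(call\(\w+ *, \w+, *)(\w+ *)\)', '$1uint16($2))');

% Write the modified text back
fid = fopen(src, 'w');
fprintf(fid, '%s', txt);
fclose(fid);
```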
Please unify/combine the MATLAB Coder interface (`ceval` and friends) and the C library interface (`calllib` and friends) to remove the need to double-code all C calls in code that needs to be run by the interpreter and processed by Coder.
For example, I currently have a device driver where every c library call looks like:
if coder.target("MATLAB")
    % Interpreted path: call the shared library through calllib
    [status, ~, val] = calllib( ...
        'libFoo', ...
        'foo_get_correction', ...
        obj.foo.device, ...
        foo.str2ch(obj.module), ...
        'FOO_ENUM_STRING', ...
        val ...
        );
else
    % Code generation path: call the C function directly
    status = int32(0);
    val = int16(0);
    enum_val = foo_enuminfo('foo_correction').FOO_CORR_GAIN;
    status = coder.ceval( ...
        'foo_get_correction', ...
        obj.foo.device, ...
        foo.str2ch(obj.module), ...
        enum_val, ...
        coder.wref(val) ...   % pass by pointer so the C function can write val
        );
end
Even better would be a tool that automatically generates a wrapper from a (well-formed) C/C++ header file, can be customized by the user, and is compatible with both interpreted use (coder.target("MATLAB")) and compiled/embedded use (~coder.target("MATLAB")).
One thing that I love about the way MATLAB has evolved over the 20+ years I've been using it is the way you keep adding modern features while keeping the fast matrix operations. Beautiful plotting built in helps a lot too. Like a lot of other people, I've said, "I'll use Numpy because it's free," and then 8 hours later I'm like this would take 5 minutes in MATLAB. And then I do it in MATLAB and it's done. I love that. Here are my favorite features from other languages that could be added, probably without breaking anything:
- Haskell's guards and list comprehensions,
- Lazy containers,
- LISP keywords,
- LISP style maps, in which the :keyword-with-hyphens is also a function that retrieves data from an object,
- Python's convention of defining ```__str__(self)``` to mean "This is what happens when you cast to a string," ```__int__``` for "This is what happens when you cast to an int," etc. Optional methods that support every kind of cast you could want.
- More modern kinds of loops. ```for i in <arbitrary_container>``` for example. Whether the loop is executed in any particular order depends on whether the container has any kind of order, etc.
- An API for defining language extensions. This would allow the community to experiment with new language features, making it cheaper & easier for Mathworks to see which language features gain traction. Mathworks would always have the option to include the most popular language extensions in a future release.
Oh, I totally forgot: In Matlab, iteration over a table array already is defined! A Matlab for loop is defined as iterating over the columns in a 2-D array. (That's why your sequences used in for loops generally need to be row vectors.) A Matlab table array is defined to be a 2-D array, of rows x variables. So a for loop will iterate over the variables of a table array, producing a 1-variable table as the iterator/index variable each time.
For example, this works:
tbl = array2table(magic(3))
for it = tbl
it
end
Producing:
it =
3×1 table
Var1
____
8
3
4
it =
3×1 table
Var2
____
1
5
9
[...]
I don't think that's a very useful behavior, but it is defined. :)
> Python's convention of defining ```__str__(self)``` to mean "This is what happens when you cast to a string,"
This is basically a thing now! Matlab classes can define an overridden string() method to define what happens when you cast that object to a string (even implicitly, as happens when assigning into a string array). Overriding disp() controls how the object is displayed at the command window. And in the most recent Matlab releases, there are now additional things you can override to provide finer-grained control of how the object is displayed in other contexts, such as when it is contained in a field of a composite type like a struct or cell and a more compact representation is required. See: https://www.mathworks.com/help/matlab/group_analysisopt.html
If you want a simpler string-representation API that works more like Python's str() and repr() functions, you can also hack that together in userland Matlab code and it'll work in many places, though not in some built-in Matlab display contexts. Here's a little prototype example: https://dispstr.janklab.net/
You can also define overridden int32(), uint32(), double() and similar methods to define what happens when your user-defined object is cast to various other Matlab types, including numerics.
A thought on tables specifically: From a SQL/relational algebra standpoint, and how it's done in Python, a table/dataframe is a container that holds a list of tuples (records or rows), and iterating over a table iterates over records, which are presented as something that looks in Matlab terms like a scalar struct array. That operation is kind of opposed to the array-oriented, vectorized (or "column-store") nature of (fast, idiomatic) Matlab code for data structures like this. I'm not sure that encouraging iteration-oriented programming through convenience functions or syntax is necessarily a good idea for Matlab "containers" like this. I think we might want to lean more towards APIs that encourage the use or creation of vectorized or array-oriented operations, and push iteration requirements out to the user-defined code you want to apply to tables and similar containers.
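To make the contrast concrete, here is a small sketch using the same magic(3) table as the earlier example. The column-oriented version works on whole variables at once. rowfun is the built-in way to apply a function row by row when that is really needed.

```matlab
tbl = array2table(magic(3));   % variables Var1, Var2, Var3

% Column-oriented (vectorized): one operation on whole variables
tbl.Total = tbl.Var1 + tbl.Var2 + tbl.Var3;

% Row-oriented alternative: rowfun calls the function once per row
rowTotals = rowfun(@(a, b, c) a + b + c, tbl, ...
    'InputVariables', {'Var1', 'Var2', 'Var3'}, ...
    'OutputFormat', 'uniform');
```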
LISP keywords,
Switching the editor to be emacs ??
CAR and CDR as alternative functions instead of indexing?
Or do you mean that you would like MATLAB to support, for example,
show_members :a p :b 'q :c 2
as an alternative syntax for
show_members( a=p, b='q', c=2 )
complete with it being the value of p that would be passed in to show_members, even though at present MATLAB's command-function equivalence would currently treat that call as
show_members(':a', 'p', ':b', '''q', ':c', '2')
in which all of the elements are treated as character vectors ?
... I think there would be conflicts with the existing use of : as the range operator, if you are going to start interpolating values from text...
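A quick way to see the command-function equivalence mentioned above: in command syntax every argument is passed as literal text, so a variable's value is never looked up.

```matlab
p = 5;
class p                  % command syntax: same as class('p')
cmdResult = ans;         % 'char'
fnResult = class(p);     % function syntax passes the value of p: 'double'
```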
Questions...
A table() is a container. What would it mean to for i in MyTable ?
A hggroup() is a container. What would it mean to for i in a hggroup ?
A scalar struct is a container. Would for i in the scalar struct mean to iterate over the fields? Giving you the content of the fields? Giving you a scalar struct that contains just the one field?
A nonscalar struct is a container. Would for i in the nonscalar struct mean to iterate over the array elements (giving you one full struct at a time), or would it mean to iterate over the fields? Giving you a nonscalar struct that contains just the one field? Giving you a cell array of the contents of the field?
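For comparison, here is a sketch of how current MATLAB handles some of these cases (the variable names are made up). for iterates over columns, so a struct array yields one 1x1 struct per iteration and a cell array yields 1x1 cells. Fields of a scalar struct have to be iterated explicitly.

```matlab
% Scalar struct: iterate over the field names explicitly
s = struct('a', 1, 'b', 'text');
names = fieldnames(s);
fieldClasses = cell(1, numel(names));
for k = 1:numel(names)
    fieldClasses{k} = class(s.(names{k}));   % dynamic field access
end

% 1xN struct array: for iterates over columns, so each el is a 1x1 struct
arr = struct('v', {10, 20, 30});
total = 0;
for el = arr
    total = total + el.v;
end

% Cell array: the loop variable is a 1x1 cell, not the cell's contents
numIters = 0;
allCells = true;
for c = {1, 'x', [2 3]}
    numIters = numIters + 1;
    allCells = allCells && iscell(c);
end
```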
Replace the pinv function with a function tikhonov that defaults to the Moore-Penrose generalized inverse when no regularization parameter is given, computes zeroth-order Tikhonov regularization when a scalar regularization parameter is given as a second input, and computes L-th-"order" Tikhonov regularization when an L-matrix is given as a third input.
The argument for this compatibility-break is that it would force users of pinv to think about what they've done and why, and let them consider the more general and preferable regularized solutions than the M-P inverse.
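Here is a minimal sketch of what the proposed function would compute. It is written as plain script code because tikhonov is a hypothetical name, not an existing MATLAB function. It uses the regularized normal equations X = (A'A + lambda^2 L'L) \ A', which are well defined for lambda > 0 when A and L have no common null space. For large or ill-conditioned problems, an augmented least-squares formulation would be numerically preferable.

```matlab
A = [1 2; 3 4; 5 6];
n = size(A, 2);
lambda = 0.5;                 % regularization parameter (example value)

% No regularization parameter: Moore-Penrose pseudoinverse
X0 = pinv(A);

% Zeroth-order Tikhonov: L is the identity
L0 = eye(n);
Xt0 = (A'*A + lambda^2 * (L0'*L0)) \ A';

% First-order example: L is a first-difference operator, (n-1) x n
L1 = diff(eye(n));
Xt1 = (A'*A + lambda^2 * (L1'*L1)) \ A';
```

The regularized inverse always has a smaller 2-norm than the pseudoinverse, because each singular value 1/sigma is replaced by sigma/(sigma^2 + lambda^2).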
Clean inconsistencies, or counter-intuitive behaviours.
1.- [dr,dc] = size(data)
2.- dsize = size(data)
3.- [dr,dc] = dsize
1) works, but 2 followed by 3, which intuitively should be the same, does not work.
Improve consistency in general.
For consistency with the first call, already said to work, [dr, dc] = dsize2 would have to mean dr = dsize2(1), dc = prod(dsize2(2:end))
To give a more concrete example of what Rik and Jan pointed out:
A = ones(2, 3, 4);
% Option 1
[dr, dc] = size(A) % Yes, dc being 12 is the correct and documented behavior
dr = 2
dc = 12
% Option 2
dsize = size(A)
dsize = 1×3
2 3 4
Given this dsize vector, what exactly would you expect dr and dc to be in the Option 3 case and what rules did you apply to decide on those expected values? What if dsize had been created not by a call to size but explicitly?
dsize2 = [4 8 15 16 23 42]
What would you expect dr and dc to be if the code [dr, dc] = dsize2 were executed?
This is the documented behaviour of the size() function. Many codes rely on this feature, which is implemented in the mex level also by mxGetN().
This is a useful feature, if the size of the first dimension matters and all following dimensions are treated as "slices". Therefore the special output of size with 2 output arguments was introduced on purpose and you cannot change it anymore.
As soon as size returns only 1 output, it contains all dimensions, and there is no consistent way to resolve "[dr,dc] = dsize".
If you want to get the sizes of the first 2 dimensions, size(X, [1,2]) is intuitive.
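A short sketch of the documented behavior and of one way to split out the first two dimensions. size with a dimension vector requires a reasonably recent release.

```matlab
A = ones(2, 3, 4);

[dr, dc] = size(A);        % dr = 2, dc = 12: trailing dimensions collapse into dc
dsize = size(A);           % [2 3 4]: a single output returns every dimension

sz12 = size(A, [1 2]);     % [2 3]: only the requested dimensions
c = num2cell(sz12);
[r, col] = c{:};           % comma-separated list splits the 2-element vector
```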
You mean 2 followed by 3? You can split a vector into 2 parts if you want.
What would be the expected behavior if dsize is not exactly 2 elements? Should one of the variables be empty?
Also can you explain what would make your suggested syntax more consistent? I don't really see why it is inconsistent. Maybe I'm set in my ways and don't see the quirk you see. Can you provide a bit more context about what this would solve?
A small but handy feature: in the Workspace browser, pressing a letter would automatically highlight the variable whose name starts with that letter.
I wish there was a way to undo Editor text changes back to the maximum level possible. Clicking the little blue curved arrow 50 times to undo as much as possible seems excessive. I'd like an option to back up all the way to the beginning immediately.
Perhaps they cover the same things but it's hard to combine posts. If we copied over some things, it would basically look like I posted it instead of the original author.
Should this thread be included in the list of links in those posts?
Wishlist threads, frustration threads, missing feature threads (non-breaking), and this thread with breaking feature requests.
@Rik, no, this is a suggestion for MATLAB itself. The "Answers Wish List" you gave the link for is for ideas to improve Answers (this forum). At least it was intended to be that when you first posted it and said "bugs and feature requests for Matlab Answers".
Couldn't you do something like this with git?
Commit before you start working and reset if you don't like what you have done since then? :)
Sounds like a request for a light-weight version control system built-in to Matlab.
I use Windows' File History feature to back up all of my changed files every hour. Each version of every file is stored for 6 months.
Another good and useful tool for students would be a built-in function to reset all changes a user has made in preferences and interface menu options back to the defaults. Students quite frequently make changes and have difficulty resetting their MATLAB menu panel and preferences.
One of the most common pitfalls for beginners is correct memory preallocation, even though MATLAB automatically flags that preallocation is needed in [for ... end] and [while ... end] loops when the values from every iteration are being saved.
It would be great to have an additional built-in MATLAB feature that detects a necessary preallocation. If the user decides to use it, he/she could just click OK on the proposed option and be done, like the filler options of the Live Script Editor.
The point is exactly that there should (ideally) be an Autofixup that took care of the pre-allocation.
... On the other hand, the auto-fixups seem to have all been removed in R2021b, with the change over to the Live editor.
Pre-allocation for a while loop is always tricky, unless the user knows that the loop is convertible exactly to a for loop.
Pre-allocation for a for loop that has a break or continue always involves a design decision.
I think this is mostly already done by the Matlab Code Inspector's "SAGROW" inspection, like here?
Though there's no Autofix for it. Dunno how hard it would be to code one up, since preallocation can be done in various ways.
Or is this not what you were talking about?
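For reference, the manual preallocation patterns being discussed look roughly like this. It's a sketch, not an Autofix: the for loop preallocates with zeros, and the while loop (where the count isn't known up front) starts from an arbitrary guess of 4 elements, doubles the buffer whenever it fills up, and trims the unused tail at the end. The initial size and doubling factor are assumptions, not anything the Code Analyzer prescribes.

```matlab
% For loop with a known count: preallocate once, then fill in place.
n = 1000;
squares = zeros(1, n);
for k = 1:n
    squares(k) = k^2;
end

% While loop with an unknown count: guess a size, grow by doubling, trim.
buf = zeros(1, 4);      % initial guess (arbitrary)
count = 0;
x = 1;
while x < 1e6
    count = count + 1;
    if count > numel(buf)
        buf(2*numel(buf)) = 0;   % extends buf to twice its length, zero-filled
    end
    buf(count) = x;
    x = x * 3;
end
buf = buf(1:count);     % drop the unused tail
```

The same trim step is one way to handle a for loop that can exit early via break.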
I would like to see support for a more structured form of helptext, like Javadoc or Markdown, which could be used to produce richer documentation pages from the inline helptext in class and function source code.
Right now, the helptext is minimally-processed (in a loosey-goosey manner that I've never found formally specified anywhere) that supports basic references to other functions and classes, and definition of an Examples section. In doc for user-defined classes and functions, the helptext is rendered simply, mostly as-is in fixed-width font.
I'd like to be able to have an alternate helptext format that produced richer documentation output, which could be rendered as web pages with proportional font by default and support for various formatting, like section headers (maybe multi-level), fixed-width and demarcated code examples, hyperlinks, maybe even embedded images. It might also be nice to have some structuring that allowed you to specifically document the exceptions a method throws, maybe pre- and postconditions, function arguments (for functions and methods which do not have arguments blocks that document the arguments separately), return values, and so on.
For methods and functions which have arguments blocks, I'd like to be able to add helptext on each of the arguments, in the manner in which one can put helptext on individual class properties, and have the help for all those arguments be automatically incorporated into the display of help <func> and doc <func>. That auto-generated documentation should also include representations of any declarative type & value constraints and default values that are defined for those arguments. Would be nice if arguments were expanded to include output arguments, so those could be documented as well (though I'm not sure how that would work in the case where one uses the same variable name as both an input and output argument).
I think Markdown, specifically GitHub Flavored Markdown (but maybe allowing arbitrary embedded HTML; I'm not sure), would be a nice format to do this "richer helptext" in. It's easy for most people to pick up, very readable in its source form (for people who are browsing the source code and reading the help there, and for the back-compatibility case where you want to use Matlab code written in the new format in an older version of Matlab), and supports most of the formatting controls I would like.
Maybe there should be a mechanism to use alternate formats for helptext.
One way you could do this in a flexible and even back-compatible manner would be to introduce a new %# pragma for specifying the format that helptext is in: something like %#<helpfmt:foo> where "foo" is the format of the helptext, like "markdown" for Markdown, "helptext" for the legacy Matlab helptext format, maybe "html" for arbitrary HTML, or "<whatever>" for a new structured Matlab documentation format, if you want to use that. For example:
%#<helpfmt:markdown>
%#<helpfmt:helptext>
%#<helpfmt:html>
If the pragma appears at the beginning of a block of helptext for a classdef, function, property, or so on, it would apply only to that one helptext block. If it appears at the beginning of a file, before the initial classdef or function line (or at the top of a Contents.m file), it should apply to all helptext in that file (and could be overridden by additional %#<helpfmt:...> pragmas on a per-block basis). Maybe there could even be some config file at the root of a source tree (that is, in the directory that goes on the Matlab path) to set the default helptext format for all files in a project/codebase.
It would maybe be nice if this supported some mechanism for linking to separate doco pages supplied by a user-defined Matlab library/project as separate HTML/Markdown/whatever files, that could be viewed in the Matlab doc browser, but have larger and richer content than is feasible to stick into embedded helptext comments, or doesn't make sense as the main help for a specific function or class.
You could even support user-defined custom helptext formats by allowing the "format" in %#<helpfmt:format> to be an arbitrary identifier (valid Matlab name), and provide a per-session hook to register user-defined handlers for custom formats. Like matlab.registerHelpfmtHandler('formatname', 'pkg.qualified.class.Name'), where pkg.qualified.class.Name is the name of a user-defined Matlab class that conforms to an interface (or maybe inherits from a specific abstract class) that Matlab defines for helpfmt processor/handlers. Maybe it should be an actual object instance, but I don't think that would play well with clear classes.
I've been playing around with something like this in my MlxShake project, but it's hard to implement decently without some built-in support from Matlab itself.
As part of a discussion https://www.mathworks.com/matlabcentral/answers/1450984-what-should-go-in-a-next-generation-matlab-x#comment_1788796 I hypothesized that:
If, hypothetically, a new assignment operator were created that allowed the user to manage
A = object_of_class_B
inside class B, something along the lines of
function target = assign(obj, target) %obj being the object of the class
then that could perhaps have some advantages.
But what should the semantics be ? What would the use-cases be?
- such a thing could potentially make resource tracking easier
- there might be reason to warn about assigning between unlike data types. For example, if A were uint8 but class B carried int8, you might want a warning about negative values being clipped to 0
- not sure what else...
If such an operator existed, you would need a way to distinguish the case where the target was a location that did not exist yet.
Hypothetically that could be handled by nargin < 2 or exist('target', 'var') being false.
But hypothetically perhaps there would be reasons to instead associate each name with a class such as UnassignedLocation, and then isa(target, 'UnassignedLocation')
An existing target of an assignment should definitely be made available inside such a function, so that its datatype can be examined, and resources poked around at.
There is commentary somewhere along the lines that if the target of an assignment is a class name or static method of a class, then the class cannot have influence on what the assignment means: that otherwise the statement
A = B
could change its meaning if a new class A were introduced. I think the implication of that is that there should not be an operator introduced that intercepted assignments onto a class. But possibly I have overlooked some reason why the kind of assignment operator I describe here should not be created.
The ability to assign a subset of fields to a struct (array) would be useful. It is common to want to change a few settings, such as in a user initialization file, or to have a function that is concerned with getting only a subset of properties from the user. There thus might be a struct of updates to be applied to an existing struct. At the moment you have to loop through the fieldnames of the update struct, setting the fields of the existing struct one by one.
The ability to concatenate or assign between structs with the same fields in different orders would be useful. We have the experience of tables to look at: tables re-order as necessary to match the first order.
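Today's workaround for both points is a few lines of plain M-code. In this sketch the field names and values are made up for illustration: the loop copies every field of updates into a copy of defaults (overwriting or adding fields), and orderfields(t, s) reorders t's fields to match s explicitly before assigning into the struct array.

```matlab
% Merge a struct of updates into a struct of defaults, field by field.
defaults = struct('Color', 'k', 'LineWidth', 1, 'Marker', 'none');
updates  = struct('LineWidth', 2, 'Marker', 'o');
opts = defaults;
names = fieldnames(updates);
for k = 1:numel(names)
    opts.(names{k}) = updates.(names{k});   % overwrite or add this field
end

% Assign a struct whose fields are in a different order:
% make the order match the target's first.
s = struct('a', 1, 'b', 2);
t = struct('b', 3, 'a', 4);
s(2) = orderfields(t, s);
```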
+1. struct and object "subset of fields" assignment or "merging" like this is such a common use case in the sort of code that I work with that any nontrivial code base typically ends up with a half dozen different custom helper functions for doing this, each with slightly different behavior.
Allow elementary mathematical operations on function-handles. So instead of writing the sum of two functions as:
f1 = @(x) x.^2;
f2 = @(y) cos(y);
f_sum = @(x,y) f1(x) + f2(y);
It would be allowed to do:
f1 = @(x) x.^2;
f2 = @(y) cos(y);
f_sum = f1 + f2;
With the same resulting f_sum. Sure, some design choices would have to be made, but I can see benefits with such capability.
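Until something like that exists, a small higher-order helper gets most of the way there. plusFcn here is a hypothetical name, and it bakes in one particular design choice: the result takes one argument per operand, in order, matching the f_sum(x,y) in the example above.

```matlab
f1 = @(x) x.^2;
f2 = @(y) cos(y);

% Hypothetical helper: combine two one-argument handles into a
% two-argument handle that sums their results.
plusFcn = @(fa, fb) @(x, y) fa(x) + fb(y);

f_sum = plusFcn(f1, f2);
v = f_sum(2, 0);   % 2^2 + cos(0)
```

A built-in version would have to decide, among other things, whether f1 + f2 should share a single argument (@(x) f1(x) + f2(x)) or keep the arguments separate as this helper does.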
One point about @Walter Roberson's last comment is that this is (at least approximately) done reasonably well for struct-variables.
One thing that would be nice (but would not take an incompatibility to do) would be to have a way to view function handles that expanded captured variables... probably with limits on the amount of expansion and the types expanded.
A = 1; B = 2; C = 5;
G = @(D) C + tf([A B], [D A B])
disphandle(G)
should probably display
5 + tf([1 2], [D 1 2])
but
H = tf([1 2], [3 1 2]);
J = @(D) D + H
disphandle(J)
it is not immediately obvious whether that should display
D + tf([1 2], [3 1 2])
or
s + 2
D + -------------
3 s^2 + s + 2
That one is not bad, but displaying all captured variables in full formatting might be a bit much at times.
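Part of this can be approximated today with functions(), whose workspace field holds the variables an anonymous function captured. This sketch substitutes small numeric captured values into the func2str text. The 10-element cutoff is an arbitrary assumption, and polyval stands in for tf so the example runs without the Control System Toolbox.

```matlab
A = 1; B = 2; C = 5;
G = @(D) C + polyval([A B], D);    % stand-in for the tf() example

txt  = func2str(G);                % source text with variable names
info = functions(G);
ws   = info.workspace{1};          % struct of captured variables
names = fieldnames(ws);
for k = 1:numel(names)
    val = ws.(names{k});
    % Only expand small numeric values; leave everything else by name.
    if isnumeric(val) && numel(val) <= 10
        % \< and \> anchor the match to whole words only
        txt = regexprep(txt, ['\<' names{k} '\>'], mat2str(val));
    end
end
disp(txt)
```

Deciding how to render non-numeric captured objects (like the tf object H) is exactly the open question raised above; this sketch simply leaves them by name.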
Sometimes people end up defining a function handle iteratively, such as
format long g
f = @(x) zeros(size(x));
for K = 1 : 5
f = @(x) K + x .* f(x).^2;
end
f
f = function_handle with value:
@(x)K+x.*f(x).^2
string(f(-4:4))
ans = 1×9 string array
"-1806331" "-43" "-387" "5" "5" "21909" "12044237" "999844613" "29887494405"
This is, of course, not efficient at all in terms of function execution, but it might be the easiest approach for cases where the Symbolic Toolbox is not available.
It would be nice if iterative function calls like this could be "flattened", or at least transformed internally to involve fewer anonymous function calls. Anonymous function calls are comparatively expensive.
syms X
f(X)
ans =
expand(ans)
ans =
There is a distinct challenge for this kind of expansion: captured variables like K live in different workspaces. I think it would still be doable, though, such as by using some automatic variable renaming system perhaps.
By way of comparison:
you can add symfun (symbolic functions) only if their parameters are named the same thing in the same order
syms f(x,y) g(x,y) h(y,x) k(x)
try; f + g, catch ME; disp('f+g fail'); end
ans(x, y) =
try; f + h, catch ME; disp('f+h fail'); end
f+h fail
try; f + k, catch ME; disp('f+k fail'); end
f+k fail
Here are some low-falutin' features I'd like to see. My perspective: Periodic Matlab user who mostly likes the tool but has frustrations all the same.
1) Different data types having access to commonly used operators like ==, <=, etc.
2) A good general purpose data container. Cells with their smooth/curly braces and different operators are very confusing when writing code and even more so when revisiting it weeks or months later. e.g.
T = readtable('patients.dat'); % Tables are a great addition
[T.Age < 30].' % This works because I can compare numbers with <, ==, >, etc.
ans = 1×100 logical array
0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
T(T.Gender == 'Female',:); % but not chars or strings apparently
Operator '==' is not supported for operands of type 'cell'.
3) Better error messages. This isn't 1986 Unix or 1993 Microsoft anymore. What operators are available? doc cell doesn't list them. At minimum provide a pointer/link to someplace in TFM where such a list IS available.
4) Reduce the amount of web searching required to find answers. e.g. subsetting a table. Best I could find was to make T.Gender a categorical or use strcmp to make the comparison. Both nonobvious, the latter especially because strings are doublequotes whereas single quotes are used for chars (See 1).
T(strcmp(T.Gender,'Female'),:);
The above works but since there's an error before it, I can't run this fragment by itself in this tool.
5) Dictionaries can be useful. Renaming containers.Map would be fine by me. Just be clear in the docs what datatypes can be keys.
And so on.
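For anyone who lands here with the same table question: readtable brings the Gender column in as a cellstr (a cell array of char vectors), which is why == fails. A sketch of three approaches that do work, assuming the same patients.dat file used above:

```matlab
T = readtable('patients.dat');   % T.Gender is a cell array of char vectors

% 1) strcmp understands cellstrs directly.
isFemale1 = strcmp(T.Gender, 'Female');

% 2) Compare against a double-quoted string; string's == accepts a cellstr.
isFemale2 = T.Gender == "Female";

% 3) Convert the column to categorical, after which == works with char too.
T.Gender = categorical(T.Gender);
isFemale3 = T.Gender == 'Female';

females = T(isFemale3, :);
```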
Hittin' that download site now... :)
Dictionaries are now available. R2022b and later. An introduction to dictionaries (associative arrays) in MATLAB » The MATLAB Blog - MATLAB & Simulink (mathworks.com)
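A minimal sketch of the dictionary type (R2022b and later), with made-up keys and values. Here all keys are strings and all values are doubles.

```matlab
d = dictionary(["apple", "pear"], [3 5]);  % string keys -> double values
d("fig") = 7;                              % add an entry
n = d("pear");                             % look up a value
hasFig = isKey(d, "fig");
allKeys = keys(d);                         % string array of keys
```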
@Tucker Downs I'm really liking this JetBrains Mono font. Thanks for the recommendation! Wonder why I hadn't heard of it before?
JetBrains Mono has a very liberal license and seems like it would be fine to include in Matlab Online with no special arrangements. I've sent a feature request to MathWorks Tech Support to do so; TS case #05158354 if you want to pile on.
@Andrew Janke Turns out Verdana is available on Matlab Online - I hadn't scrolled far enough. I now need to set aside the time to look at cells in more detail.
@Ravi Narasimhan Oh, yeah: the font selection on Matlab Online is pretty limited. I've put in a Tech Support request to see if they'd consider expanding it a bit, maybe with some freely-usable programmer-oriented fonts like the Nerd Fonts family. Tech Support case #05158354.
Re: Fonts, I love Jetbrains Mono
@Andrew Janke Yes, I've made it my default font on Matlab Home Desktop. I can't change it on Matlab Online. Obtaining a license and installation permissions for my work instances is a lot of effort so I'm going to stick with Verdana which has some visual distinction for parens, brackets, and braces.
@Ravi Narasimhan, have you had a chance to give Input a good try? I've been rocking it in my own Matlabs since mentioning it here, and have been liking it pretty well, but I have "normal-ish" vision, at least when I'm wearing my glasses. Curious to hear how it works for you.
On Windows there are two ways to install fonts. Try installing with both.
Right click the font file and install
Open "Font Settings" in the new settings app and click and drag the file. You can quickly open font settings by pressing windows key and searching for "Font S..."
I don't know why, but I usually have problems getting fonts into matlab. I just always do both of these things and it works. I'm a color scientist, not a font installing expert haha. Good luck :)
After I went to chrome and then back to Firefox, the links opened as expected. Weird, like Chrome fixed Firefox.
Anyway I installed the fonts and they didn't show up in MATLAB preferences. I even restarted MATLAB and rebooted my computer. They're definitely there - Window Fonts can see them - but they don't show up as an option in MATLAB preferences. I might have to call tech support.
Huh. Dunno what to tell you there; the comment and its links look pretty normal to me in both Firefox and Chrome when I look at them (on macOS; latest Firefox and Chrome).
Andrew, strangely enough when I click on those in your comment, it (Firefox) does not go to the URL shown in the popup status string, but just scrolls this page to a totally different place! It goes to the first comment under this answer. Bizarre.
I opened your comment in a different browser (Chrome) and that opens a new tab to the correct places. I'll try out the fonts you suggested.
@Image Analyst Yes, generally fonts need to be downloaded and installed at the OS level, either at the system scope (for all users, if you have Admin rights), or just for your own user. (Matlab uses fonts from the OS.) Then restart Matlab to get it to pick up the new fonts. For example, if you're on Windows, download the font package, unzip it, select all the OTF or TTF files in it, right-click, and choose "Install" or "Install for all users".
The Input and Operator fonts are available through the links in my earlier comment.
@Andrew Janke, how do I apply those fonts? They don't show up in my MATLAB preferences/fonts window. Do they have to be downloaded and installed into the operating system and then they will automatically show up in the list? If so, where do we get these fonts?
No, I had not heard of these. I just downloaded Input for my Matlab Home desktop instance and will try it. The curly brace is certainly more distinct. Thanks for the pointers.
Addendum: The main settings do not affect Live Editor fonts. That apparently requires
s = settings;
s.matlab.fonts.editor.normal.Name.PersonalValue = 'Input Mono';
@Ravi Narasimhan – Have you tried David Ross's "Input" or Hoefler's "Operator" fonts? They're designed specifically for programming, with extra-distinctive (and enlarged, in the case of Input) glyphs for "( )", "[ ]", and "{ }". I'm running my Matlab on Input right now, and rather like it. (I usually use Meslo LG M.) Input is free for personal use.
Agreed. I'm glad I learned about reduce because I'd heard so much about mapreduce from "Big Data this-n-that" and didn't know anything about it. MRocklin's development of it resonated with me.
The "flow into the pipeline" concept came in very handy when I began to work with FPGA programmers a couple of years later. I haven't learned to do that for myself (yet?) but it enabled me to understand broadly what my colleagues do and how they analyze and solve problems in that world.
> Of the three, I used map and filter the most on the few occasions I wrote Python code. I learned about reduce as part of the education but never had a need to use it in real life.
And that is kinda the fundamental distinction: map and filter are straightforward, but "reduce" is about the iterative combination of independent result sets into a smaller aggregate result sets in a way that the result set is useful, but shippable between nodes in an efficient manner, and combinable with other intermediate "reduce" outputs. I'm no expert, but Map/Reduce programming seems to largely be about structuring your intermediate results so that they can flow into a Reduce pipeline.
1) "Have you taken a look at the "Functional Programming Constructs" submission on File Exchange
Yes. I installed it a couple of weeks ago and started exploring but have a long way to go.
2) "...reduce = if you're writing your Matlab programs with "reduce" algorithms, you're probably going to have a sad."
Of the three, I used map and filter the most on the few occasions I wrote Python code. I learned about reduce as part of the education but never had a need to use it in real life. FWIW, I learned a lot from https://nbviewer.org/github/mrocklin/pydata-toolz/blob/master/1-map-filter-reduce-groupby.ipynb
It is instructive to learn about these approaches even if there isn't an immediate need for them.
3) "OTOH, Matlab is kind of getting in to this space now too, with tall arrays and Parallel Server and all that. So I dunno."
Yes, when I looked into datastore I saw a lot about this and ran a couple of the examples.
4) "...the main thing that Matlab lacks for implementing Functional Programming ..."
I guess the big question is are there enough users who would find it useful despite the performance hit to add it to the language without displacing something already there. No idea if there's a "business case."
> ... I had to figure out that an explicit type conversion is needed to loop over a cell array ...
Nope, you're correct here: Cells are containers for other values. So if you want to loop over cells usefully, you generally need to take the extra step of "popping out" the value in the cell that you get in the iteration.
some_cells = {'foo', 'bar', 'baz', 'qux'};
for c = some_cells
do_something_on_a_string(c); % NOPE! c is still a scalar cell here!
% Gotta do this:
the_actual_thing = c{1}; % Pop the contained array out of the cell
do_something_on_a_string(the_actual_thing);
end
> I think Python's list comprehensions are ok but when I learned about map, filter, and reduce (Python 2.7), I preferred writing my code that way.
Have you taken a look at the "Functional Programming Constructs" submission on File Exchange? https://www.mathworks.com/matlabcentral/fileexchange/39735-functional-programming-constructs
This is worth a whole discussion. "map", "filter", and "reduce" are implementable in Matlab over what we have now. But there are serious performance considerations - like, 100x performance impacts - to doing so, instead of using Matlab's native "vectorized" operations. Matlab is all about arranging things into numeric arrays, and then passing operations on them down to BLAS, which is the hardcore optimized numeric library underlying most of Matlab's built-in functions, and can make things Go Fast. The paradigm is somewhat different. The map/reduce paradigm is primarily oriented towards "embarrassingly parallel, decomposable" operations with biases towards shipping aggregate operations between nodes; it's kinda different from what Matlab is oriented towards. Most map/reduce algorithms have different, but equivalent, algorithms in the Matlab/array space. Matlab is eager and local; map/reduce is lazy and distributed. To a first-order approximation.
map = Matlab cellfun/arrayfun or just regular vectorized operations.
filter = apply a logical mask using logical indexing.
reduce = if you're writing your Matlab programs with "reduce" algorithms, you're probably going to have a sad. You probably want native vectorized aggregate operations like sum(), std(), etc. And if what you want isn't directly expressible in terms of native Matlab vectorized operations, it's usually better to build your own "basically-vectorized" operation on top of them, instead of trying to implement it map/reduce-style: reduce is about successive, iterative combinations of structured input sets, which isn't the forte of Matlab (or any array language).
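To make those three equivalences concrete, a small sketch on a made-up vector. The arrayfun line is only there for comparison with the vectorized form; it is typically much slower on large inputs.

```matlab
x = 1:10;

squares = x.^2;                  % "map": vectorized elementwise operation
evens   = x(mod(x, 2) == 0);     % "filter": logical indexing with a mask
total   = sum(x);                % "reduce": a native aggregate function

% The same map written with arrayfun, for comparison only.
squares2 = arrayfun(@(v) v^2, x);
```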
OTOH, Matlab is kind of getting into this space now too, with tall arrays and Parallel Server and all that. So I dunno.
I was talking about this somewhere else, and I think that at a language level, the main thing that Matlab lacks for implementing Functional Programming stuff is a syntactic mechanism for "lazy" or "delayed" evaluation of arbitrary expressions. Don't remember where that was, though, sorry.
@Andrew Janke You're pushing my knowledge of Matlab, Python, and such to or beyond its limits. I'm conversant in them but not fluent. I may therefore unintentionally misapply or misuse terms and concepts in what follows.
1) I think I agree with you that cellstrs are too powerful/non optimal for general-purpose users such as myself.
2) Looping over cell vectors. Perhaps I am being too literal or misunderstanding what you are saying but, while working with tables recently, I had to figure out that an explicit type conversion is needed to loop over a cell array
T=table();
a = [{'a'},{'b'},{'c'}];
for j = string(a)
T.(j) = j;
end
T
T = 1×3 table
a b c
___ ___ ___
"a" "b" "c"
works but
T=table();
a = [{'a'},{'b'},{'c'}];
for j = a
T.(j) = j;
end
Table variable names must be strings or character vectors.
3) I think Python's list comprehensions are ok but when I learned about map, filter, and reduce (Python 2.7), I preferred writing my code that way. When I'd revisit weeks or months later I'd recall the transformation I had intended vs. remembering what it means to have the condition on one side of the comprehension as opposed to the other.
4) I acknowledge cellfun exists after reading various sites but would like TMW to be more explicit in the docs about basic issues like comparing cells or elements of cells.
5) "But kinda the whole point of Matlab is that most values/variables are themselves arrays and you can call vectorized operations on them,..."
I will occasionally just write loops instead of vectorizing. For one- or few-off cases, I may save more time that way since there's less to go search.
Numpy: Not a fan so I won't defend it.
6) Everything is an array vs. everything is an object/class, singletons, MEX, etc.: Far beyond my knowledge. I'll revert to lurking.
Thanks to you both for the discussion. I've learned a lot of good things that I can put to use and share with coworkers.
Especially when you view it from the perspective of Matlab internals. Consider the C/C++ MEX and Matlab Engine interfaces, and their mxArray and related types. Is there any way to represent, at a low level, a Matlab data structure which is not an array? Not that I've seen.
Well, now we're getting in to semantics: those "singleton" types are still more or less degenerate 1-by-1 2-dimensional arrays whose type/class definitions reject construction of nonscalar arrays. You can do that with any classdef too if you want to.
I would say that the way to understand this conceptually is still to think that "everything in Matlab is an array, and there are some degenerate edge cases that will reject some array behaviors" as opposed to thinking that "scalars and arrays/collections/containers are different sorts of things". In other languages, scalars and collections are fundamentally different things, with well-defined, conventional interactions between them, and often differences in their in-memory representations and performance. Like, Python lists or the Java Collections API or the C++ STL containers stuff. In Matlab, that fundamental scalar/collection distinction doesn't really exist, and understanding that "unified" behavior is a fundamental insight into understanding how Matlab works, and how it differs from "ordinary" programming languages.
If you really want to, you can create a classdef class that responds 1 to ndims() and responds [1] to size(). Is that a distinct sort of "scalar" thing, or is that just a super-degenerate case of an array that will just screw with the expectations of every other piece of Matlab code and break things? I say the latter.
Everything in Matlab is an array.
There are some datatypes that are always singletons -- there is even a special attribute that can be given to mark such things.
An example: You cannot create normal arrays of function handles, just cell arrays of function handles. This is because the () operation on a function handle is invocation rather than indexing.
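A quick illustration of both points: a "scalar" reports itself as a 1-by-1 array, and concatenating function handles is rejected, so a cell array is the usual container for several handles.

```matlab
x = 5;
sz = size(x);          % [1 1]: a "scalar" is a 1-by-1 array
nd = ndims(x);         % 2

try
    fhs = [@sin, @cos];            % nonscalar function handle arrays are not allowed
    concatFailed = false;
catch err
    concatFailed = true;
    disp(err.message)
end

fcells = {@sin, @cos};  % store handles in a cell array instead
y = fcells{2}(0);       % brace-index to get the handle, then call it: cos(0)
```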
> == on a double-quoted thing (allowed) vs. the single-quoted thing in the actual cell (not allowed)
IMHO, this is 100% about how Matlab cells are a very abstract, flexible data type, and string arrays are a more concrete, single-purpose datatype, and using "cells of charvecs" to represent arrays of strings is a lousy hack. A double-quoted thing stitched together with "[...]" square brackets is a string array, a special-purpose type. A single-quoted thing stitched together with "{...}" curly brackets is a cellstr, which is a special-purpose application of a much more generic type.
Defining an "==" or "eq()" operation on cells generically is difficult, because each element of a cell array can contain anything – an array of any type or size. A cellstr is just one particular application of a cell array. Special-casing "=="/"eq()" over cells to behave "naturally" in the case of cellstrs has both philosophical and practical problems.
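A small illustration of why a generic eq is awkward for cells: one cell array can hold values of different types and sizes, so any comparison has to be defined per element. cellfun with isequal is one explicit way to spell that out; the example values are made up.

```matlab
names = {'Female', 'Male', 'Female'};
mixed = {1, 'Female', [1 2 3], {}};   % a general cell: any type, any size

isText = iscellstr(names);            % true: this cell is a cellstr
isTextMixed = iscellstr(mixed);       % false: not every element is char

% Per-element comparison, spelled out explicitly.
hits = cellfun(@(v) isequal(v, 'Female'), mixed);
```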
> ... Python lists can contain various datatypes and the standard operators such as <,>, == ...
So, there's a couple fundamental issues here. A Python list is basically like a Matlab cell array: any element of a Python list can contain any type of Python value; they're heterogeneous. When you loop over them or apply list comprehensions or whatever, the operators get applied to the contained values individually, and those contained Python values are generally scalars. You can get the same effect in Matlab by using a cell vector, and looping over it and applying <, >, == etc on its elements, or by calling cellfun or arrayfun to do the equivalent of Python list comprehension. If you want that behavior, you can get it in Matlab, just at the expense of a bit more syntax.
Python:
y = [foo(xi) for xi in x]
Matlab:
y = cellfun(@(xi) foo(xi), x)
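One wrinkle worth knowing with that cellfun translation: by default cellfun expects each call to return a scalar it can pack into a regular array. When foo returns something else, like a char vector, you need 'UniformOutput', false to get a cell array back. A sketch with built-in functions standing in for foo:

```matlab
x = {'foo', 'barbaz', 'q'};

lens = cellfun(@numel, x);                           % numeric array: [3 6 1]
upperX = cellfun(@upper, x, 'UniformOutput', false); % cell array of char vectors
```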
But kinda the whole point of Matlab is that most values/variables are themselves arrays and you can call vectorized operations on them, which turn in to optimized low-level BLAS or C/C++/Fortran-driven operations in the Matlab internals. These operations don't correspond to vanilla Python type operations; they correspond to NumPy array operations. And I think you'll find that the operations exposed by NumPy are a bit more restricted in both their behavior and syntax.
The most basic difference between Matlab and "regular" programming languages is that in Matlab, every type and every variable is not just a value, but it is also a container for that type of value. Everything in Matlab is an array. Which is both good and bad; it implies a particular kind of programming if you want to Go Fast.
> We already talked about why not to switch over to string everywhere: string objects are far too slow.
I think I already responded how Matlab could handle that: make strings faster.
@Andrew Janke I'll consolidate:
1) "One take I have on this: I think that a major issue in this particular case @Ravi Narasimhan is talking about isn't so much that Matlab's help facilities are lacking, but more that cellstrs are a lousy way to represent string arrays, and always have been."...
Yes, in fact, I wrote up posts to this thread five or six times leading off with that and deleted them as too confrontational/flamebait for a relative newcomer to the chat. I've always found cellstrs hard to understand and remember.
Again, I like Matlab's documentation for many reasons. I've learned a lot about signal and image processing and communications theory by reading the Toolbox docs and examples on my work machines. My complaints are more along, "Holy cow, that's useful! Why isn't it in the function description?" when I find something on Stack*, Answers, and whatnot that solves a problem.
e.g. @Walter Roberson's solution to subsetting a table by using == on a double-quoted thing (allowed) vs. the single-quoted thing in the actual cell (not allowed).
2) "What would a good container look like?"
I'm no Python fan or advocate but Python lists can contain various datatypes and the standard operators such as <,>, ==, ... seem to work across those types. They've also burned me more than once because they are 'mutable' so I'm not advocating in any way for an exact copy.
That's about all I have to offer since, again, I don't use Matlab day-in, day-out nor am I a developer that delves deep into the entrails of a language to extract the most performance from it. The Matlab R2021a shortcut on my work machine still says "The Language of Technical Computing" and that's how I use the product.
Home: Exploring open source data, trying different algorithms, and general self education as time permits.
Work: I spend most of my time working with hardware that generates data that have to be worked up to figure out what's working or not with the hardware. I need to import files, figure out what's possible quickly, and then move on - Matlab is great for most of that. None of this gets close to optimized "production" code that has to meet any sort of requirements for speed or quality.
When I read that numeric and array comparisons are done with <, >, ==, and such it seems absolutely reasonable to me that they're generally applicable as in other languages I've seen. When it fails with cells, I am surprised that the docs don't say "To compare cells do the following..."
This is the kind of thing that slows me down and makes me want a better way.
We already talked about why not to switch over to string everywhere: string objects are far too slow.
> A good general purpose data container.
@Ravi Narasimhan Can you go in to more detail on what you mean by a "general purpose data container", and how you would like it to behave, in terms of the operations it performs, and the type of syntax you'd like to use to invoke those operations?
One take I have on this: I think that a major issue in this particular case @Ravi Narasimhan is talking about isn't so much that Matlab's help facilities are lacking, but more that cellstrs are a lousy way to represent string arrays, and always have been. See http://blog.apjanke.net/2019/04/20/matlab-string-representation-is-a-mess.html.
IMHO, a basic issue here is that a cell array is a very general-purpose, but highly "abstract" or generic data type. It was not designed specifically for storing arrays of strings; it just got pressed into that duty because old versions of Matlab didn't have a better option. If you're asking the Matlab documentation system, "what kinds of operations can I perform on cells?", then from a "typology" standpoint, there's not much reason to present you with a bunch of string-manipulation stuff. Now there are actual string type arrays, which are a better way of representing arrays of strings IMHO, and are more tightly associated with string manipulation methods. Try methods string and see what that gets you.
(Though the doc string array still doesn't seem to have a listing of methods, at least when I tried it in R2021a.)
Maybe the better approach here would be to have a MATLAB X lean hard into deprecating and even breaking cellstrs, and represent arrays of strings using string arrays everywhere.
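For anyone landing here with the same problem in today's Matlab, here is a small sketch of the ways to test a cellstr against a value, since == itself is not defined for cell operands. The variable names are made up. The last option relies on string's == accepting a cell array of character vectors as its other operand, which is also what makes the T.Gender == "Female" table example in this thread work.

```matlab
% A cell array of character vectors (a "cellstr")
names = {'alpha', 'beta', 'gamma'};

% names == 'beta' errors: relational operators are not defined for cell operands
try
    tf = names == 'beta'; %#ok<NASGU>
    eqWorked = true;
catch
    eqWorked = false;     % we expect to end up here
end

% Option 1: strcmp compares every cell element with the character vector
isBeta1 = strcmp(names, 'beta');          % logical [0 1 0]

% Option 2: ismember also accepts cellstrs
isBeta2 = ismember(names, {'beta'});      % logical [0 1 0]

% Option 3: convert to a string array, where == works element-wise
isBeta3 = string(names) == "beta";        % logical [0 1 0]

% Option 4: compare against a string scalar; string's == accepts a cellstr operand
isBeta4 = names == "beta";                % logical [0 1 0]
```

Options 3 and 4 only work because one side is a string; that asymmetry is part of why cellstrs feel so inconsistent.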
Ok. Thanks for the references. I will look them up.
FWIW, I just sent a colleague, a very experienced Matlab user, the methods/methodsview info and he was pleasantly surprised to know about them.
I agree that there are workarounds in Matlab Present but perhaps this can inform Matlab Future so things are more clear/intuitive.
> ...at a low-level programming-internals level...
I take this back: I don't mean Matlab's own low-level programming internals; I mean the low-level externals that Matlab presents to the M-code layer and which user-defined M-code can use as their internals.
@Ravi Narasimhan – The general category or term for this sort of programmatic discovery of how programming objects or facilities work is "introspection" or "dynamic help discovery".
Have a look here for the facilities available in Matlab for class-oriented stuff: https://www.mathworks.com/help/matlab/get-information-about-classes-and-objects.html
For non-class-oriented stuff, it's mostly about doc, help, and doing documentation searches; I don't know of anything better at this point.
> I suggest TMW take a page out of the Python playbook (one of the few times I'll say that) and state that "Everything/Most things is/are objects"
As of a few releases ago, I think this is actually pretty much true, at a low-level programming-internals level? All the primitive built-in types have been unified with the classdef type hierarchy; you can now do wacky stuff like inherit from double: https://www.mathworks.com/help/matlab/matlab_oop/extend-built-in-class-operators.html and stuff like mc = meta.class.fromName('double') works. (Check out the 267 methods defined on that bad boy!) But the presentation of the type/class hierarchy isn't unified in terms of all the documentation and behaviors of isa and the like, last time I checked. Maybe that's a back-compatibility thing that a breaking-change-MATLAB-X could smooth out. ;)
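A minimal sketch of poking at that unified hierarchy through meta.class, assuming a release recent enough that meta.class.fromName accepts built-in type names as described above. The exact method counts will vary by release and by which toolboxes are installed, so don't rely on any particular number.

```matlab
% Look up the metaclass object for a built-in type by name
mc = meta.class.fromName('double');

% MethodList is an array of meta.method objects; collect their names
methodNames = {mc.MethodList.Name};
fprintf('double has %d methods visible to meta.class\n', numel(methodNames));

% The same mechanism works for classdef classes such as containers.Map
mcMap = meta.class.fromName('containers.Map');
mapMethods = sort({mcMap.MethodList.Name});
hasKeys = any(strcmp(mapMethods, 'keys'));   % keys() is a containers.Map method
```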
"Nothing is obvious to the uninformed" --- Anonymous
I suggest TMW take a page out of the Python playbook (one of the few times I'll say that) and state that "Everything/Most things is/are objects"
@Andrew Janke Can't nest a reply so I'll do it here:
"Maybe the doco for cell and table could just use some expansion, unification, and better navigation? That shouldn't take a "MATLAB X" style breaking change, IMHO."
I think this would go beyond just those functions: for example, a footer on every doc page where it applies, and pointers in error messages. Some error messages do point to documentation, with red text and a clickable link. All in all I like Matlab's documentation, which is one reason I use it when other languages offer similar features.
But, it'd be nice to not have to keep these subtle details in mind so that's why I put it in this thread regarding a new Matlab.
When you start using MATLAB, it is common to concentrate on the numeric datatypes; character and logical are really just numeric underneath, so you kind of give them a minor mental special case: "It's all numeric (except for a few things bolted on)."
But eventually comes the realization that in MATLAB, everything is a member of some class or other, so you can start asking OOP questions about things that you are accustomed to thinking of as pure numeric. Like
methods double
Methods for class double:
abs bitcmp double gegenbauerC issortedrows norm sinpi
accumarray bitget ei get_performance_time issparse not size
acos bitor eig gt istril numel sort
acosd bitset ellipticCE harmonic istriu nzmax sortrowsc
acosh bitshift ellipticCK hermiteH isvector or sparse
acot bitxor ellipticCPi hess jacobiP ordeig sqrt
acotd bsxfun ellipticE hurwitzZeta jordan ordqz ssinint
acoth ceil ellipticF hypot kummerU ordschur sum
acsc charpoly ellipticK ichol laguerreL permute superiorfloat
acscd chebyshevT ellipticNome ifft ldivide plus svd
acsch chebyshevU ellipticPi ifftn ldl pochhammer symrcm
airy chol end igamma le poly2sym tan
all cholupdate eps ilu legendreP polylog tand
amd circshift eq imag length pow2 tanh
and colon erf int16 linsolve power times
any complex erfc int32 log prod transpose
append conj erfcinv int64 log10 psi triangularPulse
asec conv2 erfcx int8 log1p qr tril
asecd cos erfi inv log2 qrupdate triu
asech cosd erfinv isbanded logical qz uint16
asin cosh euler iscolumn logint rcond uint32
asind coshint exp isdiag lt rdivide uint64
asinh cosint expm1 isempty ltitr real uint8
atan cospi fft isequal lu rectangularPulse uminus
atan2 ctranspose fftn isequaln max rem underlyingType
atan2d cummax filter isequalwithequalnans maxk repelem uplus
atand cummin find isfinite min repmat vecnorm
atanh cumprod fix isfloat mink reshape whittakerM
balance cumsum flip isinf minpoly round whittakerW
bernoulli dawson floor isinteger minus set_webscope_testinghook_value wrightOmega
besselh det fresnelc islogical mldivide sign xor
besseli diag fresnels ismatrix mod signIm zeta
besselj diff full isnan mrdivide sin
besselk dilog gamma isnumeric mtimes sind
bessely dirac gammainc isreal ndims single
betainc display gammaincinv isrow ne sinh
betaincinv divisors gammaln isscalar nnz sinhint
bitand dmperm ge issorted nonzeros sinint
methods char
Methods for class char:
abs flip isinteger mclsetcomponentdata real transpose
and floor ismatrix mink registerSTAWebScopeMessageHandler tril
anonymousFunction ge isnan minus register_birds_eye_scope triu
bsxfun gt isnumeric mod rem uint16
ceil imag isrow mtimes repelem uint32
circshift int16 isscalar ne repmat uint64
colon int32 issorted nnz reshape uint8
conj int64 issortedrows nonzeros setmcrappkeys uminus
ctranspose int8 isvector not sign uplus
diag iscolumn java_array nzmax single xor
diff isequal ldivide or sort
display isequaln le permute sortrowsc
double isequalwithequalnans linsolve plus sparse
eq isfinite logical pm_fullpath sparsfun
find isfloat lt power superiorfloat
fix isinf maxk rdivide times
Thanks. I do no OOP, so it never occurred to me to even look there. Even more so than colors for the UX, I'd like to see TMW put this bit about methods at the end of pertinent doc pages, e.g. "methods cell returns a list of available methods/operations for the cell datatype".
Interestingly, strcmp doesn't show as a way to compare cells but it seems to work.
which can be very valuable for Java objects
methods can be invoked with a class name, or invoked on an expression that is a member of the appropriate class.
This leads to a conflict about what methods() should do for a variable that happens to hold a string or character vector. Normal MATLAB function argument resolution pulls in the value of the variable before passing it to methods(), so methods() cannot tell that the value came from a variable, and it treats the value as if the user had requested a class name:
A = "uint8"
A = "uint8"
methods string
Methods for class string:
append compose double erase extractAfter ge insertBefore join lt or replace sort startsWith upper
cellstr contains endsWith eraseBetween extractBefore gt ismissing le matches pad replaceBetween split strip
char count eq extract extractBetween insertAfter issorted lower ne plus reverse splitlines strlength
methods uint8
Methods for class uint8:
abs ceil display gt isfinite issparse mldivide permute size uminus
accumarray circshift double ifft isfloat isvector mod plus sort underlyingType
all colon end ifftn isinf ldivide mrdivide power sortrowsc uplus
and complex eq imag isinteger le mtimes prod sparse xor
any conj fft int16 islogical length ndims rdivide sum
bitand conv2 fftn int32 ismatrix linsolve ne real times
bitcmp ctranspose filter int64 isnan logical nnz rem transpose
bitget cummax find int8 isnumeric lt nonzeros repelem tril
bitor cummin fix iscolumn isreal max norm repmat triu
bitset cumprod flip isempty isrow maxk not reshape uint16
bitshift cumsum floor isequal isscalar min numel round uint32
bitxor diag full isequaln issorted mink nzmax sign uint64
bsxfun diff ge isequalwithequalnans issortedrows minus or single uint8
methods(A) % --> is that a request for methods of strings, or a request for methods of uint8 ?
Methods for class uint8:
abs ceil display gt isfinite issparse mldivide permute size uminus
accumarray circshift double ifft isfloat isvector mod plus sort underlyingType
all colon end ifftn isinf ldivide mrdivide power sortrowsc uplus
and complex eq imag isinteger le mtimes prod sparse xor
any conj fft int16 islogical length ndims rdivide sum
bitand conv2 fftn int32 ismatrix linsolve ne real times
bitcmp ctranspose filter int64 isnan logical nnz rem transpose
bitget cummax find int8 isnumeric lt nonzeros repelem tril
bitor cummin fix iscolumn isreal max norm repmat triu
bitset cumprod flip isempty isrow maxk not reshape uint16
bitshift cumsum floor isequal isscalar min numel round uint32
bitxor diag full isequaln issorted mink nzmax sign uint64
bsxfun diff ge isequalwithequalnans issortedrows minus or single uint8
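The usual way around that ambiguity is to pass class(A) explicitly when you mean "the class of this variable". A small sketch; checking for bitshift and strlength is just a convenient way to tell the two listings apart, based on the outputs shown above.

```matlab
A = "uint8";                 % a string whose *value* happens to be a class name

% methods only sees the value "uint8", so this lists uint8's methods
m1 = methods(A);

% Ask about the variable's own class by passing its class name explicitly
m2 = methods(class(A));      % class(A) is 'string'

isUint8List  = ismember('bitshift', m1);    % bitshift is a uint8 method
isStringList = ismember('strlength', m2);   % strlength is a string method
```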
methods cell
Methods for class cell:
cellismemberlegacy intersect ismember issorted maxk reshape sort union
ctranspose iscolumn isrow issortedrows mink setdiff strcat unique
display ismatrix isscalar isvector permute setxor transpose
"methods(cell(0))" - Where has this been all my Matlab life? Could I have found it if I had RTFM more assiduously? Do you have a compilation of these nuggets on File Exchange or elsewhere?
Color coding: Appreciate your willingness to suggest this to TMW. I think that might work for me but would be hard to make work for a broad spectrum of users. I have been told by software developers and manufacturing planners alike that not everyone responds to colors the same way.
Remembering: This is indeed the larger issue that might be addressed by reschnootering the language. I don't use Matlab enough to create a library of functions that I can use across machines and across time. My current method is to (over)document any spot in the code where I have to remember something or use a trick. e.g.
% I wanted to do xyz but that gives an error due to abc so I'm kludging a workaround by <whatever>
With the advent of LiveScripts, I usually create a companion LS as a glorified README with examples of the issue needing a memory check, workaround, web searches, and/or tech support.
Try methodsview cell, too. It's more useful for "actual" or classdef classes, like methodsview containers.Map, but it's a useful habit to have. Maybe MathWorks could spruce it up: add columns with an H1-line excerpt, and make the names clickable to take you to the helptext for that function/method. That would help for types like cell, for which methodsview for some reason gives just a single-column view.
Though there are a couple issues: Some of the useful things that you can call on a cell array are top-level (or whatever you call that) functions, and not methods on the cell class, so they don't show up in the methods listing. E.g. sortrows, strjoin, cellfun.
And the things that are methods don't show up readily in the doco. When I do doc containers.Map, doc database.connection, or doc <some_user_defined_class>, one way or another that doc has a listing of the class's methods – in the "Object Functions" section in the left-hand navigation bar for Matlab-supplied documentation, or in the "Method Summary" section of doc output that's auto-generated from user-supplied helptext in classdefs. Maybe doc cell should have an "Object Functions" section too?
In R2021a, there's a "Functions" section in doc cell's top "All/Examples/Functions" navigation thing, but AFAICT it only includes a subset of cell's methods and relevant functions. Hmm. Looks like maybe that's "functions related to cells that are not actually methods on cell".
Maybe the doco for cell and table could just use some expansion, unification, and better navigation? That shouldn't take a "MATLAB X" style breaking change, IMHO.
To see what operators are available:
methods(cell(0))
Methods for class cell:
cellismemberlegacy intersect ismember issorted maxk reshape sort union
ctranspose iscolumn isrow issortedrows mink setdiff strcat unique
display ismatrix isscalar isvector permute setxor transpose
Of course this just transforms the problem from one of remembering the {} syntax into remembering to use ExpandCell() or whatever you named it.
Hypothetically, the challenge of distinguishing the characters could be reduced if they were colorized differently. Which wouldn't be a bad idea. I will suggest it to Mathworks.
However, that would not affect the challenge that you have to remind yourself about the syntax each time you come back to MATLAB.
To deal with the problem of people not remembering the syntax, I would suggest that the only recourse would be to switch to using all the same character, probably (), and to convert the {} and [] into named functions.
[] already has two named functions: horzcat() and vertcat() .
[1 2 3; 4 5 6]
ans = 2×3
1 2 3
4 5 6
vertcat(horzcat(1,2,3), horzcat(4,5,6))
ans = 2×3
1 2 3
4 5 6
You can already write your code like that, and then at least for your own code you would not have to worry about distinguishing between () and [] .
{} for cell content retrieval does have functions, but they are ugly to use. But you could define your own functions to do cell content indexing and use those instead of using {} in your code. It might not be a language-level fix, but it is something you can do for your own code. It is not mandatory that you write
disp(A{:})
you could write a function ExpandCell and then
disp(ExpandCell(A))
You do however lose the ability to use end as nicely if you wrap inside your own function.
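For the curious, the "ugly" function form of brace indexing is subsref with a '{}' subscript built by substruct. Below is a sketch; cellGet is a hypothetical one-element helper in the spirit of ExpandCell, not a built-in.

```matlab
A = {10, 'two', [3 4 5]};

% Ordinary brace (content) indexing
v1 = A{3};

% The same retrieval spelled as a function call
v2 = subsref(A, substruct('{}', {3}));

% Hypothetical named helper; note there is no 'end' available inside it,
% so the caller has to compute the index, e.g. numel(A)
cellGet = @(C, k) subsref(C, substruct('{}', {k}));
v3 = cellGet(A, numel(A));
```

One caveat: a function call used inside another function's argument list only delivers its first output, so a helper like this cannot reproduce the comma-separated-list expansion that A{:} gives in, say, [A{:}].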
I don't know enough about the beaks and gizzards of Matlab to say. The question was about a major update where things would get broken. For my uses it isn't a big deal. What are big deals to me are
1) How (, {, and [ blur to my presbyopic eyes unless I magnify the screen (browser) or adjust the font/size (Matlab)
2) I have to remind myself about syntax every time I come back to the package
3) How the error message doesn't say anything about where to look to understand the mistake and do something different
A good general purpose data container. Cells with their smooth/curly braces and different operators are very confusing
How would you distinguish between the container entry and its contents in a way that was not confusing and also was not unnecessarily verbose?
You need to be able to pass around containers without accessing contents because accessing contents triggers cell expansion, which is far too valuable to give up.
Z = {3, 5}
Z = 1×2 cell array
{[3]} {[5]}
disp(Z(:))
{[3]}
{[5]}
disp(Z{:})
Error using disp
Too many input arguments.
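To spell out the container-versus-contents distinction with the same Z: parentheses give you a smaller cell (the container), while braces give you the contents. Z{:} is a comma-separated list, which is why it expands into separate arguments. disp takes only one data argument, so disp(Z{:}) fails, but functions that accept several inputs are happy with it.

```matlab
Z = {3, 5};

c = Z(1);            % parentheses: a 1x1 cell holding 3 (the container)
x = Z{1};            % braces: the double 3 itself (the contents)

% Z{:} expands to the list 3, 5 ...
row   = [Z{:}];      % ... so this is [3, 5], i.e. horzcat(3, 5)
total = plus(Z{:});  % ... and this is plus(3, 5), i.e. 8
```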
Thanks. That'd be a swell example to put in the docs for table and cell.
T = readtable('patients.dat');
T(T.Gender == "Female",:)
ans = 53×10 table
LastName Gender Age Location Height Weight Smoker Systolic Diastolic SelfAssessedHealthStatus
_____________ __________ ___ _____________________________ ______ ______ ______ ________ _________ ________________________
{'Williams' } {'Female'} 38 {'St. Mary's Medical Center'} 64 131 0 125 83 {'Good' }
{'Jones' } {'Female'} 40 {'VA Hospital' } 67 133 0 117 75 {'Fair' }
{'Brown' } {'Female'} 49 {'County General Hospital' } 64 119 0 122 80 {'Good' }
{'Davis' } {'Female'} 46 {'St. Mary's Medical Center'} 68 142 0 121 70 {'Good' }
{'Miller' } {'Female'} 33 {'VA Hospital' } 64 142 1 130 88 {'Good' }
{'Taylor' } {'Female'} 31 {'County General Hospital' } 66 132 0 118 86 {'Excellent'}
{'Anderson' } {'Female'} 45 {'County General Hospital' } 68 128 0 114 77 {'Excellent'}
{'Thomas' } {'Female'} 42 {'St. Mary's Medical Center'} 66 137 0 115 68 {'Poor' }
{'Harris' } {'Female'} 36 {'St. Mary's Medical Center'} 65 129 0 114 79 {'Good' }
{'Garcia' } {'Female'} 27 {'VA Hospital' } 69 131 1 123 79 {'Fair' }
{'Clark' } {'Female'} 48 {'VA Hospital' } 65 133 0 121 75 {'Excellent'}
{'Rodriguez'} {'Female'} 39 {'VA Hospital' } 64 117 0 123 79 {'Fair' }
{'Lewis' } {'Female'} 41 {'VA Hospital' } 62 137 0 114 88 {'Fair' }
{'Lee' } {'Female'} 44 {'County General Hospital' } 66 146 1 128 90 {'Fair' }
{'Walker' } {'Female'} 28 {'County General Hospital' } 65 123 1 129 96 {'Good' }
{'Allen' } {'Female'} 39 {'VA Hospital' } 63 143 0 113 80 {'Excellent'}
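Here is a self-contained version of the same idea that doesn't need patients.dat; the three rows are a made-up stand-in. Gender is a cellstr column here as well, and the comparison works because the right-hand side is a string ("Female"), not a character vector ('Female').

```matlab
% Made-up stand-in for a few rows of patients.dat
LastName = {'Smith'; 'Williams'; 'Jones'};
Gender   = {'Male'; 'Female'; 'Female'};
Age      = [38; 38; 40];
T = table(LastName, Gender, Age);

% cellstr column == string scalar gives a logical column vector
isFemale = T.Gender == "Female";        % [false; true; true]
F = T(isFemale, :);                     % matching rows, all variables

% Equivalent without relying on string semantics
F2 = T(strcmp(T.Gender, 'Female'), :);

% T.Gender == 'Female' (char, not string) would error: == is not defined for cells
```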
Very basic consistency points that currently defy my comprehension:
- Default everything to the 1st dimension (i.e. columns are the default, not rows). For instance, 1:3 should give [1 2 3].' and not [1 2 3].
- Then follow the dimensions in order (everything scales accordingly). That way you could drop all those nd doppelgängers...
- Drop the minimum of 2 dimensions and this 2D matrix "shortcut" (for instance, repmat(1, 2) should give [1 1].' and not [1 1 ; 1 1]).
- Make the order of axes X, Y, Z not Y, X, Z as it is now for *some* functions, but not for others. In my humble opinion MATLAB should follow maths, not CRTs...
- do ... while ???
- Correspondence between MATLAB's online help and the "help" for each function. Formatted help for custom functions.
Less urgent but kind of:
- UTF-8 as standard.
- Easier and possibly native handling of large constant datasets in parallel working (parallel.pool.constant().... really?).
- Ability to assign different GPUs to different workers.
- Type check for variables? So much time could be spared when the compiler can check and warn about what you are feeding a function... but there are pros and cons.
All the best!
/Max
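For reference, here is how the first few points behave in current MATLAB, plus the usual stand-in for the missing do...while. The loop counter and its limit are only illustrative.

```matlab
r = 1:3;              % the colon operator always yields a 1x3 row vector
c = (1:3).';          % an explicit transpose gives the 3x1 column asked for above

R = repmat(1, 2);     % a single count n means n-by-n tiling: here 2x2
C = repmat(1, 2, 1);  % spell out both dimensions to get a 2x1 column

% No do...while in MATLAB; the common idiom is an endless loop with a
% trailing exit test, so the body always runs at least once.
n = 0;
while true
    n = n + 1;        % loop body
    if ~(n < 3)       % the "while (n < 3)" condition, checked after the body
        break
    end
end
```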
As you can see in this thread, the default encoding for m-files was switched to UTF-8 in R2020a. So starting from that release you can use rich text in chars and strings without fear of losing the contents (except if you need/want compatibility with older releases).
Thanks for pointing these out Walter.
As for the encoding, I was referring mostly to the actual script and function files. Interesting to know about the parameter validation!
Ah, but I am far from sure that I would want MATLAB to act that way.
Whatever tools along those lines are provided should be the same as the tools for marking for code generation. Which might perhaps end up looking a bit different than at present.
@Walter Roberson This idea of doing something like applying constraints/conversion/validation to local/automatic variables like how you can with function arguments and object properties is really intriguing to me. Would you like to post it as a separate Answer so it can get more attention and some upvotes?
MATLAB now has parameter validation declarations
This is, however, only for input parameters, and does not give a way to specify that for any given name, that any expressions assigned to the name must be converted to a particular type (or must error if the type is different.)
At the moment, any assignment to an entire variable discards all of the old attributes associated with the name, and there is nothing available at the user level that can stop that. The only assignments that can be intercepted are to partial variables being indexed with dot or () or {} indexing notation. At the moment there is no way to say
A = zeros(5, 7, 'double', 'locktype') %not valid
or
A = zeros(5,7,'double');
locktype A %not valid
locksize A %not valid
with the idea that you want to force A to always be 5 x 7 and force it to be double, with the intention that
A(6,3) = 2; %intended to error because it is extending size
A = 'hello' %intended to error because it is wrong datatype
There is no mechanism for this at all in MATLAB.
However... it is not out of the question. Remember that when you assign to a name that has been declared global, the result retains the global attribute, and likewise when you assign to a name declared persistent, the result retains the persistent attribute. So there must already be some mechanism in MATLAB that is different than pure "incoming value completely replaces everything about the name".
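To make the gap concrete, the closest you can get for a plain local variable today is to re-validate it yourself after the fact. The sketch below uses validateattributes; checkA is a hypothetical helper, and nothing enforces the check automatically. (Class properties are different: property validation in a classdef does constrain whole-value assignments, but that only applies to properties, not to local variables.)

```matlab
A = zeros(5, 7, 'double');

% Hypothetical helper: errors unless v is a 5x7 double
checkA = @(v) validateattributes(v, {'double'}, {'size', [5 7]});
checkA(A);                 % passes

A(6, 3) = 2;               % nothing stops this: A silently grows to 6x7
try
    checkA(A);
    caught = false;
catch
    caught = true;         % the growth is only noticed because we re-checked
end
```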
For the last few releases:
When you fopen() a file for reading without specifying an encoding, then the first time you ask to read characters (rather than asking to read binary), then the contents of the file will be examined to try to figure out what encoding it is in.
When you fopen() a file for writing without specifying an encoding, UTF-8 is used.
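If you'd rather not depend on the defaults at all, fopen takes the encoding as its fourth argument, and the four-output form of fopen reports the encoding attached to an open file. A minimal round-trip sketch; the scratch file name comes from tempname, and char(233) is used only to keep the example source pure ASCII.

```matlab
fname    = [tempname '.txt'];     % scratch file in the temp folder
original = ['caf' char(233)];     % 'café'

% Write with an explicit encoding
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s', original);
fclose(fid);

% Read it back, again naming the encoding explicitly
fid = fopen(fname, 'r', 'n', 'UTF-8');
[~, ~, ~, enc] = fopen(fid);      % encoding associated with this file ID
roundTrip = fgetl(fid);
fclose(fid);
delete(fname);
```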
An option to show the Inport name on the left side instead of below, and the Outport name on the right side instead of below, to avoid this weird look when we have a very large number of inputs (more than 20,000).
Are you using an older version of Matlab? It worked for me in R2020b and, as you can see, it is shown on the doc page I linked, though not on any other doc pages, strangely enough.
If you just want to move the name to the left side of the inport to declutter the diagram:
set_param(blk,'NameLocation','Left')
where blk is the handle to the inport. Same thing with outports.
The NameLocation property for some reason does not appear in the list of Common or Block Specific properties, but an example of its use is shown here.
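With many ports you would presumably want to do this in bulk. Here is a sketch that builds a throwaway model so it is self-contained (it needs Simulink; the model name portNameDemo is hypothetical), then uses find_system to move every top-level Inport label to the left and every Outport label to the right.

```matlab
mdl = 'portNameDemo';                           % hypothetical scratch model
new_system(mdl);
add_block('simulink/Sources/In1', [mdl '/In1']);
add_block('simulink/Sinks/Out1',  [mdl '/Out1']);

% Collect the ports at the top level of the model
inports  = find_system(mdl, 'SearchDepth', 1, 'BlockType', 'Inport');
outports = find_system(mdl, 'SearchDepth', 1, 'BlockType', 'Outport');

% Move the labels beside the blocks instead of below them
for k = 1:numel(inports)
    set_param(inports{k}, 'NameLocation', 'left');
end
for k = 1:numel(outports)
    set_param(outports{k}, 'NameLocation', 'right');
end
```

For an existing model, replace the new_system/add_block lines with load_system on your model, and drop SearchDepth if you also want ports inside subsystems. close_system(mdl, 0) discards the scratch model afterwards.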
Apparently subsystem ports can be moved; https://www.mathworks.com/matlabcentral/answers/31981-change-the-position-of-a-part-of-the-input-ports-of-a-simulink-block
If I could design Matlab from scratch I'd
- get rid of semicolons to suppress output, and
- make element-wise operations the default, rather than having to specify .*, ./, and .^ for the operation that most people want to do most of the time.
but then it wouldn't be MATlab anymore...would it?
As per Peano, addition of 1 and subtraction of 1 are the only "natural" operations, and all other operations are "invented". In particular all forms of multiplication are "invented".
I think it's more appropriate to use the standard operator symbols for true mathematical operations, and use invented operator symbols for invented operations, like element-wise multiplication.
I would use #* for matrix multiplication, #/ for matrix division, #^ for matrix power.
Maybe exp# for matrix exponential -- but that might require that expm still be the "formal name" for overloading purposes.
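For readers following along, this is the split being argued about, as it stands today: the plain operators are the matrix ("true mathematical") versions, the dotted ones are element-wise, and the same distinction exists between exp and expm.

```matlab
A = [1 2; 3 4];

P  = A * A;      % matrix product:       [7 10; 15 22]
E  = A .* A;     % element-wise product: [1 4; 9 16]
P2 = A ^ 2;      % matrix power, same as A*A
E2 = A .^ 2;     % element-wise power, same as A.*A

Z   = zeros(2);
eZ  = exp(Z);    % element-wise: e^0 in every entry, i.e. ones(2)
emZ = expm(Z);   % matrix exponential of the zero matrix: the identity
```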
Symmetric variables and Hermitian variables. MATLAB could implement bit flags in the mxArray header to indicate this and they could propagate through operations and function calls when appropriate. This could make symmetric tests easier/faster and background functions could take advantage of this. Also provide mex functions access to these flags.
I don't think I know what Reference Copy implies in this context?
And is it related to the way that now sufficiently small hard-coded vectors get shared... provided that the defining text is identical ?
Yes. Reference copies are when two variables share the actual mxArray header ... i.e., they have the same exact memory address at the header level. There is a counter inside the mxArray header that keeps track of how many reference copies there are in MATLAB of this variable. This is the method used for cell element and struct field element sharing. E.g.,
C = {1:3};
C(2:5) = C(1); % The C(2:5) elements will be reference copies of C(1)
And this behavior also is used for variable assignments in later versions of MATLAB. E.g.,
X = 1:3;
Y = X; % shared data copy in earlier versions of MATLAB, reference copy in later versions of MATLAB
Shared data copies are where two or more variables have different mxArray headers but share data pointers. E.g.,
X = 1:3;
Y = X'; % transpose causes dimensions to be different, resulting in different mxArray header
In this last case, the mxArray header cannot be shared because the dimensions which are part of the mxArray header are different. But the data is the same and in the same memory order, so the data pointers can be shared.
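At the M-code level all of this sharing is invisible: MATLAB copies lazily, so the data is only duplicated when one of the sharing variables is written to. The comments below restate the description above. The sharing itself cannot be observed from M-code (which is exactly the point of asking for mx-level query functions); only the value semantics can be checked.

```matlab
X = 1:3;
Y = X;          % per the description above: a reference copy in later releases, no data copied
Y(1) = 10;      % the first write to Y forces Y to get its own data ("copy on write")

Z = X';         % transpose: new 3x1 header, but the data can be shared with X
```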
This makes a lot of sense to me. I would like to be able to selectively implement classdef classes' functionality using MEX files without having to go through the hassle and expense of popping out all the objects' data into alternate structures and sucking it back in. The variable/data-sharing stuff makes sense too.
Thank you for the analysis, James.
I may expand this comment as I think of more things, but at the moment ...
1) The obvious one off the top of my head is dealing with classdef OOP objects in mex routines. The old style @directory class objects were just structs with a thin class wrapper, so dealing with them in a mex routine was super easy because you could get pointer access to the data and attach reference copies to objects using the struct mex routines. Both fast and memory efficient. But with the new classdef OOP objects, the data is hidden and there is no official way to get pointer access to the data, or to attach reference copies of variables to the object properties without deep data copies. This greatly hampers efforts to efficiently work with them in a mex routine, both for user created classes and also for MATLAB defined classes such as string and half. My fear is that this will simply get worse over time. So I would greatly welcome official functions such as:
mxGetPropertyPtr
mxSetPropertyPtr
2) Make these functions official:
mxCreateSharedDataCopy
mxCreateReference
My guess is TMW probably fears the amount of technical support they will be asked to give when users abuse these functions and crash MATLAB. Probably true. But they are immensely useful to those of us that know how to use them properly. E.g., there is a lot of variable sharing that goes on in a struct variable at the MATLAB level through normal manipulation, but it is impossible to mimic this sharing in a mex routine using only official functions.
3) It would be nice to be able to detect variable sharing at the mex level, but that seems to be getting harder and harder. There are no official functions for this, so one must resort to hacks. And even the linked list for shared data copies that used to be visible with hacks is now hidden. So it would be nice to have functions such as:
mxIsSharedDataCopy
mxIsReferenceCopy
mxIsParentCopy
mxGetSharedDataCopyLinkedList
This would be particularly useful for the new C++ interface. Right now when you manipulate large variables in the C++ interface you have no way of telling a priori if the manipulation is going to suddenly create a deep copy and blow up your memory, since all that memory management happens automatically for you behind the scenes. Convenient, yes, but not good for memory management at the user level. If you had variable sharing insights before the operation you could potentially avoid the memory blow up and take different action in your code.
In your opinion, what would you say the user documented mex functions that are most missing? Flags not accessible in documented ways, attributes not accessible in documented ways, that sort of thing ?
I'd like to define ranges using square brackets for inclusive and rounded brackets for exclusive. So instead of
if x>=20 & x<30
disp 'x is in the twenties'
end
I'd introduce another comparison operator, say #, to look like this:
if x # [20 30)
disp 'x is in the twenties'
end
With this new syntax perhaps we could eliminate the all-too-common usage of elseif forever. Because in my opinion, elseif tends to produce error-prone and unreadable code like this:
if x<0
disp 'x is negative'
elseif x==pi
disp 'x is pi'
elseif x>=20 & x<30
disp 'x is in the twenties'
elseif x>=30 & x<40
disp 'x is in the thirties'
else
disp 'x might be a hundred'
end
The code above is the cleanest, simplest version I can come up with to illustrate the difficulty of following the logic of a series of elseif statements, but in practice it tends to be much more difficult to parse, because it's usually cluttered with longer variable names or more complicated logic.
With the bracket syntax I'm suggesting, switch could be adapted to accept ranges like this:
switch x
case <0
disp 'x is negative'
case pi
disp 'x is pi'
case [20 30)
disp 'x is in the twenties'
case [30 40)
disp 'x is in the thirties'
otherwise
disp 'x might be a hundred'
end
Isn't that so much nicer?
Taking this one step further, multiple switch inputs:
switch x,y
case >0,<0
'The point x,y is in the lower right quadrant.'
case >0,>0
'The point x,y is in the upper right quadrant.'
case <0,>0
'The point x,y is in the upper left quadrant.'
case <0,<0
'The point x,y is in the lower left quadrant.'
case 0,0
'The point x,y is at the origin.'
otherwise
'The point x,y cannot be found on a cartesian plot.'
end
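Until something like that exists, there is a common idiom that gets part of the way: switch on the constant true, so the first case whose logical expression is true runs. For whole arrays of values, discretize does half-open binning by edges. Sketch only; the value of x is arbitrary.

```matlab
x = 25;   % try other values: -1, pi, 35, 100

switch true
    case x < 0
        msg = 'x is negative';
    case x == pi
        msg = 'x is pi';
    case x >= 20 && x < 30      % the interval [20, 30)
        msg = 'x is in the twenties';
    case x >= 30 && x < 40      % the interval [30, 40)
        msg = 'x is in the thirties';
    otherwise
        msg = 'x might be a hundred';
end
disp(msg)

% Bins are [20,30) and [30,40]; the last bin also includes its right edge.
% Values outside all bins come back as NaN.
bin = discretize([25 35 15], [20 30 40]);   % [1 2 NaN]
```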
Remove the length function.
Its behavior of "size along the longest dimension, picked at run time" is a little weird, most junior programmers don't expect it, and it leads to subtle bugs that can silently produce incorrect results instead of erroring out. In my 15 years of Matlab programming experience, I've seen so many people call length, and I've never seen one who actually wanted what length does instead of numel or size.
Let everyone just use numel or size instead; those work "safely".
I have used length() many times. When a vector is expected (possibly having been constructed as a vector), and I want to know how many elements are there but I do not care whether the vector is row vector or column vector, then length() is fine. In such a situation, yes, numel() could substitute.
Now, in the context of a non-vector, I can't say that I recall ever having wanted to know the size of the "longer" dimension.
+1, numel ftw!
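A quick illustration of why length surprises people: it returns the longest dimension for non-empty arrays and 0 for any empty array, whichever dimension happens to be zero.

```matlab
M = zeros(3, 5);
nLength = length(M);    % 5: the longest dimension, picked at run time
nElems  = numel(M);     % 15: total number of elements
nRows   = size(M, 1);   % 3: the dimension you actually asked for

E = zeros(0, 5);
nEmpty = length(E);     % 0, even though size(E, 2) is 5

v = [4 5 6];
sameForVector = length(v) == numel(v);   % true: for vectors the two agree
```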
I would also like:
- auto-complete options on inputs to custom functions
- specified type of arguments such that if an argument is supposed to be a filename or path, then it would allow you to autocomplete a path the way imread() and dir() do, but for custom functions
- keyword arguments (is that already a thing?) like in python, instead of all arguments being "equal" and having to parse out
- functional programming features such as in-line loops, if statements, direct indexing into function outputs (without an intermediate variable explicitly created).
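On keyword arguments: they are already a thing. Since R2019b a function can declare name-value arguments in an arguments block, and since R2021a callers can write Name=value instead of 'Name', value. Below is a sketch; the function smoothSignal and its Window/Method options are made up for illustration, and it would need to be saved as smoothSignal.m.

```matlab
function y = smoothSignal(x, opts)
% SMOOTHSIGNAL  Hypothetical example of name-value ("keyword") arguments.
    arguments
        x double                                           % required positional input
        opts.Window (1,1) double {mustBePositive, mustBeInteger} = 3
        opts.Method (1,1) string {mustBeMember(opts.Method, ["mean", "median"])} = "mean"
    end
    switch opts.Method
        case "mean"
            y = movmean(x, opts.Window);
        case "median"
            y = movmedian(x, opts.Window);
    end
end
```

Calls then look like smoothSignal(x), smoothSignal(x, Window=5) or, in older releases, smoothSignal(x, 'Window', 5). An invalid option such as Method="mode" is rejected by the validator before the body runs.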
@TADA this specific suggestion would probably be more at home as a separate answer here. I don't see how it would require a break in compatibility if implemented. Now that the two editors have been merged, functionSignatures have much more benefit, so maybe better ways to generate them will come.
This seems like a useful thing. But it also seems largely redundant with the new-ish arguments block that allows you to specify constraints on the type and size of inputs in the code itself, and have those enforced. Maybe some of this should go in comments/helptext on the arguments block elements, the way you can comment properties in an object, instead of in separate helptext in the function comments? Like:
function [a,b] = foo(x, y)
% does something awesome
%
% Extended details
arguments
% input number
x numeric
% input name
y {isStringy} % or maybe Union[char,string]
% a description of output arg a would go here
a numeric
b logical
end
end
(This assumes that output arguments could be included in the arguments block somehow, in addition to input arguments.)
I also think if we want to do this, we should think in a bit larger terms about how to support more structured forms of documentation in code comments, sort of like how Javadoc does things. Matlab's current form of helptext-in-comments support is a pretty loose and minimally-formatted thing. Maybe there should be a mechanism for including and indicating richer, more structured forms of documentation, like function metadata as you suggest here, or HTML/TeX/Markdown/AsciiDoc-formatted doco, or some Matlab-specific form of structured doco like an M-code equivalent of Javadoc, or even Knuth style Literate Programming stuff.
I do like the idea of something explicit to differentiate comments-that-are-embedded-documentation from comments-that-are-just-comments.
Personally, I think it's really nice to have this sort of doco inline in comments in the code itself, right next to the actual code, instead of in a supplemental sidecar file like an external Markdown or AsciiDoc file. Easier to keep things in sync, and it's useful for code readers to have right there when they're reading the code itself.
It would be great if the JSON signature could be parsed from the comments in the .m file,
something like this:
function [a, b] = foo(x, y)
% {
% "foo":
% {
% "inputs": [
% {"name":"x", "kind":"required", "type":["numeric"], "purpose":["input number"]},
% {"name":"y", "kind":"required", "type":[["char"],["string"]], "purpose":["input name"]}
% ],
% "outputs": [
% {"name":"a", "type":["numeric"]},
% {"name":"b", "type":["logical"]}
% ]
% }
% }
end
We could also use a special comment symbol to make parsing easier, kind of like Visual Studio does for doc comments.
%% is already taken for code sections, and %# is used for pragmas such as %#ok and %#codegen, but any other unlikely combination would work just as well (%$, %@, %~, etc.).
Since this is done inside the function/class block, the obvious and redundant function name can be omitted.
The signature documentation can also be extended with more fields, like a summary:
function [a, b] = foo(x, y)
% foo is an awesome function that does something awesome. no one will see
% this comment unless they look at the file, since I used the regular
% comment symbol %.
% here you can put the documentation you are used to writing
%
% signature documentation will follow next:
%# {
%# "summary": "does something awesome",
%# "inputs": [
%# {"name":"x", "kind":"required", "type":["numeric"], "purpose":["input number"]},
%# {"name":"y", "kind":"required", "type":[["char"],["string"]], "purpose":["input name"]}
%# ],
%# "outputs": [
%# {"name":"a", "type":["numeric"]},
%# {"name":"b", "type":["logical"]}
%# ]
%# }
end
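To make the marker idea concrete, here is a rough sketch of how a tool could pull such a block out of a file and parse it with `jsondecode`. The `%#` marker and the field names are this proposal's hypothetical convention, not anything MATLAB reads today; in practice `src` might come from `readlines("foo.m")` (R2020b or later) rather than a literal:

```matlab
% Hypothetical doc-comment convention: lines starting with "%#" hold JSON.
src = [
    "function [a, b] = foo(x, y)"
    "% ordinary help text"
    "%# {"
    "%#   ""summary"": ""does something awesome"","
    "%#   ""inputs"": [ {""name"":""x"", ""type"":[""numeric""]} ]"
    "%# }"
    "end"
    ];
lines = strtrim(src);
isSig = startsWith(lines, "%#");                 % keep only marked lines
jsonText = join(extractAfter(lines(isSig), 2), newline);  % drop "%#" prefix
sig = jsondecode(jsonText);                      % struct with summary/inputs
disp(sig.summary)
```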
I believe R2021b brings function signature JSON support to the normal editor (since the editors merged).
Note that there are (or at least used to be) a few limitations when using the function signatures JSON in the normal editor (instead of the live editor).
If you don't know the answer to that question, then the answer is "no". :) Lisp macros and sexps are advanced functional programming tools for processing and transforming parsed source code in a functional manner. Basically writing programs that write other programs at run time. They are advanced, very powerful, subtle, and very hard to use.
I am against adding Lisp-style macros to Matlab. Matlab is a practical language for engineers, researchers, and non-superstar programmers to get work done in. It should be easy to understand. If macros are made part of the Matlab language, people will be tempted to use them, and only seriously advanced programmers can really understand macro-based programming, so if you add macros, now you've got a bunch of Matlab code that most Matlab programmers, probably including the original authors of that code, can't understand or work with correctly.
@Andrew Janke honestly, I wouldn't even know how to answer that question. I'm not a functional programmer, just someone who leverages the syntax to get around limitations or shorten the code, lol.
Ah! Yes, that's a helpful link.
I dunno if you need in-language or syntactic support for all these: part of the point of FP is that developers can define their own functional constructs to do whatever they want, and pass them around as first-class functions and compose them. The language just provides the basic support for building those. (E.g. the "Lisp is a tool for building your own programming language" attitude.)
IMHO, there's one big thing at a language level you need for FP stuff like this to work nicely, though, and that Matlab currently lacks: lazy evaluation. Otherwise you end up having to wrap all your stuff in anonymous function handles, like the if() function in the Functional Programming Constructs requires. Maybe that's okay, though: the syntax for anonymous function construction is pretty small and convenient. But when you do this, then the stuff you pass your lazy-evaluated function handles to has to be specifically built to accept function handles and invoke them, instead of just operating on regular values.
On the other hand, if you want it to work nicely, you need your main FP operations to have short, simple names. Which basically need to go in the global namespace, because Matlab's import sucks because it's function-scoped. So maybe they should be built in to the core language itself, just to standardize the names. On the other hand, if Matlab improved import (e.g. by making it file-scoped), then maybe that wouldn't matter.
Hmmmm.
I should try playing around with that Functional Programming Constructs thing.
And then the other thing: Do you want Lisp-style macros and sexps? Because that's a whole nother thing.
I'm thinking of functions similar to what is found here:
This little library solves a lot of my requests but there are some issues with it and of course it's not "native". But basically, the kind of functionality in there is what I'd like to see in native Matlab.
> functional programming features such as in-line loops
What do you see this looking like?
I'm one of those people who would claim that arrayfun() and friends are basically in-line loops, in terms of functionality. And you can write your own variants, too. But because they are functions, and not language syntax, they don't get JIT speedup.
You can do a one-line loop statements like this:
for i = 1:100; x(i) = dosomething(y(i)); end
But I'm guessing you want an expression that returns something, so you can compose with it.
In my limited familiarity with functional programming, in-line loop-like constructs usually are functions, like map(), foldl(), foldr(), etc., because FP likes things to be functions and not syntax, so you can deal with those operations generically. And there are lots of different ways the outputs of an "in-line loop expression" could be collected and transformed into a result.
Could you give some more details on what behavior you're looking for here? Something like a Python list comprehension?
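For comparison, this is roughly what map-style "in-line loops" look like with arrayfun today. Combined with logical indexing it gets close to a Python list comprehension, though (as noted above) it is a function call rather than syntax, so nothing is lazily evaluated. A small sketch:

```matlab
% Map: apply a function to each element and collect the results.
squares = arrayfun(@(k) k^2, 1:5);

% Non-scalar results have to be collected into a cell array instead.
ranges = arrayfun(@(k) 1:k, 1:3, 'UniformOutput', false);

% Filter + map, roughly like Python's [k**2 for k in xs if k > 2].
xs = 1:5;
bigSquares = arrayfun(@(k) k^2, xs(xs > 2));
```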
> if statements
You're talking about like an if/else expression or ternary operator that does lazy ("short-circuiting") evaluation of its inputs, and returns an output, right? I think that's what @Walter Roberson is suggesting with "ifelse" here - https://www.mathworks.com/matlabcentral/answers/1450984-what-should-go-in-a-next-generation-matlab-x?s_tid=mlc_ans_email_view#answer_789544 - and I'm a big fan of it too.
@Steven Lord I wasn't aware that was an option, but looking at it, it seems pretty formalized and geared toward packages. I was envisioning something within the function itself, so it would travel with it easily: like argument validation, but instead of just validating, it would also enable intelligent autocompletion. Basically, what the JSON approach you linked to does, but inside the function itself rather than as a reference file in a package.
Some would claim that arrayfun(), cellfun(), structfun() are in-line loops.
A couple of releases ago, direct indexing into function outputs became possible, but only for the situation where the function output was a struct. This leads to ugly but permitted hacks such as
struct('data', SomeFunction(parameters)).data(:,3:5)
I have used this kind of hack a small number of times. I am more likely to use an auxiliary anonymous function,
IDX = @(X, varargin) X(varargin{:})
IDX(SomeFunction(parameters), ':', 3:5)
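A quick check of how the IDX helper behaves, with magic(5) standing in for SomeFunction(parameters). Note that the colon has to be passed as the quoted char vector ':':

```matlab
% Helper from above: index into the result of any expression.
IDX = @(X, varargin) X(varargin{:});

cols = IDX(magic(5), ':', 3:5);   % same as M = magic(5); M(:, 3:5)
corner = IDX(magic(5), 1, 1);     % top-left element of magic(5)
```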
Regarding auto-completion of inputs, does the functionality provided by writing your own functionSignatures.json files (described on this documentation page) meet some or all of your needs?
keyword arguments have been supported for a couple of releases. They are processed pretty much as-if the user had used a name/value pair, but argument processing has been enhanced; see https://www.mathworks.com/help/matlab/ref/arguments.html
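A minimal sketch of that name-value ("keyword") argument support, assuming R2021a or later for the `Name=value` call syntax. `scaleSignal`, `Gain`, and `Offset` are made-up names for illustration:

```matlab
function y = scaleSignal(x, opts)
% SCALESIGNAL  Hypothetical function taking name-value (keyword) arguments.
arguments
    x double
    opts.Gain (1,1) double = 1     % default used when Gain is omitted
    opts.Offset (1,1) double = 0   % default used when Offset is omitted
end
y = opts.Gain .* x + opts.Offset;
end
```

Both `scaleSignal([1 2 3], Gain=2)` and the older `scaleSignal([1 2 3], 'Gain', 2)` reach the same `opts` struct, so the function body never parses argument pairs by hand.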
- Dark theme
- Standard font with distinguishable lI1, 0O etc. (e.g. FiraCode, Input)
As of release R2022a the issue I described has also been resolved, as the highlight is now a dark shade of green, instead of a light shade. I have not experimented to see whether this depends on any other color settings.
As of release R2022a you can select a dark or light theme in MATLAB Online and you can change the text and background colors in the Live Editor. See the first two items in the Environment section of the Release Notes for more information.
@Tobias Held In R2021b the two editors were merged (mostly, sort of), so code completion tools are now available in the normal editor as well.
One big downside for me is that the line highlight color in the debugging mode is green, with no way to change the color. With white text that renders code unreadable. I hope they introduce a color option in R2022a. I don't do pre-releases, so I don't know if my wish will be granted in a month or so.
Does it work with live scripts? I think I have tried this before. I only use live scripts because code completion doesn't work in normal scripts....
Matlab Schemer is nice, but it does not work in the App Designer or when writing Matlab-Functions in Simulink.
I use Matlab Schemer for my dark theme and personally enjoy Consolas for the font.
An LSP for other IDEs, better documentation of the Python engine, easier install of MEfP using some kind of shell script or dep manager, and a modern IDE UI supporting dark theme.
Also all components like Coder require a support of MATLABs licensing scheme so that they are usable in CI etc.
For context: It would be nice to have CI runs test your Matlab Compiler builds, because this exercises your build & packaging scripts, and lets you test the actual compiled artifacts, but more importantly IMHO, it verifies that your new M-code code base can actually be compiled. In interactive Matlab, if there are syntax errors in your code, that raises a run-time error when you try to run that line or function or load that class, so it only matters if that code actually gets used, only affects that particular functionality, and unit testing can catch it. But in the Matlab Compiler, if there is a syntax error anywhere in a code base that is included in a compilation, it breaks the build and you cannot deploy your code at all. So errors that a programmer may have introduced and not even noticed in interactive testing can bork your entire system.
It's impractical to test this by compiling interactively, because the Matlab Compiler is so darn slow, and most of your developers probably won't have Matlab Compiler licenses anyway; only a few devs who do your builds/releases (I call these "release engineers") will have Compiler licenses, and their time is probably expensive.
It raises a "no license for Matlab Compiler" error and the CI run fails.
What happens now when you try to use the products with Continuous Integration that you would like to see changed?
Thank you for editing the answer to clarify that you're looking to use the Coder products in a continuous integration system.
> Also all components like Coder require a support of MATLABs licensing scheme so that they are usable in CI etc.
Oh, I see what you mean! Yeah, I get tripped up by the fact that I can't run Matlab Compiler in CI. If somebody broke the build, I won't find out until I go to actually do a release. :/
I'm not sure what you mean by "Also all components like Coder require a support of MATLABs licensing scheme so that they are genuinely useful." Could you clarify why you think MATLAB Coder and/or Simulink Coder are not "genuinely useful" right now?
"LSP" = "Language Server Protocol", right? I'd really like to see that too.
I don't know the technical name for it, but being able to call methods, access properties, or index without having to make a new variable first. Kind of like in Python, where you can call a function that outputs an array or whatever and, instead of saving it to a variable first and then indexing, just index right off the end of the function call. I know you can do this for strings and structs, but not for cells or arrays. Also, being able to perform a series of functions on an array, the way you can now with strings.
For example:
[5, 1, 2](2)                      % would return 1
horzcat([3;2;1], [5;6;7])(3,2)    % would return 7
This would also be really helpful for anonymous functions where you can not define a variable at all.
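Until chained indexing exists for arrays, one workaround is to call `subsref` directly with a `substruct` index descriptor. It is verbose, but it works on any expression, including inside anonymous functions. Here it is applied to the two examples above:

```matlab
% Index into an expression's result without a temporary variable.
v = subsref([5, 1, 2], substruct('()', {2}));
m = subsref(horzcat([3;2;1], [5;6;7]), substruct('()', {3, 2}));
```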
Cool!
I've heard this called "chained indexing" or "chained operations" in other contexts.
I'd like a way to enter 2-D matrices interactively more easily. The current way with inputdlg() or input() is not WYSIWYG and is very clunky and non-intuitive (do I put brackets, parentheses, commas, semicolons? no clue!). We need something like
% Pop up a modal dialog box with a 4 by 5 grid (worksheet) where users can enter values:
m = inputmatrix('Enter your values', 4, 5);
Excel has a couple limitations here. It requires an Excel license and installation, doesn't work at all on Linux or Matlab Online, and is difficult to automate on Mac because you don't have COM/ActiveX automation. Not all users or execution environments are going to have that.
@Image Analyst This is the sort of thing that I or another developer could probably whip up with user M-code for current Matlab. Would this be something you'd be interested in if I wrote it up as a library? It'd be Java code, so it'd work now, but Jexit is coming so that's going to be less useful in the near future. And I don't think it'll work for Matlab Online or web-based Matlab presentation contexts, but I don't know how to do custom GUI components using the new web-tech-based Matlab GUI stuff.
Personally, I just use Excel for that. It's the right tool for the job of manually entering data into a spreadsheet. MATLAB can import it easily.
@Sean de Wolski Uh, ok, but virtually no one would know to do that (even I didn't and have never heard of that function). We need an input function similar to what people already know how to use. If you put "input matrix" into the help, openvar does not show up.
x = zeros(4,5)
openvar('x')
?
This wouldn't work in a compiled app, but it works fine in MATLAB.
I would suggest MATLAB learn how to implement Pure Object Oriented Programming from Smalltalk.
Pure OOP embodies the following fundamentals:
- Everything is an object all the time.
- Every operation is through message passing.
Pure OOP enables the following capabilities:
- Environment is running and alive all the time.
- Run everywhere, Inspect everywhere, Debug everywhere, Edit everywhere.
- Entire environment object is saved to disk for fast reload.
Ahh, I didn't know about that "stop at the exception, tweak stuff, and continue" thing. That sounds interesting.
Andrew:
You know Pure OOP very well. Yes, MATLAB has great potential to be Pure OOP and reap its benefits. 1, 2.0, nil, true, false, self, super, are all objects that can respond to messages. Smalltalk's syntax is so simple as to fit on a postcard. When MATLAB encounters an unhandled exception, execution terminates. Smalltalk opens a debugger right at the exception. Offending code or objects can be edited and execution resumed. This can be done on multiple concurrent objects running code in multiple debuggers.
The environment object is also called the image. On "supersave", it stores all Smalltalk objects (including Processes, Stacks, Memory, Windows, etc) and the necessary information to reestablish external interfaces. On reload, the environment is alive at exactly the same state just before the save. In a sense, Smalltalk has never stopped running and has been evolving from one image to another.
As a business, Smalltalk is a failure. But its technology is still useful to copy. Just as Newton's laws will be relevant forever, Pure OOP has that quality too.
Current versions of Smalltalk are Pharo.org (open source) and Cincom Smalltalk (commercial).
I'm curious how that interacts with Matlab's performance characteristics of "fine, object and function operations aren't all that fast, but if you're doing numerics what you care about is vectorized number crunching, and that's all BLAS-ified".
I'm also curious to hear the name of this firm: I have good memories of programming elders talking about how Smalltalk was highly effective, even in commercial contexts, but that's all from like 20-30 years ago.
I know some people in Ottawa who created what was effectively a professional Smalltalk programming company. Did fairly well, and eventually were bought by IBM.
If memory serves, they ultimately found that they could not get enough performance from Smalltalk, even through compilation methods.
I don't know what the group is up to these days, but I do know that one of them is lead on the Eclipse editor.
Now this is an interesting idea!
Let me see if I understand what you’re calling for correctly:
As I see it, Matlab’s basic object and method-call model is already pretty functional-programmy and Smalltalky: now that the primitive types have been largely unified with the MCOS type hierarchy (you can inherit from double now; what more do you want? ;) ) and handle graphics have been revamped to be objects, pretty much everything (except low-level IO state) is an object. And every Matlab operator, including “.”, “{}”, and “()” indexing, maps to a method which can be overridden by a class so it can be intercepted and interpreted arbitrarily. So we’re pretty much message passing here, right? And save() mostly stores arbitrary Matlab objects.
The next step, and what I think you’re calling for here, is for the entire Matlab process’s execution state to be “freezable” and/or addressable as an object. Like, what if we had a supersave() function, which not only saved the contents of the variables in the current workspace, it saved:
- All the workspaces on the M-code function call stack, and the contents of all variables in them.
- The state of the call stack, including what function is being executed, and where the execution pointer is in each frame.
- Contents of all persistent variables in all functions and methods, and all Constant properties in all loaded classes.
Basically, freeze the entire Matlab process in a machine-independent way, right?
This sounds kind of awesome! Instead of walking someone through debugging something over the phone, I could just be like “call supersave() and email me your supermat file!” and now I’m debugging it in my Matlab session.
This also sounds really difficult, because of where Matlab sessions touch non-serializable state:
- Matlab’s low-level IO is unified with C: live file handles are represented by ints which may refer to OS-level resources managed in the C standard library or the like.
- External Interfaces to embedded Java, C/C++, .NET, or Python code.
- Network socket state.
- Etc.
Maybe it’d be possible to establish a boundary for “pure” Matlab code that doesn’t touch these? Is this what “monads” are for?
Personally, I think that the Answers format is particularly well suited to this sort of discussion, because instead of a linear threading model like a regular forum, it allows people to post various suggestions as top-level Answers and to have them be voted on to indicate community interests, and to let each of those suggestions have their own discussion thread hanging off it.
Benjamin: you flagged this as Not Appropriate for MATLAB Answers. However, it is a classic discussion that fits in well, similar to existing questions such as https://www.mathworks.com/matlabcentral/answers/1325-what-is-missing-from-matlab from early 2011.
This would not break backwards compatibility, but something to consider:
A lot of the time, people try to
for x = first:increment:last
with non-integer increment. And then they want to
f(x) = value;
but of course x is non-integer so that fails.
There are standard ways of rewriting this: the common
counter = 1;
for x = first:increment:last
f(counter) = value;
counter = counter + 1;
end
or (far less common, but cleaner, since counter then always equals the number of values processed so far)
counter = 0;
for x = first:increment:last
counter = counter + 1;
f(counter) = value;
end
or the formal and flexible
xvals = first:increment:last;
num_x = numel(xvals);
f = zeros(1, num_x);
for xidx = 1 : num_x
x = xvals(xidx);
f(xidx) = value;
end
But... keeping those counters is a bit of a nuisance, and people get them wrong.
So I would suggest something I have seen in a couple of programming languages: that there be an accessible automatic counter. We could imagine, for example,
for x = 0:.01:2*pi
f(#x) = sin(x.^2 - pi/7);
end
where the #x translates as "the number of x values we have processed so far".
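For comparison, here is the same computation in current MATLAB, first with an explicit integer index (the bookkeeping that #x would hide) and then vectorized, which sidesteps the counter entirely when the loop body is elementwise:

```matlab
% Explicit integer index: the counter #x would replace.
xvals = 0:.01:2*pi;
f = zeros(size(xvals));          % preallocate one slot per x value
for k = 1:numel(xvals)
    f(k) = sin(xvals(k).^2 - pi/7);
end

% Vectorized form: no loop and no counter at all.
fv = sin(xvals.^2 - pi/7);
```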
Indexing a variety of arrays with the same # would be considered valid, so you could write
for x = 0:.01:2*pi
f(#x) = sin(x.^2 - phase(#x));
end
But now we have a question that might lead to some backwards incompatibility: suppose we have
for x = 0:.01:2*pi
y = 0;
for x = 1 : .5 : 5
y = y + z.^(x-1)./gamma(x+1);
end
f(#x) = sin(x.^2 - y);
end
and the question is: in that f(#x) that is after the nested for x, should the #x refer to
- the last index associated with the inner x?
- the index after the last one associated with the inner x?
- the index associated with the outer x?
Consistency with existing nested for loops would say it should be the first of those: at any point, this hypothetical #x should refer to the last for index for variable x that was encountered in the flow of execution, just like the way that sin(x.^2 - y) is going to use the last x value from the for x = 1 : .5 : 5 loop.
I would kind of like such an operator to be associated with the innermost enclosing loop so that in this example the f(#x) would be counting relative to the for x = 0:.01:2*pi loop, but I do admit that it would be confusing to have the #x refer to that loop at the same time that the x itself would be what was left-over for the for x = 1: 0.5 : 5 loop. Also, in a context such as
f = zeros(1,5000);
for x = 0:.01:2*pi
if x.^2 - sin(x) > 1; break; end
f(#x) = acos(x);
end
f(#x+1:end) = [];
then it would make sense for the counter to survive the loop itself, which argues for the status quo of "last value assigned" rather than "according to scope". I think the factors are in tension here.
Now, if we are going to have automatic counters with for loops it might make sense to have automatic counters associated with while loops as well:
x = 0;
while x <= 2*pi && x.^2 - sin(x) < 1
f(#???) = acos(x);
x = x + 0.01;
end
But while loops have no associated variable. So I might suggest
x = 0;
while x <= 2*pi && x.^2 - sin(x) < 1
f(#) = acos(x);
x = x + 0.01;
end
where # by itself is the counter for the innermost enclosing for or while loop. Which would then permit
for x = 0:.01:2*pi
f(#) = sin(x.^2 - phase(#));
end
which is not ambiguous. Now what about nested loops?
for x = 0:.01:2*pi
y = 0;
for x = 1 : .5 : 5
y = y + z.^(x-1)./gamma(x+1);
end
f(#) = sin(x.^2 - y);
end
The innermost enclosing for or while loop would be the outer for x loop... the one the user probably intended in such a context.
With the discussion above about what #x means after the end of a for x loop, this proposed behavior of # by itself would lead to the possibility that at that point, assigning to f(#) would be assigning according to the loop counter for the outer for x, but that assigning f(#x) would be assigning according to the loop counter for the inner for x . That is not ideal for readability, and is likely to lead to confusion.
It seems to me that in some cases, people would want a #x at that point to refer to the outer loop, but people would also sometimes want a #x to refer to the inner for x . It would also not surprise me at all if people wanted both ways at the same time. Of course, if they wanted clarity and readability, they probably should not have used nested for loops with the same variable name !!!
I would end up using this a lot; I often have to restructure my for loops to get an index to go with the actual value.
What are your thoughts on Python's approach to this with the "for i, x in enumerate(xs)" sequence generator and multiple assignment for for loops? I could see Matlab doing something like this. Let's say you have some parallel arrays xs, ys, and zs, you could do:
for [i, x, y, z] = enumerate(xs, ys, zs)
and get an index plus the ith element from each of the input arrays. This might generalize to creating other loop-pass-dependent variables using "generator functions" or the like.
This doesn't help with the while case; I still like the convenience of an implicit # and don't see how to handle that using the "generator function" approach.
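Until something like enumerate() exists, the num2cell trick from elsewhere in this thread can emulate it by stacking an index row on top of the parallel arrays. The sketch below assumes xs, ys, and zs are numeric row vectors of equal length. Stacking them converts everything to a common class:

```matlab
% Emulating the hypothetical "for [i, x, y, z] = enumerate(xs, ys, zs)".
xs = [10 20 30];
ys = [1 2 3];
zs = [0.5 0.25 0.125];
out = zeros(size(xs));
% Each loop pass receives one column of the cell array as a 4x1 cell.
for t = num2cell([1:numel(xs); xs; ys; zs])
    [i, x, y, z] = t{:};   % index plus the ith element of each array
    out(i) = x + y * z;
end
```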
I love the #x idea. +1, that would save me so many lines of code.
MATLAB is intended to make it easy for people to write code, and
for x = 0:.01:2*pi
f(#x) = sin(x.^2 - pi/7);
end
is significantly easier than thinking to initialize a counter and increment the counter in the right place and take care of the counter edge cases.
There is an additional factor to consider, which is that people often want to follow with something like
plot(x, f)
with their intention being that the x refers to the entire span of values that x was assigned in the for loop. I'm not sure there is a reasonable way to handle that. But I can say that I would find it more compact and less thought to write
xvals = 0:.01:2*pi;
for x = xvals
f(#x) = sin(x.^2 - pi/7);
end
plot(xvals, f)
-- that is, the more the formalized tracking burden can be reduced, the better.
However... there is one additional factor that careful programmers take into account that merits some additional thought, which is pre-allocation.
It is tempting to propose that in a loop that uses #, that MATLAB behind-the-scenes does:
- If any destination array being indexed with #var or # did not already exist and MATLAB can deduce the number of elements involved, then MATLAB pre-allocates the array
- MATLAB keeps track of the highest location written to, per dimension
- After the loop, if MATLAB pre-allocated, then it also truncates per-dimension according to the highest written into
This would represent some backwards incompatibility, in that with current for loops and arrays that do not already exist, then inside the loop, size() and whos() currently show the size it has grown to based upon the code flow, whereas under the above proposal, size() and whos() would reflect the size preallocated.
This could potentially still be worked around, if size() and whos() reported based upon the largest dimension written to, with there also being a "shadow" size that reflected preallocation. But I'm sure there would be some complications about that. For example, what happens if you (:) an array with a shadow size that is larger than the consumed size? And does linear indexing work according to the consumed size?
... might be easier to insist that #var could only be used on output with pre-allocated variables.
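For reference, this is the manual preallocate, track, and truncate pattern that such a feature would automate, written against the earlier break example:

```matlab
xvals = 0:.01:2*pi;
f = zeros(1, numel(xvals));   % preallocate for the worst case
n = 0;                        % highest index written so far
for x = xvals
    if x.^2 - sin(x) > 1; break; end
    n = n + 1;
    f(n) = acos(x);
end
f(n+1:end) = [];              % truncate the unused preallocated tail
```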
I think the people who forget to round their indexes to integers, and are the ones who would need #, would be the same people who would never even know/remember that it exists as an option, and would not even know how or when to use it.
Parallel array iteration!
Let's say I've got some arrays in variables x, y, and z, with the same number of columns.
I'd like to be able to say this:
for (x_i, y_i, z_i) = (x, y, z)
% ... do stuff ...
end
Instead of this:
for i = 1:size(x,2)
[x_i, y_i, z_i] = deal(x(:,i), y(:,i), z(:,i));
% ... do stuff ...
end
Oh, that num2cell thing is a nice trick!
If you need them to be separate, yes. You could also abbreviate the above, however, to
for t = num2cell( [x; y; z] )
[x_i, y_i, z_i] = t{:};
% ... do stuff ...
end
I don't see how that helps? You still have the inconvenience of splitting out the rows of xyz into separate variables at the start of each loop pass, don't you? Like:
for xyz = [x; y; z]
[x_i, y_i, z_i] = deal(xyz(1), xyz(2), xyz(3));
% ... do stuff ...
end
You could do this.
for xyz=[x;y;z]
...
end
Convenience thing:
- The fieldnames function should return a string row vector, not a cellstr column vector, so you can loop over struct fields with for fld = fieldnames(s) instead of for fld = string(fieldnames(s)'), which is uglier.
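To illustrate the gotcha behind this request: a for loop iterates over columns, so looping directly over the cellstr column returned by fieldnames gives a single pass whose loop variable is the whole 3x1 cell. Here is a small demonstration with a made-up struct s:

```matlab
s = struct('alpha', 1, 'beta', 2, 'gamma', 3);

% Looping over the 3x1 cellstr runs only once; fld is the entire column.
nPasses = 0;
for fld = fieldnames(s)
    nPasses = nPasses + 1;
end

% Current idiom: convert to a string row vector first.
total = 0;
for fld = string(fieldnames(s))'
    total = total + s.(fld);   % dynamic field access with a string scalar
end
```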
Alternatively, just return all 1D results into a column (or a row if you insist) and change how for loops work, or [maybe less desirable] have an un-oriented 1D array?
... Or there could be some kind of reshape-to-row operator similar to the (:) reshape-to-column operator... along with a slightly different interpretation of (:)
Currently reshape-to-column is considered an indexing expression, so you cannot use
f(x)(:)
but you can use
f(x).'
The fact that (:) is considered an indexing expression has consequences for complex arrays whose imaginary part is 0: it is required to drop the zero imaginary part, whereas reshape() does not drop it.
Currently the model of MATLAB is that it always evaluates from left to right [*] finding the left-most unprocessed sub-expression and evaluating it, and then finding and evaluating the right hand side operand, and then performing the operation. The right operand is not processed until the left is evaluated, but unless the left operand results in an error, or the operation is && or || the right will always be evaluated.
[*] exception: there are some funky things with chains of ^ and .^ operators; they are not evaluated strictly left to right.
This behavior prevents there from being function forms of if/else operations -- there is no equivalent to C's ?: operation. In C, the unselected operation is not evaluated at all.
The hack work-arounds require embedding the work to be done inside an anonymous function and writing a function like
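function varargout = ifelse(expr, basepart, elsepart)
% Return basepart if expr is true, otherwise elsepart. If the selected
% part is a function handle it is called, so its work is deferred;
% the unselected part is never evaluated.
if expr
    selected = basepart;
else
    selected = elsepart;
end
if isa(selected, 'function_handle')
    [varargout{1:max(nargout, 1)}] = selected();
else
    varargout{1} = selected;
end
end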
and using that gets ugly... and probably messes up multiple output processing.
Piecewise(x ~= 0, 0, 1./x)
can't be done and would have to look like
Piecewise(x ~= 0, 0, @(x)1./x)
I would like to see a cleaner way of handling this -- one in which the function being called does not need to know that a delayed evaluation is being done.
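A compact anonymous-function variant of the same hack uses the common trick of indexing a cell array of function handles with subsref and calling only the selected one. As with ifelse, the caller still has to wrap both branches in @() by hand, which is exactly the limitation being discussed. lazyIf is a hypothetical name:

```matlab
% Select a handle by indexing {elseFcn, thenFcn}, then call only that one.
lazyIf = @(cond, thenFcn, elseFcn) ...
    feval(subsref({elseFcn, thenFcn}, substruct('{}', {double(logical(cond)) + 1})));

x = 0;
r = lazyIf(x ~= 0, @() 1 / x, @() NaN);               % 1/x is never computed
s = lazyIf(true, @() 42, @() error('not evaluated'));  % no error is raised
```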
In the Maple programming language, there are two related mechanisms available. First, there is a simple syntax to delay evaluation. This is indicated by using ' ' around the expression. For example,
Piecewise(x <> 0, 0, '1/x')
In Maple, this is not a quoted string: Maple uses double-quotes for strings. Instead it is a delayed evaluation. Each time the relevant expression is evaluated, one level of unevaluation is removed; when it is eventually evaluated in a context where there are no remaining protective uneval() levels, then the expression is evaluated.
Secondly, Maple allows procedures (that is, functions) to declare a parameter as being of type "uneval", which has the effect of adding a layer of uneval around what is passed in. For example,
Piecewise := proc(x, basepart::uneval, elsepart::uneval) #stuff; end proc;
would permit users to code
Piecewise(x <> 0, 0, 1/x)
and the 1/x will not be evaluated before being passed in to the procedure.
Some programming languages deal with these kinds of issues by using "lazy evaluation". Something like
Piecewise(x <> 0, 0, 1/x)
would not evaluate any of the parameters until such time as the code inside Piecewise asked for their value -- so if the code logic did not ask for the value of a particular parameter, it would never be evaluated.
If I understand correctly, tall arrays already do some delayed evaluation, building up expressions and then internally finding ways to reduce the passes over the data when gather() finally triggers evaluation.
Ooh, yeah. I'd also like lazy evaluation in some cases, especially in the context of a ?: ternary operator.
I use your ifelse function hack a lot, and it's not very satisfactory because it doesn't short-circuit.
Besides making it easier to handle exceptional cases, having a method of delaying evaluation is quite important in symbolic processing.
For example, suppose I have
int(a*sin(theta)^4 + b*cos(theta)^4 + f(theta), theta, 0, 2*pi)
and suppose that f(theta) is expensive to attempt to integrate, and suppose that in its current form the expression does not have a closed form, so after struggling for a long time to find the integral, int() is going to return the int() form unevaluated.
Now suppose that I want to rewrite a*sin(theta)^4 + b*cos(theta)^4 as something like (a-b)*sin(theta)^4 + b*sin(theta)^4 + b*cos(theta)^4 and then group to (a-b)*sin(theta)^4 + b*(sin(theta)^4 + cos(theta)^4), which would be (a-b)*sin(theta)^4 + b*(1 - 2*sin(theta)^2*cos(theta)^2), since sin(theta)^4 + cos(theta)^4 = 1 - 2*sin(theta)^2*cos(theta)^2.
At present, once I have made int() struggle to evaluate the integral, knowing it will not succeed, I can get back the unevaluated symbolic integral into a variable, and then I can start using findSymType() and mapSymType() to manipulate children() of the integral. And I might know full well that the result is not expected to converge either (or at least is not likely to); perhaps I have more processing steps to do afterwards. So how do I do that? If I ask MATLAB to evaluate int() of the revised expression, expecting that it will decide it cannot integrate and eventually return the unevaluated int(), then that takes a lot of time. I need to be able to tell MATLAB that I have here an expression that I do not want to have evaluated just yet.
Recently, MATLAB introduced a mechanism that does help with this process: there is now the "Hold" option for int(), so I could write int(expression, variable, lower, upper, 'Hold', true). And in the specific case of int() that does help for sure...
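Here is a minimal sketch of the 'Hold' round trip. It assumes Symbolic Math Toolbox, R2019b or later. The integrand is the a/b part of the example above without the expensive f(theta), so the final release() is cheap.

```matlab
syms a b theta
integrand = a*sin(theta)^4 + b*cos(theta)^4;
F = int(integrand, theta, 0, 2*pi, 'Hold', true);  % stays as an unevaluated int(...)
% ... rewrite or inspect F here without triggering integration ...
G = release(F);   % now MATLAB actually evaluates the integral
```

Over a full period, each of sin(theta)^4 and cos(theta)^4 integrates to 3*pi/4, so G should equal 3*pi*(a + b)/4.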
But... eventually I want to matlabFunction() the integral, expecting it to produce a call to integral() inside an anonymous function. And matlabFunction() cannot process the "hold" option. matlabFunction() also cannot process vpaintegral() calls. So... I have to release() the hold on the integral and let it struggle to find a closed form (which could literally take days before it gives up), just to have the unevaluated int() in a form that matlabFunction() is willing to process.
When you are implementing any kind of rewriting rule, such as trig identities or hypergeometric identities, or implementing differentiation rules, you need to be able to examine and change the form of a symbolic expression without continually triggering evaluation. The primary mechanism for that is careful use of unevaluation.
My wish list:
(1) Colon operator produces column vectors, not row vectors:
x=1:4
x = 4×1
1
2
3
4
(2) Optimization Toolbox solvers should have only one algorithm per solver, i.e., instead of,
x1=lsqnonlin(fun,x0,lb,ub, optimoptions(@lsqnonlin,'Algorithm','levenberg-marquardt'))
x2=fminunc(fun,x0, optimoptions(@fminunc,'Algorithm','trust-region'))
we would just have
x1=lsqnonlinLevMarq(fun,x0,lb,ub)
x2=fminuncTrustReg(fun,x0)
etc...
(3) The Image Processing and the Computer Vision Toolboxes would be designed around the coordinate conventions of ndgrid() instead of meshgrid().
(4) One-dimensional array types, i.e., with ndims(X)=1.
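Something close to wish (2) can already be approximated with thin wrappers that fix the algorithm choice. The sketch below assumes Optimization Toolbox. The name lsqnonlinLevMarq is hypothetical, the bounds are left empty, and the residual function is a toy example.

```matlab
lmOptions = optimoptions(@lsqnonlin, 'Algorithm', 'levenberg-marquardt', 'Display', 'off');
% Hypothetical single-algorithm wrapper: always Levenberg-Marquardt, no bounds.
lsqnonlinLevMarq = @(fun, x0) lsqnonlin(fun, x0, [], [], lmOptions);

residuals = @(x) [x(1) - 1; x(2) + 2];   % zero at x = [1; -2]
xFit = lsqnonlinLevMarq(residuals, [0; 0]);
```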
I agree anything that returns a 1D array of something should default to returning it in a column vector, or at least pick an orientation! Some functions return columns, some return rows. If you don't want to break the for loop, an alternative might be to add a "for..in" construct that behaves like Matt proposes, going through every element in linear index order.
The ndgrid vs meshgrid one is interesting... wouldn't that mean that for images we would want to index columns first, then rows? If I haven't got it mixed up, that seems at odds with the previous sentiment about wanting to default to columns?
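For anyone comparing the two conventions: meshgrid(x, y) runs the first coordinate along the columns, the usual image x/y layout. ndgrid runs its first input along the first array dimension, which is the rows. In 2-D the outputs are transposes of each other.

```matlab
[Xm, Ym] = meshgrid(1:3, 1:2);   % 2-by-3: x varies along columns, y along rows
[Xn, Yn] = ndgrid(1:3, 1:2);     % 3-by-2: first input varies along dimension 1 (rows)
```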
@Paul I don't think the dot operator is necessary in your example. Just transpose.
Only one character, but for something I type A LOT, it's syntax conveniences like this that matter. Also, for anyone else reading, you can capture a colon-operator list and transpose it to a column. Not as convenient, but maybe helpful.
for v = x(:)'
end
y = [[1:5]' [3:7]']
FYI there's an old discussion on comp.soft-sys.matlab from shortly before I started at MathWorks where Cleve stated:
I will admit to:
a) "Inventing" this "clever" feature in the original MATLAB, and
b) Never using it for anything useful.
I've often (well, not really all that often) wondered why the for statement was designed to loop over columns from the very beginning. I think I might have used that feature once to loop over eigenvectors.
OTOH, it's probably more common to have to do things like
for v = x(:).'
when one needs to loop over elements of a vector and it's unknown if the vector is column or row, e.g., due to user input or something.
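The x(:).' idiom works whether x is a row, a column, or a matrix, because x(:) always produces a column in linear-index order. A quick check:

```matlab
x = [1 2; 3 4];
visited = zeros(1, 0);
for v = x(:).'              % x(:) is a column in linear (column-major) order; .' makes it a row
    visited(end + 1) = v;   % one iteration per element, whatever the shape of x
end
```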
Gotcha.
I think I'd be fine with for looping over elements instead of columns; I don't know that I've ever (intentionally) used the looping-over-columns behavior. But maybe that just means that I don't know what it's used for.
Why do you want column vectors instead of row vectors for the colon operator?
In linear algebra (which Matlab is designed around), the convention is to work more with column vectors than row vectors. I always find myself cursing under my breath whenever I have to type x=(1:N).'. I would settle for a new operator, though, e.g. x=1::N
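For comparison, this is what the colon operator returns today, along with the transpose idiom needed to get a column:

```matlab
N = 4;
r = 1:N;        % the colon operator returns a 1-by-N row
c = (1:N).';    % current idiom for a column: non-conjugate transpose of the range
```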
And the for loop iterates over columns of the array, so for i = 1:100 would now only do one pass. I think this use of a for loop to iterate over numbers in a range is a common use case.
Walter said the same thing, but since we're saying we don't care about back compatibility, that could be abandoned as well.
I think I would also like 1-dimensional array types. Having to deal with vectors as a special degenerate case of 2-d arrays is a bit of a hassle, IMHO, and can cause edge cases in interface definitions.
Why do you want column vectors instead of row vectors for the colon operator?
Row vectors seem easier to read, because they produce more compact, single-line output when displayed at the command window.
And the for loop iterates over columns of the array, so for i = 1:100 would now only do one pass. I think this use of a for loop to iterate over numbers in a range is a common use case.
I have used the feature a couple of times. Not often.
They can iterate over rows or over the individual entries of the input matrix in linear indexing order. I don't care. I don't think I've ever used that feature in 25 years of using Matlab.
Matt, if colon operators produces column vectors, then how would you deal with the fact that at present, for loops iterate over columns?
for i = [1 2 3; 4 5 6]
disp(i)
end
1
4
2
5
3
6
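This is why the colon change would interact with for loops. The loop variable takes one column per iteration, so a loop over a column vector runs exactly once, with the whole column as its value. A quick check:

```matlab
passes = 0;
for c = (1:4).'     % a 4-by-1 column: the loop sees a single column
    passes = passes + 1;
    lastValue = c;
end
```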
A complete list of changes for each command.
Currently we find "introduced in Rxy" already, but modifications of inputs and outputs are very useful also. Examples: When did unique introduce the 'legacy' flag? When did strncmp change the behaviour for empty strings and n=0?
This would be useful.
No reason to wait until MATLAB X to start doing it though; MathWorks could add a per-function/class Changelog to the doco any time, I think!
Most important
- Start indexing from 0
- Redo package system
- Improve the class system
- Improve language a bit (like value += delta)
A C programmer uses 0-based indexing because they care about "how many offsets from the start address is this thing", while a MATLAB programmer uses 1-based indexing because they care about "which element is it". Indexing from 0 only makes sense when you care deeply about memory, and most MATLAB users do not and should not. (In my opinion, C# and Java should index from 1, since they hide the concept of memory addresses to some extent.)
Just cracked open my nonlinear continuum textbook and am greeted with a glut of things like:
that are based on 1-based indexing. Point being that switching to zero-based indexing will just cause pain for people in technical fields that are one-based. To me, seeing means an initial stress state, while means the maximum principal stress. Two entirely different concepts.
Going along with this exercise, instead of changing existing functionality, I'd rather see an extension to functionality -- maybe allowing specifying whether an array/vector is a one-based or zero-based array:
x = zeros(10,1); % A standard 1-based indices array
y = zeros(10,1,"startIndex", 0); % A 0-based indices array
So for something like getting an array of Legendre basis polynomials (which are 0-based indexing) instead of:
P = legendreBasisArray(3, sym("x"))
P =
disp(P(1)) % Display the constant Legendre polynomial, which is usually referred to as P_0(x)
1
function P = legendreBasisArray(p,variate)
P = sym(zeros(p+1,1));
for n = 0 : p
P(n+1) = 1/((2^n)*factorial(n))*diff((variate^2-1)^n,variate,n);
end
P = simplify(P);
end
You'd instead do something like:
P = legendreBasisArray(3, sym("x"))
disp(P(0))
function P = legendreBasisArray(p,variate)
P = sym(zeros(p+1,1), "startIndex", 0);
for n = 0 : p
P(n) = 1/((2^n)*factorial(n))*diff((variate^2-1)^n,variate,n);
end
P = simplify(P);
end
The challenge here, though, would be that now I'd have to remember that P is zero-based, and after a few years and 100k more lines of code, whoever takes over will probably be very confused. Personally, I don't think it's worth the effort and would rather just figure out once whether I need to use n+1 or n in my loops.
I have not yet seen a problem that could be solved with 0-indexing but not with 1-indexing, or vice versa. I tell the children I teach Matlab and C to say it out loud:
- Matlab: "x(1) is the first element of x"
- C: "x[0] is the contents found 0 elements past the pointer x"
- Say it loud until you feel it. If you can feel it, it is programming, and not a crossword puzzle anymore.
It is not only about whether the index starts at 0 or 1, but about a fundamental concept: you see this in C once you understand that x[i], *(x + i) and i[x] are the same.
Therefore I consider the discussion about the base of the indexing as equivalent to:
- German: DAS Auto (neutrum)
- French: LA voiture (femininum)
There is no reason to argue about the grammatical gender of cars. Just use the language to express what you want to say.
I have seen programmers get all hot about how starting indexing at 0 leads to better efficiency for indexing using pointers, since it uses one less subtraction per dimension... but those same programmers often see nothing wrong with implementing arrays with 2 or more dimensions as being structured as (N-1) layers of vector of pointers that you have to consult to get to the actual data, rather than using blocks of linear memory.
It's not just Maple: Fortran, on which Matlab is built, defaults to array indexing starting at 1, but allows it to be customized on a per-array basis (tied to array objects, not to the context of code which is addressing them, I'm pretty sure) so you can index from 0 or 42 or whatever.
I think this is a bad idea that will just lead to confusion and higher code complexity, for no actual benefit that I can see.
If you really want to customize indexing for an array in Matlab, you can do that now by creating a classdef object and overriding its subsref/subsasgn methods.
Another way to handle mixed indexing might be to have a different file/type suffix. Like arrays in file foo.m2 uses 0-based index, and arrays in bar.m uses old 1-based. And matlab translates automatically if functions in one calls functions in the other. Then you could also add the other new stuff in the new file type.
Again, indexing happens so much that breaking backwards compatibility on indexing is probably too high of a cost. But it might be reasonable to add an option indicating which index base to use.
Or... if arrays could be marked about which indexing form is preferred, then we could ask what it would look like if the functions such as max() returned indexing using the same base as the input.
At first glance that sounds plausible. It does, though, lead to the question about which indexing should be marked for the results of calculations, especially calculations between mixed arrays. I suspect that would have to work something like "if either of the operands is 0-based then return 0-based". If that were done, then
A = rand(3,5,'index0')
B = A + 1
would see the 1 (which would be index1) and the index0 of A and would produce a result that would be index0 . That's probably reasonable.
Though I do not understand why people would consider zero-based indexing such a priority that they would consider breaking decades of software development just to get it. 0-based indexing makes pointer use a bit easier: multiply the object size by the index, and the result is the offset relative to the pointer. But MATLAB doesn't have pointers.
Yes, 0-based indexing does allow you to simplify calculating linear indexing, slightly -- A(J,K) is currently A((K-1)*#rows + J) and with 0-based indexing it would simplify to A(K*#rows + J) . But really, is that something that is done a lot at the user level?
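The arithmetic can be checked against sub2ind, which implements the 1-based formula. The 0-based linear index of the same element is always exactly one less. The matrix below is an arbitrary example.

```matlab
A = reshape(1:12, 3, 4);          % 3 rows, 4 columns, stored column-major
J = 2; K = 3;                     % 1-based row and column
nRows = size(A, 1);
lin1 = (K - 1)*nRows + J;         % 1-based linear index of A(J,K)
lin0 = (K - 1)*nRows + (J - 1);   % 0-based: k*nRows + j with k = K-1, j = J-1
```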
Like if the purpose were to make MATLAB indexing compatible with C / C++, then why concentrate on it being zero-based: why not get all up about the fact that MATLAB is column-major order when C and C++ are row-major order?
It is not only about how to access an array you have. It is also about indices returned from functions: with [M, K] = max(x), K would then have to be a zero-based index. But I feel this is mostly an academic discussion, as something like this is very unlikely to ever happen in Matlab.
One other important thing I would like to see in "Matlab X", and one that is backward compatible, is the return of the old standard menus in the GUI instead of the horrible Office-style ribbon. I will never like it or get used to it.
The Maple programming language allows some kinds of arrays to be declared with arbitrary integer bounds. You can declare those kinds of arrays as being indexed from (for example) -10..10, 1000..1999 . But it has two styles of indexing, one of which pays attention to the declared bounds, and the other of which is 1-based indexing.
the premise was "one-time breaking change that abandons back-compatibility", so why not
Because people have existing code -- a lot of existing code. Upgrading to the new facilities cannot be too hard or else people are not going to use the new language. So something like the idea posted about not allowing a statement immediately after a condition on an if or while would probably be feasible: it's the sort of thing that a tool could rewrite automatically for most cases. It would not catch cases where character vectors are constructed and eval() or evalc() or evalin(), but the fraction of programs that do that in an incompatible way is relatively small.
But 1-based indexing is all over the place, and it is common for indices to be calculated.
It would therefore make more sense to leave in 1-based indexing with () indexing -- but to add support for a different indexing base using a different syntax.
I was about to suggest using [] for the alternate indexing, on the grounds that it is currently not legal to use [] immediately after a variable. But then I got caught in the question of list building and cell building, and what happens if you have something of the form
A = rand(1,5);
[A (1:3)]
and whether that is treated as two expressions or as subscripting. The answer is it is treated as two expressions... and that means it should not be a problem to define [] indexing that had a difference between
[A [1:3]]
[A[1:3]]
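The current parsing rule is easy to check. Inside square brackets, whitespace before a parenthesis starts a new element, while a parenthesis written directly after the name indexes into it.

```matlab
A = 10:14;
B = [A (1:3)];   % the space makes (1:3) a separate element: concatenation
C = [A(1:3)];    % no space: ordinary indexing of A
```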
Well the premise was "one-time breaking change that abandons back-compatibility", so why not. Mixed indexing sounds even worse because it makes it harder to read and need more context to understand. I don't see any problem with linear indexing, A(0) = A(0,0) = first element.
Starting all array indexing at 0 would require that a lot of code be changed. Even just simple,
B = A(1:4,:)
in existing code would have to be rewritten. Subscripts like that occur a lot .
Ideas such as allowing comments to start with # do not affect any existing valid code. Ideas like changing object properties from character vectors to strings require only limited rewriting to upgrade old code. But 1-based indexing has been fundamental, and changing it would require touching the majority of code.
I would suggest that instead of having all array indexing start at 0, that instead the default would stay with 1, but that ways were provided to create arrays that used 0-based indexing (more generally, the indexing base could potentially be set to any finite integer.)
Permitting the array indexing for any particular object to vary at run-time probably makes some kinds of flow analysis more difficult.
Linear indexing would take a hit for objects with unusual indexing. With one-based indexing, A(1) is the same as A(1,1) which is the first element of A, and A(1) is simultaneously linear indexing of A as well as the abbreviation for A(1,1) . But with 0-based indexing, A(1) is... what? The second element of A if A(1) is considered the abbreviation for A(1,0), but the first element of A if A(1) is considered linear indexing.
Linear indexing is far too useful to just discard. The code to get "the first element of A, no matter what dimensions and base of indexing" ought not to be complicated -- not like A(bounds(A,1,'lower'))
What would you like to see changed about the package system? (Are you referring to the +blah directories that create namespaces for classes and functions? Or how code is packaged using Matlab Toolboxes or Projects?)
Yes but I don't think Mathworks cares about any of this.
One example of improving the class system: remove the constructor name from the file, as it is already given by the filename (e.g. function this = _ctor(...)). Information should not be duplicated, and this would make refactoring easier. In the same way, the top function in a file should not be named in the file, because Matlab uses the filename anyway.
"Redo" and "improve" do not give us any information about what needs to change :(
Oh, here's one!
- Comments can begin with "#" in addition to "%".
This would enable Octave compatibility. But I think that might be to MathWorks's benefit: it would enable you to easily take existing Octave code and migrate your workloads to Matlab, which is the direction that MathWorks would like people to move.
Also enables use of "shebang" lines on Unix, so you could easily create executable commands as Matlab scripts.
> It's easy enough to do a find/replace for # -> %
How would one do this easily, without mangling # characters that are inside strings or comments?
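The sketch below is conservative and avoids the string problem. It only changes lines whose first non-blank character is '#'. Trailing '# ...' comments are left alone for manual review, because telling them apart from a '#' inside a quoted string needs a real tokenizer. A "#!" shebang line would also be converted, so it would need separate handling.

```matlab
src = { ...
    '# full-line comment', ...
    '  #{ block comment start', ...
    'x = 1;  # trailing comment (left alone)', ...
    's = ''#notacomment'';'};
% Rewrite '#' only when it is the first non-blank character on the line.
out = regexprep(src, '^(\s*)#', '$1%');
```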
> I don't believe this should be done for the sake of Octave users.
It is more for the sake of Matlab users that wish to take advantage of existing Octave code.
> Octave should conform to Matlab syntax, not the other way around.
This is likely never happening here, because there's too much Octave code out there using "#" comments, and Octave coders like "#" comments.
While the shebang functionality is a good reason for this, I don't believe this should be done for the sake of Octave users. It's easy enough to do a find/replace for # -> % that anyone who wants to convert from Octave to Matlab can with relative ease.
But it's Octave that makes the claim: "There are still a number of differences between Octave and Matlab, however in general differences between the two are considered as bugs." Continuing:
- Furthermore, Octave adds a few syntactical extensions to Matlab that might cause some issues when exchanging files between Matlab and Octave users.
Octave should conform to Matlab syntax, not the other way around. Thus I'm only interested in # from a shebang perspective.
I also see a fair number of people trying to use // comments
A possibly radical one:
Semicolons are no longer needed to suppress display of a statement's result. Instead, output is suppressed by default, and if you do want it displayed, you append a "!" (or something else) to the end of the statement. Semicolons are now just statement separators, and you can omit them in most places with no effect.
Maybe this should apply only to function and classdef files, and statement result display is on by default in script files, and you still suppress its display by appending a ";" there.
By your logic, we might as well add every function attribute that we can possibly think of
[[NeverOnASunday]] [[HighTide]] [[NoSiciliansWhenDeathIsOnTheLine]] [[Alignment=4096]] [[SpellMyNameWithAnS]] [[IFeelOrangeToday]] [[KOI8FilesOnly]] [[NoThreads]] [[NeverGonnaGiveYouUp]] [[OddResults]] [[BaseIndex=0]] [[BirbImagesOnly]]
@Andrew Janke that is a useful mixin / util. I'll promptly be adding it to our code. Simple idea. I just haven't thought about no-discard until this conversation. I got the idea from C++
Sure, if displaying an output is a side effect that you want, you don't have to use nodiscard. The nice thing about function attributes (as a feature idea) is that you wouldn't have to use them.
Your last example raises a good point. I would like to say that yes, that would count as discarding. But in Matlab we need to accommodate the case where you want the 3rd output but not the other two. So I would say no, that does not count as discarding.
Another useful attribute would be [[suppress_output]], so I can prevent contributors from causing output by accidentally deleting a semicolon. Also [[const]] on object methods, to guarantee that the property state of a handle object isn't going to change. But that might be an entirely new can of worms.
Hmmm... I think that's a NO from me; that would not be nice. My belief is that if a function can output something and does not deliberately avoid output when nargout == 0, then the proper response from MATLAB is to display the output, not to complain that the values are not being assigned to variables.
For example, I think that the proper response to calling setvartype() without an assignment is not an error, but a warning flagged by mlint.
If, hypothetically, a new assignment operator were created that allowed the user to manage
A = object_of_class_B
inside class B, something along the lines of
function newobj = assign(obj, newobj) %obj being the object of the class
then yes in such a case it might be important to verify that newobj really was present, since it could make a difference in how you manage reference counts or whatever. But shy of such a new operator... No, I think the semantics you propose is not good design.
Question: if [[nodiscard]] were implemented, then would
[~, ~, ~] = function_call(parameters)
be counted as "discarding" the outputs in violation of the proposed [[nodiscard=3]] ?
@Walter Roberson sure. But wouldn't it be nice if, instead of writing all the code to check nargout and exit with an error, you could just type [[nodiscard]] or, for even more control, [[nodiscard=3]]?
Checking nargout and raising an error if the output is not being assigned is already done by some of the functions that process Import Options, such as setvartype().
It seems easy enough to implement the [[nodiscard]] behavior using regular M-code constructs with current Matlab.
Add this function to your library:
function mustCaptureOutput(callerNargout)
if callerNargout == 0
error('You must capture the output of this function!')
% ... and use fancy dbstack() parsing techniques if you want
% to automatically include the name of the called function in the
% error message ...
end
end
Then in your functions that want their outputs to be captured, you can say:
function outYouReallyWant = myFunc(x, y, z)
mustCaptureOutput(nargout);
% ... do stuff ...
end
I can not think of a way to deal with auto-display on omitted semicolons in this manner; that's a different language mechanism.
> How about function attributes [[nodiscard]] (Error if this function is called without output arguments) and [[nodisp]] supress all print statements. Possibly other useful traits
Personally, I'm hesitant about this. This is probably a matter of taste and judgment, but in my view this seems like a substantial increase in complexity of Matlab code definitions, which is not great for the typical Matlab user. Especially because it makes behavior of a given line of code more context/scope dependent. Seems like this would make M-code start looking like modern Java, and I'd be plastering [[nodisp]] over every single function/method I write. That doesn't seem ideal to me, especially if your software design style involves factoring things into lots of small functions or OOP methods.
I think that I would actually enjoy being able to write Matlab code that used function/method annotations or decorators (like Java and recent Python do), but I suspect 99% of my fellow Matlab coders would hate me for doing so, and have trouble working with that code.
I would rather have more uniform code behavior and live with the inconvenience of hunting down missing semicolons, I think.
How about function attributes [[nodiscard]] (error if this function is called without output arguments) and [[nodisp]] (suppress all print statements)? Possibly other useful traits too.
Thanks @Loren Shure! That actually looks totally like I was looking for; I guess I'm just not up to speed on how the Matlab Code Analyzer works these days. In particular, it looks like one could easily use that to do "show me all the missing semicolons in this code base" by building on checkcode calls. This will be useful to me.
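Here is a sketch of a "find the missing semicolons" report built on checkcode with the '-id' option. It filters on the NOPRT inspection identifier, the same one that a %#ok<NOPRT> pragma suppresses. The demo file is hypothetical and written to a temporary location. To scan a whole code base, the same filter would be applied to each file.

```matlab
% Write a tiny function with one unsuppressed assignment to a temporary file.
tmp = [tempname '.m'];
fid = fopen(tmp, 'w');
fprintf(fid, 'function y = demoNoSemicolon(x)\ny = x + 1\nend\n');
fclose(fid);

% '-id' adds message identifiers; NOPRT is the "missing semicolon" inspection.
info = checkcode(tmp, '-id');
isNoprt = strcmp({info.id}, 'NOPRT');
noprtLines = [info(isNoprt).line];
delete(tmp);
```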
FYI, only in reference to creating reports from the Code Analyzer, please see this page of the doc! Glad you are talking about what you would like to see changed...
--loren
> Not when you are working at the command line. Or in scripts.
Hmmmmmm. At work, I do a fair amount of command line work, and we've got a good collection of scripts from our users. In our scripts and command line usage, it seems a lot more common to do small "dumb" modifications like for i = 1:numel(x); x(i) = some_simple_operation(x(i)); end where you'd want the output suppressed, or do things like:
obj = MyBigObject;
obj.Property1 = 'something';
obj.Property2 = 420;
obj.Property3 = 69
where you wouldn't want to get a big 40-line dump of the new obj state on every property setting operation.
I'd be interested in seeing some objective analysis of this. Maybe someone with some ANTLR skill could whip up a little analyzer to scan a code base (of scripts), take a look at the number of lines which have suppressible output, and count up the number and proportion where they were actually suppressed?
I may be biased here; I'm working in a large-ish enterprise-y codebase, and all our stuff operates in that context.
I think suppressing output display for statements is by far the more common case.
Not when you are working at the command line. Or in scripts.
Basically, people who write scripts usually expect output for each statement unless they turn the output off. And a lot of people use MATLAB in mostly-interactive mode instead of writing functions (or only writing functions from time to time.)
I like how it's done in Julia, where if you're in the REPL and enter a single line of code without a semicolon:
(Julia REPL startup banner: Version 1.5.4 (2021-03-11), Official https://julialang.org/ release)
julia> x = abs(-2)
2
julia> x = abs(-2);
julia>
it prints output. But within functions it doesn't require semicolons to suppress output:
julia> function hypot(x,y)
x = abs(x)
y = abs(y)
if x > y
r = y/x
return x*sqrt(1+r*r)
end
if y == 0
return zero(x)
end
r = x/y
return y*sqrt(1+r*r)
end
hypot (generic function with 1 method)
julia> d = hypot(-2,1)
2.23606797749979
My thinking here is this: If there's two ways to do something, and one of the ways is much more common and what you usually want, then that way should be the default way and take less work to accomplish, and the less-common way should be the way that requires some extra work or indicator.
I think suppressing output display for statements is by far the more common case. Even the Matlab Editor seems to agree with me! If you omit a semicolon, it'll flag a code inspection, and if that's really what you want, you have to add an inspection-suppressing %#ok<NOPRT> pragma to quiet it down. (I like having code that is "code-inspection-clean".)
And sometimes I run in to the problem that somewhere in a large code base, somebody omitted a semicolon somewhere, and now my program is spitting out lots of undesired output. It is pretty difficult to search for a missing semicolon in a Matlab codebase! Hard to write a grep or regex pattern that will detect it. Because there are lots of non-statement code constructs that are lines that don't end in semicolons but also don't produce output (like classdef Foo, if <something>, for i = 1:10, etc etc), you can't just search for non-blank lines that don't end in semicolons. And there's no M-lint report generator any more AFAIK, so you can't try looking for that NOPRT inspection across a large body of code.
Searching a code base for exclamation marks would be pretty easy, since they have little other syntactic role in Matlab code. (Not much code is probably using the !some shell command syntax, and if it is, you can just search for "! that isn't at the beginning of a line", which is easy enough.) So it'd be easier to find where unexpected/undesired output is coming from.
That's a good point. Whatever is done should work uniformly; no different behavior in classdef/function files vs scripts or on the command line.
I would expect this would be how to do it:
a = 3!, b = 5!
Commas separate statements; output suppression is a per-statement property, so each statement should get its own output suppression control thingie.
I am not in favour of anything that would lead to differences between pasting code at the command line (script) and having the exact same code inside a function or classdef file.
Suppose you are debugging and you have stopped at a statement, and you wish to examine the results of the statement without formally executing it. Remember that if you formally execute a statement inside a function, you cannot tell MATLAB to reposition its notion of the "current" execution point. You might be expecting errors, perhaps, or you might need to explore possibilities for whatever reason.
So you are stopped at a statement, you copy the statement out using copy-and-paste, you paste the code in at the command line... and it gives you a different result than would be the case for telling MATLAB to go ahead and execute the statement. Because the command line is "script" environment and you defined scripts and functions to execute differently.
Maybe this should apply only to function and classdef files
My remark applies to the idea that "Semicolons are now just statement separators, and you can omit them in most places with no effect." The implication is that syntax that would be valid inside function files would not be valid inside scripts or at the command line...
if you do want it displayed, you append a "!" (or something else) to the end of the statement
At the moment, the comma statement separator acts as a separator and does not suppress output. Would
a = 3, b = 5
get transformed into
a = 3! b = 5!
or would it get transformed into
a = 3!, b = 5!
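For reference, this is how today's rules treat the example. A comma separates statements without suppressing output, and a semicolon suppresses only the statement it terminates. The output is captured with evalc so that it can be checked.

```matlab
% Capture what the command window would show for each separator style.
shownComma = evalc('a = 3, b = 5');    % comma only: both results are displayed
shownSemi  = evalc('a = 3, b = 5;');   % trailing semicolon suppresses only b
```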
I'm not sure I see the issue? In command syntax currently, trailing semicolons are treated specially. The trailing "!" or whatever could be similarly treated specially in this alternate behavior.
function out = fooblah(in)
if nargin > 0
fprintf('in="%s"\n', in);
end
out = 420.69;
end
Output:
>> fooblah
ans =
420.6900
>> fooblah x
in="x"
ans =
420.6900
>> fooblah x;
in="x"
>>
Or am I misunderstanding the concern here?
How would this remain compatible with current command dual syntax?
Oh, thanks! Reading through that now.
Yes! I think in all established products it's occasionally necessary to do major pruning of older functionality for the good of the product / ecosystem. In companies I've worked for, we've done this and made plenty of announcements: "Your legacy code might not work in version!!!! but we have guides on how to change it / we will support old matlab for the next X (many many) years."
For the most part it's always been well recieved.
I'll add
max(2,[]) should not return []
++ incrementing
maps as a more prominent base data type
expose more internal APIs for making subclasses for plot objects, like custom arrows
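Some background for the max(2,[]) item: in current MATLAB the two-input form compares elementwise. A scalar compared against an empty array therefore yields an empty result. One common workaround is to concatenate the candidates first, so that empty inputs simply drop out. maxIgnoringEmpty is a hypothetical helper name, not a built-in. It flattens its inputs, so it only fits when you want a single overall maximum:

```matlab
r1 = max(2, []);      % empty: elementwise comparison against a 0x0 array
r2 = max([2, []]);    % 2: [2, []] concatenates to just 2
% Hypothetical helper (not a MATLAB function): overall maximum of two
% arrays, treating empty inputs as "no candidates".
maxIgnoringEmpty = @(a, b) max([a(:); b(:)]);
r3 = maxIgnoringEmpty(2, []);       % 2
r4 = maxIgnoringEmpty([1 7], 3);    % 7
```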
Oh, cool! I'm on R2019b now, and I didn't realize that was already a thing. Going to try upgrading to R2022a soon anyway to get the other UI coding improvements.
As of release R2019b you can create your own custom chart classes using the matlab.graphics.chartcontainer.ChartContainer class.
As of release R2022a you can also create your own custom Live Editor tasks or custom UI components in App Designer. See the Release Notes for more information.
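Here is a minimal sketch of such a chart class, one that draws arrows with quiver. The class name ArrowChart and its XData/YData/UData/VData properties are hypothetical choices for this example. setup, update, and getAxes are the ChartContainer hooks. The class must be saved as ArrowChart.m on the path:

```matlab
classdef ArrowChart < matlab.graphics.chartcontainer.ChartContainer
    % Hypothetical example chart: arrows starting at (XData, YData)
    % with components (UData, VData).
    properties
        XData (1,:) double = NaN
        YData (1,:) double = NaN
        UData (1,:) double = NaN
        VData (1,:) double = NaN
    end
    properties (Access = private, Transient, NonCopyable)
        Arrows   % the underlying Quiver object
    end
    methods (Access = protected)
        function setup(obj)
            % Called once when the chart is created: build the graphics objects.
            ax = getAxes(obj);
            obj.Arrows = quiver(ax, NaN, NaN, NaN, NaN, 0);   % scale 0 turns off auto-scaling
        end
        function update(obj)
            % Called after property changes: push all data in one set() call
            % so the four arrays never disagree in size mid-update.
            set(obj.Arrows, 'XData', obj.XData, 'YData', obj.YData, ...
                'UData', obj.UData, 'VData', obj.VData);
        end
    end
end
```

Usage, assuming the file is on the path: c = ArrowChart('XData', [0 1], 'YData', [0 0], 'UData', [1 0], 'VData', [0 1]);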
I dig your block scoping idea. Establishing scope like this by breaking things up into subfunctions can be a hassle.
I really would want the variables to be cleared at the end of the block, though: then it could be used for RAII/SBRM style resource management using onCleanup and object delete destructors!
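Until block scoping exists, the usual way to get this RAII-style behavior is to move the block into its own function. The onCleanup object is then destroyed when that function returns. This is a minimal sketch; the function names and the throwaway temporary file are hypothetical:

```matlab
function fid = demoScopedCleanup()
% Hypothetical demo: the "block" lives in writeNote, so its onCleanup
% object (and therefore fclose) fires as soon as writeNote returns.
[fid, fname] = writeNote();
% By this point fclose(fid) has already run via onCleanup.
delete(fname);
end

function [fid, fname] = writeNote()
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
assert(fid >= 3, 'Could not open %s for writing.', fname);
closer = onCleanup(@() fclose(fid)); %#ok<NASGU> kept alive until return
fprintf(fid, 'hello from inside the block\n');
end % 'closer' is destroyed here, which calls fclose(fid)
```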
But Python maps do not take strings as keys... they take "hashables" (i.e. immutable, valuewise-equivalence-testable objects that the Python interpreter can create hashcodes for). A Matlab equivalent could be "anything that supports isequal and is nonmodifiable".
I get where you're coming from, though. I run into this use case a lot, and always end up just hashing/munging my keys to be valid Matlab variable names so I can use a struct for this case. A decent Map or Dictionary type would be really nice. And I think it would have to be a built-in type, and possibly introduce the notion of a generic "hashcode(value)" operation, in order for it to perform decently. Otherwise you're left iterating isequal over a list of keys; ouch.
I wonder if that's something that could be strapped on to the existing Matlab using a MEX implementation, without requiring a non-back-compatible change?
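The struct-with-munged-keys workaround described above might look like the sketch below, shown next to containers.Map. The key strings are made-up sample data. Note that matlab.lang.makeValidName is not one-to-one: 'alpha beta' and 'alphaBeta' map to the same field name. That is one reason a real hash-based Map type would be preferable:

```matlab
words = {'alpha beta', '2nd key', 'alpha beta'};   % sample keys (made up)

% Workaround 1: struct with keys munged into valid field names.
counts = struct();
for k = 1:numel(words)
    f = matlab.lang.makeValidName(words{k});   % e.g. 'alpha beta' -> 'alphaBeta'
    if isfield(counts, f)
        counts.(f) = counts.(f) + 1;
    else
        counts.(f) = 1;
    end
end

% Workaround 2: containers.Map keyed directly by the original text.
m = containers.Map('KeyType', 'char', 'ValueType', 'double');
for k = 1:numel(words)
    if isKey(m, words{k})
        m(words{k}) = m(words{k}) + 1;
    else
        m(words{k}) = 1;
    end
end
```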
I think just... Python string maps would be good to have more prominent. More complicated maps could still require special constructors.
Also I'd like to add nesting scopes for variables in functions. I don't care if the variables aren't cleaned until the parent function returns but it would make my workspace so much cleaner when debugging. It's also a really useful way to group a "thought" in your code that isn't yet deserving of being a function on its own.
function out = myFun(x,y,z)
SomeVariable = %...
{
%Can access SomeVariable from the parent scope here
DifferentVariable = 3;
if condition && exFunction(SomeVariable, DifferentVariable)
%do something interesting
end
}
%Can't access DifferentVariable here, it's out of scope
end
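The closest thing available today is probably a nested function. It can read and assign the parent's variables, while variables created only inside it stay local to it. Below is the proposed example rewritten that way. blockScopeDemo, checkBlock, and the placeholder computation and condition are hypothetical stand-ins for the elided code:

```matlab
function out = blockScopeDemo(x, y, z)
% Hypothetical rewrite of the proposed { ... } example using a nested function.
SomeVariable = x + y;   % placeholder for the elided computation
out = false;
checkBlock();           % plays the role of the { ... } block
% DifferentVariable is not defined in this workspace; referring to it here
% would be an error, just as in the proposal.

    function checkBlock()
        % Nested functions share SomeVariable, z, and out with the parent,
        % but DifferentVariable exists only in this function's workspace.
        DifferentVariable = 3;
        if z > 0 && SomeVariable > DifferentVariable   % placeholder condition
            out = true;   % "do something interesting": write to a parent variable
        end
    end
end
```

Unlike the proposed braces, this requires every function in the file to be closed with end, and the block must be called explicitly.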
Oh, yeah, I'd like "augmentation operations" like ++ and +=, too.
I'm not sure we're ready for maps as a more prominent base type: IMHO, nobody has yet figured out what a good general-purpose Map or Dictionary type in Matlab should look like, in terms of interface and semantics. containers.Map definitely ain't it.
Lower-level APIs for custom plot objects would be great too.
In MATLAB X, I would like to see:
- An object display customization API like Python's __str__ and __repr__. (`disp` isn't suitable.) (See The Dispstr API)
- In mixed-mode arithmetic (combining floats and ints), ints widen to floats instead of narrowing to ints.
- Integer-looking literals (like 1234) produce ints instead of doubles.
- Both single-quoted and double-quoted string literals produce string arrays; to get char arrays you need to explicitly call char(...).
- Every function uses string arrays instead of char vectors or cellstrs in its return values, when not determined by the type of one of the inputs.
- Figure handle properties use string arrays instead of char vectors.
- In string literals, backslash escapes are interpreted by the string literal itself, and not by the *printf() functions.
- import statements have file scope, not function scope.
- Class properties with (1,1) string validators default to string(missing) instead of the empty string "".
- There's a date-only localdate type to complement the date + time datetime type.
- now() and today() return datetime and localdate values, instead of double datenums.
- For that matter, pretty much every date or time returned by a function is a datetime or localdate instead of a double datenum.
- Maybe classes and functions in the same package are visible by default, using unqualified names, instead of requiring package qualification or an import statement. (Though this is mostly handled if import gets file scope.)
- The "`if false or true`" parsing quirk (where the stuff after "false" is considered the first statement inside the if block) is fixed, and the whole "false or true" is considered part of the if condition.
- File IO is done OOP style, with fopen returning a file object instead of a numeric handle.
- UTF-8 becomes the default encoding for all external text IO on all platforms.
- A revamped helptext system for embedding somewhat-formatted, somewhat-structured API reference documentation in source code. The existing helptext format is too simple and loosey-goosey.
- Maybe chars should become Unicode code points instead of UTF-16 code units, and strings and chars should be stored in Python-style "flexible-width string" format. Would save memory, and make it easier to work with emoji or exotic scripts.
- The GUI Layout Toolbox's functionality is pulled in to base Matlab, including support for relative positioning and sizing of widgets (like how Java Swing layouts work), and relative positioning layouts become the default (instead of 'normalized' or absolute-units positioning like it is now).
Things I do not want to see:
- Multithreading.
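For comparison, here is how several of the items above behave in current MATLAB. This is only a quick sketch of present-day behavior, assuming a recent release, and not a proposal:

```matlab
literalClass = class(1234);                 % 'double', not an integer type
singleQuoted = class('abc');                % 'char'
doubleQuoted = class("abc");                % 'string'
rawLen       = strlength("a\nb");           % 4: the literal keeps backslash and n
formattedLen = strlength(sprintf("a\nb"));  % 3: sprintf turns \n into a newline
mixedValue   = int8(3) * 2.6;               % 7.8 rounds to 8, and the result stays int8
mixedClass   = class(mixedValue);           % 'int8': mixed arithmetic narrows to the int type
nowClass     = class(now);                  % 'double' (a serial date number)
fname        = [tempname '.txt'];
fid          = fopen(fname, 'w');           % a numeric file identifier, not an object
fidClass     = class(fid);                  % 'double'
fclose(fid);
delete(fname);
```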
Sweet! That's a nice little convenience. I missed the R2022a Prerelease this time around, but will hop on the main release this weekend.
Regarding some of the date and time related requests, in release R2022a we moved some of the functions for working with dates and times from Financial Toolbox into MATLAB. In particular today will give you today's date [just like datetime('today') does.]
Also in release R2022a "MATLAB uses UTF-8 as its system encoding on Windows®, completing the adoption of Unicode® across all supported platforms."
I saw this in the release notes and in discussion on the Discord! Very interested! I am going to play with this over the next week or so, and then I'm sure I will have Opinions. Thanks!
There's a new feature in release R2021b that allows you to customize how your object is displayed when it's stored in a container like a cell array or a struct array.
The fixed-precision toolbox permits defining saturation and related behaviours on a per-object basis. I have a vague memory of seeing a FI object that could expand its bit representation at need (with an upper bound... I think it was 1023 bits.)
That's a good point about saturation behavior, and I don't have an idea for how to handle it gracefully. (A global setting to control saturation vs. overflow errors vs. auto-widening in int arithmetic would be terrible. You could define separate saturating and widening/overflow-erroring int datatypes, but eww.) Maybe auto-widening is just a bad idea.
The fact that basic arithmetic is divided into multi-core pieces tells us that dynamic widening based upon actual values gets a bit messy. Possible though. It would probably be a lot easier to widen and do the arithmetic and then determine whether any location needs the wider type and if not to drop down (you might be able to get the value scan nearly "for free" as you store the values.)
I will tell you, though, that the current behaviour of saturating when adding two uint8 values is relied upon by a lot of image processing code... the expected dynamic range is determined by the datatype, so if adding to uint8(255) dynamically widened the result to uint16 to hold 256, then what used to display as full-bright colors would suddenly display at only about 1/256th intensity....
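The saturating behaviour referred to here is easy to see. This is a small sketch with made-up pixel values:

```matlab
a = uint8(200) + uint8(100);   % 255: clamps at intmax('uint8') instead of wrapping
b = uint8(10) - uint8(20);     % 0: clamps at intmin('uint8')
img = uint8([100 200 250]);    % made-up pixel values
brighter = img + 60;           % uint8 [160 255 255]; the double 60 is absorbed into uint8
widened = double(img) + 60;    % [160 260 310] once you explicitly widen to double
```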
Oops, you're right; got order of operations wrong.
a = intmax('uint64') / 8 + 5
It should be: intmax('uint64') gives you a uint64; 8 is an int32, which widens to a uint64; then 5 is an int32, which also widens to a uint64.
"Narrowest type that fits" – you're right; that's C's behavior – instead of defaulting to an int32 might not be as convenient for Matlab, where literals might be used to initialize an array, which is then assigned into elementwise. x = repmat(0, [1 1000]) giving you an int8 may not be ideal, because then x(1) = 420 won't work. int32 seems like a nice common default type for ints to be.
> So the resulting type would be determined dynamically ?
Yup. The temporary output variable would probably need to start out wide, and then be narrowed if the result fits in a narrower type. Or start it out narrower and then dynamically widen it if the operation hits an element that doesn't fit. (That might be really hard to do for multithreaded operations.) Like I said, might not be feasible.
> Do you demote to the narrowest type that will fit all of the results, so uint64(1) + uint64(2) demotes right down to uint8(3) ?
Don't think I'd bother with that, for stuff like the assignment reasons mentioned above.
> integer types are often used in hardware interfaces or interfaces that expect a fixed type...
Ooh, didn't think of that. Yeah, dynamic widening based on the actual value of arithmetic results probably isn't a good idea. Getting too clever here.
8 and 5 should be signed int32s; 8 + 5 yields an int32 13;
Careful, the expression was not
a = intmax('uint64') / (8 + 5)
In some languages, the rule for integer-like literals not marked as unsigned is that they occupy the narrowest integer type they fit into, so in this case 8 and 5 would be int8.
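You can check the grouping directly in current MATLAB. The sketch also shows a related detail that matters for any integer redesign. Dividing integers rounds to the nearest integer, while idivide rounds toward zero by default:

```matlab
a = intmax('uint64') / 8 + 5;        % evaluated as (intmax('uint64') / 8) + 5
b = intmax('uint64') / (8 + 5);      % the other grouping gives a different value
% In current MATLAB, the double literals 8 and 5 are absorbed into uint64.
disp(class(a))
q1 = uint64(7) / 2;                  % 4: integer division rounds 3.5 to nearest
q2 = idivide(uint64(7), uint64(2));  % 3: idivide rounds toward zero by default
```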
Maybe it could go to an even wider type if necessary to hold both values, like the combination of a uint32 too large to fit in an int32 and a negative int32; that could widen to an int64.
So the resulting type would be determined dynamically ? A uint32 + B uint32 results in uint32 if it fits and uint64 if necessary? If you are vectorizing then you might not know you need to widen until some distance into the array... do you go back and widen the already-calculated results and resume ? Do you calculate in maximum width that is possible for adding two values of that type (in this case uint64) and then later scan to find out if widening was needed, and if so then demote down to the narrower type? Do you demote to the narrowest type that will fit all of the results, so uint64(1) + uint64(2) demotes right down to uint8(3) ?
... remembering that integer types are often used in hardware interfaces or interfaces that expect a fixed type...
I think I'd have them work pretty much like C's mixed-mode conversion rules.
a = intmax('uint64') / 8 + 5
8 and 5 should be signed int32s; 8 + 5 yields an int32 13; that int32 widens to uint64 when combined with the intmax result and a is a uint64.
If you did 5 - 8 instead to get -3, I think that should throw an error (a type of overflow error) when combined with a uint, because uints can't represent negative numbers.
b = double(1)
a - b
Combining a double and a uint64 would produce a double, and you'd live with the roundoff.
c = single(1)
a - c
I think this should produce a single, following the general rule of "ints 'widen' to floats".
When integers of different classes are combined, like so:
int8(3) + int32(420)
Then the smaller ints should widen to the larger ints.
Maybe it could go to an even wider type if necessary to hold both values, like the combination of a uint32 too large to fit in an int32 and a negative int32; that could widen to an int64. Raise an error if no type could be found that could exactly represent all of the input values. I'm not sure if that's actually feasible, though; the algorithm might be complex, and it could cost extra array scans and min/max accumulation when an arithmetic operation is being done on nonscalars.
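In current MATLAB, int8(3) + int32(420) is an error, because integers can only be combined with the same integer class or with scalar doubles. The "smaller ints widen to larger ints" rule can be prototyped as an ordinary function, though. The sketch below is a hypothetical helper, widenAdd, and is not part of MATLAB. It deliberately rejects the signed/unsigned mix discussed above rather than guessing:

```matlab
function c = widenAdd(a, b)
% widenAdd  Hypothetical helper: add two integer arrays of possibly
% different classes by casting both to the class with the larger range.
if ~isinteger(a) || ~isinteger(b)
    error('widenAdd:notInteger', 'Both inputs must be integer arrays.');
end
ca = class(a);
cb = class(b);
if (intmin(ca) < 0) ~= (intmin(cb) < 0)
    error('widenAdd:mixedSign', 'Signed/unsigned mixes are not handled in this sketch.');
end
% Same signedness, so the class with the larger intmax can hold every value of the other.
if double(intmax(ca)) >= double(intmax(cb))
    target = ca;
else
    target = cb;
end
c = cast(a, target) + cast(b, target);   % still saturates if the sum overflows target
end
```

Usage: widenAdd(int8(3), int32(420)) returns int32(423).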
format long g
a = intmax('uint64') / 8 + 5
a = uint64
2305843009213693957
b = double(1)
b =
1
c = single(1)
c = single
1
fprintf('double(%.999g)\n', double(a) - b) %widen uint64 to double
double(2305843009213693952)
fprintf('single(%.999g)\n',single(a) - c) %widen uint64 to single
single(2305843009213693952)
"In mixed-mode arithmetic (combining floats and ints), ints widen to floats instead of narrowing to ints."
What would be your proposals for how the following should work?
a = intmax('uint64') / 8 + 5
a = uint64
2305843009213693957
b = double(1)
b = 1
c = single(1)
c = single
1
a - b
ans = uint64
2305843009213693956
a - c
Error using -
Integers can only be combined with integers of the same class, or scalar doubles.
That's okay, I trust our Technical Support team to handle that case.
@Steven Lord - Have a look at Tech Support case #04456828, where I've put in a related suggestion for this before.
I'm going to send these string vs. char benchmarks along to MathWorks Tech Support and engineering later this week as part of a "Please speed up string arrays" enhancement request. If you'd like to be on that email thread and TS ticket, send me the email you have your MathWorks account under and I'll include you.
You've inspired me to add some string array vs char array comparisons to matlab-bench: https://github.com/janklab/matlab-bench/blob/master/bench_matlab_ops/compareStringAndCharOps.m. If you want to contribute some items for use cases relevant to your parsing activity, I could use some more examples here!
Strings are significantly slower than chars for pretty much every operation I tested. The most relevant one here, I think, is implicit conversion of 1-character-long strings to char arrays. That could really hurt performance of char array operations if '...' literals were to construct strings instead of char arrays.
>> compareStringAndCharOps(NaN, ["convert"])
String vs. char benchmark:
Matlab R2021a on MACI64
OS: Mac OS X 10.14.6
Intel(R) Xeon(R) W-2150B CPU @ 3.00GHz, 10 cores, 128 GB RAM
Name CharNsec StringNsec StringWin
_____________________________________________________ ________ __________ _________
Construct from char, n=1 85 1666 -18.67
Construct from char, n=1000 272 606 -1.23
Construct from char, n=100000 133 9221 -68.12
Convert scalar string as char for (s{1}), n=1 129 522 -3.05
Convert scalar string as char for (s{1}), n=1000 272 749 -1.75
Convert scalar string as char for (s{1}), n=100000 85 10096 -118.42
Convert scalar string as char for (char(s)), n=1 32 103 -2.19
Convert scalar string as char for (char(s)), n=1000 37 171 -3.58
Convert scalar string as char for (char(s)), n=100000 14 6620 -466.44
Impl conv to from variable scalar char, n=1 52 5642 -108.02
Impl conv to from variable scalar char, n=100 3 5130 -1908
Impl conv to from variable scalar char, n=1000 2 5123 -2630.2
Impl conv to from variable scalar char, n=10000 1 5257 -4331.1
Impl conv to from variable scalar char, n=100000 3 5099 -2035.8
Impl conv from literal to scalar char, n=1 10 4797 -474.25
Impl conv from literal to scalar char, n=100 4 5110 -1231.2
Impl conv from literal to scalar char, n=1000 1 5228 -4806.5
Impl conv from literal to scalar char, n=10000 1 5123 -4601.6
Impl conv from literal to scalar char, n=100000 3 5070 -1729.6
Yes, it is common for users to do custom parsing. regexp() and textscan() get a lot of work-out. loops of fgetl and sscanf() too, but if you have the memory, you can often get much higher performance by reading the entire file as character and fishing through with regexp() or strmatch()
I have not re-checked lately, but at least for several years, readtable()'s handling of xlsx files involved using regexp()
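Here is a minimal sketch of the read-everything-then-regexp approach. The file contents and the "name = value" line format are made up for illustration. fileread, regexp with the 'tokens' option, and str2double are existing functions:

```matlab
% Write a small sample file in a made-up "name = value" format.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'alpha = 1.5\nbeta = -2\ngamma = 3e2\n');
fclose(fid);

% Read the whole file as one char vector, then parse it with one regexp call.
txt = fileread(fname);
tok = regexp(txt, '(\w+)\s*=\s*(\S+)', 'tokens');      % 1xN cell, each a 1x2 cell
names = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
values = cellfun(@(t) str2double(t{2}), tok);
delete(fname);
```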
As measured in my post above, the random-access performance using extract() is far far slower than using character indexing.
Do people do much parsing in Matlab? These days, most Matlab-relevant data seems to come in standard formats like CSV, JSON, XML, or Excel, for which there is built-in native I/O and parsing support. Are people writing custom/unusual file format parsers in M-code?
I don't think this change would be a big problem for parsing: the char array type would still be available, and implicit conversion between char and string would handle the rest. Only concern I can see would be performance implications of creating the extra temporary string arrays. Dunno how much to worry about that: parsing code would already be creating a bunch of temporary char arrays, and even now if you do parsing of nontrivial text file formats in M-code, you're going to be a sad panda when it comes to performance.
I just think that characterwise operations are a relatively unusual case in Matlab, seldom happen inside performance-critical code
Character operations are important for parsing, which has strong connections to file I/O.
Hi Steven! Thanks for responding.
I find the current API for customizing the display of objects inadequate for a lot of my uses:
- It doesn't differentiate between contexts needing per-element string representations, or a single string representation for a whole array.
- It doesn't work for formatting objects for inline display to %s placeholders in the *printf family of functions. (I think? Maybe there's
- It doesn't apply when the user-defined objects are stored inside fields, cells, or properties of compound data types, such as structs, cell arrays, table arrays, or other objects.
- Doesn't apply to display inside the Workspace widget of the Matlab desktop.
- It would be nice to have a distinction between human/user-presentation formats (like "str" in Python) and lower-level debugging oriented formats (like "repr" in Python).
As far as I can tell, the new extended display customization API (beyond the original disp and display override support) is just a way of conveniently producing more complexly-structured disp output.
I'll see if I can find time to write up a blog post about this; I discuss it often enough.
I've had no problem passing string arrays to properties of Handle Graphics figure handles, but they seem to get converted to char or cellstr when you do that, and always come back out as char or cellstr:
>> disp(version('-release'));
2021a
>> f = figure;
>> f.Name
ans =
0×0 empty char array
>> f.Name = "marshmallow";
>> f.Name
ans =
'marshmallow'
>>
So user code still has to deal with chars in this case. Would be nice if it were just uniformly string-ified.
A few of these you can do already, more or less.
- An object display customization API -- see the documentation on customizing the display of objects.
- Figure handle properties use string arrays instead of char vectors. -- I think most if not all Handle Graphics properties that could accept char vectors or cellstrs should also be able to accept string arrays. If you see one that you think should accept strings but it does not, please report that to Technical Support.
Also, and this is kind of off-topic, but: I'm kind of disappointed with the overall performance of string arrays in current versions of Matlab. It seems like since they're a dedicated datatype that has (or should have!) a simpler internal storage model than cellstrs, they should outperform cellstrs in many cases, and never be slower. But in my testing as of R2019b or so, they often don't!
I've reported this to Tech Support and discussed it with a couple MW developers. If this is a concern to y'all, maybe you could pile on and report your own results too?
Oh yeah. Your performance concerns here totally make sense to me. I'm not suggesting we get rid of the char datatype. I'm just saying that the literal expression 'foobar' should produce a string array by default, and if you really want chars, you call char('foobar') and get a char array back and it works pretty much like chars do now and you can index into it and get individual characters or char subarrays. All I want here is a change in the behavior of the literals, not the datatypes.
Even as is, I'm not too worried about the performance difference here. If you want to do characterwise operations, convert the string to a char array. Like you do in Java. I just think that characterwise operations are a relatively unusual case in Matlab, seldom happen inside performance-critical code, and in most cases strings are a better type, to the extent that I don't think it's best to have the '...' and "..." literal expressions produce characters. Could be wrong here; I dunno. So maybe '...' should stay as chars. I'd just like it to produce strings because
a) I want to use strings pretty much everywhere, and I don't like doing the extra work of holding down the shift key to produce " when writing string literals.
b) I think the language should encourage most developers to use string arrays pretty much everywhere, and getting to char arrays should be extra work.
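Converting once and then working characterwise, as suggested above, is cheap to write. This small sketch uses a made-up sample string. char, extractBefore, reverse, and string are existing functions:

```matlab
s = "marshmallow";                 % sample string scalar
c = char(s);                       % convert once to a 1xN char row vector
first3 = c(1:3);                   % 'mar' via ordinary indexing
first3s = extractBefore(s, 4);     % "mar" via the string API
backwards = string(c(end:-1:1));   % "wollamhsram", converted back to string
```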
Well yeah. Goes without saying that I'd want it to be Fast.
I was curious about the performance differences between char and string:
format long g
N = 1e5;
Achar = char(randi([33 127], 1, N));
Astr = string(Achar);
idx = randi(N, 1, floor(N/10));
t0 = timeit(@() Achar(idx), 0)
t0 =
3.51935e-05
t1 = timeit(@() Astr{1}(idx), 0)
t1 =
3.86935e-05
t1/t0
ans =
1.09945018256212
t2 = timeit(@() arrayfun(@(K) extract(Astr, K), idx), 0)
t2 =
1.421328801
t3 = timeit(@() arrayfun(@(K) K, idx), 0)
t3 =
0.020044801
The extract() version is the official API, with {1}(idx) considered a legal but discouraged hack. The extract() version is, however, up to fifty thousand times slower in my tests (one of my earlier runs had higher ratios than this one).
round(t2/t0)
ans =
40386
round(t3/t0)
ans =
570
Part of this is the time required for arrayfun(), but comparing t3 with t2 shows that extract() accounts for by far the largest share of the cost.
This is reported as MathWorks Tech Support case #05062335.
No. I do not want "or" to become an infix operator. I'm talking about a syntax/parsing quirk with the if statement, where if you put more stuff after the first thing following the if which is a valid expression, that stuff is interpreted as the first statement inside the if block and not as part of the condition for the if.
I'm using "or" here to illustrate a potentially common newbie mistake that gets interpreted by Matlab's parser in a surprising and potentially misleading way.
Here's another example:
if false whatever bogus stuff you want here
disp('yup')
else
disp('nope')
end
Run this and you get:
nope
>>
Similarly:
if true or 2 < 3
disp('yup')
end
You get:
Error using |
Too many input arguments.
>>
I think this is confusing to newbies, a potential source of subtle bugs, and the grammar/parsing weirdness just bothers me.
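For the record, here is how to write "false or true" so that the whole thing really is the condition in current MATLAB. This is a small sketch; r1, r2, and r3 are arbitrary result variables used instead of disp so the outcome is easy to check:

```matlab
r1 = "nope";
if false || true          % short-circuit OR: the whole expression is the condition
    r1 = "yup";
end
r2 = "nope";
if or(false, true)        % functional form of |, also a single expression
    r2 = "yup";
end
% A comma after the condition makes the statement boundary explicit; this is
% how the parser already splits "if false whatever bogus stuff" today.
r3 = "nope";
if false, r3 = "yup"; end
```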
if false or true
To clarify: do you mean literally using the operator spelt out as or instead of the | operator ?? The named operators have never been infix in MATLAB.
Could you give a short example ?