'ABC'-'A' Considered Harmful

per isakson on 5 May 2012
Commented: Walter Roberson on 9 Jun 2018
Is the trick 'ABC'-'A' good programming style?
--- edit ---
"tag: goto" alludes to Edsger Dijkstra's famous letter "Go To Statement Considered Harmful" ;-)
"tag: answer" is a trace of the reason why I submitted the question. I've seen questions from people "new to Matlab" that received answers containing not-so-obvious code with "'ABC'-'A'" embedded for no good reason.
Walter Roberson on 7 May 2012
I have probably used this kind of expression from time to time in replying to people who might not be experienced in MATLAB.
In the cases where I was not adding or subtracting '0', the lack of "good programming style" was likely intentional. For example, I sometimes provided code to prove that the task could be done, where it "felt" likely to me that if plain code had been provided, the user would have copied it for their assignment without attempting to understand it. If they blindly copied the more obscure code instead, any marker who actually read it would immediately know that the person had not written the expression themselves.
per isakson on 10 May 2012
@Walter, the hidden message to the teacher is a reason.

Accepted Answer

Walter Roberson on 5 May 2012
Adding or subtracting '0' is the most efficient method of converting between numeric digit values (0 to 9) and their character codes ('0' to '9').
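A minimal C sketch of the '0' offset (the helper names are hypothetical). It relies only on '0' through '9' being contiguous, which the C standard guarantees for every conforming character set, ASCII or not.

```c
/* Hypothetical helpers: convert between a digit value 0..9 and its character. */

/* Returns the numeric value of a digit character, or -1 if c is not '0'..'9'. */
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;
    return c - '0';   /* relative position from '0' */
}

/* Returns the character for a value 0..9, or '?' if out of range. */
char digit_char(int v)
{
    if (v < 0 || v > 9)
        return '?';
    return (char)('0' + v);
}

/* Parses leading decimal digits into an unsigned long.
   Stops at the first non-digit; no overflow checking. */
unsigned long parse_decimal(const char *s)
{
    unsigned long n = 0;
    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (unsigned long)(*s - '0');
        s++;
    }
    return n;
}
```

For example, parse_decimal("2012") yields 2012, and parse_decimal("42abc") stops at 'a' and yields 42.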
Subtracting 'A' or 'a' (and then adding 10) is a well-known and efficient conversion from character-encoded hexadecimal digits ('A' to 'F' or 'a' to 'f') to binary.
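The hexadecimal case in the same C sketch style, with hypothetical `hex_value` and `parse_hex` helpers. Unlike the digits, the C standard does not promise that 'A' to 'F' are contiguous; this sketch assumes an ASCII-compatible character set, as the answer does.

```c
/* Value of one hexadecimal digit character, or -1 for a non-hex character.
   Assumes 'A'..'F' and 'a'..'f' are contiguous (true in ASCII/Unicode). */
int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;   /* subtract 'A', then add 10 */
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decodes a non-empty string of hex digits; returns -1 on an empty string
   or an invalid digit. No overflow checking. */
long parse_hex(const char *s)
{
    long n = 0;
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        int d = hex_value(*s);
        if (d < 0)
            return -1;
        n = n * 16 + d;
    }
    return n;
}
```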
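Adding or subtracting ' ' (space) is often used in uuencode-style encoding/decoding, where each 6-bit value is offset by 32 [though you do need to special-case binary 0, which many implementations code as a grave accent (`) instead of as space]. (MIME base64, by contrast, maps 6-bit values through a lookup table rather than a single offset.)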
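A sketch of the space offset as used by uuencode (assumed here to be the scheme meant; MIME base64 uses a 64-entry table instead). Each 6-bit value is written as value + ' ', and the zero special case uses the grave accent that many uuencode implementations emit. Names are hypothetical; line-length prefixes and padding are omitted.

```c
/* Encodes a 6-bit value 0..63 as value + ' ' ('!'..'_'),
   except that 0 is written as '`' instead of ' '. */
char uu_encode6(unsigned v)
{
    v &= 0x3F;                 /* keep the low 6 bits */
    return v == 0 ? '`' : (char)(v + ' ');
}

/* Subtracts the ' ' offset. '`' (0x60) - ' ' is 0x40, which masks to 0,
   so both ' ' and '`' decode to 0. */
unsigned uu_decode6(char c)
{
    return (unsigned)(c - ' ') & 0x3F;
}

/* Encodes 3 bytes into 4 characters, 6 bits at a time. */
void uu_encode_group(const unsigned char in[3], char out[4])
{
    out[0] = uu_encode6(in[0] >> 2);
    out[1] = uu_encode6(((in[0] & 0x03) << 4) | (in[1] >> 4));
    out[2] = uu_encode6(((in[1] & 0x0F) << 2) | (in[2] >> 6));
    out[3] = uu_encode6(in[2] & 0x3F);
}
```

For example, the three bytes of "Cat" encode to `0V%T`.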
Adding or subtracting 32 used to be very common magic for converting between upper- and lower-case ASCII. It was so common that it became a problem when dealing with EBCDIC, and later with ISO-8859-* and Unicode. It was also so common that the resulting bugs were hard to find: programmers would read the 32, know that it was upper/lower-case conversion, and then be puzzled that letters weren't being converted properly...
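A sketch of the ASCII case offset with a range check, next to the naive version (hypothetical names). Writing the offset as 'a' - 'A' instead of 32 states the intent, but the range check still assumes ASCII's contiguous 'a' to 'z', which EBCDIC lacks, and accented ISO-8859 or Unicode letters are left untouched.

```c
/* Converts an ASCII lower-case letter to upper case; other characters
   are returned unchanged. */
char ascii_upper(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));   /* the "magic 32" in ASCII */
    return c;
}

/* Naive version, for contrast: applies the offset to every character,
   which corrupts digits, punctuation and already-upper-case letters. */
char naive_upper(char c)
{
    return (char)(c - 32);
}
```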
The characters '1' through '9' have been in consecutive coding positions since the ITA2 code of 1930. Any program that is not required to work with Baudot or Murray or older codes may assume that for a fact. Any program written for the ASCII / ANSI / ISO / Unicode line may assume that the upper-case "Latin" (English) letters are consecutive, and that the lower-case "Latin" (English) letters are consecutive: this is a fundamental standardization, no worse than assuming that all of the MATLAB operator characters are present in the character set. As best I know, MATLAB has never been supported on any EBCDIC-based system, where the assumption is not true.
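One way a C program can state the "consecutive letters" assumption explicitly is with compile-time checks (C11 `_Static_assert`). This is a cheap sanity check rather than a proof of contiguity, but it fails to compile on EBCDIC, where 'Z' - 'A' is 40. `letter_index` is a hypothetical helper.

```c
/* '0'..'9' are contiguous by the C standard; the letters are not
   guaranteed, so the program states its assumption here. */
_Static_assert('9' - '0' == 9, "digits are contiguous (guaranteed by C)");
_Static_assert('Z' - 'A' == 25, "upper-case letters assumed contiguous");
_Static_assert('z' - 'a' == 25, "lower-case letters assumed contiguous");

/* 0-based position of an upper-case letter in the alphabet, or -1. */
int letter_index(char c)
{
    if (c < 'A' || c > 'Z')
        return -1;
    return c - 'A';
}
```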
Walter Roberson on 9 Jun 2018
A few weeks ago I was helping someone learn C for a Harvard online course. Some of the early exercises involved ciphers (such as the Caesar cipher), and later ones involved translating musical note letters (note and octave) into frequencies.
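The Caesar cipher exercise reduces to the relative-position idea: subtract the first letter, shift modulo 26, and add the first letter back. A minimal C sketch (hypothetical `caesar` function, ASCII letters assumed; non-letters pass through unchanged):

```c
/* Shifts every ASCII letter in s by `shift` positions, in place. */
void caesar(char *s, int shift)
{
    shift %= 26;
    if (shift < 0)
        shift += 26;          /* allow negative shifts for decoding */
    for (; *s != '\0'; s++) {
        if (*s >= 'A' && *s <= 'Z')
            *s = (char)('A' + (*s - 'A' + shift) % 26);
        else if (*s >= 'a' && *s <= 'z')
            *s = (char)('a' + (*s - 'a' + shift) % 26);
    }
}
```

With this sketch, caesar(msg, 3) turns "Hello, World" into "Khoor, Zruog", and caesar(msg, -3) reverses it.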
It turned out to be surprisingly difficult to get the person to retain the idea of computing relative position by subtracting the first member of an ordered sequence.
On the other hand, it would have been difficult to teach a beginner the idea of indexing a mostly-unpopulated matrix by a character. It would have been ridiculous to have them test for equality with each alphabetic character individually. The only realistic implementations within reach for the person were subtracting the base character, or looping over a reference vector of characters to find the index of the match.
Now that they have retained the idea of finding relative position, they have an efficient general technique they can apply in many future programming situations. Looping to compare against possibilities known to be consecutive is not, I would say, any more "clean" than subtracting the base.
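The two implementations discussed above, side by side in a minimal C sketch with hypothetical names. Both return the same index for 'A' to 'G'; the index then selects a semitone offset from a table, since note letters are not evenly spaced in semitones.

```c
/* 1. Subtract the base character (relies on 'A'..'G' being consecutive). */
int note_index_subtract(char c)
{
    if (c < 'A' || c > 'G')
        return -1;
    return c - 'A';
}

/* 2. Loop over a reference string and return the index of the match. */
int note_index_lookup(char c)
{
    static const char notes[] = "ABCDEFG";
    for (int i = 0; notes[i] != '\0'; i++) {
        if (notes[i] == c)
            return i;
    }
    return -1;
}

/* Semitones above A for a note letter in the same octave, or -1. */
int semitones_above_a(char c)
{
    static const int offset[7] = {0, 2, 3, 5, 7, 8, 10};   /* A B C D E F G */
    int i = note_index_subtract(c);
    return i < 0 ? -1 : offset[i];
}
```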
A lot of programming is about looking for patterns in the task that can be readily solved.

More Answers (4)

per isakson on 13 Aug 2012
Edited: per isakson on 18 Jul 2016
Explicitly converting to numeric before doing arithmetic is faster. (Real reason: I find 'abc'-'a' confusing. :)
Try
>> [ t1, t2 ] = cssm( 1e5 )
n1==n2 is true
t1 =
0.3252
t2 =
0.0838
>>
where
function [ t1, t2 ] = cssm( N )
    str = char(49:120);
    id1 = tic;
    for ii = 1 : N
        n1 = str - 'A';
    end
    t1 = toc( id1 );
    id2 = tic;
    for ii = 1 : N
        n2 = double( str ) - double('A');
    end
    t2 = toc( id2 );
    if all( n1 == n2 )
        disp( 'n1==n2 is true' )
    else
        disp( 'n1==n2 is false' )
    end
end
2016-07-18: Rerun of the test with R2016a. The first run was done with R2011a(?), on the same old vanilla desktop.
>> [ t1, t2 ] = cssm( 1e5 )
n1==n2 is true
t1 =
0.0327
t2 =
0.0214
>>
The new JIT engine seems to be more efficient in this case, and the effect of using double is smaller.
Testing is tricky! Contrary to James Tursa, I do see some advantage of double() at the command line.
>> S = char('A'+floor(rand(1,1e7)*25));
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.137705 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.125812 seconds.
>>
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.082686 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.081962 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.078693 seconds.
James Tursa on 17 Jul 2016
Edited: James Tursa on 17 Jul 2016
Interesting result. This appears to be the work of the parser optimizing stuff. E.g., from the command line:
>> S = char('A'+floor(rand(1,1e7)*25));
>>
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068686 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068900 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.064843 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.059291 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.070644 seconds.
>>
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.075243 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.067354 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.077742 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.073420 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.078540 seconds.
So at the command line the advantage of the double( ) disappears. Maybe there is some compiled code that MATLAB is using in the double( ) case that the parser has available, and no such compiled code exists for the character minus case.
Since this result seems to be the result of optimized code that the parser is able to use, I would not be surprised if this result varied quite a bit between MATLAB versions.
E.g., the results of a simple mex routine show that the raw S - '0' calculation could be significantly faster with compiled code:
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.044921 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.043699 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.046204 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.051441 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.058610 seconds.
The mex routine (bare bones, not production quality):
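#include "mex.h"

/* char_minus(S, c): returns double(S) - double(c) for a char array S
   and a char scalar c, with the same shape as S. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n;
    const mwSize *dims;      /* mxGetDimensions returns a const pointer */
    mwSize ndim;
    const mxChar *cp;
    mxChar c;
    double *pr;

    /* Minimal argument checking: two char inputs, the second non-empty */
    if (nrhs != 2 || !mxIsChar(prhs[0]) || !mxIsChar(prhs[1]) || mxIsEmpty(prhs[1]))
        mexErrMsgTxt("Usage: char_minus(S, c) with char S and char c");

    n = mxGetNumberOfElements(prhs[0]);
    dims = mxGetDimensions(prhs[0]);
    ndim = mxGetNumberOfDimensions(prhs[0]);
    cp = mxGetChars(prhs[0]);
    c = *mxGetChars(prhs[1]);

    plhs[0] = mxCreateNumericArray(ndim, dims, mxDOUBLE_CLASS, mxREAL);
    pr = mxGetPr(plhs[0]);

    /* mxChar is unsigned 16-bit; the subtraction is done in int,
       so characters below c give negative results */
    while (n--) {
        *pr++ = *cp++ - c;
    }
}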
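For readers without a MATLAB installation, the inner loop of the mex routine can be tried as plain C. This is an illustrative sketch only: `char_minus_plain` is a hypothetical name, and `uint16_t` stands in for `mxChar` (a 16-bit unsigned type). Because both operands are converted to int before subtracting, characters below the base give negative values, just as S - '0' does in MATLAB.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes s[k] - base (as double) to out[k] for k = 0..n-1. */
void char_minus_plain(const uint16_t *s, size_t n, uint16_t base, double *out)
{
    while (n--) {
        /* int subtraction, as C's integer promotions would do anyway */
        *out++ = (double)((int)*s++ - (int)base);
    }
}
```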

Jan on 5 May 2012
It depends. The result is clear and well defined, but not obvious. If you store large arrays in an M-file, char arrays occupy less RAM than double arrays (2 bytes instead of 8 bytes per element). But storing large data sets in M-files is already bad programming style, because it mixes data and program.
I use 'abc' - 'a' only to encode icons in M-files, because the letters give a rough visual impression of the image.
% Each letter 'A'..'P' becomes an index 1..16 into the rows of map
color = ['CCCCCHFFHCCCCC'; ...
         'CDFNBFFFFFFFDC'; ...
         'DPGGGGGGGGGGBH'; ...
         'DPDMMMMOOAADPD'; ...
         'DFFFNFFFFFFFIH'; ...
         'CILLKJGEKNGEIC'; ...
         'CILBKJLGKFNKIC'; ...
         'CILBKJLGKFIKIC'; ...
         'CILBKJLGKFIKIC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFNKDC'; ...
         'CDLLKJGEKNGEDC'; ...
         'CHBGEEEGGLPNHC'; ...
         'CMDDIIDDDDDHMC'] - ('A' - 1);
map = [ 28,  26,  36; ...
       116, 118, 132; ...
       NaN, NaN, NaN; ...
        73,  74,  89; ...
       177, 176, 193; ...
        96,  98, 112; ...
       151, 152, 167; ...
        60,  63,  76; ...
        84,  84,  96; ...
       220, 222, 236; ...
       191, 193, 206; ...
       135, 133, 143; ...
        39,  41,  55; ...
       108, 107, 118; ...
        36,  34,  44; ...
       124, 126, 140];
[x, y] = size(color);
Icon = reshape(map(color, :) / 255, x, y, 3);
uicontrol('Position', [10, 10, 32, 32], 'CData', Icon);
This is, in my opinion, the best way to store an icon in an M-file. But icons can be stored and edited much more comfortably in image files.
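The same letter-coded palette idea in C, as a hypothetical sketch: each character selects a palette entry by its offset from 'A'. C indices are 0-based, so the offset is c - 'A', where the MATLAB code subtracts 'A' - 1 to get 1-based rows of map. Returning -1 for characters outside the palette is an assumption of this sketch, not part of the original.

```c
#include <stddef.h>

/* 0-based palette index for a letter, or -1 if outside 'A'..'A'+ncolors-1. */
int palette_index(char c, int ncolors)
{
    int i = c - 'A';
    return (i >= 0 && i < ncolors) ? i : -1;
}

/* Decodes one icon row of width w into palette indices. */
void decode_icon_row(const char *row, size_t w, int ncolors, int out[])
{
    for (size_t k = 0; k < w; k++)
        out[k] = palette_index(row[k], ncolors);
}
```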
Geoff on 10 May 2012
If MATLAB used a single backslash for line continuation, I'd probably do it more often. =) I find the typing-in of three characters ...
on ...
every ...
line ...
quite ...
enervating.
Jan on 10 May 2012
By the way, "..." not only continues the line but also starts a comment. There is no need for an additional %, and this is even documented.

Daniel Shub on 5 May 2012
I believe that coding styles that sacrifice readability for efficiency are generally bad style. Under some circumstances the gain in efficiency can offset the loss in readability. For example, loops in MATLAB used to be so slow that we had to sacrifice readability for performance all the time by vectorizing everything. Thankfully that is no longer the case.

Daniel Shub on 9 May 2012
I asked a similar question, although not identical by any means, a while back:
