Why are computational results in MATLAB sometimes different on different machines, and how can I prevent this from having a high impact on my application?

I would like to collect reasons why computational results in MATLAB and other MathWorks tools are sometimes different on different machines, assuming the same release is used. I know the operating system, the processor type, and even the BLAS library can play a role. But why exactly? Can the C compiler used also make a difference?
I would also like to collect best practices to prevent the differences from becoming a real issue. E.g. when running an optimization, small differences can lead to dramatically different end results (butterfly effect).

Answers (4)

Jan
Jan on 26 Jan 2011
Edited: Jan on 29 Apr 2019
Look at the ACOS function: Matlab uses the FDLIBM method published at netlib.org, as stated in the documentation. Now compile this C source with different compilers and you get different results for some inputs. While this ACOS implementation is stable for inputs near 0.0, it is less stable at +-1.0: there the values of Matlab and FDLIBM-ACOS compiled by LCC (shipped with 32-bit Matlab) differ by up to 128 EPS, while Borland's BCC differs by up to 12288 EPS (both values found by trial and error, so they are lower bounds!). Under MSVC the difference depends on the compiler directives /arch:SSE2 and /fp:fast. Note: These differences appear even on the same machine!
In consequence, good practice does not mean preferring a specific compiler or a specific value for ACOS(1.0-1000*eps), but avoiding ACOS near the edges. Usually the problem can be solved more stably with ATAN2, as in the sketch below. See the evergreen thread "Angle between two vectors": http://www.mathworks.com/matlabcentral/newsreader/view_thread/151925#805660
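A minimal sketch of the two formulations (the vectors are arbitrary examples, not from the thread linked above): for nearly parallel vectors the ACOS formulation loses several significant digits, because its argument sits right at the unstable edge near 1.0, while the ATAN2 formulation stays well-conditioned over the whole range of angles.
% Minimal sketch: angle between two nearly parallel vectors
a = [1; 0; 0];
b = [1; 1e-5; 0];                                 % almost parallel to a

angAcos  = acos(dot(a, b) / (norm(a) * norm(b))); % argument is close to 1.0
angAtan2 = atan2(norm(cross(a, b)), dot(a, b));   % well-conditioned formulation

fprintf('acos : %.16g\natan2: %.16g\n', angAcos, angAtan2);
Typically the two results agree only in the leading digits, and the disagreement is exactly the kind of value that shifts from compiler to compiler and machine to machine.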
If the result of an optimization is dominated by the butterfly effect, it is questionable whether it should be called a "result" at all. It is like the simulation of a pencil standing on its tip: in a naive implementation the pencil will stay motionless forever, but this "solution" is not physically meaningful. To estimate the stability of a found trajectory (or result of an optimization), it is recommended to calculate the sensitivities of the outputs to a variation of the inputs. E.g. the initial conditions of a system can be determined only with a certain measurement error. In consequence there is no scientifically reliable method to forecast the temperature for 2111-Jan-26 15:25:00 in Heidelberg with an error bound of 2 Kelvin, even if you use a super-compiler, a gigantic computer cluster and octuple-precision numbers. But I dare to predict this temperature with an error bound of 100 Kelvin just by reaching out of the window today.
So on one hand the algorithm can be unstable and subject to round-off and cancellation problems. On the other hand the simulated system itself can be unstable, and this has to be considered in the question that is investigated, as illustrated below. In other words: do not care about a single result if you simulate rolling dice.
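As a small illustration of such sensitivity (the chaotic logistic map is just a stand-in example, not any particular optimization or simulation): a perturbation far below any realistic measurement error dominates the "result" after a few dozen steps.
% Illustration: sensitivity to a tiny input perturbation in the chaotic
% logistic map x(k+1) = 4*x(k)*(1 - x(k))
x1 = 0.3;
x2 = 0.3 + 1e-12;          % perturbation far below typical measurement error
for k = 1:50
    x1 = 4*x1*(1 - x1);
    x2 = 4*x2*(1 - x2);
end
fprintf('after 50 steps: %.6f vs %.6f\n', x1, x2);   % trajectories have diverged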
Jan

Walter Roberson
Walter Roberson on 26 Jan 2011
Yes, absolutely, if you have C code then the compiler can make a difference. There are parts of C in which the order of operations is unspecified, such as the order in which arguments to a function are evaluated. The C99 standard takes a lot of care to talk about "sequence points"; within the constraints imposed by the sequence points of the expression, the order of operations of sub-expressions of the same priority is sometimes unspecified.
A C compiler (or more correctly, code generator) might generate a fused multiply-and-add, which increases precision. I would need to review up through TC6 to see whether that has been made legal in C yet -- it does not (or at least did not) match the C abstract semantics.
Using 80 bit registers for double precision on a machine that can only store 64 bit double precision values is considered enough of a violation of the abstract semantics as to be formally forbidden (but it happens in practice). [Note: IEEE 754 has defined 80 bit "extended precision" arithmetic... provided the system has a mechanism to store all 80 bits.] One of the issues with 80 bit registers is that if there is an interrupt or normal change of process context to run a competing routine, then the 80 bit registers have to be spilled to memory... as 64 bit quantities. Thus, depending on exactly when interrupts or context changes occurred, calculations could come out differently.
The best practices must, unfortunately, involve re-coding a number of Matlab routines, as MathWorks has never made any promises about operations being optimized for accuracy.
I think one of the first things I would do is write a routine with a name such as "accurate_sum" that re-orders each vector so as to minimize the effect of loss of precision and catastrophic cancellation. Unfortunately, when I have thought about this before, I ended up getting confused about the best handling of a mix of positive and negative values.
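A minimal sketch of one possible ingredient for such a routine is Kahan's compensated summation. Note that this is a compensation scheme rather than the magnitude re-ordering described above, and the name "accurate_sum" is just the placeholder from this answer, not an existing Matlab function.
function s = accurate_sum(x)
% ACCURATE_SUM  Sketch only: Kahan compensated summation of a vector.
% Carries the low-order bits lost in each addition in a separate
% compensation term, reducing the accumulated round-off error.
s = 0;          % running sum
c = 0;          % running compensation for lost low-order bits
for k = 1:numel(x)
    y = x(k) - c;
    t = s + y;              % low-order bits of y may be lost here ...
    c = (t - s) - y;        % ... and are recovered into c
    s = t;
end
end
For ill-conditioned data (many values of mixed magnitude), comparing sum(x) with accurate_sum(x) shows the effect; a full solution would still have to decide how to order or split positive and negative values.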

Bahador Marzban
Bahador Marzban on 13 May 2020
I agree with Jan, and I would like to add something that I hope helps when we are dealing with adaptive-time-step ODE solvers. I actually had the same experience: when I ran my code on different machines (my personal PC, another computer, or the cluster), the solution was slightly different. In the end, by making 'RelTol', 'AbsTol', and 'MaxStep' smaller, the solution converged to one value. This simply means the problem is too stiff, and different machines adapt different time steps, so the solution is not the same; but if we put a stronger limit on the time step and tolerances, the solution will converge. So in this case we get different results because the problem is too stiff, not because of instability.
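A minimal sketch of the idea, using a generic stiff test equation rather than the actual model from my application; the tolerance and step values are only examples to show how odeset tightens the solver.
% Tightening solver tolerances so results agree across machines
odefun = @(t, y) -1000*y + 3000 - 2000*exp(-t);   % generic stiff test ODE
y0     = 0;
tspan  = [0 4];

optsLoose = odeset();                                              % defaults
optsTight = odeset('RelTol', 1e-10, 'AbsTol', 1e-12, 'MaxStep', 1e-3);

[~, yLoose] = ode15s(odefun, tspan, y0, optsLoose);
[~, yTight] = ode15s(odefun, tspan, y0, optsTight);

fprintf('final value, default tolerances: %.12g\n', yLoose(end));
fprintf('final value, tight tolerances:   %.12g\n', yTight(end));
With the tight settings the adaptive step-size selection has much less freedom, so the machine-to-machine spread of the final value shrinks accordingly.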

Andreas Goser
Andreas Goser on 9 Feb 2011
I would like to link this question and the answers to here:
