Is the value of the coefficient of determination in fitlm.m correct?

I am wondering whether the coefficient of determination (R^2) reported by fitlm is correct. The code below calculates it in two ways that should be equivalent. The first approach returns Rsq1 = -3.35; the second uses fitlm and returns Rsq2 = 0.36. The two values are not equal. It may be that fitlm assumes SST = SSE + SSR, which need not hold for a linear regression without an intercept.
x=[37,31,3,1,16,98,12,14,75]';
y=[15,28,44,19,33,10,21,14,15]';
% approach 1
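b=x\y;                       % least-squares slope, no intercept
yCalc = x*b;                 % fitted values
SSE=sum((y-yCalc).^2);       % residual (error) sum of squares
SST=sum((y-mean(y)).^2);     % total sum of squares about mean(y)
SSR=sum((yCalc-mean(y)).^2); % regression sum of squares about mean(y)
SSTchk=SSE+SSR;              % guaranteed to equal SST only when the model has an intercept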
% coefficient of determination calculated with the correct approach
Rsq1=1-SSE/SST;
% approach 2
dlm = fitlm(x,y,'Intercept',false);
% coefficient of determination in the fitlm matlab function
Rsq2=dlm.Rsquared.Ordinary;  % field names are case-sensitive: Ordinary, not ordinary
% expression of the coefficient of determination in the fitlm function?
Rsq2fitlm=1-SSE/SSTchk;
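Why SST and SSE + SSR differ here: writing y - mean(y) = (y - yCalc) + (yCalc - mean(y)) and squaring gives SST = SSE + SSR + 2*sum((y-yCalc).*(yCalc-mean(y))). Least squares always makes the residuals orthogonal to the columns of the design matrix, so with only x in the model they are orthogonal to yCalc, and the cross term reduces to -2*mean(y)*sum(y-yCalc). With an intercept the residuals also sum to zero and the cross term vanishes; without one it generally does not. A minimal base-MATLAB check on the question's data (the variable names e and cross are introduced here for illustration):

```matlab
% Data from the question; no-intercept least-squares fit y = b*x
x = [37,31,3,1,16,98,12,14,75]';
y = [15,28,44,19,33,10,21,14,15]';
b = x\y;                  % least-squares slope
yCalc = x*b;              % fitted values
e = y - yCalc;            % residuals

SSE = sum(e.^2);
SST = sum((y - mean(y)).^2);
SSR = sum((yCalc - mean(y)).^2);
cross = 2*sum(e.*(yCalc - mean(y)));   % cross term of the expansion

% Exact identity: SST = SSE + SSR + cross
fprintf('SST = %.2f, SSE+SSR = %.2f, cross term = %.2f\n', SST, SSE + SSR, cross);
% Residuals are orthogonal to x (up to rounding) but do not sum to zero
fprintf('sum(e.*x) = %.2g, sum(e) = %.2f\n', sum(e.*x), sum(e));
```

For these data the cross term is about -5548, which is exactly the gap between SST (about 957) and SSE + SSR (about 6505).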

Answers (2)

the cyclist on 4 Sep 2018
MATLAB clearly "knows" that something is up with the value of R^2. Note that if you had instead included an intercept
dlm = fitlm(x,y)
then MATLAB will report the R^2 value as part of the model output, so that output must be actively suppressed in the no-intercept case.
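To illustrate the intercept case, here is a base-MATLAB sketch (no Statistics Toolbox needed; the design matrix X with a column of ones is an illustrative construction, not taken from the thread) showing that once an intercept is included, SST = SSE + SSR holds and the two R^2 formulas from the question agree. The result presumably matches the R-squared that fitlm(x,y) displays, but that comparison is not made here.

```matlab
% Same data, now fitting y = a + b*x via a column of ones
x = [37,31,3,1,16,98,12,14,75]';
y = [15,28,44,19,33,10,21,14,15]';
X = [ones(size(x)) x];     % design matrix with intercept column
coef = X\y;                % coef(1) = intercept a, coef(2) = slope b
yCalc = X*coef;            % fitted values

SSE = sum((y - yCalc).^2);
SST = sum((y - mean(y)).^2);
SSR = sum((yCalc - mean(y)).^2);

RsqA = 1 - SSE/SST;          % definition used in approach 1
RsqB = 1 - SSE/(SSE + SSR);  % SSE+SSR form suspected for fitlm
fprintf('SST = %.4f, SSE+SSR = %.4f\n', SST, SSE + SSR);
fprintf('RsqA = %.4f, RsqB = %.4f\n', RsqA, RsqB);
```

Because the residuals sum to zero when an intercept is present, the cross term disappears and both formulas give the same, non-negative R^2.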
I'm a little bit surprised that the negative R^2 value is not reported (perhaps with a warning against interpreting it) and also that this behavior is not documented. (Well, I couldn't find it, anyway.)
I'm specifically surprised because the much-older regress function will report a negative R^2 (with a warning), and this behavior is documented.

Jeff Miller on 5 Sep 2018
I don't have a complete answer, but here are some possibly-relevant observations:
1. Without the constant in the model, the total SS should be:
SST=sum(y.^2);
This is because the intercept 'a' is assumed to be 0, so the total SS to be explained is not the SS around the mean of Y but the SS around 0. Computed this way, SST is 5357 for these data. (SPSS also gives 5357 as the total SS in a regression with no constant term.) For some reason, MATLAB's fitlm gives 6504.7 as the value of dlm.SST, and I am not sure where that value comes from. Perhaps it uses a model in which both x and y are assumed to contain error, minimizing the perpendicular distance from the observed points to the prediction line rather than the vertical distance (as in the usual regression model, which assumes the x's are measured without error).
2. With the corrected SST given above, the equation
Rsq1=1-SSE/SST;
yields an R^2 value of 0.2227, which SPSS also produces.
3. SPSS also produces the values that MATLAB reports in dlm.Coefficients.
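Points 1 and 2 can be checked directly in base MATLAB. The sketch below also tests one possible source of the 6504.7: the quantity SSE + SSR (SSTchk in the question), with SSR taken about mean(y). On these data it reproduces 6504.7, and 1 - SSE/(SSE + SSR) gives the 0.36 that fitlm reports, which suggests (but does not prove, since fitlm's internals are not documented in this thread) that no errors-in-variables model is involved. The names SST0, Rsq0, SSTalt and RsqAlt are illustrative.

```matlab
% No-intercept fit from the question
x = [37,31,3,1,16,98,12,14,75]';
y = [15,28,44,19,33,10,21,14,15]';
b = x\y;
yCalc = x*b;
SSE = sum((y - yCalc).^2);

% Uncentered total SS (point 1): variation around 0, not around mean(y)
SST0 = sum(y.^2);
Rsq0 = 1 - SSE/SST0;          % uncentered R^2 (point 2)

% Candidate for the 6504.7 reported as dlm.SST: SSE plus SSR about mean(y)
SSR = sum((yCalc - mean(y)).^2);
SSTalt = SSE + SSR;
RsqAlt = 1 - SSE/SSTalt;

fprintf('SST0 = %.1f, Rsq0 = %.4f\n', SST0, Rsq0);
fprintf('SSE+SSR = %.1f, 1-SSE/(SSE+SSR) = %.4f\n', SSTalt, RsqAlt);
```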
  2 Comments
faber fenix on 5 Sep 2018
It looks like different software packages use different definitions of R^2 when there is no intercept, and this is not always well documented. For example, R produces 0.2227, the same value you reported for SPSS, while Excel produces -3.35.
Jeff Miller on 5 Sep 2018
And seemingly different definitions of SST, which is somehow even more surprising.
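For reference, the three competing definitions discussed in this thread, evaluated on the question's data with SSE = sum_i (y_i - yhat_i)^2 from the no-intercept fit. The values are recomputed from the data and rounded; the third line assumes the SST = SSE + SSR reading of fitlm, which matches the reported numbers but is not documented.

```latex
\begin{aligned}
R^2_{\text{centered}}   &= 1 - \frac{SSE}{\sum_i (y_i - \bar y)^2}
                          \approx 1 - \frac{4164.1}{956.9} \approx -3.35 \\
R^2_{\text{uncentered}} &= 1 - \frac{SSE}{\sum_i y_i^2}
                          \approx 1 - \frac{4164.1}{5357} \approx 0.2227 \\
R^2_{\text{SSE+SSR}}    &= 1 - \frac{SSE}{SSE + \sum_i (\hat y_i - \bar y)^2}
                          \approx 1 - \frac{4164.1}{6504.7} \approx 0.36
\end{aligned}
```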


Release: R2017a