Is the value of the coefficient of determination in fitlm.m correct?
I am wondering whether the coefficient of determination (R^2) reported by fitlm is correct. The code below calculates it in two ways that should be equivalent. The first approach returns Rsq1 = -3.35; the second uses fitlm and returns Rsq2 = 0.36. The two values are not equal. It may be that fitlm assumes SST = SSE + SSR, which does not necessarily hold for a linear regression without an intercept.
x=[37,31,3,1,16,98,12,14,75]';
y=[15,28,44,19,33,10,21,14,15]';
% approach 1
b=x\y;  % least-squares slope for y = b*x (no intercept column)
yCalc = x*b;
SSE=sum((y-yCalc).^2);
SST=sum((y-mean(y)).^2);
SSR=sum((yCalc-mean(y)).^2);
SSTchk=SSE+SSR;
% coefficient of determination calculated with the correct approach
Rsq1=1-SSE/SST;
% approach 2
dlm = fitlm(x,y,'Intercept',false);
% coefficient of determination in the fitlm matlab function
Rsq2=dlm.Rsquared.ordinary;
% expression of the coefficient of determination in the fitlm function?
Rsq2fitlm=1-SSE/SSTchk;
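For reference, here is the algebra behind the discrepancy, with values worked out by hand from the x and y above (rounded). Splitting each deviation as y_i - mean(y) = (y_i - yhat_i) + (yhat_i - mean(y)) and squaring leaves a cross term. Least squares without an intercept only guarantees that the residuals are orthogonal to x, not that they sum to zero, so the cross term does not vanish here. The symbols e_i (residuals) and the cross-term notation are introduced for this sketch. The numbers are consistent with fitlm using SSE + SSR as its total sum of squares, but that is an inference from matching values, not something confirmed from the fitlm source.

```latex
\begin{aligned}
b &= \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{4655}{18165} \approx 0.2563, \qquad \hat{y}_i = b\,x_i, \qquad e_i = y_i - \hat{y}_i \\
\mathrm{SST} &= \sum_i (y_i-\bar{y})^2 = \mathrm{SSE} + \mathrm{SSR} + 2\sum_i e_i(\hat{y}_i-\bar{y}) \\
\sum_i e_i(\hat{y}_i-\bar{y}) &= b\sum_i e_i x_i - \bar{y}\sum_i e_i = -\bar{y}\sum_i e_i \quad \text{(since } \textstyle\sum_i e_i x_i = 0\text{)} \\
\mathrm{SSE} &\approx 4164.1, \quad \mathrm{SSR} \approx 2340.6, \quad \mathrm{SST} \approx 956.9, \quad 2\sum_i e_i(\hat{y}_i-\bar{y}) \approx -5547.8 \\
1 - \frac{\mathrm{SSE}}{\mathrm{SST}} &\approx -3.35 \ (\texttt{Rsq1}), \qquad \mathrm{SSE}+\mathrm{SSR} \approx 6504.7, \qquad 1 - \frac{\mathrm{SSE}}{\mathrm{SSE}+\mathrm{SSR}} \approx 0.36 \ (\texttt{Rsq2})
\end{aligned}
```

With an intercept column in the model, the normal equations also force the residuals to sum to zero, so the cross term is zero and SST = SSE + SSR holds exactly. That is why the two approaches only agree when the intercept is included.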
Answers (2)
the cyclist
on 4 Sep 2018
MATLAB clearly "knows" that something is up with the value of R^2. Note that if you had instead included an intercept
dlm = fitlm(x,y)
then MATLAB will report the R^2 value as part of the model output, so that output must be actively suppressed in the no-intercept case.
I'm a little surprised that the negative R^2 value is not reported (perhaps with a warning against interpreting it), and also that this behavior is not documented. (Well, I couldn't find it, anyway.)
I'm specifically surprised because the much-older regress function will report a negative R^2 (with a warning), and this behavior is documented.
Jeff Miller
on 5 Sep 2018
I don't have a complete answer, but here are some possibly-relevant observations:
1. Without the constant in the model, the total SS should be:
SST=sum(y.^2);
This is because the intercept 'a' is assumed to be 0, so the total SS to be explained is not the SS around the mean of y but the SS around 0. For these data, SST computed this way is 5357. (SPSS also gives 5357 as the total SS in a regression with no constant term.) For some reason, MATLAB's fitlm gives 6504.7 as the value of dlm.SST. I am not sure where it gets that value. Perhaps it is using a model in which x and y are both assumed to have some error, so it minimizes the perpendicular distance from the data points to the prediction line rather than the vertical distance (as in the usual regression model, which assumes the x's are measured without error).
2. With the corrected SST given above, the equation
Rsq1=1-SSE/SST;
yields an R^2 value of 0.2227, which SPSS also produces.
3. SPSS also produces the values that MATLAB reports in dlm.Coefficients.
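The uncentered version in points 1 and 2 is self-consistent: without an intercept, least squares makes the residuals orthogonal to the fitted values, so the total SS around 0 splits exactly into residual and regression parts. Below is a hand-computed check for this data (rounded), where b is the no-intercept slope, yhat_i = b*x_i, e_i = y_i - yhat_i, and the subscript 0 (notation introduced here) marks sums taken around 0 rather than around mean(y).

```latex
\begin{aligned}
b &= \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{4655}{18165}, \qquad \hat{y}_i = b\,x_i, \qquad e_i = y_i - \hat{y}_i \\
\mathrm{SST}_0 &= \sum_i y_i^2 = 5357 \\
\mathrm{SSR}_0 &= \sum_i \hat{y}_i^2 = b\sum_i x_i y_i = \frac{4655^2}{18165} \approx 1192.9 \\
\mathrm{SST}_0 &= \mathrm{SSE} + \mathrm{SSR}_0 \quad \text{since} \quad \sum_i e_i\,\hat{y}_i = b\sum_i e_i x_i = 0, \qquad \mathrm{SSE} \approx 5357 - 1192.9 = 4164.1 \\
R_0^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}_0} \approx 1 - \frac{4164.1}{5357} \approx 0.2227
\end{aligned}
```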
Jeff Miller
on 5 Sep 2018
And seemingly different definitions of SST, which is somehow even more surprising.