How to use an anonymous function to fit data with the fittype and fit functions.

Hi there,
I have some data points, and I want to use an anonymous function to fit them with the fittype and fit functions.
The function looks like this:
F(x) = a*x^2 + b*x + c + d*diff(x^2) + e*diff(x)
Any suggestions?
Thanks

 Accepted Answer

I think the manual page examples for this are pretty good.
a=1; b=2; c=-1; d=-2; e=3;   % true parameter values
xdata=-3:.03:3;              % equally spaced x values
% fun returns length(x)-1 values because diff() shortens a vector by one element.
fun=@(p,x) p(1)*x(2:end)+p(2)*x(2:end).^2+p(3)+p(4)*diff(x)+p(5)*diff(x).^2;
yclean=fun([a,b,c,d,e],xdata);          % noise-free response
ydata=yclean+randn(1,length(xdata)-1);  % add unit-variance Gaussian noise
p0=[1,1,1,1,1];                         % initial parameter guess
p = lsqcurvefit(fun,p0,xdata,ydata)
Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance.
p = 1×5
1.0099 2.0178 -0.8965 -3.6526 -48.7815
yfit=fun(p,xdata);
plot(xdata(2:end),yclean,'-r',xdata(2:end),ydata,'rx',xdata(2:end),yfit,'-b')
legend('Clean Data','Noisy Data','Best Fit')
The fit to the data is excellent, and the two curves are so close that they overlap, but the fitted parameters do not match the original parameters. This is because the input diff(x) is a constant, and therefore is not independent of the "c" term, and the input diff(x^2) is a straight line, and therefore not independent of the "a" term.
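For equally spaced x, this dependence can be checked directly. The following is a small sketch (the variable names are illustrative, reusing the same spacing as the xdata above): with step h, diff(x) is the constant h, and diff(x.^2) equals 2*h*x + h^2, a straight line in x.

```matlab
% Check the collinearity claim for equally spaced x with step h.
x = -3:0.03:3;
h = x(2) - x(1);
max(abs(diff(x) - h))                          % ~0 up to round-off: diff(x) is constant
max(abs(diff(x.^2) - (2*h*x(1:end-1) + h^2)))  % ~0: diff(x.^2) is linear in x
```

Both residuals are zero up to floating-point round-off, confirming that these predictors are scaled copies of the constant and linear terms.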
Try this. Good luck.

7 Comments

@payam samadi, I did not explain the reason for the parameter fit mismatch very well. Allow me to try again. You have five predictor variables: x, x^2, constant (which corresponds to a column or a row of ones as the predictor), diff(x), and diff(x^2). But you have only three independent predictor variables, because diff(x) is a scaled copy of the ones-column, and because diff(x^2) is a scaled copy of the x column. The predictor variables need to be independent to get a robust reproducible fit.
I think you are assuming that x is equally spaced: the scaled-copy relationship you mention holds only for equal spacing.
Thanks for your answer.
Yes, I can confirm that the equation is the same as the one I mentioned.
Also, I have added the data sets, which may be useful.
[edit: attaching the script and adding console output and the plot]
@payam samadi, Here is a script that fits your data. I included a lot of comments so you will learn as much as possible from it. It generates a line of console output and a plot, both shown below.
>> linRegFit1
Fitted values: a=13.305, b=-1.335, c=-25.451, d=-14.679, e=-142.202.
As @Walter Roberson stated, if x has length N, then diff(x) has length N-1. Therefore we only fit y(2:end), which has length N-1. For the predictor variables x and x^2, we only use elements x(2:end). The anonymous function fun=@(p,x)... returns a vector of length N-1. These adjustments allow you to use a fitting function that includes diff(x).
If I were reviewing a manuscript which included this regression analysis, or if a student used this analysis in a thesis, I would ask them to use forward selection of predictor variables, or backward elimination, to justify the inclusion of all five predictor variables.
Good luck with your work!
[edit: correct typographical errors]
@payam samadi, I would also ask my student to tell me the p-values and 95% confidence intervals for each fitted parameter. To get this info easily, and to do the whole regression easily (with the potential to add stepwise regression), use the Statistics & Machine Learning Toolbox. See how easy it is:
load('xdata'); load('ydata');
%Next: Create array with the four predictor variables as columns.
X=[xdata(2:end),xdata(2:end).^2,diff(xdata),diff(xdata).^2];
y=ydata(2:end);
mdl1=fitlm(X,y) %fit the linear model
mdl1 =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    _______    _______    ___________
    (Intercept)    -25.451      1.0814    -23.534    1.9701e-104
    x1              13.305     0.73942     17.994     1.1296e-65
    x2             -1.3349       0.125    -10.679     1.0479e-25
    x3             -14.679     0.73364    -20.009     4.2594e-79
    x4              -142.2      22.027    -6.4559     1.4506e-10

Number of observations: 1499, Error degrees of freedom: 1494
Root Mean Squared Error: 0.667
R-squared: 0.888, Adjusted R-Squared: 0.888
F-statistic vs. constant model: 2.97e+03, p-value = 0
coefCI(mdl1)
ans = 5×2
   -27.5725   -23.3299
    11.8543    14.7551
    -1.5801    -1.0897
   -16.1182   -13.2400
  -185.4090   -98.9959
The values intercept, x1, x2, x3, x4 correspond to c, b, a, d, e in your original model. If you look back, you will see that the parameter estimates here match the values obtained above with lsqcurvefit(). Because we used the Statistics toolbox, we get the p-values and the confidence intervals (from coefCI()) for each fitted parameter. The p-values show that all five fitted parameters are highly significant.
We can do a stepwise linear regression to see whether all four predictors are useful in addition to the intercept. Given the highly significant p-values above, I expect that all four will turn out to be significant.
%If you don't include 'Upper','linear' below, stepwiselm() will also try to fit
%interaction terms in the model. You did not request those.
mdl2=stepwiselm(X,y,'Upper','linear')
1. Adding x1, FStat = 8320.6404, pValue = 0
2. Adding x3, FStat = 361.6586, pValue = 2.142266e-72
3. Adding x2, FStat = 101.3947, pValue = 4.035306e-23
4. Adding x4, FStat = 41.6787, pValue = 1.45062e-10
mdl2 =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    _______    _______    ___________
    (Intercept)    -25.451      1.0814    -23.534    1.9701e-104
    x1              13.305     0.73942     17.994     1.1296e-65
    x2             -1.3349       0.125    -10.679     1.0479e-25
    x3             -14.679     0.73364    -20.009     4.2594e-79
    x4              -142.2      22.027    -6.4559     1.4506e-10

Number of observations: 1499, Error degrees of freedom: 1494
Root Mean Squared Error: 0.667
R-squared: 0.888, Adjusted R-Squared: 0.888
F-statistic vs. constant model: 2.97e+03, p-value = 0
The result above shows that stepwiselm() keeps all four predictor variables in the model - so you are justified in including all four, plus the intercept.
Try it.


More Answers (2)

There is no chance of fitting that function.
fit() and fittype() pass in numeric data, and diff() of a numeric vector is one element shorter than the input. Therefore your a*x^2+b*x+c has an incompatible size to be added to d*diff(x^2)+e*diff(x).
It might make sense to rewrite in terms of gradient()
Does diff(x) happen to be constant (points are equally spaced)? If so then I suspect you can rewrite the formulas for more efficient fitting
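One reason gradient() may make sense here: unlike diff(), it returns an output the same length as its input, so all the model terms stay size-compatible. A minimal sketch of such a rewrite (this is an illustration, not the poster's actual model):

```matlab
% gradient() preserves vector length, so no terms need to be trimmed.
x = -3:0.03:3;
fun = @(p,x) p(1)*x.^2 + p(2)*x + p(3) + p(4)*gradient(x.^2) + p(5)*gradient(x);
numel(fun([1 2 -1 -2 3], x)) == numel(x)   % sizes match: returns logical 1
```

Note that for equally spaced x the collinearity problem discussed elsewhere in this thread still applies; gradient() only fixes the size mismatch.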

2 Comments

fit() needs to work mathematically when the order of the input vectors is changed: fitting x, y needs to give the same sum of squared errors as fitting x(P), y(P), where P is a permutation of the indices.
If you were to reorder the inputs, e*diff(x) might balance out, but d*diff(x^2) cannot balance out.
Also could you confirm for us that you mean diff(x.^2) and not diff(x).^2 ?
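The order-dependence is easy to see with a small made-up example: permuting x does not merely permute diff(x), so a diff-based model changes when the data are reordered.

```matlab
% diff() depends on the ordering of the data points.
x = [1 2 4 8];
P = [3 1 4 2];      % an arbitrary permutation of the indices
diff(x)             % 1  2  4
diff(x(P))          % -3  7 -6, not a permutation of diff(x)
```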
Thanks for your answer.
Yes, I can confirm that the equation is the same as the one I mentioned.
Also, I have added the data sets, which may be useful.

