How do you fit a distribution to data?

25 views (last 30 days)
Sam
Sam on 26 Mar 2014
Commented: Shashank Prasanna on 31 Mar 2014
I am trying to fit my data to a normal distribution. The data is a distribution of signal over a rate (i.e. Y - Signal magnitude, X - time). I am trying to fit the distribution by transforming the function for the normal distribution PDF so that I can use it in a least squares fit of the data. This is what I do:
I change:
f(x) = 1/(s*sqrt(2*pi)) * exp( -((X - m).^2)./(2*s^2) );
to:
log(f(x)) = M(1).*X^2 + M(2).*X + M(3);
where,
M(1) = -1/(2*s^2); M(2) = m/s^2; M(3) = ( log( 1/(s*sqrt(2*pi)) ) - m/(2*s^2) );
where,
m = mean (Mu) and s = standard deviation (Sigma).
I then do a least squares fit for M(1), M(2), and M(3):
D = log(X); G = [X.^2 X ones(numel(X),1)]; M = inv(G'*G)*G'*D;
I then use M(1) and M(2) to calculate s and m (Sigma and Mu):
M(1) = -1/(2*sig^2) => sig = sqrt( -1/(2*M(1)) );
M(2) = m/s^2 => m = M(2)*s^2;
I then plug m and s into the PDF function and calculate what I expect to be the fit to the data, but it isn't. I also calculate f(x) using M(1), M(2), and M(3), and I get something which looks more like my data, but the PDF and f(x) I've calculated look nothing alike.
Does anyone see what I've done wrong?

Answers (3)

Star Strider
Star Strider on 27 Mar 2014
I can’t follow what you’re doing.
There is an easier way using fminsearch:
% Generate x- and y- data:
x = linspace(0,10);
% Normal distribution — b(1) = mean, b(2) = std
N = @(b,x) exp(-((x-b(1)).^2)./(2*b(2)^2)) / (b(2)*sqrt(2*pi));
b = [5; 1];
d1 = N(b,x); % Generate smooth data
d2 = d1 + 0.05*(rand(size(d1))-.5); % Add noise
% Use ‘fminsearch’ to do regression:
y = d2;
OLS = @(b) sum((N(b,x) - y).^2); % Ordinary Least Squares cost function
opts4 = optimset('MaxFunEvals',50000, 'MaxIter',10000);
B = fminsearch(OLS, rand(2,1))
Nfit = N(B,x);
Substitute your X and Y data vectors for x and y in place of my y = d2 line, and delete or comment-out the d1 and d1 lines. (I used them to be sure the code works.) It should work without problems. The resultant vector B has the mean as B(1) and the standard deviation as B(2). The Nfit line uses x and B to generate a normal distribution fit to your data if you want to plot it.
  2 Comments
Image Analyst
Image Analyst on 27 Mar 2014
He's making the equation linear in X by taking the log, so that he can then use polyfit to fit a quadratic, like I did in my answer.
Star Strider
Star Strider on 27 Mar 2014
I got that part. What I didn’t understand is not calculating M as:
M = G\D;
or just using polyfit. With binned data from histc, the fminsearch approach works.

Sign in to comment.


Image Analyst
Image Analyst on 27 Mar 2014
I did the same thing in my demo to fit a Gaussian to a histogram. Someone may say that's not the right way (and actually I know it's not theoretically right because it minimized the residuals to the log of the function, not the original function) but it's close enough for me. See attached file (below in blue text).
  2 Comments
Sam
Sam on 27 Mar 2014
Your approach is the same as mine, but I think the problem is that I'm not using the PDF to describe a variable with random noise, I'm using the PDF as a function form that fits a time signal I am measuring. The envelope of the signal behaves like a bell curve, but it isn't exactly a bell curve. If I make up hypothetical bell curve data points versus time, centered at some time, I can get the function to fit the data. As soon as the data starts behaving slightly unlike a bell curve, the amplitude of the fitting bell curve is nearly twice the data.
Also, you have different model parameters then I do when I transform the PDF function. I get param(1) = -1/(2*Sigma^2), param(2) = Mu/Sigma^2, and param(3) = log(1/(Sigma*sqrt(2*pi))) - Mu/(2*Sigma^2).
Image Analyst
Image Analyst on 27 Mar 2014
I'm not sure why it's a problem where the data came from - if it's a PDF or envelope or whatever. I just don't see why it matters. The data is the data regardless of how or what generated it. Our parameters may be different if we chose somewhat different equations.

Sign in to comment.


Shashank Prasanna
Shashank Prasanna on 27 Mar 2014
If you have the Statistics Toolbox use normfit
Likewise, lognfit for log normal distributions:
If you want a visual way to do this try:
Note: Normal distribution is parametrized by its mean and standard deviation. No optimization (least-squares or MLE) is required since these can be computed explicitly. The parameter estimates will be the sample mean and square root of the unbiased estimator of the variance.
Log normal distribution log transforms the data and follows the above route. Here is the relationship between the mean and std of normal and lognormal.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!