Minimize error between data distribution and expected distribution

Hi all,
I have 3 sets of data which are expected to:
1) 1st data block: approach a Gaussian distribution with mu = 0 and sigma = 1;
2) 2nd data block: approach a Gaussian distribution with mu = 0 and sigma = 0.8;
3) 3rd data block: approach a Gaussian distribution with mu = 0 and sigma = 0.5;
Each data block has only a limited number of samples (generally between 2048 and 8192) and, because of some filter effects introduced by the specific code I use, they will not exactly match the corresponding expected distribution.
The point is that, whatever it implies in terms of manipulation, I want to minimize the discrepancy between the actual and expected distribution of each data block. Note that I cannot increase the number of samples, for reasons I will not explain in detail.
Generally, the first data block, compared to the standard normal distribution, looks like the following figure:
I was thinking of using lsqcurvefit for this purpose.
What would you suggest?

Answers (1)

Wouter on 20 Mar 2013
Do you know this function:
histfit
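For example, a minimal usage sketch (histfit is part of the Statistics Toolbox; the 4096 samples and 128 bins below are just placeholders to mirror your setup):
data = randn(4096,1);   % stand-in for one of your data blocks
histfit(data,128)       % histogram with a fitted normal density overlaid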
  6 Comments
Wouter on 21 Mar 2013 (edited)
You could try to change individual data points after your filtering step in order to update them; this will change the blue bars. For example: find a blue bar that is too high and change one of its data points into a value which lies in a blue bar that is too low (compared to the red line). This does, however, change your data, and will render step 2) treat_with_piece_of_code useless.
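A toy sketch of that greedy point-moving idea (everything here is made up for illustration: the standard-normal target, the 200 moves, and all variable names):
data = randn(4096,1);
nb = 128;
[f,c] = hist(data,nb);                          % counts f at bin centers c
d = mean(diff(c));                              % uniform bin width
edges = [c - d/2, c(end) + d/2];                % bin edges around the centers
target = normpdf(c,0,1)*numel(data)*d;          % expected counts under N(0,1)
for k = 1:200                                   % greedily move a few samples
    [~,hi] = max(f - target);                   % most over-full bin
    [~,lo] = min(f - target);                   % most under-full bin
    idx = find(data >= edges(hi) & data < edges(hi+1), 1);
    data(idx) = c(lo);                          % move one sample into the low bin
    f(hi) = f(hi) - 1;  f(lo) = f(lo) + 1;      % keep the counts in sync
end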
However, it makes more sense to find a better fit to the histogram, i.e. to change the red line. lsqcurvefit would only be useful if you wanted to update the red line (the fit).
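For completeness, a minimal sketch of such a red-line fit with lsqcurvefit (Optimization Toolbox), assuming a Gaussian model with free mu and sigma; the variable names are made up:
data = randn(4096,1);
[f_p,m_p] = hist(data,128);
f_p = f_p/trapz(m_p,f_p);                               % normalize to unit area
gauss = @(p,x) exp(-(x - p(1)).^2./(2*p(2)^2)) ./ (p(2)*sqrt(2*pi));
p = lsqcurvefit(gauss,[0 1],m_p,f_p);                   % p(1) = mu, p(2) = sigma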
PEF on 21 Mar 2013
I think you're starting to get the point :)
The major concern is that I don't want to find the best fit to the data, but the best data fitting the standard normal distribution: for certain reasons I need my data to fit a Gaussian distribution with mean 0 and sigma 1.
At the moment I'm proceeding this way:
data = randn(4096,1);                                % example data block
[f_p,m_p] = hist(data,128);                          % bin counts f_p at centers m_p
f_p = f_p/trapz(m_p,f_p);                            % normalize histogram to unit area
x_th = min(data):.001:max(data);
y_th = normpdf(x_th,0,1);                            % target standard normal pdf
f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');   % target pdf sampled at the bin centers
figure(1)                                            % actual histogram vs target pdf
bar(m_p,f_p)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
figure(2)                                            % target bin heights vs target pdf
bar(m_p,f_p_th)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
Now, I would proceed with calculating a per-bin scaling factor
sf = f_p_th./f_p;   % ratio of target to actual bin height
and I would consequently scale the data according to the scale factor of the corresponding bin; for example:
if data(1) falls within bin(1) --> scale with sf(1), and so on.
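A minimal sketch of that bin-wise scaling, assuming sf, m_p and data from the snippet above; the bin edges are rebuilt from the hist() centers, and the names edges, bin, sfc and data_scaled are my own choices:
d = mean(diff(m_p));                   % uniform bin width
edges = [m_p - d/2, m_p(end) + d/2];   % bin edges around the hist() centers
[~,bin] = histc(data,edges);           % bin index of every sample
bin = min(max(bin,1),numel(sf));       % clamp samples on the outermost edges
sfc = sf(:);                           % per-bin factors as a column vector
data_scaled = data.*sfc(bin);          % scale each sample by its bin's factor
Note that sf blows up in bins where f_p is zero, so empty bins would need special handling.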
I do think my question is not counter-intuitive; it merely reverses the standard procedure of fitting a distribution to a given set of data.
