Minimize error between data distribution and expected distribution

Hi all,
I have 3 sets of data which are expected to:
1) 1st data block: approach a Gaussian distribution with mu = 0 and sigma = 1;
2) 2nd data block: approach a Gaussian distribution with mu = 0 and sigma = 0.8;
3) 3rd data block: approach a Gaussian distribution with mu = 0 and sigma = 0.5;
Each data block has only a limited number of samples (generally between 2048 and 8192) and, because of some filter effects introduced by the specific code I use, they will not exactly match the corresponding expected distribution.
The point is that, whatever it implies in terms of manipulation, I want to minimize the discrepancy between the actual and expected distribution of each data block. Note that I cannot increase the number of samples, for reasons I will not explain in detail.
Generally, the first data block, compared to the standard normal distribution, looks like the following figure:
I was thinking of using lsqcurvefit for this purpose.
What would you suggest?

Answers (1)

Wouter on 20 Mar 2013
Do you know this function:
histfit
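For example, a minimal usage sketch (histfit is part of the Statistics Toolbox; the 4096 samples and 128 bins below are just placeholders to mirror your setup):
data = randn(4096,1);   % stand-in for one of your data blocks
histfit(data,128)       % histogram with a fitted normal density overlaid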
  6 Comments
Wouter on 21 Mar 2013 (edited)
You could try to change individual data points after your filtering step in order to update them; this will change the blue bars. For example: find a blue bar that is too high and change one of its data points into a value which lies in a blue bar that is too low (compared to the red line). This does, however, change your data, and will render step 2) treat_with_piece_of_code useless.
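A toy sketch of that greedy point-moving idea (everything here is made up for illustration: the standard-normal target, the 200 moves, and all variable names):
data = randn(4096,1);
nb = 128;
[f,c] = hist(data,nb);                          % counts f at bin centers c
d = mean(diff(c));                              % uniform bin width
edges = [c - d/2, c(end) + d/2];                % bin edges around the centers
target = normpdf(c,0,1)*numel(data)*d;          % expected counts under N(0,1)
for k = 1:200                                   % greedily move a few samples
    [~,hi] = max(f - target);                   % most over-full bin
    [~,lo] = min(f - target);                   % most under-full bin
    idx = find(data >= edges(hi) & data < edges(hi+1), 1);
    data(idx) = c(lo);                          % move one sample into the low bin
    f(hi) = f(hi) - 1;  f(lo) = f(lo) + 1;      % keep the counts in sync
end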
However, it makes more sense to find a better fit to the histogram, i.e. to change the red line. lsqcurvefit would only be useful if you wanted to update the red line (the fit).
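For completeness, a minimal sketch of such a red-line fit with lsqcurvefit (Optimization Toolbox), assuming a Gaussian model with free mu and sigma; the variable names are made up:
data = randn(4096,1);
[f_p,m_p] = hist(data,128);
f_p = f_p/trapz(m_p,f_p);                               % normalize to unit area
gauss = @(p,x) exp(-(x - p(1)).^2./(2*p(2)^2)) ./ (p(2)*sqrt(2*pi));
p = lsqcurvefit(gauss,[0 1],m_p,f_p);                   % p(1) = mu, p(2) = sigma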
PEF on 21 Mar 2013
I think you're starting to get the point :)
The major concern is that I don't want to find the best fit to the data, but the best data fitting the standard normal distribution: for certain reasons I need my data to fit a Gaussian distribution with mean 0 and sigma 1.
At the moment I'm proceeding this way:
data = randn(4096,1);                                % example data block
[f_p,m_p] = hist(data,128);                          % bin counts f_p at centers m_p
f_p = f_p/trapz(m_p,f_p);                            % normalize histogram to unit area
x_th = min(data):.001:max(data);
y_th = normpdf(x_th,0,1);                            % target standard normal pdf
f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');   % target pdf sampled at the bin centers
figure(1)                                            % actual histogram vs target pdf
bar(m_p,f_p)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
figure(2)                                            % target bin heights vs target pdf
bar(m_p,f_p_th)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
Now, I would proceed with calculating a per-bin scaling factor
sf = f_p_th./f_p;   % ratio of target to actual bin height
and I would consequently scale the data according to the scale factor of the corresponding bin; for example:
if data(1) falls within bin(1) --> scale with sf(1), and so on.
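A minimal sketch of that bin-wise scaling, assuming sf, m_p and data from the snippet above; the bin edges are rebuilt from the hist() centers, and the names edges, bin, sfc and data_scaled are my own choices:
d = mean(diff(m_p));                   % uniform bin width
edges = [m_p - d/2, m_p(end) + d/2];   % bin edges around the hist() centers
[~,bin] = histc(data,edges);           % bin index of every sample
bin = min(max(bin,1),numel(sf));       % clamp samples on the outermost edges
sfc = sf(:);                           % per-bin factors as a column vector
data_scaled = data.*sfc(bin);          % scale each sample by its bin's factor
Note that sf blows up in bins where f_p is zero, so empty bins would need special handling.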
I do think my question is not counter-intuitive; it merely reverses the standard procedure of fitting a distribution to a given set of data.
