What would cause a histogram to have regular spikes in bin counts?

I tried to use histc to plot a histogram, but the resulting histogram has noticeable, regular spikes when the data should fit a smooth Gaussian distribution. I doubt that these spikes are real and suspect they come from the calculation in MATLAB instead. Could there be systematic rounding of the data points onto values that happen to coincide with the major axis tick marks? How would these spikes even occur?
  7 Comments
Jan on 15 Nov 2017
"By increasing it by half, the original spikes at 0.1 and 0.2 are still there"
By increasing what?


Answers (2)

the cyclist on 15 Nov 2017

Your code and your file don't quite match up with each other (e.g. the variable is named "displacement" rather than "N"), and you have not defined some of the variables you used (e.g. binsize).

Nonetheless, I was able to replicate something close to what you are seeing using your data file and this code:

load data
binsize = 0.001;
maxbin = 0.5;
binranges = 0:binsize:maxbin;
bincounts = histc(displacement,binranges);
figure
plot(binranges,bincounts)

If I just let MATLAB make 100 bins, I get a relatively smooth curve (although you can see some hints of spiky behavior):

But if I increase the number of bins to 800, it gets very spiky like your original:

So, I would conclude that this is definitely a result of how you are enforcing the binning.
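To illustrate how binning alone can create regular spikes, here is a minimal sketch. It assumes the data were recorded at a fixed resolution, which is an assumption for illustration and not something confirmed for the uploaded file. The synthetic values are expressed in units of that resolution, so every value is an integer, and the names edgesBad and edgesGood are made up for this example.

```matlab
% Hypothetical data: 300 distinct grid values (integers in units of the
% recording resolution), each occurring exactly 10 times.
k    = 0:299;
data = repmat(k, 10, 1);
data = data(:);

% Bin width of 1.5 resolution steps: bins alternately contain 1 or 2 grid values.
edgesBad  = (0:200) * 1.5 - 0.5;
countsBad = histc(data, edgesBad);
countsBad = countsBad(1:end-1);      % drop histc's extra bin for x == last edge

% Bin width of 2 steps with edges halfway between grid values:
% every bin contains exactly 2 grid values.
edgesGood  = (0:150) * 2 - 0.5;
countsGood = histc(data, edgesGood);
countsGood = countsGood(1:end-1);

fprintf('width 1.5: counts range from %d to %d\n', min(countsBad), max(countsBad));
fprintf('width 2.0: counts range from %d to %d\n', min(countsGood), max(countsGood));
```

With the mismatched width the counts alternate between two levels even though the underlying density is flat, while the bin width that is a whole multiple of the resolution gives a flat histogram.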

  4 Comments
the cyclist on 16 Nov 2017

The data you uploaded are definitely clumped. Try this code, where I have "jittered" the data a little bit, and zoomed in.

figure
plot(sort(displacement)+0.00005*randn(size(displacement)),rand(size(displacement)),'.')
xlim([0.095 0.125])
ylim([-0.5 1.5])

You can see structure at a spacing of about 0.001, and the density of points around 0.015. There is also a decreased density at around 0.018, but that is not as easy for the eye to perceive in this chart. The best way to perceive that is the histogram!
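A rough numerical check for this kind of clumping is to measure how far each value sits from the nearest multiple of the suspected spacing. The sketch below uses the 0.001 spacing mentioned above and two hypothetical stand-in data sets (xClumped and xSmooth are invented for illustration, not taken from the uploaded file).

```matlab
step = 0.001;                          % suspected spacing, taken from the comment above

% Hypothetical stand-ins for real data (not the poster's file):
xClumped = (100:120) * step;           % values sitting on a 0.001 grid
xSmooth  = linspace(0.1, 0.12, 1001);  % values with no such grid

% Signed distance to the nearest grid point, in units of the step (-0.5 to 0.5).
offClumped = xClumped / step - round(xClumped / step);
offSmooth  = xSmooth  / step - round(xSmooth  / step);

fprintf('clumped: max |offset| = %.3g\n', max(abs(offClumped)));
fprintf('smooth:  std of offset = %.3g\n', std(offSmooth));
```

For real measurements the offsets will not be exactly zero, but a histogram of them that peaks sharply near zero would point to the same clumping that the jittered plot shows.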

Jan on 17 Nov 2017
The histogram does not look like a simply noisy data set; it looks like 2 different overlapping Poisson distributions. A bold speculation: this is not noise, but a signal. Depending on what you are looking for, it might be worth checking or rejecting this hypothesis.



Image Analyst on 15 Nov 2017
This can be caused by compressing the dynamic range, for example by taking an image with 256 gray levels and remapping the pixel values so that they only range from 0 to 230 or something.
It can also be caused by histogram bins that do not exactly match up with the values, for example using 230 bins for data that range from 0 to 255 and would look "proper" with 256 bins. With fewer bins than needed, some bins get more pixels than they otherwise would have.
  1 Comment
Jan on 15 Nov 2017
If 256 bins are resampled to 230 bins with a linear mapping, I'd expect the spikes to be positive only and equally spaced, because they are caused by rounding.
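The 256-to-230 case can be checked with a small sketch. It assumes one sample at each integer gray level and a floor-based linear mapping; both are assumptions chosen for illustration, and other mappings would move the spikes.

```matlab
levels = 0:255;                             % one sample at each of 256 gray levels
nbins  = 230;

% Linear mapping of the 256 levels onto bins 1..230 (floor-based, an assumption).
idx    = floor(levels * nbins / 256) + 1;
counts = accumarray(idx(:), 1, [nbins 1]);  % number of levels landing in each bin

spikes = find(counts == 2);                 % bins that received two gray levels
fprintf('%d bins hold 2 levels, %d bins hold 1 level\n', ...
        numel(spikes), sum(counts == 1));
disp(diff(spikes)');                        % spacing between neighbouring spikes
```

Each bin is wider than one gray level, so no bin is empty and the extra 26 levels show up as positive spikes only; the printed gaps let you inspect how regular the spacing is.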

