What is the recommended practice for plotting the outputs of histcounts?

I have generated multiple histograms with the histcounts function. The histograms are computed from large data sets that I cannot hold in memory all at once, and I want to plot them to compare the distributions of these data sets. The 'histogram' function accepts the raw data and makes a histogram plot, but it does not accept the outputs of histcounts and simply plot them. The 'bar' function does not directly accept the outputs of histcounts either, since the edge array has N+1 points and the counts array has N points. Do I have to convert my N+1-point edge arrays to N-point bin-center arrays to plot these histograms, or is there another function that directly accepts the outputs of histcounts to make a histogram plot? It's not a difficult problem to solve, but it makes these 'new and encouraged' histogram functions seem kind of clunky compared to the 'discouraged' ones.

Answers (2)

Steven Lord on 29 Nov 2018
If you cannot call histogram directly to bin your data while visualizing it, but need to visualize the output of one or more calls to histcounts (for example, if you call histcounts repeatedly with a common set of bin edges on a different subset of your data each time and then add the bin counts together), call histogram with the 'BinCounts' and 'BinEdges' name-value pairs. Compare the two figures created by this example:
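rng default                                   % reproducible random numbers
x = randn(10000, 1);
y = randn(10000, 1) + 5;
[minCombined, maxCombined] = bounds([x; y]);  % range covering both data sets
BE = linspace(minCombined, maxCombined, 100); % 100 common bin edges (99 bins)
figure
h1 = histogram([x; y], BE);                   % bin the combined raw data directly
bincounts = histcounts(x, BE);                % bin each data set separately...
bincounts2 = histcounts(y, BE);
figure;
h2 = histogram('BinCounts', bincounts + bincounts2, 'BinEdges', BE); % ...then plot the summed counts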
While I did concatenate x and y together to identify the minimum and maximum of the combined data to generate a set of bins that would work for both data sets, you may be able to generate a common set of bin edges without needing x and y to be in memory simultaneously.
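One way to do that, sketched under the assumption that your data can be read in chunks and that each chunk can be read twice: a first pass tracks a running minimum and maximum, and a second pass accumulates histcounts over the shared edges. The chunk reader getChunk is hypothetical; it only generates seeded random numbers so the snippet runs on its own, and in practice it would load one piece of your data from disk.

```matlab
% Hypothetical chunk reader standing in for "load chunk k from disk".
% Seeding by k makes a second read of the same chunk return the same data.
getChunk = @(k) randn(RandStream('mt19937ar', 'Seed', k), 1e4, 1) + 5*(k > 2);
nChunks = 4;

% Pass 1: running min and max, with only one chunk in memory at a time
lo = Inf;
hi = -Inf;
for k = 1:nChunks
    [cmin, cmax] = bounds(getChunk(k));
    lo = min(lo, cmin);
    hi = max(hi, cmax);
end
BE = linspace(lo, hi, 101);   % 101 common edges -> 100 bins

% Pass 2: accumulate the counts of every chunk on the common edges
counts = zeros(1, numel(BE) - 1);
for k = 1:nChunks
    counts = counts + histcounts(getChunk(k), BE);
end

figure
histogram('BinCounts', counts, 'BinEdges', BE);
```

If a safe range is known in advance (for example from the physics of the measurement), the first pass can be skipped and fixed edges used directly; values outside the edges are then simply not counted.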
If you regularly work with data too large to fit in memory, note that the histcounts and histogram functions support tall arrays, with some limitations listed in the Extended Capabilities section of their documentation pages.
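A minimal sketch of the tall-array route, assuming a release with tall array support (R2016b or later) and that explicit bin edges are accepted for tall input (check the Extended Capabilities notes for your release). Here tall is applied to an in-memory vector only so the snippet is self-contained; for genuinely large data you would typically build the tall array from a datastore instead.

```matlab
rng default
tx = tall(randn(1e5, 1));            % tall wrapper around in-memory data, for illustration only
BE = linspace(-5, 5, 51);            % fixed edges chosen in advance: 50 bins
counts = gather(histcounts(tx, BE)); % gather triggers evaluation and returns an in-memory array
figure
histogram('BinCounts', counts, 'BinEdges', BE);
```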

Image Analyst on 24 Mar 2016
I use bar(), along with other functions like grid, xlabel, ylabel, title, and whatever else I want to use to fancy up the chart.
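A sketch of the bar approach: convert the N+1 edges from histcounts to N bin centers (midpoints of consecutive edges) and draw the bars with width 1 so adjacent bars touch. The random data are only for illustration.

```matlab
rng default
data = randn(1e4, 1);
[counts, edges] = histcounts(data);

% Midpoints of consecutive edges: N+1 edges -> N centers
centers = edges(1:end-1) + diff(edges)/2;

figure
bar(centers, counts, 1)   % bar width 1: no gaps between bars
grid on
xlabel('Value')
ylabel('Count')
title('histcounts output plotted with bar')
```

Because bar positions bars by their centers, it draws every bar with the same width; with nonuniform edges, histogram('BinCounts', counts, 'BinEdges', edges) is the better fit since it keeps the true edges.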
  9 Comments
Image Analyst on 30 Nov 2018
Personally, I like the fact that histogram operates on the whole array and doesn't force me to append (:) to it. I would never want the histogram of just the columns or just the rows. If I pass an image, why would I want, say, four thousand histograms instead of just one for the whole image? I wouldn't. And I'd guess that others would also rarely want dozens or hundreds of histograms. So while I agree it's inconsistent, because it doesn't give a result for every row or every column, I believe it's an inconsistency for the better.
Bruno Luong on 30 Nov 2018
It's not a question of inconsistency; it's a question of a feature that has been removed (working along a dimension).
If you don't use it, good for you, IA.
But some people, like me, do. I don't work with images but with arrays of data that I need to bin along a dimension that I select.
But if you prefer image processing, IA, I can give you a concrete example where this is needed: to compute a sliding histogram over small rectangular areas of a grayscale image, one can rearrange the image with im2col and then use HISTC along the first dimension. This is very convenient.
Try to do the same with HISTCOUNTS and you'll end up with a few slow for-loops instead of fast, compact code.
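A small illustration of that difference, with made-up integer data: histc applied to a matrix bins each column in a single call, while histcounts treats the whole matrix as one data set, so per-column counts need a loop. histc returns one extra row (the count of values exactly equal to the last edge), which the comparison drops; histc is still available but marked as not recommended.

```matlab
rng default
A = randi(10, 50, 8);      % 8 columns of integer data, binned column by column
edges = 0.5:1:10.5;        % 11 edges -> 10 bins centered on 1..10

% histc: one call, one column of counts per column of A
Hc = histc(A, edges);      % size numel(edges)-by-8; last row counts A == edges(end)

% histcounts: the matrix is treated as a single data set, so loop over columns
Hn = zeros(numel(edges) - 1, size(A, 2));
for j = 1:size(A, 2)
    Hn(:, j) = histcounts(A(:, j), edges).';
end
```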
