How to create a boxplot from a PDF?

Hello!
I have a somewhat embarrassing question, but my colleagues and I have been unable to figure it out for several days. Mental block ^^ So I would appreciate help!
I have a pdf of my data called pdfxcor (598x1), which resembles a normal distribution when I plot it against an x-axis representing the molecular weight of my data (called pixelweight (598x1)).
plot(pixelweight,pdfxcor)
boxplot(pdfxcor)
I want to display the distribution as a boxplot placed according to the correct molecular weight.
Thanks for your patience! :)
Jette

Accepted Answer

Teja Muppirala on 23 Apr 2013
How about something like this? Generate the CDF from your data as Tom suggested, invert it, use the inverted CDF to generate a bunch of samples that follow your distribution exactly, and send those to BOXPLOT:
%% Make some data that resembles yours
x = linspace(1000,12000,598);
P = normpdf(x,5800,1800);
figure, plot(x,P), title('PDF');
%% Generate the CDF
C = cumsum(P);
C = C/C(end);
figure, plot(x,C), title('CDF');
%% Sample linearly along the inverse CDF to get a bunch of points
% that follow the same distribution
BigNumber = 100000;
p = interp1(C,x,linspace(C(1),C(end),BigNumber));
figure, hist(p,100); % Confirm that p indeed has your distribution
figure, h = boxplot(p);
delete(findobj(h,'tag','Outliers')) % Hide the outliers
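One caveat with the approach above: cumsum only approximates the CDF when x is evenly spaced. If the axis is non-uniform (as with a molecular-weight axis derived from an exponential calibration), a rough sketch using cumtrapz could look like the following. The axis parameters, mu, and sigma are made-up stand-ins, and the Gaussian is written out by hand so that no toolbox is needed.

```matlab
% Hypothetical stand-in for a non-uniformly spaced molecular-weight axis
pixelrange = 1:598;
x = 12000 * exp(-0.004 * pixelrange);   % assumed calibration, decreasing in pixel index
mu = 5800; sigma = 1800;                % assumed Gaussian parameters
P = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));

% Sort so that the axis is increasing before integrating
[xs, idx] = sort(x);
Ps = P(idx);

% Trapezoidal integration accounts for the varying spacing of xs
C = cumtrapz(xs, Ps);
C = C / C(end);                         % normalize so the CDF ends at 1

% interp1 needs distinct sample points; drop repeats that can occur in flat tails
[Cu, iu] = unique(C);
xu = xs(iu);

% Lower quartile, median, upper quartile on the molecular-weight axis
q = interp1(Cu, xu, [0.25 0.50 0.75]);
```

The resulting q values are already expressed in molecular-weight units, so they can be drawn directly on the same axis as the PDF.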
  4 Comments
Tom Lane on 23 Apr 2013
It looks like your distribution is not symmetric. The normal distribution is symmetric, so it would not resemble the histogram in that respect.
Janett Göhring on 23 Apr 2013
Hi Tom,
The curve was calculated via a Gaussian fit and is symmetric. The x-axis, though, is based on data that was fitted with nlinfit and looks like a logarithmic decay, so after correction the x-axis is no longer linear. That's why it is so important to plot the pdf against pixelweigth; otherwise the distribution is no longer symmetric.
modelFun = @(p,x) p(1)*exp(p(2)*x);
In between, I calculate start parameters for the fit, which are not important for this example.
Next, I fit the pixel position against the molecular weight of the DNA standard.
p = nlinfit(positionOfStandard, MWOfStandard, modelFun, paramEstsLin(:,1));
The pixelrange is just the y-length of my image in pixels, here 1:598.
pixelweigth = p(1)*exp(p(2)*pixelrange);
After lots of corrections to the original data, I fit a Gaussian to it and calculate the curve, mean, and sigma.
cf3 = fit(pixelweigth',data','gauss1');
pdfxcor = cf3(pixelweigth);
After that I need a representation of the normally distributed data along this specialized x-axis (pixelweigth). But not as a curve ... I was asked to display it as a boxplot. And since it is a normal distribution, I thought it must be possible. But MATLAB doesn't give an option in "boxplot" to specify a different axis.
Thanks for the help! Much appreciated :)


More Answers (1)

Tom Lane on 22 Apr 2013
The boxplot shows the median, lower quartile, and upper quartile. You may be able to calculate these for your pdf. For example, if you have the pdf as a numeric vector, you might compute cumsum on the vector, then divide by the last value to impose the correct probability normalization, then interpolate.
The boxplot also shows a notion of the range of the data, and sometimes outliers. These are harder to extend to a pdf. You could decide that you want to compute the 1% and 99% points as in the previous paragraph, and use those to represent the end points of the range. You could decide not to show outliers.
Plotting these as lines or points will be relatively simple. It would be more of a challenge to plot them in exactly the way that the boxplot function does.
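As a rough illustration of the steps described above, the sketch below builds a normalized CDF from a pdf vector, interpolates the 1%, 25%, 50%, 75%, and 99% points, and draws them as a simple box with whiskers on top of the PDF curve. The data are made up (an evenly spaced axis and a hand-written Gaussian), and the box height and position are arbitrary choices, not anything the boxplot function prescribes.

```matlab
% Made-up example data: evenly spaced axis and a Gaussian pdf
x = linspace(1000, 12000, 598);
mu = 5800; sigma = 1800;                 % assumed parameters
P = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));

% cumsum, then divide by the last value to normalize the probabilities
C = cumsum(P);
C = C / C(end);

% Invert the CDF by interpolation: look up x at the chosen probabilities
probs = [0.01 0.25 0.50 0.75 0.99];
q = interp1(C, x, probs);                % q = [1%, Q1, median, Q3, 99%]

% Draw a horizontal box with whiskers over the PDF, in x-axis units
figure; plot(x, P); hold on;
yBox = 0.5 * max(P);                     % arbitrary vertical position
hBox = 0.1 * max(P);                     % arbitrary box height
rectangle('Position', [q(2), yBox - hBox/2, q(4) - q(2), hBox]);
line([q(3) q(3)], [yBox - hBox/2, yBox + hBox/2], 'Color', 'r');  % median
line([q(1) q(2)], [yBox yBox]);          % lower whisker (1% to Q1)
line([q(4) q(5)], [yBox yBox]);          % upper whisker (Q3 to 99%)
hold off;
```

Because the quantiles are looked up directly on the x vector, the box sits at the correct positions on the data axis, and no resampling is involved.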
  1 Comment
Janett Göhring on 23 Apr 2013
Hello Tom, thanks for your answer! Can you explain how to interpolate in this case?
For my problem I came up with two solutions, but I don't like either of them.
a) I fit a Gaussian to my original data to obtain the pdf, mean, and sigma. Then I sample with randn (1 million samples), using the mean and sigma as parameters. This creates a normal distribution based on my fit, which can be plotted via boxplot. Since I already fit my original data with a Gaussian, I am not very interested in the outliers. I was just asked to represent the normal distribution as a boxplot for easier comparison of the mean and range of the data. So I would feel much better if I didn't have to sample a new distribution, and of course it takes ages to calculate.
b) I calculate the mean and the quartiles of the pdf and extract the respective positions from pixelweigth. Then I draw a bar plot (colored for the upper quartile and white for the lower quartile) with error bars. I couldn't make this work, since the pdf is only normally distributed when it is plotted against pixelweigth.
Bit stuck there ^^ Thanks for your help! Jette
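For option a), sampling may not be needed at all: if the fit really is Gaussian, its quantiles follow in closed form from the mean and sigma. Below is a minimal sketch, assuming the coefficients come from a 'gauss1' fit, whose model is a1*exp(-((x-b1)/c1)^2), so that the mean is b1 and sigma is c1/sqrt(2). The numeric values of b1 and c1 are placeholders, not results from the data in this thread.

```matlab
% Placeholder coefficients standing in for cf3.b1 and cf3.c1 from a 'gauss1' fit
b1 = 5800;
c1 = 1800 * sqrt(2);

mu = b1;                 % center of the Gaussian
sigma = c1 / sqrt(2);    % gauss1 uses exp(-((x-b1)/c1)^2), hence the sqrt(2)

% Quantile function of a normal distribution, written with base-MATLAB erfinv
probs = [0.01 0.25 0.50 0.75 0.99];
q = mu + sigma * sqrt(2) * erfinv(2*probs - 1);   % [1%, Q1, median, Q3, 99%]
```

The quartiles come out at roughly mu plus or minus 0.6745*sigma, already in molecular-weight units, so they can be drawn as a box on the pixelweigth axis without generating a million samples.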

