Calculation of probability using beta, lognormal and Weibull distributions
Hello,
I have computed three different properties of sample objects and stored the discrete data values in three vectors. I have fit Weibull, lognormal and beta distributions to these three vectors. Now, how can I find the individual probability of each data value using its respective fitted distribution? I want to multiply these probabilities for each element to find the joint probability.
Thanks
Accepted Answer
Jeff Miller
on 5 Apr 2018
"if I want to obtain the likelihood of getting a particular combination of values which represent these 3 properties of another cancerous tumor(not from the dataset),i use my fitted distributions for this purpose and i multiply the computed likelihoods to obtain the joint likelihood."
Several comments on this.
First, multiplication is only appropriate here if the 3 properties are independent. If they are at all correlated with one another, the joint likelihood for any given combination of three is not simply the product of the individual ("marginal") likelihoods. You can check this by looking at the pairwise correlations among the three properties across the 200 tumors.
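A minimal sketch of that correlation check, assuming the three properties are stored as the columns of a 200-by-3 matrix. The matrix `props` below is synthetic stand-in data (with a dependence deliberately built in), not the tumor data from the question.

```matlab
% Hypothetical stand-in data: 200 tumors (rows), 3 properties (columns).
rng(1);                                      % reproducible random numbers
n = 200;
props = rand(n, 3);
props(:,3) = props(:,3) + 0.5*props(:,1);    % make property 3 depend on property 1

% Pairwise Pearson correlations among the three columns (base MATLAB).
R = corrcoef(props);
disp(R)
```

Off-diagonal entries of `R` that are clearly nonzero argue against multiplying the marginals. Note that near-zero correlations are necessary for independence but do not prove it.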
Second, it sounds like you really do want probabilities rather than likelihoods. For that purpose, you would do best to treat each property as falling within a certain numerical range or "bin". For a first pass, I suggest you classify each tumor as falling above vs. below the median with respect to each of the three properties. This will give you 8 categories of tumors (2x2x2), and you can count how many (out of 200) you have in each category. With a sample of only 200, I doubt that you have enough data to get usable estimates of the probabilities in more than 8 bins, but you might try 3x3x3 if you are a daredevil.
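The median split could be sketched roughly as follows. Again `props` is hypothetical stand-in data, and the variable names are only for illustration.

```matlab
% Hypothetical stand-in data: 200 tumors (rows), 3 properties (columns).
rng(2);
n = 200;
props = rand(n, 3);

% Code each value as 1 (at or below the column median) or 2 (above it).
med = median(props, 1);
side = double(props > med) + 1;          % n-by-3 matrix of 1s and 2s

% Count tumors in each of the 2x2x2 cells, then convert to proportions.
counts = accumarray(side, 1, [2 2 2]);
probs = counts / n;
disp(probs)
```

For a new tumor, you would code its three properties the same way against the stored medians and read off `probs(s1, s2, s3)` as the estimated probability of that combination.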
Third, it still doesn't sound like you have quite appreciated the difference between likelihoods and probabilities. To make that distinction more concrete, I suggest you do the following: rescore all of your properties into a different unit of measurement. Say, for example, that you divide each property value by 10. I hope you agree that logically this changes nothing. Likewise, if you look at the probabilities in the 2x2x2 bins, you will see that the rescoring has also changed nothing. But if you refit your distributions to the rescored property values, you will see that all of the PDF values (i.e., likelihoods) have increased by a factor of 10. The reason is that the PDF must integrate to 1 across the whole range of X, so the PDF values, unlike the corresponding probabilities, depend on the units of measurement for X.
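Here is a small sketch of that rescaling experiment for the lognormal case, using only base MATLAB. The data are synthetic, and the "fit" is simply the mean and standard deviation of the logged data, which is an assumption standing in for whatever fitting routine was actually used.

```matlab
% Hypothetical positive-valued property measurements.
rng(3);
x = exp(0.5*randn(200,1) + 1);       % lognormal-looking sample
y = x / 10;                          % same data in a different unit

% Lognormal fit by hand: mean and standard deviation of log(data).
muX = mean(log(x));  sgX = std(log(x));
muY = mean(log(y));  sgY = std(log(y));

% Lognormal PDF and CDF written out with base MATLAB functions.
lpdf = @(t, mu, sg) exp(-(log(t) - mu).^2 ./ (2*sg^2)) ./ (t * sg * sqrt(2*pi));
lcdf = @(t, mu, sg) 0.5 * erfc(-(log(t) - mu) ./ (sg * sqrt(2)));

% PDF values at the same data points, before and after rescaling.
ratio = lpdf(y, muY, sgY) ./ lpdf(x, muX, sgX);   % each entry is 10, up to rounding

% Probability of the same physical range, expressed in each unit.
pX = lcdf(4, muX, sgX)   - lcdf(2, muX, sgX);     % P(2 < X < 4)
pY = lcdf(0.4, muY, sgY) - lcdf(0.2, muY, sgY);   % P(0.2 < X/10 < 0.4)
fprintf('PDF ratio: %.4f to %.4f\n', min(ratio), max(ratio));
fprintf('interval probability: %.6f vs %.6f\n', pX, pY);
```

The density values change with the unit, while the interval probabilities agree.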
More Answers (1)
John D'Errico
on 31 Mar 2018
Edited: John D'Errico
on 31 Mar 2018
I think you are making what is a fairly common mistake about continuous probability distributions.
Suppose you have a 6-sided die, with the numbers 1-6 on the faces. Assuming a fair die, each number will come up equally often. So if you see the number 2, you know the probability of that event was 1/6. This works because the distribution is a discrete one.
But a continuous distribution does not work that way. Consider a normal distribution, for example.
randn
ans =
0.69551
The probability that 0.69551 would result from a normal distribution is zero. You might say, but we just got that number! How can it have probability zero? But a continuous distribution has probability zero for ANY single event. We can talk about the probability that we will see a value in the interval [0.6,0.7]. That is given by the integral of the PDF over that interval, or we can use the CDF.
normcdf(.7) - normcdf(0.6)
ans =
0.032289
So the probability that we would have seen an event that lies in the interval [.6,.7] is 0.032289... But the probability of the exact event that we saw is zero.
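As a sketch of the point about intervals, the same number can be obtained by integrating the PDF numerically, and the probability shrinks toward zero as the interval narrows around a single value. This version uses only base MATLAB (`erfc` and `integral`), so the helper names `npdf` and `ncdf` are local stand-ins for the toolbox functions `normpdf` and `normcdf`.

```matlab
% Standard normal PDF and CDF from base MATLAB functions only.
npdf = @(x) exp(-x.^2 / 2) / sqrt(2*pi);
ncdf = @(x) 0.5 * erfc(-x / sqrt(2));

pInt = integral(npdf, 0.6, 0.7);     % area under the PDF over [0.6, 0.7]
pCdf = ncdf(0.7) - ncdf(0.6);        % same probability from the CDF
fprintf('%.6f  %.6f\n', pInt, pCdf);

% Shrinking the interval around 0.69551 drives the probability toward zero.
w = 10.^(-(1:6));                    % interval widths 0.1, 0.01, ...
pShrink = ncdf(0.69551 + w/2) - ncdf(0.69551 - w/2);
disp(pShrink)
```

Both printed values should agree with the 0.032289 shown above.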
You might think you can use the pdf.
normpdf(0.69551)
ans =
0.31323
I'm sorry. That is NOT a probability. Even though it comes from the probability density function, normpdf does not compute a probability.
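One quick way to convince yourself of this: a density can exceed 1, which no probability can. A minimal illustration, with the normal PDF written out by hand in base MATLAB (the helper `npdf` is a local stand-in for `normpdf`):

```matlab
% Normal PDF with mean mu and standard deviation sigma (base MATLAB).
npdf = @(x, mu, sigma) exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma * sqrt(2*pi));

d1 = npdf(0, 0, 1);      % density at the peak of a standard normal
d2 = npdf(0, 0, 0.1);    % density at the peak when sigma = 0.1
fprintf('%.4f  %.4f\n', d1, d2);   % prints 0.3989  3.9894
```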
Perhaps you understand all of this. Your question suggests that you do not. I would suggest you should probably read this:
https://en.wikipedia.org/wiki/Probability_density_function
Usually, when people want to do what you are doing, they want to use the CDF of the respective distribution in some way. It might be for an MLE computation, or something else.
So IF you do understand the difference, and how to compute the probability over some interval from a CDF, then you can use tools from the stats toolbox, such as normcdf, betacdf, and wblcdf. (I never get the last name right. I always want to type weibcdf.) If you do not have the stats toolbox, then there are still ways to compute the CDF for these distributions, via an appropriate transformation from one of several special functions.
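A rough sketch of the no-toolbox route, for the three distributions in the question. The parameter values are hypothetical placeholders rather than fitted values, and the interval endpoints are arbitrary. With the stats toolbox, `wblcdf(x,a,b)`, `logncdf(x,mu,sg)` and `betacdf(x,p,q)` should give the same CDF values.

```matlab
% Hypothetical fitted parameters (placeholders, not values from the question).
a = 2.0;  b = 1.5;        % Weibull scale and shape
mu = 0.0; sg = 0.5;       % lognormal parameters on the log scale
p = 2.0;  q = 5.0;        % beta shape parameters

% CDFs from closed forms and base MATLAB special functions.
wcdf = @(x) 1 - exp(-(x ./ a).^b);                         % Weibull, x >= 0
lcdf = @(x) 0.5 * erfc(-(log(x) - mu) ./ (sg * sqrt(2)));  % lognormal, x > 0
bcdf = @(x) betainc(x, p, q);                              % beta, 0 <= x <= 1

% Probability that each property falls in a small range around an observed value.
pW = wcdf(1.6) - wcdf(1.4);
pL = lcdf(1.1) - lcdf(0.9);
pB = bcdf(0.35) - bcdf(0.25);

% The product is the joint probability only if the three properties are independent.
pJoint = pW * pL * pB;
fprintf('%.4f %.4f %.4f -> %.6f\n', pW, pL, pB, pJoint);
```

The resulting probabilities depend on the interval widths you choose, so the widths are part of the question being asked.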
6 Comments
Jeff Miller
on 3 Apr 2018
I think that in order to get any correct advice, you will need to answer John's questions: "exactly why are you trying to compute the probability of any given sample? What are you trying to do?"