How do I interpret the output of seqpdist when the Jukes-Cantor distance is not defined?
Using the Bioinformatics Toolbox, run the following commands:
Seq1='AAAAAA'
Seq2='GGGGGG'
SS={Seq1,Seq2}
seqpdist(SS,'Alphabet','NT')
You will find that the reported distance between the two sequences is 27.032740041837865.
This Jukes-Cantor distance should not be defined in this case, because the sequences differ at more than 75% of their sites. Does anyone know:
1) why this output occurs, i.e., what heuristic is used?
2) whether seqpdist always returns finite real numbers for pairwise distances?
Paola Favaretto
on 6 May 2015
Hi Elizabeth,
If the fractional dissimilarity f of two sequences is greater than 3/4, the argument of the logarithm in the straightforward Jukes-Cantor formula, d = -3/4 * log(1-4*f/3), becomes negative, so the distance is undefined (at exactly f = 3/4 it is infinite). In the Bioinformatics Toolbox, the formula used is -3/4 * log(max(eps,1-4*f/3)), where f is the fractional dissimilarity (i.e., the fraction of sites at which the sequences differ). Clamping the argument at eps avoids taking the logarithm of a non-positive number and returns a finite value that you can interpret as a "large" distance.
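As a worked check, assuming the formula quoted above is exactly what seqpdist applies and that eps is MATLAB's default double-precision value 2^-52 (about 2.2204e-16), the clamped formula reproduces the number from the question. 'AAAAAA' and 'GGGGGG' differ at all 6 sites, so f = 1 and the negative argument is replaced by eps:

```latex
\begin{aligned}
f &= \tfrac{6}{6} = 1, \qquad 1 - \tfrac{4}{3}f = -\tfrac{1}{3} < \varepsilon = 2^{-52}, \\
d &= -\tfrac{3}{4}\,\ln\!\bigl(\max(\varepsilon,\,-\tfrac{1}{3})\bigr)
   = -\tfrac{3}{4}\ln 2^{-52}
   = \tfrac{3}{4}\cdot 52\ln 2
   = 39\ln 2 \approx 27.0327 .
\end{aligned}
```

Under the same assumptions, -3/4·ln(eps) ≈ 27.03 is also the largest value this formula can produce: every pair with 1 - 4f/3 ≤ eps (in effect, f ≥ 3/4) receives this same finite value, so it behaves as a ceiling for saturated pairs rather than a meaningful distance estimate.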
Indeed, under the Jukes-Cantor assumptions, two completely unrelated sequences are expected to have a dissimilarity fraction of 3/4: if each base is chosen at random from a uniform distribution, 1/4 of the sites will agree by chance. Thus, any two related sequences that differ at more than 3/4 of their sites are indistinguishable, in terms of distance, from two unrelated sequences.
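The 3/4 figure follows from a short calculation, assuming that at every site each base is drawn independently with probability 1/4:

```latex
\begin{aligned}
P(\text{match}) &= \sum_{b \in \{A,C,G,T\}} P(b)^2 = 4\left(\tfrac{1}{4}\right)^2 = \tfrac{1}{4},
\qquad \mathbb{E}[f] = 1 - P(\text{match}) = \tfrac{3}{4}, \\
d_{JC}(f) &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}f\right) \longrightarrow +\infty
\quad \text{as } f \to \tfrac{3}{4}^{-}.
\end{aligned}
```

The unclamped distance therefore diverges exactly at the dissimilarity expected for unrelated sequences, which is why differences beyond 3/4 carry no usable information under this model.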
The function seqpdist implements several other methods, selectable with the 'Method' name-value argument. Some of them are more sophisticated and overcome the limitations of the Jukes-Cantor model. I suggest trying them to see whether those models make more sense for your data.
Hope this helps. -Paola